amiller
(Alan)
October 15, 2021, 1:39am
1
What is the problem you are having with rclone?
Trying to copy HDFS data between 2 clusters.
The source cluster has TDE (Transparent Data Encryption); the destination HDFS cluster has no TDE.
rclone copies the files to the destination cluster, but they are useless.
What is your rclone version (output from rclone version)?
rclone v1.56.2
- os/version: redhat 7.8 (64 bit)
- os/kernel: 3.10.0-1127.el7.x86_64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.16.8
- go/linking: static
- go/tags: none
Which cloud storage system are you using? (eg Google Drive)
on-premise HDFS
The command you were trying to run (eg rclone copy /tmp remote:tmp)
rclone copy san_prod:/tests/data las_prod:/tests/data
The rclone config contents with secrets removed.
[las_prod]
type = hdfs
namenode = lassdppr01hdp01.las.ssnsgs.net:8020
username = hdfs
[san_prod]
type = hdfs
namenode = sansdppr01hdp01.san.ssnsgs.net:8020
username = hdfs
A log from the command with the -vv flag
: [0132] root@lassdppr01nag01:rclone # ; ./rclone -vv copy san_prod:/tests/data las_prod:/tests/data
2021/10/15 01:36:14 DEBUG : rclone: Version "v1.56.2" starting with parameters ["./rclone" "-vv" "copy" "san_prod:/tests/data" "las_prod:/tests/data"]
2021/10/15 01:36:14 DEBUG : Creating backend with remote "san_prod:/tests/data"
2021/10/15 01:36:14 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2021/10/15 01:36:15 DEBUG : Creating backend with remote "las_prod:/tests/data"
2021/10/15 01:36:15 DEBUG : hdfs://lassdppr01hdp01.las.ssnsgs.net:8020: list [/tests/data]
2021/10/15 01:36:15 DEBUG : hdfs://sansdppr01hdp01.san.ssnsgs.net:8020: list [/tests/data]
2021/10/15 01:36:15 DEBUG : hdfs://lassdppr01hdp01.las.ssnsgs.net:8020: Waiting for checks to finish
2021/10/15 01:36:15 DEBUG : hdfs://lassdppr01hdp01.las.ssnsgs.net:8020: Waiting for transfers to finish
2021/10/15 01:36:15 DEBUG : hdfs://sansdppr01hdp01.san.ssnsgs.net:8020: open [/tests/data/20211002-62664aaf-b1b0-42f0-bf0a-2276b22236d9.csv.gz]
2021/10/15 01:36:15 DEBUG : hdfs://lassdppr01hdp01.las.ssnsgs.net:8020: update [/tests/data/20211002-62664aaf-b1b0-42f0-bf0a-2276b22236d9.csv.gz]
2021/10/15 01:36:15 INFO : 20211002-62664aaf-b1b0-42f0-bf0a-2276b22236d9.csv.gz: Copied (new)
2021/10/15 01:36:15 INFO :
Transferred: 5.214Ki / 5.214 KiByte, 100%, 0 Byte/s, ETA -
Transferred: 1 / 1, 100%
Elapsed time: 0.1s
2021/10/15 01:36:15 DEBUG : 5 go routines active
asdffdsa
(jojothehumanmonkey)
October 15, 2021, 12:02pm
2
amiller:
they are useless
In what way are they useless?
amiller
(Alan)
October 15, 2021, 10:40pm
3
The files on the destination HDFS cluster are corrupt.
I can't unzip a gzip file and I can't open a text file.
In the source HDFS cluster these files are in an "encryption zone". See Hadoop TDE.
Even when I try to copy files by prepending the paths with /.reserved/raw/, the files are not readable in the destination cluster.
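For context, here is a sketch of how the encryption zone and the raw-bytes view can be inspected on the source cluster (the /prod1 zone path is taken from my later posts; listing zones requires HDFS admin rights):

```shell
# List the configured encryption zones and the key each one uses
hdfs crypto -listZones

# Paths under /.reserved/raw/ expose the raw (still-encrypted) bytes,
# bypassing the transparent decryption normal reads get
hdfs dfs -ls /.reserved/raw/prod1/raw/test
```

The point of the /.reserved/raw/ attempt was to copy the ciphertext verbatim, but that only produces readable files if the destination zone shares the same encryption key, which is not the case here.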
amiller
(Alan)
October 15, 2021, 10:51pm
4
Here is a full example of what I mean:
In the source HDFS cluster I copy a file into my /prod1 encryption zone:
: [2243] hdfs@sansdppr01hdp05:rclone $ ; hdfs dfs -ls /prod1/raw/test
Found 1 items
-rw-r--r-- 3 hdfs hadoop 14899752 2021-10-15 22:43 /prod1/raw/test/rclone.tgz
: [SAN_PROD01 Hadoop] ;
: [2244] hdfs@sansdppr01hdp05:rclone $ ; hdfs dfs -copyToLocal /prod1/raw/test/rclone.tgz .
: [SAN_PROD01 Hadoop] ;
: [2244] hdfs@sansdppr01hdp05:rclone $ ; tar tvzf rclone.tgz
-rw-r--r-- hdfs/hadoop 1131 2021-10-13 23:17 git-log.txt
-rwxr-xr-x hdfs/hadoop 43847680 2021-10-13 23:17 rclone
-rw-r--r-- hdfs/hadoop 1455517 2021-10-13 23:17 rclone.1
-rw-r--r-- hdfs/hadoop 1575634 2021-10-13 23:17 README.html
-rw-r--r-- hdfs/hadoop 1278833 2021-10-13 23:17 README.txt
Then I run rclone to copy this dir to my destination HDFS cluster:
: [2246] hdfs@sansdppr01hdp05:rclone $ ; ./rclone -vv copy san_prod:/.reserved/raw/prod1/raw/test las_prod:/.reserved/raw/prod1/raw/test
2021/10/15 22:48:04 DEBUG : rclone: Version "v1.56.2" starting with parameters ["./rclone" "-vv" "copy" "san_prod:/.reserved/raw/prod1/raw/test" "las_prod:/.reserved/raw/prod1/raw/test"]
2021/10/15 22:48:04 DEBUG : Creating backend with remote "san_prod:/.reserved/raw/prod1/raw/test"
2021/10/15 22:48:04 DEBUG : Using config file from "/home/hdfs/.config/rclone/rclone.conf"
2021/10/15 22:48:04 DEBUG : Creating backend with remote "las_prod:/.reserved/raw/prod1/raw/test"
2021/10/15 22:48:04 DEBUG : hdfs://lassdppr01hdp01.las.ssnsgs.net:8020: list [/.reserved/raw/prod1/raw/test]
2021/10/15 22:48:04 DEBUG : hdfs://sansdppr01hdp01.san.ssnsgs.net:8020: list [/.reserved/raw/prod1/raw/test]
2021/10/15 22:48:04 DEBUG : hdfs://lassdppr01hdp01.las.ssnsgs.net:8020: Waiting for checks to finish
2021/10/15 22:48:04 DEBUG : hdfs://lassdppr01hdp01.las.ssnsgs.net:8020: Waiting for transfers to finish
2021/10/15 22:48:04 DEBUG : hdfs://sansdppr01hdp01.san.ssnsgs.net:8020: open [/.reserved/raw/prod1/raw/test/rclone.tgz]
2021/10/15 22:48:04 DEBUG : hdfs://lassdppr01hdp01.las.ssnsgs.net:8020: update [/.reserved/raw/prod1/raw/test/rclone.tgz]
2021/10/15 22:48:05 INFO : rclone.tgz: Copied (new)
2021/10/15 22:48:05 INFO :
Transferred: 14.210Mi / 14.210 MiByte, 100%, 0 Byte/s, ETA -
Transferred: 1 / 1, 100%
Elapsed time: 0.5s
2021/10/15 22:48:05 DEBUG : 5 go routines active
The tgz file in the destination cluster:
: [2241] hdfs@lassdppr01hdp05:~ $ ; hdfs dfs -ls /prod1/raw/test
Found 1 items
-rw-r--r-- 3 hdfs hadoop 14899752 2021-10-15 22:43 /prod1/raw/test/rclone.tgz
: [SLDP: LAS_PROD01] ;
: [2249] hdfs@lassdppr01hdp05:~ $ ; hdfs dfs -copyToLocal /prod1/raw/test/rclone.tgz .
: [SLDP: LAS_PROD01] ;
: [2249] hdfs@lassdppr01hdp05:~ $ ; tar tvzf rclone.tgz
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
I don't know much about this backend either.
Perhaps @ncw or @ivandeex might have a thought; I was looking through the feature request and I think they were both involved. I do not see @urykhy on the forums.
ncw
(Nick Craig-Wood)
October 16, 2021, 9:47am
7
I suspect you are right about it not supporting TDE.
Can you open a new issue on GitHub so we can ask the experts there about it?
Thanks
ivandeex
(Ivan Andreev)
October 16, 2021, 10:35am
8
I don't know what TDE is, so I can't comment. Let the gurus decide.
amiller
(Alan)
October 17, 2021, 1:28am
9
I also wanted to mention that copying HDFS files from an "unencrypted" area in the source cluster to the destination cluster works fine (there is no data corruption).
For example:
All files under /prod1 are in an "encryption zone".
First I copied the contents of /prod1/raw/test to /unencrypted/raw/test using
the hadoop distcp command (which preserves timestamps, permissions, ACLs, etc.).
Then I ran rclone copy san_prod:/unencrypted/raw/test las_prod:/prod1/raw/test
which achieves my goal of mirroring data across the 2 HDFS clusters.
So this is probably a feature request for rclone to support Hadoop TDE.
It would be nice if the documentation at least mentioned this limitation.
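The workaround above can be sketched as a two-step pipeline (paths and remote names are the ones from this thread; the exact set of distcp -p preserve flags is a sketch, check the DistCp docs for your needs):

```shell
# Step 1: copy out of the encryption zone within the source cluster.
# Reads are transparently decrypted, so the staging copy is plaintext.
# -p preserves file attributes (timestamps, permissions, ACLs, ...).
hadoop distcp -p /prod1/raw/test /unencrypted/raw/test

# Step 2: rclone the plaintext staging copy to the destination cluster.
# Writing into the destination's encryption zone re-encrypts the data
# under that cluster's own key.
rclone copy san_prod:/unencrypted/raw/test las_prod:/prod1/raw/test
```

The cost of this approach is an unencrypted staging copy on the source cluster, so the staging directory should be cleaned up (and access-restricted) after the transfer.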
system
(system)
Closed
November 16, 2021, 1:28am
10
This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.