I'm in the process of migrating data (90 TB total) from and to a specific S3 bucket on an on-premises S3-compatible storage system (Cloudian).
The reason is that nodes will be added and I will have to change the storage policy to a more space-efficient one (RF3 to erasure coding); it's not possible to change the policy on the fly, only when creating a new bucket.
Since my backup software, which writes to S3, doesn't support migrating between buckets, I need to do it on the S3 side.
Basically the flow will be as follows:
Original S3 Bucket -> Temporary S3 Bucket -> New S3 Bucket (with original name)
I've used different tools with varying degrees of success. Although aws cli syncs everything correctly, something seems to be wrong: my backup software can't write anything afterwards, and I haven't found out why yet, as everything from permissions to metadata looks correct.
Using rclone the migration seems to work, but I found two quirks, if you'd like to call them that.
When doing a sync between S3 buckets on the same S3 storage, metadata syncs correctly. However, on the incremental run, setting the modtime metadata makes the rest of the metadata disappear. Using --no-update-modtime solves this and the metadata stays intact. Is this expected behaviour when not using this flag?
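For what it's worth, here's a minimal Python sketch of why a modtime update can wipe the rest of the metadata (this is not rclone's actual code, just an illustration of the S3 semantics): S3 has no partial metadata update, so changing the modtime means a server-side copy of the object onto itself with the metadata directive set to REPLACE, and whatever metadata the client sends at that moment becomes the object's entire metadata set.

```python
def copy_object_replace(existing_meta, new_meta):
    """Simulates CopyObject with X-Amz-Metadata-Directive: REPLACE.
    The object ends up with exactly new_meta; nothing is merged."""
    return dict(new_meta)

def set_modtime_lossy(meta, mtime):
    # Sends only the new mtime -> every other metadata key disappears.
    return copy_object_replace(meta, {"mtime": mtime})

def set_modtime_preserving(meta, mtime):
    # Re-sends the existing metadata with just the mtime updated.
    return copy_object_replace(meta, {**meta, "mtime": mtime})

meta = {"mtime": "1559031337", "x-amz-meta-test": "hello"}
assert "x-amz-meta-test" not in set_modtime_lossy(meta, "1559031400")          # header lost
assert set_modtime_preserving(meta, "1559031400")["x-amz-meta-test"] == "hello"  # header kept
```

So whichever side of the copy forgets to re-send the existing keys (client or server) produces exactly the symptom above.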
When doing a sync between two S3 buckets on different S3-compatible storage systems (both Cloudian), metadata doesn't get synced no matter what I try.
One last question: when doing a sync I get the following notice:
NOTICE: S3 bucket migrate-rbrk-rubrik-0: --checksum is in use but the source and destination have no hashes in common; falling back to --size-only
Is this notice only for files that were uploaded as multipart because of their size, or is it for all files? I haven't found a clear answer to this yet.
Hmm... I just had a look at the code. Setting the modtime should preserve the metadata. If you could try rclone touch s3:bucket/file -vv --dump bodies and post the result, that would help with debugging.
That is expected. If you are doing a sync between different cloud storage systems, rclone can't do server-side copies.
It would be possible to fix it relatively easily though (I'm sure there is an issue somewhere about that!).
Two S3 remotes should have MD5SUM in common regardless of whether the files were uploaded as big files or not... Are they all plain S3 remotes (no crypt)? Can you post the log with -vv up to that message?
I've added a test header, X-Amz-Meta-Rclone-Test-Header, to a file. When I run rclone touch it reads the headers correctly and seems to include them in the PUT request, but as you can see from the second command the header has disappeared.
That explains it. I was looking at different migration scenarios since space is limited; having a temporary S3 target would have been nice, but metadata needs to be synced as well for that to work. I looked through the code a bit, but since I'm not a coder it takes me a little more time.
They are encrypted, but they are stored as an encrypted file with metadata containing values such as the unencrypted content length, IV and key. So they are client-side encrypted, not server-side.
I've tested with a regular ISO and performed two syncs to a different bucket. It seems I only get the notice the second time, when the file exists in both buckets.
As far as I understood, the ETag header should be the MD5 sum of the file?
ETag of original file: "00704962b5cd9c313fb03f312dfe104d-115"
ETag of copy: "bd43d41e01c2a46b3cb23eb9139dce4b"
Initial transfer of a new file (CentOS ISO):
mmassez@ubuntu:~$ rclone sync cloudianlab:test-rubrik-0 cloudianlab:mig-test-rubrik-0 -P --delete-during --no-update-modtime --transfers 32 -c -vv |tee synclog.txt
2019/05/28 10:15:37 DEBUG : rclone: Version "v1.47.0" starting with parameters ["rclone" "sync" "cloudianlab:test-rubrik-0" "cloudianlab:mig-test-rubrik-0" "-P" "--delete-during" "--no-update-modtime" "--transfers" "32" "-c" "-vv"]
2019/05/28 10:15:37 DEBUG : Using config file from "/home/mmassez/.config/rclone/rclone.conf"
2019-05-28 10:15:37 INFO : Waiting for deletions to finish
2019-05-28 10:15:37 DEBUG : rubrik_cluster_lock.txt: Size and MD5 of src and dst objects identical
2019-05-28 10:15:37 DEBUG : rubrik_cluster_lock.txt: Unchanged skipping
2019-05-28 10:15:37 DEBUG : rubrik_encryption_key_check.txt: Size and MD5 of src and dst objects identical
2019-05-28 10:15:37 DEBUG : rubrik_encryption_key_check.txt: Unchanged skipping
2019-05-28 10:15:37 INFO : S3 bucket mig-test-rubrik-0: Waiting for checks to finish
2019-05-28 10:15:37 INFO : S3 bucket mig-test-rubrik-0: Waiting for transfers to finish
2019-05-28 10:15:56 INFO : CentOS-7-x86_64-Minimal-1810.iso: Copied (server side copy)
2019/05/28 10:15:56 DEBUG : 6 go routines active
2019/05/28 10:15:56 DEBUG : rclone: Version "v1.47.0" finishing with parameters ["rclone" "sync" "cloudianlab:test-rubrik-0" "cloudianlab:mig-test-rubrik-0" "-P" "--delete-during" "--no-update-modtime" "--transfers" "32" "-c" "-vv"]
Resync of the two buckets:
mmassez@ubuntu:~$ rclone sync cloudianlab:test-rubrik-0 cloudianlab:mig-test-rubrik-0 -P --delete-during --no-update-modtime --transfers 32 -c -vv |tee synclog.txt
2019/05/28 10:16:04 DEBUG : rclone: Version "v1.47.0" starting with parameters ["rclone" "sync" "cloudianlab:test-rubrik-0" "cloudianlab:mig-test-rubrik-0" "-P" "--delete-during" "--no-update-modtime" "--transfers" "32" "-c" "-vv"]
2019/05/28 10:16:04 DEBUG : Using config file from "/home/mmassez/.config/rclone/rclone.conf"
2019-05-28 10:16:04 INFO : Waiting for deletions to finish
2019-05-28 10:16:05 DEBUG : rubrik_cluster_lock.txt: Size and MD5 of src and dst objects identical
2019-05-28 10:16:05 DEBUG : rubrik_cluster_lock.txt: Unchanged skipping
2019-05-28 10:16:05 DEBUG : rubrik_encryption_key_check.txt: Size and MD5 of src and dst objects identical
2019-05-28 10:16:05 DEBUG : rubrik_encryption_key_check.txt: Unchanged skipping
2019-05-28 10:16:05 INFO : S3 bucket mig-test-rubrik-0: Waiting for checks to finish
2019-05-28 10:16:05 NOTICE: S3 bucket mig-test-rubrik-0: --checksum is in use but the source and destination have no hashes in common; falling back to --size-only
2019-05-28 10:16:05 DEBUG : CentOS-7-x86_64-Minimal-1810.iso: Size of src and dst objects identical
2019-05-28 10:16:05 DEBUG : CentOS-7-x86_64-Minimal-1810.iso: Unchanged skipping
2019-05-28 10:16:05 INFO : S3 bucket mig-test-rubrik-0: Waiting for transfers to finish
2019/05/28 10:16:05 DEBUG : 6 go routines active
2019/05/28 10:16:05 DEBUG : rclone: Version "v1.47.0" finishing with parameters ["rclone" "sync" "cloudianlab:test-rubrik-0" "cloudianlab:mig-test-rubrik-0" "-P" "--delete-during" "--no-update-modtime" "--transfers" "32" "-c" "-vv"]
One way to test compatibility would be to run the rclone test suite against it. You'd need to install Go, download the rclone source, then cd backend/s3 and run go test -v -remote cloudianlab: for the basic integration tests. This will create and destroy a few randomly named buckets, e.g. rclone-test-boceqog0nigayij0qoyoget1.
I had a quick go at this here - let me know what you think.
I think it could be generalised a bit more - for a copy from gcs -> s3, for instance - as it only works for s3 -> s3 at the moment.
The original was uploaded as a multipart upload, so it doesn't have a regular MD5SUM (see the -115 on the end). rclone stores the MD5SUM as metadata, but I guess that file wasn't uploaded by rclone, hence the message.
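As background (my understanding of the AWS convention, which S3-compatible stores like Cloudian mimic; not something from the rclone docs): a single-part ETag is the plain MD5 of the object, while a multipart ETag is the MD5 of the concatenated per-part binary MD5 digests with a -<parts> suffix - which is why it can't be used as a whole-file checksum. A quick sketch:

```python
import hashlib

def plain_etag(data: bytes) -> str:
    # Single-part upload: the ETag is simply the MD5 of the object body.
    return hashlib.md5(data).hexdigest()

def multipart_etag(data: bytes, part_size: int) -> str:
    # Multipart upload: MD5 of the concatenated per-part MD5 digests,
    # suffixed with "-<number of parts>" (hence "...-115" above).
    digests = b"".join(
        hashlib.md5(data[i:i + part_size]).digest()
        for i in range(0, len(data), part_size)
    )
    parts = (len(data) + part_size - 1) // part_size
    return f"{hashlib.md5(digests).hexdigest()}-{parts}"

data = b"a" * 1000
assert multipart_etag(data, 300).endswith("-4")      # 4 parts of <=300 bytes
assert multipart_etag(data, 300) != plain_etag(data)  # not the file's MD5
```

That matches the two ETags above: the -115 copy was uploaded in 115 parts, while the other one was a single-part upload whose ETag is a genuine MD5.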
I've tested the fix you added to copy metadata between S3 targets and it seems to work.
Metadata is transferred between both, and from my initial testing a resync seems to preserve metadata as well.
EDIT: Anything that is copied as a multipart copy doesn't sync metadata, regardless of whether I use --no-update-modtime or not.
EDIT2: That explains the problems with the aws s3 cli as well: metadata copy doesn't work on multipart copies, even server-side. I had a Windows tool, S3Browser, which did copy the metadata even though it used multipart as well.
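For anyone hitting the same thing, here's a rough sketch of why the two server-side copy paths can behave differently (illustrative Python with made-up function names; the real S3 calls are CopyObject versus CreateMultipartUpload + UploadPartCopy): a plain CopyObject defaults to carrying the source metadata over, while a multipart copy starts as a brand-new upload that knows nothing about the source object, so the client has to read the source metadata itself and supply it up front.

```python
def simple_copy(src):
    # CopyObject with the default MetadataDirective=COPY:
    # the server carries the source metadata over for you.
    return {"data": src["data"], "meta": dict(src["meta"])}

def multipart_copy(src, meta=None):
    # CreateMultipartUpload creates a fresh object; only metadata the client
    # supplies up front ends up on the copy. UploadPartCopy then copies the
    # bytes part by part and never touches metadata.
    return {"data": src["data"], "meta": dict(meta or {})}

src = {"data": b"...", "meta": {"x-amz-meta-mtime": "1559031337"}}
assert simple_copy(src)["meta"] == src["meta"]                   # preserved
assert multipart_copy(src)["meta"] == {}                         # lost
assert multipart_copy(src, src["meta"])["meta"] == src["meta"]   # client must pass it
```

So a tool that copies the metadata on multipart copies (like S3Browser apparently does) is doing that HEAD-and-resend step itself.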
I haven't received any feedback from Cloudian yet, but I'll let you know as soon as I get some info.
So I think this is a bug in Cloudian as well, since smaller files to and from AWS work.
For the other bug they are raising an internal ticket to check it out; I'll add this one to the list as well.