I'm moving my data from old GSuite to IDrive. For safety, I'm running rclone cryptcheck, but I'm getting unexpected results.
To speed things up, instead of uploading from the PC that stores the original files, I'm using a remote machine with a faster connection to download from Google Drive and copy to IDrive: rclone copy google:dir1 idrive:dir1
Then I verify with rclone cryptcheck google:dir1 idrive:dir1:
<5>NOTICE: Encrypted drive 'idrive:dir1': 0 differences found
<5>NOTICE: Encrypted drive 'idrive:dir1': 3 hashes could not be checked
<5>NOTICE: Encrypted drive 'idrive:dir1': 5 matching files
I should try to avoid the "3 hashes could not be checked" message to be more confident about file integrity, right? If so, how?
If I upload from local with rclone copy /local/dir1 idrive:dir1,
I no longer receive "hashes could not be checked" messages.
It seems that remote → remote copying is unable to do things properly. This isn't a server-side copy, so it should be the same as downloading the files from Google and then uploading them to IDrive, right? But it's not. If I do it manually, first running rclone copy google:dir1 /tmp/temp
and then uploading with rclone copy /tmp/temp idrive:temp --no-check-dest,
everything is fine when running cryptcheck.
So, two questions:
1. Am I right in wanting to avoid the "hashes could not be checked" message for safety?
2. What's the proper way to copy between remotes while preserving file integrity, i.e. avoiding that message in the cryptcheck results?
For 2, I guess it involves downloading the entire file before uploading instead of doing it in chunks, so I tried --multi-thread-chunk-size 2G (the files are smaller than 2GB), but the results were the same.
I said that to simplify and to ease understanding, as the exact path doesn't seem to be relevant to the question.
Yes, the plain Google remote is set at the root of my Google Drive, and the related Google crypt remote (which is what I actually use in commands) points at a subfolder of a subfolder.
Sometimes it is better to post too much than too little. Maybe it is obvious to you, as you are familiar with your setup, but I am confused about what is what :)
And now even more so, as you mention rclone copy google:dir1 idrive:dir1 but in your config there is no google remote.
I do not understand why rclone cryptcheck /local/dir1 bucket:dir1 does not work. It would suggest that rclone copy gcrypt:dir1 bucket:dir1 did not create checksums on the destination.
The rest I think is by design (but I am not 100% sure), even if rclone cryptcheck gcrypt:dir1 bucket:dir1 could work in theory, if we accept that all files from one remote have to be downloaded in order to calculate hashes.
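If the goal is just to verify content regardless of stored hashes, plain rclone check with the --download flag should do it, I think, e.g.:

rclone check gcrypt:dir1 bucket:dir1 --download

It downloads both sides and compares the data directly, so it is slow, but it does not rely on the destination having any hashes at all.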
And there is no issue with cryptcheck between local and an iDrive crypt remote.
$ rclone cryptcheck . iDrive-crypt:test -vv
...
2023/09/25 12:15:49 INFO : Using md5 for hash comparisons
2023/09/25 12:15:49 DEBUG : Encrypted drive 'iDrive-crypt:test': Waiting for checks to finish
2023/09/25 12:15:49 DEBUG : test.file: OK
2023/09/25 12:15:49 NOTICE: Encrypted drive 'iDrive-crypt:test': 0 differences found
2023/09/25 12:15:49 NOTICE: Encrypted drive 'iDrive-crypt:test': 1 matching files
@dansorod - My results are not like yours... I know that you use gdrive and I use onedrive, but I do not see any reason why it should be different. Could you create a step-by-step example of how to replicate it?
My gdrive is full, so I did a new test between idrive buckets with different encrypt passwords.
These were the steps:
Disclaimer:
I copied a folder with 3 video files ranging between 400MB and 700MB each.
Three locations are involved: /local/ is my HD, cicrypt: is an encrypted bucket and migra: is another encrypted bucket using a different password.
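For reference, the config behind these remotes looks roughly like this (the buckets are iDrive's S3-compatible storage; access keys, endpoint and the obscured crypt passwords are omitted, and the bucket names are just examples):

[idrive]
type = s3
provider = IDrive
# access_key_id / secret_access_key / endpoint omitted

[cicrypt]
type = crypt
remote = idrive:cibucket
# password set via rclone config

[migra]
type = crypt
remote = idrive:migrabucket
# same as cicrypt but with a different password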
1. rclone copy /local/dir cicrypt:dir
   Result: files were copied.
2. rclone cryptcheck /local/dir cicrypt:dir
   Result: matching files without warning about hashes, OK.
3. rclone copy cicrypt:dir migra:dir
   Result: copied (client side, files were downloaded and uploaded by chunks).
4. rclone cryptcheck /local/dir migra:dir
   Result: failed, "3 hashes could not be checked".
5. rclone cryptcheck cicrypt:dir migra:dir
   Result: failed, "3 hashes could not be checked".
6. rclone copy /local/dir migra:dir
   Result: nothing was done, rclone treats the files as identical.
7. rclone copy /local/dir migra:dir --no-check-dest
   Result: files were forcibly copied again (overwritten).
8. rclone cryptcheck /local/dir migra:dir (same as step 4)
   Result: matching files without warning about hashes, OK. (different result compared to step 4)
9. rclone cryptcheck cicrypt:dir migra:dir (same as step 5)
   Result: matching files without warning about hashes, OK. (different result compared to step 5)
So that's what I'm saying. When files are copied between encrypted remotes with different passwords, cryptcheck doesn't return a full pass; there's a message saying "hashes could not be checked". In order to prevent this message I must upload from my HD.
But copying between encrypted remotes with different passwords is a client-side operation too, so the results should be the same.
There should be a way to copy between different encrypted remotes and still fully pass the cryptcheck test.
It shows that copying data between remotes does indeed sometimes strip the hashes (even when both remotes support hashes). My tests do not show it, so it is something remote-specific.
@ncw could you please have a look at this? I am not sure what can cause it.
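One way to see it directly would be to list the hashes on the underlying (non-crypt) remotes after the remote-to-remote copy, something like this (the remote and bucket names here are only examples, adjust to your config):

rclone lsjson --hash --files-only idrive:migra-bucket/dir
rclone md5sum idrive:migra-bucket/dir

If I am right, the objects copied remote-to-remote should show up there without an MD5, while the ones uploaded from local should have one.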
Maybe you tested with files that were too small? My guess is this issue is related to chunks. When you copy from the HD to a remote, rclone has instant access to the entire file. On the other hand, when you copy remote to remote (but client-side), rclone downloads a chunk and uploads it, downloads another chunk and uploads it... This difference can affect the approach rclone uses to check file integrity.
The problem happens because the crypt backend can't provide the md5sum of the decrypted file and it doesn't provide the md5sum of the encrypted file because that isn't what the md5 sum should be.
So the large files (bigger than --s3-upload-cutoff) get uploaded without md5 sums.
Even if you could copy the md5sums from source to dest they would be wrong here as you've used a different password on the destination.
The s3 backend could calculate the md5sum on the fly and add it to the file at the end of the copy, but we don't do this because changing metadata involves an expensive API operation.
It's not that expensive though (same cost as a PUT or LIST), so maybe we should, or maybe it should be an option.
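In practice that should mean something like the following keeps the md5s on the destination, as long as every file fits under the cutoff (5 GiB is the maximum value, since that is the S3 single-part upload limit):

rclone copy cicrypt:dir migra:dir --s3-upload-cutoff 5G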
Thanks for replying. So I guess the current workaround would be using this option with a value bigger than the file size, right?
That seems OK for me, but I need to move some files that are way bigger than my RAM. Is it possible to configure rclone so that, when doing this client-side remote→remote copy, it stores the chunks (which would actually be whole files) on disk instead of in RAM when the sum of the chunks being downloaded in parallel exceeds some size?
> It's not that expensive though (same cost as a PUT or LIST), so maybe we should, or maybe it should be an option.
--s3-upload-cutoff 5G seems to work for rclone copy cryptA: cryptB: in a way that cryptcheck returns the expected output. But, as expected, only for files up to 5GB.
For files >5GB, it still fails. But downloading the entire file before uploading works. Like running: rclone copy cryptA:file /local/
followed by rclone copy /local/file cryptB:
But it doesn't feel right to need two steps, and I also have to manually delete the temporarily downloaded file at the end.
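I could script the two steps per file, so only one file sits on disk at a time and the local copy is deleted right after upload. A rough, untested sketch (cryptA:, cryptB: and /tmp/staging are just placeholders):

# stage each file through local disk, then remove the local copy
rclone lsf -R --files-only cryptA:dir | while IFS= read -r f; do
  rclone copyto "cryptA:dir/$f" "/tmp/staging/$f"
  rclone copyto "/tmp/staging/$f" "cryptB:dir/$f"
  rm -f -- "/tmp/staging/$f"
done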
Still, is there an option for rclone copy remote1: remote2: that forces downloading the entire file before uploading, so that it can send the md5sum? This would do for now, while there's no way to "calculate the md5sum on the fly and add it to the file at the end of the copy", which would allow proper parallel chunk download/upload for a remote1→remote2 copy.
Edit: I just tried --streaming-upload-cutoff 99G but it didn't work; it still started uploading right after it started the download, instead of waiting for the entire file to finish downloading, which is what I need.