Anyone know how reliable rclone's size compare is? I'm in the middle of transferring files from OneDrive to a Google Drive-hosted crypt remote, and OneDrive is famously unreliable (random disconnections, etc). But unfortunately this is a crypt remote, which I found out doesn't support checksums, so rclone is only using "size compare" to decide whether it needs to re-transfer a file.
How reliable is this? Does rclone "reserve" the entire file size prior to download, so that if a transfer is interrupted mid-way, a full-sized but actually incomplete file could exist on the crypt remote? There is a separate "cryptcheck" command in rclone, but that involves reading and re-encrypting file data to compute hashes for comparison, which is obviously very, very slow and not practical for large data sets.
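For completeness, a cryptcheck invocation would look something like this (remote names here are placeholders matching the config later in the thread):

```shell
# Verify a crypt remote against its plaintext source.
# cryptcheck reads the nonce from each encrypted file, re-encrypts the
# source data with it, and compares checksums of the encrypted versions.
rclone cryptcheck onedrive: gdrive-crypt: --progress
```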
So just to understand this properly, if I'm going from Remote1: to Remotecrypt:, it doesn't download the full file locally first to run a hash and compare to the remote?
[gdrive]
type = drive
[onedrive]
type = onedrive
[gdrive-crypt]
type = crypt
[onedrive-crypt]
type = crypt
[hasher]
type = hasher
remote = gdrive:
hashsums = quickxor
max_age = off
I'm a bit confused here about how to use hasher, so a couple of questions:
I assume hasher creates a table of hashes for each file? Where is this hash table stored?
Is the example you gave showing the opposite of what I'm trying to accomplish? The source for me is a combination of files in OneDrive + GDrive1, and the destination is GDrive2 (the crypted remote). So wouldn't I "create" the hashes first for the crypted remote by running:
rclone copy onedrive: hasher:
rclone copy gdrive1: hasher: (is this step necessary, since Google Drive already generates hashes and stores them in metadata?)
and then run:
rclone check gdrive2: hasher:
So then it would compare the hashes generated in step 1 against the files transferred to the crypt drive, gdrive2?
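If it helps, the workflow I'm describing above would be something like this (a sketch; remote names are the ones from my commands, and hasher: is assumed to wrap the crypt destination):

```shell
# Step 1: copy both sources through the hasher overlay so hashes
# are recorded as each transfer completes
rclone copy onedrive: hasher:
rclone copy gdrive1: hasher:

# Step 2: verify the files on the crypt remote against the cached hashes
rclone check gdrive2: hasher:
```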
The sources do not use crypt (both OneDrive and gdrive1). The destination uses a crypt remote hosted on Google Drive (call this gdrive2).
Basically a lot of the transfers from OneDrive to gdrive2 have stalled/errored out/been interrupted, so I am highly concerned about file integrity. What's the fastest way to compare hashes in gdrive1 and OneDrive against gdrive2?
Are these the commands that generate the hashes? And do they require downloading the entire file before generating the hash? For example, in the case of gdrive1, Google Drive already generates SHA1, SHA256, and MD5 hashes, which are stored in the metadata. Is there a way I can just reuse those?
Also, for the destination drive, because it is a crypt remote, the files don't have hashes. It seems like you are missing the command to generate hashes for the crypt remote, so that if I want to run hash checks in the future I don't have to re-download files from the crypt remote to calculate them again; it could just use the existing hashes.
I guess what I would've thought is that there should be some command to reuse the SHA1/SHA256/MD5 hashes already generated on gdrive1, and ONLY calculate hashes on gdrive2 (the crypt remote), so you can compare the two.
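As far as I can tell, Google's server-side hashes can already be listed without downloading any file content; for example (paths omitted here for brevity):

```shell
# Both commands read hashes Google has already computed server-side;
# no file data is transferred
rclone md5sum gdrive1:
rclone hashsum SHA1 gdrive1:
```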
Yes, all files have to be downloaded, but that is part of the copy operation anyway. So the database is built when you run your migration. Hashes are generated on successful transfers only. And yes, later they will be used to compare content with e.g. gdrive1, where you already have MD5 hashes.
You do not have any hashes for gdrive2-crypt as it is crypt. Hasher bridges this gap.
This is why:
[hasher]
type = hasher
remote = gdrive2-crypt:
hashsums = quickxor, md5
max_age = off
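Then transfers routed through the overlay populate the hash database automatically, for example (remote names as in the config above):

```shell
# Copy via hasher: hashes are stored for each successful transfer
rclone copy onedrive: hasher:

# Later, compare a source with the cached hashes -- no re-download
# from the crypt remote needed
rclone check gdrive1: hasher:
```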
Hmm, what if I've already transferred many of the files by running the migration? Is there any way to avoid the copy command and just generate hashes for files that already exist on the crypt remote?
Basically the goal here is to calculate hashes (SHA1) on the crypt remote and compare those values against the existing hashes on the Google Drive that is NOT encrypted.
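From my reading of the hasher docs, something like this might pre-calculate and cache hashes for files already on the crypt remote (each file still has to be read once, but only once; this is an assumption on my part, so worth testing on a small folder first):

```shell
# Download each file behind the overlay once, hash it locally, and
# let hasher cache the result in its database
rclone hashsum SHA1 hasher: --download
```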
This does not seem to be working properly when I run it. I tried running this command:
rclone check GDRIVE1: hasher: -vP
[hasher]
type = hasher
remote = GDRIVECRYPT:
hashsums = sha1
max_age = off
auto_size = 1P
It seems to be downloading the source files when it could just use the hashes already calculated by Google, and furthermore it seems to ignore hashsums = sha1, because I get this:
2024/01/25 13:29:28 INFO : Hasher is EXPERIMENTAL!
2024-01-25 13:29:29 INFO : Using md5 for hash comparisons
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Checks: 1 / 9, 11%
Elapsed time: 4m13.9s
Checking:
I made a mistake (probably a bad copy/paste) in my examples and you followed it without checking :)
it should be:
hashes = sha1
Then it all works. I just tested with a crypt remote.
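For reference, the corrected stanza then reads:

```ini
[hasher]
type = hasher
remote = GDRIVECRYPT:
hashes = sha1
max_age = off
auto_size = 1P
```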
If something does not work, create a test folder with one file, then run your command with the -vv flag and post its output here plus the relevant remote configuration settings, as it is impossible to guess all the details when you only post: