How reliable is rclone's size compare function?

Anyone know how reliable rclone's size compare is? I'm in the middle of transferring files from OneDrive to a crypt remote hosted on Google Drive, and OneDrive is famously unreliable (random disconnections, etc.). But unfortunately this is a crypt remote, which I found out doesn't support checksum checks, so rclone is only using size compare to see if it needs to re-transfer a file.

How reliable is this? Does rclone "reserve" the entire file size prior to download, so that if a transfer is interrupted mid-way, a full-sized but actually incomplete file could exist on the crypt remote? There is a separate cryptcheck function in rclone, but that involves downloading each file in question from my crypt remote and creating a hash to compare against the remote source, which is obviously very, very slow and not practical for large data sets.

Why not transfer the crypt data directly?

And on gdrive, add a hasher overlay with the quickxor hash so you can verify all transfers later.

No. Modern rclone downloads files to a .partial file for precisely this reason, and renames it once it is verified to be correct.

Cryptcheck only downloads the header of each file, which is 32 bytes. This can still be slow, but it isn't the whole file!

Cryptcheck uses this header to encrypt your local file and hash it on the fly - then it can compare this hash with the one the provider has.
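For example (the paths and remote name here are placeholders), checking plaintext data against its encrypted copy only costs those 32-byte headers plus local hashing:

rclone cryptcheck /local/data remotecrypt:data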

So just to understand this properly, if I'm going from Remote1: to Remotecrypt:, it doesn't download the full file locally first to run a hash and compare to the remote?

I was reading the docs and it says this:

rclone cryptcheck remote:path encryptedremote:path

You can use it like this also, but that will involve downloading all the files in remote:path.

This is an onedrive -> gdrive transfer. No .partial temp file will be used (it is only for local and sftp AFAIK).

Similarly here. It is true for remote <-> local. But not for remote <-> remote. Isn't it?

What I would do:

[gdrive]
type = drive

[onedrive]
type = onedrive

[gdrive-crypt]
type = crypt
remote = gdrive:

[onedrive-crypt]
type = crypt
remote = onedrive:

[hasher]
type = hasher
remote = gdrive:
hashsums = quickxor
max_age = off

and then

rclone copy onedrive: hasher:

followed by

rclone check onedrive: hasher:

I'm a bit confused here in terms of how to use hasher.

So

  1. I assume hasher creates a table of hashes of each file? Where is this hash table stored?

  2. Is the example you gave showing the opposite of what I'm trying to accomplish? For me the source is a combination of files in OneDrive + Gdrive1, and the destination is GDrive2 (the crypted remote). So wouldn't I "create" the hashes of the crypted remote first by running:

rclone copy onedrive: hasher:
rclone copy gdrive1: hasher: (is this step necessary, since Google Drive already generates hashes and stores them in metadata?)

and then run:
rclone check gdrive2: hasher:

So then it would compare the hashes generated in step 1 vs the files transferred and located in the crypt drive, gdrive2?

Locally, on the computer where you run rclone. So maybe not very practical for long-term use, but perfect for a migration.
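If you ever want to inspect or reset that local database, the hasher backend commands cover it (remote name taken from the example config above):

rclone backend dump hasher:
rclone backend drop hasher: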

and now you are saying something different, so I am not sure :)

Do your sources use crypt? Or only destination?

The sources do not use crypt (both OneDrive and gdrive1). The destination uses a crypt remote hosted on Google Drive (call this gdrive2).

Basically a lot of the transfers from onedrive to gdrive2 have stalled/errored out/interrupted so I am highly concerned about file integrity. What's the fastest way to check hashes in gdrive1 and onedrive vs gdrive2?

By using hasher on gdrive2-crypt I create quickxor and md5 hashes for this destination, so checking integrity becomes trivial as I have common hashes.

[onedrive]
type = onedrive

[gdrive1]
type = drive

[gdrive2]
type = drive

[gdrive2-crypt]
type = crypt
remote = gdrive2:

[hasher]
type = hasher
remote = gdrive2-crypt:
hashsums = quickxor, md5
max_age = off

and

rclone copy onedrive:path1 hasher:path1
rclone copy gdrive1:path2 hasher:path2

then you can

rclone check onedrive:path1 hasher:path1
rclone check gdrive1:path2 hasher:path2

as you always have common hashes between source and destination.
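If it helps, check can also write its findings to files, so mismatches from a long run are easy to collect afterwards (the output filenames here are arbitrary):

rclone check onedrive:path1 hasher:path1 --differ differ.txt --missing-on-dst missing.txt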

thanks. Just on this command here:

rclone copy onedrive:path1 hasher:path1
rclone copy gdrive1:path2 hasher:path2

Are these the commands that generate the hashes? And do they require downloading the entire file before generating the hash? For example, in the case of gdrive1, Google Drive already generates SHA1, SHA256, and MD5 hashes, which are stored in the metadata. Is there a way I can just reuse those?

Also, for the destination drive, because it is a crypt remote, the files don't have hashes generated. It seems like you are missing the command to generate the hashes for the crypt remote, so that in the future, if I want to run hash checks, I don't have to re-download files from the crypt remote to calculate new hashes - it can just use the existing ones?

I guess what I would've thought is that there should be some command to reuse the SHA1/SHA256/MD5 hashes already generated on gdrive1, and ONLY calculate the hashes on gdrive2 (the crypt remote), so you can compare the two.

Yes, every file has to be downloaded, but that is part of the copy operation anyway, so the database is built while you run your migration. Hashes are generated on successful transfers only. And yes, later they will be used to compare content with e.g. gdrive1, where you already have md5.

You do not have any hashes for gdrive2-crypt as it is crypt. Hasher bridges this gap.

This is why:

[hasher]
type = hasher
remote = gdrive2-crypt:
hashsums = quickxor, md5
max_age = off

it generates hashes for your crypt.

Hmm, what if I've already transferred many of the files because I've been running the migration? Is there any way to avoid the copy command and just generate hashes for the files that already exist on the remote crypt?

Basically the goal here is to calculate hashes (SHA1) on the remote crypt and compare those values against the existing hashes on the Google Drive that is NOT encrypted.

There is a clever trick to help with this.

[hasher]
type = hasher
remote = gdrive2-crypt:
hashsums = quickxor, md5
max_age = off
auto_size = 1P

It will always update the hasher database when a hash is missing for any file smaller than 1 petabyte - so effectively for every file.

When you run:

rclone check onedrive:path1 hasher:path1

if some hash is missing from gdrive2-crypt:, the file will be downloaded and the hash database updated.

Check hasher docs for details.
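As I read the hasher docs, any hash request triggers that auto-update, so a single hashing pass should also fill the database without another copy - a sketch, not something I have timed:

rclone md5sum hasher: --output-file sums.md5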

Sorry, I missed the fact this was a remote -> remote transfer.

No partial, but the same guarantee - a file will need to be uploaded and checked before it appears in the remote.

Yes you have to download one side of the file to do the check in that case.

I'd probably just use rclone check --download to download both sides and have a 100% bulletproof check.
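For example, with the remotes from earlier in this thread (paths assumed):

rclone check onedrive:path1 gdrive2-crypt:path1 --download

Slow, since it reads both sides in full, but it does not depend on any stored hashes.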

This does not seem to be working properly when I run it. I tried this command:

rclone check GDRIVE1: hasher: -vP

[hasher]
type = hasher
remote = GDRIVECRYPT:
hashsums = sha1
max_age = off
auto_size = 1P

It seems to be downloading the source file when it could easily just use the hashes already calculated by Google, and furthermore it seems to ignore hashsums = sha1, because I get this:

2024/01/25 13:29:28 INFO  : Hasher is EXPERIMENTAL!
2024-01-25 13:29:29 INFO  : Using md5 for hash comparisons
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:                 1 / 9, 11%
Elapsed time:      4m13.9s
Checking:

SHA1 on gdrive is a new thing, so not all files stored there have such a hash. Use MD5, which is always available.

Google only added SHA1 and SHA256 at some stage. If a file was uploaded before then, only MD5 is present:

$ rclone sha1sum drive:IL.xlsx
                                          IL.xlsx

$ rclone md5sum drive:IL.xlsx
739307f12458d5e85cf882644d45cb4f  IL.xlsx

The above is from my gdrive for some old file. And below, for a file uploaded just now:

$ rclone sha1sum drive:test.jpg
2ca345189c63a2cbba0c73114f95d645c2971d22  test.jpg

$ rclone md5sum drive:test.jpg
fe2979230faaa5f157d5ae82408e0c05  test.jpg

This means that your gdrive2 will have all hashes, but most likely gdrive1 has only MD5 for all files.

I tried with md5 and it doesn't work. I don't think hasher supports crypt remotes. This is the error I get:

: hasher::hasher:: 1 hashes could not be checked

I made a mistake (probably a wrong copy/paste) in my examples and you followed it without checking :)

it should be:

hashes = sha1

Then all works. I just tested with a crypt remote.
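So the working stanza, with the corrected option name, is:

[hasher]
type = hasher
remote = GDRIVECRYPT:
hashes = sha1
max_age = off
auto_size = 1P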

If something does not work, create a test folder with one file, then run your command with the -vv flag and post its output here, plus the relevant remote configuration settings - it is impossible to guess all the details when you only post: