Checking file integrity after copying data from Box to Drive or vice versa

Alternatives for checksumming between Box and Drive, as Box does not support md5sum

rclone --version
rclone v1.55.0
- os/type: darwin
- os/arch: amd64
- go/version: go1.16.2
- go/linking: dynamic
- go/tags: none

OS: Mac

Storage: Box, Google Drive

% rclone md5sum box:pycharm-community-2020.3.3.dmg
                     UNSUPPORTED  pycharm-community-2020.3.3.dmg
2021/04/28 11:30:39 ERROR : pycharm-community-2020.3.3.dmg: Hash unsupported: hash type not supported
% rclone md5sum drive:/test/pycharm-community-2020.3.3.dmg
72a2be9948a410b68aead2e992d430a9  pycharm-community-2020.3.3.dmg

Verbose log:

% rclone md5sum box:pycharm-community-2020.3.3.dmg -vv
2021/05/05 18:10:23 DEBUG : Using config file from "/Users/teja/.config/rclone/rclone.conf"
2021/05/05 18:10:23 DEBUG : rclone: Version "v1.55.0" starting with parameters ["rclone" "md5sum" "box:pycharm-community-2020.3.3.dmg" "-vv"]
2021/05/05 18:10:23 DEBUG : Creating backend with remote "box:pycharm-community-2020.3.3.dmg"
                     UNSUPPORTED  pycharm-community-2020.3.3.dmg
2021/05/05 18:10:25 ERROR : pycharm-community-2020.3.3.dmg: Hash unsupported: hash type not supported
2021/05/05 18:10:25 DEBUG : 6 go routines active
2021/05/05 18:10:25 Failed to md5sum with 2 errors: last error was: Hash unsupported: hash type not supported

Context:
Hi Team,
I'm using rclone to copy data between Google Drive and Box. After copying, I want to verify the checksum, but Drive supports MD5 and Box does not. Is there a way I can verify it?

Likewise, Box supports SHA-1 but Google Drive does not:

% rclone sha1sum box:pycharm-community-2020.3.3.dmg    
edf4cb81a8cf9a29be49596e456cffd3df3418cd  pycharm-community-2020.3.3.dmg
% rclone sha1sum drive:/test/pycharm-community-2020.3.3.dmg
                             UNSUPPORTED  pycharm-community-2020.3.3.dmg
2021/05/05 18:12:56 ERROR : pycharm-community-2020.3.3.dmg: Hash unsupported: hash type not supported
2021/05/05 18:12:56 Failed to sha1sum with 2 errors: last error was: Hash unsupported: hash type not supported
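For what it's worth, rclone lsjson with the --hash flag also shows which hashes each backend stores for a file; the output below is trimmed and illustrative, but it should show an MD5 entry for the Drive copy and a SHA-1 entry for the Box copy:

% rclone lsjson --hash drive:/test/pycharm-community-2020.3.3.dmg
# Drive entries carry an MD5 (other fields omitted here):
# {"Path":"pycharm-community-2020.3.3.dmg","Hashes":{"MD5":"72a2be9948a410b68aead2e992d430a9"}}
% rclone lsjson --hash box:pycharm-community-2020.3.3.dmg
# Box entries carry a SHA-1 instead:
# {"Path":"pycharm-community-2020.3.3.dmg","Hashes":{"SHA-1":"edf4cb81a8cf9a29be49596e456cffd3df3418cd"}}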

Hello and welcome to the forum,

try
rclone check box:pycharm-community-2020.3.3.dmg drive:test/pycharm --download -vv

As a side note: I use PyCharm, great software.

Hi, as you suggested I tried with and without the --download option:

% rclone check box:pycharm-community-2020.3.3.dmg drive:/test/ --download -vv 
2021/05/05 20:03:23 DEBUG : Using config file from "/Users/teja/.config/rclone/rclone.conf"
2021/05/05 20:03:23 DEBUG : rclone: Version "v1.55.0" starting with parameters ["rclone" "check" "box:pycharm-community-2020.3.3.dmg" "drive:/test/" "--download" "-vv"]
2021/05/05 20:03:23 DEBUG : Creating backend with remote "box:pycharm-community-2020.3.3.dmg"
2021/05/05 20:03:25 DEBUG : Creating backend with remote "drive:/test/"
2021/05/05 20:03:26 DEBUG : Google drive root 'test': root_folder_id = "0AOnEoezqAr5HUk9PVA" - save this in the config to speed up startup
2021/05/05 20:03:26 DEBUG : fs cache: renaming cache item "drive:/test/" to be canonical "drive:test"
2021/05/05 20:03:26 DEBUG : Google drive root 'test': Waiting for checks to finish
2021/05/05 20:03:27 DEBUG : pacer: low level retry 1/10 (error googleapi: Error 403: Rate Limit Exceeded, rateLimitExceeded)
2021/05/05 20:03:27 DEBUG : pacer: Rate limited, increasing sleep to 1.430861643s
2021/05/05 20:03:27 DEBUG : pacer: low level retry 2/10 (error googleapi: Error 403: Rate Limit Exceeded, rateLimitExceeded)
2021/05/05 20:03:27 DEBUG : pacer: Rate limited, increasing sleep to 2.760290608s
2021/05/05 20:03:28 DEBUG : pacer: low level retry 3/10 (error googleapi: Error 403: Rate Limit Exceeded, rateLimitExceeded)
2021/05/05 20:03:28 DEBUG : pacer: Rate limited, increasing sleep to 4.493945342s
2021/05/05 20:03:32 DEBUG : pacer: Reducing sleep to 0s
2021/05/05 20:04:26 INFO  : 
Transferred:   	  195.996M / 551.543 MBytes, 36%, 3.619 MBytes/s, ETA 1m38s
Checks:                 0 / 1, 0%
Transferred:            0 / 1, 0%
Elapsed time:       1m3.0s
Checking:

Transferring:
 *                pycharm-community-2020.3.3.dmg: 21% /453.543M, 1.913M/s, 3m5s

2021/05/05 20:05:26 INFO  : 
Transferred:   	  549.996M / 728.543 MBytes, 75%, 4.818 MBytes/s, ETA 37s
Checks:                 0 / 1, 0%
Transferred:            0 / 1, 0%
Elapsed time:       2m3.0s
Checking:

Transferring:
 *                pycharm-community-2020.3.3.dmg: 60% /453.543M, 3.061M/s, 58s

2021/05/05 20:06:22 DEBUG : pycharm-community-2020.3.3.dmg: OK
2021/05/05 20:06:22 NOTICE: Google drive root 'test': 0 differences found
2021/05/05 20:06:22 NOTICE: Google drive root 'test': 1 matching files
2021/05/05 20:06:22 INFO  : 
Transferred:   	  907.085M / 907.085 MBytes, 100%, 5.355 MBytes/s, ETA 0s
Checks:                 1 / 1, 100%
Transferred:            2 / 2, 100%
Elapsed time:      2m58.2s

2021/05/05 20:06:22 DEBUG : 9 go routines active

% rclone check box:pycharm-community-2020.3.3.dmg drive:/test/     
2021/05/05 20:07:45 NOTICE: Google drive root 'test': 0 differences found
2021/05/05 20:07:45 NOTICE: Google drive root 'test': 1 hashes could not be checked
2021/05/05 20:07:45 NOTICE: Google drive root 'test': 1 matching files

But in the latter I get "hashes could not be checked". Is there a way to check without downloading the file?

If the two providers do not support the same hashes, you'd have to download the file and compute a hash yourself to compare the two.

That's what rclone check does with the download option.
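For a single file there's also a lighter manual route if you only want to download one side: stream the copy that lacks the hash through a local hash tool and compare against the stored hash on the other remote. A sketch, assuming macOS's built-in md5 (the digest it prints should equal the Drive value if the copies are identical):

# stored MD5 reported by Google Drive (no download needed)
% rclone md5sum drive:/test/pycharm-community-2020.3.3.dmg
72a2be9948a410b68aead2e992d430a9  pycharm-community-2020.3.3.dmg

# MD5 of the Box copy, computed locally while rclone cat streams it down
% rclone cat box:pycharm-community-2020.3.3.dmg | md5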

I'm not aware of any way to convert one hash to another as I'd imagine you'd have to write some code to do that.

@sweh might have a thought :slight_smile:

The command with --download was successful:

  1. downloaded the two files
    Transferred: 2 / 2, 100%

  2. checked the file
    Checks: 1 / 1, 100%

You can't convert hashes without access to the original document.

@nnkteja: Good post and questions! I have a couple of similar situations.

@Animosity022, @sweh: Agree, it is generally impossible to convert between hashes.

I therefore propose an enhancement to rclone check to automatically do a single- or double-sided download with local hash calculation when remote hashes are incompatible/unsupported.

As an example (think Box/OneDrive compared to Google Drive):

rclone check SHA1remote: MD5remote:

It would conceptually be performed by:

echo "NOTICE: SHA1remote: and MD5remote: do not have a common hash"
echo "NOTICE: Downloading MD5remote: to calculate the SHA-1 hash locally."
rclone hashsum SHA-1 SHA1remote: > srcHashes
rclone hashsum SHA-1 MD5remote: --download > dstHashes
compare srcHashes to dstHashes

This would enable a hash check without having to download from both remotes and thereby save significant time and bandwidth. It also makes it easier/quicker to check against a CryptRemote.
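Until something like that exists, a rough manual equivalent of the single-sided check with today's rclone might look like this (the local staging path is just illustrative, and it still downloads the whole MD5remote: once):

% rclone hashsum SHA-1 SHA1remote: > srcHashes
% rclone copy MD5remote: /tmp/md5remote-copy          # one-sided download to local disk
% rclone hashsum SHA-1 /tmp/md5remote-copy > dstHashes
% diff <(sort srcHashes) <(sort dstHashes)            # no output means the two sides match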

I haven’t checked, but I guess it will be relatively easy to implement since rclone seems to have all the building blocks needed.

Is it a good idea? Is it within reach?

rclone check has the --download flag which covers the double-sided case.

I think you are proposing a --download-first flag, so that when you did

rclone check --download-first SHA1remote: MD5remote:

It would download the SHA1remote: files to calculate their MD5.

That is certainly possible.

This would be reasonably easy to add. The code is in fs/operations/check.go if you are interested in taking a look.

