Best way to get CRC32 from remote ZIP without downloading?

rclone v1.61.1
- os/version: linuxmint 21 (64 bit)
- os/kernel: 5.15.0-60-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.19.4
- go/linking: static
- go/tags: none

I want to read metadata (file CRC32s) from the footer of a zip file that is stored on a (Google Drive) remote. One method I had some success with was mounting the remote, then running a normal tool on the zip (7-Zip, with the command 7z l -slt file.zip). However, 7-Zip gets stuck on
Scanning the drive for archives: 0M Scan
for a few minutes (although not as long as it would take to download the whole file), which is strange, since surely only a few bytes of data need to be transferred, and running this on a local file is instant.
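
To illustrate why only a small read should be needed: a ZIP ends with an End of Central Directory (EOCD) record that points at the central directory, which is where the per-file CRC32s live. Below is a rough Python sketch of locating it from just the tail of the file. The function and constant names are made up for illustration, and it assumes a plain (non-ZIP64) archive whose comment does not itself contain the EOCD signature.

```python
import struct

EOCD_SIG = b"PK\x05\x06"   # End of Central Directory signature
EOCD_MIN = 22              # size of the fixed part of the EOCD record
MAX_COMMENT = 0xFFFF       # the trailing archive comment is at most 65535 bytes


def central_directory_span(tail: bytes):
    """Return (offset, size) of the central directory, given the last bytes of a ZIP.

    Sketch only: ignores ZIP64, where these values live in a separate record.
    """
    pos = tail.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("no End of Central Directory record found in tail")
    # Layout: signature(4), disk(2), cd_disk(2), entries_on_disk(2),
    # entries_total(2), cd_size(4), cd_offset(4), comment_length(2)
    fields = struct.unpack("<4s4H2LH", tail[pos:pos + EOCD_MIN])
    cd_size, cd_offset = fields[5], fields[6]
    return cd_offset, cd_size
```

Reading the last EOCD_MIN + MAX_COMMENT bytes is always enough to find the record. After that, a reader only needs the cd_size bytes starting at cd_offset to list every CRC32.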

Is this a bug, or is there some argument I need to pass to the rclone mount command, or is this just the wrong approach to reading this info from remote ZIPs?

hi, try --vfs-cache-mode=full

if that does not help, then
what is your mount command? can you run it with -vv and post the top 20 lines of the output?

That didn't help, unfortunately.

Command: rclone mount myremote:/ temp
Output when -vv is enabled: command_line_output.txt (9.0 KB)

sorry, i was not clear. to your mount command, add --vfs-cache-mode=full

rclone mount myremote:/ temp --vfs-cache-mode=full

and add -vv for a debug output

Okay done:
Command: rclone mount myremote:/ temp --vfs-cache-mode=full -vv
Output: command_line_output.txt (10.8 KB)

and what? any improvement with 7z?

rclone mount should be able to deal with this whether you use --vfs-cache-mode full or not.

However, it does depend on the behaviour of 7z: if it decides to read all the data out of the zip file, then there is not a lot rclone can do about it.

That said I made an experimental ZIP backend which you can try here:

https://beta.rclone.org/branch/zip-backend/v1.58.0-beta.5990.02faa6f05.zip-backend/

You use it like this on a zip archive at drive:rclones.zip

$ rclone hashsum crc32 :zip:drive:rclones.zip
72d01619  rclone.4f0ddb60e.exe
2480fc47  rclone.6da352249.exe
4a187ffd  rclone.700ca23a7.exe
5f0c6c9f  rclone.9dbed0232.exe
833fe4a1  rclone.a571c1fb4.exe
64484f8e  rclone.a9d3283d9.exe
11058b93  rclone.b1d43f8d4.exe
74ace71f  rclone.b2388f129.exe
6052649a  rclone.cc8dde402.exe
d81fe0e1  rclone.e3d44612c.exe
cb384c66  rclone.ec117593f.exe
f12ce98d  rclone.01340acad.exe
b029ad3d  rclone.2b67ad17a.exe
1ff49741  rclone.4829527da.exe

Ah no unfortunately not, I tried it before - when I said "That didn't help" above.

Good point, maybe 7zip is making unnecessary reads. I'll try some other zip utilities.

Ah nice, that looks like it could be really useful. Although unless I'm missing something, linux (and windows) are missing from the builds there?

That means they didn't build. He normally fires and forgets as it takes time to build. He'll fix and update it.

yes, i agree.
tho sometimes the cache does make a difference with a poor internet connection and super slow onedrive.

i have never seen that.

in the past, i have extracted files from .7z from a rclone mount.
so i tried just now to run your command, almost instant results
the .7z is 826MiB, 672 folders, 2318 files.
the file is hosted in wasabi, an s3 clone known for hot storage

so could be that:
--- your internet connection, is slow, high latency
--- onedrive is always very slow, including api calls.
--- how the .7z was compressed.

I tried the zip-backend branch, and I also tried using mount with a simple python script that just gets the filenames and CRC32s from the ZIP, but there was still a long delay with both...
zipcrcget.py.txt (464 Bytes)
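
For anyone reading without the attachment, a script of this kind might look roughly like the sketch below. It is not the attached file, just a minimal equivalent using the standard zipfile module, and the function name list_crc32 is made up for illustration. It takes its values from the archive's central directory rather than decompressing any member.

```python
import sys
import zipfile


def list_crc32(path_or_file):
    """Return [(filename, crc32_hex)] taken from the ZIP's central directory."""
    with zipfile.ZipFile(path_or_file) as zf:
        # ZipInfo.CRC is the stored CRC-32 of the uncompressed member data.
        return [(info.filename, f"{info.CRC:08x}") for info in zf.infolist()]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is an example): python zipcrc.py temp/file.zip
    for name, crc in list_crc32(sys.argv[1]):
        print(f"{crc}  {name}")
```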

can you post the details of a speedtest?

Download: ~55 Mbps
Upload: ~20 Mbps
Idle ping: 32ms
Download ping: 36ms
Upload ping: 20ms

Keep in mind this is Google Drive, not OneDrive.

thanks, i will do that :wink:

Although there could be an issue with the python zipfile library...

Seems like using zipfile -l (which only lists filenames and sizes) on a mount is instant, but using zipfile -v (which lists more info, such as CRC32) has the same delay as the other methods.

I found a solution: using rclone serve http and the python RemoteZip library (example python script).
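
A minimal sketch of that approach, in case the link goes away. It assumes the remote is being served with something like rclone serve http myremote:/ --addr 127.0.0.1:8080 and that the third-party remotezip package is installed (pip install remotezip). The address, the file name in the usage comment, and the helper names are placeholders, not taken from the linked script.

```python
import sys
import zipfile


def crc_table(zf: zipfile.ZipFile):
    """Map each file member to its CRC32 (8 hex digits) from the central directory."""
    return {
        info.filename: f"{info.CRC:08x}"
        for info in zf.infolist()
        if not info.is_dir()
    }


def remote_crc_table(url: str):
    # Imported lazily so crc_table() stays usable without the dependency.
    from remotezip import RemoteZip  # third-party: pip install remotezip

    # RemoteZip is a zipfile.ZipFile subclass that fetches only the byte
    # ranges it needs via HTTP Range requests.
    with RemoteZip(url) as zf:
        return crc_table(zf)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (URL is an example): python remotecrc.py http://127.0.0.1:8080/file.zip
    for name, crc in remote_crc_table(sys.argv[1]).items():
        print(f"{crc}  {name}")
```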