Best way to get CRC32 from remote ZIP without downloading?

Hiccup · February 17, 2023, 9:33pm

rclone v1.61.1
- os/version: linuxmint 21 (64 bit)
- os/kernel: 5.15.0-60-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.19.4
- go/linking: static
- go/tags: none

I want to read metadata (file CRC32s) from the footer of a zip file that is stored on a (Google Drive) remote. One method I had some success with was mounting the remote, then running a normal tool on the zip (7-Zip, with this command: 7z l -slt file.zip). However, 7-Zip gets stuck on
Scanning the drive for archives: 0M Scan
for a few minutes (although not as long as it would take to download the whole file). That seems strange, since surely only a few bytes of data need to be transferred, and running the same command on a local file is instant.

Is this a bug, or is there some argument I need to pass to the rclone mount command, or is this just the wrong approach to reading this info from remote ZIPs?
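
To illustrate why only a small read from the end of the file should be needed: the CRC32s live in the ZIP central directory, which is located via the end-of-central-directory record in the last few bytes of the archive. Below is a minimal sketch of reading just that part, assuming a single-disk, non-ZIP64 archive and UTF-8 (or ASCII) file names. The function name list_crc32s is made up for illustration and is not part of rclone or 7-Zip.

```python
import struct

EOCD_SIG = b"PK\x05\x06"   # end-of-central-directory record signature
CDH_SIG = b"PK\x01\x02"    # central-directory file header signature
EOCD_FMT = "<4s4H2LH"      # fixed part of the EOCD record, 22 bytes
CDH_FMT = "<4s6H3L5H2L"    # fixed part of a central-directory header, 46 bytes


def list_crc32s(f):
    """Return [(filename, crc32), ...] using only reads near the end of the ZIP.

    Assumption: single-disk archive without ZIP64 extensions.
    """
    f.seek(0, 2)
    size = f.tell()
    # The EOCD record is 22 bytes plus an optional comment of up to 65535 bytes.
    tail_len = min(size, 22 + 65535)
    f.seek(size - tail_len)
    tail = f.read(tail_len)
    pos = tail.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    fields = struct.unpack_from(EOCD_FMT, tail, pos)
    total_entries, cd_size, cd_offset = fields[4], fields[5], fields[6]

    # One more read fetches the whole central directory.
    f.seek(cd_offset)
    cd = f.read(cd_size)
    entries = []
    off = 0
    for _ in range(total_entries):
        hdr = struct.unpack_from(CDH_FMT, cd, off)
        if hdr[0] != CDH_SIG:
            raise ValueError("bad central directory header")
        crc, name_len, extra_len, comment_len = hdr[7], hdr[10], hdr[11], hdr[12]
        # Simplification: names are decoded as UTF-8 regardless of the flag bits.
        name = cd[off + 46:off + 46 + name_len].decode("utf-8", "replace")
        entries.append((name, crc))
        off += 46 + name_len + extra_len + comment_len
    return entries
```

Called on a file opened in binary mode, this performs two reads (the tail and the central directory) and never touches the compressed file data.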

asdffdsa · February 17, 2023, 9:36pm

hi, try --vfs-cache-mode=full

if that does not help, then
what is your command, can you run the mount command with -vv and post the top 20 lines.

Hiccup · February 17, 2023, 11:58pm

That didn't help, unfortunately.

Command: rclone mount myremote:/ temp
Output when -vv is enabled: command_line_output.txt (9.0 KB)

asdffdsa · February 18, 2023, 12:02am

sorry, i was not clear. to your mount command, add --vfs-cache-mode=full

rclone mount myremote:/ temp --vfs-cache-mode=full

and add -vv for a debug output

Hiccup · February 18, 2023, 12:26am

Okay done:
Command: rclone mount myremote:/ temp --vfs-cache-mode=full -vv
Output: command_line_output.txt (10.8 KB)

asdffdsa · February 18, 2023, 12:31am

and what? any improvement with 7z?

ncw · February 18, 2023, 10:57am

rclone mount should be able to deal with this whether you use --vfs-cache-mode full or not.

However it does depend on the behaviour of 7z and if it decides to read all the stuff out the zip file then there is not a lot rclone can do about it.

That said I made an experimental ZIP backend which you can try here:

https://beta.rclone.org/branch/zip-backend/v1.58.0-beta.5990.02faa6f05.zip-backend/

You use it like this on a zip archive at drive:rclones.zip

$ rclone hashsum crc32 :zip:drive:rclones.zip
72d01619  rclone.4f0ddb60e.exe
2480fc47  rclone.6da352249.exe
4a187ffd  rclone.700ca23a7.exe
5f0c6c9f  rclone.9dbed0232.exe
833fe4a1  rclone.a571c1fb4.exe
64484f8e  rclone.a9d3283d9.exe
11058b93  rclone.b1d43f8d4.exe
74ace71f  rclone.b2388f129.exe
6052649a  rclone.cc8dde402.exe
d81fe0e1  rclone.e3d44612c.exe
cb384c66  rclone.ec117593f.exe
f12ce98d  rclone.01340acad.exe
b029ad3d  rclone.2b67ad17a.exe
1ff49741  rclone.4829527da.exe

Hiccup · February 18, 2023, 2:58pm

Ah no unfortunately not, I tried it before - when I said "That didn't help" above.

Hiccup · February 18, 2023, 3:01pm

Good point, maybe 7zip is making unnecessary reads. I'll try some other zip utilities.

Ah nice, that looks like it could be really useful. Although unless I'm missing something, linux (and windows) are missing from the builds there?

Animosity022 · February 18, 2023, 3:46pm

That means they didn't build. He normally fires and forgets as it takes time to build. He'll fix and update it.

asdffdsa · February 18, 2023, 4:47pm

yes, i agree.
tho sometimes the cache does make a difference with a poor internet connection and super slow onedrive.

i have never seen that.

in the past, i have extracted files from .7z from a rclone mount.
so i tried just now to run your command, almost instant results
the .7z is 826MiB, 672 folders, 2318 files.
the file is hosted in wasabi, an s3 clone known for hot storage

so could be that:
--- your internet connection, is slow, high latency
--- onedrive is always very slow, including api calls.
--- how the .7z was compressed.

Hiccup · February 20, 2023, 4:48pm

I tried the zip-backend branch, and I also tried using mount with a simple python script that just gets the filenames and CRC32s from the ZIP, but there was still a long delay with both...
zipcrcget.py.txt (464 Bytes)
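
The attachment is not reproduced here. A script of that kind might look roughly like the sketch below, which uses the standard zipfile module and prints lines in the same "crc  name" shape as rclone hashsum. The function name zip_crc32s and the path temp/file.zip (the mount point from the earlier command plus a placeholder file name) are assumptions for illustration.

```python
import zipfile


def zip_crc32s(path_or_file):
    """Return [(filename, crc32_hex), ...] from the ZIP's central directory."""
    with zipfile.ZipFile(path_or_file) as zf:
        # ZipInfo.CRC holds the CRC-32 of the uncompressed data as an int.
        return [(info.filename, f"{info.CRC:08x}") for info in zf.infolist()]


# Example use against a file on the mount (hypothetical path):
#
#   for name, crc in zip_crc32s("temp/file.zip"):
#       print(f"{crc}  {name}")
```

The script itself only asks for the central directory. Whatever extra data gets transferred depends on how the reads are served by the layer underneath.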

asdffdsa · February 20, 2023, 6:43pm

can you post the details of a speedtest?

Hiccup · February 20, 2023, 7:04pm

Download: ~55 Mbps
Upload: ~20 Mbps
Idle ping: 32ms
Download ping: 36ms
Upload ping: 20ms

Keep in mind this is Google Drive, not OneDrive.

asdffdsa · February 20, 2023, 7:16pm

thanks, i will do that

Hiccup · February 21, 2023, 12:52am

Although there could be an issue with the python zipfile library...

Hiccup · February 21, 2023, 1:47pm

Seems like using zipfile -l (which only lists filenames and sizes) on a mount is instant, but using zipfile -v (which lists more info, such as CRC32) has the same delay as the other methods.

Hiccup · February 21, 2023, 3:09pm

I found a solution: using rclone serve http and the python RemoteZip library (example python script).
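
A minimal sketch of that approach, not the linked script itself. It assumes the remote is being served locally with something like rclone serve http myremote:/ --addr 127.0.0.1:8080, that the third-party remotezip package is installed, and that file.zip is a placeholder name. RemoteZip subclasses zipfile.ZipFile and fetches only the byte ranges it needs over HTTP. The helper names format_entries and remote_zip_crc32s are made up for illustration.

```python
def format_entries(infos):
    """Format ZipInfo objects as 'crc32  filename' lines, like rclone hashsum."""
    return [f"{info.CRC:08x}  {info.filename}" for info in infos]


def remote_zip_crc32s(url):
    """List CRC32s of a ZIP served over HTTP, using range requests only.

    Assumption: the server honours HTTP Range requests.
    """
    # Imported here so the formatting helper works without the package.
    from remotezip import RemoteZip  # third-party: pip install remotezip

    with RemoteZip(url) as zf:
        return format_entries(zf.infolist())


# Example use (hypothetical URL, with rclone serve http running):
#
#   for line in remote_zip_crc32s("http://127.0.0.1:8080/file.zip"):
#       print(line)
```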