Compress findings

Hello,

I'm doing some local tests of the compress remote before using it for backups to the cloud. The goal is to reduce total size and, because my upload speed is slow, sync times as well.

With different datasets I consistently get better compression ratios with gzip than with zstd, so I'm wondering whether I'm missing something, or whether some parameter other than compression level and method can be tweaked, since zstd usually provides better compression.

Everything is run on a 14900KS with 128GB of DDR4

Silesia corpus - 202 MB (211.938.580 byte) :

zstd level 4

Transferred:       70.687 MiB / 70.687 MiB, 100%, 0 B/s, ETA -
Checks:                 0 / 0, -, Listed 12
Transferred:           12 / 12, 100%
Elapsed time:         3.7s


zstd level 3

Transferred:       73.478 MiB / 73.478 MiB, 100%, 0 B/s, ETA -
Checks:                 0 / 0, -, Listed 12
Transferred:           12 / 12, 100%
Elapsed time:         1.1s


zstd level 2

Transferred:       75.437 MiB / 75.437 MiB, 100%, 0 B/s, ETA -
Checks:                 0 / 0, -, Listed 12
Transferred:           12 / 12, 100%
Elapsed time:         0.7s


zstd level 1

Transferred:       79.056 MiB / 79.056 MiB, 100%, 0 B/s, ETA -
Checks:                 0 / 0, -, Listed 12
Transferred:           12 / 12, 100%
Elapsed time:         0.4s


gzip level 9

Transferred:       65.153 MiB / 65.153 MiB, 100%, 0 B/s, ETA -
Checks:                 0 / 0, -, Listed 12
Transferred:           12 / 12, 100%
Elapsed time:         1.8s
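
To compare the runs above more easily, the rclone stats can be turned into compression ratios and rough throughput figures. The following is a minimal Python sketch: the numbers are copied from the runs above, and the helper name `summarize` is made up for illustration. Throughput is simply the original size divided by the elapsed time, so at these short durations it is only a rough indication.

```python
# Original size of the Silesia corpus in bytes, converted to MiB.
ORIGINAL_MIB = 211_938_580 / 2**20

# (transferred MiB, elapsed seconds) as reported by rclone for each run above.
RUNS = {
    "zstd level 4": (70.687, 3.7),
    "zstd level 3": (73.478, 1.1),
    "zstd level 2": (75.437, 0.7),
    "zstd level 1": (79.056, 0.4),
    "gzip level 9": (65.153, 1.8),
}


def summarize(runs, original_mib):
    """Return {name: (compressed/original ratio, input MiB per second)}."""
    return {
        name: (compressed / original_mib, original_mib / seconds)
        for name, (compressed, seconds) in runs.items()
    }


if __name__ == "__main__":
    for name, (ratio, speed) in summarize(RUNS, ORIGINAL_MIB).items():
        print(f"{name}: {ratio:.1%} of original, ~{speed:.0f} MiB/s")
```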


Farm Sim 25 - 35,8 GB (38.481.626.640 byte) - 37502 files

zstd level 4 - 23,1 GB (24.811.289.669 byte)
zstd level 3 - 23,6 GB (25.440.897.756 byte)
zstd level 2 - 24,0 GB (25.783.671.718 byte)
zstd level 1 - 25,0 GB (26.928.731.465 byte)
gzip level 9 - 22,3 GB (24.009.886.664 byte)

Appdata folder - 66,8 GB (71.833.157.577 byte) - 172805 files

zstd level 4 - 44,5 GB (47.809.906.290 byte)
gzip level 9 - 43,9 GB (47.149.960.138 byte)
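
For the two datasets with exact byte counts, the gap between gzip level 9 and zstd level 4 can be expressed in relative terms. This is a small Python sketch using only the byte counts listed above; the function name `gzip_advantage` is hypothetical.

```python
# Byte counts copied from the results above:
# (original, zstd level 4, gzip level 9).
DATASETS = {
    "Farm Sim 25": (38_481_626_640, 24_811_289_669, 24_009_886_664),
    "Appdata folder": (71_833_157_577, 47_809_906_290, 47_149_960_138),
}


def gzip_advantage(original, zstd_bytes, gzip_bytes):
    """Return (zstd ratio, gzip ratio, relative saving of gzip over zstd).

    The last value is the fraction by which the gzip output is smaller
    than the zstd output.
    """
    return (
        zstd_bytes / original,
        gzip_bytes / original,
        (zstd_bytes - gzip_bytes) / zstd_bytes,
    )


if __name__ == "__main__":
    for name, sizes in DATASETS.items():
        zstd_ratio, gzip_ratio, saving = gzip_advantage(*sizes)
        print(
            f"{name}: zstd 4 = {zstd_ratio:.1%}, gzip 9 = {gzip_ratio:.1%}, "
            f"gzip output is {saving:.1%} smaller"
        )
```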

Downloads folder - 361 GB (387.742.227.734 byte)

(sizes reported by Filen, could be slightly incorrect)

zstd level 4 - 301,53 GB
gzip level 9 - 294,93 GB

Run the command 'rclone version' and share the full output of the command.

rclone v1.74.2

  • os/version: Microsoft Windows 11 IoT Enterprise LTSC 2024 24H2 24H2 (64 bit)
  • os/kernel: 10.0.26100.8524 (x86_64)
  • os/type: windows
  • os/arch: amd64
  • go/version: go1.26.3
  • go/linking: static
  • go/tags: cmount

The command you were trying to run (eg rclone copy /tmp remote:tmp)

.\rclone.exe copy "D:\Downloads\silesia\" "compresslocalgzip:silesia" -v --stats=1s -P

(replace gzip with zstd1..4)

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

[compresslocalgzip]
type = compress
ram_cache_limit = 100Mi
remote = E:\test\rclone\compress\gzip
level = 9

[compresslocalzstd4]
type = compress
remote = E:\test\rclone\compress\zstd4
mode = zstd
level = 4
ram_cache_limit = 100Mi

[compresslocalzstd1]
type = compress
remote = E:\test\rclone\compress\zstd1
mode = zstd
level = 1
ram_cache_limit = 100Mi

[compresslocalzstd2]
type = compress
remote = E:\test\rclone\compress\zstd2
mode = zstd
level = 2
ram_cache_limit = 100Mi

[compresslocalzstd3]
type = compress
remote = E:\test\rclone\compress\zstd3
mode = zstd
level = 3
ram_cache_limit = 100Mi

As you can see, gzip -9 is consistently better in terms of compression ratio, and from what I saw it is also lighter on CPU than zstd -4, which seems strange given the output.

I would like some feedback on the results from anyone else who is using compress. I'm trying to understand whether these results are expected or whether I'm doing something wrong.

Thank you

Quick research using AI tells me that you would need zstd -11 to match the compression ratio of gzip -9.

This means that rclone's zstd level 4 is something else. See the explanation of rclone's levels by the author of rclone's zstd implementation:

As it is implemented right now, if your objective is maximum compression, then gzip level 9 is indeed the best option.

It would be nice to understand how zstd 1..4 in rclone relates to zstd's 1..22 levels. I guess they are some intermediate steps in between, but rclone zstd 4 isn't the same as zstd 4.

For gzip I guess it's mapped 1:1, as both go up to level 9.

I agree, and I also ran a quick test using the Silesia test set.

I used stand-alone gzip and zstd programs:

$ gzip --version
Apple gzip 479

$ zstd --version
*** Zstandard CLI (64-bit) v1.5.7, by Yann Collet ***

Files were compressed individually (as rclone compress does) using:

# for zstd:
$ for file in *; do [ -f "$file" ] && zstd -3 --rm "$file"; done

# for gzip
$ for file in *; do [ -f "$file" ] && gzip -9 "$file"; done

and indeed, already at zstd -3 the compression ratio is better than with gzip -9.

$ du -sh *
202M	silesia
 65M	silesia-gzip9
 70M	silesia-zstd1
 66M	silesia-zstd2
 63M	silesia-zstd3
 62M	silesia-zstd4
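
The per-file measurement above can also be reproduced without the CLI tools. The following is a minimal Python sketch that compresses each regular file in a directory on its own, in memory, and sums the compressed sizes. It only covers gzip through the standard library's `gzip.compress`; the names `per_file_total` and `gzip9` and the `compress` callback parameter are made up for illustration, and a zstd compressor would have to be passed in through that callback from whichever binding is available. It does not use rclone's code path, and sizes may differ slightly from the `gzip` CLI because of header differences.

```python
import gzip
from pathlib import Path


def per_file_total(directory, compress):
    """Compress every regular file in `directory` separately and sum the sizes.

    Returns (total original bytes, total compressed bytes). Each file is read
    fully into memory, so this is only suitable for small test sets.
    """
    original = 0
    compressed = 0
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories, like the `[ -f "$file" ]` check
        data = path.read_bytes()
        original += len(data)
        compressed += len(compress(data))
    return original, compressed


def gzip9(data):
    """gzip at level 9, the counterpart of `gzip -9`."""
    return gzip.compress(data, compresslevel=9)


if __name__ == "__main__":
    import sys

    # Usage: python per_file.py <directory>
    total, packed = per_file_total(sys.argv[1], gzip9)
    if total:
        print(f"{total} -> {packed} bytes ({packed / total:.1%})")
```

Passing the compressor as a callback keeps the directory walk identical for every method, so only the compressor differs between runs.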

Something may not be right with the zstd level mapping in rclone compress. Let's see what the author of this implementation thinks:

I suspect this is the underlying library used:

and the zstd level mappings are 1:1 with what is used there. Judging by the compression ratios, the 1-4 levels look like regular zstd 1-4 levels.