Compress remote is creating .bin and .json files, not .gz files

What is the problem you are having with rclone?

When I create a compress remote and point it at an existing remote during the config/creation process, subsequent attempts to copy or sync files to that compress remote result in a .bin and a .json file being created in the compress remote for each source file, rather than a .gz file as per the documentation. The .bin file is exactly the same size as the original file instead of being smaller, which leads me to believe that compression is not working.

For example, "largefile.pdf" which is 500KB, when copied to the compress-remote becomes "largefile.pdf.bin" which is still 500KB, and "largefile.pdf.json" which is somewhat smaller.

Run the command 'rclone version' and share the full output of the command.

rclone v1.62.2

  • os/version: arch 21.0.1 (64 bit)
  • os/kernel: 5.4.195-1-MANJARO (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.20.4
  • go/linking: dynamic
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

For this test, a local/alias type remote on my local filesystem.

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy ~/Documents/largefile.pdf compressedremote1:

The rclone config contents with secrets removed.

[localremote1]
type = alias
remote = /home/manjaro/LRemote

[compressedremote1]
type = compress
remote = localremote1:

A log from the command with the -vv flag

2023/06/21 15:12:47 DEBUG : rclone: Version "v1.62.2" starting with parameters ["rclone" "copy" "-vv" "/home/manjaro/Documents/largefile.pdf" "compressedremote1:"]
2023/06/21 15:12:47 DEBUG : Creating backend with remote "/home/manjaro/Documents/largefile.pdf"
2023/06/21 15:12:47 DEBUG : Using config file from "/home/manjaro/.config/rclone/rclone.conf"
2023/06/21 15:12:47 DEBUG : fs cache: adding new entry for parent of "/home/manjaro/Documents/largefile.pdf", "/home/manjaro/Documents"
2023/06/21 15:12:47 DEBUG : Creating backend with remote "compressedremote1:"
2023/06/21 15:12:47 DEBUG : Creating backend with remote "/home/manjaro/LRemote/.json"
2023/06/21 15:12:47 DEBUG : Creating backend with remote "/home/manjaro/LRemote"
2023/06/21 15:12:47 DEBUG : largefile.pdf: Need to transfer - File not found at Destination
2023/06/21 15:12:47 DEBUG : largefile.pdf: md5 = 17f06d633743b8574126aceed17d3069 OK
2023/06/21 15:12:47 INFO  : largefile.pdf: Copied (new)
2023/06/21 15:12:47 INFO  : 
Transferred:   	  370.985 KiB / 370.985 KiB, 100%, 0 B/s, ETA -
Transferred:            1 / 1, 100%
Elapsed time:         0.0s

2023/06/21 15:12:47 DEBUG : 4 go routines active

You get the json file and the gz:

[felix@gemini rclone]$ rclone copy /etc/hosts testcomp: -vv
2023/06/21 15:33:56 DEBUG : Setting --config "/opt/rclone/rclone.conf" from environment variable RCLONE_CONFIG="/opt/rclone/rclone.conf"
2023/06/21 15:33:56 DEBUG : rclone: Version "v1.63.0-beta.7013.d05393b86.fix-6986-times" starting with parameters ["rclone" "copy" "/etc/hosts" "testcomp:" "-vv"]
2023/06/21 15:33:56 DEBUG : Creating backend with remote "/etc/hosts"
2023/06/21 15:33:56 DEBUG : Using config file from "/opt/rclone/rclone.conf"
2023/06/21 15:33:56 DEBUG : fs cache: adding new entry for parent of "/etc/hosts", "/etc"
2023/06/21 15:33:56 DEBUG : Creating backend with remote "testcomp:"
2023/06/21 15:33:56 DEBUG : Creating backend with remote "/home/felix/test/.json"
2023/06/21 15:33:56 DEBUG : Creating backend with remote "/home/felix/test"
2023/06/21 15:33:56 DEBUG : hosts: Need to transfer - File not found at Destination
2023/06/21 15:33:56 DEBUG : hosts: md5 = 9202b6e377e72181b7e878bdac616aa2 OK
2023/06/21 15:33:56 INFO  : hosts: Copied (new)
2023/06/21 15:33:56 INFO  :
Transferred:   	        211 B / 211 B, 100%, 0 B/s, ETA -
Transferred:            1 / 1, 100%
Elapsed time:         0.0s

2023/06/21 15:33:56 DEBUG : 4 go routines active
[felix@gemini rclone]$ cd test
-bash: cd: test: No such file or directory
[felix@gemini rclone]$ ls
logs  rclone.conf
[felix@gemini rclone]$ cd
[felix@gemini ~]$ cd test
[felix@gemini test]$ ls
hosts.json  hosts.vQEAAAAAAAA.gz
[felix@gemini test]$ cat hosts.json
{"Mode":2,"Size":445,"MD5":"9202b6e377e72181b7e878bdac616aa2","MimeType":"text/plain; charset=utf-8","CompressionMetadata":{"BlockSize":1048576,"Size":445,"BlockData":[10,193]}}[felix@gemini test]$
[felix@gemini test]$
[felix@gemini test]$ ls
hosts.json  hosts.vQEAAAAAAAA.gz
[felix@gemini test]$ ls -al
total 12
drwxrwxr-x   2 felix felix   52 Jun 21 15:33 .
drwx------. 11 felix felix 4096 Jun 21 15:33 ..
-rw-rw-r--   1 felix felix  177 May 18 12:40 hosts.json
-rw-rw-r--   1 felix felix  211 May 18 12:40 hosts.vQEAAAAAAAA.gz

The json file is the metadata and is intended.

Except in my case, I don't get the .gz file. I get the .json and the .bin.

[manjaro@manjaro LRemote]$ ls -all
total 376
drwxr-xr-x  2 manjaro manjaro     80 Jun 21 15:36 .
drwx------ 15 manjaro manjaro    620 Jun 21 15:10 ..
-rw-r--r--  1 manjaro manjaro 379889 May 28  2020 largefile.pdf.bin
-rw-r--r--  1 manjaro manjaro    158 May 28  2020 largefile.pdf.json

Not sure why it is producing different results on each of our systems, but I'm open to any theories/suggestions.

Can anyone suggest what I should do differently in order to get compressed gzip files in the remote as expected, instead of uncompressed .bin files as I'm currently getting?

Is this repeatable on your system? If it is, you should open a bug report. What does the json file look like?

My test on macOS 13:

I made an easily compressible 10mb file in Python:

with open('10mb.bin','wb') as fp:
    for _ in range(1024):
        fp.write(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09'*1024)

My config

[localremote1]
type = alias
remote = /Users/jwinokur/rclone_play/dst

[compressedremote1]
type = compress
remote = localremote1:

Command and log

$ rclone copy -vv src/10mb.bin compressedremote1:

2023/06/23 10:00:55 DEBUG : Setting --config "/Users/jwinokur/rclone_play/config.cfg" from environment variable RCLONE_CONFIG="/Users/jwinokur/rclone_play/config.cfg"
2023/06/23 10:00:55 DEBUG : rclone: Version "v1.62.2" starting with parameters ["rclone" "copy" "-vv" "src/10mb.bin" "compressedremote1:"]
2023/06/23 10:00:55 DEBUG : Creating backend with remote "src/10mb.bin"
2023/06/23 10:00:55 DEBUG : Using config file from "/Users/jwinokur/rclone_play/config.cfg"
2023/06/23 10:00:55 DEBUG : fs cache: adding new entry for parent of "src/10mb.bin", "/Users/jwinokur/rclone_play/src"
2023/06/23 10:00:55 DEBUG : Creating backend with remote "compressedremote1:"
2023/06/23 10:00:55 DEBUG : Creating backend with remote "/Users/jwinokur/rclone_play/dst/.json"
2023/06/23 10:00:55 DEBUG : Creating backend with remote "/Users/jwinokur/rclone_play/dst"
2023/06/23 10:00:55 DEBUG : 10mb.bin: Need to transfer - File not found at Destination
2023/06/23 10:00:55 DEBUG : 10mb.bin: md5 = 5209d3d299f4a7f139430822fd49e29b OK
2023/06/23 10:00:55 INFO  : 10mb.bin: Copied (new)
2023/06/23 10:00:55 INFO  :
Transferred:   	   20.747 KiB / 20.747 KiB, 100%, 0 B/s, ETA -
Transferred:            1 / 1, 100%
Elapsed time:         0.0s

2023/06/23 10:00:55 DEBUG : 5 go routines active

Results:

$ rclone lsl compressedremote1:
 10485760 2023-06-23 09:57:43.996607067 10mb.bin

$ rclone lsl localremote1:
    21245 2023-06-23 09:57:43.996607067 10mb.bin.AACgAAAAAAA.gz
      234 2023-06-23 09:57:43.996607067 10mb.bin.json

jqed (I made up the verb) JSON:

{
  "Mode": 2,
  "Size": 10485760,
  "MD5": "5209d3d299f4a7f139430822fd49e29b",
  "MimeType": "application/octet-stream",
  "CompressionMetadata": {
    "BlockSize": 1048576,
    "Size": 10485760,
    "BlockData": [
      10,
      2122,
      2122,
      2122,
      2122,
      2122,
      2122,
      2122,
      2122,
      2122,
      2122,
      7
    ]
  }
}

Rclone only compresses compressible files. Files that are not compressible because they are already compressed (like PDFs) are just stored as-is with a .bin extension. You can test this by creating a compressible file (I used /dev/zero) and rclone will compress it.
rclone compression
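If you'd rather not use /dev/zero, a tiny Python script makes an equally compressible test file (just a sketch; the filename is only an example):

    # Write 10 MiB of zero bytes; gzip compresses this to almost nothing
    with open('zeros.bin', 'wb') as fp:
        fp.write(b'\x00' * (10 * 1024 * 1024))

Copying that file to the compress remote should then produce a tiny .gz plus the usual .json metadata file.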

This is good to know, as there is not a single word about it in the docs. It is still marked as experimental, though.

Interesting!

Looking at the source code, it even seems like it isn't based on the file extension. It is based on a real test: it will try to compress the file, and if it doesn't get smaller by 10%, it will leave it as is. That is good to know!
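Roughly the idea, sketched in Python rather than rclone's actual Go (the function name, sample size and threshold here are only illustrative):

    import gzip

    def looks_compressible(path, sample_size=1024 * 1024, threshold=1.1):
        """Compress a sample of the file and check whether it shrinks enough."""
        with open(path, 'rb') as fp:
            sample = fp.read(sample_size)
        compressed = gzip.compress(sample)
        # Keep the compressed form only if the sample got at least ~10% smaller
        return len(sample) / len(compressed) > threshold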

I added it to the docs: docs/compress: Add note about files which grow in compression by Jwink3101 · Pull Request #7079 · rclone/rclone · GitHub


It is actually quite a useful remote - I will have a look at how difficult it would be to plug in the zstd algorithm - gzip is ancient and very slow on modern computers.

Thanks for the responses everyone. I did the test with a 10MB /dev/zero file and rclone did indeed compress it into a (tiny) gzip file as described above. A few notes/observations based on all of this:

  1. I had previously tried lots of different file types and formats, including larger text files, and none of them were compressed, which led me to think there was a bug. So far a zero-filled file is the only thing I've gotten to compress successfully.

  2. When I tried compressing some of those previous test files (like PDFs and TXTs) using gzip directly, they often did compress by more than 10%. I'm not sure if gzip's default compression level is higher than the setting rclone uses, but that difference is noteworthy (see the quick level comparison after this list).

  3. The 10% minimum shrink threshold creates an interesting situation where a compress remote can actually end up taking more disk space than the original uncompressed source files. For example, a folder of 1000 text files, none of which compress enough to pass the threshold, gains 1000 new .json files of roughly 200 bytes each (these are added to the compress remote regardless of whether the corresponding source file gets compressed), a net increase of about 200KB over the original source folder.

  4. I wonder how the 10% figure was decided on. In contrast to the above situation, if I have a 1GB file which can be compressed by 8%, that is still 80MB saved, which would be a worthwhile amount to reclaim on a remote where storage space is limited.
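On point 2, here is a quick way to compare gzip levels directly (only a sketch; 'testfile.txt' is a placeholder for one of the files mentioned above):

    import gzip

    with open('testfile.txt', 'rb') as fp:
        data = fp.read()

    for level in (1, 6, 9):  # the gzip command-line default level is 6
        compressed = gzip.compress(data, compresslevel=level)
        saved = 100 * (1 - len(compressed) / len(data))
        print(f"level {level}: {len(data)} -> {len(compressed)} bytes ({saved:.1f}% smaller)")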

I added it to the docs: docs/compress: Add note about files which grow in compression by Jwink3101 · Pull Request #7079 · rclone/rclone · GitHub
"Add note about how compress will keep the uncompressed file if it grows by more than 10%"

I think the wording of that description is slightly inaccurate? If I understand correctly, it isn't a question of whether the file grows by more than 10%, but rather whether it fails to shrink by at least 10%. I could be mistaken though.

yeah it needs some polishing

Edit

See original below. Looking again, I am not sure. I am very, very new to golang, and it can be hard to follow the short variable names. You very well may be right.

Original

That is what I would have expected too but it is actually in the code as:

    ratio := float64(n) / float64(b.Len())
    return ratio > minCompressionRatio, nil

where minCompressionRatio = 1.1

Personally, I do not understand why this isn't 1.0. If the file grows at all, why waste the effort to later decompress it?

Indeed, it's difficult to tell for sure without knowing which of those short variable names (n and b) refers to the original and which to the compressed version of the file. From my admittedly imperfect knowledge of gzip, it's hard to imagine it ever actually increasing the size of a file (not counting the .json metadata files rclone adds), so that 1.1 figure seems meaningless unless it's really a shrinkage ratio, but who knows. Maybe there truly are some edge cases where gzip will grow a file, depending on its byte composition.
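For what it's worth, a quick check suggests gzip can indeed grow input it cannot compress, since the format adds a header, trailer and block framing overhead (random bytes stand in for already-compressed data here):

    import gzip
    import os

    data = os.urandom(1024 * 1024)      # random bytes are effectively incompressible
    compressed = gzip.compress(data)
    print(len(data), len(compressed))   # the compressed output is slightly larger than the input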

Yeah, I tried to set it up like:

Rclone mycloud: compress:

And yes, I get .bin and .json files. I guess I will just do it the other way after I dump my cloud, with something like:

find . -type f -exec 7z a {}.zip {} \; -exec rm {} \;

...after it is all done.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.