Beta testers wanted for new experimental compression remote

Over the last year @id01 and I have worked on a new overlay remote that implements transparent compression.

We now have a first beta available here.
The remote implements three compression options, lz4, gzip and xz, for fast, balanced and strong compression respectively.
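Setup works like any other overlay remote: point it at an existing remote and pick a compression mode. The snippet below is only a rough example while this is in beta; the option names may still change, and the remote name, path and test directory are placeholders.

[presstest01]
type = press
remote = remote:bucket/compressed
compression_mode = gzip

Then a quick round trip is enough to see it in action:

rclone copy /some/local/dir presstest01:test -vv
rclone ls presstest01:test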

If you are interested, please go ahead and test it. You can report any issues here or in the relevant pull request on GitHub.


Any more info on how to test this, or on what exactly gets compressed, so I can test?

Is there any documentation?


  1. It looks like there are two files created in the remote for each file in the source. Is that correct? And if so, why is the second file needed?

  2. The filename of the source file is mangled instead of simply getting a file extension added; I would have expected only one file named dump1.bin.gz.



On Windows, I tried rclone mount to access the compressed remote:
rclone.exe mount presstest01: p:

My system locked up so badly that I could not even run Task Manager via Ctrl+Alt+Delete, and my taskbar crashed as well. Luckily I had Task Scheduler open and was able to end the task.

Using the latest WinFsp, v1.6.

Also, when the mount does work, I cannot copy files, getting errors like:


2020/03/24 11:41:30 NOTICE: S3 bucket presstest path 01: Streaming uploads using chunk size 5M will have maximum file size of 48.828G
2020/03/24 11:41:30 DEBUG : 50files/dump1.binffffffffffffffff.gz: multipart upload starting chunk 1 size 1.000M offset 0/off
2020/03/24 11:41:30 DEBUG : 50files/dump1.binffffffffffffffff.gz: Size and modification time the same (differ by 0s, within tolerance 1ns)
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x28 pc=0xe86fa5]

goroutine 118 [running]:
github.com/rclone/rclone/backend/press.(*Object).ModTime(0xc0000ce550, 0x1afdb80, 0xc000040130, 0x1, 0x721a7cb79, 0x25bc020)
:1 +0x35
github.com/rclone/rclone/fs/operations.equal(0x1afdb80, 0xc000040130, 0x1b09b40, 0xc0001e6500, 0x1b115a0, 0xc0000ce550, 0x10000, 0xc000534c60)
d:/a/rclone/src/github.com/rclone/rclone/fs/operations/operations.go:191 +0xc63
github.com/rclone/rclone/fs/operations.Equal(...)
d:/a/rclone/src/github.com/rclone/rclone/fs/operations/operations.go:120
github.com/rclone/rclone/fs/operations.Rcat.func2(0x1b115a0, 0xc0000ce550, 0x29e34a90, 0xc0000059c0)
d:/a/rclone/src/github.com/rclone/rclone/fs/operations/operations.go:1340 +0x1ba
github.com/rclone/rclone/fs/operations.Rcat(0x1afdb80, 0xc000040130, 0x1b11520, 0xc000270380, 0xc000527e60, 0x11, 0x1aeb940, 0xc0001860b8, 0xbf96a86a96f1b330, 0x721a7cb79, ...)
d:/a/rclone/src/github.com/rclone/rclone/fs/operations/operations.go:1391 +0xb14
github.com/rclone/rclone/vfs.(*WriteFileHandle).openPending.func1(0xc000096980, 0xc0001860b8)
d:/a/rclone/src/github.com/rclone/rclone/vfs/write.go:72 +0xdc
created by github.com/rclone/rclone/vfs.(*WriteFileHandle).openPending
d:/a/rclone/src/github.com/rclone/rclone/vfs/write.go:70 +0x182

Very nice! I've been keeping a close eye on this on GitHub.
Not sure when I will realistically have time to test it out thoroughly as I have my hands full atm - but it's definitely going on my list.

Size info is still stored in the name though, right? Or did you manage to find a better solution for this?

I think one big problem with the compressed remote is that the .gz file must keep the original filename. If a file named dump1.bin is compressed into a .gz, the name of the file inside the .gz must not change. As it is now, the filename is changed to
dump1.bin0000100000000000

So as it is now, I must use rclone to uncompress that file; I am not able to use 7-Zip to recover it.

According to what I read in the dev thread, I think the files should be cross-compatible. But the current implementation apparently has to rely on storing the (real) filesize in the name. So I think the only "damage" here is that the name gets a little messed up at the end.
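Just to illustrate the idea: this is my guess at the general scheme based on the names in the screenshots, not the actual press code (the helper names are made up and the real encoding may well differ).

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// addSizeSuffix tacks the original (uncompressed) size onto the name as a
// fixed-width 16-digit hex field, so the remote can report sizes and seek
// without reading the object. An all-f field stands for "size not known yet".
func addSizeSuffix(name string, size int64) string {
	if size < 0 {
		return name + strings.Repeat("f", 16) + ".gz"
	}
	return fmt.Sprintf("%s%016x.gz", name, size)
}

// parseSizeSuffix reverses addSizeSuffix, recovering the clean name and the
// size (-1 if it was unknown).
func parseSizeSuffix(mangled string) (string, int64, error) {
	trimmed := strings.TrimSuffix(mangled, ".gz")
	if len(trimmed) < 16 {
		return "", 0, fmt.Errorf("name too short: %q", mangled)
	}
	name, hexPart := trimmed[:len(trimmed)-16], trimmed[len(trimmed)-16:]
	if hexPart == strings.Repeat("f", 16) {
		return name, -1, nil
	}
	size, err := strconv.ParseInt(hexPart, 16, 64)
	return name, size, err
}

func main() {
	mangled := addSizeSuffix("dump1.bin", 1048576)
	fmt.Println(mangled)                  // dump1.bin0000000000100000.gz
	fmt.Println(parseSizeSuffix(mangled)) // dump1.bin 1048576 <nil>
}

The point is just that the suffix is mechanical and reversible, so rclone can always get the real name and size back, even though the stored name looks ugly.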

I don't love this solution either, but since we don't really have a reliable structure for storing metadata like this centrally (which would be a large and difficult undertaking), you have to choose between imperfect solutions. You could save this data in an accompanying file instead, but then you'd suddenly double the number of files, which would also not be ideal...

It's a hard problem is what I'm getting at, not just a silly design flaw.

Yeah, experimental.

I love rclone, have donated to rclone with time in the forum and money from my wallet, and I need rclone. But at some point, trying to be everything to everybody is not productive.

In a couple of lines of batch script (see the sketch below):

  1. Compress the files into a password-protected .7z.
  2. Use rclone to copy the .7z to the cloud.
  3. No need for rclone to decompress the .7z.
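Concretely, something like this (the 7-Zip path, folder, remote name and password variable are just placeholders):

"C:\Program Files\7-Zip\7z.exe" a -p%MYPASS% -mhe=on backup.7z C:\data\*
rclone copy backup.7z remote:backups

-mhe=on also encrypts the archived filenames, and on the other end any copy of 7-Zip can open the archive without rclone.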

Yes, but that's not a transparent solution. Making the solution transparent is what makes this problem hard; otherwise it would be just as easy as you demonstrate. If you don't have any problem with zipping and unzipping your own stuff, then this remote isn't something you need.

Hey, good to see that someone is testing this now. Maybe I should have been a little clearer that this is very much still in development.

The documentation hasn't been written yet. I was under the impression that the setup is pretty straightforward if you've set up any other rclone backend before, but documentation is on my todo list.

This is necessary to store metadata and to allow seeking.

This looks like an actual bug; I'll be looking into it. Note that using press with mount might not be optimal due to the extra overhead introduced by compression blocking I/O for longer.

This is still the best solution we have currently. I've been thinking about a general rclone metadata framework that stores metadata per folder, but this is something that still needs discussion.

For the gzip compression this can actually be fixed: the header supports storing the original filename. It's just a limitation of our current gzip implementation that I hadn't really thought about yet; same for xz, actually. In the case of lz4 it's not possible to fix this.
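For reference, Go's standard compress/gzip already exposes this: you can set Name on the writer's header and it ends up in the gzip FNAME field, which tools that honour that field can read back. A standalone sketch, not the actual press code:

package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"log"
)

func main() {
	// Compress some data and record the original filename in the gzip header.
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Name = "dump1.bin" // goes into the gzip FNAME field; must be set before the first Write
	if _, err := zw.Write([]byte("example payload")); err != nil {
		log.Fatal(err)
	}
	if err := zw.Close(); err != nil {
		log.Fatal(err)
	}

	// Read it back: the stored name survives independently of the object name.
	zr, err := gzip.NewReader(&buf)
	if err != nil {
		log.Fatal(err)
	}
	defer zr.Close()
	fmt.Println("original name:", zr.Name) // original name: dump1.bin
}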

There is definitely some merit to that. Luckily rclone is highly modular: implementing a new backend doesn't pollute rclone's core in any way, and you only have to change a single line to remove a backend. Rclone also supports plugins now, and while I'm personally not a fan of plugins in general, I'd have no problem with separating certain functionality into a plugin. To me personally, transparent compression aligns far more with "rsync for cloud storage" than a DLNA server or the entire rest of serve, tbh.


Yes, I agree on that.

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.