Union with auto chunker

From what I understand, chunker is mainly meant for providers with a file size limit. But what if a user wants to merge multiple remotes into a single file system, chunking only when it's actually needed because one remote is full?

I know I should combine union and chunker. But I feel that using chunker as it is would be wasteful most of the time, no?
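
For reference, the layering I have in mind is roughly this; remote names and values are placeholders, not a tested setup:

    [myunion]
    type = union
    upstreams = remote1: remote2: remote3:
    # mfs = create new files on the upstream with the most free space
    create_policy = mfs

    [mychunked]
    type = chunker
    # chunker wraps the union, not the individual remotes
    remote = myunion:
    chunk_size = 3G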

Hypothetical example:

  • Three remotes with 10GB + 7GB + 3GB free.
  • I need to upload two 9GB files.

If you do the math, you see I must use some of the space on the 3GB remote; there's no way around it. So I need to manually set chunk_size to ≤3GB, right? But by doing that, both files will be split, even though one of them could have been uploaded as a single object: it is 9GB and there's a remote with 10GB free.
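
Spelling out that math:

    total to upload:          9 + 9  = 18GB
    free on the two biggest:  10 + 7 = 17GB

    17GB < 18GB, so at least 1GB must land on the 3GB remote. A 9GB file
    can only touch that remote if it is split into pieces of at most 3GB,
    and since chunk_size is global, the other file gets split too.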

Let's move on to another scenario: dozens of remotes, each with ~500MB free, and I need to upload a 9GB file. To upload it, I must set chunk_size to a very small value, which would needlessly cause many files to be split into multiple parts.

Is there something like a dynamic, smart chunker for union, so that it only chunks when needed, depending on the space available on the remotes?

I may be wrong, but I'm asking because I assume split files are not as efficient as non-split files, so it would be better to avoid chunks when they aren't really needed.

Even with a small chunk_size like 500MB, rclone may at some point have distributed files so that every remote has less than 500MB free; then I can't upload a 600MB file even though the remotes have a few GB free in total. That's something only a dynamic chunker would solve.
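
The closest manual workaround I can think of would be checking free space per upstream before each upload and picking a strategy by hand, something like this sketch (remote names are placeholders, and not every backend supports about):

    # Print the free bytes each upstream reports (needs jq).
    for r in remote1: remote2: remote3:; do
        printf '%s ' "$r"
        rclone about "$r" --json | jq .free
    done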

There is no "smart chunker" today. Either you chunk always or never.

You might be right here, but what is your workload that this small efficiency cost makes it problematic to run? If you run very demanding cloud tasks, then maybe it is better to look at enterprise cloud solutions like AWS or Azure?

If you provide a real-life example with test measurements showing that chunker's added overhead makes your task impossible to run, we can think about a solution.

This is also correct. The solution today is to use a smaller chunk size.
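
If the remote already exists, the chunk size can be changed in place, e.g. (remote name is a placeholder):

    # Change chunk_size on an existing chunker remote called "mychunked".
    rclone config update mychunked chunk_size 500M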

Thanks for your reply.

I haven't tested union and chunker yet; I'm still thinking about it.

Is there a problem if I change chunk_size when there's already uploaded data? Like in the example: I can't upload a 600MB file because chunk_size is 500MB and no single remote has 500MB free. Can I reduce the chunk size without affecting existing data? Will rclone still be able to work with files that were split at different sizes? Or will I need to migrate everything to the new size?

I am not sure, TBH. I doubt that such a scenario was part of chunker's original design. I would think it should work in general, but there might be some edge cases causing issues. The best answer is to test and/or study the source code.

FYI, I use a 100MB chunk size on a union spanning TBs of storage, and at least for my usage I have not noticed any performance impact.

