Automatically splitting files that are larger than a given size

I'm trying to store backups on S3 using rclone. Conceptually, it works great. However, I have to deal with a fairly large data set, over 10TB, made up of millions upon millions of small files; the average file size is around 40KB. If I use rclone sync as is, I take a fairly big performance hit: over 10 times slower than backing up 10TB of files that are larger than, say, 100MB. The solution I'm thinking about is a feature where rclone rcat automatically splits the remote file once it reaches a certain size, similar to how Unix split works: once a certain size is reached, start a new file with a different suffix. For example, if I want to store a 550GB file in 100GB chunks, the result would look like:
filename.00 (100GB)
filename.01 (100GB)
filename.02 (100GB)
filename.03 (100GB)
filename.04 (100GB)
filename.05 (50GB)
And rclone cat would reassemble it on the fly. The same could probably be applied to the copy and sync commands.
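To make it more concrete, something like this could already be done by hand with GNU split piping each chunk straight into rclone rcat (the bucket and paths below are just placeholders):

    # cut the input into 100GB chunks with numeric suffixes (filename.00, filename.01, ...)
    # and stream each chunk to the remote as it is produced
    split -b 100G -d -a 2 --filter='rclone rcat s3:mybucket/backups/$FILE' bigfile filename.

What I'm proposing is essentially that behaviour built into rclone itself, so sync and copy could do it transparently.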

Thoughts on this?

Cheers!

–dima

Rclone doesn't have a way of doing this automatically yet. There is an issue for it though: https://github.com/ncw/rclone/issues/497, which is also moving up the to-do list!

The plan would be to have a new backend, layered over another backend, which would split the files when they got too big.
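Purely as a sketch of the idea (none of this exists yet, and the backend and option names here are made up), the config for such an overlay might end up looking something like:

    [s3]
    type = s3

    [chunked]
    type = chunker          # hypothetical overlay backend
    remote = s3:mybucket/backups
    chunk_size = 100G       # hypothetical option

You would then sync to chunked: rather than to s3: directly, and the overlay would take care of splitting and reassembling the files.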

rclone cat remote:path/to/dir --include "filename.*" would do the reassembly already BTW
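For example, assuming the chunks were uploaded as filename.00, filename.01, ..., you could rebuild the original file locally with:

    rclone cat remote:path/to/dir --include "filename.*" > filename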
