with many cloud providers being stingy with IOPS, backing up a server with several hundred thousand 4kb files can take days. if they could be bundled into 100mb chunks transparently by rclone, it would cut the transfer time down by a few days. i'm not sure how this would work technically, but i guess that's why i'm not on the rclone team and you awesome people are. cheers on the wonderful product!
this is a daily occurrence, as we use rclone as the backup for our production servers. and the biggest reason is as you mentioned: the 1:1 object mapping that rclone does. i'm hesitant to use a backup tool that holds exclusive read access to all our data
i s'pose this would require some kind of differential sync; only updating certain blocks of the file. this would also be a nice feature but i can see now how this would be difficult. as it stands, it would require re-uploading the entire chunk just because a 4kb file changed. i guess my plan isn't as cool as i thought
i'll def look into rcat for archiving though. thanks!
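for reference, a sketch of what archiving via rcat might look like (the paths and remote name here are just placeholders, not our real setup):

```shell
# placeholder source path and remote name
SRC=/srv/data/photos
REMOTE=onedrive:archive
if command -v rclone >/dev/null 2>&1; then
  # tar streams the archive to stdout, and rclone rcat uploads straight
  # from stdin, so the bundled archive never touches the local disk
  tar czf - "$SRC" | rclone rcat "$REMOTE/photos-$(date +%F).tar.gz"
fi
```

that would turn hundreds of thousands of tiny requests into one big upload, at the cost of having to re-upload the whole archive when anything changes.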
the only thing i've ever looked into is the --max-age flag. but as it only works with copy, doesn't delete deleted files, and also doesn't upload renamed files (windows doesn't count renaming as modifying), i've abandoned it.
so that's where we are: start the syncs at 9-10 p.m. and they're usually done by morning.
Using the --max-age flag with copy is a technique I call doing a top-up sync. It doesn't catch deletes or renames, but it does mean that a recent copy of your data is safe. So you could do a top-up sync once an hour and a full sync once a day, or something like that!
You should find a top-up sync with rclone copy --max-age runs much quicker. It may run quicker still if you add --no-traverse (not sure about onedrive/sharepoint - you'll have to try it).
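As a sketch (the source path and remote name are made up - adjust for your setup):

```shell
SRC=/srv/data            # placeholder source path
REMOTE=onedrive:backup   # placeholder remote
if command -v rclone >/dev/null 2>&1; then
  # hourly top-up: only consider files modified in the last hour;
  # --no-traverse avoids listing the whole destination first
  rclone copy --max-age 1h --no-traverse "$SRC" "$REMOTE"
  # nightly full sync: catches the deletes and renames the top-ups missed
  rclone sync "$SRC" "$REMOTE"
fi
```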
What is taking the time in your syncs? Is it the initial checking phase or is it the transferring phase? Rclone runs them both concurrently normally so it can be difficult to tell. You can use the --check-first flag to make rclone run them sequentially and this is a good idea if your servers have HDD rather than SSD.
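For example (again with placeholder names):

```shell
SRC=/srv/data            # placeholder source path
REMOTE=onedrive:backup   # placeholder remote
if command -v rclone >/dev/null 2>&1; then
  # finish the whole checking phase before starting any transfers, so the
  # two phases show up separately in the log - and an HDD isn't seeking
  # between checks and transfers at the same time
  rclone sync --check-first -v "$SRC" "$REMOTE"
fi
```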
at first run this does not seem to make much of a difference, regrettably. (i'll test it some more.) does microsoft have a form we can fill out to allow infinite requests, as well as a free 10% share in their cloud business? i'd be ok with the classic limit of "only 1 applicant per household" on it.
it basically goes by requests. you can upload/transfer a terabyte or two and it won't make a fuss if the files are few and large. but if the files are many and small, or you're just scanning the remote for changes, they are quick to swat you: they'll allow between 20,000 and 30,000 checks before they throttle you back to increments of ~2,000 checks every bunch of minutes. we have approximately 2 million files that we sync every night, so it's the multitude of requests that takes the time.

(the onedrive personal accounts are not nearly as stingy. they'll allow us probably half a million requests before they even think about throttling. that's why we switched to them for our daily syncs, working around the 1tb limit with rclone's union feature (it's awesome), and use sharepoint more for archiving.)

but many of these files are users' personal photo archives with hundreds of thousands of small phone pictures that hardly get touched but once a year. that's why i was thinking the reverse chunker would be beneficial for many of them, even now that i realize (with a 250mb chunk) one 3mb photo changing would mean re-uploading the whole 250mb. it's also why i've been such a gigantic nag, periodically checking with the other users on the forum whether anybody's accumulated interest in running a local database for tracking changes. that would reduce the sync time to seconds. just some thoughts.
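for anyone curious, the union setup looks roughly like this in rclone.conf (remote names invented for the example; the real entries have the oauth tokens etc. filled in by rclone config):

```
# two 1tb onedrive personal accounts
[od1]
type = onedrive

[od2]
type = onedrive

# pooled together into one big remote
[od-pool]
type = union
upstreams = od1: od2:
```

syncing to od-pool: then spreads the files across both accounts, so the 1tb-per-account limit stops mattering.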
If the remotes don't get changed outside your backup routine then something like the cache backend is what you want. This keeps a record of what is in the backend so you don't have to keep re-reading it.
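Configured something like this (names invented for the example; `info_age` controls how long the cached listings are trusted before being re-read):

```
# the real remote
[sp]
type = onedrive

# cache wrapper - point your syncs at sp-cache: instead of sp:
[sp-cache]
type = cache
remote = sp:
info_age = 48h
```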
You'll note, however, the cache backend is deprecated. I'm currently making a plan for turning the VFS cache into a new cache backend which will solve this problem in a maintainable way.
they don't. that's the only thing those remotes get used for.
this is a feature i've asked numerous guys on the forum about; nobody told me it already exists. what the fiddle. after running a few tests it seems to do exactly what i'm after. i just scanned a 400,000 file directory 7 times in one minute, where it took hours and hours before. in fact, this is so good i'm getting suspicious now...
p.s. one question: what is a "chunk" in this case? i'm not very familiar with how databases work.
However, it was an evolutionary dead end as far as rclone development goes, as what people really wanted was for it to be more tightly integrated with rclone mount.
The cache backend can also store file data, which it does in chunks. I don't think you need this feature though - you should store just the metadata and none of the data. I can't remember how you configure this though!