I have a folder with hundreds of thousands of small files, which I back up to Wasabi with rclone sync. The problem is that only a few files change daily (a few MB), yet each time the sync command runs it scans all the files (which is time-consuming) just to copy the few that have been modified.
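For reference, the command I run is roughly like this (the remote and bucket names here are just placeholders):

rclone sync /localfolder wasabi:mybucket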
I’ve thought about creating a kind of “cold folder”, but the changed files are spread across several subfolders; that is, several folders have small changes every day.
Before adopting the radical solution of completely restructuring the folder, I thought of another approach, but I don’t know if it will make a difference:
If, instead of syncing the local folder directly to Wasabi, I mount the Wasabi volume and then run the sync between the local folder and this mounted volume, will the scan be faster?
I’ve already tested the --fast-list option, with no difference in performance.
I suppose rsync on an rclone mount might be something worth trying. It might give you better performance.
like rsync --progress /localfolder /wasabirclonemount, although they may behave the same. Is this thing that has thousands of small files possibly borgbackup, or some other type of backup software? If so, it may be worth increasing the chunk size for the files it creates, to end up with fewer files for faster scanning/checking.
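Going back to the mount idea, the rough shape would be something like this (remote, bucket and mount point are placeholders; on Windows the mount point would be a drive letter instead):

# keep the mount running in the background
rclone mount wasabi:mybucket /mnt/wasabi &
# let rsync work out the changed files against the mounted copy
rsync -a --progress /localfolder/ /mnt/wasabi/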
Not sure if this is related, but have you tried adjusting the sync parameters? Like adding more checkers, or checking by checksum only, or by size only?
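For example, something along these lines (the values are just illustrative):

# more parallel checkers, and compare by size only rather than size + modtime
rclone sync /localfolder wasabi:mybucket --checkers 32 --size-only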
You can then use this listing as the input to a new rclone command. Unfortunately the paths in new_files need a / at the start, otherwise the copy will need to recurse...
sed -i 's/^/\//' new_files
Then you can give this file to rclone copy with --files-from to copy just those files with a minimum of directory scanning.
I should probably add a flag to lsf to make the paths absolute, which would make this easier... I realise you are on Windows, so using sed isn't ideal.
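Putting it together, the whole pipeline might look roughly like this (the remote, bucket and 1-day window are just examples):

# list local files modified in the last day
rclone lsf --files-only -R --max-age 1d /localfolder > new_files
# prefix each path with / so the copy doesn't need to recurse
sed -i 's/^/\//' new_files
# copy only those files to Wasabi
rclone copy --files-from new_files /localfolder wasabi:mybucket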
Another solution would be to set up the cache backend - this stores the directory listing of the remote in its database.
Provided you set --dir-cache-time high enough, this would help.
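A minimal sketch of what that might look like in rclone.conf (names are placeholders) - you'd then sync to wasabi-cache: instead of the plain remote:

[wasabi-cache]
type = cache
remote = wasabi:mybucket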
I think either of the solutions above is probably better though.
Interesting! I'd expect that to make some difference with Wasabi (which is S3-based) - whether it will be faster or slower is difficult to say though!
I wonder whether I should make a new rclone command to combine the two, say rclone copyfiltered or something like that (can't think of a better name), which would run the filters on the source and copy the results to the destination without traversing it.
Or maybe bring back the --no-traverse flag to mean don't traverse the destination, so run the filters (if any) on the source only.
I think keeping the commands separate may be more flexible, as you can use lsf + sync, lsf + copy, lsf + move, etc.
If I use the --max-age option directly in the sync command (without running lsf first), would I get the same performance as with the above configuration (lsf + sync)? I did some simple tests and it seemed more time-consuming, although the log shows the same messages for the skipped files: “Excluded from sync (and deletion)”.
But since the files are small and the folder is very large (in number of files), I’m not sure about the measurements. It does seem to me, though, that using the --max-age option directly in sync is slower than the lsf + sync combination.
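The direct variant I compared against was roughly this (the 1-day window is just an example):

rclone sync /localfolder wasabi:mybucket --max-age 1d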
A flag which works on all of those might be an option.
A --max-age flag will still cause the destination to be traversed, as it is traversed in lockstep with the source. That is a bit unfortunate, but it is the way the code works. So I wouldn't expect a speedup from using --max-age on the copy.
I'd also use copy instead of sync for those experiments, just in case!
The scripts were already working fine with sync; I just kept the previous configuration and added the lsf before the sync command.
So I'll stay with your lsf + sync solution; it is simple and straightforward, and it is already working very well in all my backup jobs.
One more thing occurred to me: what about deleted files? Will they be moved to the folder indicated by the --backup-dir option, even when using this lsf + sync solution with the --files-from option in sync?
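In other words, a command along these lines (remote, bucket and archive path are placeholders):

rclone sync /localfolder wasabi:mybucket --files-from new_files --backup-dir wasabi:archive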