(rclone sync) or (rclone mount + sync)?

I have a folder with hundreds of thousands of small files, which I back up to Wasabi with rclone sync. The problem is that only a few files change daily (a few MB), yet every time the sync command runs it scans all the files (which is time-consuming) just to copy the few that are new or modified.

I’ve thought about creating a kind of “cold folder”, but the changed files are spread across several subfolders, that is, several folders have small changes every day.

Before adopting the radical solution of completely restructuring the folder, I thought of another approach, but I don’t know if it will make a difference:

If, instead of syncing the local folder directly to Wasabi, I mount the Wasabi volume and then sync the local folder to this mounted volume, will the scan perform better?
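Roughly, what I have in mind is something like this (the drive letter, bucket and folder names are just examples, and the second step could be rclone or any other sync tool):

rclone mount wasabi:mybucket X: --vfs-cache-mode writes
rclone sync C:\data\bigfolder X:\bigfolder

with the mount left running in a separate window while the sync runs against X: instead of against wasabi:mybucket directly.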

I’ve already tested the --fast-list option, with no difference in performance.

I suppose rsync on an rclone mount might be something worth trying. It might give you better performance.

Something like rsync --progress /localfolder /wasabirclonemount, although they may behave the same. Is this thing with thousands of small files possibly borgbackup, or some other type of backup software? It may be worth increasing the chunk size for the files it creates, so you end up with fewer files for faster scanning/checking.

Hi @camjac251, I forgot to mention that I’m running on Windows 10, so rsync is not an option, but that’s what rclone is for anyway, right?

They are not chunks from backup software, but small text files generated by an application.

(For this type of backup - large files with minor modifications - I use Duplicacy)

A recent log:

Transferred:   561.157 kBytes
Errors:                 0
Checks:            211376
Transferred:          194
Elapsed time:    57m14.5s

1 hour to back up ~500 kBytes…

Not sure if this is related, but have you tried adjusting the sync parameters? For example adding more checkers, or using checksum-only or size-only checks?
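For example, something along these lines (the flag value and remote:path are just placeholders):

rclone sync --checkers 16 --size-only /path/to/local remote:path

--size-only compares only file sizes rather than size plus modification time, --checksum compares size plus checksum instead of modification time, and a higher --checkers value runs more of those comparisons in parallel.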

I can think of a number of ways of making this better…

Assuming the new files have a recent modification time, you can find them easily with rclone

rclone lsf --files-only --max-age 1d /path/to/local > new_files

You can then use this as input to a new rclone command. Unfortunately the paths in new_files need a / at the start, otherwise the copy will need to recurse…

sed -i 's/^/\//' new_files

Then you can give this to rclone copy to copy just those files with a minimum of directory scanning.

rclone copy --files-from new_files /path/to/local remote:path

I should probably add a flag to lsf to make the paths absolute, which would make this easier… I realise you are on Windows, so using sed isn’t ideal.

Another solution would be to set up the cache backend - this stores the directory listing of the remote in its database.

Provided you set --dir-cache-time high enough, this would help.

I think either of the solutions above is probably better than going via a mount, though.

Interesting! I’d expect that to make some difference with Wasabi (which is S3 based), though whether it will be faster or slower is difficult to say!

I made the --absolute flag for rclone lsf

https://beta.rclone.org/v1.41-080-g3ef938eb/ (uploaded in 15-30 mins)

So the workflow above would just be

rclone lsf --absolute --files-only --max-age 1d /path/to/local > new_files
rclone copy --files-from new_files /path/to/local remote:path

Thank you Nick!

I’ll set up this solution in a script and come back here to report the results.


I should really read my own documentation… From the --files-from docs:

Paths within the --files-from file will be interpreted as starting with the root specified in the command. Leading / characters are ignored.

So you don’t need the beta or the --absolute flag!

Just one word: F.A.N.T.A.S.T.I.C!

The log speaks for itself:

Transferred:   2.617 MBytes
Errors:                 0
Checks:                 5
Transferred:            5
Elapsed time:       26.4s

(The same “big folder” from the above post)

Thank you very much Nick!

Your amazing support enhances our use of Rclone. :+1::+1::+1: :grinning:

And as a bonus, each backup now gives me, in addition to the complete log, a small file listing the new/modified files.

Just one detail: the above lsf command needs the -R option.

-R, --recursive Recurse into the listing.
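So on Windows the full workflow from the post above, with -R added, ends up looking something like this (the paths and bucket name are just placeholders):

rclone lsf -R --files-only --max-age 1d C:\data\bigfolder > new_files.txt
rclone copy --files-from new_files.txt C:\data\bigfolder wasabi:mybucket/bigfolder

(--max-age should be at least as long as the interval between runs, so nothing falls through the gap between two jobs.)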


Excellent :smiley:

That is quite a good trick…

I wonder whether I should make a new rclone command to combine the two, say rclone copyfiltered or something like that (I can’t think of a better name), which would run the filters on the source and copy the results to the destination without traversing it.

Or maybe bring back the --no-traverse flag to mean “don’t traverse the destination”, so the filters (if any) run on the source only.

I think keeping the commands separate may be more flexible, as you can use lsf + sync, lsf + copy, lsf + move, etc.

If I use the --max-age option directly in the sync command (without running lsf first), would I get the same performance as with the setup above (lsf + sync)? I did some simple tests and it seemed more time-consuming, although the log shows the same messages for the skipped files: “Excluded from sync (and deletion)”.

Since the files are small and the folder is very large (in number of files), I’m not sure about the measurements, but it seems to me that using --max-age directly in sync is slower than the lsf + sync combination.
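For reference, the single-command variant I’m comparing against is just (paths are placeholders):

rclone sync --max-age 1d /path/to/local remote:path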

Yes, I’d forgotten about copy and move…

A flag which works on all of those might be an option.

The --max-age flag will still cause the destination to be traversed, as it is traversed in lockstep with the source. That is a bit unfortunate, but it is the way the code works. So I wouldn’t expect a speedup from using --max-age on the copy.

I’d also use copy for those experiments instead of sync, just in case!

The scripts were already working fine with sync, so I just kept the previous configuration and added the lsf step before the sync command.

So I’ll stay with your lsf + sync solution; it is simple and straightforward, and it is already working very well in all my backup jobs.

One thing occurred to me: what about deleted files? Will they be moved to the folder indicated by the --backup-dir option, even when using this lsf + sync solution with --files-from on the sync?

The reason I suggested copy rather than sync is just paranoia, because with --files-from you aren’t starting with a complete source tree… Just don’t add the --delete-excluded flag!

No they won’t. Files you delete will not be found by the lsf.

I suggest that every now and again you run a complete sync to remove deleted files and get any files that got missed.
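Something like this, for instance (the archive folder name is just an example, and --backup-dir has to point at the same remote as the destination):

rclone sync /path/to/local remote:path --backup-dir remote:archive

That way anything the full sync deletes or overwrites gets moved into the archive folder instead of being removed.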

I thought the same thing. I will set up a job to run weekly.

Hello @ncw,

I found a small problem with this approach: moved files. As their modification dates are not changed, they are not detected by lsf --max-age.

Is the only solution the periodic “full” sync?

Would an option like --track-renames help?

That’s what I’d recommend.

You can use --track-renames with your full sync.
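So the weekly full sync could be something like (placeholder paths again):

rclone sync --track-renames /path/to/local remote:path

Note that --track-renames needs the destination to support server-side copy/move and to share a common checksum with the source.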

Does Wasabi support server-side move?

And another point: as I’m using encrypted (.bin) storage, --track-renames will not apply, right?

The S3 protocol supports server-side copy, so rclone uses that as copy + delete to make a move. So yes.

That is correct; alas, there are no checksums on encrypted files.