Idea for improving performance of file checks

When debugging my backup process, I spent 90% of the time waiting for rclone to skip over (check) existing files. I am syncing millions of small files onto a remote destination: sometimes online (B2), other times on a network share.

In order to speed this up, we can create a cache file containing key-value pairs where:

key = hash of the command-line that triggered checking of the directory
value = timestamp when the check completed

Then, when the user runs a command we’d skip over any directories that we already processed in the past X milliseconds (with some reasonable default value). When running incremental backups, I would happily skip over any directories that have been processed in the past hour as I only run backups once a day in production (shorter time periods indicate I am debugging).
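A minimal sketch of the idea in Python, to make the proposal concrete. This is not an existing rclone feature: the file name `check_cache.json`, the function names, and the choice to hash the directory together with the command line are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical cache location; rclone has no such file.
CACHE_PATH = Path("check_cache.json")
DEFAULT_MAX_AGE_MS = 60 * 60 * 1000  # assumed default: one hour


def cache_key(command_line, directory):
    """Hash the command line together with the directory it checked."""
    payload = json.dumps([command_line, directory]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def load_cache(path=CACHE_PATH):
    """Return the key -> timestamp map, or an empty map if absent or corrupt."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_cache(cache, path=CACHE_PATH):
    """Persist the key -> timestamp map as JSON."""
    path.write_text(json.dumps(cache), encoding="utf-8")


def should_skip(cache, key, now_ms, max_age_ms=DEFAULT_MAX_AGE_MS):
    """True if this key was checked less than max_age_ms ago."""
    checked_at = cache.get(key)
    return checked_at is not None and now_ms - checked_at < max_age_ms


def mark_checked(cache, key, now_ms):
    """Record the time (in milliseconds) at which the check completed."""
    cache[key] = now_ms
```

A caller would compute the key before listing a directory, skip the directory when `should_skip` returns true, and otherwise run the check and call `mark_checked` followed by `save_cache`. Any change to the command line (for example a different `--exclude`) produces a different key, so stale entries are never reused for a different command.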

Thoughts?

Gili


The file checks are pretty quick depending on what you’re syncing. Provide your command and which remote type you are having issues with. You can also use --fast-list if you have plenty of RAM as that will make the directory listings more efficient.

Alternatively, you can work around this by generating a list of files changed in the past hour with a command-line tool that looks at modification times, and then passing that list to the `--files-from` parameter to sync only those files.
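As a rough illustration of that workaround, the following Python sketch collects files modified within a time window and writes them one per line, relative to the source root, which is how rclone interprets `--files-from` entries. The function names and the output file name `recent.txt` are hypothetical.

```python
import time
from pathlib import Path


def recently_modified(root, max_age_seconds, now=None):
    """Return files under root modified within max_age_seconds.

    Paths are relative to root and use forward slashes.
    """
    root = Path(root)
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    recent = []
    for path in root.rglob("*"):
        if path.is_file() and path.stat().st_mtime >= cutoff:
            recent.append(path.relative_to(root).as_posix())
    return sorted(recent)


def write_files_from(paths, list_file):
    """Write one path per line, the format --files-from expects."""
    Path(list_file).write_text("".join(p + "\n" for p in paths), encoding="utf-8")
```

After calling `write_files_from(recently_modified("[source]", 3600), "recent.txt")`, the list can be used as `rclone copy --files-from recent.txt [source] [target]`. A list built from modification times cannot contain files that were deleted from the source, so this approach does not propagate deletions.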

You could also put a local cache in front of the remote, if that fits your needs, so that its metadata is stored locally.

Hi Calisro,

I am backing up 53.4GB made up of 20,425 files. Running a check (even if there are no modifications) takes over an hour.

My command line is:

rclone sync --stats-log-level NOTICE --fast-list --delete-excluded --transfers=10 -v --exclude "/AppData/Local/" --exclude "/AppData/LocalLow/" --exclude "/My Documents/" --exclude "/NetHood/" --exclude "/Start Menu/" --exclude "/SendTo/" --exclude "/Templates/" --exclude "/Application Data/" --exclude "/PrintHood/" --exclude "/Cookies/" --exclude "/Recent/" --exclude "/Local Settings/" --exclude "/Documents/My Music/" --exclude "/Documents/My Videos/" --exclude "/Favorites/" --exclude "/Documents/My Pictures/" --exclude "/NTUSER.DAT*" --exclude "/ntuser.dat*" --exclude "/AppData/Roaming/Microsoft/Windows/Recent/" --exclude "/node_modules" [source] [target]

where [source] is an entry point I wish to back up recursively and [target] is configured as follows:

type = local
nounc = false

The machines are linked over an 802.11ac Wi-Fi connection. The target is a Windows share on an external drive attached over USB3. When backing up large files I get high speeds (upwards of 60Mbps). When running file checks I get only about 3Mbps.

Any ideas?

Gili

Do you have your own API key? That would be the first thing to do to improve performance.

I don’t understand. What API key are you referring to?

He’s asking if you made your own client ID.

https://rclone.org/drive/#making-your-own-client-id

Hmm, seeing as I’m not using Google Drive (I am backing up to a network drive) I fail to see how this is relevant.

Have you tried the cache backend, as suggested?

I misunderstood. You still should try cache. It’ll cache the metadata of what is on the remote you are syncing to.
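For reference, a cache remote that wraps an existing remote is defined in the rclone config file roughly as follows. The remote names `target` and `target-cache` are placeholders, and `info_age = 1h` is an assumed value chosen to match the one-hour window discussed above (the backend's default is longer).

```ini
[target-cache]
type = cache
# The remote being wrapped; "target" is a placeholder name.
remote = target:
# How long cached directory listings and file metadata stay valid (assumed value).
info_age = 1h
```

The sync would then point at `target-cache:` instead of `target:`, so that repeated runs within the `info_age` window reuse the cached listings.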

I didn’t know about this feature before now. Yes, it does what I’m looking for with one big caveat: you cannot hit CTRL+C or break out of a recursive command. The entire reason I am trying to cache is because I am debugging my backup script. I need to abort it as soon as I see something wrong.

Is this a bug or a “feature”? :)

Not sure what you mean. You can abort it with ctrl-c or whatever.

Doesn’t work for me under Windows 10. Running “rclone lsd -R” against a “local” target is abortable using CTRL+C, but doing the same against a “cache” target ignores CTRL+C. Not sure why.

Which platform are you testing against?

Found this

I wonder if it wasn’t fully fixed or if it crept back in. You might want to open an issue.

Good catch. I opened https://github.com/ncw/rclone/issues/2997