When debugging my backup process, I spent 90% of the time waiting for rclone to skip over (check) existing files. I am syncing millions of small files to a remote destination: sometimes online (B2), other times a network share.
To speed this up, we could create a cache file containing key-value pairs, where:
key = hash of the command line that triggered the check of the directory
value = timestamp at which the check completed
Then, when the user runs a command, we'd skip any directory that was already checked within the past X milliseconds (with some reasonable default value). When running incremental backups, I would happily skip any directory that has been checked in the past hour, as I only run backups once a day in production (a shorter interval between runs indicates I am debugging).
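To make the idea concrete, here is a minimal sketch of the cache logic in Python. It is not rclone code and does not reflect any existing rclone option. The function names, the JSON file format, the one-hour default, and the choice to hash the directory path together with the command line (so that each directory gets its own entry) are all assumptions made for illustration.

```python
import hashlib
import json
import time
from pathlib import Path

# Assumed default: skip directories checked within the past hour.
DEFAULT_MAX_AGE_MS = 60 * 60 * 1000


def now_ms():
    """Current wall-clock time in milliseconds."""
    return int(time.time() * 1000)


def cache_key(command_line, directory):
    """Hash the command line together with the directory it checked.

    Including the directory is an assumption; it gives one entry per
    (command, directory) pair.
    """
    payload = json.dumps([list(command_line), directory]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def load_cache(path):
    """Read the cache file; a missing or corrupt file means an empty cache."""
    try:
        return json.loads(Path(path).read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_cache(path, cache):
    """Write the whole cache back as JSON."""
    Path(path).write_text(json.dumps(cache))


def should_skip(cache, key, current_ms, max_age_ms=DEFAULT_MAX_AGE_MS):
    """True if this key was checked less than max_age_ms ago."""
    checked_at = cache.get(key)
    if checked_at is None:
        return False
    # A negative age (clock moved backwards) is treated as stale.
    return 0 <= current_ms - checked_at < max_age_ms


def mark_checked(cache, key, current_ms):
    """Record the time at which the check of this directory completed."""
    cache[key] = current_ms
```

A caller would compute `cache_key(["rclone", "sync", "/data", "b2:bucket"], "photos/2023")` for each directory (the command and path here are only examples), call `should_skip` before checking it, and call `mark_checked` followed by `save_cache` once the check finishes. Because the key depends on the full command line, changing any flag or the destination produces a different key, so the directory is checked again.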