Remote metadata cache to improve performance

I know there have been previous discussions on the topic of metadata cache.

I'm trying to sync a moderate number of files to S3-compatible cloud storage. The set is 12 GB across 17k files, and checking it takes almost 5 minutes every time the sync runs (I'm using 16 checkers and transfers).

If we know this is the only process uploading to that destination, we should be able to keep a local cache of the metadata and check against that instead, perhaps doing the full (and resource intensive) remote check periodically (e.g., once a week).

Based on the suggestion here, I wrote a quick console app that caches the metadata before calling rclone, and produces a list of the differences. Not surprisingly, this multithreaded process took only 12 seconds to run, compared to over 4 minutes for remote checks.
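For anyone wanting to try the same idea, here is a minimal sketch of such a cache in Python. It is not the console app described above: the function names, the JSON cache format, and the choice of size plus whole-second modification time as the comparison key are all assumptions made for illustration.

```python
import json
import os
from pathlib import Path


def scan_tree(root):
    """Return {relative_path: [size, mtime_seconds]} for every file under root."""
    root = Path(root)
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            st = full.stat()
            # Forward-slash relative paths, one per file.
            rel = full.relative_to(root).as_posix()
            snapshot[rel] = [st.st_size, int(st.st_mtime)]
    return snapshot


def diff_snapshots(cached, current):
    """Split paths into (new_or_changed, deleted) relative to the cached snapshot."""
    changed = sorted(p for p, meta in current.items() if cached.get(p) != meta)
    deleted = sorted(p for p in cached if p not in current)
    return changed, deleted


def load_cache(cache_file):
    """Load the previous snapshot, or an empty one on the first run."""
    path = Path(cache_file)
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def save_cache(cache_file, snapshot):
    """Persist the snapshot (hypothetical format: a single JSON object)."""
    Path(cache_file).write_text(json.dumps(snapshot))
```

The `changed` list can be written one path per line and passed to --files-from. The `deleted` list is kept separate here because the sketch only detects deletions and does not act on them. The cache should only be saved after the upload succeeds, otherwise a failed run would hide files that still need transferring.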

As suggested in the link above, I am trying to feed that list into --files-from, which works for new and updated files. Is there a way to also do deletes in the same command, or do I need to call rclone delete with a separate list of deleted files?

I hope the local cache can be considered, as it reduces resource usage significantly for large syncs.

There’s already a feature request out there for that.

I agree a metadata cache would be a nice feature.

However, your sync can be made to run faster. It's also worth reading the S3 optimization section in the docs. The HEAD requests to read the modified time take a long time, so using --checksum, --size-only, or --use-server-modtime --update will avoid these, with various tradeoffs.

Using --fast-list will also trade memory for speed, and it works very well on S3.
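As a rough illustration of how these options combine, the sketch below builds the rclone command line as an argument list without running it. The helper name, the mode labels, and the example paths are hypothetical; the flags themselves are the ones mentioned above, plus --checkers and --transfers set to the 16 used in the original post.

```python
# Flag sets named in this reply; which tradeoff suits a given setup is up to you.
COMPARE_MODES = {
    "checksum": ["--checksum"],
    "size-only": ["--size-only"],
    "server-modtime": ["--use-server-modtime", "--update"],
}


def build_sync_command(source, dest, compare_mode, fast_list=True, workers=16):
    """Return an argument list for a full sync that avoids per-file HEAD requests."""
    if compare_mode not in COMPARE_MODES:
        raise ValueError(f"unknown compare mode: {compare_mode}")
    cmd = [
        "rclone", "sync", source, dest,
        "--checkers", str(workers),
        "--transfers", str(workers),
    ]
    cmd += COMPARE_MODES[compare_mode]
    if fast_list:
        # Trades memory for fewer listing requests.
        cmd.append("--fast-list")
    return cmd
```

For example, `build_sync_command("/data", "s3remote:bucket/path", "checksum")` returns a list that could be handed to `subprocess.run`, where `s3remote:bucket/path` stands in for your own remote.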

An alternate strategy is what I call a top-up sync, run frequently, combined with a full sync run less frequently. (Combine this with whichever of the flags above work best for you.)

The top-up sync is an rclone copy with --max-age 1h (say), so rclone only considers files modified within the last hour. With S3, adding --no-traverse is probably a good idea here too.

This will be very quick. If you run the top-up sync every hour, say, you can run a full sync every 24 hours, which will delete files and catch any stragglers.
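The schedule could be sketched roughly like this in Python. The function names and the way elapsed time is passed in are assumptions for illustration; in practice two cron entries would do the same job. The commands are only built, not executed.

```python
def top_up_command(source, dest, max_age="1h"):
    """Quick pass: copy only files modified within max_age, without listing the destination."""
    return ["rclone", "copy", source, dest, "--max-age", max_age, "--no-traverse"]


def full_sync_command(source, dest):
    """Slow pass: full sync, which also deletes files that no longer exist locally."""
    return ["rclone", "sync", source, dest]


def choose_command(source, dest, hours_since_full_sync, full_every_hours=24):
    """Pick the full sync once enough time has passed, otherwise the top-up copy."""
    if hours_since_full_sync >= full_every_hours:
        return full_sync_command(source, dest)
    return top_up_command(source, dest)
```

The --max-age window should be at least as long as the interval between top-up runs, otherwise files modified between two runs could be missed until the next full sync.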
