Best way to sync bucket with millions of files weekly

I have an Amazon S3 bucket that has about 3 million files (with new files being added to it every day).
I currently have an rclone job that copies this bucket to a Wasabi bucket. I run this sync every week.

So far this works: the copy completes fine, but it takes hours. I also worry about the number of API calls and the amount of data transfer it uses.
I use both --fast-list and --max-age.
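
For reference, the job is roughly this shape (remote and bucket names here are placeholders, not my real config):

    rclone copy s3:source-bucket wasabi:backup-bucket \
        --fast-list \
        --max-age 1w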

I've noticed rclone spends a lot of time up front when the job starts. My guess is that it is enumerating the files to copy.

I'd like to know the best command-line options to use to do the following:

  1. Minimize the number of API calls to S3
  2. Minimize the amount of data transfer

With --max-age, it seems rclone still enumerates every file in S3 and just skips the older ones client-side. Can this be optimized so that the age/date is sent to the S3 API when fetching the list of files?

In other words, is it possible for the --max-age value to be passed to the S3 API so that it only returns objects newer than that age (thus making the enumeration faster)?
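
To illustrate what I mean: as far as I can tell, ListObjectsV2 only accepts key-based filters (prefix, start-after, etc.), nothing based on modification time, so an age filter has to be applied client-side after the listing comes back. A rough sketch with the AWS CLI (bucket and key names are made up):

    # ListObjectsV2 has no "modified since" / age parameter; the only
    # server-side filters are key-based, which would only help if the
    # keys happened to embed dates as prefixes.
    aws s3api list-objects-v2 \
        --bucket example-bucket \
        --prefix backups/ \
        --start-after backups/2024-01-01/ \
        --max-keys 1000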

Thanks for the help!

hello and welcome to the forum,

i also use wasabi and aws s3, but in the opposite direction:
the primary data source is wasabi, where api calls are fast and free.
the backup is aws s3 glacier deep archive.

that would be awesome!
do you know if such an api feature currently exists? i have never seen one.

Can you send your complete command line so we can help you optimize it?

--max-age only really works well when the source is a local disk; otherwise, as you've noticed, rclone still traverses everything.
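
For example (paths and remote names are illustrative), with a local source rclone can apply the age filter cheaply while walking the local disk instead of having to list the whole remote:

    # With a local source, --max-age is evaluated against local file
    # times, so only recently modified files are considered for copy.
    rclone copy /data/source wasabi:backup-bucket --max-age 7d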

I'd probably use --fast-list and also --checksum, which will stop rclone using HEAD requests to read the modification time from each object. If you aren't using --checksum already, adding it will speed things up enormously.
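
Against your S3 → Wasabi copy that would look something like this (remote and bucket names are placeholders; you can keep --max-age, it just won't reduce the listing work):

    # --fast-list trades memory for far fewer LIST calls;
    # --checksum avoids the per-object HEAD requests rclone would
    # otherwise make to read modification times from S3 metadata.
    rclone copy s3:source-bucket wasabi:backup-bucket \
        --fast-list \
        --checksum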

I don't think the S3 API can do that, alas.
