Recommendations for using rclone with a minio holding 10M+ files

Are the 10 million files in one directory, or are they in a folder structure? If they are in a folder structure, roughly how many files are there per folder?

Rclone works pretty well for big syncs, but its weak point is millions of files in a single directory. At the moment the sync loads the whole directory listing into memory, which for 10 million objects will take quite a while and use lots of memory!
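To get a feel for the memory side, here is a back-of-the-envelope sketch. The 1 KiB per object figure is an assumption (a rule of thumb for rclone listings, not something measured on your data), so treat the result as an order of magnitude only.

```shell
# Rough memory estimate for holding one huge directory listing in RAM.
# ASSUMPTION: about 1 KiB of RAM per listed object; measure on your own data.
objects=10000000        # 10 million files in a single directory
bytes_per_object=1024   # assumed per-object overhead

# Integer arithmetic: bytes -> MiB
listing_mib=$(( objects * bytes_per_object / 1024 / 1024 ))
echo "approx ${listing_mib} MiB to hold the listing"
```

If the files are spread over many folders instead, only one folder's listing needs to be held at a time per checker, so the peak is much lower.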

Here are some general tips for s3/minio:

  • Use --size-only or --checksum instead of the default modification-time comparison. The default has to read metadata from each object, which doesn't scale well. Since you are copying minio -> minio, both sides will have checksums, so I'd recommend --checksum.
  • Using --fast-list can give a big speed-up, but it requires enough memory to hold the metadata for all the files at once.
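Putting those two tips together, a sync command could look something like the sketch below. The remote names `minio-src:bucket` and `minio-dst:bucket` are hypothetical placeholders for whatever you have configured. The script only prints the command so you can inspect it before running anything.

```shell
# Hypothetical remote names; replace with your own configured remotes.
SRC="minio-src:bucket"
DST="minio-dst:bucket"

# --checksum  : compare size + checksum instead of modification time
# --fast-list : fewer listing calls, at the cost of holding more in memory
SYNC_FLAGS="--checksum --fast-list"
SYNC_CMD="rclone sync $SYNC_FLAGS $SRC $DST"

# Print a dry-run version of the command for review.
echo "$SYNC_CMD --dry-run"

# When the dry run looks right, run it for real:
# $SYNC_CMD --progress
```

If memory turns out to be too tight, dropping --fast-list from SYNC_FLAGS is the first thing to try.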

How big are the files you are copying? You might want to raise --s3-upload-cutoff so that they are all uploaded in a single request rather than as multipart uploads. This will ensure they have an md5sum, and is likely to be more efficient for medium-sized files (say < 1 GB).
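As a sketch of what that could look like, assuming your largest files are under 1 GiB (the 1G value and the remote names are illustrative, not recommendations):

```shell
# ASSUMPTION: the largest files are under 1 GiB, so a 1G cutoff keeps
# every upload below the multipart threshold.
UPLOAD_CUTOFF="1G"

# Hypothetical remote names; replace with your own.
CUTOFF_CMD="rclone sync --checksum --s3-upload-cutoff $UPLOAD_CUTOFF minio-src:bucket minio-dst:bucket"

# Print the command for review rather than running it.
echo "$CUTOFF_CMD"
```

Files larger than the cutoff still go up as multipart uploads, so pick a value just above your typical file size.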

If you are uploading large files, then tweaking --s3-upload-concurrency and --s3-chunk-size can make a difference, at the cost of using more memory.
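To see what the memory cost might be, here is a rough sketch. It assumes multipart buffers scale as transfers × upload concurrency × chunk size, which is my understanding of how rclone's S3 backend behaves. The concurrency flag is spelled --s3-upload-concurrency in current rclone releases. The values and remote names below are examples only.

```shell
# ASSUMPTION: multipart upload buffers are roughly
#   --transfers * --s3-upload-concurrency * --s3-chunk-size
transfers=4            # rclone's default --transfers
upload_concurrency=8   # example value for --s3-upload-concurrency
chunk_mib=64           # example value for --s3-chunk-size, in MiB

buffer_mib=$(( transfers * upload_concurrency * chunk_mib ))
echo "approx ${buffer_mib} MiB of upload buffers"

# Hypothetical remote names; printed for review, not executed.
echo "rclone sync --transfers $transfers --s3-upload-concurrency $upload_concurrency --s3-chunk-size ${chunk_mib}M minio-src:bucket minio-dst:bucket"
```

Halving either the concurrency or the chunk size halves the estimate, so it is easy to fit the settings to the RAM you have.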

Do existing files get updated? Or is it write once, then delete? There is a workflow which will work for that...