Poor sync performance with large directories on Minio

What is the problem you are having with rclone?

I'm syncing a relatively large and flat directory (444909 files) to a remote Minio instance, and the check phase of the sync is extremely slow. Even with no changes to transfer, rclone sync still takes about 4-5 hours to complete.

Run the command 'rclone version' and share the full output of the command.

# rclone version
rclone v1.63.1
- os/version: debian 12.0 (64 bit)
- os/kernel: 6.1.0-10-amd64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.20.6
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Minio

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync --progress [local_source] [remote_target]

The rclone config contents with secrets removed.

[target]
type = s3
provider = Minio
access_key_id = ******
secret_access_key = ******
endpoint = https://example.com

A log from the command with the -vv flag

I'm truncating the full log output because it would be ridiculously long. If you insist that a full log is necessary, I'll capture one and provide a gzipped copy.

  1. Add the --checksum flag to avoid reading modtime (on S3 remotes, modtime is stored as object metadata and costs an extra request per object)
  2. Try increasing the --checkers value
  3. Add --fast-list to buffer all the objects in memory first (it will use about 0.5GB of RAM for 500k objects)
  4. Try the --s3-list-version 2 flag
rclone sync src dst --checksum --fast-list --checkers 16 --s3-list-version 2

I gave this a try before the addition of --s3-list-version, and it cut the time needed for a sync with no changes down to about 2 hours, which is an improvement but still nowhere close to a B2 sync of the same repo, which finishes in less than a minute. The suggested flags also caused a very high read rate, presumably because --checksum needs to read every single file.

I'll try again with --s3-list-version 2 and report back when that finishes.

Such a difference compared to B2 suggests either Minio server performance/configuration or network speed/latency. Though I find it hard to believe that a local-to-B2 sync can finish so fast: reading stats for 500k files from local storage in less than 1 min, 7-8k files per second?

You could go even further and use --size-only instead of --checksum, but that can be a bit risky. It would definitely be much faster :)

Runtime with --s3-list-version 2 went down a little further:

# rclone --progress sync [...] --fast-list --checksum --checkers 16 --s3-list-version 2
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:            444909 / 444909, 100%
Elapsed time:   1h46m56.6s

Still pretty bad though, and the whole "reading 450 GB from disk to make sure it's in sync with a remote" thing makes me a little sad.

Though I find it hard to believe that a local-to-B2 sync can finish so fast: reading stats for 500k files from local storage in less than 1 min, 7-8k files per second?

I don't know what to tell you:

# rclone --progress sync [local_source] b2:[...]
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:            444909 / 444909, 100%
Elapsed time:        47.0s

There's clearly some level of optimization for bulk checks going on here that's missing or malfunctioning on the S3/Minio side. I'll be looking into this further.
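
To put the two runs side by side, the elapsed times reported above can be converted into checks per second. This is only a rough sketch: the helper name checks_per_sec is made up for illustration, and the numbers are taken directly from the two rclone summaries in this thread.

```shell
# Convert an rclone "Elapsed time" into checks per second.
# Arguments: number of checks, elapsed time in seconds.
# (checks_per_sec is a hypothetical helper, not part of rclone.)
checks_per_sec() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n / s }'
}

# Minio run: 1h46m56.6s = 3600 + 46*60 + 56.6 = 6416.6 s
checks_per_sec 444909 6416.6   # prints 69.3

# B2 run: 47.0 s
checks_per_sec 444909 47.0     # prints 9466.1
```

That is roughly 69 checks per second against Minio versus roughly 9,466 against B2, a gap of about 137 times for the same file set.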

It is impressive speed. You are right.

What about trying one more thing: remove --checksum and increase --checkers to 64?

But I doubt it can get close to the B2 results. There is some bottleneck, and all your results point towards Minio.
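
For reference, that last suggestion might look something like the sketch below. The SRC and DST values are placeholders I made up (the real paths were elided in the thread), and sync_cmd is a hypothetical helper that only prints the command so it can be reviewed before being run. The flags themselves are the ones already discussed above, with --checksum dropped and --checkers raised to 64.

```shell
# Placeholder paths: substitute the real local source and remote target.
SRC="${SRC:-/path/to/local_source}"
DST="${DST:-target:bucket/path}"

# Print the suggested command instead of running it.
# Arguments: source, destination, number of checkers.
# (sync_cmd is a hypothetical helper, not part of rclone.)
sync_cmd() {
  printf 'rclone sync %s %s --progress --fast-list --checkers %s --s3-list-version 2\n' "$1" "$2" "$3"
}

sync_cmd "$SRC" "$DST" 64
```

Without --checksum, rclone goes back to its default comparison by size and modtime, so this avoids reading the local files but brings back the modtime lookups on the remote.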