Poor sync performance with large directories on Minio

What is the problem you are having with rclone?

I'm syncing a relatively large and flat directory (444909 files) to a remote Minio instance, and the sync check is extremely slow. Even with no changes to sync, rclone sync still takes about 4-5 hours to complete.

Run the command 'rclone version' and share the full output of the command.

# rclone version
rclone v1.63.1
- os/version: debian 12.0 (64 bit)
- os/kernel: 6.1.0-10-amd64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.20.6
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Minio

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync --progress [local_source] [remote_target]

The rclone config contents with secrets removed.

[target]
type = s3
provider = Minio
access_key_id = ******
secret_access_key = ******
endpoint = https://example.com

A log from the command with the -vv flag

I'm truncating the full log output because it would be ridiculously long. If you insist that a full log is necessary, I'll capture one and provide a gzipped copy.

  1. add --checksum flag to avoid reading modtime
  2. try to increase --checkers value
  3. add --fast-list to buffer all the objects in memory first (it will use about 0.5GB of RAM for 500k objects)
  4. try --s3-list-version 2 flag
rclone sync src dst --checksum --fast-list --checkers 16 --s3-list-version 2
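As a rough sanity check on point 3, the memory cost of --fast-list can be estimated from the file count, assuming roughly 1 KiB of metadata per object (an assumed per-object size, consistent with the ~0.5 GB per 500k objects figure above, not a measured value):

```shell
# Rough --fast-list memory estimate, assuming ~1 KiB of metadata per object
# (an assumption consistent with the ~0.5 GB / 500k objects figure above).
objects=444909
est_mb=$(( objects / 1024 ))   # KiB of metadata -> MiB
echo "estimated --fast-list memory: ~${est_mb} MiB"
```

For this directory that works out to roughly 434 MiB, comfortably under the 0.5 GB ballpark.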

I gave this a try before the addition of --s3-list-version, and it cut the time needed for a sync with no changes down to about 2 hours, which is an improvement but still nowhere close to a B2 sync of the same repo, which finishes in less than a minute. The suggested flags also caused a very high read rate, presumably because --checksum needs to read every single file.

I'll try again with --s3-list-version 2 and report back when that finishes.

Such a difference compared to B2 would suggest either Minio server performance/configuration or network speed/latency. Though I find it hard to believe that it can finish so fast when doing sync local b2: - reading stats for 500k files from local storage in less than a minute, 7-8k files per second?

You could go even further and use --size-only instead of --checksum, but it can be a bit risky. It would definitely be much faster :)
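A sketch of that variant, with the same placeholder paths as above (--size-only skips both hashing and modtime reads, so it will miss any change that leaves the file size identical):

```shell
# Hedged sketch: compare by size only; placeholders in brackets.
# Misses edits that preserve the exact file size, so use with care.
rclone sync [local_source] [remote_target] --size-only --fast-list --checkers 16 --s3-list-version 2
```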

Runtime with --s3-list-version 2 went down a little further:

# rclone --progress sync [...] --fast-list --checksum --checkers 16 --s3-list-version 2
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:            444909 / 444909, 100%
Elapsed time:   1h46m56.6s

Still pretty bad though, and the whole "reading 450 GB from disk to make sure it's in sync with a remote" thing makes me a little sad.

Though I find it hard to believe that it can finish so fast when doing sync local b2: - reading stats for 500k files from local storage in less than a minute, 7-8k files per second?

I don't know what to tell you:

# rclone --progress sync [local_source] b2:[...]
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:            444909 / 444909, 100%
Elapsed time:        47.0s

There's clearly some level of optimization for bulk checks going on here that's missing or malfunctioning on the S3/Minio side. I'll be looking into this further.
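For scale, the two timed runs above convert to very different per-file check rates (integer shell arithmetic, same file count both times):

```shell
# Per-file check rates from the two timed runs above.
checks=444909
minio_secs=$(( 1*3600 + 46*60 + 57 ))  # 1h46m56.6s, rounded up to 6417 s
b2_secs=47
echo "minio: $(( checks / minio_secs )) checks/s"
echo "b2:    $(( checks / b2_secs )) checks/s"
```

That's roughly 69 checks/s against Minio versus roughly 9466 checks/s against B2, a gap of well over 100x.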

That is impressive speed. You are right.

What about trying one more thing? Remove --checksum and increase to --checkers 64?
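That suggestion would look something like this (placeholder paths as before; note that without --checksum, rclone falls back to comparing modtimes, which on S3/Minio can cost a HEAD request per object):

```shell
# Hedged sketch: drop --checksum, raise --checkers; placeholders in brackets.
# Without --checksum, rclone compares modtimes, which on S3/Minio can mean
# one HEAD request per object to read the stored mtime metadata.
rclone sync [local_source] [remote_target] --fast-list --checkers 64 --s3-list-version 2
```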

But I doubt it can get close to the B2 results. There is some bottleneck, and all your results point towards Minio.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.