Poor transfer performance and unable to change parameters like chunk size

What is the problem you are having with rclone?

I am unable to change parameters such as chunk size or the number of transfers when transferring a 100GiB file over S3. I'm trying to increase copy performance, as the transfer maxes out at ~110MiB/s, whereas other clients such as the MinIO CLI transfer the same file at 300MiB/s or more. Rclone also shows no activity for the first 3m30s, and the logs give no information on what is occurring. I haven't tried modifying the config file yet, as command-line options work for everything else and I assumed there should be no difference between the config file and the command line. Any advice greatly appreciated.

Run the command 'rclone version' and share the full output of the command.

rclone v1.59.2-DEV

  • os/version: sles 15-SP3 (64 bit)
  • os/kernel: 5.3.18-150300.59.87_11.0.78-cray_shasta_c (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.16.12
  • go/linking: dynamic
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

In-house Ceph

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy --no-check-dest --retries 1 --progress --progress-terminal-title --drive-chunk-size 32Mi --transfers=16 -vv --log-file=./rclone-test.log /scratch/pawsey0002/lcampbell/zeros.img courses01:upload-test

The rclone config contents with secrets removed.

[lcampbell]
type = s3
provider = Ceph
endpoint = https://projects.pawsey.org.au
access_key_id = 
secret_access_key = 

[courses01]
type = s3
provider = Ceph
endpoint = https://projects.pawsey.org.au
access_key_id = 
secret_access_key = 

A log from the command with the -vv flag

head rclone-test.log
2023/05/30 23:36:05 DEBUG : rclone: Version "v1.59.2-DEV" starting with parameters ["/software/setonix/2022.11/software/cray-sles15-zen2/gcc-12.1.0/rclone-1.59.2-2wagnjr7mxb4ipuyuvqxsqiid4zc7r4d/bin/rclone" "copy" "--no-check-dest" "--retries" "1" "--progress" "--progress-terminal-title" "--drive-chunk-size" "32Mi" "--transfers=16" "-vv" "--log-file=./rclone-test.log" "/scratch/pawsey0002/lcampbell/zeros.img" "courses01:upload-test"]
2023/05/30 23:36:05 DEBUG : Creating backend with remote "/scratch/pawsey0002/lcampbell/zeros.img"
2023/05/30 23:36:05 DEBUG : Using config file from "/home/lcampbell/.config/rclone/rclone.conf"
2023/05/30 23:36:05 DEBUG : fs cache: adding new entry for parent of "/scratch/pawsey0002/lcampbell/zeros.img", "/scratch/pawsey0002/lcampbell"
2023/05/30 23:36:05 DEBUG : Creating backend with remote "courses01:upload-test"
2023/05/30 23:36:05 DEBUG : zeros.img: Need to transfer - File not found at Destination
2023/05/30 23:36:06 INFO  : S3 bucket upload-test: Bucket "upload-test" created with ACL "private"
2023/05/30 23:39:46 DEBUG : zeros.img: size: 100Gi, parts: 10000, default: 5Mi, new: 11Mi; default chunk size insufficient, returned new chunk size
2023/05/30 23:39:46 DEBUG : zeros.img: multipart upload starting chunk 1 size 11Mi offset 0/100Gi
2023/05/30 23:39:46 DEBUG : zeros.img: multipart upload starting chunk 2 size 11Mi offset 11Mi/100Gi

tail rclone-test.log
2023/05/30 23:54:31 DEBUG : zeros.img: multipart upload starting chunk 9310 size 1Mi offset 99.999Gi/100Gi
2023/05/30 23:54:32 DEBUG : zeros.img: Multipart upload Etag: a10af647106db28a5b88da89482dbec0-9310 OK
2023/05/30 23:54:32 DEBUG : zeros.img: md5 = 09cd755eb35bc534487a5796d781a856 OK
2023/05/30 23:54:32 INFO  : zeros.img: Copied (new)
2023/05/30 23:54:32 INFO  :
Transferred:          100 GiB / 100 GiB, 100%, 113.567 MiB/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:     18m26.7s

2023/05/30 23:54:32 DEBUG : 13 go routines active

--drive-chunk-size is a flag for Google Drive and you are not using Google Drive; the S3 backend has its own --s3-chunk-size option.

Check out the S3 backend docs: https://rclone.org/s3/

You are also using an old beta rclone version. Update to the latest one.
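
For example, a sketch of the same copy using the S3-specific option instead (paths as in the original command; the 32Mi value is just illustrative):

rclone copy --no-check-dest --retries 1 --progress -vv --log-file=./rclone-test.log --s3-chunk-size 32Mi --transfers=16 /scratch/pawsey0002/lcampbell/zeros.img courses01:upload-test

The log line "default: 5Mi, new: 11Mi" also shows the Drive flag was ignored: rclone auto-sized the chunks to stay under the 10,000-part multipart limit (100 GiB / 10,000 ≈ 10.24 MiB, rounded up to 11Mi).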

Thanks, so --s3-chunk-size is the correct option. That makes more sense 🙂

I assume that was the most up-to-date version we could install that would work with our SLES version, but I'll check with my colleague who installed the module. Thanks.

Any ideas about what is happening in the first 3m30s and how to reduce/minimize it?

I'd imagine it's running a checksum against the file, and if you have a large file or slow I/O, that will take time.

Rclone normally calculates an MD5 checksum of the input before uploading so it can add it to the object's metadata. For large objects, calculating this hash can take some time, so it can be disabled with --s3-disable-checksum. This will mean that these objects do not have an MD5 checksum.
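
As a minimal sketch (same source and destination as before, most flags omitted for brevity), that would look something like:

rclone copy --s3-chunk-size 32Mi --s3-disable-checksum --transfers=16 -vv /scratch/pawsey0002/lcampbell/zeros.img courses01:upload-test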

Perfect, thank you both. I will try disabling the checksum. The initial trial with the correct chunk-size option increased transfer speed to a 120MiB/s average with a max of ~150MiB/s, so a promising start 🙂 😄

In theory you should see increased throughput by increasing the values of --s3-upload-concurrency and --s3-chunk-size, but it comes at the cost of memory used. And of course it depends on your S3 provider's characteristics (it might prefer a specific chunk size and have a maximum concurrency), network speed, latency, etc. You have to find where the sweet spot is.

To give you a rough idea, I have seen people claiming that e.g.:

--s3-upload-concurrency 160 --s3-chunk-size 64M

works perfectly but uses about 10GB of RAM.
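
That figure roughly matches how multipart uploads buffer data: each file in flight holds up to --s3-upload-concurrency chunks of --s3-chunk-size in memory, so as a back-of-the-envelope estimate:

160 × 64 MiB = 10240 MiB ≈ 10 GiB of buffer per file being uploaded

and that is multiplied again by --transfers if several files upload at once.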

Thank you, just the type of advice I was after. I'd already hit an out-of-memory error with chunks of 1024Mi, but the speed was up to ~200MiB/s. So I need to find out the memory size on the data mover nodes and then do more experimenting 🙂
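
If it helps as a starting point, a middle-ground sketch to experiment from (the values are only a guess, chosen so the peak buffer stays around --s3-upload-concurrency × --s3-chunk-size = 8 × 256 MiB = 2 GiB) might be:

rclone copy --s3-chunk-size 256Mi --s3-upload-concurrency 8 --s3-disable-checksum --progress -vv /scratch/pawsey0002/lcampbell/zeros.img courses01:upload-test

Note that for a single large file --transfers has little effect; it is --s3-upload-concurrency that controls how many chunks upload in parallel.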


This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.