Maximum resource usage

What is the problem you are having with rclone?

Resource usage and transfer speed are much lower than expected when using rclone.

I have a big EC2 instance and I'd like rclone to use as much of it as it can for maximum efficiency. It is a c6in.32xlarge with 200 Gigabit network bandwidth, 256 GiB of RAM and 128 vCPUs, so I'd like to utilize almost all of that and get at least a few gigabytes per second of transfer speed. Currently I get ~400 MiB/s (roughly 3.4 Gbit/s of the 200 Gbit/s available), and top says rclone uses about 3-4 full CPUs (300-400% CPU). I'd like more!
We're transferring files of various sizes, from roughly 20-200 MB up to 40 GB.

Run the command 'rclone version' and share the full output of the command.

rclone v1.61.1

  • os/version: amazon 2 (64 bit)
  • os/kernel: 4.14.301-224.520.amzn2.x86_64 (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.19.4
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

S3 => S3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

NPROC="${_RCLONE_NPROC:-$(nproc --all)}" # 128 in my instance

BWLIMIT=${_RCLONE_BWLIMIT:-0}
TRANSFERS=${_RCLONE_TRANSFERS:-$((NPROC * 4))}
CHECKERS=${_RCLONE_CHECKERS:-$((NPROC * 4))}
UPLOAD_CONCURRENCY=${_RCLONE_UPLOAD_CONCURRENCY:-$((NPROC * 4))}
CHUNK_SIZE=${_RCLONE_CHUNK_SIZE:-256M}
CUTOFF_SIZE=${_RCLONE_CUTOFF_SIZE:-256M}
LOG_LEVEL=${_RCLONE_LOG_LEVEL:-NOTICE}
BUFFER_SIZE=${_RCLONE_BUFFER_SIZE:-256M}
MULTI_THREAD_CUTOFF=${_RCLONE_MULTI_THREAD_CUTOFF:-${CUTOFF_SIZE}}
MULTI_THREAD_STREAMS=${_RCLONE_MULTI_THREAD_STREAMS:-${NPROC}}

rclone sync \
"source:${SOURCE_BUCKET}/${SOURCE_OBJECT}" \
"target:${TARGET_BUCKET}/${TARGET_OBJECT}" \
--auto-confirm \
--create-empty-src-dirs \
--use-json-log \
--s3-disable-checksum \
--use-mmap \
--s3-memory-pool-use-mmap \
--order-by size,mixed,75 \
--max-backlog 10000 \
--buffer-size "${BUFFER_SIZE}" \
--bwlimit "${BWLIMIT}" \
--transfers "${TRANSFERS}" \
--checkers "${CHECKERS}" \
--s3-upload-concurrency "${UPLOAD_CONCURRENCY}" \
--s3-chunk-size "${CHUNK_SIZE}" \
--s3-upload-cutoff "${CUTOFF_SIZE}" \
--multi-thread-cutoff "${MULTI_THREAD_CUTOFF}" \
--multi-thread-streams "${MULTI_THREAD_STREAMS}" \
--config ./rclone.conf \
--log-level "${LOG_LEVEL}" \
--stats-log-level "${LOG_LEVEL}" \
--stats 10s

The rclone config contents with secrets removed.

[source]
type = s3
provider = AWS
env_auth = false
access_key_id = ${SOURCE_ACCESS_KEY_ID}
secret_access_key = ${SOURCE_SECRET_ACCESS_KEY}
session_token = ${SOURCE_SESSION_TOKEN}
region = ${SOURCE_REGION}
location_constraint = ${SOURCE_REGION}
sse_kms_key_id = ${SOURCE_BUCKET_SSE_KMS_KEY_ID}
server_side_encryption = ${SOURCE_BUCKET_SSE}

[target]
type = s3
provider = AWS
env_auth = false
access_key_id = ${TARGET_ACCESS_KEY_ID}
secret_access_key = ${TARGET_SECRET_ACCESS_KEY}
session_token = ${TARGET_SESSION_TOKEN}
region = ${TARGET_REGION}
location_constraint = ${TARGET_REGION}
sse_kms_key_id = ${TARGET_BUCKET_SSE_KMS_KEY_ID}
server_side_encryption = ${TARGET_BUCKET_SSE}

A log from the command with the -vv flag

"msg":"\nTransferred:   \t  293.994 GiB / 365.499 GiB, 80%, 416.831 MiB/s, ETA 2m55s\nChecks:                62 / 62, 100%\nTransferred:            1 / 13, 8%

Looks like you have all the right parameters. Increasing the S3 chunk size and S3 concurrency is usually a win, as is increasing transfers. What values are you using here?

For S3 to S3 transfers, --checksum is quickest.

You might want to try --disable-http2. I can't remember whether S3 uses HTTP/2, but the Go implementation squeezes everything down one TCP connection, which hurts performance.
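
Something like this, as a rough untested sketch on top of your command above (only the relevant flags shown; the shell variables are yours from earlier):

# untested sketch - your sync command with the two suggested flags added
# --checksum compares hashes instead of modtimes, which avoids extra HEAD requests on S3
# --disable-http2 stops requests being multiplexed down a single TCP connection
rclone sync \
"source:${SOURCE_BUCKET}/${SOURCE_OBJECT}" \
"target:${TARGET_BUCKET}/${TARGET_OBJECT}" \
--checksum \
--disable-http2 \
--transfers "${TRANSFERS}" \
--s3-upload-concurrency "${UPLOAD_CONCURRENCY}" \
--s3-chunk-size "${CHUNK_SIZE}" \
--config ./rclone.conf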

Hi, ncw, thanks for the answer!

The current value of s3-chunk-size is 256M, and s3-upload-concurrency is an incredible 512 (128 cores * 4). I know that's a huge value, but since it still doesn't fully utilize CPU, RAM, or network bandwidth, I see no reason to decrease it.

I'm using s3-disable-checksum to skip recalculating checksums for large files. Do I need to remove it before enabling --checksum?

I will try --disable-http2.

So I've found something after changing the log format:

2023/02/10 16:58:50 NOTICE:
Transferred:       26.093 GiB / 365.499 GiB, 7%, 446.572 MiB/s, ETA 12m58s
Checks:                66 / 66, 100%
Transferred:            0 / 13, 0%
Elapsed time:       1m0.0s
Transferring:
 * /…9.MX.M10Y22.TXT.001.GZ:  7% /22.892Gi, 31.228Mi/s, 11m30s
 * /…2.RX.M10Y22.TXT.001.GZ:  8% /24.509Gi, 34.617Mi/s, 11m6s
 * /…2.RX.M10Y22.TXT.004.GZ:  7% /25.382Gi, 34.344Mi/s, 11m36s
 * /…2.RX.M10Y22.TXT.003.GZ:  7% /25.653Gi, 34.834Mi/s, 11m35s
 * /…2.RX.M10Y22.TXT.002.GZ:  7% /25.888Gi, 34.240Mi/s, 11m54s
 * /…2.RX.M10Y22.TXT.005.GZ:  7% /26.234Gi, 34.960Mi/s, 11m49s
 * /…2.MX.M10Y22.TXT.007.GZ:  7% /28.365Gi, 35.763Mi/s, 12m32s
 * /…2.MX.M10Y22.TXT.002.GZ:  6% /30.583Gi, 35.209Mi/s, 13m50s
 * /…2.MX.M10Y22.TXT.001.GZ:  6% /31.004Gi, 32.856Mi/s, 15m6s
 * /…2.MX.M10Y22.TXT.005.GZ:  6% /31.013Gi, 35.346Mi/s, 13m58s
 * /…2.MX.M10Y22.TXT.004.GZ:  6% /31.197Gi, 35.254Mi/s, 14m6s
 * /…2.MX.M10Y22.TXT.003.GZ:  6% /31.309Gi, 35.077Mi/s, 14m12s
 * /…2.MX.M10Y22.TXT.006.GZ:  6% /31.469Gi, 33.833Mi/s, 14m52s

As we can see, each file is transferring at about 30-35 MB/s. But if I test the download/upload speed for a single file using aws s3 cp, I get about 250 MB/s per file, and the speed doesn't drop when I run more s3 cp commands in parallel.
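
Roughly the kind of baseline test I mean (the object names here are just placeholders):

# illustration of the aws s3 cp baseline, object names are placeholders
# a single download streams at ~250 MB/s
aws s3 cp "s3://${SOURCE_BUCKET}/some-40GB-object.GZ" - > /dev/null
# several in parallel each still hold ~250 MB/s
for i in 1 2 3 4; do
  aws s3 cp "s3://${SOURCE_BUCKET}/object-part-${i}.GZ" - > /dev/null &
done
wait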

I think the per-file speed should be at least ~10x higher, which would give about 4-5 GB/s, i.e. 30-40 Gbps. (Ideally I'd like even more: 80-100 Gbps, around 10 gigabytes per second, would be enough, so I'm really looking for a ~20x boost. Any ideas?)

And I think the chunk size should not go below 256M because, as I said before, the reference speed of aws s3 cp is about 250 MB/s, so ~one second per chunk seems reasonable and not worth reducing.
With --multi-thread-streams now at about 576 and each file around 40 GB, a file is split into at most 40 / 0.256 ≈ 156 chunks, so I have a good reserve there.
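
Just to spell out the arithmetic I'm relying on (same numbers as above):

# illustrative arithmetic only, mirroring the numbers above
FILE_SIZE_MB=40000   # ~40 GB file
CHUNK_SIZE_MB=256    # 256M chunk
echo $(( FILE_SIZE_MB / CHUNK_SIZE_MB ))   # prints 156 -> chunks per file, well under the 576 streams configured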

At the same time, --s3-upload-concurrency is 576 and the chunk size and cutoff are 256M, so it seems the upload side has headroom too.

UPD
I've switched the log level to DEBUG and see an interesting situation:

It does not upload each file's chunks in parallel: first it handles chunk 1 for all files, then chunk 2 for all files, then chunk 3, and so on.
That would make sense if the download and upload speeds were about the same: it downloads the first chunk of every file in parallel, then starts uploading those first chunks while downloading the second ones, and so on.
But where is the concurrency? Why isn't it downloading, for example, 10 chunks per file in parallel?
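
One way I'm thinking of checking what actually runs in parallel (just an idea; it assumes the S3 endpoints are plain HTTPS on port 443):

# count established HTTPS connections while the sync runs,
# as a rough proxy for how many chunk transfers are really in flight
watch -n 5 "ss -Htn state established '( dport = :443 )' | wc -l"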
