Rclone hangs on large files and maxes out IOPS

What is the problem you are having with rclone?

When copying large files (in my case, 512 MB Prometheus chunks), rclone hangs while maxing out disk IOPS on the server. I had to rate-limit the container, because otherwise it would bring the whole server down.
No data is sent to the Internet during the hang.

What is your rclone version (output from rclone version)

v1.50.2

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Ubuntu 20.04 LTS (in LXD)

Which cloud storage system are you using? (eg Google Drive)

Encrypted storage on S3 API (Scaleway)

The command you were trying to run (eg rclone copy /tmp remote:tmp)

CONTAINER=frnte1-monitoring1
TODAY=$(date +"%Y-%m-%d")

echo "Doing snapshot"
SNAPSHOT_ID=$(lxc exec $CONTAINER -- bash -c "curl -XPOST http://[::1]:9090/api/v2/admin/tsdb/snapshot" | jq -r .name | cat)

echo "Snapshot ID: $SNAPSHOT_ID"

echo "Send data"
lxc exec $CONTAINER -- rclone --s3-chunk-size=20M --buffer-size=128M --size-only -v sync /var/lib/prometheus/metrics2/snapshots/$SNAPSHOT_ID/ scw-fr-par-crypto:frnte1-monitoring1/prometheus/tsdb/$SNAPSHOT_ID

echo "Cleaning snapshot and archive"
lxc exec $CONTAINER -- rm -rf /var/lib/prometheus/metrics2/snapshots/$SNAPSHOT_ID/

The rclone config contents with secrets removed.

[scw-fr-par]
type = s3
provider = other
env_auth = false
access_key_id = xxx
secret_access_key = xxx
region = fr-par
endpoint = https://s3.fr-par.scw.cloud
location_constraint = fr-par
acl = private

[scw-fr-par-crypto]
type = crypt
remote = scw-fr-par:xxxx
filename_encryption = standard
directory_name_encryption = true
password = xxx
password2 = xxx

A log from the command with the -vv flag

$ bash backup-prometheus.sh
Doing snapshot
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    43  100    43    0     0     43      0  0:00:01 --:--:--  0:00:01    43
Snapshot ID: 20201017T195956Z-3e7bb9b35a95b89
Send data
2020/10/17 19:59:57 DEBUG : rclone: Version "v1.50.2" starting with parameters ["rclone" "--s3-chunk-size=20M" "--buffer-size=128M" "--size-only" "-vv" "sync" "/var/lib/prometheus/metrics2/snapshots/20201017T195956Z-3e7bb9b35a95b89/" "scw-fr-par-crypto:frnte1-monitoring1/prometheus/tsdb/20201017T195956Z-3e7bb9b35a95b89"]
2020/10/17 19:59:57 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2020/10/17 19:59:58 INFO  : Encrypted drive 'scw-fr-par-crypto:frnte1-monitoring1/prometheus/tsdb/20201017T195956Z-3e7bb9b35a95b89': Waiting for checks to finish
2020/10/17 19:59:58 INFO  : Encrypted drive 'scw-fr-par-crypto:frnte1-monitoring1/prometheus/tsdb/20201017T195956Z-3e7bb9b35a95b89': Waiting for transfers to finish
2020/10/17 19:59:58 INFO  : 01EMJGTET6A5HKTN09BK59B3KF/tombstones: Copied (new)
2020/10/17 19:59:58 INFO  : 01EMJGTET6A5HKTN09BK59B3KF/meta.json: Copied (new)
2020/10/17 19:59:58 INFO  : 01EMVH66ET92HE9J2X50E6MFBH/index: Copied (new)
2020/10/17 19:59:58 INFO  : 01EMVH66VXFZNXS6XX82QFF2AG/meta.json: Copied (new)
2020/10/17 19:59:59 INFO  : 01EMVH66VXFZNXS6XX82QFF2AG/tombstones: Copied (new)
2020/10/17 19:59:59 INFO  : 01EMVR1Z1PR3E7DDHNTSGGJ5CG/meta.json: Copied (new)
2020/10/17 19:59:59 INFO  : 01EMVR1Z1PR3E7DDHNTSGGJ5CG/index: Copied (new)
2020/10/17 19:59:59 INFO  : 01EMVR1Z1PR3E7DDHNTSGGJ5CG/tombstones: Copied (new)
2020/10/17 19:59:59 INFO  : 01EMVH66VXFZNXS6XX82QFF2AG/index: Copied (new)
2020/10/17 19:59:59 INFO  : 01EMVYXNJ3201XD0NTSQ5GEXP3/index: Copied (new)
2020/10/17 19:59:59 INFO  : 01EMVYXNJ3201XD0NTSQ5GEXP3/tombstones: Copied (new)
2020/10/17 19:59:59 INFO  : 01EMVYXNJ3201XD0NTSQ5GEXP3/meta.json: Copied (new)
2020/10/17 19:59:59 INFO  : 01EMW2BD00AZCSA9N3GKE0BPXX/meta.json: Copied (new)
2020/10/17 19:59:59 INFO  : 01EMW2BD00AZCSA9N3GKE0BPXX/tombstones: Copied (new)
2020/10/17 19:59:59 INFO  : 01EMW2BD00AZCSA9N3GKE0BPXX/index: Copied (new)
# Hangs here forever, I suppose while dealing with the 512 MB data chunks from Prometheus, and does a LOT of IOPS that would freeze the server if the container were not rate limited

hello and welcome to the forum,

that is an old version of rclone, you might want to update.
https://rclone.org/downloads/#script-download-and-install

you could try to tweak
https://rclone.org/docs/#transfers-n
https://rclone.org/docs/#checkers-n

Hello.

Thanks, I will try again with a newer version; this was the version from the Ubuntu 20.04 repositories.

try removing these flags,
just use as few flags as possible first, see if the rclone command works, then add flags if needed.

I think I found what may be the issue.
Prometheus uses hard links for its snapshots, and apparently rclone doesn't like those.

That should not be the case: hard links show up as normal files and are copied normally. I use them all the time.
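
To illustrate the point about hard links, here is a small sketch that can be run in any scratch directory. The file names are made up for the demonstration; it only shows that a hard link is a second name for the same inode and reads back the same bytes, which is why a copy tool sees it as an ordinary file.

```shell
#!/usr/bin/env bash
# Demonstration only: file names below are hypothetical.
tmp=$(mktemp -d)
echo "sample data" > "$tmp/original"

# Create a hard link: a second directory entry for the same inode.
ln "$tmp/original" "$tmp/hardlink"

# -ef is true when both names refer to the same device and inode.
if [ "$tmp/original" -ef "$tmp/hardlink" ]; then same_inode=yes; else same_inode=no; fi

# Reading through either name yields identical bytes.
if cmp -s "$tmp/original" "$tmp/hardlink"; then same_content=yes; else same_content=no; fi

echo "same inode: $same_inode, same content: $same_content"
rm -rf "$tmp"
```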

I made some changes to my setup:
I upgraded to the latest rclone version (from the website .deb file) and changed the provider to encrypted over B2.
I no longer have high IO issues, but there are still problems with those files: they appear in the copy logs but are not copied completely. On the B2 side, they appear with a size of 0 and are shown as "started large file".

$ bash backup-prometheus.sh
Doing snapshot
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    43  100    43    0     0     50      0 --:--:-- --:--:-- --:--:--    50
Snapshot ID: 20201018T165548Z-27ec09ed4f45fd0
Send data
2020/10/18 16:55:49 DEBUG : rclone: Version "v1.53.1" starting with parameters ["rclone" "--fast-list" "--size-only" "-vv" "sync" "/var/lib/prometheus/metrics2/snapshots/20201018T165548Z-27ec09ed4f45fd0/" "b2-encrypted:frnte1-monitoring1/prometheus/tsdb"]
2020/10/18 16:55:49 DEBUG : Creating backend with remote "/var/lib/prometheus/metrics2/snapshots/20201018T165548Z-27ec09ed4f45fd0/"
2020/10/18 16:55:49 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2020/10/18 16:55:49 DEBUG : Creating backend with remote "b2-encrypted:frnte1-monitoring1/prometheus/tsdb"
2020/10/18 16:55:50 DEBUG : Creating backend with remote "b2:xxx/frnte1-monitoring1/prometheus/tsdb.bin"
2020/10/18 16:55:51 DEBUG : Creating backend with remote "b2:xxx/frnte1-monitoring1/prometheus/tsdb"
2020/10/18 16:55:52 DEBUG : 01EMYA6Z793R6BNYZR4BSH0Y5Y/index: Computing SHA-1 hash of encrypted source
2020/10/18 16:55:52 DEBUG : Encrypted drive 'b2-encrypted:frnte1-monitoring1/prometheus/tsdb': Waiting for checks to finish
2020/10/18 16:55:52 DEBUG : Encrypted drive 'b2-encrypted:frnte1-monitoring1/prometheus/tsdb': Waiting for transfers to finish
2020/10/18 16:55:52 DEBUG : 01EMYA6Z793R6BNYZR4BSH0Y5Y/meta.json: Computing SHA-1 hash of encrypted source
2020/10/18 16:55:52 DEBUG : 01EMYA6Z793R6BNYZR4BSH0Y5Y/tombstones: Computing SHA-1 hash of encrypted source
2020/10/18 16:55:52 DEBUG : 01EMYA6Z793R6BNYZR4BSH0Y5Y/chunks/000001: Computing SHA-1 hash of encrypted source
2020/10/18 16:55:52 INFO  : 01EMYA6Z793R6BNYZR4BSH0Y5Y/tombstones: Copied (new)
2020/10/18 16:55:52 DEBUG : 01EMY3K4C9S4BNG2CV5F9Y4JZ1/index: Computing SHA-1 hash of encrypted source
2020/10/18 16:55:53 INFO  : 01EMYA6Z793R6BNYZR4BSH0Y5Y/index: Copied (new)
2020/10/18 16:55:53 DEBUG : 01EMY3K4C9S4BNG2CV5F9Y4JZ1/meta.json: Computing SHA-1 hash of encrypted source
2020/10/18 16:55:53 INFO  : 01EMY3K4C9S4BNG2CV5F9Y4JZ1/meta.json: Copied (new)
2020/10/18 16:55:53 DEBUG : 01EMY3K4C9S4BNG2CV5F9Y4JZ1/tombstones: Computing SHA-1 hash of encrypted source
2020/10/18 16:55:53 INFO  : 01EMYA6Z793R6BNYZR4BSH0Y5Y/meta.json: Copied (new)
2020/10/18 16:55:53 DEBUG : 01EMY3K4C9S4BNG2CV5F9Y4JZ1/chunks/000001: Computing SHA-1 hash of encrypted source
2020/10/18 16:55:53 INFO  : 01EMY3K4C9S4BNG2CV5F9Y4JZ1/tombstones: Copied (new)
2020/10/18 16:55:53 DEBUG : 01EFPA0SPNJ2GBSY82QVEYSM73/index: Computing SHA-1 hash of encrypted source
2020/10/18 16:55:53 INFO  : 01EMY3K4C9S4BNG2CV5F9Y4JZ1/index: Copied (new)
2020/10/18 16:55:53 DEBUG : 01EFPA0SPNJ2GBSY82QVEYSM73/meta.json: Computing SHA-1 hash of encrypted source
2020/10/18 16:55:54 INFO  : 01EFPA0SPNJ2GBSY82QVEYSM73/meta.json: Copied (new)
2020/10/18 16:55:54 DEBUG : 01EFPA0SPNJ2GBSY82QVEYSM73/tombstones: Computing SHA-1 hash of encrypted source
2020/10/18 16:55:54 INFO  : 01EMYA6Z793R6BNYZR4BSH0Y5Y/chunks/000001: Copied (new)
2020/10/18 16:55:54 DEBUG : 01EFPA0SPNJ2GBSY82QVEYSM73/chunks/000001: Computing SHA-1 hash of encrypted source
2020/10/18 16:55:54 INFO  : 01EFPA0SPNJ2GBSY82QVEYSM73/tombstones: Copied (new)
2020/10/18 16:55:54 DEBUG : 01EFPA0SPNJ2GBSY82QVEYSM73/chunks/000002: Computing SHA-1 hash of encrypted source
2020/10/18 16:55:58 INFO  : 01EFPA0SPNJ2GBSY82QVEYSM73/index: Copied (new)
2020/10/18 16:55:58 DEBUG : 01EFPA0SPNJ2GBSY82QVEYSM73/chunks/000003: Computing SHA-1 hash of encrypted source
2020/10/18 16:55:39 DEBUG : 01EMY3K4C9S4BNG2CV5F9Y4JZ1/chunks/000001.bin: Starting upload of large file in 4 chunks (id "4_z976d8951572d3f877053071e_f2015fc0b3155ab98_d20201018_m165554_c003_v0312002_t0006")
2020/10/18 16:57:32 INFO  :
Transferred:       78.509M / 11.438 GBytes, 1%, 1.011 MBytes/s, ETA 3h11m48s
Transferred:           10 / 51, 20%
Elapsed time:      1m20.1s
Transferring:
 *      01EMY3K4C9S4BNG2CV5F9Y4JZ1/chunks/000001:  0% /350.537M, 0/s, -
 *      01EFPA0SPNJ2GBSY82QVEYSM73/chunks/000001:  0% /511.999M, 0/s, -
 *      01EFPA0SPNJ2GBSY82QVEYSM73/chunks/000002:  0% /511.999M, 0/s, -
 *      01EFPA0SPNJ2GBSY82QVEYSM73/chunks/000003:  0% /511.998M, 0/s, -

2020/10/18 16:57:32 DEBUG : 01EMY3K4C9S4BNG2CV5F9Y4JZ1/chunks/000001.bin: Sending chunk 1 length 100663296
2020/10/18 16:57:36 DEBUG : 01EMY3K4C9S4BNG2CV5F9Y4JZ1/chunks/000001.bin: Sending chunk 2 length 100663296
kedare@frnte1-ltc1:~/lxd-scripts$ lxc exec frnte1-monitoring1 bash
root@frnte1-monitoring1:~# su -dh /va^C
root@frnte1-monitoring1:~# du -sh /var/lib/prometheus/metrics2/snapshots/20201018T165548Z-27ec09ed4f45fd0/
12G     /var/lib/prometheus/metrics2/snapshots/20201018T165548Z-27ec09ed4f45fd0/

OK, I just found the issue: rclone got killed by the OOM killer. Is there any way to reduce memory usage during the copy?
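
For anyone hitting the same symptom, a rough way to check for an OOM kill is to filter the kernel log. This is a sketch, not the method used in this thread: the helper name find_oom_kills is made up, the search patterns are an assumption about typical kernel wording, and with LXD the kernel log may only be readable on the host.

```shell
#!/usr/bin/env bash
# find_oom_kills (hypothetical helper): read kernel log text on stdin and
# keep only the lines that look like OOM-killer activity.
# The patterns are an assumption about common kernel message wording.
find_oom_kills() {
  grep -Ei 'out of memory|oom-kill'
}

# Typical usage (run on the host if the container cannot read the kernel log):
#   dmesg | find_oom_kills
#   journalctl -k | find_oom_kills
```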

there are many flags.
you can search for "memory" on https://rclone.org/docs/

do not use --fast-list

you could try to tweak
https://rclone.org/docs/#transfers-n
https://rclone.org/docs/#checkers-n
https://rclone.org/docs/#buffer-size-size
https://rclone.org/docs/#use-mmap
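
One way the flags linked above could be combined, as a sketch rather than a tested recommendation. The values are illustrative guesses, not measured settings. `--b2-chunk-size` is not mentioned in this thread; it is added on the assumption (from the rclone B2 docs) that upload chunks are buffered in memory, up to one per transfer. SNAPSHOT_ID_HERE is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch only: all values below are assumptions to be tuned, not known-good settings.
SRC="/var/lib/prometheus/metrics2/snapshots/SNAPSHOT_ID_HERE/"   # placeholder snapshot path
DST="b2-encrypted:frnte1-monitoring1/prometheus/tsdb"

RCLONE_ARGS=(
  --transfers=1        # one file upload at a time (rclone default is 4)
  --checkers=2         # fewer parallel checks (rclone default is 8)
  --buffer-size=0      # turn off the per-file read-ahead buffer (default 16M)
  --b2-chunk-size=16M  # smaller large-file chunks; each in-flight chunk sits in memory
  --use-mmap           # allocate buffers via mmap so freed memory can go back to the OS
  --size-only
  -v
)

# Print the command for review; remove "echo" to actually run it.
echo lxc exec frnte1-monitoring1 -- rclone "${RCLONE_ARGS[@]}" sync "$SRC" "$DST"
```

As a rough rule of thumb, chunk memory scales with transfers times chunk size, so the defaults seen in the log above (4 transfers, 96 MiB chunks) can hold several hundred MiB before any other buffers are counted.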

Thank you, I will take a look at those