Rclone with High Latency Archive Storage

What is the problem you are having with rclone?

I am trying to use RCLONE with an HSM system that returns ENOTREADY when data is on a slower medium (Cloud or tape). RClone treats this like any other error (or NFS/SSH hides the error) and eventually runs out of retries. causing the transfer to fail that with enough retries would eventually complete.

Also we are using a large number of parallel transfers because the queue that recalls data from archive optimizes and can recall data in parallel (60+ at a time). Thus we want to use Rclone to be faster than serial operations like scp that will wait one file at a time not fully utilizing parallel network recalls.

Errors are like and will continue until the archive has pulled the data fully off the archive media onto disk cache:
2020-04-29 15:47:32 ERROR : migrate.3gb.629: Failed to copy: multpart copy: read failed: sftp: "Failure" (SSH_FX_FAILURE)

Desired behavior would be (in this admit unique case) have RClone start trying to transfers a given number of files (number the system handles in the queue to optimize recalls) and would retry for a very long time (minutes) to get files recalled from archive media and keep the queue of recalls going.

Maybe timeout/retry settings that are very long to deal with slow storage like this and others like Glacier?

What is your rclone version (output from rclone version)

rclone v1.51.0
- os/arch: linux/amd64
- go version: go1.13.7

Which OS you are using and how many bits (eg Windows 7, 64 bit)

CentOS 7 64 Bit

Which cloud storage system are you using? (eg Google Drive)

ssh/sftp to a server that mounts the HSM system via NFS.

The command you were trying to run (eg rclone copy /tmp remote:tmp)

 rclone copy -v -P --transfers=16 remote:/HSM-Path `pwd`

A log from the command with the -vv flag (eg output from rclone -vv copy /tmp remote:tmp)

[brockp@gl-login1 scjp-migrate]$ rclone copy -vv  gl-xfer:/nfs/dataden/flux-support-dd/brockp/scjp-migrate `pwd`
2020/04/29 15:59:12 DEBUG : rclone: Version "v1.51.0" starting with parameters ["rclone" "copy" "-vv" "gl-xfer:/nfs/dataden/flux-support-dd/brockp/scjp-migrate" "/scratch/arc_ts_root/support/brockp/scjp-migrate"]
2020/04/29 15:59:12 DEBUG : Using config file from "/home/brockp/.config/rclone/rclone.conf"
2020/04/29 15:59:12 DEBUG : sftp://brockp@greatlakes-xfer.arc-ts.umich.edu:22//nfs/dataden/flux-support-dd/brockp/scjp-migrate: New connection 141.211.192.38:59716->141.211.192.36:22 to "SSH-2.0-OpenSSH_7.4"
2020/04/29 15:59:13 INFO  : Local file system at /scratch/arc_ts_root/support/brockp/scjp-migrate: Waiting for checks to finish
2020/04/29 15:59:13 INFO  : Local file system at /scratch/arc_ts_root/support/brockp/scjp-migrate: Waiting for transfers to finish
2020/04/29 15:59:13 DEBUG : migrate.3gb: Starting multi-thread copy with 4 parts of size 768M
2020/04/29 15:59:13 DEBUG : preAllocate: got error on fallocate, trying combination 1/2: operation not supported
2020/04/29 15:59:13 DEBUG : preAllocate: got error on fallocate, trying combination 2/2: operation not supported
2020/04/29 15:59:13 DEBUG : migrate.3gb.10: Starting multi-thread copy with 4 parts of size 768M
2020/04/29 15:59:13 DEBUG : preAllocate: got error on fallocate, trying combination 2/2: operation not supported
2020/04/29 15:59:13 DEBUG : migrate.3gb.100: Starting multi-thread copy with 4 parts of size 768M
2020/04/29 15:59:13 DEBUG : preAllocate: got error on fallocate, trying combination 1/2: operation not supported
2020/04/29 15:59:13 DEBUG : preAllocate: got error on fallocate, trying combination 2/2: operation not supported
2020/04/29 15:59:13 DEBUG : migrate.3gb.1: Starting multi-thread copy with 4 parts of size 768M
2020/04/29 15:59:13 DEBUG : migrate.3gb: multi-thread copy: stream 4/4 (2415919104-3221225472) size 768M starting
2020/04/29 15:59:13 DEBUG : migrate.3gb.100: multi-thread copy: stream 4/4 (2415919104-3221225472) size 768M starting
2020/04/29 15:59:13 DEBUG : migrate.3gb.100: multi-thread copy: stream 2/4 (805306368-1610612736) size 768M starting
2020/04/29 15:59:13 DEBUG : migrate.3gb: multi-thread copy: stream 1/4 (0-805306368) size 768M starting
2020/04/29 15:59:13 DEBUG : migrate.3gb.1: multi-thread copy: stream 4/4 (2415919104-3221225472) size 768M starting
2020/04/29 15:59:13 DEBUG : migrate.3gb.10: multi-thread copy: stream 4/4 (2415919104-3221225472) size 768M starting
2020/04/29 15:59:13 DEBUG : migrate.3gb: multi-thread copy: stream 3/4 (1610612736-2415919104) size 768M starting
2020/04/29 15:59:13 DEBUG : migrate.3gb.100: multi-thread copy: stream 1/4 (0-805306368) size 768M starting
2020/04/29 15:59:13 DEBUG : migrate.3gb.100: multi-thread copy: stream 3/4 (1610612736-2415919104) size 768M starting
2020/04/29 15:59:13 DEBUG : migrate.3gb.1: multi-thread copy: stream 1/4 (0-805306368) size 768M starting
2020/04/29 15:59:13 DEBUG : migrate.3gb.1: multi-thread copy: stream 2/4 (805306368-1610612736) size 768M starting
2020/04/29 15:59:13 DEBUG : migrate.3gb.1: multi-thread copy: stream 3/4 (1610612736-2415919104) size 768M starting
2020/04/29 15:59:13 DEBUG : migrate.3gb: multi-thread copy: stream 2/4 (805306368-1610612736) size 768M starting
2020/04/29 15:59:13 DEBUG : migrate.3gb.10: multi-thread copy: stream 1/4 (0-805306368) size 768M starting
2020/04/29 15:59:13 DEBUG : migrate.3gb.10: multi-thread copy: stream 2/4 (805306368-1610612736) size 768M starting
2020/04/29 15:59:13 DEBUG : migrate.3gb.10: multi-thread copy: stream 3/4 (1610612736-2415919104) size 768M starting
2020/04/29 15:59:13 DEBUG : sftp://brockp@greatlakes-xfer.arc-ts.umich.edu:22//nfs/dataden/flux-support-dd/brockp/scjp-migrate: New connection 141.211.192.38:47328->141.211.192.37:22 to "SSH-2.0-OpenSSH_7.4"
2020/04/29 15:59:13 DEBUG : sftp://brockp@greatlakes-xfer.arc-ts.umich.edu:22//nfs/dataden/flux-support-dd/brockp/scjp-migrate: New connection 141.211.192.38:59720->141.211.192.36:22 to "SSH-2.0-OpenSSH_7.4"
2020/04/29 15:59:13 DEBUG : sftp://brockp@greatlakes-xfer.arc-ts.umich.edu:22//nfs/dataden/flux-support-dd/brockp/scjp-migrate: New connection 141.211.192.38:47332->141.211.192.37:22 to "SSH-2.0-OpenSSH_7.4"
2020/04/29 15:59:13 DEBUG : sftp://brockp@greatlakes-xfer.arc-ts.umich.edu:22//nfs/dataden/flux-support-dd/brockp/scjp-migrate: New connection 141.211.192.38:59724->141.211.192.36:22 to "SSH-2.0-OpenSSH_7.4"
2020/04/29 15:59:13 DEBUG : sftp://brockp@greatlakes-xfer.arc-ts.umich.edu:22//nfs/dataden/flux-support-dd/brockp/scjp-migrate: New connection 141.211.192.38:47336->141.211.192.37:22 to "SSH-2.0-OpenSSH_7.4"
2020/04/29 15:59:14 DEBUG : sftp://brockp@greatlakes-xfer.arc-ts.umich.edu:22//nfs/dataden/flux-support-dd/brockp/scjp-migrate: New connection 141.211.192.38:59728->141.211.192.36:22 to "SSH-2.0-OpenSSH_7.4"
2020/04/29 15:59:14 DEBUG : sftp://brockp@greatlakes-xfer.arc-ts.umich.edu:22//nfs/dataden/flux-support-dd/brockp/scjp-migrate: New connection 141.211.192.38:47340->141.211.192.37:22 to "SSH-2.0-OpenSSH_7.4"
2020/04/29 15:59:14 DEBUG : sftp://brockp@greatlakes-xfer.arc-ts.umich.edu:22//nfs/dataden/flux-support-dd/brockp/scjp-migrate: New connection 141.211.192.38:59732->141.211.192.36:22 to "SSH-2.0-OpenSSH_7.4"
2020/04/29 15:59:14 DEBUG : sftp://brockp@greatlakes-xfer.arc-ts.umich.edu:22//nfs/dataden/flux-support-dd/brockp/scjp-migrate: New connection 141.211.192.38:47344->141.211.192.37:22 to "SSH-2.0-OpenSSH_7.4"
2020/04/29 15:59:14 DEBUG : sftp://brockp@greatlakes-xfer.arc-ts.umich.edu:22//nfs/dataden/flux-support-dd/brockp/scjp-migrate: New connection 141.211.192.38:59736->141.211.192.36:22 to "SSH-2.0-OpenSSH_7.4"
2020/04/29 15:59:14 DEBUG : sftp://brockp@greatlakes-xfer.arc-ts.umich.edu:22//nfs/dataden/flux-support-dd/brockp/scjp-migrate: New connection 141.211.192.38:47348->141.211.192.37:22 to "SSH-2.0-OpenSSH_7.4"
2020/04/29 15:59:14 DEBUG : sftp://brockp@greatlakes-xfer.arc-ts.umich.edu:22//nfs/dataden/flux-support-dd/brockp/scjp-migrate: New connection 141.211.192.38:59740->141.211.192.36:22 to "SSH-2.0-OpenSSH_7.4"
2020/04/29 15:59:14 DEBUG : sftp://brockp@greatlakes-xfer.arc-ts.umich.edu:22//nfs/dataden/flux-support-dd/brockp/scjp-migrate: New connection 141.211.192.38:47352->141.211.192.37:22 to "SSH-2.0-OpenSSH_7.4"
2020/04/29 15:59:14 DEBUG : sftp://brockp@greatlakes-xfer.arc-ts.umich.edu:22//nfs/dataden/flux-support-dd/brockp/scjp-migrate: New connection 141.211.192.38:59744->141.211.192.36:22 to "SSH-2.0-OpenSSH_7.4"
2020/04/29 15:59:14 DEBUG : sftp://brockp@greatlakes-xfer.arc-ts.umich.edu:22//nfs/dataden/flux-support-dd/brockp/scjp-migrate: New connection 141.211.192.38:47356->141.211.192.37:22 to "SSH-2.0-OpenSSH_7.4"
2020/04/29 15:59:16 DEBUG : migrate.3gb.100: Reopening on read failure after 0 bytes: retry 1/10: sftp: "Failure" (SSH_FX_FAILURE)
2020/04/29 15:59:16 DEBUG : migrate.3gb.100: Reopening on read failure after 0 bytes: retry 1/10: sftp: "Failure" (SSH_FX_FAILURE)
2020/04/29 15:59:16 DEBUG : migrate.3gb.100: Reopening on read failure after 0 bytes: retry 1/10: sftp: "Failure" (SSH_FX_FAILURE)
2020/04/29 15:59:16 DEBUG : migrate.3gb.100: Reopening on read failure after 0 bytes: retry 1/10: sftp: "Failure" (SSH_FX_FAILURE)
2020/04/29 15:59:17 DEBUG : migrate.3gb: Reopening on read failure after 0 bytes: retry 1/10: sftp: "Failure" (SSH_FX_FAILURE)
2020/04/29 15:59:17 DEBUG : migrate.3gb: Reopening on read failure after 0 bytes: retry 1/10: sftp: "Failure" (SSH_FX_FAILURE)
2020/04/29 15:59:17 DEBUG : migrate.3gb: Reopening on read failure after 0 bytes: retry 1/10: sftp: "Failure" (SSH_FX_FAILURE)
2020/04/29 15:59:17 DEBUG : migrate.3gb.1: Reopening on read failure after 0 bytes: retry 1/10: sftp: "Failure" (SSH_FX_FAILURE)
2020/04/29 15:59:17 DEBUG : migrate.3gb.1: Reopening on read failure after 0 bytes: retry 1/10: sftp: "Failure" (SSH_FX_FAILURE)
2020/04/29 15:59:17 DEBUG : migrate.3gb.1: Reopening on read failure after 0 bytes: retry 1/10: sftp: "Failure" (SSH_FX_FAILURE)
2020/04/29 15:59:17 DEBUG : migrate.3gb.1: Reopening on read failure after 0 bytes: retry 1/10: sftp: "Failure" (SSH_FX_FAILURE)
2020/04/29 15:59:17 DEBUG : migrate.3gb.10: Reopening on read failure after 0 bytes: retry 1/10: sftp: "Failure" (SSH_FX_FAILURE)
2020/04/29 15:59:17 DEBUG : migrate.3gb.10: Reopening on read failure after 0 bytes: retry 1/10: sftp: "Failure" (SSH_FX_FAILURE)
2020/04/29 15:59:17 DEBUG : migrate.3gb.10: Reopening on read failure after 0 bytes: retry 1/10: sftp: "Failure" (SSH_FX_FAILURE)
2020/04/29 15:59:17 DEBUG : migrate.3gb.10: Reopening on read failure after 0 bytes: retry 1/10: sftp: "Failure" (SSH_FX_FAILURE)
^C

That is an awfully generic error unfortunately - sftp gives those for all sorts of reasons. As far as I know it is just a error code with no extra metadata :frowning:

You can control how many files rclone can transfer at once with --transfers. You can also control the retries with --low-level-retries - maybe you can set that large enough so that rclone waits long enough for the transfer to start?

Note that rclone is using more transfers as it is doing multithread downloads - you might like to disable that with --multi-thread-streams 0

I think if you try tweaking as above it might work perhaps with the caveat that rclone might try the retries too quickly - not 100% sure!

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.