Need better results when copying individual objects/files

@ncw , we can reliably reproduce the issue(s). The main problem (Problem B below) is that "rclone check", when run in a docker container, does not exit in a certain scenario (Problem A below) and keeps running until the docker container is killed.

Problem A: When performing "rclone check" on a backend that is busy updating the version of one of the objects/files, rclone returns an error when the command is run locally (and does not return at all when run in a container):

stderr: b'2022/03/29 14:32:02 DEBUG : rclone: Version "v1.57.0" starting with parameters ["rclone" "check" "-vv" "--combined" "-" "--one-way" "--files-from" "/var/spool/caringo/cloudgateway/features/41123165DFF94056/objectFile" "hybridclouds3clonelocalswarm:sourcebucket" "hybridclouds3cloneremote:destbucket"]
...
2022/03/29 14:32:02 DEBUG : fs cache: renaming cache item "hybridclouds3cloneremote:destbucket" to be canonical "hybridclouds3cloneremote{EvD5b}:destbucket"
2022/03/29 14:32:02 INFO : Using md5 for hash comparisons
2022/03/29 14:32:02 DEBUG : S3 bucket destbucket: Waiting for checks to finish
2022/03/29 14:32:55 ERROR : S3 bucket sourcebucket: error reading source root directory: RequestError: send request failed
caused by: Get "http://hctest-20220329-192949-271-tenant0000000500x0000000000-domain1:8082/sourcebucket?delimiter=%2F&max-keys=1000&prefix=": dial tcp: lookup hctest-20220329-192949-271-tenant0000000500x0000000000-domain1 on 172.30.0.20:53: no such host
2022/03/29 14:32:55 NOTICE: S3 bucket destbucket: 1 differences found
2022/03/29 14:32:55 NOTICE: S3 bucket destbucket: 1 errors while checking
2022/03/29 14:32:55 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Errors: 1 (retrying may help)
Elapsed time: 53.0s

This is not the best experience, but I think I can prevent this scenario with one or more of the following (see the sketch below): 1) add the "--no-traverse" option, since the object being updated is not one of the objects in the "--files-from" file; 2) wait to run check until after the object has finished writing to the backend; 3) add "--retries" and/or "--low-level-retries".
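For illustration, a sketch of how options 1 and 3 could look from the Python caller's side (the remote names and files-from path are taken from the log above; the retry count is an arbitrary example, not a recommendation):

import subprocess

# Sketch only: the same check as above, plus --no-traverse (option 1)
# and a higher --low-level-retries (option 3). Values are illustrative.
cmd = [
    "rclone", "check", "-vv",
    "--combined", "-",
    "--one-way",
    "--no-traverse",              # don't list the source root directory
    "--low-level-retries", "20",  # default is 10
    "--files-from", "/var/spool/caringo/cloudgateway/features/41123165DFF94056/objectFile",
    "hybridclouds3clonelocalswarm:sourcebucket",
    "hybridclouds3cloneremote:destbucket",
]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.returncode)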

Do "--retries" and "--low-level-retries" with the "check" command?

Problem B: When "rclone check" is run in a container and hits Problem A, the process never exits and runs indefinitely. This hangs our calling process, which is waiting for the container to be released. The hung process can be seen by running "ps -ef | grep docker":

root 16272 1 1 12:57 ? 00:00:04 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
root 16955 16496 0 13:01 ? 00:00:00 docker run --rm --add-host
...
env RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_CHUNK_SIZE=100M rclone/rclone check --combined - --one-way --files-from /path/objectFile hybridclouds3clonelocalswarm:sourcebucket hybridclouds3cloneremote:destbucket -vv

Let's try to fix the problem of rclone not returning when run in a container:

Can you get it to lock up, and when it has done so, enter the container and issue kill -QUIT to rclone. That will print a backtrace to stderr, which you'll need to collect and post here; it will show exactly what rclone is doing.
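(Since everything here is driven from Python, sending that signal from the host could look like the sketch below. The container id is hypothetical, and this assumes PID 1 inside the rclone/rclone image is rclone and that a kill binary is available in the image.)

import subprocess

container_id = "abc123"  # hypothetical; find the real one with `docker ps`

# SIGQUIT makes the Go runtime dump a goroutine backtrace to stderr
# before exiting, which shows what rclone is doing.
subprocess.run(["docker", "exec", container_id, "kill", "-QUIT", "1"], check=True)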

Cloud storage systems are inherently not 100% reliable - there are too many moving parts, eventual consistency, internet etc.

This is why rclone has --retries and --low-level-retries.

--low-level-retries work, and unless you turned them off, you'll be getting 10 of them.

--retries do not work with rclone check because if there is a check problem, retrying usually won't make it go away.

However you could retry in a shell script if you wanted.
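Since the caller here is Python rather than shell, an equivalent retry wrapper could look like this sketch (the command is abbreviated, and the attempt count and delay are arbitrary):

import subprocess
import time

# Abbreviated, illustrative command; use the full docker/rclone command here.
cmd = ["rclone", "check", "--one-way", "src:bucket", "dst:bucket"]

for attempt in range(3):                      # arbitrary attempt limit
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        break
    time.sleep(5)                             # arbitrary delay between tries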

Can you also try this with rclone 1.58, which may behave differently when the error is reported.

Sorry, I posted before I fully understood the scenario. The rclone check process actually did complete and return, and the container finished. What remains and does not return is the docker run process that originally created the container to run the rclone command. I killed the docker run process and can now see the returncode, stdout, and stderr from the rclone check. I think the problem might be with the returncode, which is "-9" in this case. (Sorry, it is not actually hitting the "no such host" scenario in this case after all.)

What is returncode "-9"?

Here are the values from the rclone process (command, returncode, stdout, and stderr) with some of the boring parts removed:

rclone command: ['docker', 'run', '--rm', '--add-host', 'domain.mb1-84.tx.caringo.com:172.30.15.84', '--volume', '/var/spool/caringo/cloudgateway/features/2CDD94B38F000D56:/var/spool/caringo/cloudgateway/features/2CDD94B38F000D56', '--env', 'RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_TYPE=s3', '--env', 'RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_ACCESS_KEY_ID=(removed)', '--env', 'RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_SECRET_ACCESS_KEY=(removed)', '--env', 'RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_ENDPOINT=http://domain.mb1-84.tx.caringo.com:8082', '--env', 'RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_CHUNK_SIZE=100M', '--env', 'RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_TYPE=s3', '--env', 'RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_ACCESS_KEY_ID=(removed)', '--env', 'RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_SECRET_ACCESS_KEY=(removed)', '--env', 'RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_ENDPOINT=http://domain.mb1-84.tx.caringo.com:8082', '--env', 'RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_CHUNK_SIZE=100M', 'rclone/rclone', 'check', '--combined', '-', '--one-way', '--files-from', '/var/spool/caringo/cloudgateway/features/2CDD94B38F000D56/objectFile', 'hybridclouds3clonelocalswarm:bigbucket', 'hybridclouds3cloneremote:destination', '-vv']
returncode: -9
stdout: b'= so0000000003\n= so0000000000\n= so0000000007\n

...

= so0000000391\n= so0000000392\n= so0000000393\n= so0000000394\n= so0000000398\n'
stderr: b'2022/03/30 16:04:05 DEBUG : rclone: Version "v1.57.0" starting with parameters ["rclone" "check" "--combined" "-" "--one-way" "--files-from" "/var/spool/ca
ringo/cloudgateway/features/2CDD94B38F000D56/objectFile" "hybridclouds3clonelocalswarm:bigbucket" "hybridclouds3cloneremote:destination" "-vv"]\n2022/03/30 16:04:05 DE
BUG : Creating backend with remote "hybridclouds3clonelocalswarm:bigbucket"\n2022/03/30 16:04:05 DEBUG : Setting type="s3" for "hybridclouds3clonelocalswarm" from envi
ronment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_TYPE\n2022/03/30 16:04:05 DEBUG : Setting access_key_id="(removed)" for "hybridcloud
s3clonelocalswarm" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_ACCESS_KEY_ID\n2022/03/30 16:04:05 DEBUG : Setting secret_access_key="(removed)
" for "hybridclouds3clonelocalswarm" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_SECRET_ACCESS_KEY\n2022/03/30
16:04:05 DEBUG : Setting endpoint="http://domain.mb1-84.tx.caringo.com:8082" for "hybridclouds3clonelocalswarm" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CL
ONELOCALSWARM_ENDPOINT\n2022/03/30 16:04:05 DEBUG : Setting endpoint="http://domain.mb1-84.tx.caringo.com:8082" for "hybridclouds3clonelocalswarm" from environment var
iable RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_ENDPOINT\n2022/03/30 16:04:05 DEBUG : Setting endpoint="http://domain.mb1-84.tx.caringo.com:8082" for "hybridclouds3cl
onelocalswarm" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_ENDPOINT\n2022/03/30 16:04:05 DEBUG : Setting endpoint="http://domain.mb1-84.tx.car
ingo.com:8082" for "hybridclouds3clonelocalswarm" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_ENDPOINT\n2022/03/30 16:04:05 DEBUG : Setting endpoint="http://domain.mb1-84.tx.caringo.com:8082" for "hybridclouds3clonelocalswarm" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_ENDPOINT\n2022/03/30 16:04:05 DEBUG : Setting endpoint="http://domain.mb1-84.tx.caringo.com:8082" for "hybridclouds3clonelocalswarm" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_ENDPOINT\n2022/03/30 16:04:05 DEBUG : Setting endpoint="http://domain.mb1-84.tx.caringo.com:8082" for "hybridclouds3clonelocalswarm" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_ENDPOINT\n2022/03/30 16:04:05 DEBUG : Setting chunk_size="100M" for "hybridclouds3clonelocalswarm" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_CHUNK_SIZE\n2022/03/30 16:04:05 DEBUG : hybridclouds3clonelocalswarm: detected overridden config - adding "{Vsc47}" suffix to name\n2022/03/30 16:04:05 NOTICE: Config file "/config/rclone/rclone.conf" not found - using defaults\n2022/03/30 16:04:05 DEBUG : Setting access_key_id="(removed)" for "hybridclouds3clonelocalswarm" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_ACCESS_KEY_ID\n2022/03/30 16:04:05 DEBUG : Setting secret_access_key="(removed)" for "hybridclouds3clonelocalswarm" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_SECRET_ACCESS_KEY\n2022/03/30 16:04:05 DEBUG : Setting endpoint="http://domain.mb1-84.tx.caringo.com:8082" for "hybridclouds3clonelocalswarm" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_ENDPOINT\n2022/03/30 16:04:05 DEBUG : Setting chunk_size="100M" for "hybridclouds3clonelocalswarm" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONELOCALSWARM_CHUNK_SIZE\n2022/03/30 16:04:05 DEBUG : fs cache: renaming cache item "hybridclouds3clonelocalswarm:bigbucket" to be canonical "hybridclouds3clonelocalswarm{Vsc47}:bigbucket"\n2022/03/30 16:04:05 DEBUG : Creating backend with remote "hybridclouds3cloneremote:destination"\n2022/03/30 16:04:05 DEBUG : Setting type="s3" for "hybridclouds3cloneremote" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_TYPE\n2022/03/30 16:04:05 DEBUG : Setting access_key_id="(removed)" for "hybridclouds3cloneremote" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_ACCESS_KEY_ID\n2022/03/30 16:04:05 DEBUG : Setting secret_access_key="(removed)" for "hybridclouds3cloneremote" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_SECRET_ACCESS_KEY\n2022/03/30 16:04:05 DEBUG : Setting endpoint="http://domain.mb1-84.tx.caringo.com:8082" for "hybridclouds3cloneremote" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_ENDPOINT\n2022/03/30 16:04:05 DEBUG : Setting endpoint="http://domain.mb1-84.tx.caringo.com:8082" for "hybridclouds3cloneremote" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_ENDPOINT\n2022/03/30 16:04:05 DEBUG : Setting endpoint="http://domain.mb1-84.tx.caringo.com:8082" for "hybridclouds3cloneremote" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_ENDPOINT\n2022/03/30 16:04:05 DEBUG : Setting endpoint="http://domain.mb1-84.tx.caringo.com:8082" for "hybridclouds3cloneremote" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_ENDPOINT\n2022/03/30 16:04:05 DEBUG : Setting endpoint="http://domain.mb1-84.tx.caringo.com:8082" for "hybridclouds3cloneremote" from environment variable 
RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_ENDPOINT\n2022/03/30 16:04:05 DEBUG : Setting endpoint="http://domain.mb1-84.tx.caringo.com:8082" for "hybridclouds3cloneremote" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_ENDPOINT\n2022/03/30 16:04:05 DEBUG : Setting endpoint="http://domain.mb1-84.tx.caringo.com:8082" for "hybridclouds3cloneremote" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_ENDPOINT\n2022/03/30 16:04:05 DEBUG : Setting chunk_size="100M" for "hybridclouds3cloneremote" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_CHUNK_SIZE\n2022/03/30 16:04:05 DEBUG : hybridclouds3cloneremote: detected overridden config - adding "{9H90T}" suffix to name\n2022/03/30 16:04:05 DEBUG : Setting access_key_id="(removed)" for "hybridclouds3cloneremote" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_ACCESS_KEY_ID\n2022/03/30 16:04:05 DEBUG : Setting secret_access_key="(removed)" for "hybridclouds3cloneremote" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_SECRET_ACCESS_KEY\n2022/03/30 16:04:05 DEBUG : Setting endpoint="http://domain.mb1-84.tx.caringo.com:8082" for "hybridclouds3cloneremote" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_ENDPOINT\n2022/03/30 16:04:05 DEBUG : Setting chunk_size="100M" for "hybridclouds3cloneremote" from environment variable RCLONE_CONFIG_HYBRIDCLOUDS3CLONEREMOTE_CHUNK_SIZE\n2022/03/30 16:04:05 DEBUG : fs cache: renaming cache item "hybridclouds3cloneremote:destination" to be canonical "hybridclouds3cloneremote{9H90T}:destination"\n2022/03/30 16:04:05 INFO : Using md5 for hash comparisons\n2022/03/30 16:04:05 DEBUG : S3 bucket destination: Waiting for checks to finish\n2022/03/30 16:04:06 DEBUG : test1: Excluded\n2022/03/30 16:04:06 DEBUG : test1-1648656192076: Excluded\n2022/03/30 16:04:06 DEBUG : test1.status: Excluded\n2022/03/30 16:04:06 DEBUG : DancingLadies.JPG: Excluded\n2022/03/30 16:04:06 DEBUG : UF_Graduation.mp4: Excluded\n2022/03/30 16:04:06 DEBUG : copyme.txt: Excluded\n2022/03/30 16:04:06 DEBUG : test1: Excluded\n2022/03/30 16:04:06 DEBUG : test1-1646858333760: Excluded\n2022/03/30 16:04:06 DEBUG : test1-1646861923702: Excluded\n2022/03/30 16:04:06 DEBUG : test1-1646962009945: Excluded\n2022/03/30 16:04:06 DEBUG : test1-1647017920053: Excluded\n2022/03/30 16:04:06 DEBUG : test1-1647625076585: Excluded\n2022/03/30 16:04:06 DEBUG : test1.241B070E5D54B44C.json: Excluded\n2022/03/30 16:04:06 DEBUG : test1.47371D82F841B4C1.json: Excluded\n2022/03/30 16:04:06 DEBUG : test1.4B4E8FD708625F89.json: Excluded\n2022/03/30 16:04:06 DEBUG : test1.status: Excluded\n2022/03/30 16:04:06 DEBUG : test2: Excluded\n2022/03/30 16:04:06 DEBUG : test2-1646858949992: Excluded\n2022/03/30 16:04:06 DEBUG : test2-1646862183482: Excluded\n2022/03/30 16:04:06 DEBUG : test2-1646962525153: Excluded\n2022/03/30 16:04:06 DEBUG : test2-1647026807213: Excluded\n2022/03/30 16:04:06 DEBUG : test2-1647626018385: Excluded\n2022/03/30 16:04:06 DEBUG : test2.40764EB8748AC7EC.json: Excluded\n2022/03/30 16:04:06 DEBUG : test2.status: Excluded\n2022/03/30 16:04:06 DEBUG : test3: Excluded\n2022/03/30 16:04:06 DEBUG : test3-1646860900199: Excluded\n2022/03/30 16:04:06 DEBUG : test3-1646959200107: Excluded\n2022/03/30 16:04:06 DEBUG : test3-1647013758474: Excluded\n2022/03/30 16:04:06 DEBUG : test3-1647027649912: Excluded\n2022/03/30 16:04:06 DEBUG : test3.AE4CD67B654C1A9C.json: Excluded\n2022/03/30 16:04:06 DEBUG : test3.status: Excluded\n2022/03/30 16:04:06 DEBUG : test4: Excluded\n2022/03/30 
16:04:06 DEBUG : test4-1646861117889: Excluded\n2022/03/30 16:04:06 DEBUG : test4-1646960336263: Excluded\n2022/03/30 16:04:06 DEBUG : test4-1647014910812: Excluded\n2022/03/30 16:04:06 DEBUG : test4-1647030739334: Excluded\n2022/03/30 16:04:06 DEBUG : test4.942B2B4BE1D16C55.json: Excluded\n2022/03/30 16:04:06 DEBUG : test4.FFE15CB632C3D29A.json: Excluded\n2022/03/30 16:04:06 DEBUG : test4.status: Excluded\n2022/03/30 16:04:06 DEBUG : test5: Excluded\n2022/03/30 16:04:06 DEBUG : test5-1646861493123: Excluded\n2022/03/30 16:04:06 DEBUG : test5-1646960911495: Excluded\n2022/03/30 16:04:06 DEBUG : test5-1647017739798: Excluded\n2022/03/30 16:04:06 DEBUG : test5-1647623197755: Excluded\n2022/03/30 16:04:06 DEBUG : test5.096761824EB99D5E.json: Excluded\n2022/03/30 16:04:06 DEBUG : test5.D74262F7FB53377E.json: Excluded\n2022/03/30 16:04:06 DEBUG : test5.status: Excluded\n2022/03/30 16:04:06 DEBUG : so0000000003: md5 = d41d8cd98f00b204e9800998ecf8427e OK\n2022/03/30 16:04:06 DEBUG : so0000000000: md5 = d41d8cd98f00b204e9800998ecf8427e OK\n2022/03/30 16:04:06 DEBUG : so0000000007: md5 = d41d8cd98f00b204e9800998ecf8427e OK\n2022/03/30 16:04:06 DEBUG : so0000000004: md5 = d41d8cd98f00b204e9800998ecf8427e OK\n2022/03/30 16:04:06 DEBUG : so0000000007: OK\n2022/03/30 16:04:06 DEBUG : so0000000004: OK\n2022/03/30 16:04:06 DEBUG : so0000000003: OK\n

...

2022/03/30 16:04:06 DEBUG : so0000000355: OK\n2022/03/30 16:04:06 DEBUG : so0000000353: OK\n2022/03/30 16:04:06 DEBUG : so0000000359: md5 = d41d8cd98f00b204e9800998ecf8427e OK\n2022/03/30 16:04:06 DEBUG : so0000000358: md5 = d41d8cd98f00b204e9800998ecf8427e OK\n2022/03/30 16:04:06 DEBUG : so0000000359: OK\n2022/03/30 16:04:06 DEBUG : so0000000361: md5 = d41d8cd98f00b204e9800998ecf8427e OK\n2022/03/30 16:04:06 DEBUG : so0000000352: OK\n2022/03/30 16:04:06 DEBUG : so0000000360: md5 = d41d8cd98f00b204e9800998ecf8427e OK\n2022/03/30 16:04:06 DEBUG : so0000000354: OK\n2022/03/30 16:04:06 DEBUG : so0000000363: md5 = d41d8cd98f00b204e9800998ecf8427e OK\n2022/03/30 16:04:06 DEBUG : so0000000358: OK\n2022/03/30 16:04:06 DEBUG : so0000000362: md5 = d41d8cd98f00b204e9800998ecf8427e OK\n2022/03/30 16:04:06 DEBUG : so0000000361: OK\n2022/03/30 16:04:06 DEBUG : so0000000363: OK\n2022/03/30 16:04:06 DEBUG : so0000000365: md5 = d41d8cd98f00b204e9800998ecf84'

hi, that is documented
https://rclone.org/docs/#error-on-no-transfer
https://rclone.org/docs/#exit-code

I'm familiar with positive 9; I've gotten that when I run copy with "--error-on-no-transfer". However, this is negative 9, and I'm running check without the "--error-on-no-transfer" flag.

More info: the check command did not complete before exiting with error code -9. When I look through the check results for all 500 objects, I only got results for half to three quarters of them. I also expect stderr (with -vv verbosity) to end with "go routines active\n", but it ended with "so0000000365: md5 = d41d8cd98f00b204e9800998ecf84" instead. Note that the md5 was cut off there; I would expect three more characters: d41d8cd98f00b204e9800998ecf8427e.

ok, i see your point.

The -9 is printed by my Python process. We think the -9 is unrelated to the rclone exit code: I was using SIGKILL (kill -9) on the docker run process, and Python reports -{SIGNAL NUMBER} as the returncode when a child is terminated by a signal.

https://docs.python.org/3/library/subprocess.html#subprocess.CompletedProcess.returncode
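That section of the docs says a negative returncode -N means the child was terminated by signal N (POSIX only). A minimal sketch to see it:

import signal
import subprocess

# Start a long-running child, kill it with SIGKILL, and check the
# returncode Python reports for it.
p = subprocess.Popen(["sleep", "60"])
p.send_signal(signal.SIGKILL)
print(p.wait())                         # -9
print(p.returncode == -signal.SIGKILL)  # True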

good, i think you answered your own question: the exit code is not from rclone itself, just the return code from kill -9.

Sorry, this looks to be an issue with Docker containers, or how we use containers, and not rclone. I got the issue to happen with a second copy command in place of the check.

Let us know if you need more rclone specific help. I can't be much help with containers though as I only know the minimum to get by there!

FYI: root caused! The latest reproducible version of this issue was a Python user error, not a Docker or rclone error. I was letting the stdout/stderr pipes fill up: the child blocked writing to the full pipes, and my subprocess.Popen caller had no mechanism (like communicate()) to drain them. The reason it happened on my call to "rclone check" and not "rclone copy" was that the copy was sending its output to a file via "--log-file" rather than to the pipes.
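To make the failure mode concrete, here is a minimal sketch of the broken and fixed patterns, using a deliberately chatty child in place of "rclone check -vv":

import subprocess

# Chatty stand-in for `rclone check -vv`: lots of stdout and stderr.
cmd = ["sh", "-c", "seq 1000000; seq 1000000 >&2"]

p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# Broken pattern: p.wait() here can deadlock. Once the child fills the
# OS pipe buffer it blocks on write, while the parent blocks waiting
# for it to exit, and nothing ever drains the pipes.
# p.wait()

# Fixed pattern: communicate() drains both pipes until EOF while
# waiting for the child, so neither side can block forever.
stdout, stderr = p.communicate()
print(p.returncode, len(stdout), len(stderr))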

It's still a mystery why we saw this intermittently in our old environment, which used subprocess.run (which does drain the stdout/stderr pipes) instead of subprocess.Popen, but I know what to look for next time I see this issue. rclone check writes to stdout and stderr more rapidly than other commands, so it can push the limits of some environments; the calling mechanism should account for that.


Glad you got it sorted and thanks for the update :slight_smile:
