Rclone check with files-from is missing some files

What is the problem you are having with rclone?

I am using rclone to check two S3 buckets after a migration,
and I find that rclone check may miss some files.

Run the command 'rclone version' and share the full output of the command.

root@sds5 ~/x/t/tmp [1]# rclone version
rclone v1.57.0

  • os/version: centos 7.8.2003 (64 bit)
  • os/kernel: 3.10.0-1127.el7.x86_64 (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.17.2
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Ceph Storage

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone check 55:/bk-1/ 55:/bk-2/ --files-from files --no-traverse

The rclone config contents with secrets removed.

Paste config here

A log from the command with the -vv flag

Objects in the src:
root@sds5 ~/x/t/tmp# rclone ls 55://bk-1
        0 a
        0 a/b
        0 a/c

Objects in the destination:
root@sds5 ~/x/t/tmp# rclone ls 55://bk-2
        0 a
        0 a/b
        0 a/c

Specified files:
root@sds5 ~/x/t/tmp# cat files
a
a/b
a/c

rclone check with --files-from and --no-traverse checks only one object,
while rclone check with --files-from but without --no-traverse checks all three objects.

root@sds5 ~/x/t/tmp [1]# rclone check 55:/bk-1/ 55:/bk-2/ --files-from files --no-traverse -vv
2022/02/26 00:07:51 DEBUG : rclone: Version "v1.57.0" starting with parameters ["rclone" "check" "55:/bk-1/" "55:/bk-2/" "--files-from" "files" "--no-traverse" "-vv"]
2022/02/26 00:07:51 DEBUG : Creating backend with remote "55:/bk-1/"
2022/02/26 00:07:51 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2022/02/26 00:07:51 DEBUG : fs cache: renaming cache item "55:/bk-1/" to be canonical "55:bk-1"
2022/02/26 00:07:51 DEBUG : Creating backend with remote "55:/bk-2/"
2022/02/26 00:07:51 DEBUG : fs cache: renaming cache item "55:/bk-2/" to be canonical "55:bk-2"
2022/02/26 00:07:51 INFO  : Using md5 for hash comparisons
2022/02/26 00:07:51 DEBUG : S3 bucket bk-2: Waiting for checks to finish
2022/02/26 00:07:51 DEBUG : a: md5 = d41d8cd98f00b204e9800998ecf8427e OK
2022/02/26 00:07:51 DEBUG : a: OK
2022/02/26 00:07:51 NOTICE: S3 bucket bk-2: 0 differences found
2022/02/26 00:07:51 NOTICE: S3 bucket bk-2: 1 matching files
2022/02/26 00:07:51 INFO  :
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:                 1 / 1, 100%
Elapsed time:         0.0s

2022/02/26 00:07:51 DEBUG : 15 go routines active


root@sds5 ~/x/t/tmp# rclone check 55:/bk-1/ 55:/bk-2/ --files-from files -vv
2022/02/26 00:08:15 DEBUG : rclone: Version "v1.57.0" starting with parameters ["rclone" "check" "55:/bk-1/" "55:/bk-2/" "--files-from" "files" "-vv"]
2022/02/26 00:08:15 DEBUG : Creating backend with remote "55:/bk-1/"
2022/02/26 00:08:15 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2022/02/26 00:08:15 DEBUG : fs cache: renaming cache item "55:/bk-1/" to be canonical "55:bk-1"
2022/02/26 00:08:15 DEBUG : Creating backend with remote "55:/bk-2/"
2022/02/26 00:08:15 DEBUG : fs cache: renaming cache item "55:/bk-2/" to be canonical "55:bk-2"
2022/02/26 00:08:15 INFO  : Using md5 for hash comparisons
2022/02/26 00:08:15 DEBUG : S3 bucket bk-2: Waiting for checks to finish
2022/02/26 00:08:15 DEBUG : a: md5 = d41d8cd98f00b204e9800998ecf8427e OK
2022/02/26 00:08:15 DEBUG : a: OK
2022/02/26 00:08:15 DEBUG : a/c: md5 = d41d8cd98f00b204e9800998ecf8427e OK
2022/02/26 00:08:15 DEBUG : a/c: OK
2022/02/26 00:08:15 DEBUG : a/b: md5 = d41d8cd98f00b204e9800998ecf8427e OK
2022/02/26 00:08:15 DEBUG : a/b: OK
2022/02/26 00:08:15 NOTICE: S3 bucket bk-2: 0 differences found
2022/02/26 00:08:15 NOTICE: S3 bucket bk-2: 3 matching files
2022/02/26 00:08:15 INFO  :
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:                 3 / 3, 100%
Elapsed time:         0.0s

2022/02/26 00:08:15 DEBUG : 7 go routines active


Running without the --files-from option also works fine.
root@sds5 ~/x/t/tmp# rclone check 55:/bk-1/ 55:/bk-2/  -vv
2022/02/26 00:09:05 DEBUG : rclone: Version "v1.57.0" starting with parameters ["rclone" "check" "55:/bk-1/" "55:/bk-2/" "-vv"]
2022/02/26 00:09:05 DEBUG : Creating backend with remote "55:/bk-1/"
2022/02/26 00:09:05 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2022/02/26 00:09:05 DEBUG : fs cache: renaming cache item "55:/bk-1/" to be canonical "55:bk-1"
2022/02/26 00:09:05 DEBUG : Creating backend with remote "55:/bk-2/"
2022/02/26 00:09:05 DEBUG : fs cache: renaming cache item "55:/bk-2/" to be canonical "55:bk-2"
2022/02/26 00:09:05 INFO  : Using md5 for hash comparisons
2022/02/26 00:09:05 DEBUG : S3 bucket bk-2: Waiting for checks to finish
2022/02/26 00:09:05 DEBUG : a: md5 = d41d8cd98f00b204e9800998ecf8427e OK
2022/02/26 00:09:05 DEBUG : a: OK
2022/02/26 00:09:05 DEBUG : a/c: md5 = d41d8cd98f00b204e9800998ecf8427e OK
2022/02/26 00:09:05 DEBUG : a/b: md5 = d41d8cd98f00b204e9800998ecf8427e OK
2022/02/26 00:09:05 DEBUG : a/b: OK
2022/02/26 00:09:05 DEBUG : a/c: OK
2022/02/26 00:09:05 NOTICE: S3 bucket bk-2: 0 differences found
2022/02/26 00:09:05 NOTICE: S3 bucket bk-2: 3 matching files
2022/02/26 00:09:05 INFO  :
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:                 3 / 3, 100%
Elapsed time:         0.0s

2022/02/26 00:09:05 DEBUG : 7 go routines active


So I want to know:

1. What is the purpose of the --no-traverse option? I thought the default behaviour with --files-from was not to traverse the destination.
2. Why is the result different with and without --no-traverse?

The situation is that I have a large bucket that I have migrated to the destination. I have collected all the object keys in the bucket, and I want to run a data integrity check after the migration without listing the objects again.

Could you advise on the correct way to do this?

Thanks a lot.

I think the problem might be that you have a file called "a" and also a directory called "a" in your example.

Can you try without the file called "a"?

The situation is that I have a large bucket that I have migrated to the destination. I have collected all the object keys in the bucket, and I want to run a data integrity check after the migration without listing the objects again.

Listing the bucket is much cheaper than HEADing each object if you are checking all the objects - that is why --no-traverse is not the default.
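
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes an S3-style listing API that returns at most 1000 keys per request (the page size is an assumption; check your Ceph gateway's limit) and one HEAD request per file per side when --no-traverse is used. The function names are made up for illustration.

```python
import math

# Assumption: an S3-style list call returns at most 1000 keys per request.
LIST_PAGE_SIZE = 1000


def list_requests(total_objects, page_size=LIST_PAGE_SIZE):
    """Approximate number of list calls needed to enumerate a whole bucket."""
    return math.ceil(total_objects / page_size)


def head_requests(files_to_check):
    """Approximate number of HEAD calls when each file is addressed individually."""
    return files_to_check


total = 10_000_000
print(list_requests(total))  # 10000
print(head_requests(total))  # 10000000
```

Under these assumptions, addressing files individually only saves requests when the --files-from list is much shorter than the bucket (roughly fewer than one file per thousand objects). It does not account for the memory used while listing.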

Thanks @ncw .

  1. Yes, other objects without the prefix a are OK. Is this a bug?
  2. Thanks for the explanation, but my situation may be a little different. First, listing the bucket may consume a large amount of memory when the bucket contains tens of millions of objects or more. Second, I have an internal way to quickly get all the object keys (keys only, no ETag or size).
    Given these two conditions, would it be better to use --files-from with --no-traverse here?

One more question:
In the rclone documentation, the --no-traverse option is described as "The --no-traverse flag controls whether the destination file system is traversed when using the copy or move commands."

I guess that "the destination file system" here refers not only to the destination remote but also to the source remote, right?

I hesitate to call it a bug; it is more a consequence of rclone trying to squeeze a key-value store into something that looks like a file system.

Can you filter your list and remove all the "directory"-like objects, such as "a" in this example?
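
One possible way to do that filtering, as a minimal sketch: treat a key as "directory"-like when some other key in the list starts with that key plus "/". The function name split_directory_like is made up for this example, and it assumes keys are separated by "/" and have no trailing slash.

```python
def split_directory_like(keys):
    """Split keys into (plain, directory_like).

    A key is treated as directory-like when another key in the list
    starts with that key followed by "/".
    """
    key_list = [k.strip() for k in keys if k.strip()]

    # Collect every parent prefix that appears in any key, e.g. "a" for "a/b".
    prefixes = set()
    for key in key_list:
        parts = key.split("/")
        for i in range(1, len(parts)):
            prefixes.add("/".join(parts[:i]))

    plain = [k for k in key_list if k not in prefixes]
    directory_like = [k for k in key_list if k in prefixes]
    return plain, directory_like


plain, directory_like = split_directory_like(["a", "a/b", "a/c"])
print(plain)           # ['a/b', 'a/c']
print(directory_like)  # ['a']
```

The plain keys can then be written one per line to the file passed to --files-from, and the directory-like keys kept aside for a separate check.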

As long as you don't mind the HEAD requests for each object then it is fine.

--no-traverse only applies to the destination usually.

However, if you are using --files-from and --no-traverse together, this bit from the filtering docs comes into play. I think this applies to both source and destination.

If the --no-traverse and --files-from flags are used together an rclone command does not traverse the remote. Instead it addresses each path/file named in the file individually. For each path/file name, that requires typically 1 API call. This can be efficient for a short --files-from list and a remote containing many files.

After removing the "directory"-like objects from the --files-from list, it works fine.

But I would still like to check the "directory"-like objects, since my bucket may contain objects of this kind.

Good

Are they just directory markers, i.e. 0-length files? Or do they contain real data?
