Rclone check reports missing files that are present

I’m using rclone to copy a set of directories from a local file system to a mounted Swift Proxy FS file system. After the copy is complete, I run the check command to verify everything goes over correctly. Unfortunately, the check is reporting “File not in Local file system” even though I can verify that the files are in both file systems. I’ve tried copying the files by hand and end up with the same results.

Steps to reproduce

  1. Run rclone copy /opt/sourceDir /opt/targetDir -c --skip-links -v --log-file copy.log
  2. Run rclone check /opt/sourceDir /opt/targetDir -c --skip-links -v --log-file check.log
  3. Review check.log and see “ERROR somefile.txt: File not in Local file system at /opt/targetDir”
  4. Run ls -l /opt/sourceDir/path/to/somefile.txt /opt/targetDir/path/to/somefile.txt and see the same file in both directories with the same file size.
  5. Run sha1sum /opt/sourceDir/path/to/somefile.txt /opt/targetDir/path/to/somefile.txt and see the same checksum for both files.

Additional information:

rclone v1.40

  • os/arch: linux/amd64
  • go version: go1.10

I’m not sure what else I can include to be helpful - please let me know. We would like to use rclone to copy several TB of smallish files and the check command is critical to the process.

Joe

What is providing that mounted FS? Is it rclone too?

Could it be a case sensitivity problem? If copying from a case sensitive fs you can have “HELLO.txt” and “hello.txt” but the destination (if case insensitive) might merge them into one file which would explain your results.

Is it possible you could post a log of both the rclone commands with -vv? Or if you want to keep them private, zip them and email them to me at nick@craig-wood.com with a link to this forum post please!

Thanks for getting back to me Nick.

Swift has a new Proxy FS feature that allows you to mount one of your Swift accounts as a local file system. We are using this feature to allow our legacy systems to continue to operate while we move our data to an object based store.

I’ve tried running a couple of different scenarios trying to figure out what the differences are. Here are the things I’ve tried and noticed:

  • The directories that have reported check errors have more than 80 files in the subdirectories.
  • There are no case sensitivity issues: the file names, sizes, and checksums are the same. I can also view the contents of the file from each directory with no problems.
  • Removing the whole directory and re-running rclone copy then rclone check shows an error on the same file every time.
  • Removing the file from the target directory and re-running copy gives unusual -vv output. I was expecting to only see the one file being copied, but it was that file and the “next” in sequence that listed “Copied (new)”

rclone-copy-oneFile-take2.log - this was after removing just the screen/fax000132803-0088.png file from the target directory:

2018/04/03 10:59:26 DEBUG : screen/fax000132803-0019.png: Unchanged skipping
2018/04/03 10:59:26 INFO  : screen/fax000132803-0088.png: Copied (new)
2018/04/03 10:59:26 INFO  : screen/fax000132803-0089.png: Copied (new)
2018/04/03 10:59:26 DEBUG : screen/fax000132803-0020.png: Size and MD5 of src and dst objects identical

It doesn’t look like I can upload the full logs here, so I will send them to you directly. Hopefully, you see something that can help.

Thanks!

That is strange… And it is ringing a vague bell - last time I saw something like that it was to do with duplicated file names

Can you send me the result of rclone --skip-links ls /opt/cmsdata/01/30/b2b407384101a5ab7f610a9eb600/ and rclone --skip-links ls /opt/ss_cmsdata/01/30/b2b407384101a5ab7f610a9eb600/ and also the results of just a plain ls on those two directories?

BTW is this what you are using? https://github.com/swiftstack/ProxyFS

HI Nick,

I have run the commands you requested as well as a sha1sum of the files and will send the results to you via email. The checksums match on all files.

find /opt/cmsdata/01/30/b2b407384101a5ab7f610a9eb600/ -type f -exec sha1sum {} +
find /opt/ss_cmsdata/01/30/b2b407384101a5ab7f610a9eb600/ -type f -exec sha1sum {} +

I also did a rclone check where I reversed the order of the arguments (dest src vs src dest) and did not see any difference. It is still reporting that file missing from the destination.

Looking through the rclone ls results, I see that the file is listed in the source results, but not in the destination results. I ran both commands several times with the same results. Regular ls shows the files in both source and destination. The only real difference that I see between the regular listings is that the owner and group change.

We are using the commercial version of Swift and I believe the ProxyFS link is to the open source version of what we are using. I couldn’t tell you if there are any differences between the commercial version and the open source version. That said, I have also reached out to the Swift team who is interested in this issue and are trying to replicate things on their end.

Thanks!

Hmm, yes, that is quite convincingly the cause of the problem

$ diff -u rclone-ls-cmsdata-take1.log rclone-ls-ss_cmsdata-take1.log 
--- rclone-ls-cmsdata-take1.log	2018-04-04 13:56:02.000000000 +0100
+++ rclone-ls-ss_cmsdata-take1.log	2018-04-04 13:56:02.000000000 +0100
@@ -82,7 +82,6 @@
     58254 screen/fax000132803-0085.png
     32172 screen/fax000132803-0086.png
     62341 screen/fax000132803-0087.png
-    19809 screen/fax000132803-0088.png
     60051 screen/fax000132803-0089.png
     35474 screen/fax000132803-0090.png
     55331 screen/fax000132803-0091.png

This could be an rclone problem a go standard library problem or a SwiftFS problem…

It would be interesting to try stracing a list of src and dst

strace -f rclone ls /path/to/files 2>&1 | grep dent

This should produce something like this…

[pid  3279] getdents64(3,  <unfinished ...>
[pid  3279] <... getdents64 resumed> /* 3 entries */, 4096) = 80
[pid  3279] getdents64(3, /* 0 entries */, 4096) = 0
[pid  3279] getdents64(3, /* 0 entries */, 4096) = 0
[pid  3285] getdents64(3, /* 86 entries */, 4096) = 4080
[pid  3285] getdents64(3, /* 80 entries */, 4096) = 3840
[pid  3285] getdents64(3, /* 0 entries */, 4096) = 0
[pid  3285] getdents64(3, /* 0 entries */, 4096) = 0

The first few are something else, but the 86 entries and 80 entries are the actual listing data (80+86 = 166 which is 164 + ‘.’ and ‘…’)

What do you get? That will tell us whether the kernel actually returned the right number of entries or not which will point the finger at either ProxyFS or rclone/go stdlib.

This would be interesting too for src and dst:

strace -f rclone ls /path/to/files 2>&1 | grep fax000132803-0088.png

It should produce something like

[pid  3760] lstat("/tmp/src/screen/fax000132803-0088.png", {st_mode=S_IFREG|0664, st_size=6, ...}) = 0
[pid  3760] write(1, "        6 screen/fax000132803-00"..., 39        6 screen/fax000132803-0088.png

Outputs small enough to share here :slight_smile:

strace -f rclone ls /opt/cmsdata/01/30/b2b407384101a5ab7f610a9eb600/ 2>&1 | grep dent

[pid   569] getdents64(3, /* 3 entries */, 4096) = 80
[pid   569] getdents64(3, /* 0 entries */, 4096) = 0
[pid   569] getdents64(3, /* 0 entries */, 4096) = 0
[pid   569] getdents64(3, /* 86 entries */, 4096) = 4080
[pid   569] getdents64(3, /* 80 entries */, 4096) = 3840
[pid   569] getdents64(3, /* 0 entries */, 4096) = 0
[pid   569] getdents64(3, /* 0 entries */, 4096) = 0 

strace -f rclone ls /opt/ss_cmsdata/01/30/b2b407384101a5ab7f610a9eb600/ 2>&1 | grep dent

[pid   767] getdents64(3, /* 3 entries */, 4096) = 80
[pid   767] getdents64(3, /* 0 entries */, 4096) = 0
[pid   767] getdents64(3, /* 0 entries */, 4096) = 0
[pid   767] getdents64(3,  <unfinished ...>
[pid   767] <... getdents64 resumed> /* 86 entries */, 4096) = 4080
[pid   767] getdents64(3,  <unfinished ...>
[pid   767] <... getdents64 resumed> /* 41 entries */, 4096) = 1968
[pid   767] getdents64(3,  <unfinished ...>
[pid   767] <... getdents64 resumed> /* 38 entries */, 4096) = 1824
[pid   767] getdents64(3,  <unfinished ...>
[pid   767] <... getdents64 resumed> /* 0 entries */, 4096) = 0
[pid   767] getdents64(3, /* 0 entries */, 4096) = 0

strace -f rclone ls /opt/cmsdata/01/30/b2b407384101a5ab7f610a9eb600/ 2>&1 | grep fax000132803-0088.png

[pid  1048] lstat("/opt/cmsdata/01/30/b2b407384101a5ab7f610a9eb600/screen/fax000132803-0088.png", {st_mode=S_IFREG|0644, st_size=19809, ...}) = 0
[pid  1048] write(1, "    19809 screen/fax000132803-00"..., 39    19809 screen/fax000132803-0088.png

strace -f rclone ls /opt/ss_cmsdata/01/30/b2b407384101a5ab7f610a9eb600/ 2>&1 | grep fax000132803-0088.png produces nothing.

I don’t quite understand the output, but I’m guessing that there is some sort of buffering or pause and resume going on in the ss_cmsdata dent trace.

I’m not sure about ehther the unfinished/resume is important or not - system calls are allowed to get interrupted and resumed.

However what is important is that the number of directory entries returned was different. For the first it was 86+80 = 166 and for the second it was 86+41+38 = 165 so a directory entry was missing.

This means that it probably isn’t rclone’s bug. It could still be the go runtime, the kernel or ProxyFS.

I mashed together a little C program which calls the getdents64 system call in exactly the same way go does (ls, find, python call the getdents syscall not the getdents64 syscall).

If you run it like this ./getdents64 /path it will print a sort of directory listing - see if it contains the missing file or not.

$ ./getdents64 /tmp/src/screen/ | grep fax000132803-0088.png
 6303570  ???          48 7322309755386057212  fax000132803-0088.png

And running it like this will show whether it gets the same number of directory entries for the source or dest

$ strace -f ./getdents64 /tmp/src/screen/ 2>&1 | grep dent
execve("./getdents64", ["./getdents64", "/tmp/src/screen/"], [/* 63 vars */]) = 0
getdents64(3, /* 86 entries */, 4096)   = 4080
getdents64(3, /* 80 entries */, 4096)   = 3840
getdents64(3, /* 0 entries */, 4096)    = 0

You can find the program and source here https://www.craig-wood.com/nick/pub/getdents64.zip

I’ve just found an open go issue about this: https://github.com/golang/go/issues/24015

From that issue it looks like it is a linux kernel bug rather than anything else.

It would be interesting if you compare your kernel versions with those in that thread too.

I tried your program and did not get a result:

./getdents64 /opt/cmsdata/01/30/b2b407384101a5ab7f610a9eb600/screen | grep fax000132803-0088.png
403349240  ???          48        447  fax000132803-0088.png

./getdents64 /opt/ss_cmsdata/01/30/b2b407384101a5ab7f610a9eb600/screen | grep fax000132803-0088.png
# no result

Out of curiosity I wanted to know why ls works but rclone doesn’t so ran the strace command on it. Turns out it’s using getdents() instead of getdents64() under the covers:

strace -f ls -lR /opt/cmsdata/01/30/b2b407384101a5ab7f610a9eb600/ 2>&1 | grep dent
getdents(3, /* 3 entries */, 32768)     = 80
getdents(3, /* 0 entries */, 32768)     = 0
getdents(3, /* 166 entries */, 32768)   = 7920
getdents(3, /* 0 entries */, 32768)     = 0

strace -f ls -lR /opt/ss_cmsdata/01/30/b2b407384101a5ab7f610a9eb600/ 2>&1 | grep dent
getdents(3, /* 3 entries */, 32768)     = 80
getdents(3, /* 0 entries */, 32768)     = 0
getdents(3, /* 128 entries */, 32768)   = 6096
getdents(3, /* 38 entries */, 32768)    = 1824
getdents(3, /* 0 entries */, 32768)     = 0

I see your comment about a kernel bug on the golang board and will see if there is anything fruitful there.

Thanks!