Listing Speed Issue

I have a specific use case in which I set up rclone to take an NFS server mount at, let's say, /nfs and sync it to our local-network S3-compatible endpoint. Typically this use case has been incredibly fast and efficient on resources (kudos on the awesome program). However, over the weekend I ran into a specific scenario: an NFS mount with no nested folders and 500,000 files causes rclone to seemingly hang for almost 4 hours before the sync actually starts.

Originally I was using v1.45, but I also tested v1.46 and the latest beta build v1.46.0-112-g1c301f9f-beta to see whether the newer versions fixed the issue, with no luck.

The original command itself is:

rclone sync \
      source:/nfs \
      destination:nfs-test \
      --s3-region=nfs-sync \
      --checkers=64 \
      --s3-upload-concurrency=32 \
      --transfers=32 \
      -v

rclone.conf:

    [source]
    copy_links = false
    nounc = true
    type = local

    [destination]
    acl = bucket-owner-full-control
    bucket = nfs-test
    endpoint = http://local-endpoint
    env_auth = true
    provider = Other
    region = nfs-sync
    type = s3

Steps to Reproduce the Issue

I believe I’ve narrowed it down to a listing issue by simplifying the command; doing this, I was able to reproduce the same behavior in my local environment.

  • Creating an NFS export. I used these options:
      /mnt/nfs-test/ *(rw,sync,fsid=1,no_subtree_check)
  • Mounting the NFS share:
      mount -t nfs 127.0.0.1:/mnt/nfs-test /nfs
  • Creating ~128GB in 500,000 dummy files via:
      dd if=/dev/urandom bs=274432 count=500000 | split -a 6 -b 268k - /mnt/nfs-test/file.
  • Running rclone ls:
    $ ./rclone ls source:/nfs -vvv
    2019/04/08 20:19:38 DEBUG : rclone: Version "v1.46.0-112-g1c301f9f-beta" starting with parameters ["./rclone" "ls" "source:/nfs" "-vvv"]
    2019/04/08 20:19:38 DEBUG : Using config file from "/home/ubuntu/.config/rclone/rclone.conf"
    

The process appears to be hung but will eventually start producing output (possibly hours later). The issue seems to be specifically related to how rclone reads the directory, but I’m not super familiar with the code base. Watching nfsstat -s during this long pause, I could see a heavy volume of readdir calls going on.
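In case it helps anyone reproduce the observation, something along these lines works for watching the counters (the grep pattern is just a sketch, since nfsstat output varies between versions):

    # refresh the server-side NFS op counters every 2s and highlight what changed
    watch -d -n 2 'nfsstat -s | grep -i -A1 readdir'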

I don’t see the issue when running directly against the /mnt/nfs-test folder (i.e. not through NFS), but in my case I don’t have access to the actual NFS server in question, only to the mount itself, which is what led me down this path.

I couldn’t immediately find any flags that seemed to help (I tried --fast-list, but I don’t think it applies to the local backend). I’m open to any suggestions, or even a pointer in the right direction for a bugfix/improvement contribution.

It would be useful if you could strace rclone when it is hung.

What happens if you ls -R the directory and how about ls -lR?

rclone uses the slightly less common getdents64 syscall and I have seen problems with this in the past.

It might be something like this
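A quick way to quantify it, assuming the same remote names as above: strace -c just prints a summary of call counts and cumulative time, so it shows how much of the wall clock goes into getdents64.

    # count getdents64 calls and the total time spent in them while rclone lists the mount
    strace -f -e trace=getdents64 -c ./rclone ls source:/nfs > /dev/null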

This is the output and timing for ls -R:

$ time ls -R | head
.:
myobjects000000
myobjects000001
myobjects000002
myobjects000003
myobjects000004

real    135m36.373s
user    0m3.278s
sys     8m10.171s

and for ls -lR

 time ls -lR | head
.:
total 134000000
-rw-r--r-- 1 root root 274432 Apr  5 19:29 myobjects000000
-rw-r--r-- 1 root root 274432 Apr  5 19:29 myobjects000001
-rw-r--r-- 1 root root 274432 Apr  5 19:29 myobjects000002

real    0m56.710s
user    0m3.240s
sys     0m8.133s

It’s really interesting that the ls -lR is significantly faster. I’ll post back with the strace once it’s done.

The first isn’t cached and the second is, which is why it’s faster.

I went back and re-ran the ls -R and it completed in about 5 seconds, so it definitely seems to hit a cache the second time around.

But ls -lR doesn’t seem to be affected by the caching, because rerunning it yields the same timings. I even tried restarting the NFS service, recreating the export, and mounting on a completely different machine, yet still got virtually the same result for ls -lR:

time ls -lR | head
.:
total 134000000
-rw-r--r-- 1 root root 274432 Apr  5 19:29 myobjects000000
-rw-r--r-- 1 root root 274432 Apr  5 19:29 myobjects000001
-rw-r--r-- 1 root root 274432 Apr  5 19:29 myobjects000002

real    0m59.337s
user    0m4.924s
sys     0m6.104s
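For what it’s worth, one way to force a genuinely cold client-side cache between runs (assuming root on the client) is to drop the VFS caches, or simply remount; a sketch:

    # flush dirty pages, then drop the page cache, dentries and inodes on the client
    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches
    # or take the blunter route and remount the share
    sudo umount /nfs && sudo mount -t nfs 127.0.0.1:/mnt/nfs-test /nfs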

@ncw I can upload the full strace, but it’s pretty verbose. What I see is essentially this on repeat after running strace -fT rclone ls source:/nfs:

...
10612 lstat("/nfs/myobjects215329", {st_mode=S_IFREG|0644, st_size=274432, ...}) = 0 <0.000072>
10612 lstat("/nfs/myobjects215328", {st_mode=S_IFREG|0644, st_size=274432, ...}) = 0 <0.000078>
10608 <... pselect6 resumed> )          = 0 (Timeout) <0.010068>
10608 epoll_pwait(4, [], 128, 0, NULL, 1424120) = 0 <0.000011>
10608 pselect6(0, NULL, NULL, NULL, {0, 10000000}, NULL <unfinished ...>
10612 mmap(0xc4322b0000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc4322b0000 <0.000017>
10612 mmap(0xc41f6e2000, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc41f6e2000 <0.000015>
10612 getdents64(5,  <unfinished ...>
10608 <... pselect6 resumed> )          = 0 (Timeout) <0.010068>
10608 epoll_pwait(4, [], 128, 0, NULL, 1424120) = 0 <0.000011>
10608 pselect6(0, NULL, NULL, NULL, {0, 10000000}, NULL) = 0 (Timeout) <0.010065>
10608 epoll_pwait(4, [], 128, 0, NULL, 1424120) = 0 <0.000010>
10608 futex(0xc4204a6148, FUTEX_WAKE, 1 <unfinished ...>
10614 <... futex resumed> )             = 0 <0.120430>
10608 <... futex resumed> )             = 1 <0.000023>
10614 futex(0xc4204a6148, FUTEX_WAIT, 0, NULL <unfinished ...>
10608 pselect6(0, NULL, NULL, NULL, {0, 20000}, NULL) = 0 (Timeout) <0.000083>
10608 futex(0x19da610, FUTEX_WAIT, 0, {60, 0} <unfinished .
10612 <... getdents64 resumed> /* 102 entries */, 4096) = 4080 <15.888090>
10612 futex(0x19da610, FUTEX_WAKE, 1)   = 1 <0.000103>
10608 <... futex resumed> )             = 0 <15.873985>
10612 getdents64(5,  <unfinished ...>
...

You can see that this getdents64 call (<... getdents64 resumed> /* 102 entries */, 4096) = 4080) takes over 15.8 seconds, so each getdents64 round trip only brings back about 100 entries, and right after it the futex on the other thread resumes with a matching ~15.8 second wait, since that thread was blocked for the duration of the getdents64 call.

It definitely seems related to the issue you posted. The old default of 4096 matches the buffer size shown in strace. It looks like they increased that buffer to 8192 in Go 1.11, but I think the default NFS response buffer is 32k anyway. I’ll try mounting my NFS with a lower rsize and report back.
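The remount I have in mind looks something like this (paths from my repro above; nfsstat -m shows what the client actually negotiated):

    sudo umount /nfs
    sudo mount -t nfs -o rsize=4096,wsize=4096 127.0.0.1:/mnt/nfs-test /nfs
    # confirm the transfer sizes the client ended up with
    nfsstat -m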

Wow! Yes that looks like it is the problem.

OK, let me know what happens - thanks.

So I tested it out with the rsize=4096,wsize=4096 mount options and definitely saw an improvement.

$ time ./rclone ls source:/nfs | head
   274432 myobjects000000
   274432 myobjects000001
   274432 myobjects000002

real    4m5.322s
user    0m6.247s
sys     0m38.749s

Out of curiosity, I also recompiled and tested with Go 1.11, which increased the buffer to 8192 as a result of the issue you linked, and remounted using rsize=8192,wsize=8192. The results were virtually the same:

time ./rclone-go1.11 ls source:/nfs | head
   274432 myobjects000000
   274432 myobjects000001
   274432 myobjects000002

real    4m12.914s
user    0m15.435s
sys     0m42.271s

I guess the takeaway is to match the NFS buffer size to the one used by getdents64, otherwise some weird queuing starts to happen?

I think that is the conclusion. I don’t have much idea why, though, which is a little unsatisfactory!
