I have a specific use case in which I set up RClone to take an NFS server mount at, let's say, /nfs and sync it to our local-network S3-compatible endpoint. Typically this use case has been incredibly fast and efficient on resources (kudos on the awesome program). However, over the weekend I encountered a specific scenario where I have an NFS mount with no nested folders and 500,000 files, which causes RClone to seemingly hang for almost 4 hours before the sync actually starts.
Originally I was using v1.45, but I also tested with v1.46 and the latest beta build, v1.46.0-112-g1c301f9f-beta, to see if the newer versions fixed the issue, with no luck.
The original command itself is:
rclone sync \
source:/nfs \
destination:nfs-test \
--s3-region=nfs-sync \
--checkers=64 \
--s3-upload-concurrency=32 \
--transfers=32 \
-v
rclone.conf:
[source]
copy_links = false
nounc = true
type = local
[destination]
acl = bucket-owner-full-control
bucket = nfs-test
endpoint = http://local-endpoint
env_auth = true
provider = Other
region = nfs-sync
type = s3
Steps to Reproduce issue
I believe I’ve narrowed it down to a listing issue by simplifying the command. Doing this I was able to reproduce the same behavior in my local environment.
- Creating an nfs export
- I used these options:
/mnt/nfs-test/ *(rw,sync,fsid=1,no_subtree_check)
- Mounting the NFS:
mount -t nfs 127.0.0.1:/mnt/nfs-test /nfs
- Created ~128 GiB in 500,000 dummy files (268 KiB each) via
dd if=/dev/urandom bs=274432 count=500000 | split -a 6 -b 268k - /mnt/nfs-test/file.
- Run rclone ls:
$ ./rclone ls source:/nfs -vvv
2019/04/08 20:19:38 DEBUG : rclone: Version "v1.46.0-112-g1c301f9f-beta" starting with parameters ["./rclone" "ls" "source:/nfs" "-vvv"]
2019/04/08 20:19:38 DEBUG : Using config file from "/home/ubuntu/.config/rclone/rclone.conf"
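For anyone who wants to try the file-generation step without committing ~128 GiB of disk, here is a minimal scaled-down sketch. The variable names, the temporary directory, and the count of 20 files are illustrative assumptions, not part of my original setup. It uses head -c instead of dd so that exactly the requested number of bytes is fed to split.

```shell
#!/bin/sh
# Scaled-down sketch of the file-generation step: 20 files of 268 KiB each
# in a throwaway directory, instead of 500,000 files on the NFS export.
set -eu

TARGET_DIR="$(mktemp -d)"   # hypothetical stand-in for /mnt/nfs-test
FILE_COUNT=20               # the original run used 500000
FILE_BYTES=274432           # 268 KiB, the same size as `split -b 268k`

# head -c emits exactly FILE_COUNT * FILE_BYTES random bytes; split then
# cuts the stream into FILE_BYTES-sized files named file.aaaaaa, file.aaaaab, ...
head -c "$((FILE_COUNT * FILE_BYTES))" /dev/urandom |
    split -a 6 -b "$FILE_BYTES" - "$TARGET_DIR/file."

# Count what was actually created.
CREATED="$(find "$TARGET_DIR" -type f -name 'file.*' | wc -l | tr -d ' ')"
echo "created $CREATED files in $TARGET_DIR"
```

Raising FILE_COUNT and pointing TARGET_DIR at the exported directory should approximate the full-size setup, though I have only described the dd-based command above as what I actually ran.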
It appears that the process is hung, but it will eventually start showing output (maybe hours later). The issue seems to be specifically related to how RClone is reading the directory, but I’m not super familiar with the code base. I could see in nfsstat -s that during this long pause there is a large number of readdir calls going on.
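One way to check whether the slowness is in the directory read itself, independent of RClone, might be to time a plain listing of the mount. The sketch below is only a diagnostic idea under my own assumptions: time_cmd is a hypothetical helper, and ls is used as a rough proxy for "names only" versus "names plus attributes" access. It says nothing about how RClone lists directories internally.

```shell
#!/bin/sh
# Sketch: time a name-only listing against a listing that stats every entry.
set -eu

# time_cmd LABEL CMD...: run CMD, discard its output, print elapsed seconds.
time_cmd() {
    label="$1"
    shift
    start="$(date +%s)"
    "$@" > /dev/null
    end="$(date +%s)"
    echo "$label: $((end - start))s"
}

DIR="${1:-.}"   # pass /nfs (the mount point from this post) as the argument

# ls -f does not sort and prints names only, so it is close to a bare
# readdir pass over the directory.
time_cmd "names only (ls -f)" ls -f "$DIR"

# ls -l additionally needs the attributes of every entry (and sorts the names).
time_cmd "names + attributes (ls -l)" ls -l "$DIR"
```

Saved as, say, list-timing.sh (a made-up name), it would be run as `sh list-timing.sh /nfs` and then again against /mnt/nfs-test for comparison. If the name-only pass is already slow over NFS, that would point at the mount rather than at RClone.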
I don’t see the issue when running it directly on the /mnt/nfs-test folder (so not through NFS), but in my case I do not have access to the actual NFS server in question, only to the mount itself, which is what led me down this path.
I couldn’t immediately find any flags that seemed to help (I tried --fast-list but don’t think it works on local). I’m open to any suggestions, or even a pointer in the right direction for a bugfix/improvement contribution.