I have an rclone mount with a directory that contains ~3500 child directories. After clearing any caches (by HUP'ing, using vfs/expire and/or cache/forget), a directory listing can take upwards of 6 minutes. The logs show rclone iterating through the child directories, and it apparently returns no data to the waiting thread until the whole listing completes. Some applications like "ls" handle this fine, while others lock up and become unresponsive, requiring the process to be killed.
Oftentimes the kernel will log warnings about the stalled threads or the application that issued the call.
This can be mitigated somewhat by using large cache expiry timeouts, but that only makes the issue occur less often while introducing problems of its own.
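For reference, the knob I mean is the VFS directory cache lifetime on the mount; something like the following (the value is illustrative, not a recommendation):

```
rclone mount GD:/Crypt /mnt/gdrive \
  --dir-cache-time 72h
```

The trade-off is that changes made remotely won't show up until the cache entry expires or is invalidated manually (as I'm doing with vfs/expire and cache/forget), which is one of the other problems mentioned above.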
I've run numerous tests using "rclone ls" to see whether some combination of flags would ameliorate the issue, to no avail. The two flags with the most effect were "--fast-list" (positive) and "--dir-chunk-size" with values less than 1000 (negative).
time ( rclone ls GD:/Crypt/qtven3b8rorak2le5h4htl55hg --verbose=2 )
2018/04/12 10:55:23 DEBUG : rclone: Version "v1.40-065-ge82452ceβ" starting with parameters ["/usr/bin/rclone" "--verbose=2" "ls" "GD:/Crypt/qtven3b8rorak2le5h4htl55hg" "--verbose=2"]
2018/04/12 10:55:23 INFO : Google drive root 'Crypt/qtven3b8rorak2le5h4htl55hg': Modify window is 1ms
2018/04/12 11:04:03 DEBUG : 18 go routines active
2018/04/12 11:04:03 DEBUG : rclone: Version "v1.40-065-ge82452ceβ" finishing with parameters ["/usr/bin/rclone" "--verbose=2" "ls" "GD:/Crypt/qtven3b8rorak2le5h4htl55hg" "--verbose=2"]
( rclone ls GD:/Crypt/qtven3b8rorak2le5h4htl55hg --verbose=2; ) 2.77s user 1.15s system 0% cpu 7:05.92 total
With all that said, I don't know if the bug lies with rclone, fuse, or the applications that handle the long directory listing times poorly. I was curious if anyone else had run into this and found a solution.
What's your mount command, and what commands are you running when you list the directory?
My biggest dir is only 1750 entries and it doesn’t work well if I do the first cache build by doing something like “ls -alR | wc -l” in there. If I let Plex scan the dir, it seems to step through in a nicer fashion as I think it does an ls at the top level and then goes through each folder.
I wonder if the problem scales exponentially rather than linearly once you cross some threshold.
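For what it's worth, a back-of-envelope sketch (my assumptions: one Drive list call per child directory, roughly 100 ms per call; neither figure is from the logs) suggests plain linear cost is already enough to explain the ~6 minutes reported above:

```python
# Not a benchmark, just arithmetic: if every child directory needs its own
# listing API call made sequentially, total time grows linearly with the
# number of directories.
def estimated_listing_seconds(num_dirs, per_call_latency_s):
    """Rough sequential cost: one list call per directory."""
    return num_dirs * per_call_latency_s

# ~3500 dirs at an assumed ~0.1 s per Drive list call
est = estimated_listing_seconds(3500, 0.1)
print(f"{est:.0f} s (~{est / 60:.1f} min)")  # → 350 s (~5.8 min)
```

That would also fit with --fast-list helping, since it batches the listing work into far fewer calls.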
I used rclone ls only to test what effect various parameters had on the speed of recursively listing a directory, but it didn't yield anything useful. As you've pointed out, most of the variation was probably due to outside variables and not relevant.
I can produce the results reliably and have a verbose log saved. This is the part containing what was asked:
Thanks. From that log (the directory listings inside the ReadDirAll and the >ReadDirAll) it is very clear that this is a problem with the cache backend recursing (or maybe looking only 1 more directory deep) on the directory read.
I had a look at the code, but I couldn't see anything obviously wrong… I think it is probably best if you summarise this in a new issue on github and we can get @remus.bunduc to take a look at it.