ls slow on large directories (~500 files) (Hubic)

Hi

I’ve just started using Hubic and found that when using “rclone mount” or “rclone ls” against the Hubic service, directory listings are awfully slow, and they get slower the more files a directory contains. As an example, I have a folder with 3 items in it (SmallFolder) and a folder with ~500 items in it (LargeFolder). Lots of applications time out and fail because of this. I get 50 Mbit/s from Hubic whenever I pull a file, and as the server I’m using is on OVH’s network, bandwidth shouldn’t be the problem. It’s just directory browsing that is awfully slow… the web interface at Hubic is snappy.

I’ve tried ignoring mod/change times when mounting, but it makes no difference. Is there anything else I can try?
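For reference, the sort of thing I mean is along these lines (illustrative only, not my exact command; /mnt/hubic is a placeholder mount point):

# --no-modtime tells rclone mount not to read/write modification times
rclone mount --no-modtime --read-only --allow-other crypto2: /mnt/hubic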

time rclone ls crypto2:SmallFolder
real 0m1.672s
user 0m0.152s
sys 0m0.004s

time rclone ls crypto2:LargeFolder
real 2m31.751s
user 0m0.264s
sys 0m0.052s

Current mount line - rclone mount --buffer-size 512M --dir-cache-time 60m --read-only --allow-other crypto2:

Let’s try to find out whether this is a Hubic problem or an rclone problem.

Can you run a verbose listing of the slow directory and post the results? Make sure you remove any auth tokens from the output. This should show whether it just takes Hubic a really long time to process the request or whether it is rclone doing something.
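Something along these lines should do it (illustrative; -vv turns on debug-level logging, --log-file writes it to a file, and LargeFolder matches the slow directory from the timings above):

time rclone -vv --log-file hubic-debug.log ls crypto2:LargeFolder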

I did what you asked and uploaded the debug file to - http://212.47.232.232/hubicDebug.txt

Looking through the file, it appears that the ls command is checking the information on each file, which takes approximately 180 ms per file (the latency between my server and Hubic). With 500 files, 0.18 s × 500 = 90 seconds, which is roughly what I saw in the debug information.

You know far more than me, but is it necessary to query each file like that? If it is, I’m guessing the only solution would be some kind of multi-threaded lookup.

Thanks for that - very useful.

I see what is going on…

The majority of these files (388 of them) are bigger than 5 GB, which means they’ve been uploaded as segmented files. Unfortunately Swift returns the length of such large files as 0 in directory listings, so rclone does an extra HEAD query to read the true length of any file listed with a length of 0.
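To illustrate what that per-file check amounts to at the HTTP level (the token, storage URL, container and object name below are placeholders, not taken from the debug log): a HEAD request on the segment manifest returns the real combined size in the Content-Length header, even though the listing reported 0 bytes.

# Placeholder token, storage URL and object name; the Content-Length of the
# HEAD response is the true size of the segmented object.
curl -sI -H "X-Auth-Token: $TOKEN" \
  "$STORAGE_URL/default/LargeFolder/bigfile.bin" | grep -i '^Content-Length'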

Alas, there isn’t a good workaround for that. However, rclone already works in a multithreaded way: try increasing --checkers to see if you can get the listing to run faster.
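For example (64 is just an arbitrary value to try):

time rclone ls --checkers 64 crypto2:LargeFolder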

I tried using 16/64/128/256 checkers and it has made no difference, unfortunately. Perhaps a solution is just to store up to a maximum number of files in each directory. Right now I have approximately 500, but it could be 2000 by the end of the year, and that would mean a simple “ls” taking 6-7 minutes to complete.

Thanks for your help and advice, much appreciated :slight_smile:

Hmm, after checking the code I see that it is one checker per directory, so yes, you are correct.

I could parallelize the reading of the size in the directory listing routine with a bit of effort. That is probably the best solution. I made an issue about this to remind me.
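To sketch the idea at the HTTP level (everything below is a placeholder; this just illustrates issuing the size-reading HEAD requests concurrently instead of one at a time):

# names.txt holds one object name per line; -P 16 runs up to 16 HEAD requests
# at once, and the Content-Length of each is that object's true size.
xargs -P 16 -I{} curl -sI -H "X-Auth-Token: $TOKEN" \
  "$STORAGE_URL/default/LargeFolder/{}" < names.txt | grep -i '^content-length'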

Another idea would be to fetch the Size when it is read; however, rclone expects that not to be an expensive operation, so that would cause ls to slow down in the same way.

That would work too.