Best options to copy hundreds of thousands of files using WebDAV

Frederic_Bastok · March 30, 2021, 3:29pm

Hi

I'm trying to optimize the copy of hundreds of thousands of small files using the WebDAV protocol. The source is the WebDAV server and the destination is a local folder. Bandwidth doesn't seem to be a limiting factor.
The server seems to be IO limited.
I've tried different options, using --check-first and then --checkers=50, but with limited success.
It seems that scanning the source folder to check which files must be copied is so slow that the transfers can't be completed during the night. I can check around 20 GB in 60 min, so it will take 30 h to check the 600 GB, not even counting the time to transfer the files.
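For reference, the estimate above can be reproduced with a small sketch. It assumes the checking rate stays constant over the whole tree, which may not hold; the function name is only illustrative.

```python
def estimated_scan_hours(total_gb: float, checked_gb: float, elapsed_minutes: float) -> float:
    """Extrapolate the total checking time from an observed sample.

    Assumes a constant checking rate, which is a simplification.
    """
    rate_gb_per_hour = checked_gb / (elapsed_minutes / 60.0)
    return total_gb / rate_gb_per_hour


# Figures quoted in the post: about 20 GB checked in 60 minutes, 600 GB in total.
hours = estimated_scan_hours(total_gb=600, checked_gb=20, elapsed_minutes=60)
print(f"Estimated checking time: {hours:.0f} h")  # prints "Estimated checking time: 30 h"
```

With many small files, the number of files and directories probably matters more than the volume in GB, so a files-per-minute figure may extrapolate better than this one.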
Is there any optimization I could use to reduce this checking time?
Thanks
Fred.

ncw · March 30, 2021, 5:18pm

Which server is the webdav server?

You can use more --checkers, which will scan more directories at once, though I see you've tried that. If you think the server is IO limited then using fewer checkers might be a good idea (the default is 8), and --check-first is a good idea here too.

Are you copying these files just once or repeatedly?

Frederic_Bastok · March 30, 2021, 7:22pm

We use IIS with IT Hit WebDAV Server (https://www.webdavsystem.com).
I will try with fewer checkers. I've tried --check-first; it didn't change much.
The copy runs every night. Transferring should be fast as not many files change each day, but scanning the files is the real issue.

ncw · March 31, 2021, 8:20am

It does sound like you are IO limited. Has the server got HDD instead of SSD? Scanning directories on HDD is often quite time consuming.

Frederic_Bastok · March 31, 2021, 11:58am

It should have an SSD, but I'm wondering if it works correctly...

ncw · March 31, 2021, 7:16pm

It might be interesting to time rclone lsf -R remote: and rclone lsl remote: to see how long they take. The first does a quick scan of each filename; the second reads the size and date, which can take longer. They should both be relatively quick...
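One way to take those timings is a small wrapper like the sketch below. It assumes rclone is on the PATH and that "remote:" stands in for the configured WebDAV remote; the helper names are hypothetical, and only the two commands quoted above are used.

```python
import subprocess
import time
from typing import Sequence


def time_command(argv: Sequence[str]) -> float:
    """Run argv to completion, discard its stdout, and return the elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(list(argv), stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


# "remote:" is a placeholder for the configured WebDAV remote.
LISTING_COMMANDS = [
    ["rclone", "lsf", "-R", "remote:"],  # file names only
    ["rclone", "lsl", "remote:"],        # names plus size and modification time
]


def time_listings() -> None:
    """Time each listing command in turn and print the result."""
    for argv in LISTING_COMMANDS:
        print(" ".join(argv), f"took {time_command(argv):.1f} s")
```

Calling time_listings() runs both listings one after the other. The output is discarded because only the elapsed time is of interest here.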

Frederic_Bastok · April 2, 2021, 6:31am

I stopped rclone lsf -R remote: after it had run for 2 h, and it was far from finished. I would say it's slow...

ncw · April 3, 2021, 7:36pm

Sounds like your WebDAV server is very slow if it takes 2 h just to list the directories.

Experiment with --checkers to find the value which is fastest for your server. You could try listing a subdirectory with a bit less stuff in it to do the experiment.
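The experiment could be scripted roughly as below. This is only a sketch: "remote:subdir" is a placeholder for a smaller subdirectory, the candidate values are arbitrary, and the function names are hypothetical. The timer is passed in as a parameter so the comparison logic can be tried without touching the server.

```python
import subprocess
import time
from typing import Callable, Dict, Iterable, List


def listing_command(remote_path: str, checkers: int) -> List[str]:
    """Build a recursive listing command with a given --checkers value."""
    return ["rclone", "lsf", "-R", "--checkers", str(checkers), remote_path]


def run_and_time(argv: List[str]) -> float:
    """Run the command, discard its output, and return the elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


def benchmark_checkers(
    remote_path: str,
    values: Iterable[int],
    timer: Callable[[List[str]], float] = run_and_time,
) -> Dict[int, float]:
    """Time one listing per --checkers value and return {value: seconds}."""
    return {n: timer(listing_command(remote_path, n)) for n in values}


def fastest(results: Dict[int, float]) -> int:
    """Return the --checkers value with the shortest elapsed time."""
    return min(results, key=results.get)
```

For example, fastest(benchmark_checkers("remote:subdir", [1, 2, 4, 8, 16])) would return the quickest of those five values. If the server caches directory listings, later runs may look faster than they really are, so repeating the runs in a different order is a reasonable cross-check.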

Frederic_Bastok · April 16, 2021, 10:10am

@ncw my understanding is that rclone first scans all files to check which ones should be transferred and then does the transfers.
Is there any option to check one file, transfer it, check the next one, transfer it, and so on?
That would make it slower, I think, but should help me.
Thanks

ncw · April 17, 2021, 3:17pm

Rclone does the checking and transferring in parallel. --checkers sets the number of directories scanned in parallel, so you could try setting --checkers 1 to slow that part down.
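As an illustration, the nightly command could be assembled as in the sketch below. The helper name, "remote:" and "/local/folder" are placeholders rather than values from this thread; only the --checkers and --check-first flags discussed above are used.

```python
from typing import List


def copy_command(source: str, dest: str, checkers: int = 8, check_first: bool = False) -> List[str]:
    """Build an rclone copy command line.

    checkers=8 mirrors the default mentioned earlier in the thread.
    """
    argv = ["rclone", "copy", source, dest, "--checkers", str(checkers)]
    if check_first:
        argv.append("--check-first")
    return argv


# "remote:" and "/local/folder" are placeholders, not values from the thread.
print(" ".join(copy_command("remote:", "/local/folder", checkers=1)))
# prints "rclone copy remote: /local/folder --checkers 1"
```

Building the command as a list keeps it easy to vary the --checkers value between nightly runs and compare how far each run gets.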
