I'm trying to synchronise ~1TB of data out of Egnyte to local disk. As suggested in GitHub issue #1068 I've tried using Webdav, however it takes in excess of six hours. I suspect it is downloading all files, as, as far as I can tell, there's no timestamp or checksum information available in their Webdav interface. Either that, or there's some config setting I need to make it more efficient.
I've also tried accessing via SFTP, which does seem to be faster, but I hit "Commands quota exceeded" errors.
Any ideas how to efficiently sync that amount of data out of Egnyte?
The limiting factor will be the internet connection.
Make sure to do a speed test and know the max download speed.
With that and using --progress, you can compare that speed to rclone's speed.
Based on that, if needed, you might try the following.
Really should use sftp, as it should support modtime/hash.
Try increasing --transfers and --checkers.
Not sure if it will help, but you might try tweaking --multi-thread-streams.
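As a sketch, those flags fit together along these lines — the remote name, paths, and flag values here are illustrative assumptions, not settings tested against Egnyte's quotas:

```shell
# Illustrative sketch only: remote name, paths, and values are assumptions.
# Raise --transfers/--checkers gradually while watching --progress output.
rclone sync egnytessh:Shared/J_Drive /local/archive \
  --progress \
  --transfers 8 \
  --checkers 8 \
  --multi-thread-streams 4
```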
If that does not help: when you posted, there was a template of questions, none of which you answered.
Please answer all the questions; help us to help you...
Are you trying to sync it multiple times, i.e. keeping an off-site archive?
Yes, that's the plan.
How many files is that 1TB?
About 1,105,000 files in 83,700 directories.
The first sync should download all the data - the second should be much quicker.
The six hours was on a second sync. I suspect it is downloading all files, as, as far as I can tell, there's no timestamp or checksum information available in Egnyte's Webdav interface. Isn't that usually the case with basic Webdav? Either that, or there's some config setting I need to make it more efficient.
If that does not help: when you posted, there was a template of questions, none of which you answered.
Please answer all the questions; help us to help you...
Here you go. Thanks for the help so far!
What is the problem you are having with rclone?
Attempting to synchronise files stored in Egnyte is quite slow. Egnyte offers both Webdav and SFTP interfaces.
I am trying to sync ~1TB of data daily for the purposes of keeping an off-site archive. The archive comprises about 1,105,000 files in 83,700 directories.
When using Webdav, repeated sync attempts are quite slow, in the order of six hours or more.
When using SFTP, it does seem to be faster, but I hit "Commands quota exceeded" errors.
What is your rclone version (output from rclone version)
Thanks, but at this point I really need to see a debug log.
Getting there.
After the initial copy:
For Webdav
Command:
rclone -vv --log-file=dav.log copy egnytedav:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe out
Log:
2021/09/07 11:07:14 DEBUG : rclone: Version "v1.56.0" starting with parameters ["rclone" "-vv" "--log-file=dav.log" "copy" "egnytedav:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe" "out"]
2021/09/07 11:07:14 DEBUG : Creating backend with remote "egnytedav:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe"
2021/09/07 11:07:14 DEBUG : Using config file from "/path/to/.config/rclone/rclone.conf"
2021/09/07 11:07:14 DEBUG : found headers:
2021/09/07 11:07:14 DEBUG : fs cache: adding new entry for parent of "egnytedav:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe", "egnytedav:Shared/J_Drive/Software/Java"
2021/09/07 11:07:14 DEBUG : Creating backend with remote "out"
2021/09/07 11:07:14 DEBUG : fs cache: renaming cache item "out" to be canonical "/path/to/temp/out"
2021/09/07 11:07:15 DEBUG : jdk-8u74-windows-x64.exe: Sizes identical
2021/09/07 11:07:15 DEBUG : jdk-8u74-windows-x64.exe: Unchanged skipping
2021/09/07 11:07:15 INFO :
Transferred: 0 / 0 Byte, -, 0 Byte/s, ETA -
Checks: 1 / 1, 100%
Elapsed time: 0.7s
2021/09/07 11:07:15 DEBUG : 4 go routines active
For SFTP
Command:
rclone -vv --log-file=sftp.log copy egnytessh:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe out
Log:
2021/09/07 11:07:57 DEBUG : rclone: Version "v1.56.0" starting with parameters ["rclone" "-vv" "--log-file=sftp.log" "copy" "egnytessh:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe" "out"]
2021/09/07 11:07:57 DEBUG : Creating backend with remote "egnytessh:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe"
2021/09/07 11:07:57 DEBUG : Using config file from "/path/to/.config/rclone/rclone.conf"
2021/09/07 11:07:59 DEBUG : sftp://username$mydomain@ftp-mydomain.egnyte.com:22/Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe: New connection xxx.xx.xx.x:58170->34.94.51.191:22 to "SSH-2.0-Egnyte"
2021/09/07 11:07:59 DEBUG : sftp://username$mydomain@ftp-mydomain.egnyte.com:22/Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe: Using absolute root directory "/Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe"
2021/09/07 11:07:59 DEBUG : fs cache: adding new entry for parent of "egnytessh:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe", "egnytessh:Shared/J_Drive/Software/Java"
2021/09/07 11:07:59 DEBUG : Creating backend with remote "out"
2021/09/07 11:07:59 DEBUG : fs cache: renaming cache item "out" to be canonical "/path/to/temp/out"
2021/09/07 11:08:00 DEBUG : jdk-8u74-windows-x64.exe: Size and modification time the same (differ by 0s, within tolerance 1s)
2021/09/07 11:08:00 DEBUG : jdk-8u74-windows-x64.exe: Unchanged skipping
2021/09/07 11:08:00 INFO :
Transferred: 0 / 0 Byte, -, 0 Byte/s, ETA -
Checks: 1 / 1, 100%
Elapsed time: 2.4s
2021/09/07 11:08:00 DEBUG : 11 go routines active
End notes
Interestingly, if I do a touch out/jdk-8u74-windows-x64.exe and then repeat the copy, Webdav does not detect a difference, which indicates Webdav is comparing purely on size.
SFTP did download the file again, so its comparison clearly takes modification time into account as well. From that log run:
jdk-8u74-windows-x64.exe: Modification times differ by 49119h41m40.599236845s: 2016-01-30 20:38:50 +1100 AEDT, 2021-09-07 11:20:30.599236845 +1000 AEST
jdk-8u74-windows-x64.exe: Copied (replaced existing)
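The experiment above, as a sketch, using the same remotes and paths as the logs earlier (the touch just bumps the local modtime without changing size or content):

```shell
# Bump the local copy's modtime; size and content stay the same.
touch out/jdk-8u74-windows-x64.exe

# Webdav remote: compares on size only, so it reports "Unchanged skipping".
rclone -vv copy egnytedav:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe out

# SFTP remote: sees the modtime difference and re-downloads the file.
rclone -vv copy egnytessh:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe out
```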
So I've been running several tests, and I've found command-line options that seem to work well without tripping "Commands quota exceeded" when using SFTP.
It does not appear to particularly care about the --transfers number, but if I increase --checkers from three to four I hit the command quota pretty quickly.
With the above, it generally does the synchronisation in about 6h45m using 1,105,549 checks. (This was for around 1,105,000 files in 83,700 directories.)
As there are not that many files modified between runs, the time taken is bound by the rate at which it can do the checks. Are there any other parameters I could tweak that might improve this?
I tried --fast-list but, as you expected, it made no tangible difference.
There is a Hyper-V or VMware virtual machine they offer to do storage synchronisation, but we are not really in a position to use it. They used to have a storage sync app that ran on a Netgear ReadyNAS, which is what we used to use, but they are dropping support for it at the end of this month. Hence my seeking alternative solutions for creating an independent backup.
I've been working with their support people, but they seem reluctant to offer alternatives.
Unfortunately, to do a sync you need to read all the directories to see if the files have changed - this takes a while...
--fast-list doesn't work unless the protocol supports a recursive directory listing mode which SFTP doesn't.
So that 6h45m is basically the time to read 83,700 directories - that is 3.4 per second, which doesn't sound unreasonable.
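That per-directory rate works out as follows (just a quick arithmetic check on the numbers in this thread):

```python
# Check rate implied by the figures above:
# 83,700 directories listed in 6h45m.
elapsed_s = 6 * 3600 + 45 * 60           # 24,300 seconds
directories = 83_700
rate = directories / elapsed_s           # directories listed per second
print(f"{rate:.1f} directories/second")  # → 3.4
```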
If you want it to run faster, lowering the ping time between the servers would help (the SFTP protocol is sensitive to that), but that is hard without moving the server.
It might be worth trying --check-first which does all the directory traversals before starting any transfers. This might enable you to raise --checkers which would cut the time down.
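A sketch of that suggestion — the --checkers and --transfers values here are guesses to experiment with, not known-safe numbers for Egnyte's quota, and the paths are illustrative:

```shell
# --check-first: finish all directory traversal and checking before any
# transfers start. The checker count is illustrative; raise it gradually
# and watch the log for "Commands quota exceeded" errors.
rclone sync egnytessh:Shared/J_Drive /local/archive \
  --check-first \
  --checkers 8 \
  --transfers 4 \
  -v --log-file=sync.log
```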