Synchronizing data out of Egnyte

Hi,

I'm trying to synchronise ~1TB of data out of Egnyte to local disk. As suggested in Github issue #1068 I've tried using Webdav, however it takes in excess of six hours. I suspect it is downloading all files, since as far as I can tell there's no timestamp or checksum information available in their Webdav interface. Either that or there's some config setting I need to make it more efficient.

I've also tried accessing via SFTP, which does seem to be faster, but I hit "Commands quota exceeded" errors.

Any ideas how to efficiently sync that amount of data out of Egnyte?

hello and welcome to the forum,

  • the limiting factor will be the internet connection.
    make sure to do a speed test and know the max download speed.
    with that and using --progress, you can compare that speed to rclone's speed.
    based on that, if needed, might try the following.

  • really should use sftp, as it should support modtime/hash.

  • try to increase --transfers and --checkers

  • not sure if it will help, might try to tweak --multi-thread-streams (see the sketch after this list)
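
for example, something like this as a starting point - the remote name, local path and the numbers are just placeholders to experiment with, not tested values:

rclone sync remote: /path/to/local --progress --transfers=8 --checkers=16 --multi-thread-streams=8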

if that does not help: when you posted, there was a template of questions, none of which you answered.
please answer all the questions, help us to help you...

Are you trying to sync it multiple times, so keeping an off-site archive?

How many files is that 1TB? Webdav will have to do one request per file, I think, which adds up.
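
If you are not sure of the counts, something like this will report the total number of objects and their size (remote: is just a placeholder for your Egnyte remote, and with that many files it will take a while to run):

rclone size remote: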

1TB of data over 6 hours is about 46 MByte/s which sounds quite respectable to me. That is roughly 370 Mbit/s.

The first sync should download all the data - the second should be much quicker.

Are you trying to sync it multiple times, so keeping an off-site archive?

Yes, that's the plan.

How many files is that 1TB?

About 1,105,000 files in 83,700 directories.

The first sync should download all the data - the second should be much quicker.

The six hours was on a second sync. I suspect it is downloading all files, since as far as I can tell there's no timestamp or checksum information available in Egnyte's Webdav interface. Isn't that usually the case with basic Webdav? Either that or there's some config setting I need to make it more efficient.

Thanks.

if you think that rclone re-copies the same files each time rclone sync is run, this can be tested.

do a rclone copy -vv of a single file over webdav.
run the command two times.
post the entire output.

sftp should support timestamps and checksums.
do a rclone copy -vv of a single file over sftp
run the command two times.
post the entire output.
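
for example, something like this, with placeholder remote names and file path:

rclone copy -vv mywebdav:path/to/somefile.bin /tmp/davtest
rclone copy -vv mysftp:path/to/somefile.bin /tmp/sftptest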

please post the config file, redact id/secret/token/password

also, when you posted there was a template of questions, none of which you answered.
please answer all the questions, help us to help you...

Here you go. Thanks for the help so far! :slight_smile:

What is the problem you are having with rclone?

Attempting to synchronise files stored in Egnyte is quite slow. Egnyte offers both Webdav and SFTP interfaces.

I am trying to sync ~1TB of data daily for the purpose of keeping an off-site archive. The archive consists of about 1,105,000 files in 83,700 directories.

When using Webdav, repeated sync attempts are quite slow, in the order of six hours or more.

When using SFTP, it does seem to be faster, but I hit "Commands quota exceeded" errors.

What is your rclone version (output from rclone version)

rclone v1.56.0
- os/version: debian 8.11 (64 bit)
- os/kernel: 4.4.190.x86_64.1 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.16.5
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Egnyte, using either their Webdav or SFTP interface.

The command you were trying to run (eg rclone copy /tmp remote:tmp)

For Webdav:

rclone sync -P egnytedav: /data/egnyte-mirror

For SFTP:

rclone sync -P egnytessh: /data/egnyte-mirror

The rclone config contents with secrets removed.

[egnytedav]
type = webdav
url = https://mydomain.egnyte.com/webdav/
vendor = other
user = username
pass = password

[egnytessh]
type = sftp
host = ftp-mydomain.egnyte.com
user = username$mydomain
pass = password
md5sum_command = none
sha1sum_command = none

A log from the command with the -vv flag

Don't have one at this point, but can collect if necessary.

thanks but at this point, really need to see a debug log.

in my last post, i suggested a simple quick test to see if rclone is re-copying the same file each time rclone is run.

can you do that and post all the output.

thanks but at this point, really need to see a debug log.

Getting there.

After the initial copy:

For Webdav

Command:

rclone -vv --log-file=dav.log copy egnytedav:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe out

Log:

2021/09/07 11:07:14 DEBUG : rclone: Version "v1.56.0" starting with parameters ["rclone" "-vv" "--log-file=dav.log" "copy" "egnytedav:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe" "out"]
2021/09/07 11:07:14 DEBUG : Creating backend with remote "egnytedav:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe"
2021/09/07 11:07:14 DEBUG : Using config file from "/path/to/.config/rclone/rclone.conf"
2021/09/07 11:07:14 DEBUG : found headers:
2021/09/07 11:07:14 DEBUG : fs cache: adding new entry for parent of "egnytedav:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe", "egnytedav:Shared/J_Drive/Software/Java"
2021/09/07 11:07:14 DEBUG : Creating backend with remote "out"
2021/09/07 11:07:14 DEBUG : fs cache: renaming cache item "out" to be canonical "/path/to/temp/out"
2021/09/07 11:07:15 DEBUG : jdk-8u74-windows-x64.exe: Sizes identical
2021/09/07 11:07:15 DEBUG : jdk-8u74-windows-x64.exe: Unchanged skipping
2021/09/07 11:07:15 INFO  :
Transferred:              0 / 0 Byte, -, 0 Byte/s, ETA -
Checks:                 1 / 1, 100%
Elapsed time:         0.7s

2021/09/07 11:07:15 DEBUG : 4 go routines active

For SFTP

Command:

rclone -vv --log-file=sftp.log copy egnytessh:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe out

Log:

2021/09/07 11:07:57 DEBUG : rclone: Version "v1.56.0" starting with parameters ["rclone" "-vv" "--log-file=sftp.log" "copy" "egnytessh:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe" "out"]
2021/09/07 11:07:57 DEBUG : Creating backend with remote "egnytessh:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe"
2021/09/07 11:07:57 DEBUG : Using config file from "/path/to/.config/rclone/rclone.conf"
2021/09/07 11:07:59 DEBUG : sftp://username$mydomain@ftp-mydomain.egnyte.com:22/Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe: New connection xxx.xx.xx.x:58170->34.94.51.191:22 to "SSH-2.0-Egnyte"
2021/09/07 11:07:59 DEBUG : sftp://username$mydomain@ftp-mydomain.egnyte.com:22/Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe: Using absolute root directory "/Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe"
2021/09/07 11:07:59 DEBUG : fs cache: adding new entry for parent of "egnytessh:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe", "egnytessh:Shared/J_Drive/Software/Java"
2021/09/07 11:07:59 DEBUG : Creating backend with remote "out"
2021/09/07 11:07:59 DEBUG : fs cache: renaming cache item "out" to be canonical "/path/to/temp/out"
2021/09/07 11:08:00 DEBUG : jdk-8u74-windows-x64.exe: Size and modification time the same (differ by 0s, within tolerance 1s)
2021/09/07 11:08:00 DEBUG : jdk-8u74-windows-x64.exe: Unchanged skipping
2021/09/07 11:08:00 INFO  :
Transferred:              0 / 0 Byte, -, 0 Byte/s, ETA -
Checks:                 1 / 1, 100%
Elapsed time:         2.4s

2021/09/07 11:08:00 DEBUG : 11 go routines active

End notes

Interestingly, if I do a touch out/jdk-8u74-windows-x64.exe and then repeat the copy, Webdav does not detect a difference, which indicates Webdav is comparing purely on size.

SFTP did download the file again, so it is clearly comparing modification time as well. From that log run:

jdk-8u74-windows-x64.exe: Modification times differ by 49119h41m40.599236845s: 2016-01-30 20:38:50 +1100 AEDT, 2021-09-07 11:20:30.599236845 +1000 AEST
jdk-8u74-windows-x64.exe: Copied (replaced existing)
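
For reference, the test sequence after the initial copies was roughly:

touch out/jdk-8u74-windows-x64.exe
rclone -vv copy egnytedav:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe out   # skipped again: "Sizes identical"
rclone -vv copy egnytessh:Shared/J_Drive/Software/Java/jdk-8u74-windows-x64.exe out   # re-copied: modification times differ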

both logs show that rclone does not re-copy existing files.

your comments about webdav and sftp are correct and can be seen in the debug log.

  • webdav compares by size only - jdk-8u74-windows-x64.exe: Sizes identical
  • sftp compares by size and modtime - jdk-8u74-windows-x64.exe: Size and modification time the same

So I've been running several tests, and the following command line options seem to work well without tripping the "Commands quota exceeded" error when using SFTP.

rclone sync --transfers=4 --checkers=3 --retries-sleep=10s egnytessh: /data/egnyte-mirror

It does not appear to be particularly sensitive to the --transfers value, but if I increase --checkers from three to four I hit the command quota pretty quickly.

With the above, it generally does the synchronisation in about 6h45m using 1,105,549 checks. (This was for around 1,105,000 files in 83,700 directories.)

As there are not that many files that get modified between runs, the time taken is bound by the rate at which it can do the checks. Are there any other parameters I could tweak that might improve this?

Thanks.

given the hard limits imposed by the provider, not sure there is a good tweak.
might try --fast-list, though not sure it does anything on sftp.

the provider might have a backup tool.

Tried --fast-list but as you expected, it made no tangible difference.

There is a Hyper-V or VMware virtual machine they offer to do storage synchronisation, but we are not really in a position to use it. They used to have a storage sync app that ran on a Netgear ReadyNAS, which is what we used to use, but they are dropping support for it at the end of this month. Hence my seeking alternative solutions for creating an independent backup.

I've been working with their support people, but they seem reluctant to offer alternatives.

Unfortunately to do a sync you need to read all the directories to see if the files have changed - this takes a while...

--fast-list doesn't work unless the protocol supports a recursive directory listing mode which SFTP doesn't.

So that 6h45m is basically the time to read 83,700 directories - that is about 3.4 directories per second, which doesn't sound unreasonable.

If you want it to run faster, lowering the ping time between the servers would help (the SFTP protocol is sensitive to latency), but that is hard without moving the server.

It might be worth trying --check-first which does all the directory traversals before starting any transfers. This might enable you to raise --checkers which would cut the time down.
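
Something along these lines might be worth experimenting with - the --checkers value here is only a guess, so you'd need to see how high you can go before hitting the quota:

rclone sync --check-first --checkers=8 --transfers=4 --retries-sleep=10s -P egnytessh: /data/egnyte-mirror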
