"--no-traverse" does not work with "sync", possible improvement?

Hello everyone,

I’m copying a large local directory tree (with a few million files) to an Encrypted Google Drive remote.

Yesterday, my rclone “sync” command aborted with this error:

fatal error: runtime: out of memory

Coincidentally or not, that happened right after I upgraded to v1.35; v1.34 had been “copy’ing” (not “sync’ing”) just fine until I interrupted it for the upgrade. I’ve already tested with the latest v1.35 beta (the issue is exactly the same), and I’m now testing with v1.34 to see whether this is some kind of regression (it takes a few hours to consume all the memory and abort).

Anyway, I told myself: easy fix, just include the “--no-traverse” flag… no traversing means no reading of the entire remote directory’s file metadata into RAM, so no more “out of memory”. I did, but no can do: rclone just keeps reading everything from the remote, as demonstrated by a LOT of “Google drive root ‘REDACTED’: Reading ‘REDACTED’” and “Google drive root ‘REDACTED’: Finished reading ‘REDACTED’” messages, and by the rclone process memory usage climbing up to ~33GB(!) and then aborting with that same “out of memory” message in the log as above.

The exact command I’m using is:
rclone --no-traverse --bwlimit 4700K --low-level-retries=100 --checkers=12 --transfers=12 -v --log-file=./REDACTED.log sync /REDACTED egd:REDACTED

The local /REDACTED directory has ~8 million files totalling ~3.5 TBytes, of which ~6.2 million files totalling almost ~3.5 TBytes have already been copied (the ~1.8 million files still left to copy are mostly very small ones).

Can someone help? Am I missing something?

Thanks in advance,

Durval.

--no-traverse doesn’t work with sync. You should have seen a warning about this at the start of the log.

sync needs to traverse the whole filesystem to work out which files it needs to delete.
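
To illustrate the idea (just a sketch, not rclone’s actual code): working out the deletions needs the complete source and destination listings available at once, which is where the memory goes.

package main

// Sketch only (not rclone's actual code): anything listed in the
// destination but absent from the source has to be deleted, so both
// listings need to be known in full before the delete pass can run.

import "fmt"

func filesToDelete(src, dst []string) []string {
	inSrc := make(map[string]bool, len(src))
	for _, p := range src {
		inSrc[p] = true // every source path is held in memory
	}
	var toDelete []string
	for _, p := range dst { // and the whole destination listing too
		if !inSrc[p] {
			toDelete = append(toDelete, p)
		}
	}
	return toDelete
}

func main() {
	src := []string{"a.txt", "dir/b.txt"}
	dst := []string{"a.txt", "dir/b.txt", "dir/stale.txt"}
	fmt.Println(filesToDelete(src, dst)) // prints [dir/stale.txt]
}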

It would be interesting if you run

rclone memtest /REDACTED
rclone memtest egd:REDACTED

With both rclone 1.34 and the latest beta and post the results - that will give an accurate idea of whether there has been a regression in the memory usage.

What you really need is https://github.com/ncw/rclone/issues/517

Hi Nick,

Many thanks for your prompt response. More below:

So, there really was something I was missing… :-/ But:

  1. the docs (i.e., “man rclone”) aren’t explicit about it (I just read them again; they only say “when using the copy or move commands”, i.e. they don’t mention sync as one of the commands where it is supported). I think an explicit warning (a single phrase would suffice) would be good so an unwary user doesn’t fall into the same trap (or at least can’t complain when he does :wink: ).

  2. I just checked (again, as I had read it thoroughly before) and the log file has no warning regarding this; for the record, the first lines of said file say only:

    2017/01/19 12:45:44 Starting bandwidth limiter at 4.590MBytes/s with parameters ["rclone" "--bwlimit" "4700K" "--low-level-retries=100" "--checkers=12" "--transfers=12" "-v" "--log-file=./REDACTED.log" "sync" "/REDACTED" "egd:REDACTED"]
    2017/01/19 12:45:45 Encrypted Google drive root 'egt6ipr9ofpjnfac0587sqrd8g': Modify window is 1ms
    2017/01/19 12:45:45 Google drive root 'egt6ipr9ofpjnfac0587sqrd8g': Reading ""
    2017/01/19 12:45:45 Google drive root 'egt6ipr9ofpjnfac0587sqrd8g': Finished reading ""
    And then it’s just messages similar to these all the way down. (EDIT: @ncw is right; as he states below, the message was there and I just missed it. I’m noting this here so as not to lead anyone astray.)

  3. Wouldn’t it be more sensible to abort the command (with a suitable message to stderr, not just the log file) if an incompatible option is specified? It would sure have saved some pain here…

Do you want me to open an issue for 2), and perhaps 1) and 3), above?

Thanks for setting me straight. I was thinking that, with “--no-traverse”, “rclone sync” would download each directory’s list of files as it entered it in order to do the deletions, and then discard that list when moving on to the next dir (wouldn’t that make more sense?). Should I also open an “improvement” issue for that? (EDIT: I just found out that this is what https://github.com/ncw/rclone/issues/517 does… I would make that the default when “rclone sync” is called with “--no-traverse” (which is what IMHO would make more sense), with perhaps an option “--all-dirs-at-once” to change it to a full remote listing at the beginning.)
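
Just to illustrate what I had in mind, here is a rough sketch of the idea (mine only, not what #517 actually implements; all the names are made up):

package main

// Hypothetical sketch of a directory-at-a-time sync: each directory's
// listing is fetched, compared and discarded before descending, so
// memory use is bounded by the largest single directory rather than
// the whole tree. All names here are made up for illustration.

import "fmt"

// lister returns one directory's files and subdirectories on one side.
type lister func(dir string) (files map[string]bool, subdirs []string)

func syncDir(dir string, listSrc, listDst lister) {
	srcFiles, srcSubdirs := listSrc(dir)
	dstFiles, _ := listDst(dir)
	for name := range dstFiles {
		if !srcFiles[name] { // present on dst, gone from src
			fmt.Println("would delete:", dir+"/"+name)
		}
	}
	// srcFiles and dstFiles can be dropped here, before recursing.
	for _, sub := range srcSubdirs {
		syncDir(dir+"/"+sub, listSrc, listDst)
	}
}

func main() {
	listSrc := func(dir string) (map[string]bool, []string) {
		return map[string]bool{"a.txt": true}, nil
	}
	listDst := func(dir string) (map[string]bool, []string) {
		return map[string]bool{"a.txt": true, "old.txt": true}, nil
	}
	syncDir("REDACTED", listSrc, listDst)
}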

Will do, and post the results here when I’m finished.

Cheers,

Durval.

Yes, the docs could be clearer, certainly!

I can see the message when I try it. You only see the message with -v - maybe it should be a higher priority message?

I agree it is difficult to see because it is prefixed with the remote name - maybe I should remove that.

$ rclone sync -v --no-traverse /tmp/z /tmp/zz
2017/01/21 13:01:03 rclone: Version "v1.35-DEV" starting with parameters ["rclone" "sync" "-v" "--no-traverse" "/tmp/z" "/tmp/zz"]
2017/01/21 13:01:03 Local file system at /tmp/zz: Modify window is 1ns
2017/01/21 13:01:03 Local file system at /tmp/zz: Ignoring --no-traverse with sync
2017/01/21 13:01:03 Local file system at /tmp/zz: Waiting for checks to finish

My thinking was that the sync will still complete correctly, so it should only be a warning.

There are a number of other situations like this, where rclone adjusts the flags to be compatible and prints warnings: https://github.com/ncw/rclone/blob/master/fs/sync.go#L67
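
The pattern is roughly like this (a simplified sketch, not the exact code behind that link):

package main

// Simplified sketch (not the exact code behind that link): when an
// incompatible option is detected, rclone overrides it and logs a
// message instead of aborting, since the sync still completes correctly.

import "log"

type syncOpts struct {
	noTraverse bool // --no-traverse
	doDelete   bool // true for sync, false for copy/move
}

func adjustFlags(o *syncOpts) {
	if o.doDelete && o.noTraverse {
		// This is the message shown in the log above.
		log.Println("Ignoring --no-traverse with sync")
		o.noTraverse = false
	}
}

func main() {
	o := &syncOpts{noTraverse: true, doDelete: true}
	adjustFlags(o)
	log.Printf("noTraverse is now %v", o.noTraverse)
}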

I think I should definitely raise the priority of the message to an Error though.

I think we should

  • raise the warning for --no-traverse to be an ErrorLog (like the other messages in that bit of code)
  • patch the docs to explicitly state sync is not compatible with --no-traverse
  • remove the remote name from the log message (it isn’t relevant to any particular remote) which will make it stand out more

What do you think?

If you’re happy with that, can you make an issue with those in, please?

Thanks

Nick

Hello Nick,

I think your approach is perfect.

Sure thing, here it is: https://github.com/ncw/rclone/issues/1059

Thanks again,

Durval.

Here’s the output for /REDACTED:

rclone -V
     rclone v1.35-36-ga4bf22eβ
rclone memtest /REDACTED
    2017/01/21 20:09:43 7390002 objects took 4464056880 bytes, 604.1 bytes/object
    2017/01/21 20:09:43 System memory changed from 197019896 to 5209983552 bytes a change of 5012963656 bytes

rclone-v1.34 -V
    rclone v1.34
rclone-v1.34 memtest /REDACTED
    2017/01/21 20:09:43 7390002 objects took 4464056128 bytes, 604.1 bytes/object
    2017/01/21 20:09:43 System memory changed from 195938552 to 5209983328 bytes a change of 5014044776 bytes

So, for the local file system, there’s no regression re: memory use; the difference in memory usage from one to the other is negligible.

Running now for the encrypted remote.

Cheers,

Durval.

Hello @ncw,

And here they are:

rclone memtest egd:/REDACTED
    2017/01/22 17:27:24 6170814 objects took 5274399920 bytes, 854.7 bytes/object
    2017/01/22 17:27:24 System memory changed from 206997752 to 9440345800 bytes a change of 9233348048 bytes

rclone-v1.34 memtest egd:/REDACTED
     2017/01/22 16:58:30 6170814 objects took 5274908264 bytes, 854.8 bytes/object
    2017/01/22 16:58:30 System memory changed from 209264336 to 13842201440 bytes a change of 13632937104 bytes

So, it seems that the number of bytes/object is pretty much the same between the two versions… so my feeling of a regression (i.e., rclone v1.35 sucking up more memory than v1.34) was unfounded. Sorry for the false alarm.

Cheers,

Durval.

I’m at the stage where I’m looking for volunteers to try the new sync algorithm. It is low on memory use and efficient. It passes all the unit tests (there are lots) and my own tests, but I don’t guarantee it is bug-free yet.

Anyone want to give it a go?

I posted a beta to #517 - please try that and comment on the issue - thanks!