Directory traversal strategy for depth vs breadth?

When copying or syncing a large amount of files and directories, I was wondering if there was some way to control whether the transfers operated depth-first, or breadth-first, or simply what the expected traversal method would currently be?

For example, if I have the following tree:

foo/1
foo/2
foo/3
foo/bar/4
foo/bar/5
foo/bar/6
foo/buzz/7
foo/buzz/8
foo/buzz/9
foo/bar/fizz/10
foo/bar/fizz/11
foo/bar/fizz/12

In a depth-first sync I would expect files 4,5,6 and 10,11,12 to be transferred before exiting out of foo/bar and starting on foo/buzz.

In a breadth-first copy I would expect 4,5,6 and 7,8,9 to be transferred before 10,11,12

I hope I'm explaining this clearly. I apologize if this is covered elsewhere, I did search docs and the forum but was unsuccessful. Thank you for helping me learn more!

The answer is that the order isn't defined. By default rclone will run 8 (controlled by --checkers) directory traversals at once, so anything could happen! If you set --checkers 1 then you'll get a breadth-first copy (roughly) but even that can be disrupted by concurrency within rclone.

If you care about which order the files get transferred then investigate the --order-by flag and if you want perfect ordering don't forget --check-first which does the entire directory traversal before uploading any files.

Thanks Nick! I appreciate your explanation. Is it conceivable that --check-first --order-by name could provide a depth-first transfer? Naively I am assuming this could ensure sub-directories would be traversed before the next parent's sibling, etc.

Imagine all the full file names written out in a list. Then sort that list. That would be the order. So that isn't depth first.

It would be possible to make a depth first order (transfer the file with the largest numbers of slashes first) fairly easily.

Fancy doing some coding?

You're right, it's not depth-first if we define that as meaning transferring of the deepest most items first, but it seems at least defensibly "depth-first" as the traversal is deterministic and will stay confined to the parent until completion, right?

Thanks for the idea of contributing. My Go is a little amateurish but I'll have a noodle and see if I get to a PR. :slight_smile:

To implement this you want to add a new case here - say "depth" and add a new sort function

I'd sort by number of / first and if they are equal, the full paths (the output of a.Src.Remote()) which will keep the sort order deterministic. You can count number of / with strings.Count.

That should be enough to get your started. There are tests in pipe_test and docs in docs/content/docs.md but fundamentally it should be a 6 line patch to the code.

1 Like

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.