Syncrclone vs bisync (and rclonesync-v2)

I've been sherlocked. As of v1.58.0, rclone now has bisync. syncrclone works in a fundamentally different way, as described in syncrclone vs bisync (and rclonesync-v2). For the time being, I fully plan to continue development. To be 100% clear, I have no hard feelings about it.

But I wrote the below documentation for syncrclone if anyone is interested.

Again, this is a good-faith comparison. Please correct any errors and I will update the documentation.




syncrclone vs bisync (and rclonesync-v2)

Preface

Let's be 100% clear and upfront. I AM BIASED. I wrote syncrclone to fill a need not met by bisync / rclonesync-v2.

Additionally, both are great tools and I am not disparaging the hard work of the various developers! The thing to remember is that bi-directional synchronization is fundamentally harder than mirroring because you need to keep the state. And therefore, it is also more opinionated.

rclone bisync is based on rclonesync-v2, so I will just say bisync even though some of this is experience from the other.

One final note: These may be wrong. I based them on experience, reading, discussions, and inference from said activities. I do this in good faith but may be wrong. Please correct me!

Algorithm Differences

Before I go into details on the differences in features, there is a fundamental difference in algorithms. It is best described as follows.

  • bisync separately compares the current state to the past state to generate changes. It then propagates those changes and resolves conflicts.
  • syncrclone first compares the current state of each machine and then uses past state to resolve conflicts and deduce needed changes.

Both theoretically produce the same final result. But the latter only implicitly needs the prior state, while the former requires it to work. As such, syncrclone does not need to be told that this is the first sync and has only a single code path.
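As a rough illustration of the second approach (a minimal sketch, not syncrclone's actual code), the function below compares the current listings of the two sides first and only consults the previous listings to decide what a difference means. All names here (`plan_sync`, the action strings, the dict-of-signatures layout) are hypothetical and chosen for this example only.

```python
def plan_sync(curr_a, curr_b, prev_a=None, prev_b=None):
    """Plan actions from the *current* listings, using previous ones as hints.

    Each listing is a dict mapping path -> signature (any comparable value,
    e.g. a hash or a (size, mtime) tuple). Names are hypothetical.
    """
    prev_a = prev_a or {}  # no previous state is fine: same code path
    prev_b = prev_b or {}
    actions = []
    for path in sorted(set(curr_a) | set(curr_b)):
        a, b = curr_a.get(path), curr_b.get(path)
        if a == b:
            continue  # both sides already agree
        if b is None:
            # Only on A: either new on A, or deleted on B since the last sync.
            if path in prev_b and prev_a.get(path) == a:
                actions.append(("delete_on_A", path))
            else:
                actions.append(("copy_A_to_B", path))
        elif a is None:
            # Only on B: mirror image of the case above.
            if path in prev_a and prev_b.get(path) == b:
                actions.append(("delete_on_B", path))
            else:
                actions.append(("copy_B_to_A", path))
        else:
            # On both sides but different: see which side changed.
            a_changed = prev_a.get(path) != a
            b_changed = prev_b.get(path) != b
            if a_changed and not b_changed:
                actions.append(("copy_A_to_B", path))
            elif b_changed and not a_changed:
                actions.append(("copy_B_to_A", path))
            else:
                actions.append(("conflict", path))
    return actions
```

Calling `plan_sync(curr_a, curr_b)` with no previous state never yields a delete: files on only one side are copied to the other (the union), and files that differ are flagged as conflicts.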

It also explains some (but not all) of the feature differences below.

Comparisons

Again, these are in good faith but may be wrong. Please correct them as needed.

| Feature | syncrclone | bisync | Comments |
|---|---|---|---|
| Mode of configuration | Config file (path specified implicitly or explicitly) | Command line flags | Command line is cleaner and more consistent with rclone but there is a lot of configuration that needs to be kept (e.g. filters, etc.), which makes the config file really useful. Makes the sync directories more like a repo (a la git) |
| Change Propagation | Compare *current* state and use previous to resolve conflicts | Propagate differences between current and previous to both sides, then compare | *Theoretically* both should be identical but syncrclone is more robust to issues with knowing the previous state. It also removes the need for `--first-sync` type flags and other safety mechanisms. If they have never been synced before, you get the union of the two sides, which is safer. No deletes. It also better matches how rclone currently does the comparison |
| First sync mode | Implicit. Same code path | Explicit. Must handle differently | |
| Filters and effects | Can *safely* change filters except for `--include-if-present` | Must rerun with first-sync mode. Loses some conflict detection | This difference is due to the algorithm and when filters get applied |
| Comparisons | ModTime, size, and/or hash | ModTime | Reliance on ModTime *severely* limits which remotes can be used with bisync. ModTimes can also be fragile when restoring from backups. |
| Previous state data | Default: Stored inside each remote and named based on a unique name for the pair. Alternative: Can be stored on any *other* rclone remote | Global storage on the machine itself in a cache-like dir | Saved state on the machine means that if you sync two remotes (e.g. OneDrive to Google Drive), you *need* to use the same machine. Also can lead to issues with paths and duplicates. However, saved state on the remotes leaves artifacts. Syncrclone can be configured to use a different remote but it is more complex and not default |
| Rename/Move Tracking | Optional. Settable with ModTime + size, hash, or size alone (though the latter is not advisable) | None that I am aware of | |
| Reduce re-hashing of files | Optional. Can keep previous hashes. Or with new hasher remote. | Hasher remote | Hasher remote didn't exist when syncrclone was first made. Both should work fine. Like saving the previous state, the hasher is in a cache-like dir |
| File backups | Optional. Either in the remote or a different one | None | |
| Conflict tagging | Yes. Optional | ??? | i.e. keep both but rename one with a suffix |
| Delete Fail Safe | None (except backups) | Yes. Can set a max-deletes | Has other protections including the default design which will not delete from mis-specified remotes |
| Second file listing | Yes, but experimental feature to not need it | Yes | Syncrclone's experimental feature may soon be default. Saves a lot of time on slow-to-list remotes |
| User Support | Minimal | Forums, professional developers. Larger community | I am a hobby developer and syncrclone is a side hobby project |
| Platforms | Tested on macOS and Linux. No idea if this works on Windows | Presumably all platforms | I never tested syncrclone on Windows. If it doesn't work, it very likely can be fixed to work. |
| Install | Must install python3. More complex | None. Built in | |
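To make the "Comparisons" row more concrete, here is a minimal sketch of a configurable file comparison that can use ModTime, size, or hash. It is an illustration only, not syncrclone's or bisync's implementation; `FileInfo`, `same_file`, the `compare` values, and `mtime_tolerance` are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    """Hypothetical listing entry for one file on a remote."""
    size: int
    mtime: Optional[float] = None  # seconds since epoch; None if unsupported
    hash: Optional[str] = None     # None if the remote provides no hashes


def same_file(a, b, compare="mtime", mtime_tolerance=1.0):
    """Return True if two entries look identical under the chosen attribute.

    Size is always checked first since it is cheap and nearly universal.
    """
    if a.size != b.size:
        return False
    if compare == "size":
        return True
    if compare == "mtime":
        if a.mtime is None or b.mtime is None:
            raise ValueError("ModTime not available on one of the remotes")
        # Allow some slack because remotes store ModTime at different precision.
        return abs(a.mtime - b.mtime) <= mtime_tolerance
    if compare == "hash":
        if a.hash is None or b.hash is None:
            raise ValueError("hash not available on one of the remotes")
        return a.hash == b.hash
    raise ValueError(f"unknown compare mode: {compare!r}")
```

A remote without reliable ModTime can still be compared with `compare="hash"` or `compare="size"`, which is why being limited to ModTime restricts the choice of remotes.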

Definitely..! Thanks for the work into syncrclone! It made my day!

I have several WebDAV clients accessing one WebDAV server for my notes sync on NextCloud (rule: newer always overwrites older). syncrclone does exactly the job and is fast.

rclone bisync runs into issues and is also slow. I really don't understand the bisync --resync procedure. A bisync should be able to start from two current file listing states and do its job. If I need an initial sync, I can still use rclone sync itself.

Also, rclone bisync had problems with file modification times. --no-update-modtime --modify-window 15000ms made the issue better, but ultimately it broke again when I used it on a second client.

I’m glad it worked out well! They are very different tools for sure with some real pros and cons

Pro-tip: turn on the avoid_relist (or something like that, I’m going from memory). I need to make it the default but it makes things much faster for many remotes with only a few edge-case downsides.


I think the syncrclone algo is superior, since bisync seems to only work for a single user and is also not robust against some other changes. Also, using mod-time as the checksum is working better in my use case.

I dug a bit into the bisync code, but I guess I'd really have to spend time to port syncrclone to Go, since I'm still a bit of a Go newbie. bisync: implementation #5164 · rclone/rclone@6210e22 · GitHub

Also, for cloud storage, SHA1 should be an option. With MD5 you could run into collisions with today's terabytes of data.

Fascinating, thanks for sharing!