Rclone bisync at the exact time the file is changed

What is the problem you are having with rclone?

rclone bisync runs in the background every 5 minutes between local (Linux) and Google Drive. One of the directories it syncs has a bunch of Markdown files which I'm regularly updating. Often I'm changing one file at the exact same moment that rclone bisync runs, and it fails with:

2024/04/20 23:18:12 INFO  : Validating listings for Path1 "Archive{YRXYK}:/" vs Path2 "/home/user/Archive/"
2024/04/20 23:18:12 ERROR : -                   Size not equal in listing. Path1: 1883, Path2: 1569 - Notes/rclone-debug.md
2024/04/20 23:18:12 ERROR : -                   Modtime not equal in listing. Path1: 2024-04-20 21:18:06.447 +0000 UTC, Path2: 2024-04-20 21:17:54.330635919 +0000 UTC - Notes/rclone-debug.md
2024/04/20 23:18:12 ERROR : Bisync critical error: path1 and path2 are out of sync, run --resync to recover
2024/04/20 23:18:12 ERROR : Bisync aborted. Must run --resync to recover.

I've tried a few things with no success:

  • Tried the exact command under --check-access in the docs and got the same error.
  • Read the concurrent modifications section in the docs and learned that with v1.66.0 there were some improvements done, but it doesn't solve this problem. I believe this problem falls under this sentence:

note that an error can still occur if a file happens to change at the exact moment it's being read/written by bisync

  • Compiled a custom version where I got rid of this check. The first run works and then, as it's now obvious to me, the second run finds a conflict in the file since both files have slightly different contents.

I tried to go through Github to see if there were relevant discussions about this, but I couldn't find them (maybe because I didn't know what to search).

So my question is: is there a way to workaround this and still have a valid workflow? I wouldn't mind doing syncs without checks every 5 minutes, and then doing a global check every couple hours.

And, if there is no workaround, are there any on-going or past discussions about this issue and how it could be hypothetically solved?

Run the command 'rclone version' and share the full output of the command.

rclone v1.66.0
- os/version: arch (64 bit)
- os/kernel: 6.8.5-arch1-1 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.22.1
- go/linking: dynamic
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Google Drive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone bisync Archive: ~/Archive \
    --verbose \
    --metadata \
    --checksum \
    --ignore-listing-checksum \
    --check-access \
    --no-cleanup \
    --resilient \                 
    --drive-skip-gdocs

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

[Archive]
type = drive
client_id = XXX
client_secret = XXX
scope = drive
token = XXX
team_drive = 

A log from the command that you were trying to run with the -vv flag

2024/04/20 23:17:51 NOTICE: bisync is IN BETA. Don't use in production!
2024/04/20 23:17:51 NOTICE: WARNING: Checksum compare was requested but at least one remote does not support checksums (or checksums are being ignored) and --ignore-listing-checksum is set.
			 Ignoring Checksums globally and falling back to --compare modtime,size for sync. (Use --compare size or --size-only to ignore modtime). Path1 (Google drive root ''): none, Path2 (Local file system at /home/user/Archive): none
2024/04/20 23:17:51 INFO  : Bisyncing with Comparison Settings: 
{
	"Modtime": true,
	"Size": true,
	"Checksum": false,
	"HashType1": 0,
	"HashType2": 0,
	"NoSlowHash": false,
	"SlowHashSyncOnly": false,
	"SlowHashDetected": true,
	"DownloadHash": false
}
2024/04/20 23:17:51 INFO  : Synching Path1 "Archive{YRXYK}:/" with Path2 "/home/user/Archive/"
2024/04/20 23:17:51 INFO  : Building Path1 and Path2 listings
2024/04/20 23:18:04 INFO  : Path1 checking for diffs
2024/04/20 23:18:04 INFO  : Path2 checking for diffs
2024/04/20 23:18:04 INFO  : - Path2             File changed: size (larger), time (newer)   - Notes/.obsidian/workspace.json
2024/04/20 23:18:04 INFO  : - Path2             File changed: size (larger), time (newer)   - Notes/rclone-debug.md
2024/04/20 23:18:04 INFO  : Path2:    2 changes:    0 new,    2 modified,    0 deleted
2024/04/20 23:18:04 INFO  : (Modified:    2 newer,    0 older,    2 larger,    0 smaller)
2024/04/20 23:18:04 INFO  : Checking access health
2024/04/20 23:18:04 INFO  : Found 1 matching "RCLONE_TEST" files on both paths
2024/04/20 23:18:04 INFO  : Applying changes
2024/04/20 23:18:04 INFO  : - Path2             Queue copy to Path1                         - Archive{YRXYK}:/Notes/.obsidian/workspace.json
2024/04/20 23:18:04 INFO  : - Path2             Queue copy to Path1                         - Archive{YRXYK}:/Notes/rclone-debug.md
2024/04/20 23:18:04 INFO  : - Path2             Do queued copies to                         - Path1
2024/04/20 23:18:12 INFO  : Notes/rclone-debug.md: Copied (replaced existing)
2024/04/20 23:18:12 INFO  : Notes/.obsidian/workspace.json: Copied (replaced existing)
2024/04/20 23:18:12 INFO  : Updating listings
2024/04/20 23:18:12 INFO  : Validating listings for Path1 "Archive{YRXYK}:/" vs Path2 "/home/user/Archive/"
2024/04/20 23:18:12 ERROR : -                   Size not equal in listing. Path1: 1883, Path2: 1569 - Notes/rclone-debug.md
2024/04/20 23:18:12 ERROR : -                   Modtime not equal in listing. Path1: 2024-04-20 21:18:06.447 +0000 UTC, Path2: 2024-04-20 21:17:54.330635919 +0000 UTC - Notes/rclone-debug.md
2024/04/20 23:18:12 ERROR : Bisync critical error: path1 and path2 are out of sync, run --resync to recover
2024/04/20 23:18:12 ERROR : Bisync aborted. Must run --resync to recover.

There might be some bisync specific workaround others can point you into but in general in such situations the most obvious and cleanest solution is to utilize local file system snapshots.

I would wrap bisync command in script taking snapshot, running bisync against it and then deleting snapshot. Of course its feasibility depends on your local setup capabilities.

Personally I always use snapshots when running operations (backup, sync) that assume not changing state of data.

Below some simple overview and explanation:

https://cyounkins.medium.com/correct-backups-require-filesystem-snapshots-23062e2e7a15

Nowadays snapshots are available on all main OS - be it Windows, Linux or macOS - so really there is no excuse not to do it right:)

You can use --check-sync=false to disable the final check that is failing. However, this shouldn't really be necessary after v1.66, so there is probably something else going on here, which I'd love to identify.

One odd thing I notice in your log is that the version of rclone-debug.md on the Path1 side (drive) ends up with a modtime that is roughly 12 seconds newer than the one on the Path2 side (local). Are you editing the local file in such a way that the modtime reverts to an older date?

I have one other theory which could be confirmed by seeing a DEBUG (-vv) log, if it's possible to share one. The one you provided here appears to be only at INFO (-v) level. Feel free to post a pastebin.com link if the log is too large to embed.

@kapitainsky - Thanks a lot for the information, I wasn't even aware some filesystems suspported this. Sadly my current setup seems to not support it, but if I don't find a better approach I may create a partition and give it a try.

I'm specially curious how/if it will work with bisync though, because I send data but I could also receive data, so I'm curious how that works with snapshots.

@nielash - Let me give DEBUG a try, but I've done some troubleshooting in the past and I noticed the following:

  • At the start of the sync: both Path1 and Path2 had a size of 626 bytes and the same modtime.
  • At the end of the sync (and when the error happened): Path1 (drive) had a size of 719 (and a newer modtime) whereas Path2 (local) had a size of 683 (and a slightly older modtime).

What I thought when I saw these numbers is that when the new listing is being built, a size and modtime are stored. However when the upload is actually done the file has already changed, so that's how Path1 (drive) ends up with a bigger size and newer modtime.

1 Like

In that scenario, what is supposed to happen is that both the Path1 and Path2 listings are updated to reflect the state of the file at the moment it was transferred (not at the moment the first snapshot was taken.) So it would be really helpful to see a DEBUG log to determine why that's not happening. (Or, if you have steps to reliably reproduce it, that's even better.)

This is valid point I have not thought about - what works perfectly for sync source would not for bisync. No use of filesystem snapshots in this case indeed.

The easiest way to reproduce that I've found is with Obsidian, which is a Markdown editor that saves the file as you edit it.

What I do is:

  1. Open Obsidian and a file which is being backed up to Google Drive.

  2. Open a terminal and run:

    rclone bisync Archive: ~/Archive -vv --metadata --checksum --ignore-listing-checksum --check-access --no-cleanup --resilient --drive-skip-gdocs --log-file /tmp/rclone-debug.log
    
  3. Immediatelly after running the command edit the file non-stop by just typing "asdf" until rclone finishes.

I reproduced it just now and uploaded the logs. It has some personal information so I extracted the parts relevant to the file I used to trigger the error (rclone-debug.md). If this is not enough, let me know and I may try to find a more tool-agnostic way to reproduce it.

rclone-debug.txt (31.0 KB)

Thanks! The error shown in the logs is a different one from your first post, though. In the logs, the error is Copy failed: can't copy - source file is being updated (size changed from 2725 to 2756) which is what I would expect for a source file that is rapidly changing while it's being copied. This is not unique to bisync -- the same thing would happen with sync. This error can be ignored with the --local-no-check-updated flag, although whether that's advisable depends on the circumstances -- I'd recommend reading about it first. It might be necessary for something like Obsidian if it's constantly hammering the file with changes. (Assuming you wouldn't rather wait to sync until after Obsidian is done.)

Does Obsidian have an "autosave" setting that could be tweaked to autosave to disk less frequently? If so, that might help.

If it's possible to reproduce the error from the first post (path1 and path2 are out of sync) and upload a log, I'd still be interested in tracking that one down, as that is unexpected. (Although upon rereading your post I see you mentioned compiling a custom version that changed things -- was that error from your custom version, or the official release?)

Thanks for the repro steps! I haven't used Obsidian before, but honestly it looks pretty cool, so maybe this is a good opportunity to try it out, and see if I can break bisync in the process :slightly_smiling_face:

This conversation got me thinking... I wonder if bisync could/should have a built-in option to use a filesystem snapshot when local is the source, and ignore it in the opposite direction when it's the dest. I'd need to think through the implications of that some more... but if that were possible, would it be of interest?

my backup script always creates a snapshot and uses that as the source for rclone, rsync, restic, 7zip, etc.

check out my wiki, can create a snapshot in just four lines of code.
https://github.com/rclone/rclone/wiki/How-to-enable-VSS-for-rclone

1 Like

Thanks a lot! I was expecting this behavior to affect sync as well, but somehow I had not come across this flag, I'll read the docs and see if it fits my use case.

I'll look, but since it also happens with other applications (e.g. Moneydance), maybe I'll look into that flag you shared, or some other options.

I suspect that it'll happen without --resilient, since maybe when I got that log I wasn't using that flag. It wasn't from my custom version though, my logs were a bit more chaotic than this message :sweat_smile:.

It may be interesting, but I wonder how it's going to work in practice. For example rclone runs when the OS starts, and then every 10 minutes. When the OS starts I may have some data to sync from Google Drive to local, so in this run it shouldn't ignore dest. However in the next runs, it may ignore it safely as I'm not really syncing anything from Google Drive.

Anyhow if you end up pushing something in this direction, let me know and I may give it a try :smile:.