Rclone sync bug: files are being erroneously erased/moved on the remote, causing subsequent rclone syncs to copy them over and over again (Was: Repeated "rclone sync" commands from a read-only local FS copying the same files over and over again?!)

Hello Everyone,

I’m seeing some weird behavior from ‘rclone sync’ here: I’m syncing a large (~5TB in ~8M files), currently quiescent (mounted read-only) file system from local to an encrypted GDrive remote.

After the first copy ended, I just ran the same command again and it copied 70 more files. I thought “what the hell?” (the source file system was mounted read-only, so of course there were no changes), and just ran it a third time… and it copied the same files over again.

Searching the log for the copied files’ names, I see these files being copied at some point in the log, and then erased (actually moved, as I’m using the --backup-dir option) further towards the end. So this is why a subsequent “rclone sync” copies them over… but why did the previous “rclone sync” erase them? Has anyone seen anything like that?
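
For the record, I’ve been tracing each file through the log with something along these lines (a trivial sketch; ‘SOMEFILE.info’ stands in for one of the actual filenames):

# show every log line mentioning one file, with line numbers for context
# ('SOMEFILE.info' is a placeholder, not a real path)
grep -n 'SOMEFILE.info' ~/20170518Z111826_rclone_sync_REDACTED_egd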

This behavior is, as far as I can see, repeatable; so far I’ve been using rclone v1.36 amd64 on Devuan Jessie (a systemd-less Debian fork), and I’m trying again now with the latest beta (v1.36-98-g7ee3cfd7β). The command line I’m running is:

rclone --low-level-retries 100 --checkers 12 --transfers 12 -v -v --log-file ~/20170518Z111826_rclone_sync_REDACTED_egd --backup-dir egd:REDACTED_-_backup-dir_-_20170518Z111826 sync /REDACTED egd:REDACTED

Thanks in advance for any tips & hints.

Cheers,

Durval.

Delete those files from ACD and any duplicates, and try again. Also run rclone dedupe. I found that, for some odd reason, for certain files it would create duplicates, and then each sync would try to replace those with another duplicate and continue the cycle. If you delete the files that are being re-transferred, the next sync will likely be okay. GDrive oddity.

Hi Calisto,

[quote=“calisro, post:2, topic:2252, full:true”]
Delete those files from ACD and any duplicates, and try again. Also run rclone dedupe.[/quote]

ACD? Duplicates? Dedupe? I’m sorry, but I do not follow you here…

As far as I can tell, these files are not duplicates of anything. I have MD5s for every file in that filesystem, so it’s simple to find out for sure whether those files have any duplicates or not. I will check and get back to you.
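
Something along these lines should do it (a rough sketch, assuming the MD5 list is in the usual “hash  path” format that md5sum and “rclone md5sum” produce; md5sums.txt is a placeholder name):

# collect every hash that occurs more than once
awk '{print $1}' md5sums.txt | sort | uniq -d > dup_hashes.txt
# then list the full entries sharing those hashes, grouped by hash
grep -F -f dup_hashes.txt md5sums.txt | sort -k1,1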

Delete those files where? As they are already deleted (i.e., moved to the backup-dir) on the destination remote, I think you mean deleting them on the local source filesystem… Of course this would “solve” the problem, but then it would really be just a workaround, no? And these files are in principle important (that’s why I’m backing them up), so I’d rather not delete them…

Cheers,

Durval.

Noo… :slight_smile: What I meant was this:

I found instances like yours where the same file would be uploaded every time. What I found was that it was the same X number of files this would occur with, and that deleting all the files on the Google side first and then redoing the sync fixed the problem. In your case you have that backup directory; I would try deleting the affected files from that backup directory, retrying the sync, and seeing if that fixes it. It may also be that my issue was different from yours, but I’ve seen others have this problem so I figured I’d mention it.
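
For reference, the dedupe pass I have in mind would be something like this (a sketch only; substitute your own remote, and previewing with --dry-run first certainly wouldn’t hurt):

# preview what dedupe would do, then collapse duplicates keeping the newest copy
rclone dedupe --dedupe-mode newest --dry-run egd:REDACTED
rclone dedupe --dedupe-mode newest egd:REDACTED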

Hello Calisro,

I think I follow you now. Thanks for the clarification.

In my case I have a number of backup dirs as I’ve re-run the “rclone sync” (with different backup dirs) a few times already; as soon as the currently running “rclone sync” (using the latest beta) finishes, I will download them all (for preservation purposes) and then purge them on the remote and re-run the original “rclone sync”.
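
Concretely, the plan for each backup dir is something along these lines (a sketch; the local destination path is just an example):

# preserve the backup dir locally, verify the copy, then remove it from the remote
rclone copy -v egd:REDACTED_-_backup-dir_-_20170518Z111826 /local/preserve/20170518Z111826
rclone check egd:REDACTED_-_backup-dir_-_20170518Z111826 /local/preserve/20170518Z111826
rclone purge egd:REDACTED_-_backup-dir_-_20170518Z111826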

@ncw, does that make any sense to you? Would that be a bug on rclone or on gdrive (or both)?

Cheers,

Durval.

Hello everyone, and especially @ncw :

The latest “rclone sync” (with rclone’s latest beta, v1.36-98-g7ee3cfd7β) has finished here, and this time there were 70 files copied from that totally quiescent (as in “read-only mounted”) file system :-/

So, whatever the problem is, it persists with the latest rclone.

For more detail, here are all the “-v -v” log lines that mention one of those 70 files:

2017/05/18 12:36:18 INFO  : REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED.info: Copied (new)
2017/05/19 10:12:27 INFO  : REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED.info: Moved (server side)
2017/05/19 10:12:27 INFO  : REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED/REDACTED.info: Moved into backup dir

Looking around each of those 3 lines, I see nothing out of the ordinary (just “Reading”, “Finished reading”, “Size and modification time the same” and similar messages around the first line above, and the same messages for other files in the same situation in the case of the last 2 lines above).

EDIT: now I think I understand why these files are being copied over and over; it’s not because a “rclone sync” erroneously decides the file has changed in the local filesystem and needs to be copied, it’s because a previous “rclone sync” erroneously decided to remove those files from the remote (by moving them server-side to the backup-dir; see the last two lines above). So this undue removal (and not the copying per se) is the issue that needs to be addressed.

Thanks,

Durval.

I thought of this post when I created my issue. Could the md5s be making them copy over and over like in my situation?

https://github.com/ncw/rclone/issues/1430

Hi @Calisro,

Thanks for the heads-up.

I don’t think so, as my “rclone sync” command (see above) does not include the “-c, --checksum” option, which I understand would be needed to make “rclone sync” take any decisions about copying a file over based on checksum instead of only on mod-time and size (which I’m pretty sure is the default).
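
In other words, if I understand the docs correctly, the difference would be between these two invocations (a simplified sketch of my command above):

# default: a file is considered unchanged if size and mod-time both match
rclone sync /REDACTED egd:REDACTED
# with -c/--checksum: size and hash are compared instead of mod-time
rclone sync --checksum /REDACTED egd:REDACTED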

In fact, it now seems to me that the reason these files are being copied over and over by newer “rclone sync” runs is that a previous “rclone sync” command ended up moving them on the server to the backup dir, so when the new “rclone sync” runs, the files actually aren’t there and need to be copied (see the “EDIT” paragraph in my post above).

In other words: this issue is not due to a subsequent “rclone sync” unduly detecting an old file as modified and copying it over; it is due to a previous “rclone sync” unduly removing the file on the remote (by moving it to the backup-dir). I think this is definitely a bug, and therefore I’m filing a GitHub issue right away so it can be addressed by @ncw when he thinks it is appropriate. I’m also editing the subject of this topic accordingly.

Cheers,

Durval.

I have replied on the issue to this thread. Apologies for the delay - things have been a bit hectic!

Thanks @ncw, and no apologies needed! Your answer was quite prompt, especially in light of your going through that whole Amazon FUBAR (shame on them). I’m replying further over there on the GitHub issue too.

I got the same problem here. On every execution rclone deletes random files and copies some others.

I went back to version 1.35 and it worked fine, so it seems to affect only version 1.36.

[quote]I got the same problem here. On every execution rclone deletes random files and copies some others.[/quote]

I'm seeing this as well with the 1.36 release and with rclone v1.36-222-gf3c7e1a9β when attempting to sync a non-encrypted local drive to an encrypted directory on hubiC. Here's a log snippet:


2017/06/28 08:01:59 INFO : Squeezebox Music/NotDTS/The Space Negros/Dig Archaeology 1980-1990/16 - In-A-Gadda-Da-Vida.flac: Copied (server side copy)
2017/06/28 08:02:01 INFO : Squeezebox Music/NotDTS/The Space Negros/Dig Archaeology 1980-1990/16 - In-A-Gadda-Da-Vida.flac: Deleted
2017/06/28 08:02:01 INFO : Squeezebox Music/NotDTS/The Space Negros/Dig Archaeology 1980-1990/16 - In-A-Gadda-Da-Vida.flac: Moved into backup dir

The file in question definitely exists on the local source drive.

The command:

/home/administrator/rclone_beta/rclone sync /mnt/WD6TBNAS01/ hubicrypt:WD6TBNAS01/  --checksum --log-file=/mnt/WD8TBNAS03/Logs/Aspire-R1600-2017-06-28_07-35-37.txt --log-level INFO --delete-after --backup-dir=hubicrypt:oldversions --suffix=2017-06-28_07:35:37

I’ll just note that it is the --backup-dir option which is producing these logs. Namely rclone has detected the file has changed and needs to move the old one into the backup dir.

Do you have the bit of the log which shows rclone deciding to upload this file?

Can you try without --delete-after?

@atrocity, can you also check if there are any other ERROR logs? I have found a code path where this could happen, but it would need something to go wrong, e.g. the hash calculation.
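
Something like this should pull out any such lines (assuming the default log format):

# look for ERROR-level messages anywhere in the sync log
grep ERROR /mnt/WD8TBNAS03/Logs/Aspire-R1600-2017-06-28_07-35-37.txt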

[quote]I’ll just note that it is the --backup-dir option which is producing these logs. Namely rclone has detected the file has changed and needs to move the old one into the backup dir.[/quote]

The thing is, the local files in question last changed in 2009. They were uploaded to hubiC during my original backup, but this subsequent run is removing them from the main directory and placing them into the backup. That is, it’s not putting an old version in the backup directory while leaving what it thinks is a newer one in the main directory; it’s leaving nothing in the main directory. My workaround has been to do the backup again with rclone v1.35.

I have placed a copy of the complete log for this run at http://www.wywh.com/rclonelogs/Aspire-R1600-2017-06-28_07-35-37.txt. You can see other files getting the same treatment; they were all unchanged on the source end. I’ll try the job again using the 1.36 beta without --delete-after. Thank you!

Yes, I think that is the symptom of this problem:

  • the first run moves files into the backup dir when it shouldn’t
  • a subsequent run restores those files

You can see both of those things happening in your log.

So the question is why rclone thought the files were missing.

Can you do me a run with -vv please? That would be really helpful.

Also a separate run with -vv and --fast-list would be very interesting. This will use more memory.
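
Based on the command you posted above, the two runs would look roughly like this (a sketch, with the log-file and suffix flags left out for brevity):

# run 1: full debug logging
rclone sync -vv --backup-dir=hubicrypt:oldversions /mnt/WD6TBNAS01/ hubicrypt:WD6TBNAS01/
# run 2: the same, plus --fast-list (fewer listing calls, but more memory)
rclone sync -vv --fast-list --backup-dir=hubicrypt:oldversions /mnt/WD6TBNAS01/ hubicrypt:WD6TBNAS01/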

At the risk of creating clutter, I’ll do this over multiple posts because I’m easily confused. :slight_smile: First, here’s a log using the beta with the --dry-run parameter: http://www.wywh.com/rclonelogs/Aspire-R1600-2017-06-29_08-36-04.txt.

The Ray Charles files it says need to be uploaded are legitimately new and should be backed up.

All the Ian Dury files already exist at the backup destination. On the source end, they all have dates from 2009 to 2011 and do not need to be re-uploaded.

Currently executing with -vv and without --dry-run. Will probably take a bit to complete.

The live copy job is still running, but I’ve realized that it’s going to list 100,000+ files. While waiting, I’ve submitted a --dry-run job limited to just the Ian Dury source directory. Curiously, that particular run does not seem to believe that anything needs to be transferred: http://www.wywh.com/rclonelogs/Aspire-R1600-2017-06-29_09-31-13.txt.
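
(That limited run was along these lines; the exact paths below are illustrative, not the real ones:)

# dry-run restricted to the one source directory that showed the problem
# (paths are illustrative)
rclone sync -vv --dry-run "/mnt/WD6TBNAS01/Squeezebox Music/Ian Dury/" "hubicrypt:WD6TBNAS01/Squeezebox Music/Ian Dury/"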

@atrocity Good work on the logs!

Which OS are you using BTW and where are the files stored - on a locally attached disk or over the network?

[quote]Which OS are you using BTW and where are the files stored - on a locally attached disk or over the network?[/quote]

Operating system:

Description:	Ubuntu 14.04.5 LTS
Release:	14.04
Codename:	trusty

Source files are on a NAS mounted via CIFS.

The live backup has completed and has created a 37-megabyte log… which no longer shows the Ian Dury files needing to be transferred. I can upload the log if you want, but at that size, and without it showing the problem, I’ll wait until you say you want to see it. :slight_smile: I realize that I mistakenly used both -vv and --fast-list when you asked me to first try just -vv.