'sync' and 'copy' overwriting existing files, seemingly at random?

rclone version 1.40
windows 10 latest updates
cloud drive: google drive

Command:
C:\Users\jammi\Desktop\rclone\rclone-v1.40-windows-386\rclone.exe --config C:/Users/jammi/Desktop/rclone/rclone-v1.40-windows-386/rclone.conf sync --verbose --transfers 2 --checkers 8 --contimeout 60s --timeout 300s --retries 3 --low-level-retries 10 --stats 5s \freenas\ftp\Archive\ drive:Media/Archive

The source directory is about 2.5TB. I had the sync running for a week or so (my upload is only 10Mbps) and eventually cancelled it with “ctrl-c”, thinking I could just restart it using the same command.

The destination folder was empty when I started; I was creating a copy on google drive for the first time.

Now for some reason, it is re-uploading quite a few of the files and it seems to do it at random.

For example, if I run the above command it will skip, say, the first 20 files and then reach a file that already exists on the drive but that it thinks does not exist for some reason, and start uploading it. Let’s call it file #21. If I cancel this with ctrl-c and re-run the exact same command, it might stop on file #15 and start uploading that one. If I cancel it again and re-run the command again, it might make it all the way to file #30 before it finds a file that it thinks needs to be synced.

The problem here is that I’ve uploaded a few thousand files now, and even after letting it run for a couple of days like this, it feels almost like I’ve started over from scratch: it only skips 10-15 files at a time and then starts uploading another one that already exists, so I’m still nowhere near where I was when I cancelled it the first time.

I’m fairly new to rclone so this is likely something I’m doing wrong, but the sporadic behavior makes me think it might be a bug or a problem with the google drive implementation.

Thanks!

That is strange!

Have you seen any warnings about duplicates in the log? Running rclone dedupe may fix things.

What kind of google account are you using? I’ve seen strange behaviour with directory listings with Team drives before.

After you’ve run dedupe, if you still have the problem, can you run rclone ls drive:Media/Archive a few times, saving the output to a file each time and see if there are files missing from the listing?
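A minimal sketch of that comparison in Python, in case it helps. It assumes each listing was saved with something like rclone ls drive:Media/Archive > listing1.txt, that the saved files are UTF-8, and that every line is a size followed by a path, which is how rclone ls prints. The function names and the listing file names are made up for illustration.

```python
from pathlib import Path


def parse_listing(text):
    """Parse saved `rclone ls` output into a {path: size} dict."""
    files = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "<size> <path>". Split only once, after the size,
        # so that paths containing spaces stay in one piece.
        size, path = line.strip().split(None, 1)
        files[path] = int(size)
    return files


def missing_per_listing(listings):
    """For each listing, return the paths seen in another listing but absent here."""
    all_paths = set()
    for files in listings:
        all_paths.update(files)
    return [sorted(all_paths - set(files)) for files in listings]


def load_listings(filenames):
    """Read and parse several saved listing files (assumed to be UTF-8)."""
    return [parse_listing(Path(name).read_text(encoding="utf-8"))
            for name in filenames]
```

Calling missing_per_listing(load_listings(["listing1.txt", "listing2.txt", "listing3.txt"])) returns one list per file. If every list is empty, the listings agree on which files exist; any path that shows up is one that at least one listing left out.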

Hey, thanks for the response!

I am not seeing any duplicates. The only error I’m seeing, which also seems random (and as far as I can tell, it’s wrong), is:
2018/04/05 10:54:20 ERROR : Archive10/Photo0412.jpg: Failed to copy: can’t copy - source file is being updated

I did try running dedupe and saved the ls output to a few files - all of them are identical. I verified this by both character count and file size, so that part seems to be working as expected.

I’m still seeing this: if I ctrl-c, press the up arrow and hit enter, it will start uploading different files than it was uploading when I cancelled it.

Let me ask you this - what happens to partial uploads? Does it take some time before those are trashed by google drive? I notice they don’t appear in drive until they are 100% complete, so I’m not really sure where they are while being uploaded, or whether rclone is able to see them and skip them.

Nick, just a quick update - after a little bit of testing I have a suspicion that this applies only to large files. I sync’d a folder of 1500 files between 500kb and 1500kb and it worked flawlessly: if I cancel and rerun the command over and over, it picks up where it left off, and once it has completed, rerunning the command transfers 0 files every time.

“Large” files in this case are as big as 6GB, however I have noticed the issue on files as small as 800MB - it’s a little more difficult to test on a large scale because of my 10Mbps upload.
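To put numbers on why this is slow to test, here is a rough back-of-the-envelope calculation in Python. It assumes decimal units (1 GB = 1000^3 bytes), a link that really sustains 10 Mbps, and no protocol overhead or retries, so real uploads will take longer. The function name is just for illustration.

```python
def upload_seconds(size_bytes, link_megabits_per_second):
    """Ideal transfer time in seconds, ignoring overhead and retries."""
    bits = size_bytes * 8
    return bits / (link_megabits_per_second * 1_000_000)


MB = 1000 ** 2  # decimal megabyte (an assumption, not binary MiB)
GB = 1000 ** 3  # decimal gigabyte

for label, size in [("800MB", 800 * MB), ("6GB", 6 * GB)]:
    minutes = upload_seconds(size, 10) / 60
    print(f"{label}: about {minutes:.0f} minutes at 10 Mbps")
```

That works out to roughly 11 minutes for an 800MB file and 80 minutes for a 6GB one, per attempt, before any overhead.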

Partial uploads never appear in drive.

Hmm, I wonder if the big files are taking some time to appear in the directory listing - that would explain your problems.

I did see something like that when testing with a google team drive - the listings took ages to update properly.

rclone isn’t perfectly deterministic as it runs in parallel through lots of directories so that might be expected. It shouldn’t upload files that got uploaded properly though.

Also note that drive (like cloud storage systems in general) is “eventually consistent”. That means that sometimes it is not consistent; for instance, recently uploaded files can be missing from the directory listings for a while.

I should have mentioned about the team drive, sorry. I’m not sure if this is what you consider a team drive - I do have a “team drive” section in my account, however I am not using it.

What you said about the large files might make sense, perhaps this is a case of them taking a small amount of time to show up once they are completed.

Thank you for clarifying about why different files might upload if I cancel and immediately restart, I hadn’t considered that.

I think my best bet for now is to be patient and let the sync finish and then try the same sync again shortly afterwards - if I start getting files that are known to be completed re-uploading, I will report back.
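One way to do that check without spending upload time is a dry run: rclone's --dry-run flag reports what a sync would transfer without transferring anything. Below is a rough Python sketch that builds and runs such a command. The helper names are hypothetical, the source and destination are the ones from my original command as posted, and I have left out the --config option and the tuning flags for brevity.

```python
import subprocess


def build_dry_run_command(rclone_exe, source, dest):
    """Build an `rclone sync --dry-run` command as an argument list."""
    return [rclone_exe, "sync", "--dry-run", "--verbose", source, dest]


def run_dry_run(rclone_exe, source, dest):
    """Run the dry run and return rclone's combined output as text."""
    result = subprocess.run(
        build_dry_run_command(rclone_exe, source, dest),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # rclone writes its log to stderr
        text=True,
    )
    return result.stdout


# Source path written exactly as in the original post (escaped for Python).
SOURCE = "\\freenas\\ftp\\Archive\\"
DEST = "drive:Media/Archive"

print(" ".join(build_dry_run_command("rclone.exe", SOURCE, DEST)))
```

If the output of run_dry_run("rclone.exe", SOURCE, DEST) mentions files that are known to be fully uploaded, that would be the evidence to report back.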

That sounds like a good plan :)