So, I did something a bit daft. A SnapRaid script I have was whining about there being heaps (as in over 100k) of files with a zero sub-second modification time. Without thinking, I foolishly ran the SnapRaid touch command to put a timestamp placeholder on these files so SnapRaid could better determine whether files were new or modified. As a result, my rclone script is now trying to re-upload multiple terabytes of data to my Google Drive, even though nothing has actually changed on my NAS.
As I see it, this is going to be an ongoing consideration for using SnapRaid on my NAS with rclone as my cloud backup solution. Two questions, really:
Would using the checksum option with my sync command remove rclone's dependence on file size and modification time checks?
Assuming yes to the above, how would I go about getting the checksums in place for files already on my Google Drive? I'm hoping I can do this without having to re-sync 28TB or so of data.
I'm even more lost now. I was sure it was the SnapRaid touch that was causing this. Anyway, I ran my sync script with the --dry-run and -vv switches and it found 376,000+ files with modified timestamps; however, they were all under the 1ms threshold.
The only log entry of interest I could see was at the end:
Thanks again for the assistance. I'm really not sure what is going on here; it seems as if rclone believes a load of files have been deleted. I use the --backup-dir option with sync to ensure nothing ever gets truly deleted from the cloud side. Below is the complete command line I've been running on the same Windows 2012 server for almost spot on a year now.
Using the "Get size" feature on Rclone Browser I can see from my archive directories that from the 13th of this month my script started archiving off massive amounts of data. Of the 5 archive folders I can see, just shy of 3.7TB of data has been archived as per the --backup-dir option, however of the several dozen files I've spot checked they are 100% still on my NAS and are definitely no longer present in my main sync directory. Unless I'm mistaken, this means rclone thinks the files have been removed from the source path?
Two things I can think of have happened in this time:
I upgraded my NAS from OMV v3 to v4
I did actually delete a load (1.7TB max) of media files and tens of thousands of old email files from my NAS to reclaim some space.
Re-filtering my dry-run log from earlier, I can see there are a ton of entries for moving and copying files. A few things I notice:
At the very top of the log I can see all the new data that has been created on my NAS and has yet to be sync'd. These all have a single entry with a status of "Not copying as --dry-run".
There are 212k entries in total; 121k are for files that exist in directories that have been sync'd for some time and, from what I can tell, still exist on my NAS. I would absolutely never intentionally delete these files, as they are 15+ years' worth of photos and video from trips, family holidays, and the like.
Looking through these 121k files, I see there are 2 entries for each file. The first says "Not moving as --dry-run", followed by "Not copying as --dry-run".
Scrolling towards the bottom of the log, I see all the entries for the files I did actually delete from my NAS, which have a status of "Not moving into backup dir as --dry-run".
Thanks, I'll give it a try. Do I need to have --checksum also in place? I haven't used --track-renames; it sounds like something I really should have in place regardless, given the number of files I am syncing.
Edit
So I added in the --checksum option and I am seeing this log entry in a dry run.
NOTICE: Encrypted drive 'GCrypt:/SecBackup': --checksum is in use but the source and destination have no hashes in common; falling back to --size-only
Is there any way I can get rclone to generate the checksums on local and remote without having to upload everything again?
Thanks for that clarification. I updated to rclone 1.52 and am currently doing a dry run with --track-renames set and --track-renames-strategy set to modtime rather than hash. Will see how it goes.
Sorry, I forgot you were using crypt... You'll also need the --track-renames-strategy modtime flag so rclone checks the renames using the modtime rather than the hash.
Not sure if it was a change from upgrading rclone 1.51 to 1.52, or from running the track-renames option. Regardless, in the last dry run there were a LOT of entries showing a modified-time difference greater than rclone's default 1ms threshold. I found this on the SnapRaid FAQ page:
Sets arbitrarily the sub-second timestamp of all the files that have it at zero.
This almost puts me back to my original thought that all these files are being archived off because of the SnapRaid touch command. Supporting this is that in my last dry run, all these modified-time differences were greater than 1ms but under 1s.
I have now added --modify-window=1s to my sync script along with the --track-renames option and set a backup going last night. After 12 hours it has sync'd 144GB, of which 439 photo-related files totalling 1GB have been archived that I am not yet convinced should have been. I notice that the sync script has also reported 439 renamed files, which I wonder is connected. I'll need to look at the logs more closely in 4 days or so when it has completed the 1.185TB in the queue.
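To illustrate why widening the window helps (a minimal sketch of the idea, not rclone's actual code): rclone treats two modification times as equal if they differ by less than the modify window, so the sub-second fractions that SnapRaid touch writes trip the default 1ms comparison but fall inside a 1s window. The timestamps below are made up for the example.

```python
from datetime import datetime, timedelta

def modtimes_equal(src: datetime, dst: datetime, window: timedelta) -> bool:
    """Simplified model: two modtimes match if they differ by less
    than the modify window."""
    return abs(src - dst) < window

# The remote copy was uploaded with a zero sub-second part; after
# `snapraid touch` the local file gained a random sub-second fraction.
remote = datetime(2020, 6, 1, 12, 0, 0)
local = datetime(2020, 6, 1, 12, 0, 0, 345678)  # +0.345678s

# With rclone's default 1ms window the file looks modified...
print(modtimes_equal(local, remote, timedelta(milliseconds=1)))  # False
# ...but with --modify-window=1s it compares as unchanged.
print(modtimes_equal(local, remote, timedelta(seconds=1)))       # True
```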
A question about rename tracking, though. I take a LOAD of photos doing time-lapse sequences, as in tens of thousands. I have several Sony cameras which all use the same naming convention of DSC12345. RAW files from these cameras will always be identical in size, and eventually the names will cycle around and I will have files with the same name and file size. The mod time should still be different; is there any significant risk in relying on mod time in this scenario?
OK that makes perfect sense as a root cause of the problem - setting the sub second times would cause rclone to think the files have changed.
You are relying on the size and the modtime. So the files would have to be the same size and the same modtime to get confused. Rclone is ignoring the leaf name here.
Track renames only comes into place if you do actually rename or delete stuff - the way it works is that normally rclone would delete things that are surplus on the destination, but when using track renames it keeps a note of all the things it would delete and then tries to match them up with incoming files to save transferring them. So if you aren't in the habit of deleting stuff then there won't be any opportunity for track renames to do anything.
I suppose I'm just trying to say that it is only renamed/deleted files which will be checked and rclone checks the size and the modtime.
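A rough model of that matching (illustrative only, with made-up file records; not rclone's implementation): files that would otherwise be deleted on the destination are indexed by (size, modtime), and incoming files that hit the same key are treated as renames and moved server-side instead of being re-uploaded.

```python
# Sketch of rename tracking: surplus destination files are indexed by
# (size, modtime); incoming sources matching a key become server-side
# renames instead of fresh uploads.

def plan_sync(surplus_dst, incoming_src):
    """Both args: dicts of name -> (size, modtime)."""
    by_key = {}
    for name, key in surplus_dst.items():
        by_key.setdefault(key, []).append(name)

    renames, uploads = [], []
    for name, key in incoming_src.items():
        if by_key.get(key):
            renames.append((by_key[key].pop(), name))  # move on the remote
        else:
            uploads.append(name)                       # transfer the data
    deletes = [n for names in by_key.values() for n in names]
    return renames, uploads, deletes

surplus = {"old/DSC00001.ARW": (24_000_000, 1591000000.5)}
incoming = {"new/DSC00001.ARW": (24_000_000, 1591000000.5),
            "new/DSC00002.ARW": (24_000_000, 1591000099.0)}
renames, uploads, deletes = plan_sync(surplus, incoming)
print(renames)  # [('old/DSC00001.ARW', 'new/DSC00001.ARW')]
print(uploads)  # ['new/DSC00002.ARW']
print(deletes)  # []
```

Note how the second incoming file has the same size but a different modtime, so it is uploaded rather than matched; with nothing surplus and renamed, there is nothing for the tracking to do.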
Do you have photos which might be the same size? What about the same modtime?
Uncompressed RAW photos from the same camera will always be the same size. Modtime should be different, though; about the only scenario I can think of where this would present an issue is if I were to copy all my data from one NAS to another without preserving timestamps. Even then, the likelihood of two RAW photos from the same camera (so the same size) being copied at the same time is, I'd think, pretty slim.
Anyway, it all seems to be sorted now. Lesson being: if you are using SnapRaid, set --modify-window=1s in your conf so that the SnapRaid touch command does not send rclone into a spin.