Hi all, I recently needed to move all my files off my ZFS NAS because I had to create a new vdev with ashift 12 instead of 9.
Anyhow I moved all the files/directories via rsync (over ssh), and then back again onto the newly created vdev/pool. Now I’m trying to run rclone again (using gdrive as backend), but it seems like it sees all the files as new even though I’ve tried to keep all the modification times, etc.
Is there a smart way to get it to skip existing files?
By default rclone should only check the file size and modified time.
So if you preserved the modified times it should treat the files as equal.
Can you provide a short list of files with size and modified times on the source and destination that rclone tried to copy again?
This would help to find the issue.
You can also try to increase the log level and I think rclone will print the reason why it copies the files.
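For gathering that list on the local side, something like the following Python sketch might help. It is not part of rclone; the names list_size_and_mtime and format_row are made up for illustration. It reports size and modification time with full nanosecond precision, so sub-second differences become visible.

```python
import os
from datetime import datetime, timezone


def list_size_and_mtime(root):
    """Return sorted (relative path, size in bytes, mtime in ns) tuples
    for every regular file found under root."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rows.append((os.path.relpath(full, root), st.st_size, st.st_mtime_ns))
    return sorted(rows)


def format_row(row):
    """Format one tuple as 'size  UTC timestamp with 9 fractional digits  path'."""
    path, size, mtime_ns = row
    seconds, nanos = divmod(mtime_ns, 1_000_000_000)
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return f"{size:>12} {stamp}.{nanos:09d} {path}"


# Example use (the path is hypothetical):
# for row in list_size_and_mtime("/tank/storage/tv"):
#     print(format_row(row))
```

The output can then be compared by eye against what rclone lsl reports for the same paths on the remote. Note that this sketch prints UTC, so allow for a timezone offset when comparing.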
2018/07/27 18:36:21 NOTICE: storage/tv/redacted.720p.X264/redacted.S02E10.720p.x264.mkv: Not moving as --dry-run
2018/07/27 18:36:21 NOTICE: storage/tv/redacted.720p.X264/redacted.S02E10.720p.x264.mkv: Not copying as --dry-run
I guess everything should be good if I could get rclone to ignore the milliseconds. Any way to do that?
EDIT:
It seems that I can add the --checksum flag and get the correct behaviour, but I guess that will take a huge amount of time and resources. Or am I wrong?
2018/07/28 02:44:48 DEBUG : redacted.S02E10.720p.x264.mkv: Size of src and dst objects identical
2018/07/28 02:44:48 DEBUG : redacted.S02E10.720p.x264.mkv: Unchanged skipping
I still think the best thing would be to be able to skip the milliseconds from the modification time.
EDIT2:
And I just found out about --modify-window. That should help. Trying to set it at 500 ms, and try a dry run of all files.
Or is --checksum the way to go? I’ve seen a few examples of people using it. I also read somewhere that it’s not supported on encrypted drives – even though it seems to work for me?
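To make the --modify-window idea concrete, here is a rough Python sketch of a tolerance-based comparison. It is only an illustration of the concept, not rclone's code: the function name, the timestamp values and the strict "less than" at the boundary are all assumptions.

```python
NS_PER_MS = 1_000_000


def modtimes_match(src_mtime_ns, dst_mtime_ns, window_ns):
    """True when the two times differ by less than the window.

    The strict '<' is an assumption about how the boundary is handled.
    """
    return abs(src_mtime_ns - dst_mtime_ns) < window_ns


# Hypothetical values: the local copy keeps nanoseconds, the remote copy
# was stored with millisecond precision (difference: 456,789 ns).
local_ns = 1_532_716_581_123_456_789
remote_ns = 1_532_716_581_123_000_000

print(modtimes_match(local_ns, remote_ns, 1))                # False
print(modtimes_match(local_ns, remote_ns, 500 * NS_PER_MS))  # True

# Hypothetical case where one side lost its sub-second part entirely
# (difference: 900 ms). A 500 ms window is not enough here.
whole_second_ns = 1_532_716_581_000_000_000
fractional_ns = 1_532_716_581_900_000_000
print(modtimes_match(whole_second_ns, fractional_ns, 500 * NS_PER_MS))  # False
```

The last case is why the dry run is worth checking: a window smaller than the largest real difference would still mark some files for upload.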
For non-crypt remotes --checksum is the most reliable way for rclone to check for file equality. Many remotes provide some kind of hash value in the file metadata, without needing to compute the value again (see Overview of cloud storage systems for supported hash methods).
For local filesystems the hash value is always computed when needed.
With crypt remotes --checksum does not work and rclone will silently skip the hash comparison and only compare file sizes (using --checksum on crypt is equal to using --size-only).
--modify-window is probably your best option to only copy missing or updated files again. You just need to remember to set it for all sync, copy or move commands, since rclone will not update the modified-times on the destination.
I don't think rclone has a method of updating modified-times during sync or move at the moment.
Doing a rclone move -c a b for non crypt remotes does not update the modified times for equal files either.
Being able to update the modified times in situations like yours would be very useful.
If I understand correctly, I would probably use this just to refresh the times in this case. I would prefer to run a sync with --refresh-times and --size-only (due to it being encrypted) rather than using --modify-window every time from now on.
The weird thing is that I had no issues with the timestamps before I moved the files with rsync. And rsync was set to preserve modification/created data – and it did. It matches up with the files in Google Drive, except for the milliseconds. But I didn’t think Linux timestamped files with ms anyway.
So in my mind everything should be identical to “before the move”, but apparently not, since I now need to use --modify-window to make sure files are not unnecessarily re-uploaded.
OK here is my attempt at some docs for the new feature. It is complicated! Any thoughts?
--refresh-times
The --refresh-times flag can be used to update modification times of existing files. This is useful if you uploaded files with the incorrect timestamps and you now wish to correct them.
This can be used with any of the sync commands sync, copy or move.
When rclone comes to upload a file it will check if there is an existing file and if it matches with size (no sync flags or --size-only) or size and checksum (--checksum) then instead of re-uploading it, rclone will update the timestamp.
Note that some remotes (eg amazon drive, b2, dropbox, mega, pcloud, webdav) can’t set the modification time without re-uploading the file anyway so this flag is useless with them.
If you are doing a modification time sync (ie not using --checksum or --size-only) rclone will update modification times without --refresh-times provided that the remote supports checksums and the checksums match on the file. crypt remotes do not support checksums so to update times on a crypt remote you will need to use --refresh-times.
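To check my own reading of the wording above, here is a small Python sketch of the decision it describes for a single file. This is a model of the draft text only, not how rclone is implemented; FileInfo, plan_action and the three result strings are names invented for the example, and falling back to a size-only match when a checksum is missing is an assumption based on what was said earlier about crypt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    size: int
    mtime_ns: int
    checksum: Optional[str] = None  # None when no hash is available (e.g. crypt)


def plan_action(src, dst, use_checksum=False, can_set_modtime=True):
    """Return 'upload', 'set-modtime' or 'skip' for one file,
    following the draft --refresh-times description."""
    # No existing file, or sizes differ: the file has to be uploaded.
    if dst is None or src.size != dst.size:
        return "upload"
    # With --checksum, differing hashes also mean a real change.
    # Assumption: if either hash is missing, only the size is compared.
    if use_checksum and src.checksum and dst.checksum and src.checksum != dst.checksum:
        return "upload"
    # Content is considered equal from here on.
    if src.mtime_ns == dst.mtime_ns:
        return "skip"
    # Times differ: fix the timestamp if the remote allows it,
    # otherwise a re-upload is the only way to change it.
    return "set-modtime" if can_set_modtime else "upload"
```

For example, plan_action(FileInfo(100, 2000), FileInfo(100, 1000)) gives "set-modtime", while the same call with can_set_modtime=False gives "upload", which is the "useless with them" case for remotes that cannot set times in place.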
I made an attempt at being more concise. English is not my first language, so be warned.
--refresh-times
The --refresh-times flag is used to update/sync modification times of existing files between source/remote. This is useful if you have files with incorrect timestamps and you now wish to correct them, e.g. to avoid re-uploading the same files.
This can be used in any of the sync commands – sync, copy or move.
Before uploading a file, rclone will first check if it exists on the remote and if it matches in size (no sync flags or --size-only) or both in size and checksum (--checksum). If it matches, rclone will just update the timestamp instead of re-uploading.
Note that some remotes (eg amazon drive, b2, dropbox, mega, pcloud, webdav) can’t set the modification time without re-uploading the file anyway so this flag is useless with them.
If you are doing a modification time sync (ie not using --checksum or --size-only) rclone will update modification times without --refresh-times provided that the remote supports checksums and the checksums match on the file. Crypt remotes do not support checksums so to update times on a crypt remote you will need to use --refresh-times.
I'm not totally sure if I understand this. Does this mean that as long as you are not using crypt rclone will update the modification times? Does that mean that "--refresh-times" is not needed on non-crypt remotes? Hence maybe you should just include a --size-only option in the --refresh-times flag? Since it's enabled by default.
(if you understand what I mean)
I’ve read the feature request log and this thread, and it leaves me scratching my head. Who, if anyone, ever said (and why) that any version of rclone would update the modified time on a file without copying it? There’s nothing anywhere in the docs that alludes to this. This seems to be a myth. Please throw out all the stuff about crypt, as that’s just confusing things even more.
Basically, as I understand it, if rclone detects there’s a dupe on the destination (no matter which way: checksum, size, date, etc., whatever you configure it to use for dupe detection), it IGNORES that file.
IGNORES = DOES NOTHING! Which includes copying the file or modifying any files anywhere whatsoever.
Yes, I just discovered this option (I’m new to rclone)
--no-update-modtime Don’t update destination mod-time if files identical.
Here’s the question though: can you get it to update the mtime on the destination file if the mtime on the source is OLDER? I used this multicloud service and it copied files and set the mtime to the date they were copied, not the mtime of the source file!
Seems like the -c option (skip on checksum and size) would work? But I’m guessing rclone’s internal rules are that it won’t modify the mtime on the dest if it’s newer than the source!