I'm trying to understand what --update does. I'm probably misinterpreting the documentation, and would like help seeing where I'm going wrong. The documentation says:
This forces rclone to skip any files which exist on the destination and have a modified time that is newer than the source file.
Also:
If an existing destination file has a modification time equal (within the computed modify window precision) to the source file's, it will be updated if the sizes are different. If --checksum is set then rclone will update the destination if the checksums differ too.
If an existing destination file is older than the source file then it will be updated if the size or checksum differs from the source file.
In my case below, rclone sync does not copy dir/a, which was touch'ed. However, rclone --update sync does copy it.
This is confusing, as the documentation implies that --update will only skip files in comparison to not using this option. Is this not the case?
How is it that dir/a is copied over based on its modtime being newer, even though the filesize hasn't changed? The documentation above implies it shouldn't have been copied if I'm reading this right.
The solution to the mystery lies in the modify window. I suspect the webdav you are using doesn't support modification times.
You can check this with
rclone backend features wdav: | grep Precision
This will probably come out with a very large number which indicates that for syncing purposes rclone ignores the modification times on files. This is because rclone can't set the modification times.
This is exactly the case where --update is useful. The times shown on the server will be the times that the files were uploaded, rather than their modification times, and using --update with this kind of backend works well.
Perhaps the documentation could be better worded, but note that "not skipped" means the file will be transferred.
It does.
Looking at the code, that isn't what it does. Since your backend doesn't have checksums it assumes that the files must be different if the source timestamp is newer and the sizes are the same.
Most backends have either timestamps or hashes so I guess that is why this hasn't been noticed before.
I guess the question is, is the current behaviour good, or not?
The current behaviour seems pretty safe, as it is better to transfer something if we really don't know whether it has changed or not but we have a strong signal, from the timestamp difference, that it has changed or needs uploading.
If it is good we can correct the docs, if not correct the code!
Thanks a bunch for the detailed response! You're right on all counts:
the backend indeed doesn't support mod times (nor checksums), and instead shows upload times. This is exactly why I was using --update in the first place. I should've mentioned that.
Just FYI, the webdav backend is Sharepoint, provided by, I believe, an Office365 Education Account
agreed, the current behavior is both safe and exactly what was desired, at least in my case. I also agree that it's the documentation that needs to be updated, not the code.
the only oddity I noticed was that there is a multi-second difference between the local and remote clocks, even after sync'ing the local clock via ntp right before the upload. This leads to interesting behavior when sync'ing or copying too soon after the last sync/copy. But this is not rclone's fault, and I'm not sure it is worth the complexity of handling. Perhaps it's worth mentioning as a gotcha in the documentation.
2022/01/28 09:41:54 INFO : e: Copied (new)
...
$ rclone lsl wdav:/dir/e
0 2022-01-28 09:41:46.000000000 e
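To put a number on the skew in the output above, here is a small sketch that subtracts the two timestamps. It assumes the log timestamp comes from the local clock, the `rclone lsl` time is the upload time recorded by the remote, and both are shown in the same time zone. The variable names are illustrative only.

```python
from datetime import datetime

# Timestamp printed by rclone's log line "e: Copied (new)" (local clock, assumed).
local_log_time = datetime.strptime("2022/01/28 09:41:54", "%Y/%m/%d %H:%M:%S")

# Timestamp reported by "rclone lsl wdav:/dir/e" (remote upload time, assumed).
remote_listed_time = datetime.strptime("2022-01-28 09:41:46", "%Y-%m-%d %H:%M:%S")

# A positive value means the remote clock is behind the local clock.
skew_seconds = (local_log_time - remote_listed_time).total_seconds()
print(f"apparent clock skew: {skew_seconds:.0f} seconds")
```

With a skew like this, a local file modified shortly before it was uploaded can still look newer than its remote copy, which is why a second sync run soon afterwards may transfer it again.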
If an existing destination file has a modification time equal (within
the computed modify window precision) to the source file's, it will be
updated if the sizes are different.
This is correct, however it isn't the whole story.
If --checksum is set then
rclone will update the destination if the checksums differ too.
This is also correct.
So I think we need a bit more documentation here... What the code actually does is
If --checksum is not set then rclone compares the checksums anyway and will upload the file if the checksums differ or the backend cannot provide any checksums.
Which suggests this as a new doc paragraph
If an existing destination file has a modification time equal (within
the computed modify window precision) to the source file's, it will be
updated if the sizes are different. If the sizes are the same and the
checksum is different or not available then rclone will update the
destination.
Does that seem OK to you?
I note the comments in the source code are wrong too, so I'd say this algorithm changed at some point but the docs/comments never got updated.
I think this change was the one which introduced the doc skew
commit f3b0f8a9f06b4e3407ea04c37e23b1ebee34cc79
Author: Nick Craig-Wood <nick@craig-wood.com>
Date: Sat Jun 8 14:08:23 2019 +0100
sync: --update/-u not transfer files that haven't changed - fixes #3232
Before this change --update would transfer any file which was newer
than the destination regardless of whether it had changed or not.
This is needlessly wasteful of bandwidth.
After this change --update will only transfer files if they are newer
**and** they are different (checked with checksum and size).
Which is quite clear about its intention. I just forgot to update the docs!
Normally what you'd do is increase the --modify-window to compensate. However since your backend doesn't support checksums, I don't actually think that will help.
I think the worst effect will be that it transfers files twice, so not a disaster!
including the "destination file has a modification time older than source" case makes it even clearer at the expense of just a couple of words. However, I didn't check if that is what the code does (I'll leave this to you)
Hinting at the intention behind --update (as your commit message does) would make both the use case and the logic that follows far clearer IMHO.
Since the primary use case is backends which don't support mod time directly, I'd suggest moving that up front.
Seeing occasional multiple transfers when there is a time skew might make sync look erratic before the user figures out what is happening. Since this is unique to backends that use upload times, and since that is the primary use case for --update I'd suggest mentioning it to users who may trip on it.
Here's a version incorporating all the above if it helps. Of course, feel free to take or leave any/all of it
This forces rclone to skip any files which exist on the destination and have a modified time that is newer than the source file.
This can be useful in avoiding needless transfers when transferring to a remote which doesn't support mod times directly (or when using --use-server-modtime to avoid extra API calls) as it is more accurate than a --size-only check and faster than using --checksum. On such remotes (or when using --use-server-modtime) the time checked will be the uploaded time.
If an existing destination file has a modification time older than or equal (within the computed modify window precision) to the source file's, it will be updated if the sizes are different. If the sizes are the same, it will be updated if the checksum is different or not available.
Consider using the --modify-window to compensate for time skews between the source and the backend, for backends that do not support mod times, and instead use uploaded times. However, if the backend does not support checksums, note that sync'ing or copying within the time skew window may still result in additional transfers for safety.
Thanks, and thanks again for writing and sharing rclone. It's made my cloud interactions massively easier!
I re-read the code again, and realised I'd got the cases the wrong way round, so I've taken your docs and tweaked them with the new info.
OK?
-u, --update
This forces rclone to skip any files which exist on the destination
and have a modified time that is newer than the source file.
This can be useful in avoiding needless transfers when transferring to
a remote which doesn't support modification times directly (or when
using --use-server-modtime to avoid extra API calls) as it is more
accurate than a --size-only check and faster than using --checksum. On
such remotes (or when using --use-server-modtime) the time checked
will be the uploaded time.
If an existing destination file has a modification time older than the source
file's, it will be updated if the sizes are different. If the sizes
are the same, it will be updated if the checksum is different or not
available.
If an existing destination file has a modification time equal (within
the computed modify window) to the source file's, it will be updated
if the sizes are different. The checksum will not be checked in this
case unless the --checksum flag is provided.
In all other cases the file will not be updated.
Consider using the --modify-window flag to compensate for time skews
between the source and the backend, for backends that do not support
mod times, and instead use uploaded times. However, if the backend
does not support checksums, note that sync'ing or copying within the
time skew window may still result in additional transfers for safety.
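
The rules in the -u, --update text above can be sketched as a small decision function. This is only an illustration of the documented behaviour, not rclone's actual code: `FileInfo`, `should_transfer`, and their parameters are hypothetical names, times are plain seconds, and a checksum of `None` stands for a backend that cannot provide one. What happens when --checksum is set but no checksum is available is not covered in this thread, so the sketch simply does not update in that case (an assumption).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    mod_time: float                  # seconds since the epoch
    size: int                        # size in bytes
    checksum: Optional[str] = None   # None if the backend has no checksums


def should_transfer(src: FileInfo, dst: FileInfo,
                    modify_window: float = 1.0,
                    use_checksum: bool = False) -> bool:
    """Return True if the documented --update rules say to update dst."""
    delta = src.mod_time - dst.mod_time

    if delta > modify_window:
        # Destination is older than the source.
        if src.size != dst.size:
            return True
        if src.checksum is None or dst.checksum is None:
            return True  # checksum not available: transfer to be safe
        return src.checksum != dst.checksum

    if abs(delta) <= modify_window:
        # Times are equal within the modify window.
        if src.size != dst.size:
            return True
        if use_checksum and src.checksum is not None and dst.checksum is not None:
            return src.checksum != dst.checksum
        return False  # assumption: no update when checksums are unavailable

    # Destination is newer than the source: skip.
    return False


# The dir/a case from this thread: touched locally, same size, no checksums.
touched = FileInfo(mod_time=1060.0, size=10)
remote = FileInfo(mod_time=1000.0, size=10)
print(should_transfer(touched, remote))                      # transferred
print(should_transfer(touched, remote, modify_window=120.0)) # treated as equal
```

Widening `modify_window` moves small time differences into the "equal" branch, which is how --modify-window can absorb a clock skew.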