--retries is resyncing all files instead of just the one that failed

What I am trying to do is a sync of a directory for backup reasons.
The databases like *.nsf are already copied by another tool. But the remaining files should be synced to a remote drive.

The issue I am having is that some files are modified during the backup, like the console.log file in the error below. I just need best-effort copying of files in this case, so a modified log file is not really an issue.
But retries of the whole sync would be problematic, because other files could be in use when the whole sync is retried.

2019/12/30 18:32:03 ERROR : console.log: corrupted on transfer: sizes differ 24169 vs 24375

According to the debug logs, the transfer starts over from the beginning if any error occurred.
I tried --ignore-errors, but the retries still occur.

I had assumed that a retry is done as soon as a transfer runs into an error, not at the end of each sync.

When retrying, other files might have been modified.
I found a workaround for skipped file deletion using --delete-before.

But a way to ignore errors for updated files, and a direct retry per file rather than for all files, would be very helpful.

I might have overlooked a flag which allows me to get this retry behavior?

Is there a way to configure a retry per file instead of the full sync?

Here is my current command and the version and platform I am using.

Thanks for any tip

Daniel

D:/rclone/rclone.exe sync d:/notestest d:/rclone/test1 --ignore-case --exclude ".{ns,ntf,box,lck}" --exclude "*.{ft}/**" --ignore-checksum --local-no-check-updated --ignore-errors --transfers 1 --delete-before --fast-list -v -v 2>&1 > synclog.txt

rclone v1.50.2

  • os/arch: windows/amd64
  • go version: go1.13.4
    Windows 10.0.18362.418

Rclone has two types of retries: low-level retries and full retries. Which one it uses depends on exactly what kind of error occurred.

I think if you set --retries 1 then it will probably do what you want - carry on doing low level retries but only do 1 try of the full sync.
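
As a rough mental model of the two levels described above, here is a toy sketch in Python. It is not rclone's implementation: the function names and the `copy_file` callback are made up for illustration, and the defaults simply mirror the values mentioned in this thread (3 full tries, 10 low-level retries).

```python
"""Toy model of two retry levels; not rclone's actual code."""


def sync_once(files, copy_file, low_level_retries):
    """Visit every file once; return the names that still failed."""
    failed = []
    for name in files:
        # Inner loop: retry this one file a limited number of times.
        for _ in range(low_level_retries):
            if copy_file(name):
                break
        else:
            failed.append(name)
    return failed


def sync_with_retries(files, copy_file, retries=3, low_level_retries=10):
    """Repeat the whole pass up to `retries` times, like a full retry."""
    failed = []
    for attempt in range(1, retries + 1):
        failed = sync_once(files, copy_file, low_level_retries)
        if not failed:
            return attempt, failed
    return retries, failed
```

In this model a file that exhausts its inner retries is only tried again when the whole pass repeats, and that repeat visits every other file as well.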

Thanks for your quick reply!

I saw the low level retries, which is documented to be 10 by default. I just tested again setting the value explicitly to 10.
In combination with --retries 1 the file is not copied and you see the following log messages:

-- Daniel

2019/12/31 11:19:51 DEBUG : IBM_TECHNICAL_SUPPORT/nsd_W32I_NSH-T470P_2019_12_29@22_46_40.log: Unchanged skipping
2019/12/31 11:19:51 INFO : Local file system at //?/d:/rclone/test1: Waiting for transfers to finish
2019/12/31 11:19:51 ERROR : IBM_TECHNICAL_SUPPORT/console.log: corrupted on transfer: sizes differ 46632 vs 46838
2019/12/31 11:19:51 INFO : IBM_TECHNICAL_SUPPORT/console.log: Removing failed copy
2019/12/31 11:19:51 INFO : IBM_TECHNICAL_SUPPORT/logasio_NSH-T470P_2019_12_30@16_52_52.log: Copied (replaced existing)
2019/12/31 11:19:51 ERROR : Attempt 1/1 failed with 3 errors and: corrupted on transfer: sizes differ 46632 vs 46838
2019/12/31 11:19:51 Failed to sync with 3 errors: last error was: corrupted on transfer: sizes differ 46632 vs 46838

New command tested:

D:/rclone/rclone.exe sync d:/notestest d:/rclone/test1 --ignore-case --exclude ".{ns,ntf,box,lck}" --exclude "*.{ft}/**" --retries 1 --low-level-retries 10 --ignore-checksum --local-no-check-updated --ignore-errors --transfers 1 --delete-before --fast-list -v -v

I thought that was what you wanted?

The other way to do it would be to exclude the file using the --exclude flag.

In general it is really hard for rclone to work out whether a file is open in a cross-platform way, so rclone throws the error if it notices that the file has changed during the upload. You can use this flag to stop rclone doing that, which might be what you want:

  --local-no-check-updated           Don't check to see if the files change during upload

I want to copy all files and I don't know which file will be busy in that moment.

Here is what I want in detail:

Copy all files in the best possible way: if they are open and changing, I want retries, and if multiple retries fail, I want them copied anyway.

I would like to see an error only if the retries did not work.
What I did was set --retries to 1 and --low-level-retries to 10.
I expected 10 retries for the console.log file before giving up.
But the low-level retries do not seem to have been attempted. Should they show up in the log as retries?

I can't exclude files. I thought about leaving the log files out, but I need them copied with "best effort": retries first, and if n retries per file did not work, just copy the file and show an error or warning.

Although this does not address your problem directly, I have a very similar use case and have found the most reliable method is to handle the logic externally and then feed rclone a list of files with --files-from.

For example, I have a script that checks a directory tree recursively against a cache to pick up changes, checks to see that the files are not opened for writing, applies some other heuristics and then builds a list of files for rclone to copy to the backup target. The cache is only purged if rclone exits without an error, so the files get added again the next time the script runs if rclone throws the "file changed during transfer" error. Since rclone is smart enough not to re-copy files that have not been updated (which I find works best with the -c flag), it has the effect of picking up only the files in the list that failed during the last run.

With the exception of os-specific file operations (like locking and checking for open writes) everything is done with rclone using --include, --exclude, lsjson, etc.
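
A minimal sketch of that kind of wrapper, in Python, might look like the following. It is an illustration under assumptions, not the script described above: the cache is a JSON file of size and modification time per path, the names `scan`, `changed_files` and `backup` are invented here, and the open-for-writing check and other heuristics are left out because they are OS-specific. The cache is only rewritten when rclone exits with code 0, so files from a failed run are listed again next time.

```python
import json
import os
import subprocess
import tempfile


def scan(root):
    """Map each relative path under root to [size, mtime_ns]."""
    state = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            state[rel] = [st.st_size, st.st_mtime_ns]
    return state


def changed_files(current, cache):
    """Paths that are new or whose size/mtime differ from the cache."""
    return sorted(p for p, sig in current.items() if cache.get(p) != sig)


def backup(root, dest, cache_path):
    """Copy only changed files with rclone; return rclone's exit code."""
    try:
        with open(cache_path, encoding="utf-8") as fh:
            cache = json.load(fh)
    except FileNotFoundError:
        cache = {}
    current = scan(root)
    todo = changed_files(current, cache)
    if not todo:
        return 0
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as fh:
        fh.write("\n".join(todo) + "\n")
        list_path = fh.name
    try:
        result = subprocess.run(
            ["rclone", "copy", root, dest, "--files-from", list_path]
        )
    finally:
        os.unlink(list_path)
    if result.returncode == 0:
        # Remember the new state only after a clean run.
        with open(cache_path, "w", encoding="utf-8") as fh:
            json.dump(current, fh)
    return result.returncode
```

A call such as `backup("d:/notestest", "d:/rclone/test1", "cache.json")` would then be scheduled repeatedly; the paths are taken from the commands earlier in the thread and the cache file name is hypothetical.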

Using the --local-no-check-updated flag will copy files as they are. If it is log files then you'll get a valid file just with a bit of log missing.

When a file is in use, rclone marks it for a high-level retry. Generally there is no point retrying an in-use file immediately, as it will still be in use.

What you could do is run two syncs: one with --local-no-check-updated to copy everything with best effort, and one without it to make sure everything has been updated.
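
The two-sync suggestion could be scripted roughly as below. This is only a sketch: the helper names are invented, the `rclone` executable is assumed to be on the PATH, and no other flags from the original command are carried over.

```python
import subprocess


def two_pass_commands(src, dst, rclone="rclone"):
    """Return the best-effort command followed by the checking command."""
    base = [rclone, "sync", src, dst]
    best_effort = base + ["--local-no-check-updated"]
    return [best_effort, list(base)]


def run_two_pass(src, dst):
    """Run both passes in order and return their exit codes."""
    return [subprocess.run(cmd).returncode for cmd in two_pass_commands(src, dst)]
```

The second exit code is the one that says whether everything ended up in sync; the first pass may still report an error for a file that changes size while it is being copied.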

I have worked with --local-no-check-updated; it is in the command lines I posted.
rclone still skipped the file.

There are different reasons why a file might be in use.
In my case it is a log file that may be receiving an update at that moment.
Other cases would be databases that are in use, where waiting for a longer time would make sense.

In my tests neither --local-no-check-updated nor the low-level retries worked for me (see my log extracts).

I just want to sync everything with best effort. The only files in use should be log files,
because I have a different way of syncing the Notes database files, which are also in this directory.

The main thing is that I want to get all the files copied, even if some are being updated.

Thanks for your suggestion, but I can't do that! I don't know which file will be in use.
The scenario is a backup of files. The database files (Notes databases) are copied with a different solution.
I want to use rclone to take a backup of all other files. If something is in use, a best-effort file copy is totally fine.
But I want to have all files copied. A global retry that syncs everything again risks other files being in use.

The main point is that the server is running and has files in use.
For databases there is a backup API. But for everything else I need a kind of open-file backup, which I am trying to implement with rclone.

-- Daniel

If you want all files to get copied, you might want to enable VSS.
Check out my wiki, it might give you some ideas.

Hmmm... I think I found it. --local-no-check-updated makes a difference.

Here are my two tests:

a.) rclone.exe sync test backup --delete-before --retries 1 -v -v

b.) rclone.exe sync test backup --delete-before --local-no-check-updated -v -v

a.)
When I run this command it works as expected so far (still testing).

b.)
This command did not work as expected.
The low-level retries don't happen in this case.

What I also did not expect is that an error occurs when the file is in use:

test.txt: corrupted on transfer: sizes differ 1652614 vs 1836843

Here are the detailed logs. I wrote a small test program which modifies the file test.txt.

-- Daniel

2019/12/31 20:46:10 DEBUG : rclone: Version "v1.50.2" starting with parameters ["d:\rclone\rclone.exe" "sync" "test" "backup" "--delete-before" "--retries" "1" "-v" "-v"]
2019/12/31 20:46:10 DEBUG : Using config file from "C:\Users\nsh\.config\rclone\rclone.conf"
2019/12/31 20:46:10 INFO : Waiting for deletions to finish
2019/12/31 20:46:10 INFO : Local file system at //?/N:/notesapi/work/nshtest/nshwrite/backup: Waiting for checks to finish
2019/12/31 20:46:10 INFO : Local file system at //?/N:/notesapi/work/nshtest/nshwrite/backup: Waiting for transfers to finish
2019/12/31 20:46:10 INFO : Local file system at //?/N:/notesapi/work/nshtest/nshwrite/backup: Waiting for checks to finish
2019/12/31 20:46:10 DEBUG : test.txt: Sizes differ (src 1050256 vs dst 682370)
2019/12/31 20:46:10 DEBUG : test.txt: Reopening on read failure after 0 bytes: retry 1/10: can't copy - source file is being updated (size changed from 1050256 to 1133996)
2019/12/31 20:46:10 DEBUG : test.txt: Reopening on read failure after 0 bytes: retry 2/10: can't copy - source file is being updated (size changed from 1050256 to 1133996)
2019/12/31 20:46:10 DEBUG : test.txt: Reopening on read failure after 0 bytes: retry 3/10: can't copy - source file is being updated (size changed from 1050256 to 1133996)
2019/12/31 20:46:10 DEBUG : test.txt: Reopening on read failure after 0 bytes: retry 4/10: can't copy - source file is being updated (size changed from 1050256 to 1133996)
2019/12/31 20:46:10 DEBUG : test.txt: Reopening on read failure after 0 bytes: retry 5/10: can't copy - source file is being updated (size changed from 1050256 to 1133996)
2019/12/31 20:46:10 DEBUG : test.txt: Reopening on read failure after 0 bytes: retry 6/10: can't copy - source file is being updated (size changed from 1050256 to 1133996)
2019/12/31 20:46:10 DEBUG : test.txt: Reopening on read failure after 0 bytes: retry 7/10: can't copy - source file is being updated (size changed from 1050256 to 1133996)
2019/12/31 20:46:10 DEBUG : test.txt: Reopening on read failure after 0 bytes: retry 8/10: can't copy - source file is being updated (size changed from 1050256 to 1133996)
2019/12/31 20:46:10 DEBUG : test.txt: Reopening on read failure after 0 bytes: retry 9/10: can't copy - source file is being updated (size changed from 1050256 to 1133996)
2019/12/31 20:46:10 DEBUG : linux7.mak: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:46:10 DEBUG : linux7.mak: Unchanged skipping
2019/12/31 20:46:10 DEBUG : mswin32.mak.bak: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:46:10 DEBUG : mswin32.mak: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:46:10 DEBUG : mswin32.mak: Unchanged skipping
2019/12/31 20:46:10 DEBUG : nshwrite.c.bak: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:46:10 DEBUG : nshwrite.c.bak: Unchanged skipping
2019/12/31 20:46:10 DEBUG : mswin32.mak.bak: Unchanged skipping
2019/12/31 20:46:10 DEBUG : nshwrite.exe: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:46:10 DEBUG : nshwrite.exe: Unchanged skipping
2019/12/31 20:46:10 DEBUG : nshwrite.obj: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:46:10 DEBUG : nshwrite.obj: Unchanged skipping
2019/12/31 20:46:10 DEBUG : nshwrite.c: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:46:10 DEBUG : nshwrite.c: Unchanged skipping
2019/12/31 20:46:10 INFO : Local file system at //?/N:/notesapi/work/nshtest/nshwrite/backup: Waiting for transfers to finish
2019/12/31 20:46:10 DEBUG : test.txt: Reopening on read failure after 0 bytes: retry 10/10: can't copy - source file is being updated (size changed from 1050256 to 1133996)
2019/12/31 20:46:10 DEBUG : test.txt: Reopen failed after 0 bytes read: failed to reopen: too many retries
2019/12/31 20:46:10 NOTICE: test.txt: Removing partially written file on error: can't copy - source file is being updated (size changed from 1050256 to 1133996)
2019/12/31 20:46:10 ERROR : test.txt: Failed to copy: can't copy - source file is being updated (size changed from 1050256 to 1133996)
2019/12/31 20:46:10 ERROR : Attempt 1/1 failed with 3 errors and: can't copy - source file is being updated (size changed from 1050256 to 1133996)
2019/12/31 20:46:10 Failed to sync with 3 errors: last error was: can't copy - source file is being updated (size changed from 1050256 to 1133996)


2019/12/31 20:55:30 DEBUG : rclone: Version "v1.50.2" starting with parameters ["d:\rclone\rclone.exe" "sync" "test" "backup" "--delete-before" "--local-no-check-updated" "-v" "-v"]
2019/12/31 20:55:30 DEBUG : Using config file from "C:\Users\nsh\.config\rclone\rclone.conf"
2019/12/31 20:55:30 INFO : Waiting for deletions to finish
2019/12/31 20:55:30 INFO : Local file system at //?/N:/notesapi/work/nshtest/nshwrite/backup: Waiting for checks to finish
2019/12/31 20:55:30 INFO : Local file system at //?/N:/notesapi/work/nshtest/nshwrite/backup: Waiting for transfers to finish
2019/12/31 20:55:30 INFO : Local file system at //?/N:/notesapi/work/nshtest/nshwrite/backup: Waiting for checks to finish
2019/12/31 20:55:30 DEBUG : linux7.mak: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:55:30 DEBUG : linux7.mak: Unchanged skipping
2019/12/31 20:55:30 DEBUG : mswin32.mak.bak: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:55:30 DEBUG : nshwrite.exe: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:55:30 DEBUG : nshwrite.exe: Unchanged skipping
2019/12/31 20:55:30 DEBUG : mswin32.mak.bak: Unchanged skipping
2019/12/31 20:55:30 DEBUG : nshwrite.c: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:55:30 DEBUG : nshwrite.c: Unchanged skipping
2019/12/31 20:55:30 DEBUG : nshwrite.c.bak: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:55:30 DEBUG : nshwrite.c.bak: Unchanged skipping
2019/12/31 20:55:30 DEBUG : mswin32.mak: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:55:30 DEBUG : mswin32.mak: Unchanged skipping
2019/12/31 20:55:30 DEBUG : nshwrite.obj: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:55:30 DEBUG : nshwrite.obj: Unchanged skipping
2019/12/31 20:55:30 INFO : Local file system at //?/N:/notesapi/work/nshtest/nshwrite/backup: Waiting for transfers to finish
2019/12/31 20:55:30 ERROR : test.txt: corrupted on transfer: sizes differ 1652614 vs 1836843
2019/12/31 20:55:30 INFO : test.txt: Removing failed copy
2019/12/31 20:55:30 ERROR : Attempt 1/3 failed with 3 errors and: corrupted on transfer: sizes differ 1652614 vs 1836843
2019/12/31 20:55:30 INFO : Waiting for deletions to finish
2019/12/31 20:55:30 INFO : Local file system at //?/N:/notesapi/work/nshtest/nshwrite/backup: Waiting for checks to finish
2019/12/31 20:55:30 INFO : Local file system at //?/N:/notesapi/work/nshtest/nshwrite/backup: Waiting for transfers to finish
2019/12/31 20:55:30 DEBUG : linux7.mak: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:55:30 DEBUG : linux7.mak: Unchanged skipping
2019/12/31 20:55:30 INFO : Local file system at //?/N:/notesapi/work/nshtest/nshwrite/backup: Waiting for checks to finish
2019/12/31 20:55:30 DEBUG : mswin32.mak: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:55:30 DEBUG : mswin32.mak: Unchanged skipping
2019/12/31 20:55:30 DEBUG : nshwrite.c: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:55:30 DEBUG : nshwrite.c.bak: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:55:30 DEBUG : mswin32.mak.bak: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:55:30 DEBUG : mswin32.mak.bak: Unchanged skipping
2019/12/31 20:55:30 DEBUG : nshwrite.exe: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:55:30 DEBUG : nshwrite.exe: Unchanged skipping
2019/12/31 20:55:30 DEBUG : nshwrite.obj: Size and modification time the same (differ by 0s, within tolerance 100ns)
2019/12/31 20:55:30 DEBUG : nshwrite.obj: Unchanged skipping
2019/12/31 20:55:30 DEBUG : nshwrite.c: Unchanged skipping
2019/12/31 20:55:30 DEBUG : nshwrite.c.bak: Unchanged skipping
2019/12/31 20:55:30 INFO : Local file system at //?/N:/notesapi/work/nshtest/nshwrite/backup: Waiting for transfers to finish
2019/12/31 20:55:30 DEBUG : test.txt: MD5 = df50b8532af22128787bf44392fc4c51 OK
2019/12/31 20:55:30 INFO : test.txt: Copied (new)
2019/12/31 20:55:30 ERROR : Attempt 2/3 succeeded
2019/12/31 20:55:30 INFO :
Transferred: 3.503M / 3.503 MBytes, 100%, 140.142 MBytes/s, ETA 0s
Errors: 0
Checks: 14 / 14, 100%
Transferred: 1 / 1, 100%
Elapsed time: 0s

2019/12/31 20:55:30 DEBUG : 3 go routines active
2019/12/31 20:55:30 DEBUG : rclone: Version "v1.50.2" finishing with parameters ["d:\rclone\rclone.exe" "sync" "test" "backup" "--delete-before" "--local-no-check-updated" "-v" "-v"]

I think that is probably working as designed.

--local-no-check-updated just turns off the check for changes. It looks like the file changed size, so the sync layer then complained about that :frowning:

So I think the answer to your question is that, at the moment, rclone can't resync one open file without doing a full retry.

In general, syncing files that are open is a hard problem to solve, because rclone doesn't know when the file will be closed and so can't schedule a retry for that moment.

You could try using --retries 3 (the default) and --retries-sleep to put more time between the retries:

  --retries-sleep duration   Interval between retrying operations if they fail, e.g 500ms, 60s, 5m. (0 to disable)
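
For example, a small helper that assembles such a command line could look like this. The function name and the 60s pause are placeholders chosen for illustration; only the two flags come from the text above.

```python
def sync_command(src, dst, retries=3, retries_sleep="60s", rclone="rclone"):
    """Build an rclone sync command with a pause between full retries."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    return [
        rclone, "sync", src, dst,
        "--retries", str(retries),
        "--retries-sleep", retries_sleep,
    ]
```

The returned list can be passed to `subprocess.run`. A longer pause gives an in-use file more time to be closed, although the next full try still walks through every file again.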

If you are running on Windows then using VSS (Volume Shadow Copy) is a good solution. If running on Linux you could take an LVM snapshot (for example).

Without the --local-no-check-updated option, the retry is done per file by the low-level retry. This isn't used when the option is specified, and your information helps me understand why: the error that the file changed is raised at a lower level, as you say, and that is past the check for whether the file changed. So in that case, if a lower-level error occurs, there is no local retry. The local retry is what is important for me.

What would be helpful is n local retries, and if that doesn't work, just copy the file anyway with best effort and provide a warning or error.
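
To make the requested policy concrete, here is a sketch of it done outside rclone with plain Python file copies. This is not something rclone offers according to this thread; `best_effort_copy` and its parameters are hypothetical, and "stable" is assumed to mean that the source size and modification time are the same before and after the copy.

```python
import logging
import os
import shutil
import time


def best_effort_copy(src, dst, retries=10, sleep=0.5):
    """Copy src to dst; True if the source was stable, False if best effort."""
    for _ in range(retries):
        before = os.stat(src)
        shutil.copy2(src, dst)
        after = os.stat(src)
        # If size and mtime did not move during the copy, accept it.
        if (before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns):
            return True
        time.sleep(sleep)
    # Out of retries: keep the last copy instead of deleting it.
    logging.warning("%s kept changing after %d tries; keeping last copy", src, retries)
    return False
```

The difference from the behaviour described in this thread is the last step: the destination file is kept and a warning is logged, rather than the failed copy being removed.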

Right now, if the file cannot be copied correctly after n retries, it is removed from the target.
The full resync isn't helpful in my case, because other files could be in use at that point.

So the low-level retry is what I need, and it works without --local-no-check-updated.

The ideas of VSS or LVM snapshots are good, but too much effort for just log files failing during backup.
My backup solution takes care of Notes/Domino database backups using functionality from the application to keep files consistent. Those operations are done per database and I am just leveraging normal copy operations.

So rclone is just the backup for the remaining data; that's what I use the excludes for.
For what I am doing with those files rclone is the perfect tool. I am mirroring those remaining files to a remote location.

Thank you for all your help!

-- Daniel

Ah, I'd forgotten about that.

I've been thinking about changing that logic - it doesn't work on most cloud providers anyway - it only really works on the local and the sftp backend.

So currently what rclone does is if a copy is corrupted, it deletes the destination file.

What do you think about changing that?

It would make sense to have a way to configure rclone to keep the file.
But if it has behaved like this for a while, I would keep the current behavior as the default.

It would also help to update the documentation so others can understand the current logic better.

It took me a while to understand the two different concepts of the local retry and the full retry loop.

-- Daniel

OK I put that on the to do list :slight_smile: