Option to retry a single failed file instead of the whole sync

What is the problem you are having with rclone?

Hi, I am using rclone to back up local files to an FTP server, and it mostly works fine. There are about 62k files to check by size.
The problem is that there are sometimes intermittent errors in the communication with the server, even though I have set low-level-retries to one million. The two most common are:

  • Couldn't move: Move NewObject failed: object not found
  • error reading destination directory: 425 PASV: Address already in use

The first one appears only at the start and does not occur on the second retry.
The second error appears 2-5 times on each full retry, each time for different files/directories.

This causes each full retry to fail because of these random intermittent failures. Perhaps if I set the retries to 50 it would get through a full cycle once without any error, but that would take too long (each full cycle is ~40 minutes).
But my main issue is that the full retry is a waste of time and bandwidth in this case, because only a few files are failing. Rclone could retry just the failed files instead of everything over again, and that would take at most 30 seconds (these are checks that fail, not transfers).

If remembering which files failed is a concern because of the memory needed when there are a large number of them, there could be a configurable limit on the number of remembered files to retry, and when it is reached, a full retry cycle could be forced.
Also, if any of the failed files fails again on retry, it could be put back on the list (or simply not removed from it) and retried until all succeed or a per-file retry limit is reached.
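To make the idea concrete, here is a minimal sketch of the bookkeeping I have in mind (Python used only as pseudocode; the names, limits and functions are made up, not rclone internals):

# Illustrative only - hypothetical names, not rclone code.
MAX_REMEMBERED = 10_000    # configurable limit of remembered failures
MAX_FILE_RETRIES = 3       # per-file retry limit

failed = set()

def record_failure(path):
    """Remember a failed path; return False if the limit is exceeded
    and a full retry cycle should be forced instead."""
    if len(failed) >= MAX_REMEMBERED:
        return False
    failed.add(path)
    return True

def retry_failed(retry_one):
    """Retry only the remembered paths until all succeed or the per-file
    retry limit is reached; retry_one(path) returns True on success."""
    for _ in range(MAX_FILE_RETRIES):
        still_failing = {p for p in failed if not retry_one(p)}
        failed.clear()
        failed.update(still_failing)
        if not failed:
            break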

I don't know why low-level retries do not fix the issue. Perhaps there are some 1-2 second connection problems, and the low-level retry does not wait between attempts, so it burns through the million tries within a fraction of a second.

What is your rclone version (output from rclone version)

I am using a 1.55.1 version with modifications without which my FTP server will not function (I will switch to the latest official release once the modifications are included in it).

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Windows 10, 64-bit

Which cloud storage system are you using? (eg Google Drive)

FTP with crypt

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone.exe "copy" "D:\" "**private**" "--backup-dir=**private**" "--config=**private**" "--password-command=**private**" "--filter-from=**private**" "--size-only" "--bwlimit=5M:50M" "--retries-sleep=10s" "--ftp-concurrency=1" "--transfers=1" "--retries=1" "--low-level-retries=1000000" "--max-backlog=100000" "-vv"]

The rclone config contents with secrets removed.

[**private**]
type = ftp
host = **private**
user = **private**
port = 6621
pass = **private**
explicit_tls = true
disable_mfmt = true
disable_tls13 = true
encoding = Slash,Asterisk,Ctl,Dot
concurrency = 3

[**private**_crypt]
type = crypt
remote = **private**
directory_name_encryption = false
password = **private**

A log from the command with the -vv flag

Below is a fragment of the debug log showing the problem. Interestingly, the manual says that with the -vv option rclone should print all low-level retries, but there don't seem to be any here.

2021/11/18 15:54:17 DEBUG : Programowanie/Lazarus-Win/HAWK/src/components/graphics32-master_MOD/Documentation/Source/Units/GR32_Backends/Interfaces/IDeviceContextSupport/_Body.htm: Sizes identical
2021/11/18 15:54:17 DEBUG : Projekty/MartinoEscapeRoom/magic-rooms-repo/controller/sources/soft_demobrd_v1.0_F746G/Debug/Middlewares/Third_Party/LwIP/src/core/ipv6/icmp6.o: Excluded
2021/11/18 15:54:17 ERROR : Programowanie/LBA/LBArchitect/Components/graphics32/Examples/Vcl/Resampling/Resamplers_Ex: error reading destination directory: 425 PASV: Address already in use
2021/11/18 15:54:17 DEBUG : Programowanie/LBA/LBArchitect/Components/graphics32/Examples/Vcl/Resampling/PixelF_Ex/createbundle.sh: Unchanged skipping
2021/11/18 15:54:17 DEBUG : Programowanie/C-STM32/DISCO_F746G/zabawa/zabawa/SW4STM32/zabawa/Drivers/Components/st7735/Release_Notes.html: Sizes identical

That is probably the issue to work on.

The low level retry only retries errors it thinks are worth retrying, and it might be that there is an error being thrown which should be retried but isn't.

I think lots of FTP fixes went into 1.57 - have you tried that?

Are you transferring many, many files?

Perhaps you are running out of local ports on the server - can you increase the passive port range there?

Perhaps rclone should treat that 425 error as a retryable error?

I'd send you something to try, but I need to know whether 1.57 or the latest beta works for you!

Thanks for the answer. Unfortunately I cannot use 1.57 or the latest beta, because they still do not support disabling the MFMT instruction (my server does not support ModTime).

2021/11/18 18:04:42 ERROR : *filename*: Failed to copy: SetModTime: 501 *filename*: Operation not permitted

The modified version I use has a configuration flag, disable_mfmt, which makes rclone usable for me. It is this version:

Are you transferring many, many files?

Yes, almost 62000.

Perhaps you are running out of local ports on the server - can you increase the passive port range there?

Not sure if I can do that, but I will try.
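For reference, if the server happens to run vsftpd (just an assumption on my part - I haven't checked which FTP daemon the Pi uses), the passive range would be widened in /etc/vsftpd.conf with something like:

# example values only - widen the passive data-port range
pasv_enable=YES
pasv_min_port=50000
pasv_max_port=50100

The same range would also need to be open on any firewall/router in front of the server.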

I just found a topic about another problem that gave me an idea: I could modify the program I use to run rclone to catch the failed files in the log output and put them on a list, and when rclone finishes, run rclone separately for each file from the list. That would require some relative path analysis, but it would be doable.
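A rough sketch of what I mean (Python; the file names and destination handling are simplified placeholders, and the ERROR line format is taken from the log fragment above):

import re
import subprocess

LOG_FILE = "rclone.log"     # hypothetical: log captured from the main run
SOURCE_ROOT = "D:\\"        # local base directory
REMOTE_ROOT = "remote:"     # placeholder for the crypt remote

# -vv error lines look like:
# 2021/11/18 15:54:17 ERROR : some/relative/path: error reading destination directory: ...
ERROR_RE = re.compile(r" ERROR : (.+?): ")

failed = []
with open(LOG_FILE, encoding="utf-8") as log:
    for line in log:
        m = ERROR_RE.search(line)
        if m:
            failed.append(m.group(1))

# Retry each failed item individually.
# NOTE: simplified - for a single file the destination should really be its
# containing directory; that is the "relative path analysis" mentioned above.
for rel in failed:
    subprocess.run([
        "rclone.exe", "copy",
        SOURCE_ROOT + rel.replace("/", "\\"),
        REMOTE_ROOT + rel,
        "--size-only", "-vv",
    ], check=False)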

My thoughts about future changes in ftp backend

reduce ftp concurrency, it's 4 by default

Here you say you found a workaround

Hearing that, I didn't merge the quirk flag for mfmt.

I already use 1; it's in the command I pasted in the first post.

If you check the post you are quoting, you will find that I edited it and withdrew the workaround, because I was wrong, and the final conclusion is that it doesn't work. The MFMT disable option is still needed for my case.

@ncw I have made some tests with the ports usage, and I found out the following:

  1. There are 10 ports open for FTP connections on the server.
  2. Rclone uses 1 of them, max 2 at the same time in my case.
  3. Rclone seems to close and open an FTP connection for each file it checks or transfers, which is suboptimal. Why does it do that?
  4. The above results in the higher ports being used despite the fact that the lower ones have already been closed. I suppose this is because the server sometimes has too little time to free a port after closing a connection before another connection comes in. The server is not very efficient, as it is built on a Raspberry Pi.
    Perhaps increasing the number of ports would help, but it's not easy to do in the current setup. I would go this way as a last resort.
    I tried the sync on a more reliable internet connection (and a slower one, so I had to use a lower bandwidth limit), and the sync went through twice without any error, so perhaps this issue is caused by a lost connection-closing packet, after which the server runs out of ports it can use for new connections. Or the lower bandwidth limit caused new connections to be opened more slowly, so the server could keep up.
    I will be able to try a lower bandwidth limit on my faster internet later today, and if that works fine, I will be happy with that workaround. But making rclone use a single connection for all FTP transfers should be taken into consideration. I think that would speed it up greatly in cases like mine, where most of the file operations are checks and only a few transfers need to be made.

That is how the FTP protocol works. Each file is transferred over its own TCP data connection, and so is each directory listing.

Perhaps easier: do a sync, then use rclone check to show you exactly which files need uploading. It can produce a file suitable for feeding back into rclone with the --files-from flag.
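Something along these lines, for example (paths and file names are placeholders):

rclone check "D:\" "remote:" --size-only --missing-on-dst retry-files.txt
rclone copy "D:\" "remote:" --files-from retry-files.txt -vv

--missing-on-dst writes the paths (relative to the roots being checked) of files missing on the destination, which is the format --files-from expects.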

Transfers yes, but for checks too?

Thank you for the idea, but it would double the time of the entire operation. I think I will stick to my approach.

I did the test with a lower bandwidth limit. When I decreased the upload speed limit to about half of my internet bandwidth, the errors disappeared. It's not a big problem, because the limited transfer is still fast enough for my needs. And the total time of a regular sync of 62,000 files (with 99.9% checks and only a few transfers) is the same as before.

Now I wonder... I know the internet bandwidth at the server's side is lower than mine. Could that be the cause of the problem? After the speed limit decrease, my bandwidth should fall below the server's.

The listing of each directory is also a new TCP connection I think - that would probably explain what you are seeing.

I implemented the error log parsing and retrying of single failed files and directories, but this caused another problem: the filters are not applied correctly during the retry, because it uses a different base path.

I am using somewhat complex filter-from rules. I need them because in one case I have to exclude an entire directory but include some particular files from it.

My filter pattern looks like this:

# Include org.eclipse.cdt.ui.prefs file
+ **/*Workspaces/*/.metadata/.plugins/org.eclipse.core.runtime/.settings/org.eclipse.cdt.ui.prefs
# exclude Eclipse workspaces
- **/*Workspaces/**
# include everything else
+ **

Now for the base rclone run this works perfectly. But if the D:\EclipseWorkspaces\projectname\.metadata\.plugins directory fails to check and I do the retry, the only way I can do it is by calling rclone copy D:\EclipseWorkspaces\projectname\.metadata\.plugins remote:.... But in this case the filter pattern - **/*Workspaces/** will not match, and the whole directory will be included, which I do not want.

I am trying to think of a way to do this reasonably (i.e. without adjusting the filter for each retried base directory, which would require writing my own filter interpreter), but I can't find one.
The most reasonable approach would be to specify particular files or directories to be copied, with the original base directory and the original filters, but the manual says filter-from and files-from cannot be used together.

@ncw, I remember your suggestion to use rclone check first and then pass the result back with --files-from, but that will not work in my case, as most of the errors I experience occur while checking, so I would end up with the same problem (retrying the errored checks would bypass my filter rules because of the different base directory).

Just now I am thinking about another possibility. For retries I could use a copy of my filter file with the final + ** removed, and after it specify a second filter file compiled at runtime, which would contain my errored files and directories plus - **. This would pass the files through the regular filter and then through the second filter for the individual files.

Do you have a better idea?

I implemented my last idea, that is:

  1. I removed the ending + ** from my standard filter file,
  2. I record failed files and directories,
  3. I added a condition in my wrapper program that adds a second filter file depending on the situation:
  • if it is a base run, the second filter file consists of just + **,
  • if it is a retry run, the second file consists of all failed directories and files (each with + in front of it), and - ** at the end (an example is sketched below).
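For example, for the directory that failed in the earlier log fragment, the generated second filter file would look roughly like this:

+ /Programowanie/LBA/LBArchitect/Components/graphics32/Examples/Vcl/Resampling/Resamplers_Ex/**
- **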

This way the retry run checks only the failed items, and all normal filters still work. There is one drawback though.

I cannot use relative inclusion filters any more. If I do, on the retry run rclone will check all the directories again, not only the previously failed ones. It is reasonable that rclone enters all directories, but I don't understand why it checks them.

For example: if I have the main filter file like this:

+ **/somedir/somedir2/somefile.txt
- somedir/**

(to exclude somedir but include somefile.txt, which may reside in that dir),
rclone will, on the retry run, check all of the unrelated directories, like C:\dir1, C:\dir2, and so on.
I understand why it will traverse them - because there might be a somedir inside one of them that contains somefile.txt - but I don't get why it has to check all those directories, i.e. ask the server about them. I know it does, because it sometimes fails, and I get an endless retry loop with some random failed checks in each iteration.

To my understanding, rclone should just skip each unrelated file in each directory if it is not included by any filter line (i.e. when - ** matches the file). But instead it checks the file's directory with the server. What for?

If I convert the relative path inclusions in the filter file to absolute ones, like this:

+ /somedir/somedir2/somefile.txt
+ /dir10/somedir/somedir2/somefile.txt
- somedir/**

then rclone works fine and finishes the retry cycle very quickly and without errors (obviously because it no longer checks all the unrelated directories).

This is acceptable, but quite tedious, as I have many somedir subdirectories and somefile.txt files in various places, and I have to repeat the inclusion for all of them.

Is there another way I could avoid these superfluous checks?
