Max-backlog usage and better estimates

What is the problem you are having with rclone?

I'm doing some large sync operations with the local backend (specifically between two distinct NAS mounts on a Linux server). I found this topic, which cleared up my confusion about why the estimates were way off and kept creeping up as the sync proceeded.

My questions are:

  1. If I know the number of files I'm transferring, is it safe to set --max-backlog to that number?
    1a. What if it's 5 million+ files?
    1b. What are the implications on memory used?
  2. (This is more of a feature request, and I'll post it there too if I don't get any traction here.) Could rclone be modified to give a more accurate estimate regardless of the --max-backlog flag? For example, if the tool recursively estimated the total size and file count in the background while a transfer was running, and the estimate grew more accurate as the scan progressed, the ETA and percentage would be much more useful.

Command

rclone -P -l --transfers=12 sync local:/mnt/nas1/dir1 local:/mnt/nas2/dir1

after 25 seconds

Transferred:       13.812 GiB / 3.904 TiB, 0%, 573.749 MiB/s, ETA 1h58m29s
Checks:              2922 / 2922, 100%
Transferred:          205 / 10219, 2%
Elapsed time:        25.5s

after 25 hours

Transferred:       55.018 TiB / 55.823 TiB, 99%, 384.656 MiB/s, ETA 36m35s
Checks:              5340 / 5340, 100%
Transferred:       371124 / 381143, 97%
Elapsed time:  25h43m35.5s

Version

# rclone version
rclone v1.57.0-DEV
- os/version: rocky 9.2 (64 bit)
- os/kernel: 5.14.0-284.30.1.el9_2.x86_64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.17.2
- go/linking: dynamic
- go/tags: none

Thank you for the help and for making this forum so useful! Rclone has been an incredibly helpful tool for me.

Objects take roughly 1k of memory to store, so 5 million files will take roughly 5G of memory to store the backlog.
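To put that arithmetic in concrete terms, here is a minimal sketch of the estimate (the ~1 KiB-per-object figure is the rough number quoted above; the actual per-object footprint varies by backend and metadata, so treat the result as an order-of-magnitude guide):

```python
# Rough backlog memory estimate for rclone's --max-backlog queue.
# Assumes ~1 KiB per queued object, as stated above (an approximation).
KIB_PER_OBJECT = 1

def backlog_memory_gib(num_files: int, kib_per_object: int = KIB_PER_OBJECT) -> float:
    """Approximate memory in GiB needed to hold num_files objects in the backlog."""
    return num_files * kib_per_object / (1024 * 1024)

print(f"{backlog_memory_gib(5_000_000):.2f} GiB")  # ~4.77 GiB, i.e. roughly 5G
```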

Rclone would have to scan the source and destination directories twice to do that, which probably doesn't matter for local NAS mounts but would matter a great deal for remote cloud storage.

I would (and do) set --max-backlog -1 - give that a go and see whether rclone uses too much memory. You can also add --check-first, which makes rclone work out exactly what needs to be done before starting any transfers. You'll then get 100% accurate estimates! --check-first is also kinder to HDDs.
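Applied to the command from the question, the suggested invocation would look roughly like this (the flag placement is illustrative; --max-backlog and --check-first are both standard rclone flags):

```shell
# Unlimited backlog plus an up-front check pass for accurate totals:
#   --max-backlog -1 : queue every pending file (memory permitting)
#   --check-first    : finish all checks before starting any transfers
rclone sync -P -l --transfers=12 --max-backlog -1 --check-first \
    local:/mnt/nas1/dir1 local:/mnt/nas2/dir1
```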