Strategies for speeding up rclone sync times

I am syncing directly to a remote and experimenting with creating a real-time /home folder sync. Currently it takes 2m15s to run an rclone sync command and check all of my files.

What are some strategies for flags to safely speed up this time? I assume that I do need to check the mod times and checksums every time to know if the file needs to be updated.

For example, if I increase --checkers from 8 to 50 the sync time decreases by half. Is this a safe move with pCloud or could it lead to getting throttled?

For others who are using pCloud, what is the safest number of checkers and other performance based flags that you use which do not lead to throttling? Is any of this information published anywhere that I can reference?

really, the only way to know, given your unique situation is to test and tweak.
the total time is so small, it should be easy to do.

what is the current command?

Here is my current command, this is just testing.

rclone sync \
        --dry-run \
        --log-file log.log \
        --log-level DEBUG \
        --progress \
        --transfers 1 \
        --checkers 8 \
        --contimeout 60s \
        --timeout 300s \
        --retries 6 \
        --low-level-retries 10 \
        --stats 1s \
        --stats-file-name-length 0 \
        --fast-list \
        --exclude-from Exclude.txt \
        --filter-from Filter.txt \
    /path remote:

So far, I have tried adding:

--size-only = 10s better
--checkers 50 = 45% better

Any other flags you would suggest I try?

Also, is it dangerous to increase checkers and other flags that could improve my performance? For example, could I increase --checkers to 1000, or would I risk getting throttled by pCloud?

Does setting these flags allow me to exceed the API transaction limit or does rclone's pCloud remote implementation already limit the number of transactions I am allowed?

the only way to know if you are getting throttled is to use a debug log
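For example, you could search the debug log for the usual signs of backoff (a sketch only; exact message wording varies by backend and rclone version):

```shell
# Search the debug log for common throttling indicators.
# Note: exact wording varies by backend and rclone version;
# "pacer" and "low level retry" lines are the usual signs of backoff.
grep -E 'pacer|low level retry|429|Rate limited' log.log
```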

is there a reason for this to be so low?
--transfers 1

not sure it matters but why both of these instead of just one flag?

--exclude-from Exclude.txt
--filter-from Filter.txt

I am not getting throttled yet, but I want to understand more about how it happens and what is safe to do to prevent it. I will eventually change the log level to NOTICE when I am finished.

Very slow upload connection speed.

They are logically separated for organization.

rclone cannot exceed hard limits set and enforced by the cloud provider.
if rclone hit a limit, the server would tell rclone.
rclone, on some backends, will self-throttle itself.
the debug log would tell you all you need to know.

Okay, this is good to hear, and I suspected that in order to implement a particular cloud API, rclone would need to respect the defined limits. However, if rclone doesn't allow a combination of flags to exceed the number of transactions allowed by the cloud provider's API, then how are other users reporting that setting flags got them throttled or banned?

Also, if it is safe to increase checkers and clearly it increases performance, then why is the default for checkers set to a low value?

I haven't seen any throttling messages in my debug log, what does a message look like so I can search for it?

Okay, as a test I did this:

checkers 8 = 2m3s
checkers 50 = 1m14s
checkers 500 = 50s

So I am clearly getting better performance. I assume that if I set checkers to some arbitrarily large value like 10,000 I would get diminishing returns and eventually hit some sort of bottleneck. As long as it is safe to do so.
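A simple way to find where the returns diminish is to time a dry-run at several values (a sketch only; /path and remote: are placeholders for your own paths):

```shell
# Sketch: time a dry-run sync at several --checkers values to find
# where the curve flattens out. Nothing is transferred with --dry-run.
for n in 8 50 500 1000; do
    printf '== --checkers %s ==\n' "$n"
    time rclone sync --dry-run --checkers "$n" --log-level NOTICE /path remote:
done
```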

50s, that is a nice improvement.

for me, i like to tweak settings and optimize commands. i learn much that way.

these are some of the messages pcloud might return to rclone.
https://docs.pcloud.com/errors/

which pcloud plan do you use?

You can use a top-up sync strategy to speed up syncs enormously

Let's say you run a top-up sync every 1 hour, then you might do

rclone copy --max-age 1h /path remote:

To only consider files which have changed within the last hour.

Then once a day (say) you run a full rclone sync which will sync deletions and anything missed.
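Sketched as a crontab (the schedule times, /path, and the remote name are placeholders to adapt):

```shell
# Hypothetical crontab: hourly top-up copy of recent changes,
# plus a nightly full sync to pick up deletions and anything missed.
0 * * * *  rclone copy --max-age 1h /path remote:
30 3 * * * rclone sync /path remote:
```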


I have been experimenting with this. Because I am syncing so frequently, most of the time the sync command spends is on checking the existing files. Usually there are only 1 or 2 files to actually transfer.

Setting the checkers to a very high value seems to have made the biggest difference. Can you confirm the information that asdffdsa gave above from a code point of view? Specifically is it safe to set checkers to a very high value? Is there any risk of rate limiting or throttling in this case?

hi,
what i meant was to try the command and see what happens.
each cloud provider is different.

rate limiting is common and rclone does a good job of changing its behavior on-the-fly.

I have been experimenting a lot; however, I have never seen a rate-limiting message so far. I appreciate your advice, it has been very helpful.

--checkers is used in rclone as a general measure of concurrency. In this case setting it higher is doing multiple directory traversals at once which is probably helping.

--checkers can do network calls - for example if you are listing pCloud - it will be how many directories are listed at once.

However for the top-up sync if there are only a few files to transfer each time then it won't cause you to be rate limited as you'll only be doing a few API requests each time.

There are only a few files to transfer each time; however, every time the sync command runs it checks thousands of files to determine that it only needs to transfer a few.

So every time any change is made in the filesystem, the sync will check all files in the hierarchy and then only transfer the one or two that have changed. This approach seems inefficient; however, setting --checkers to a very high value really does seem to help: it takes less than a minute to check those thousands of files. As long as that is a safe operation.

What defines an API call, is each file checked an API call in this case or is the entire operation an API call?

The top up sync strategy should fix this hopefully.

It depends on the backend. Most of the info rclone uses comes in directory listings which supply lots of info for lots of objects.

Some backends (like s3) need additional API calls for things (eg for S3 to read the modification time).
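One consequence worth noting (hedged: whether it helps depends on the backend) is that on backends where modtime costs an extra API call per object, you can avoid those calls by comparing on size or hash instead; /path and remote: below are placeholders:

```shell
# Compare by size only - no per-object modtime lookup needed:
rclone sync --size-only /path remote:

# Or compare by checksum, which many backends return in directory listings:
rclone sync --checksum /path remote:
```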