Rclone huge performance loss upgrading from 1.55 to 1.62

I'm upgrading from a modded rclone to 1.62 to see whether some weird 500 errors with Cloudflare R2 would be resolved.

They seem to be resolved, but at the same time the server is unusable. I see only writes to the disk and almost no reads at all.

Even copying a locally cached file from the mount to /dev/null hangs.

my settings:

ExecStart=/usr/bin/rclone mount
cf: /mnt/remote

rclone v1.62.2
- os/version: ubuntu 18.04 (64 bit)
- os/kernel: 5.4.0-148-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.20.2
- go/linking: static
- go/tags: none

my old version

rclone v1.55.0-DEV
- os/arch: linux/amd64
- go version: go1.13.8

To upgrade, I only installed the build from the website, fixed the error about fusermount3 with a symlink, and added --vfs-fast-fingerprint, which I think is what the mod by my in-house dev did at the time. But obviously something else is missing, and the performance impact is huge.

Unfortunately the dev is not working with us anymore, so I don't know exactly what was done, but I'm pretty sure it was something similar to --vfs-fast-fingerprint.

--vfs-read-chunk-size=0 disables chunked reading, is that the behavior you want?
and as a result, i would think, --vfs-read-ahead=256M does nothing.

changing that flag might force rclone to re-download the file, is that the behavior you want?

so it could be that the combo of disabling chunked reading and forcing rclone to re-download the entire file is the cause of the performance loss.

if you switch back to v1.55.0-DEV, does rclone behave?

copy a single file and post the full rclone debug log.

that flag is used twice in the same command.

Is the cache empty, and does the server have lots of users? If so, it should hopefully settle down in a bit.

If not then you'll need to try to make a test case for me that runs fast with v1.55 and slow with 1.62 and I can investigate.
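A test case along those lines could be as simple as timing the same full read through a mount served by each version. The following is only a minimal sketch: the function name `time_read` and the chunk size are placeholders chosen for illustration, and it measures nothing rclone-specific, just how long a sequential read of one file takes.

```python
import time


def time_read(path, chunk_size=1024 * 1024):
    """Read `path` to the end, discarding the data.

    Returns (bytes_read, elapsed_seconds).
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own read buffer so each read() maps to one OS read.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start
```

Running it against the same file on the v1.55 mount and on the v1.62 mount, once with a cold cache and once with a warm one, would give comparable numbers to post alongside the debug log.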

Yes, the cache is empty, and in the worst case I would expect 3,000 unique files open at once.

I just checked with my DC and they said they rate limited my server because they thought it was a DDOS attack!

They said Cloudflare was sending 19 Gbps of traffic to the server, more than my NIC, which is 10 Gbps...

So I guess I'll have to slow down... Still not sure how this would affect a copy from a local file to /dev/null, though.

I'm downloading to a ZFS pool of NVMe disks that is not used by the OS, usually between 2 and 6 NVMe disks. With a 10 Gbps NIC, I'm trying to download files to disk as quickly as possible so that all future reads come from the cache, but I don't want to do that in a blocking way.

The cache is already empty

I'm sure the file was already in the cache.

That is quite a lot!

It probably got caught up with all the other traffic. Rclone sounds like it was quite busy if it was handling 10 Gbps of traffic.

Note that --vfs-read-chunk-size=0 is probably a contributing factor here as it will encourage the provider to send you more data than you actually need.

I'm going to try again today, and I just noticed one thing: I read only a portion of a file, then checked the cache, and rclone had not downloaded the full file, only 67 MB.

  --vfs-cache-max-age=8766h \
  --vfs-cache-max-size=14.1T \
  --vfs-cache-mode=full \
  --cache-dir=/mnt/cache \
  --vfs-read-chunk-size=0 \
  --buffer-size=0 \
  --vfs-read-ahead=256M \

With those settings, shouldn't the entire file have been downloaded even if I read just 67 MB? Or at least 256 MB?
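One way to check how much of a file has actually landed in the cache is to compare its apparent size with the space allocated on disk, since with --vfs-cache-mode=full rclone keeps cache files sparse. This is a rough sketch only: `cache_report` is a made-up helper name, the cache path you pass in (something under /mnt/cache) is an assumption about your layout, and on a ZFS pool with compression enabled the allocated figure will understate how much data was really downloaded.

```python
import os


def cache_report(path):
    """Compare a file's apparent size with the space allocated for it on disk."""
    st = os.stat(path)
    # On Linux, st_blocks is counted in 512-byte units regardless of filesystem block size.
    allocated = st.st_blocks * 512
    fraction = allocated / st.st_size if st.st_size else 0.0
    return {
        "apparent_bytes": st.st_size,
        "allocated_bytes": allocated,
        "fraction": fraction,
    }
```

A fraction well below 1.0 for a file you have only partly read would be consistent with the 67 MB observation above.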

I think this is part of the issue. I was expecting rclone to download the entire file on the first read, regardless of how much was read, and then move on. If this is not happening, then yes, a huge number of open files with an empty cache will cause issues.

Now I have 2 servers, one with v1.55 and one with v1.62, and I will see if there are performance differences again.

This also explains the issue in the other topic about high writes/bandwidth use even after no new files are opened... files would only be fully downloaded if read to the end.

So far there is no difference in performance between 1.55 and 1.62, so I believe yesterday's issue was probably caused by the rclone settings not behaving as I expected: too many open files + empty cache + not saving data to disk, so lots of wasted bandwidth.

But the error 500 messages I get with 1.55 and R2 don't appear in 1.62.

It sounded like you were experiencing the dogpile effect.

The cache was warming up, but while the cache warms it is very busy.

That is a result :)

How can I make rclone behave the way I want? I want the full file to be downloaded when requested, regardless of how much is requested or whether the client aborts reading before the download completes.

Is there any way? This keeps causing issues for me that could be avoided with more aggressive disk writing.
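Until rclone itself supports this, one possible workaround outside rclone is to have a helper read each newly requested file to the end through the mount, since files appear to be fully cached only when read to the end. The sketch below is an assumption-laden illustration, not an rclone feature: `read_to_end`, `prefetch` and the worker count of 4 are all made up here, and the pool is deliberately bounded so that a cold cache does not trigger thousands of simultaneous full downloads.

```python
from concurrent.futures import ThreadPoolExecutor


def read_to_end(path, chunk_size=4 * 1024 * 1024):
    """Read `path` sequentially to the end, discarding the data; return bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total


# Hypothetical limit: at most 4 full-file reads run at once, the rest wait in the queue.
_prefetch_pool = ThreadPoolExecutor(max_workers=4)


def prefetch(path):
    """Queue a background full read of `path` and return a Future immediately."""
    return _prefetch_pool.submit(read_to_end, path)
```

The caller would invoke `prefetch("/mnt/remote/some/file")` when a client first requests a file and carry on without waiting; the returned Future can be ignored or checked later for errors.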

I could do this with a patch to the VFS: when you open a file that isn't fully cached, it would start a background process to read the whole file onto disk, regardless of whether the client still had the file open.

That would be reasonably straightforward and would mean that the client didn't have to wait for the file to be downloaded in its entirety. Rclone could also do the equivalent of --multi-thread-streams to download multiple parts of the file at once.
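To illustrate what downloading multiple parts at once means in practice, the file is divided into contiguous byte ranges and each range is fetched by its own worker. The snippet below is only a conceptual sketch of the range-splitting step; `split_ranges` is a hypothetical helper and does not reflect how rclone implements --multi-thread-streams internally.

```python
def split_ranges(size, parts):
    """Split `size` bytes into up to `parts` contiguous (offset, length) ranges."""
    if size <= 0 or parts <= 0:
        return []
    parts = min(parts, size)  # never create empty ranges
    base, extra = divmod(size, parts)
    ranges = []
    offset = 0
    for i in range(parts):
        # The first `extra` ranges take one additional byte so the total equals `size`.
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges
```

Each (offset, length) pair would then map to one ranged read of the remote object, written into the matching region of the cache file.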

I think this would be good. You saw in that other thread that there are a lot of use cases where we need more aggressive disk writing.

I still think that all the vfs read flags are somewhat confusing, and it is difficult to understand how they interact with each other.

On Sunday, all my servers running 1.62 rebooted randomly because the PHP threads were getting stuck waiting for I/O for over 120 seconds.

I was unable to debug further because when I enabled debug logs the issue went away. This had never happened in ages with 1.55, so I don't know whether it was caused by my remote (Cloudflare R2) or the new rclone version.

The PHP threads were getting stuck even with minimal writes/downloads/CPU usage.

Should I worry that 1.62 is meant for fusermount3, which is not available on Ubuntu 18.04?
