Mount and vfs logging

machineworker · July 3, 2021, 9:49pm

Hello

rclone v1.55.1
- os/type: linux
- os/arch: amd64
- go/version: go1.16.3
- go/linking: static
- go/tags: none

With the following command for a google drive mount:

rclone mount data: /mnt/data --attr-timeout 168h --dir-cache-time 168h --poll-interval 3h --use-mmap --read-only --fast-list --vfs-cache-mode full --vfs-cache-max-size 1650G --vfs-cache-max-age 168h --vfs-read-chunk-size 8M --vfs-read-chunk-size-limit 8M --no-check-dest --no-checksum --no-traverse --log-file /var/log/rclone.log --log-level DEBUG --allow-other --umask 000

My objective is to read from cache whenever possible and remove any remote calls where possible. My use case doesn't need to be concerned about data being modified at the remote. However even with the above options, rclone still makes remote calls when reading a file or even reading the same chunk. For example after running the following command 10 times:

head -c 50 test.zip

Shows this in the log for the 10th execution:

2021/07/03 21:35:03 DEBUG : test.zip: Open: flags=OpenReadOnly
2021/07/03 21:35:03 DEBUG : test.zip: Open: flags=O_RDONLY
2021/07/03 21:35:03 DEBUG : test.zip: newRWFileHandle:
2021/07/03 21:35:03 DEBUG : test.zip: >newRWFileHandle: err=<nil>
2021/07/03 21:35:03 DEBUG : test.zip: >Open: fd=test.zip (rw), err=<nil>                                                                                                                                    
2021/07/03 21:35:03 DEBUG : test.zip: >Open: fh=&{test.zip (rw)}, err=<nil>                                                                                                                                 
2021/07/03 21:35:03 DEBUG : &{test.zip (rw)}: Read: len=16384, offset=0
2021/07/03 21:35:03 DEBUG : test.zip(0xc00bde8940): _readAt: size=16384, off=0
2021/07/03 21:35:03 DEBUG : test.zip(0xc00bde8940): openPending:
2021/07/03 21:35:03 DEBUG : test.zip: vfs cache: checking remote fingerprint "1365278274,2021-07-03 20:35:06.803 +0000 UTC,1641521f487bf2f3e91a1fc4b1e9849f" against cached fingerprint "1365278274,2021-07-03 20:35:06.803 +0000 UTC,1641521f487bf2f3e91a1fc4b1e9849f"                        
2021/07/03 21:35:03 DEBUG : test.zip: vfs cache: truncate to size=1365278274
2021/07/03 21:35:03 DEBUG : : Added virtual directory entry vAddFile: "test.zip"
2021/07/03 21:35:03 DEBUG : test.zip(0xc00bde8940): >openPending: err=<nil>
2021/07/03 21:35:03 DEBUG : vfs cache: looking for range={Pos:0 Size:16384} in [{Pos:0 Size:577536}] - present true
2021/07/03 21:35:03 DEBUG : test.zip: ChunkedReader.RangeSeek from -1 to 577536 length -1
2021/07/03 21:35:03 DEBUG : test.zip(0xc00bde8940): >_readAt: n=16384, err=<nil>
2021/07/03 21:35:03 DEBUG : &{test.zip (rw)}: >Read: read=16384, err=<nil>
2021/07/03 21:35:03 DEBUG : test.zip: ChunkedReader.Read at -1 length 4096 chunkOffset 577536 chunkSize 8388608                                                                                                                                                                                    
2021/07/03 21:35:03 DEBUG : test.zip: ChunkedReader.openRange at 577536 length 8388608
2021/07/03 21:35:03 DEBUG : &{test.zip (rw)}: Flush:
2021/07/03 21:35:03 DEBUG : test.zip(0xc00bde8940): RWFileHandle.Flush
2021/07/03 21:35:03 DEBUG : &{test.zip (rw)}: >Flush: err=<nil>
2021/07/03 21:35:03 DEBUG : &{test.zip (rw)}: Release:
2021/07/03 21:35:03 DEBUG : test.zip(0xc00bde8940): RWFileHandle.Release
2021/07/03 21:35:03 DEBUG : test.zip(0xc00bde8940): close:
2021/07/03 21:35:04 DEBUG : test.zip: vfs cache: setting modification time to 2021-07-03 20:35:06.803 +0000 UTC                                                                                                                                                                                    
2021/07/03 21:35:04 DEBUG : test.zip(0xc00bde8940): >close: err=<nil>
2021/07/03 21:35:04 DEBUG : &{test.zip (rw)}: >Release: err=<nil>

Each time there was network activity with rclone.

Can someone tell me if there's something I'm not doing right or is this rclone by design?

What part of the log entry, if any, explains what request rclone is performing or when it is doing a remote call? With the cache backend it's easy for me to spot (lack of technical skills) as it mentions warm/cold lookup.

asdffdsa · July 3, 2021, 10:02pm

hello,

we have experts at reading the debug log of a rclone mount.

they might ask for the following, so to save time, can you post:

backend provider
config file, redacting id/secrets
top lines of the debug log

and not that it matters to your question but
--fast-list does nothing on a mount...

ncw · July 5, 2021, 1:29pm

If you add --dump headers to the command line you'll get the HTTP request and response headers. It should be clear from that what rclone is calling.

Looking at the above log, I'm not sure!

Animosity022 · July 5, 2021, 2:47pm

Since you missed the juicy part of the help and support template, we don't know what backend you are using as you mention the cache backend and seem to have parameters for that, but have on vfs-cache-mode full.

Generally, if something is in cache, you won't see any calls as it all comes from cache.


2021/07/03 21:35:03 DEBUG : vfs cache: looking for range={Pos:0 Size:16384} in [{Pos:0 Size:577536}] - present true

That line is telling you it's from a local file rather than getting it from the remote.

What backend are you using?
What's your rclone.conf look like?
Can you share the whole debug log?

machineworker · July 8, 2021, 12:46pm

Apologies for the delay, appreciate everyone taking time out to help.

Thank you. Dump headers shows a lot of entries even when there are no other "non-dump" related entries most of it appeared to be related to paged data from Google Drive.

I have a number of questions related to vfs and unsure whether it's better to create a new topic or maintain this one. I'm happy to pay for consult too if you're available/interested.

The most pressing issue is to reduce the response times from cache when there are many read operations of small lengths that can be served from the cache. Hoping I can change something on my end to accomodate this.

With the deprecated cache backend, I had to make some changes to the source to reduce requests that weren't required for my use-case. For long cache periods, pollinterval wasn't being honoured and any time there was a change on the remote receiveChangeNotify() would purge directory listings which added painfully long directory listing times when you have cache > chunker > 1G > hundreds of thousands of files so I disabled this function and did a periodic unmount and remount to get fresh data.

When migrating to vfs, performance significantly improved but as the data grew, so did the latency and lookup times. The source data are very large CSV used for AI in medical research. The software performs performs dozens of random reads of small sizes, often ~1KB-1M.

A couple of things I have noticed which may not be good for performance:

vfs cache sets modification time every time a file is opened even if all data is read from the cache.
Every time a file handler is released, remote fingerprints are checked. I understand the use-cases for this but should this not honour vfs-cache-max-age, poll-interval or vfs-cache-poll-interval? The cost seems to be < 200 ms so I wouldn't consider this taxing but perhaps something to look at in the future.

With the performance degradation when there are lots of files being read or lots of random reads, I'm using the following params:

    --no-check-dest --no-traverse --no-check-certificate \
    --fs-cache-expire-duration 180m --fs-cache-expire-interval 180m \
    --read-only --no-checksum --no-modtime --allow-other --umask 000 \
    --attr-timeout 168h \
    --dir-cache-time 168h \
    --poll-interval 12h \
    --vfs-cache-max-age 168h \
    --buffer-size 128M \
    --vfs-read-chunk-size 2M \
    --vfs-read-chunk-size-limit 2M \
    --vfs-cache-mode full \
    --vfs-cache-max-size 700G \
    --vfs-cache-poll-interval 10m

If you can please consider the following log entries:

2021/07/08 12:12:41 DEBUG : 255708940395059385.csv: ChunkedReader.Read at 13762560 length 8192 chunkOffset 13758464 chunkSize 2097152
2021/07/08 12:12:41 DEBUG : 255708940395059385.csv: ChunkedReader.Read at 13770752 length 8192 chunkOffset 13758464 chunkSize 2097152
2021/07/08 12:12:41 DEBUG : 255708940395059385.csv: ChunkedReader.Read at 13787136 length 8192 chunkOffset 13758464 chunkSize 2097152

When the first read is performed, is 2097152 bytes retrieved?
Since the second and third reads could be covered by the same chunk size, is this retrieved from the first read in memory or is it retrieved from disk again?

I'm able to get 2000-3000 iops when performing lots of random reads to multiple files (logs don't show any downloading of chunks either) which should be more than sufficient yet once the software starts running the server load jumps to 20+, sometimes 100+ and response times can take several minutes. Of course I would not suggest tailoring rclone for my use-case and don't consider there to be an issue with rclone as such since my use-case is probably abnormal but I'm hoping there is something I can do on my end to improve this.

Initially the data was on s3 but post pandemic with increasing storage costs and pressure to reduce costs, it was moved to Google Drive. Rclone and chunkers became very cost effective to transparently split files versus development costs to modify the software to support this. It let other technicians view, work and update different parts of the model effectively without running into limitations such as bandwidth. API quota from Google is 1 million per 100 secs per user so rate limiting is not really a concern.

I'm only looking to improve the performance when reading from cache and would greatly appreciate any insight anyone can offer.

system · September 7, 2021, 8:47am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.