Optimizing mount for custom application: reducing read() times

What is the problem you are having with rclone?

Low performance when accessing an rclone mount with a custom, database-like application. The mount has a single directory containing a few dozen files, each scores of gigabytes in size; each access by the application (as seen with strace) consists of opening one of the files read-only, then a series of (1 x lseek followed by 1, 2 or 3 x 8 KB reads), then a close:

strace -tt -T -e openat,read,write,lseek,close -f -p $APP_PID

16851 12:51:04.232424 openat(AT_FDCWD, "/REDACTED/REDACTED.dat", O_RDONLY) = 14 <0.001610>
16851 12:51:04.234406 lseek(14, 107510362295, SEEK_SET) = 107510362295 <0.000037>
16851 12:51:04.234616 read(14, "REDACTED"..., 8191) = 8191 <0.003660>
16851 12:51:04.238828 read(14, "REDACTED"..., 8191) = 8191 <0.001021>
16851 12:51:04.240991 read(14, "REDACTED"..., 8191) = 8191 <0.000142>
16851 12:51:04.241335 lseek(14, 108183601375, SEEK_SET) = 108183601375 <0.000041>
16851 12:51:04.241466 read(14, "REDACTED"..., 8191) = 8191 <1.556332>
16851 12:51:05.798263 lseek(14, 99049494583, SEEK_SET) = 99049494583 <0.000016>
16851 12:51:05.798454 read(14, "REDACTED"..., 8448) = 8448 <1.513859>
16851 12:51:07.312511 lseek(14, 81231850168, SEEK_SET) = 81231850168 <0.000015>
16851 12:51:07.312701 read(14, "REDACTED"..., 8191) = 8191 <1.534364>
16851 12:51:08.847264 read(14, "REDACTED"..., 8191) = 8191 <0.000734>
16851 12:51:08.865650 lseek(14, 63848342743, SEEK_SET) = 63848342743 <0.000029>
16851 12:51:08.866082 read(14, "REDACTED"..., 8191) = 8191 <1.521446>
16851 12:51:10.387820 read(14, "REDACTED"..., 8191) = 8191 <0.002077>
16851 12:51:10.390326 lseek(14, 45966850468, SEEK_SET) = 45966850468 <0.000063>
16851 12:51:10.390527 read(14, "REDACTED"..., 8191) = 8191 <1.396265>
16851 12:51:11.787305 read(14, "REDACTED"..., 8191) = 8191 <0.000508>
16851 12:51:11.788221 lseek(14, 31801455043, SEEK_SET) = 31801455043 <0.000032>
16851 12:51:11.788379 read(14, "REDACTED"..., 8191) = 8191 <1.803120>
16851 12:51:13.591752 read(14, "REDACTED"..., 8191) = 8191 <0.002589>
16851 12:51:13.594805 lseek(14, 21270188518, SEEK_SET) = 21270188518 <0.000018>
16851 12:51:13.594932 read(14, "REDACTED"..., 8191) = 8191 <1.420962>
16851 12:51:15.016178 read(14, "REDACTED"..., 8191) = 8191 <0.000354>
16851 12:51:15.016813 lseek(14, 6584599198, SEEK_SET) = 6584599198 <0.000021>
16851 12:51:15.016941 read(14,  <unfinished ...>
16851 12:51:16.428064 <... read resumed> "REDACTED"..., 8191) = 8191 <1.411070>
16851 12:51:16.428286 read(14, "REDACTED"..., 8191) = 8191 <0.000520>
16851 12:51:16.429140 close(14)         = 0 <0.000317>

The problem (as you can see from the syscall times between angle brackets at the end of each line) is that the first read after the lseek takes around 1.5 seconds; I need to optimize the mount so it takes as little time as possible, ideally less than 0.2s.

The machine running rclone mount and the application is not overloaded in any way, and it has a very fast, low-latency 100 Mbps connection to the Internet, which results in under 5 ms ping times to the Google Drive servers rclone mount is connected to (which I identified using netstat -nap | grep rclone | grep ESTABLISHED).
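
For reference, the check was roughly along these lines (a sketch -- the IP below is just a placeholder for one of the remote addresses netstat reports):

# list the endpoints the running rclone has open connections to
netstat -nap 2>/dev/null | grep rclone | grep ESTABLISHED
# then measure the round-trip time to one of the remote IPs shown above
ping -c 5 172.217.0.1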

Thanks in advance to everyone!

Cheers,
-- Durval.

What is your rclone version (output from rclone version)

rclone v1.55.1

  • os/type: linux
  • os/arch: amd64
  • go/version: go1.16.3
  • go/linking: static
  • go/tags: none

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Linux Devuan Beowulf (systemd-less Debian 10 Buster equivalent)

Which cloud storage system are you using? (eg Google Drive)

Encrypted Google Drive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone --rc --rc-addr=127.0.0.1:55720 --dir-cache-time=100000h \
--vfs-cache-max-age=100000h --poll-interval=15s \
--cache-dir=/REDACTED/rclone_mount_cache --vfs-cache-mode=full \
--vfs-cache-max-size=50G --buffer-size=8k --vfs-read-ahead=8k \
--vfs-read-chunk-size=16k --vfs-read-chunk-size-limit=off --read-only \
--user-agent RANDOM_STRING mount REDACTED:REDACTED /REDACTED

The rclone config contents with secrets removed.

[REDACTED]
type = crypt
remote = REDACTED2:REDACTED/REDACTED
filename_encryption = standard
directory_name_encryption = true
password = REDACTED

[REDACTED2]
type = drive
client_id = REDACTED.apps.googleusercontent.com
client_secret = REDACTED
service_account_file = 
token = {"access_token":"REDACTED","token_type":"Bearer","refresh_token":"REDACTED","expiry":"2021-05-05T16:53:45.204740176Z"}
root_folder_id = REDACTED

A log from the command with the -vv flag

The -v -v output is really monstrous, and as this isn't really a bug but rather a performance problem, I think the strace above should suffice -- if not, please tell me what you need, preferably in a way I can set on the running rclone via rclone rc.

I'm not sure how you'd overcome the latency on a cloud remote.

There is a fixed amount of time for a packet to go from A->B and back from B->A to complete the transaction, so I can't imagine you can get that low for a call made over the Internet.

Once the file is cached to disk, it becomes very fast, but the slow part is getting the first bits on disk.

As reported, I have <5ms ping times (which means packet round-trip) from the machine to the Google servers which rclone is connected to. As each read() is taking around 1.5s (1500ms), network latency can't be more than 5/1500 = 1/300th of the problem... unless of course a single read takes 300 packets of back-and-forth with the servers, which would be even more absurd.

Any other ideas?

I think that's something more specific to your setup.

I can ffprobe/mediainfo a non-cached file in 2-4 seconds the majority of the time, and that does 3-4 opens/closes.

Ping is lightweight as it doesn't establish a connection; you get more overhead on a TCP connection. I use some network tuning to help things out on Ubuntu.

In my /etc/sysctl.conf

# Network Tuning
net.core.default_qdisc=fq
net.ipv4.tcp_congestion_control=bbr
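
To apply them without a reboot, plain sysctl usage is enough (shown as a sketch):

# reload /etc/sysctl.conf and confirm the new values
sudo sysctl -p
sysctl net.core.default_qdisc net.ipv4.tcp_congestion_control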

I can't really isolate this easily on my system as it's in use, but you can run a tcpdump and see how many packets it takes.

I'd imagine you get more than a few packets, but not quite 300.
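
Something along these lines would do it (a sketch -- substitute the Drive server IP you found with netstat):

# capture traffic to/from the Drive endpoint while the app does one slow read()
sudo tcpdump -ni any host 172.217.0.1 -w drive-read.pcap
# afterwards, count the packets in the capture
tcpdump -nr drive-read.pcap | wc -l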

I tend to read sequentially for streaming so I tune a bit more towards that than anything else. What does the rclone debug look like when you are seeking around? I'd be curious to see how those parameters you have work along with the strace.

Thanks for the feedback, @Animosity022. BTW, I'm very glad you are responding to my questions, as I'm aware you are one of the foremost plexhead^H^H^H^H^H^H^H^Hexperts on rclone mount around -- in fact, a lot of the options I'm using on my rclone mount to try and get better performance were gleaned from your excellent posts.

I think I have progressed further by using another, simultaneous strace to record network calls (-e network) from the running rclone process and comparing it side by side with the strace of the file I/O calls from the application:
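
The two captures were taken roughly like this (a sketch; $APP_PID and $RCLONE_PID stand for the respective process IDs):

# file I/O syscalls from the application
strace -tt -T -e openat,read,write,lseek,close -f -p $APP_PID -o application.strace
# network syscalls from the running rclone
strace -tt -T -e trace=network -f -p $RCLONE_PID -o rclone.strace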

application.strace:

16876 17:31:38.520955 openat(AT_FDCWD, "REDACTED", O_RDONLY) = 14 <0.000870>
16876 17:31:38.522055 lseek(14, 107549374626, SEEK_SET) = 107549374626 <0.000079>
16876 17:31:38.522257 read(14, "REDACTED"..., 8191) = 8191 <1.678795>
16876 17:31:40.201214 lseek(14, 108510460854, SEEK_SET) = 108510460854 <0.000049>

rclone.strace:

16048 17:31:31.096317 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=16048, si_uid=1000} ---
17124 17:31:40.203236 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP <unfinished ...>
16561 17:31:40.203497 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 11 <0.000046>
16561 17:31:40.203668 setsockopt(11, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0 <0.000023>
16561 17:31:40.203889 connect(11, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0 <0.000055>

As you can see from the timestamps, at 17:31:38 the application did the openat() to start accessing the file, immediately followed by an lseek() and then a read(); rclone did absolutely nothing network-related until 17:31:40... which is when the read() call returns -- and that is why the read takes almost 2 seconds.

So whatever is slowing down rclone's processing of the application's read() seems not to be network-related. Any other ideas?

@ncw, would you care to please take a look?

Do you have the rclone debug log to go along with that? I think that may shed some light on the missing time.

Here it is, another capture (now with the correspondent rclone -v -v output):

application.strace:

16876 18:03:34.132377 openat(AT_FDCWD, "REDACTED", O_RDONLY) = 14 <0.002966>
16876 18:03:34.135783 lseek(14, 107561714201, SEEK_SET) = 107561714201 <0.000039>
16876 18:03:34.136011 read(14, "\27\325\243\302\27\325\312\207\27\325\361)\27\326\30J\27\326?d\27\326f\304\27\326\216u\27\326\266\25"..., 8191) = 8191 <0.006891>
16876 18:03:34.143107 lseek(14, 107686848885, SEEK_SET) = 107686848885 <0.000011>
16876 18:03:34.143221 read(14, "\t*\261\321\21e\210\274\265X\"+\350K\224F\371\374'\325]]\224\0024\5\20\360\272\214yc"..., 8191) = 8191 <1.998166>
16876 18:03:36.141887 lseek(14, 91538688793, SEEK_SET) = 91538688793 <0.000021>

(notice that the first read completed very quickly, so we are concentrating on the second)

rclone.strace:

16048 18:03:31.096395 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=16048, si_uid=1000} ---
16052 18:03:34.089651 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=16048, si_uid=1000} ---
16048 18:03:36.200776 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 11 <0.000057>
16048 18:03:36.201065 setsockopt(11, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0 <0.000050>
16048 18:03:36.201341 connect(11, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0 <0.000139>

rclone -vv (aka DEBUG) output:

2021/05/05 18:03:34 DEBUG : REDACTED(0xc001ac6680): openPending:
2021/05/05 18:03:34 DEBUG : REDACTED: vfs cache: checking remote fingerprint "108852150885,2021-04-24 17:09:40.712 +0000 UTC" against cached fingerprint "108852150885,2021-04-24 17:09:40.712 +0000 UTC"
2021/05/05 18:03:34 DEBUG : REDACTED: vfs cache: truncate to size=108852150885
2021/05/05 18:03:34 DEBUG : : Added virtual directory entry vAddFile: "REDACTED"
2021/05/05 18:03:34 DEBUG : REDACTED(0xc001ac6680): >openPending: err=<nil>
2021/05/05 18:03:34 DEBUG : vfs cache: looking for range={Pos:107561713664 Size:12288} in [{Pos:0 Size:7417856} {Pos:24518656 Size:65536} {Pos:904704000 Size:65536} {Pos:1101082624 Size:65536} {Pos:1504014336 Size:65536} {Pos:3118067712 Size:65536} {Pos:4213551104 Size:33550336} {Pos:4939382784 Size:98304} {Pos:5676269568 Size:98304} {Pos:8705957888 Size:98304} {Pos:9928531968 Size:65536} {Pos:9936957440 Size:67104768} {Pos:10622353408 Size:98304} {Pos:14891769856 Size:98304} {Pos:15676645376 Size:98304} {Pos:15967277056 Size:18870272} {Pos:16657260544 Size:98304} {Pos:19652472832 Size:98304} {Pos:19975049216 Size:98304} {Pos:21099098112 Size:98304} {Pos:21109723136 Size:75493376} {Pos:22004527104 Size:98304} {Pos:22491205632 Size:98304} {Pos:24286523392 Size:98304} {Pos:25244733440 Size:98304} {Pos:28851007488 Size:98304} {Pos:30492295168 Size:98304} {Pos:31519326208 Size:98304} {Pos:31646384128 Size:98304} {Pos:32522166272 Size:98304} {Pos:33414164480 Size:98304} {Pos:33733869568 Size:10481664} {Pos:33875701760 Size:98304} {Pos:35253862400 Size:98304} {Pos:35330072576 Size:34598912} {Pos:36624683008 Size:98304} {Pos:39953850368 Size:98304} {Pos:44438446080 Size:98304} {Pos:44658184192 Size:40890368} {Pos:45237469184 Size:98304} {Pos:45890457600 Size:98304} {Pos:45905268736 Size:98304} {Pos:45976604672 Size:69201920} {Pos:46592745472 Size:98304} {Pos:47316680704 Size:98304} {Pos:51746037760 Size:98304} {Pos:51876347904 Size:98304} {Pos:52863660032 Size:98304} {Pos:53622661120 Size:98304} {Pos:57672015872 Size:98304} {Pos:57925410816 Size:98304} {Pos:58419531776 Size:98304} {Pos:58434535424 Size:29356032} {Pos:60098179072 Size:54521856} {Pos:62542782464 Size:98304} {Pos:63156027392 Size:98304} {Pos:64522674176 Size:98304} {Pos:65670201344 Size:98304} {Pos:66305572864 Size:98304} {Pos:67394764800 Size:98304} {Pos:67608412160 Size:98304} {Pos:73810776064 Size:24113152} {Pos:74866327552 Size:98304} {Pos:75852640256 Size:46133248} {Pos:80942411776 Size:98304} {Pos:81039863808 Size:98304} {Pos:82727702528 Size:98304} {Pos:85495287808 Size:98304} {Pos:86118907904 Size:98304} {Pos:86381875200 Size:98304} {Pos:88699338752 Size:98304} {Pos:89039265792 Size:98304} {Pos:89546375168 Size:98304} {Pos:90246795264 Size:65536} {Pos:93649420288 Size:65536} {Pos:94749569024 Size:14675968} {Pos:99390849024 Size:65536} {Pos:104173506560 Size:65536} {Pos:106601869312 Size:65536} {Pos:106636378112 Size:65536} {Pos:107561553920 Size:196608} {Pos:107561910272 Size:7487488} {Pos:107592888320 Size:65536} {Pos:107840352256 Size:65536} {Pos:107920363520 Size:5238784} {Pos:108257910784 Size:65536} {Pos:108605739008 Size:65536} {Pos:108782350336 Size:65536} {Pos:108784857088 Size:65536}] - present true
2021/05/05 18:03:34 DEBUG : REDACTED(0xc001ac6680): >_readAt: n=12288, err=<nil>
2021/05/05 18:03:34 DEBUG : &{REDACTED (rw)}: >Read: read=12288, err=<nil>
2021/05/05 18:03:34 DEBUG : &{REDACTED (rw)}: Read: len=12288, offset=107686846464
2021/05/05 18:03:34 DEBUG : REDACTED(0xc001ac6680): _readAt: size=12288, off=107686846464
2021/05/05 18:03:34 DEBUG : vfs cache: looking for range={Pos:107686846464 Size:12288} in [{Pos:0 Size:7417856} {Pos:24518656 Size:65536} {Pos:904704000 Size:65536} {Pos:1101082624 Size:65536} {Pos:1504014336 Size:65536} {Pos:3118067712 Size:65536} {Pos:4213551104 Size:33550336} {Pos:4939382784 Size:98304} {Pos:5676269568 Size:98304} {Pos:8705957888 Size:98304} {Pos:9928531968 Size:65536} {Pos:9936957440 Size:67104768} {Pos:10622353408 Size:98304} {Pos:14891769856 Size:98304} {Pos:15676645376 Size:98304} {Pos:15967277056 Size:18870272} {Pos:16657260544 Size:98304} {Pos:19652472832 Size:98304} {Pos:19975049216 Size:98304} {Pos:21099098112 Size:98304} {Pos:21109723136 Size:75493376} {Pos:22004527104 Size:98304} {Pos:22491205632 Size:98304} {Pos:24286523392 Size:98304} {Pos:25244733440 Size:98304} {Pos:28851007488 Size:98304} {Pos:30492295168 Size:98304} {Pos:31519326208 Size:98304} {Pos:31646384128 Size:98304} {Pos:32522166272 Size:98304} {Pos:33414164480 Size:98304} {Pos:33733869568 Size:10481664} {Pos:33875701760 Size:98304} {Pos:35253862400 Size:98304} {Pos:35330072576 Size:34598912} {Pos:36624683008 Size:98304} {Pos:39953850368 Size:98304} {Pos:44438446080 Size:98304} {Pos:44658184192 Size:40890368} {Pos:45237469184 Size:98304} {Pos:45890457600 Size:98304} {Pos:45905268736 Size:98304} {Pos:45976604672 Size:69201920} {Pos:46592745472 Size:98304} {Pos:47316680704 Size:98304} {Pos:51746037760 Size:98304} {Pos:51876347904 Size:98304} {Pos:52863660032 Size:98304} {Pos:53622661120 Size:98304} {Pos:57672015872 Size:98304} {Pos:57925410816 Size:98304} {Pos:58419531776 Size:98304} {Pos:58434535424 Size:29356032} {Pos:60098179072 Size:54521856} {Pos:62542782464 Size:98304} {Pos:63156027392 Size:98304} {Pos:64522674176 Size:98304} {Pos:65670201344 Size:98304} {Pos:66305572864 Size:98304} {Pos:67394764800 Size:98304} {Pos:67608412160 Size:98304} {Pos:73810776064 Size:24113152} {Pos:74866327552 Size:98304} {Pos:75852640256 Size:46133248} {Pos:80942411776 Size:98304} {Pos:81039863808 Size:98304} {Pos:82727702528 Size:98304} {Pos:85495287808 Size:98304} {Pos:86118907904 Size:98304} {Pos:86381875200 Size:98304} {Pos:88699338752 Size:98304} {Pos:89039265792 Size:98304} {Pos:89546375168 Size:98304} {Pos:90246795264 Size:65536} {Pos:93649420288 Size:65536} {Pos:94749569024 Size:14675968} {Pos:99390849024 Size:65536} {Pos:104173506560 Size:65536} {Pos:106601869312 Size:65536} {Pos:106636378112 Size:65536} {Pos:107561553920 Size:196608} {Pos:107561910272 Size:7487488} {Pos:107592888320 Size:65536} {Pos:107840352256 Size:65536} {Pos:107920363520 Size:5238784} {Pos:108257910784 Size:65536} {Pos:108605739008 Size:65536} {Pos:108782350336 Size:65536} {Pos:108784857088 Size:65536}] - present false
2021/05/05 18:03:34 DEBUG : REDACTED: ChunkedReader.RangeSeek from -1 to 107686846464 length -1
2021/05/05 18:03:34 DEBUG : REDACTED: ChunkedReader.Read at -1 length 32768 chunkOffset 107686846464 chunkSize 16384
2021/05/05 18:03:34 DEBUG : REDACTED: ChunkedReader.openRange at 107686846464 length 16384
2021/05/05 18:03:35 DEBUG : REDACTED: ChunkedReader.Read at 107686862848 length 16384 chunkOffset 107686846464 chunkSize 16384
2021/05/05 18:03:35 DEBUG : REDACTED: ChunkedReader.openRange at 107686862848 length 32768
2021/05/05 18:03:36 DEBUG : REDACTED: ChunkedReader.Read at 107686879232 length 32768 chunkOffset 107686862848 chunkSize 32768
2021/05/05 18:03:36 DEBUG : REDACTED: ChunkedReader.Read at 107686895616 length 16384 chunkOffset 107686862848 chunkSize 32768
2021/05/05 18:03:36 DEBUG : REDACTED: ChunkedReader.openRange at 107686895616 length 65536
2021/05/05 18:03:36 DEBUG : REDACTED(0xc001ac6680): >_readAt: n=12288, err=<nil>
2021/05/05 18:03:36 DEBUG : &{REDACTED (rw)}: >Read: read=12288, err=<nil>
2021/05/05 18:03:36 DEBUG : &{REDACTED (rw)}: Read: len=12288, offset=91538685952
2021/05/05 18:03:36 DEBUG : REDACTED(0xc001ac6680): _readAt: size=12288, off=91538685952
2021/05/05 18:03:36 DEBUG : vfs cache: looking for range={Pos:91538685952 Size:12288} in [{Pos:0 Size:7417856} {Pos:24518656 Size:65536} {Pos:904704000 Size:65536} {Pos:1101082624 Size:65536} {Pos:1504014336 Size:65536} {Pos:3118067712 Size:65536} {Pos:4213551104 Size:33550336} {Pos:4939382784 Size:98304} {Pos:5676269568 Size:98304} {Pos:8705957888 Size:98304} {Pos:9928531968 Size:65536} {Pos:9936957440 Size:67104768} {Pos:10622353408 Size:98304} {Pos:14891769856 Size:98304} {Pos:15676645376 Size:98304} {Pos:15967277056 Size:18870272} {Pos:16657260544 Size:98304} {Pos:19652472832 Size:98304} {Pos:19975049216 Size:98304} {Pos:21099098112 Size:98304} {Pos:21109723136 Size:75493376} {Pos:22004527104 Size:98304} {Pos:22491205632 Size:98304} {Pos:24286523392 Size:98304} {Pos:25244733440 Size:98304} {Pos:28851007488 Size:98304} {Pos:30492295168 Size:98304} {Pos:31519326208 Size:98304} {Pos:31646384128 Size:98304} {Pos:32522166272 Size:98304} {Pos:33414164480 Size:98304} {Pos:33733869568 Size:10481664} {Pos:33875701760 Size:98304} {Pos:35253862400 Size:98304} {Pos:35330072576 Size:34598912} {Pos:36624683008 Size:98304} {Pos:39953850368 Size:98304} {Pos:44438446080 Size:98304} {Pos:44658184192 Size:40890368} {Pos:45237469184 Size:98304} {Pos:45890457600 Size:98304} {Pos:45905268736 Size:98304} {Pos:45976604672 Size:69201920} {Pos:46592745472 Size:98304} {Pos:47316680704 Size:98304} {Pos:51746037760 Size:98304} {Pos:51876347904 Size:98304} {Pos:52863660032 Size:98304} {Pos:53622661120 Size:98304} {Pos:57672015872 Size:98304} {Pos:57925410816 Size:98304} {Pos:58419531776 Size:98304} {Pos:58434535424 Size:29356032} {Pos:60098179072 Size:54521856} {Pos:62542782464 Size:98304} {Pos:63156027392 Size:98304} {Pos:64522674176 Size:98304} {Pos:65670201344 Size:98304} {Pos:66305572864 Size:98304} {Pos:67394764800 Size:98304} {Pos:67608412160 Size:98304} {Pos:73810776064 Size:24113152} {Pos:74866327552 Size:98304} {Pos:75852640256 Size:46133248} {Pos:80942411776 Size:98304} {Pos:81039863808 Size:98304} {Pos:82727702528 Size:98304} {Pos:85495287808 Size:98304} {Pos:86118907904 Size:98304} {Pos:86381875200 Size:98304} {Pos:88699338752 Size:98304} {Pos:89039265792 Size:98304} {Pos:89546375168 Size:98304} {Pos:90246795264 Size:65536} {Pos:93649420288 Size:65536} {Pos:94749569024 Size:14675968} {Pos:99390849024 Size:65536} {Pos:104173506560 Size:65536} {Pos:106601869312 Size:65536} {Pos:106636378112 Size:65536} {Pos:107561553920 Size:196608} {Pos:107561910272 Size:7487488} {Pos:107592888320 Size:65536} {Pos:107686846464 Size:32768} {Pos:107840352256 Size:65536} {Pos:107920363520 Size:5238784} {Pos:108257910784 Size:65536} {Pos:108605739008 Size:65536} {Pos:108782350336 Size:65536} {Pos:108784857088 Size:65536}] - present false
2021/05/05 18:03:36 DEBUG : REDACTED: ChunkedReader.RangeSeek from -1 to 91538685952 length -1
2021/05/05 18:03:36 DEBUG : REDACTED: ChunkedReader.Read at -1 length 32768 chunkOffset 91538685952 chunkSize 16384
2021/05/05 18:03:36 DEBUG : REDACTED: ChunkedReader.openRange at 91538685952 length 16384
2021/05/05 18:03:37 DEBUG : REDACTED: ChunkedReader.Read at 91538702336 length 16384 chunkOffset 91538685952 chunkSize 16384
2021/05/05 18:03:37 DEBUG : REDACTED: ChunkedReader.openRange at 91538702336 length 32768
2021/05/05 18:03:37 DEBUG : REDACTED(0xc001ac6680): >_readAt: n=12288, err=<nil>
2021/05/05 18:03:37 DEBUG : &{REDACTED (rw)}: >Read: read=12288, err=<nil>

So, as far as I can understand from the rclone debug output, it spends the 2 seconds from 18:03:34 to 18:03:36 (which is when the application receives the response to its read call) internally calling ChunkedReader.RangeSeek/ChunkedReader.openRange/etc., but none of these calls result in any network-related system calls (as attested by rclone.strace).

Really, really strange... What do you think?

You are using --vfs-cache-mode full so the first block was found in the cache - you can see this in the log

2021/05/05 18:03:34 DEBUG : vfs cache: looking for range={Pos:107561713664 Size:12288} [snip] - present true

Note that it is worth using --log-format time,microseconds to get the rclone logs to match the strace ones.
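
That is just one more flag on the mount command line, e.g. (a sketch based on the command you posted above):

rclone --rc --rc-addr=127.0.0.1:55720 --dir-cache-time=100000h \
--vfs-cache-max-age=100000h --poll-interval=15s \
--cache-dir=/REDACTED/rclone_mount_cache --vfs-cache-mode=full \
--vfs-cache-max-size=50G --buffer-size=8k --vfs-read-ahead=8k \
--vfs-read-chunk-size=16k --vfs-read-chunk-size-limit=off --read-only \
--user-agent RANDOM_STRING --log-format time,microseconds -vv \
mount REDACTED:REDACTED /REDACTED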

Here is the Read in question. Those ChunkedReader read messages indicate that rclone read that data from the stream (I checked the source), so I wonder whether all the network activity is captured in the strace?

2021/05/05 18:03:34 DEBUG : &{REDACTED (rw)}: Read: len=12288, offset=107686846464
2021/05/05 18:03:34 DEBUG : REDACTED(0xc001ac6680): _readAt: size=12288, off=107686846464
2021/05/05 18:03:34 DEBUG : vfs cache: looking for range={Pos:107686846464 Size:12288} in [snip] - present false
2021/05/05 18:03:34 DEBUG : REDACTED: ChunkedReader.RangeSeek from -1 to 107686846464 length -1
2021/05/05 18:03:34 DEBUG : REDACTED: ChunkedReader.Read at -1 length 32768 chunkOffset 107686846464 chunkSize 16384
2021/05/05 18:03:34 DEBUG : REDACTED: ChunkedReader.openRange at 107686846464 length 16384
2021/05/05 18:03:35 DEBUG : REDACTED: ChunkedReader.Read at 107686862848 length 16384 chunkOffset 107686846464 chunkSize 16384
2021/05/05 18:03:35 DEBUG : REDACTED: ChunkedReader.openRange at 107686862848 length 32768
2021/05/05 18:03:36 DEBUG : REDACTED: ChunkedReader.Read at 107686879232 length 32768 chunkOffset 107686862848 chunkSize 32768
2021/05/05 18:03:36 DEBUG : REDACTED: ChunkedReader.Read at 107686895616 length 16384 chunkOffset 107686862848 chunkSize 32768
2021/05/05 18:03:36 DEBUG : REDACTED: ChunkedReader.openRange at 107686895616 length 65536
2021/05/05 18:03:36 DEBUG : REDACTED(0xc001ac6680): >_readAt: n=12288, err=<nil>
2021/05/05 18:03:36 DEBUG : &{REDACTED (rw)}: >Read: read=12288, err=<nil>

There is a little test program which does random seeks on a big file in the rclone source tree which you might find useful.

Here it is, run against Google Drive with --vfs-cache-mode off and a 1 GB file of random data. Note the variation in times!

This test with --vfs-cache-mode off is probably the best possible case. Remember that a seek needs to open a new connection to Drive, so it is by its nature quite slow.

~/go/src/github.com/rclone/rclone/cmd/mount/test$ go run seek_speed.go -b 16384  /mnt/tmp/1GB-random
2021/05/07 11:21:17 File Open took 531.122µs
2021/05/07 11:21:19 Reading 6671 from 134020434 took 1.072946228s 
2021/05/07 11:21:19 Reading 1211 from 647328285 took 512.038803ms 
2021/05/07 11:21:19 Reading 4486 from 164029137 took 419.523226ms 
2021/05/07 11:21:20 Reading 4780 from 957743134 took 532.713437ms 
2021/05/07 11:21:20 Reading 1956 from 958990240 took 361.617965ms 
2021/05/07 11:21:21 Reading 6831 from 864509188 took 489.811695ms 
2021/05/07 11:21:21 Reading 8177 from 978417974 took 342.904347ms 
2021/05/07 11:21:22 Reading 5402 from 1018408203 took 548.149103ms 
2021/05/07 11:21:22 Reading 7317 from 1011747688 took 491.751944ms 
2021/05/07 11:21:23 Reading 3042 from 627253638 took 436.344301ms 
2021/05/07 11:21:23 Reading 2266 from 379326753 took 519.83068ms 
2021/05/07 11:21:24 Reading 7826 from 915302975 took 413.13361ms 
2021/05/07 11:21:24 Reading 3371 from 752835466 took 470.003845ms 
2021/05/07 11:21:25 Reading 3320 from 97964648 took 463.367199ms 
2021/05/07 11:21:25 Reading 6135 from 913623924 took 510.022457ms 
2021/05/07 11:21:26 Reading 12152 from 999577560 took 656.945695ms 
2021/05/07 11:21:26 Reading 15119 from 960629247 took 397.991838ms 
2021/05/07 11:21:27 Reading 2380 from 438689995 took 432.484287ms 
2021/05/07 11:21:27 Reading 1015 from 919928731 took 358.349028ms 
2021/05/07 11:21:27 Reading 2962 from 449922548 took 403.938402ms 
2021/05/07 11:21:28 Reading 658 from 1064433867 took 459.985378ms 
2021/05/07 11:21:28 Reading 9027 from 1019929124 took 397.735844ms 
2021/05/07 11:21:29 Reading 15251 from 1033801660 took 421.310006ms 
2021/05/07 11:21:29 Reading 15076 from 909794159 took 523.473542ms 
2021/05/07 11:21:29 Reading 9305 from 1015643398 took 369.171872ms 
2021/05/07 11:21:29 That took 12.007394951s for 25 iterations, 480.295798ms per iteration

Here it is, run against Google Drive with --vfs-cache-mode full and everything uncached.

Note that this has a potential problem which might be affecting you... Every time you seek, rclone opens a new stream. If a stream is unused for 5 seconds it is closed. This means that by the end of this run there are many streams open. I think that might be what the 7s and 5s delays are about - Google saying "slow down, you've got too many streams open", or something like that.

~/go/src/github.com/rclone/rclone/cmd/mount/test$ go run seek_speed.go -b 16384  /mnt/tmp/1GB-random
2021/05/07 11:21:50 File Open took 510.753µs
2021/05/07 11:21:50 Reading 6671 from 134020434 took 312.94326ms 
2021/05/07 11:21:51 Reading 1211 from 647328285 took 355.935813ms 
2021/05/07 11:21:51 Reading 4486 from 164029137 took 373.86445ms 
2021/05/07 11:21:52 Reading 4780 from 957743134 took 387.907916ms 
2021/05/07 11:21:53 Reading 1956 from 958990240 took 1.068225744s 
2021/05/07 11:21:53 Reading 6831 from 864509188 took 404.316803ms 
2021/05/07 11:21:53 Reading 8177 from 978417974 took 367.660519ms 
2021/05/07 11:21:54 Reading 5402 from 1018408203 took 730.275205ms 
2021/05/07 11:21:55 Reading 7317 from 1011747688 took 712.884634ms 
2021/05/07 11:21:56 Reading 3042 from 627253638 took 676.135835ms 
2021/05/07 11:22:03 Reading 2266 from 379326753 took 7.855170294s 
2021/05/07 11:22:04 Reading 7826 from 915302975 took 367.511788ms 
2021/05/07 11:22:04 Reading 3371 from 752835466 took 371.001726ms 
2021/05/07 11:22:04 Reading 3320 from 97964648 took 368.493694ms 
2021/05/07 11:22:05 Reading 6135 from 913623924 took 705.870022ms 
2021/05/07 11:22:06 Reading 12152 from 999577560 took 951.813615ms 
2021/05/07 11:22:06 Reading 15119 from 960629247 took 1.667044ms 
2021/05/07 11:22:07 Reading 2380 from 438689995 took 514.84987ms 
2021/05/07 11:22:12 Reading 1015 from 919928731 took 5.445196966s 
2021/05/07 11:22:14 Reading 2962 from 449922548 took 1.738203679s 
2021/05/07 11:22:14 Reading 658 from 1064433867 took 500.132817ms 
2021/05/07 11:22:14 Reading 9027 from 1019929124 took 1.066647ms 
2021/05/07 11:22:19 Reading 15251 from 1033801660 took 4.763282978s 
2021/05/07 11:22:20 Reading 15076 from 909794159 took 459.942412ms 
2021/05/07 11:22:20 Reading 9305 from 1015643398 took 430.81µs 
2021/05/07 11:22:20 That took 29.436450515s for 25 iterations, 1.17745802s per iteration

Here it is, run against Google Drive with --vfs-cache-mode full and everything cached:

~/go/src/github.com/rclone/rclone/cmd/mount/test$ go run seek_speed.go -b 16384  /mnt/tmp/1GB-random
2021/05/07 11:22:58 File Open took 550.124µs
2021/05/07 11:22:58 Reading 6671 from 134020434 took 464.662µs 
2021/05/07 11:22:58 Reading 1211 from 647328285 took 205.465µs 
2021/05/07 11:22:58 Reading 4486 from 164029137 took 258.331µs 
2021/05/07 11:22:58 Reading 4780 from 957743134 took 164.935µs 
2021/05/07 11:22:58 Reading 1956 from 958990240 took 113.174µs 
2021/05/07 11:22:58 Reading 6831 from 864509188 took 430.264µs 
2021/05/07 11:22:58 Reading 8177 from 978417974 took 589.515µs 
2021/05/07 11:22:58 Reading 5402 from 1018408203 took 209.942µs 
2021/05/07 11:22:58 Reading 7317 from 1011747688 took 554.789µs 
2021/05/07 11:22:58 Reading 3042 from 627253638 took 412.394µs 
2021/05/07 11:22:58 Reading 2266 from 379326753 took 263.171µs 
2021/05/07 11:22:58 Reading 7826 from 915302975 took 175.128µs 
2021/05/07 11:22:58 Reading 3371 from 752835466 took 258.59µs 
2021/05/07 11:22:58 Reading 3320 from 97964648 took 261.212µs 
2021/05/07 11:22:58 Reading 6135 from 913623924 took 168.574µs 
2021/05/07 11:22:58 Reading 12152 from 999577560 took 187.977µs 
2021/05/07 11:22:58 Reading 15119 from 960629247 took 253.037µs 
2021/05/07 11:22:58 Reading 2380 from 438689995 took 173.297µs 
2021/05/07 11:22:58 Reading 1015 from 919928731 took 282.592µs 
2021/05/07 11:22:58 Reading 2962 from 449922548 took 229.285µs 
2021/05/07 11:22:58 Reading 658 from 1064433867 took 349.208µs 
2021/05/07 11:22:58 Reading 9027 from 1019929124 took 178.528µs 
2021/05/07 11:22:58 Reading 15251 from 1033801660 took 8.128729ms 
2021/05/07 11:22:58 Reading 15076 from 909794159 took 352.094µs 
2021/05/07 11:22:58 Reading 9305 from 1015643398 took 268.348µs 
2021/05/07 11:22:58 That took 15.558586ms for 25 iterations, 622.343µs per iteration

Hello @ncw,

Many thanks for the thoughtful, detailed reply!

Some initial comments:

Good point! I ran strace with -e network, so IMHO it should have -- but I just rechecked the complete output and I only see accept4/getsockname, setsockopt, getpeername, connect and similar syscalls, and no send* nor recv* syscalls; so perhaps the I/O is being done via plain read/write syscalls (which may only be logged with -e file even when used on network file descriptors), or something else.

Sometime later today (I'm far from my computer right now) I will try running a full strace (i.e., without any -e filter) and see what happens.

I noticed that, but it seems it can't be set via rclone rc, right? And I really don't want to interrupt the currently running rclone as it's still filling up its cache -- or does the cache remain valid if I stop/restart rclone to include this option?

I was about to write my own test program to make the problem more easily testable/reproducible; thanks for the tip -- I will look for that one instead.

You mean the contrary, no? Intuitively, I understand that no cache == worst possible case.

Ouch! I didn't know that! Isn't there a way around this? I did a quick search and found that, which, from my limited knowledge of the Google Drive API, seems to indicate that no new connection would be necessary (it would be enough to include the resulting "Range: start-end" header in a request on an already-open connection). Or am I missing something?
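
For illustration only (generic HTTP, not the exact Drive API call -- the URL and byte ranges below are hypothetical), two ranged reads can reuse a single connection within one curl invocation:

# two ranged GETs in a single curl invocation; the second one reuses the
# already-open TLS connection, so its time is dominated by one HTTP round trip
curl -sS -H 'Range: bytes=0-16383' -o /dev/null -w 'first:  %{time_total}s\n' "$URL" \
  --next \
     -sS -H 'Range: bytes=1048576-1064959' -o /dev/null -w 'second: %{time_total}s\n' "$URL"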

Wouldn't that get logged in the rclone -vv output? I just reviewed it and there's nothing like that in there.

Cheers,
-- Durval.

I'm pretty sure Go uses epoll and then read/write on the file descriptors returned.
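
So to see that traffic you would need to trace those syscalls as well -- something like this (a sketch; $RCLONE_PID is the rclone process ID):

strace -tt -T -f -e trace=network,read,write,epoll_wait,epoll_pwait -p $RCLONE_PID -o rclone-full.strace

Note this will also capture reads/writes on the local cache files; the network file descriptors can be told apart from the connect() lines.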

You can set it, yes, but I'm not sure it will have any effect as I think it is only read at startup.

I meant assuming that the data isn't in the cache.

If rclone has a connection open and free, you are correct, and this is likely to be the case. However, it is still a full HTTP round trip, and Drive doesn't seem to be very snappy at those!

I meant more like some rate limiting at Google slowing down the connections. It might be complete coincidence - repeating the test a few times would probably be a good idea!


Hello @ncw,

Thanks for the great reply. Some more comments:

Your suspicion was correct: I just tested with rclone --rc-addr=REDACTED:REDACTED rc options/set --json '{"main": {"LogFormat": "date,time,microseconds"}}' and got the usual non-error response of {}, but the log still shows no microseconds.

Thanks for all the tips & clarifications. I plan on doing some more testing tomorrow (now better informed thanks to you), will post the results here.

Cheers,
-- Durval.

