Mount :: HTTP Request bytes=0-31 Frequently

Earthwalker · September 18, 2022, 12:56pm

Hi,

Using rclone v1.59.1 with Dropbox, I found that a POST request for bytes=0-31 always accompanies the POST request that downloads the chunk needed.

I think this doubles the number of HTTP requests. My questions are:

Why is this kind of request needed every time a chunk is requested?
If this means more API requests and more processing time, is there anything we can do to reduce these costs?
I have the VFS cache mode set to full, so why does rclone request this part again and again instead of storing it in the cache?

rclone configs:

/usr/bin/rclone mount cloud: /mnt \
--umask 222 --allow-other --buffer-size 0 \
--dump headers --log-level=DEBUG \
--vfs-read-chunk-size 1M --vfs-read-chunk-size-limit 64M \
--vfs-cache-mode full

Please find corresponding logs below:

2022/09/18 20:17:58 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2022/09/18 20:17:58 DEBUG : HTTP REQUEST (req 0xc000000001)
2022/09/18 20:17:58 DEBUG : POST /2/files/download HTTP/1.1
Host: content.dropboxapi.com
User-Agent: Archive
Content-Length: 0
Authorization: XXXX
Content-Type: application/octet-stream
Range: bytes=0-31

2022/09/18 20:17:58 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2022/09/18 20:17:59 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2022/09/18 20:17:59 DEBUG : HTTP RESPONSE (req 0xc000000001)
2022/09/18 20:17:59 DEBUG : HTTP/1.1 206 Partial Content
Content-Length: 32
Accept-Encoding: identity,gzip
Accept-Ranges: bytes
Content-Range: bytes 0-31/1073741000
Content-Security-Policy: sandbox allow-forms allow-scripts
Content-Type: application/octet-stream
Date: Sun, 18 Sep 2022 12:17:58 GMT
Original-Content-Length: 32
Server: envoy
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Dropbox-Response-Origin: far_remote
X-Robots-Tag: noindex, nofollow, noimageindex

2022/09/18 20:17:59 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2022/09/18 20:17:59 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2022/09/18 20:17:59 DEBUG : HTTP REQUEST (req 0xc000000002)
2022/09/18 20:17:59 DEBUG : POST /2/files/download HTTP/1.1
Host: content.dropboxapi.com
User-Agent: Archive
Content-Length: 0
Authorization: XXXX
Content-Type: application/octet-stream
Range: bytes=20190048-21304431

2022/09/18 20:17:59 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2022/09/18 20:18:00 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2022/09/18 20:18:00 DEBUG : HTTP RESPONSE (req 0xc000000002)
2022/09/18 20:18:00 DEBUG : HTTP/1.1 206 Partial Content
Content-Length: 1114384
Accept-Encoding: identity,gzip
Accept-Ranges: bytes
Content-Range: bytes 20190048-21304431/1073741000
Content-Security-Policy: sandbox allow-forms allow-scripts
Content-Type: application/octet-stream
Date: Sun, 18 Sep 2022 12:17:59 GMT
Original-Content-Length: 1114384
Server: envoy
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Dropbox-Response-Origin: far_remote
X-Robots-Tag: noindex, nofollow, noimageindex

Thanks for any help. Much appreciated.

Animosity022 · September 18, 2022, 2:34pm

It has to compare the local cache to what is on the remote to ensure nothing has changed.

There's no cost for Dropbox API usage.

It does store them in the cache. A debug log shows you when data is read locally, as it'll have a "present true" type message.

Setting --vfs-read-chunk-size 1M makes things super slow, as reads start with HTTP requests for as little as 1M of data, so you really ramp up the number of requests, and you gimp it further by limiting the range requests to 64M. Best to remove those two flags and use the defaults.
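To put rough numbers on that, here is a minimal sketch that counts range requests for one sequential read of the file from the log. It assumes the documented behaviour that the chunk size doubles after each chunk until the limit is reached, and that the defaults are 128M with no limit (worth checking against the docs for your version). The function name is made up for illustration, and the model ignores seeks and crypt overhead.

```python
def count_range_requests(file_size, chunk_size, chunk_limit=None):
    """Count range requests needed to read file_size bytes sequentially.

    Assumption: the chunk size doubles after every chunk until
    chunk_limit is reached (None means the doubling is unlimited).
    """
    requests = 0
    offset = 0
    while offset < file_size:
        offset += chunk_size
        requests += 1
        if chunk_limit is None:
            chunk_size *= 2
        else:
            chunk_size = min(chunk_size * 2, chunk_limit)
    return requests


MiB = 1024 * 1024
FILE_SIZE = 1073741000  # total size from the Content-Range header in the log

# Flags from the original post: start at 1M, cap at 64M.
print(count_range_requests(FILE_SIZE, 1 * MiB, 64 * MiB))  # 22

# Assumed defaults: start at 128M, no cap.
print(count_range_requests(FILE_SIZE, 128 * MiB))  # 4
```

Under these assumptions the posted flags need 22 requests for a full sequential read against 4 with the defaults. A workload that seeks a lot will behave differently, so treat this as an estimate only.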

Earthwalker · September 18, 2022, 11:23pm

Thank you so much for the suggestions.

I'm still confused about why rclone needs 2 HTTP requests for every chunk. Won't this make it take longer to get the file? If so, is there a way to reduce the number of HTTP requests?

Animosity022 · September 19, 2022, 12:30am

It's pretty tough to tell what's going on from a snippet with a bunch of stuff cut out, as the stuff removed contains the key details needed to answer the question.

ncw · September 19, 2022, 6:59pm

Rclone requests the first 32 bytes so it can fetch the decryption nonce. It needs that to decrypt anything from the file.

In your original example it fetches the first 32 bytes, then fetches something from the middle of the file.

If you are streaming something, rclone will fetch the nonce just once and then stream things.

However once the stream is closed, rclone needs to fetch the nonce again.
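For anyone wondering where bytes=0-31 comes from, here is a rough sketch based on my reading of the documented crypt file format: an 8 byte magic string, a 24 byte nonce, then 64 KiB data blocks that each carry a 16 byte authenticator. The function names are made up for illustration and this is not rclone's actual code.

```python
FILE_MAGIC = b"RCLONE\x00\x00"              # 8-byte magic at the start of the file
NONCE_SIZE = 24                             # nonce stored right after the magic
HEADER_SIZE = len(FILE_MAGIC) + NONCE_SIZE  # 32 bytes, i.e. "bytes=0-31"

BLOCK_DATA_SIZE = 64 * 1024                 # plaintext bytes per block
BLOCK_OVERHEAD = 16                         # authenticator bytes per block
BLOCK_SIZE = BLOCK_DATA_SIZE + BLOCK_OVERHEAD


def parse_crypt_header(header: bytes) -> bytes:
    """Return the nonce from the first 32 bytes of an encrypted file."""
    if len(header) < HEADER_SIZE:
        raise ValueError("header too short")
    if header[:len(FILE_MAGIC)] != FILE_MAGIC:
        raise ValueError("bad magic, not an encrypted file")
    return header[len(FILE_MAGIC):HEADER_SIZE]


def encrypted_range(first_block: int, block_count: int) -> tuple[int, int]:
    """Inclusive byte range in the encrypted file covering whole blocks."""
    start = HEADER_SIZE + first_block * BLOCK_SIZE
    end = start + block_count * BLOCK_SIZE - 1
    return start, end


# A made-up header, just to exercise the parser.
sample_header = FILE_MAGIC + bytes(range(NONCE_SIZE))
print(len(parse_crypt_header(sample_header)))  # 24

# Blocks 308..324 (17 blocks) reproduce the second request in the log.
start, end = encrypted_range(308, 17)
print(f"bytes={start}-{end}")  # bytes=20190048-21304431
```

The second request in the log lines up exactly with 17 whole blocks under this layout, which is consistent with a crypt remote being involved.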

Perhaps rclone should keep a cache of the nonces.

Can you describe the usage pattern which is causing the problem?

Earthwalker · September 19, 2022, 11:05pm

It's for my video editing project, which checks frames in the original video file to locate the part required.

It seems it would greatly reduce the load on the storage server if rclone kept a cache of requested nonces.

ncw · September 20, 2022, 3:52pm

How does it do that? Does it

repeat lots
- open the file
- seek somewhere
- read the frame
- close the file?

That would provoke the worst case you are seeing.

If you can change it to this then it will keep the nonce in memory:

open the file
repeat lots
- seek somewhere
- read the frame
close the file
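The two patterns above can be sketched like this. It is only an illustration: the function names are made up, and it runs against a local temp file so it can be tried without a mount. On a real mount the path would point at the video file instead.

```python
import os
import tempfile


def read_frames_reopening(path, offsets, size):
    """Worst case: open, seek, read and close for every single frame."""
    frames = []
    for offset in offsets:
        with open(path, "rb") as f:
            f.seek(offset)
            frames.append(f.read(size))
    return frames


def read_frames_single_open(path, offsets, size):
    """Preferred: open once, then seek and read repeatedly, close at the end."""
    frames = []
    with open(path, "rb") as f:
        for offset in offsets:
            f.seek(offset)
            frames.append(f.read(size))
    return frames


# Stand-in for a video file: 4 KiB of repeating byte values.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 16)
    demo_path = tmp.name

try:
    demo_offsets = [0, 1000, 3000]
    slow = read_frames_reopening(demo_path, demo_offsets, 8)
    fast = read_frames_single_open(demo_path, demo_offsets, 8)
    print(slow == fast)  # True: same data, but only one open/close
finally:
    os.remove(demo_path)
```

Both versions return the same bytes. The difference is only in how many times the file handle is opened and closed, which is what decides how often the nonce has to be fetched again.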

I suspect something odd about your access patterns because this isn't a problem for most people.

Animosity022 · September 20, 2022, 4:10pm

Most folks generally stream, so the pattern above won't apply much. I can't say I've ever cared about reading an extra 32 bytes, as my home connection transfers anywhere from 30-40TB a month, so I'd have to do something insane to care about this in my use case. (That doesn't mean the OP's issue is less valid, as I'm just stating my use case.)

system · October 20, 2022, 4:10pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.