Mount: HTTP request for bytes=0-31 made frequently

Hi,

I'm using rclone v1.59.1 with Dropbox and found that every POST request that downloads a needed chunk is always accompanied by a separate POST request for bytes=0-31.

I think this doubles the number of HTTP requests. My questions are:

  1. Why is this request needed every time a chunk is requested?
  2. If this means more API requests and more processing time, is there anything we can do to reduce these costs?
  3. I have the VFS cache set to full. Why does rclone request this part again and again instead of storing it in the cache?

rclone configs:

/usr/bin/rclone mount cloud: /mnt \
--umask 222 --allow-other --buffer-size 0 \
--dump headers --log-level=DEBUG \
--vfs-read-chunk-size 1M --vfs-read-chunk-size-limit 64M \
--vfs-cache-mode full

Please find corresponding logs below:

2022/09/18 20:17:58 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2022/09/18 20:17:58 DEBUG : HTTP REQUEST (req 0xc000000001)
2022/09/18 20:17:58 DEBUG : POST /2/files/download HTTP/1.1
Host: content.dropboxapi.com
User-Agent: Archive
Content-Length: 0
Authorization: XXXX
Content-Type: application/octet-stream
Range: bytes=0-31

2022/09/18 20:17:58 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2022/09/18 20:17:59 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2022/09/18 20:17:59 DEBUG : HTTP RESPONSE (req 0xc000000001)
2022/09/18 20:17:59 DEBUG : HTTP/1.1 206 Partial Content
Content-Length: 32
Accept-Encoding: identity,gzip
Accept-Ranges: bytes
Content-Range: bytes 0-31/1073741000
Content-Security-Policy: sandbox allow-forms allow-scripts
Content-Type: application/octet-stream
Date: Sun, 18 Sep 2022 12:17:58 GMT
Original-Content-Length: 32
Server: envoy
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Dropbox-Response-Origin: far_remote
X-Robots-Tag: noindex, nofollow, noimageindex

2022/09/18 20:17:59 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2022/09/18 20:17:59 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2022/09/18 20:17:59 DEBUG : HTTP REQUEST (req 0xc000000002)
2022/09/18 20:17:59 DEBUG : POST /2/files/download HTTP/1.1
Host: content.dropboxapi.com
User-Agent: Archive
Content-Length: 0
Authorization: XXXX
Content-Type: application/octet-stream
Range: bytes=20190048-21304431

2022/09/18 20:17:59 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2022/09/18 20:18:00 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2022/09/18 20:18:00 DEBUG : HTTP RESPONSE (req 0xc000000002)
2022/09/18 20:18:00 DEBUG : HTTP/1.1 206 Partial Content
Content-Length: 1114384
Accept-Encoding: identity,gzip
Accept-Ranges: bytes
Content-Range: bytes 20190048-21304431/1073741000
Content-Security-Policy: sandbox allow-forms allow-scripts
Content-Type: application/octet-stream
Date: Sun, 18 Sep 2022 12:17:59 GMT
Original-Content-Length: 1114384
Server: envoy
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Dropbox-Response-Origin: far_remote
X-Robots-Tag: noindex, nofollow, noimageindex

Thanks for any help, much appreciated.

It has to compare the local cache to what is on the remote to ensure nothing has changed.

There's no cost for Dropbox API usage.

It does store them in the cache. A debug log shows you when data is read locally: look for a "present true" type message.

Setting --vfs-read-chunk-size 1M makes things super slow, because rclone has to make an HTTP request for every 1M of data. That really ramps up the number of requests, and capping the range requests at 64M with --vfs-read-chunk-size-limit hobbles it further. Best to remove both flags and use the defaults.
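As a sketch of that suggestion, here is the mount command from the original post with only the two chunk-size flags removed. Every other flag is unchanged, so rclone falls back to its default chunked reading. The debugging flags can be dropped too once the investigation is done.

```shell
# Original mount minus --vfs-read-chunk-size 1M and --vfs-read-chunk-size-limit 64M,
# so rclone uses its default read chunk size and growth.
/usr/bin/rclone mount cloud: /mnt \
    --umask 222 --allow-other --buffer-size 0 \
    --dump headers --log-level=DEBUG \
    --vfs-cache-mode full
```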

Thank you so much for the suggestions.

I'm still confused about why rclone needs two HTTP requests for every chunk. Won't this make it take longer to get the file? If so, is there a way to reduce the number of HTTP requests?

It's pretty tough to tell what's going on from a snippet with a bunch of stuff removed, as the removed parts are the key details needed to answer the question.

Rclone requests the first 32 bytes so it can fetch the decryption nonce. It needs that to decrypt anything from the file.
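For illustration, the arithmetic below assumes the rclone crypt file layout. That layout is a 32-byte header (an 8-byte magic string plus a 24-byte nonce, hence bytes=0-31), followed by blocks of 64 KiB of plaintext, each stored with 16 extra bytes of authentication data. Under that assumption, a plaintext read maps to an encrypted byte range as sketched here. The function name `crypt_range` and the offset 20200000 are hypothetical. Any 1 MiB read that starts inside plaintext block 308, but not exactly on its boundary, spans 17 encrypted blocks. That gives the same range as the second request in the log (bytes=20190048-21304431, Content-Length 1114384 = 17 × 65552).

```shell
#!/bin/sh
# ASSUMED crypt layout: 32-byte header, then 64 KiB plaintext blocks,
# each stored as 65536 + 16 bytes of ciphertext.
HEADER=32
BLOCK_PLAIN=65536
BLOCK_CIPHER=$((BLOCK_PLAIN + 16))

# crypt_range OFFSET LENGTH (hypothetical helper)
# Prints the inclusive encrypted range covering the plaintext read,
# in HTTP Range syntax. Does not clamp to the end of the file.
crypt_range() {
    off=$1
    len=$2
    first=$((off / BLOCK_PLAIN))               # first plaintext block touched
    last=$(((off + len - 1) / BLOCK_PLAIN))    # last plaintext block touched
    start=$((HEADER + first * BLOCK_CIPHER))
    end=$((HEADER + (last + 1) * BLOCK_CIPHER - 1))
    echo "bytes=${start}-${end}"
}

crypt_range 20200000 1048576   # bytes=20190048-21304431
```

The header range itself is never part of this calculation. It is fetched separately because the nonce it contains is needed before any block can be decrypted.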

In your original example it fetches the first 32 bytes, then fetches something from the middle of the file.

If you are streaming something, rclone will fetch the nonce just once and then stream things.

However once the stream is closed, rclone needs to fetch the nonce again.
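To see how often the header is re-fetched in practice, a small helper like this can tally a debug log. The log would be captured with `--log-file` alongside the `--dump headers --log-level=DEBUG` flags from the original command. This is a sketch: the function name `summarize_log` is hypothetical, and it assumes each dumped `Range:` header sits on its own line as in the logs above. The cache-hit count assumes the "present true" wording mentioned earlier in the thread appears in your version's log.

```shell
#!/bin/sh
# summarize_log LOGFILE (hypothetical helper)
# Counts total HTTP requests, header (nonce) fetches and VFS cache hits.
summarize_log() {
    log=$1
    total=$(grep -c 'HTTP REQUEST' "$log")
    # [[:space:]]* tolerates a trailing \r from dumped HTTP headers
    header=$(grep -cE '^Range: bytes=0-31[[:space:]]*$' "$log")
    hits=$(grep -c 'present true' "$log")
    echo "requests=$total header_fetches=$header cache_hits=$hits"
}
```

For example, `summarize_log /tmp/rclone.log` (path hypothetical). If header_fetches is close to half of requests, each read is likely reopening the file.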

Perhaps rclone should keep a cache of the nonces.

Can you describe the usage pattern which is causing the problem?

It's for my video editing project, which checks frames in the original video file to locate the required part.

It seems it would greatly reduce the load on the storage server if rclone kept a cache of requested nonces.

How does it do that? Does it

  • repeat lots
    • open the file
    • seek somewhere
    • read the frame
    • close the file?

That would provoke the worst case you are seeing.

If you can change it to this, then it will keep the nonce in memory:

  • open the file
  • repeat lots
    • seek somewhere
    • read the frame
  • close the file

I suspect something odd about your access patterns because this isn't a problem for most people.

Most folks generally stream, so the pattern above won't apply much. I can't say I've ever cared about reading an extra 32 bytes: my home connection transfers anywhere from 30 to 40 TB a month, so I'd have to do something insane to care about this in my use case. (That doesn't make the OP's issue any less valid; I'm just stating my use case.)