What options do I have to speed up s3 modtime HEAD request?

What is the problem you are having with rclone?

Listing an s3 directory containing lots of files is slow because of the HEAD request rclone makes per file to get the modtime.

What options do I have to speed up this bottleneck without turning off the modtime?

does rclone use/support parallel HEAD requests when listing files?

how about pre-caching the modtime on mount so that the HEAD requests are not done on demand? is that possible?

What is your rclone version (output from rclone version)

1.57.0

Which cloud storage system are you using? (eg Google Drive)

Wasabi s3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

.\rclone mount "wasabi:bucketName" * --vfs-cache-mode full --vfs-cache-max-age 168h --dir-cache-time 168h --cache-dir ".\cache\WasabiMount" --transfers 32 --checkers 32 --multi-thread-cutoff 1M --multi-thread-streams 32 --poll-interval 10s 

Check out:

https://rclone.org/s3/#avoiding-head-requests-to-read-the-modification-time

While that is a good solution to combat the slow listing, it would be nice if there were some way to still get the actual modtime while speeding things up. That's why I posted this question.

hello and welcome to the forum,

  • i also use wasabi, as my primary cloud provider.
  • there is a way to pre-cache vfs cache, tho not sure it caches mod-time.
  1. add --rc to the mount command
  2. run this on-demand to refresh the vfs cache
    rclone rc vfs/refresh recursive=true -vv
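
a concrete sketch of both steps, assuming the default rc address and the OP's remote name:

```shell
# 1. mount with the remote control api enabled
rclone mount "wasabi:bucketName" * --vfs-cache-mode full --rc

# 2. from another terminal, walk the whole remote to warm the dir cache
rclone rc vfs/refresh recursive=true -vv
```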

tho i have never used it on a mount. to get a deeper look into what rclone is doing, add this to your rclone command:
--dump=headers --retries=1 --low-level-retries=1 --log-level=DEBUG --log-file=rclone.log

I shared the link as it explains why rclone makes a HEAD request. Most providers charge per call, and since that info requires a HEAD request, there isn't a way on a mount to avoid it. I couldn't know for sure whether you'd seen or read it, which is why I shared it.

I'd be mindful as those requests == money usually. Does Wasabi charge for those requests?

good point, wasabi does not charge for api calls or egress.
and does not seem to throttle connections, as i have tested --transfers=256 on 1Gbps fios.

Does a refresh actually pull the data in question? I don't use S3 / Wasabi.

with s3, vfs/refresh does pull in the data.
tho with wasabi, with such fast api calls, and 1Gbps internet, the overall effect is not so noticeable.

i do have a question, as to exactly what data does rclone pull?
of course the filename, but what other info, modtime, size, etc... ?

To the best of my knowledge, just the filename, size and modtime. Modtime can be a tricky one, as what gets reported back depends on the backend and what it stores.

I know with Google, you can use created time to replace modtime.

Dropbox does it a little bit different:

https://rclone.org/dropbox/#modified-time-and-hashes

Can't say I've looked too much to see what other ones do.

I tried the vfs/refresh command and it seems like it only caches the directory listing and not the modtime, because when I try to open a mounted directory that contains many files, it still takes a while for it to get through and show the files.

For the s3 backend, I believe it only returns the modtime as the time when the file was uploaded, not the file's actual modtime metadata.

there could be many reasons for that.
what is it, windows explorer or what?

yes, windows explorer.
Basically, if I use --use-server-modtime or --no-modtime, the folder's files (about 500 items) show up almost instantly. But as soon as I remove those flags, the folder takes a somewhat long time to open, and windows explorer does not respond in that timeframe.

yes, you are correct, i had forgotten this
"In particular S3 and Swift benefit hugely from the --no-modtime flag (or use --use-server-modtime for a slightly different effect) as each read of the modification time takes a transaction."

Unfortunately there is no nice workaround for this.

Rclone stores modtimes as metadata on the object and these need an extra HEAD request to read.

So either you can have the time the file was uploaded with --use-server-modtime instantly or wait for the HEAD request.
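
For example, trimming the OP's mount command down to the relevant flag, the trade-off looks like this (remote name taken from the OP's command):

```shell
# modtime = upload time, read straight from the bucket listing: fast
rclone mount "wasabi:bucketName" * --use-server-modtime

# no modtime at all: also fast
rclone mount "wasabi:bucketName" * --no-modtime

# accurate modtime from object metadata: one HEAD per object, slow listings
rclone mount "wasabi:bucketName" *
```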

We need to persuade AWS to either let you set the Last-Modified time when you upload the object or return the metadata in the bucket listings. There isn't much chance of that and I have mentioned it to various AWS people in the past!

Just wondering, does rclone use parallel HEAD requests when listing s3 files?
Or rather, does s3 support parallel HEAD requests?

It can do, for instance in a sync that will be controlled by --checkers.

However, for a mount I don't think it issues them in parallel unless the application using the mount issues the stat calls in parallel, which it probably won't.

s3 will support whatever you can throw at it - it is amazingly robust! Amazon will charge you for each of those HEAD requests, but they'll work fine.
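
So if you need the real modtimes read in bulk, a sync-style command rather than the mount lets --checkers parallelize the HEAD requests. A sketch, with a hypothetical local path:

```shell
# --dry-run compares size and modtime without copying anything;
# up to 32 HEAD requests go out in parallel
rclone sync "wasabi:bucketName" /local/copy --checkers 32 --dry-run
```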

hi,
in this case, both the OP and myself, we use wasabi, which:

  • does not charge for api calls
  • does not charge for egress
  • does not seem to throttle api calls much, as i have tested --checkers=256 --transfers=256

In that case, it seems like one good way to speed up listing in mounts would be a flag that, when set, asks rclone to pre-cache the modtime/metadata of certain directories on mount (or serve), using parallel HEAD requests for the pre-caching.
That way, one wouldn't have to sit through a lengthy wait for the HEAD requests whenever they open a folder with lots of files.

The plan, in the not too distant future, is to cache the metadata on disk as well as the data. That would then fix your problem, as rclone wouldn't need to make any requests to read the metadata.