What options do I have to speed up s3 modtime HEAD request?

What is the problem you are having with rclone?

Listing an s3 directory containing lots of files is slow because of the HEAD request rclone makes per file to get the modtime.

What options do I have to speed up this bottleneck without turning off the modtime?

does rclone use/support parallel HEAD requests when listing files?

how about pre-caching the modtime on mount so that the HEAD requests are not done on demand? is that possible?

What is your rclone version (output from rclone version)

1.57.0

Which cloud storage system are you using? (eg Google Drive)

Wasabi s3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

.\rclone mount "wasabi:bucketName" * --vfs-cache-mode full --vfs-cache-max-age 168h --dir-cache-time 168h --cache-dir ".\cache\WasabiMount" --transfers 32 --checkers 32 --multi-thread-cutoff 1M --multi-thread-streams 32 --poll-interval 10s 

Check out:

https://rclone.org/s3/#avoiding-head-requests-to-read-the-modification-time

While that is a good solution to combat the slow listing, it would be nice if there were some way to still get the actual modtime while speeding things up. That's why I posted this question.

hello and welcome to the forum,

  • i also use wasabi, as my primary cloud provider.
  • there is a way to pre-cache vfs cache, tho not sure it caches mod-time.
  1. add --rc to the mount command
  2. run this on-demand to refresh the vfs cache
    rclone rc vfs/refresh recursive=true -vv
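
a concrete sketch of both steps, assuming the default rc address and the OP's remote name:

```shell
# 1. mount with the remote control api enabled
rclone mount "wasabi:bucketName" * --vfs-cache-mode full --rc

# 2. from another terminal, walk the whole remote to warm the dir cache
rclone rc vfs/refresh recursive=true -vv
```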

tho i have never used it on a mount. to get a deeper look into what rclone is doing, add this to your rclone command:
--dump=headers --retries=1 --low-level-retries=1 --log-level=DEBUG --log-file=rclone.log

I shared the link as it explains why rclone makes a HEAD request. Most providers charge per call, and since that info requires a HEAD request, there isn't a way on a mount to avoid it. I couldn't know for sure whether you'd seen or read it, which is why I shared it.

I'd be mindful as those requests == money usually. Does Wasabi charge for those requests?

good point, wasabi does not charge for api calls or egress.
and does not seem to throttle connections, as i have tested --transfers=256 on 1Gbps fios.

Does a refresh actually pull the data in question? I don't use S3 / Wasabi.

with s3, vfs/refresh does pull in the data.
tho with wasabi, with such fast api calls, and 1Gbps internet, the overall effect is not so noticeable.

i do have a question, as to exactly what data does rclone pull?
of course the filename, but what other info, modtime, size, etc... ?

To the best of my knowledge, just the filename, size and modtime. Modtime can be a tricky one, as what gets reported back depends on the backend and what it stores.

I know with Google, you can use created time to replace modtime.

Dropbox does it a little bit different:

https://rclone.org/dropbox/#modified-time-and-hashes

Can't say I've looked too much to see what other ones do.

I tried the vfs/refresh command and it seems like it only caches the directory listing and not the modtime, because when I try to open a mounted directory that contains many files, it still takes a while for it to get through and show the files.

For the s3 backend, I believe it only returns the modtime as the time when the file was uploaded, not the file's actual modtime metadata.

there could be many reasons for that.
what is it, windows explorer or what?

yes, windows explorer.
Basically, if I use --use-server-modtime or --no-modtime, the folder's files (about 500 items) show up almost instantly. But as soon as I remove those flags, the folder takes a somewhat long time to open, and windows explorer does not respond in that timeframe.

yes, you are correct, i had forgotten this
"In particular S3 and Swift benefit hugely from the --no-modtime flag (or use --use-server-modtime for a slightly different effect) as each read of the modification time takes a transaction."

Unfortunately there is no nice workaround for this.

Rclone stores modtimes as metadata on the object and these need an extra HEAD request to read.

So either you can have the time the file was uploaded with --use-server-modtime instantly or wait for the HEAD request.
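
For example, trimming the OP's mount command down to the relevant flag, the trade-off looks like this (remote name taken from the OP's command):

```shell
# modtime = upload time, read straight from the bucket listing: fast
rclone mount "wasabi:bucketName" * --use-server-modtime

# no modtime at all: also fast
rclone mount "wasabi:bucketName" * --no-modtime

# accurate modtime from object metadata: one HEAD per object, slow listings
rclone mount "wasabi:bucketName" *
```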

We need to persuade AWS to either let you set the Last-Modified time when you upload the object or return the metadata in the bucket listings. There isn't much chance of that and I have mentioned it to various AWS people in the past!

Just wondering, does rclone use parallel HEAD requests when listing s3 files?
Or rather, does s3 support parallel HEAD requests?

It can do, for instance in a sync that will be controlled by --checkers.

However, for a mount I don't think it issues them in parallel unless the application using the mount issues the stat calls in parallel, which it probably won't.

s3 will support whatever you can throw at it - it is amazingly robust! Amazon will charge you for each of those HEAD requests, but they'll work fine.
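
So if you need the real modtimes read in bulk, a sync-style command rather than the mount lets --checkers parallelize the HEAD requests. A sketch, with a hypothetical local path:

```shell
# --dry-run compares size and modtime without copying anything;
# up to 32 HEAD requests go out in parallel
rclone sync "wasabi:bucketName" /local/copy --checkers 32 --dry-run
```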

hi,
in this case, both the OP and myself, we use wasabi, which:

  • does not charge for api calls
  • does not charge for egress
  • does not seem to throttle api calls much, as i have tested --checkers=256 --transfers=256

In that case, it seems like one good way to speed up listing in mounts would be a flag that, when set, asks rclone to pre-cache the modtime/metadata of certain directories on mount (or serve), using parallel HEAD requests for the pre-caching.
That way, one wouldn't have to sit through a lengthy wait for the HEAD requests whenever they open a folder with lots of files.

The plan, in the not too distant future, is to cache the metadata on disk as well as the data. That would then fix your problem, as rclone wouldn't need to make any requests to read the metadata.