Double uncached metadata read on copy for internetarchive

Simple copy (from remote to local) operations with the internetarchive backend result in two identical /metadata/ reads for the whole item:

NewFs -> Fs.NewObject -> listAllUnconstrained -> requestMetadata
operations.moveOrCopyFile -> Fs.NewObject -> listAllUnconstrained -> requestMetadata

This is an expensive operation and triggers aggressive rate limiting, especially because the two requests arrive back to back and count against burst limits.

When reading large items with many files sequentially, the cost is fairly well amortized. However, an "item" in IA infrastructure is somewhat smaller-grained than a "bucket" elsewhere, and we often need to read many small items (with high top-level parallelism by necessity), so we encounter metadata read amplification and rate limiting accordingly.

From what I understand, the two calls happen for most backends because NewFs reads the bucket metadata and moveOrCopyFile reads the object metadata; in this case both calls effectively pull the bucket metadata.

Is it possible to squirrel away the item metadata between the calls (in ctx perhaps, or on the Fs that is passed to moveOrCopyFile) so that we don't re-request it? This may present some cache invalidation challenges, but there are many read-only cases where this would be a big help, and as we know, remote updates with internetarchive are far from realtime anyway.
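To illustrate the idea, here is a minimal sketch (in Go, since that is what rclone is written in) of memoizing the metadata response on the Fs. All names here (ItemMetadata, fetchMetadata, requestMetadata, metaCache) are hypothetical stand-ins, not the actual backend types; the real change would also need a TTL or invalidation hook for writes:

```go
package main

import (
	"fmt"
	"sync"
)

// ItemMetadata stands in for the parsed /metadata/ response;
// the real backend type differs -- this is a hypothetical sketch.
type ItemMetadata struct {
	Files []string
}

// Fs caches the last metadata response per item so that consecutive
// NewObject calls (NewFs followed by moveOrCopyFile) reuse it.
type Fs struct {
	mu        sync.Mutex
	metaCache map[string]*ItemMetadata // keyed by item identifier
	fetches   int                      // counts real /metadata/ hits, for illustration
}

// fetchMetadata simulates the expensive /metadata/ read.
func (f *Fs) fetchMetadata(item string) *ItemMetadata {
	f.fetches++
	return &ItemMetadata{Files: []string{item + "/file1", item + "/file2"}}
}

// requestMetadata returns cached metadata if present, fetching otherwise.
func (f *Fs) requestMetadata(item string) *ItemMetadata {
	f.mu.Lock()
	defer f.mu.Unlock()
	if m, ok := f.metaCache[item]; ok {
		return m
	}
	m := f.fetchMetadata(item)
	f.metaCache[item] = m
	return m
}

func main() {
	f := &Fs{metaCache: make(map[string]*ItemMetadata)}
	f.requestMetadata("some-item") // NewFs -> NewObject path
	f.requestMetadata("some-item") // moveOrCopyFile -> NewObject path
	fmt.Println(f.fetches)         // prints 1: only one real /metadata/ hit
}
```

The mutex matters because rclone issues NewObject calls concurrently; without it, parallel misses on the same item could still fan out into duplicate /metadata/ requests.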

OK, would it be preferable to open a PR with a change that caches item metadata inside the parent Fs object?

I have a very basic set of changes here; it's not a PR yet as it's not ready, but I was hoping to get a sanity check: Comparing rclone:master...parkan:master · rclone/rclone · GitHub

The intent here is to avoid gumming up any of the shared types and keep most of the logic intact, while caching the metadata requests as far away from the core logic as possible. For "normal" use this only saves 1 hit, but in our use case (downloading multiple files from multiple items in multiple segments, with boundaries dictated by application needs, i.e. custom Range requests) this potentially saves hundreds of hits to the metadata endpoint.

For additional context, I am an Internet Archive employee and an incoming maintainer of data-preservation-programs/singularity (a tool for onboarding data to the Filecoin Network), which uses rclone as a pluggable transport.

In our testing so far this saves 90% of /metadata/ hits, significantly unblocking our use.