Motivation
Imagine you have a bucket in S3 with millions of keys under the same prefix:
source-bucket/a/b/c/2020-01-01-10-00-00.log
source-bucket/a/b/c/2020-01-01-10-00-01.log
source-bucket/a/b/c/2020-01-01-10-00-02.log
...
source-bucket/a/b/c/2022-12-31-10-00-00.log
...
source-bucket/a/b/c/2024-12-31-10-00-00.log
You want to copy all the objects for December 2022, so you need to filter keys by the prefix "2022-12-".
Rclone doesn't support using a source path as a key prefix, so the following doesn't work:
rclone copy my-s3-source-remote:source-bucket/a/b/c/2022-12- my-destination-remote:destination-bucket/d/e/f
A possible workaround is to use a file-matching pattern:
rclone copy my-s3-source-remote:source-bucket/a/b/c my-destination-remote:destination-bucket/d/e/f --include "2022-12-*"
With this approach, Rclone sends ListObjectsV2 with a/b/c as the prefix. As a result, Rclone lists all of the millions of keys under that prefix and only filters them afterwards, on the client side.
The optimized way is to send the ListObjectsV2 request with a prefix equal to [directory]/[key-prefix]: a/b/c/2022-12-. With this approach, S3 returns only the keys starting with a/b/c/2022-12-, skipping the unnecessary keys.
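To make the difference between the two listing strategies concrete, here is a small Python simulation. It is not Rclone code and makes no S3 calls. It assumes a hypothetical data set of one log object per hour from 2020 through 2024 and models S3 as a sorted key list served in pages of up to 1,000 keys. The names make_keys, list_objects, list_then_filter and list_with_key_prefix are illustrative only, and fnmatch stands in for Rclone's own filter matching.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from fnmatch import fnmatchcase

PAGE_SIZE = 1000  # ListObjectsV2 returns at most 1,000 keys per response


def make_keys(directory="a/b/c"):
    """Hypothetical data set: one log object per hour, 2020 through 2024."""
    keys = []
    t = datetime(2020, 1, 1)
    while t < datetime(2025, 1, 1):
        keys.append(f"{directory}/{t:%Y-%m-%d-%H-%M-%S}.log")
        t += timedelta(hours=1)
    return sorted(keys)


def list_objects(sorted_keys, prefix):
    """Simplified model of a paginated listing with a server-side prefix.

    Returns (matching keys, number of list requests needed).
    """
    start = bisect_left(sorted_keys, prefix)
    matched = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matched.append(key)
    requests = max(1, -(-len(matched) // PAGE_SIZE))  # ceiling division
    return matched, requests


def list_then_filter(sorted_keys, directory, pattern):
    """--include style: list the whole directory, filter on the client."""
    listed, requests = list_objects(sorted_keys, directory + "/")
    offset = len(directory) + 1  # strip "directory/" before matching
    wanted = [k for k in listed if fnmatchcase(k[offset:], pattern)]
    return wanted, len(listed), requests


def list_with_key_prefix(sorted_keys, directory, key_prefix):
    """Proposed style: send [directory]/[key-prefix] as the listing prefix."""
    listed, requests = list_objects(sorted_keys, directory + "/" + key_prefix)
    return listed, len(listed), requests


if __name__ == "__main__":
    keys = make_keys()
    wanted, listed, requests = list_then_filter(keys, "a/b/c", "2022-12-*")
    print(f"include filter: wanted={len(wanted)} listed={listed} requests={requests}")
    wanted, listed, requests = list_with_key_prefix(keys, "a/b/c", "2022-12-")
    print(f"key prefix:     wanted={len(wanted)} listed={listed} requests={requests}")
```

For this synthetic data set, both strategies select the same 744 keys, but the include filter lists 43,848 keys in 44 requests while the key prefix lists 744 keys in a single request. The gap grows with the number of keys under the directory.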
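The --fast-list flag doesn't help with a flat structure: it can reduce the number of listing requests for nested directories, but every key under a/b/c is still listed.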
Proposal
Add a new flag, --s3-list-key-prefix, that modifies the ListObjectsV2 (and ListObjects) request to include a server-side filter for keys. This can significantly improve listing performance for flat structures.
Implementation