Use bucket logging in case of S3 source

I want to propose the option of using AWS bucket logging instead of bucket listing, as a more efficient and less costly mechanism to get the list of changes in a bucket.

Of course, there are several issues with this proposal, so it won't fit every use case:

  • bucket logging needs to be set up beforehand
  • this will only give incremental syncing abilities
  • according to AWS documentation, there is no guarantee as to when (or if) the logs will be available
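
To make the idea concrete, here is a rough sketch (in Python for brevity, not taken from the PR) of how server access log records could be turned into a set of keys that may have changed. The function name `changed_keys` and the `WRITE_OPS` list are made up for illustration, the operation list is probably incomplete, and the sketch assumes the leading fields follow the order in the AWS log format documentation and that keys are URL-encoded once.

```python
import re
from urllib.parse import unquote

# Leading fields of an S3 server access log record, up to the HTTP status:
# owner, bucket, [time], remote IP, requester, request ID, operation, key,
# "request-URI", status. Trailing fields are ignored in this sketch.
LOG_RE = re.compile(
    r'^\S+ \S+ \[[^\]]+\] \S+ \S+ \S+ '
    r'(?P<operation>\S+) (?P<key>\S+) (?:"[^"]*"|-) (?P<status>\S+)'
)

# Illustrative (assumed, likely incomplete) set of operations that modify objects.
WRITE_OPS = {
    "REST.PUT.OBJECT",
    "REST.COPY.OBJECT",
    "REST.POST.UPLOAD",
    "REST.DELETE.OBJECT",
    "BATCH.DELETE.OBJECT",
}


def changed_keys(lines):
    """Return the set of object keys touched by successful write operations."""
    keys = set()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # not a record we understand; skip it
        if m["operation"] not in WRITE_OPS:
            continue  # reads, listings, etc. do not change the bucket
        if not m["status"].startswith("2"):
            continue  # failed requests changed nothing
        if m["key"] == "-":
            continue  # no key associated with this record
        keys.add(unquote(m["key"]))
    return keys
```

Returning a set means duplicated log records collapse naturally. Since records can also be late or missing, each key would still need to be checked against the bucket before acting on it.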

I made a very preliminary draft of the idea: WIP: use bucket logging instead of listing object in a bucket by yuvalif · Pull Request #1 · yuvalif/rclone · GitHub
There are many bits missing there, but I would appreciate some feedback on the idea before I move ahead with the code.

welcome to the forum, good first post.

imho, no way to trust logging.

Logging requests with server access logging - Amazon Simple Storage Service
1. "The completeness and timeliness of server logging is not guaranteed"
2. "might not be delivered at all"
3. "possible that you might even see a duplication of a log record"

Thanks for the comment!
I agree that this is a real problem with the proposal...
OTOH, listing a bucket with millions of objects just to figure out what changed in the past hour has its downsides as well.
One thing I thought of is adding a "try_bucket_logging" option instead of "use_bucket_logging": if there are no log objects available, we fall back to bucket listing (this does not fix all the problems, though).
I am also working on "reliable bucket logging" (I call it "journal mode") as part of the Ceph project. I know this is more niche than AWS S3, but still...
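
A minimal sketch of the "try_bucket_logging" fallback, again in Python and not from the PR. The names `candidate_keys`, `read_log_keys` and `list_all_keys` are hypothetical callables, and the sketch assumes "no log objects available" is signalled by `None`, as opposed to an empty set meaning "logs were found but show no changes".

```python
from typing import Callable, Iterable, Optional, Set, Tuple


def candidate_keys(
    read_log_keys: Callable[[], Optional[Set[str]]],
    list_all_keys: Callable[[], Iterable[str]],
) -> Tuple[Set[str], str]:
    """Return (keys, source), where source is "logs" or "listing".

    read_log_keys returns None when no log objects were found (assumption),
    in which case we fall back to a full bucket listing.
    """
    keys = read_log_keys()
    if keys is None:
        # No log objects at all: do the expensive full listing instead.
        return set(list_all_keys()), "listing"
    # Logs were available: only these keys need to be checked.
    return keys, "logs"
```

This does not help when logs exist but are incomplete or late, which is the part that a reliable logging mode would have to cover.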