I want to propose the option of using AWS bucket logging instead of bucket listing, as a more efficient and less costly mechanism to get the list of changes in a bucket.
of course, there are several issues with this proposal, so it won't fit every use case:
bucket logging needs to be set up beforehand
this will give incremental syncing abilities
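To make the idea a bit more concrete, here is a minimal sketch of how changed keys could be pulled out of S3 server access log records instead of listing the bucket. It assumes the standard space-delimited access log layout (owner, bucket, bracketed time, remote IP, requester, request ID, operation, key, quoted request URI, status, ...). The names `changed_keys`, `LOG_LINE` and `STATE_BY_OPERATION` are made up for illustration, and the operation list is an assumption that is certainly not exhaustive.

```python
import re
from datetime import datetime
from urllib.parse import unquote

# Leading fields of an access log record; everything after the status is ignored.
LOG_LINE = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request_uri>[^"]*)" (?P<status>\S+)'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Assumption: only these operations are treated as changes (not exhaustive).
STATE_BY_OPERATION = {
    "REST.PUT.OBJECT": "changed",
    "REST.COPY.OBJECT": "changed",
    "REST.POST.UPLOAD": "changed",
    "REST.DELETE.OBJECT": "deleted",
}


def changed_keys(lines):
    """Return {key: "changed" | "deleted"} from an iterable of log lines."""
    latest = {}  # key -> (timestamp, state)
    for line in lines:
        m = LOG_LINE.match(line)
        # Skip unparsable records and requests that did not succeed (non-2xx).
        if m is None or not m["status"].startswith("2"):
            continue
        state = STATE_BY_OPERATION.get(m["operation"])
        if state is None:
            continue  # reads and other operations are not changes
        when = datetime.strptime(m["time"], TIME_FORMAT)
        key = unquote(m["key"])  # keys are URL-encoded in the log
        # Records may arrive out of order, so keep the newest one per key.
        if key not in latest or when >= latest[key][0]:
            latest[key] = (when, state)
    return {key: state for key, (_, state) in latest.items()}
```

Timestamps in the log only have one-second resolution, so two operations on the same key within the same second cannot be ordered this way; a sync tool would probably need to re-check such keys directly.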
according to AWS documentation, there is no guarantee as to when (or if) the logs will be available
thanks for the comment!
I agree that this is a real problem with the proposal...
OTOH, doing a bucket listing on a multi-million-object bucket, just to figure out what changed in the past hour, has its downsides as well.
one thing I thought of is to add an option called "try_bucket_logging" instead of "use_bucket_logging". so, if there are no log objects available, we can fall back to bucket listing instead (this does not fix all the problems, though).
I am also working on "reliable bucket logging" (I call it "journal mode") as part of the Ceph project. I know this is more niche than AWS S3, but still...
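Roughly, the fallback could look like the sketch below. This is only an illustration of the control flow: `changes_since` and the three callables it takes (`fetch_log_objects`, `changes_from_logs`, `full_listing_diff`) are hypothetical names, not existing APIs, and only the `try_bucket_logging` option name comes from the idea above.

```python
def changes_since(since, fetch_log_objects, changes_from_logs,
                  full_listing_diff, try_bucket_logging=True):
    """Return (source, changes), preferring bucket logs over a full listing.

    All three callables are hypothetical hooks supplied by the caller:
      fetch_log_objects(since) -> iterable of log objects written after `since`
      changes_from_logs(logs)  -> changes derived from those log objects
      full_listing_diff(since) -> changes derived from a full bucket listing
    """
    if try_bucket_logging:
        log_objects = list(fetch_log_objects(since))
        if log_objects:
            # Logs were delivered for this window: use the cheap path.
            return "logs", changes_from_logs(log_objects)
    # No log objects (or the option is off): fall back to listing the bucket.
    return "listing", full_listing_diff(since)
```

Note that this only covers the case where no log objects exist at all. If some logs were delivered but others are late or missing, the sketch would still take the log path and could miss changes, which is the part this option does not fix.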