and I ran the copy command with this policy, and the files were copied. But the logs show something strange:
Here is the log of one of the files (it's repeated in all of them):
2022/12/29 11:27:55 DEBUG : fs cache: renaming cache item "s3_aws:/data-collector-nd/DF/Nano/Test/019/Coast" to be canonical "s3_aws:data-collector-nd/DF/Nano/Test/019/Coast"
2022/12/29 11:27:56 NOTICE: Coast-Test (1).jpeg: Failed to read metadata: Forbidden: Forbidden
status code: 403, request id: KZ66B0XZJWTJZ7RS, host id: sPmpbG2pR11bus6T0xlqtMwvDz9wgejCQ4KaGnrMPSUHNFIzM+zOARfjVu+qlLI/sURYpBJ8pWk=
2022/12/29 11:27:56 DEBUG : Coast-Test (1).jpeg: Modification times differ by 335h41m28.24063s: 2022-12-15 11:46:27.8482423 +0200 IST, 2022-12-29 11:27:56.0888723 +0200 IST m=+0.853214601
2022/12/29 11:27:56 DEBUG : Coast-Test (1).jpeg: md5 = 186f5d7eaaa2d3035844f3790f85b7da OK
It seems that at the end it checks the checksum and the file looks OK, so there is no need to copy it, but before that rclone checks for a metadata match and gets a 403 reading the file. I think I misunderstand how copy works.
For each file, does it download the file into a temp folder, check metadata and checksum, and copy if something changed? Or does it just list? Why isn't my policy enough?
as far as i know, to allow rclone to read the metadata, you need to allow s3:GetObject
rclone does not use a temp folder with S3.
rclone does not download the dest file, rclone simply reads the metadata.
for each source file, rclone checks the dest by reading metadata.
then rclone decides whether there is a need to copy the source file to the dest.
in addition, before rclone copies a file from source to dest,
rclone calculates the md5 of the source file.
after the upload completes, rclone compares the md5 of the source to the md5 as generated by s3 provider.
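if the goal is to avoid the per-file HEAD on the dest, rclone has flags that change how it compares files. a sketch with placeholder local and remote paths - adapt to your setup:

```sh
# compare by size only - no modtime metadata needed, so no extra HEAD
rclone copy /local/data s3_aws:data-collector-nd/DF --size-only

# compare by checksum - uses the md5 that s3 already stores as the ETag
# for simple (non-multipart) uploads
rclone copy /local/data s3_aws:data-collector-nd/DF --checksum

# use the s3 upload time instead of rclone's own metadata modtime
rclone copy /local/data s3_aws:data-collector-nd/DF --update --use-server-modtime
```

these are standard rclone flags; check `rclone help flags` for the exact semantics on your version.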
to prevent that, i use the following:
--- MFA for IAM users - without that, rclone, or any app, cannot download
--- SSE-C - without that, rclone, or any app, cannot download
--- session token - again, without that, rclone cannot download.
--- another option, which i have not yet tested: create a policy that requires a specific header.
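that untested option might look something like a bucket policy that denies downloads unless the SSE-C header is present. a sketch only - i have not tested it, and the bucket name is a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyGetWithoutSseCHeader",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "Null": { "s3:x-amz-server-side-encryption-customer-algorithm": "true" }
      }
    }
  ]
}
```

the `Null` condition makes the deny apply whenever the SSE-C algorithm header is absent from the request.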
Thanks for your answer, it is very informative.
Maybe some context will help you understand my situation:
Let's say I have 3 customers that need to upload data that improves my algorithm.
I want to provide them an rclone schedule with the same access key for all of them. This key will allow them to upload data on an ongoing basis, but if someone tries to abuse my keys to see other customers' data, he will be blocked, since he will only be able to list.
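Concretely, the policy I have in mind is roughly along these lines - a sketch, not my exact policy (bucket name taken from the log above). Note that without s3:GetObject, rclone's default modtime check via HEAD fails exactly as in my log:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowUploadAndListOnly",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::data-collector-nd",
        "arn:aws:s3:::data-collector-nd/*"
      ]
    }
  ]
}
```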
Maybe I need to improve my policy, but I just dropped by to check if we can avoid metadata checks.
By the way - I saw the logs report that rclone isn't able to get the modification DateTime, but using Cyberduck with the same key I was able to see it, so we are back to "no need for access to the object".
which modtime do you mean?
did you read the link i shared?
the modtime, as saved by s3, which is the time the file was uploaded.
or
the modtime that rclone saves as metadata, with the actual modtime of the source file.
by default, rclone uses its own modtime, and that is stored as object metadata that rclone has to HEAD to read.
notice the HEAD request on the first run, to get X-Amz-Meta-Mtime.
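you can see that HEAD for yourself. a sketch with placeholder paths - adapt to your remote:

```sh
# dump the http requests rclone makes, including the HEAD
# that fetches X-Amz-Meta-Mtime
rclone copy /local/data s3_aws:data-collector-nd/DF -vv --dump headers

# or inspect the stored metadata directly with the aws cli
aws s3api head-object --bucket data-collector-nd \
    --key "DF/Nano/Test/019/Coast/Coast-Test (1).jpeg"
```

cyberduck shows you the s3 upload time, not the X-Amz-Meta-Mtime metadata, which is why it works with a key that cannot read the object.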