Copy from S3 to another S3, filtered by storage class

Using rclone copy, I know it's possible to use the global flags for filtering, but I couldn't find the right way to filter by storage class.
My source: AWS S3 bucket
Dest: local NetApp S3 bucket

What is the right way to copy from source to dest, but only the files in the STANDARD storage class?
I've tried things like:

--s3-storage-class STANDARD --include "*"

But that's not the right approach, since that flag only applies to newly uploaded objects.

You could do something like this (a full sketch follows the list):

  1. rclone lsf source: --format="pT" | grep "STANDARD" > files.lst
  2. remove the ;STANDARD from files.lst
  3. rclone copy source: dest: --files-from=files.lst
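
For example, here is a rough, untested sketch of that pipeline in one go, assuming source: and dest: are your configured remotes and that no paths contain the ; separator:

# list path;tier recursively, keep only STANDARD objects, strip the trailing ";STANDARD"
rclone lsf source: -R --format="pT" | grep ";STANDARD$" | cut -d';' -f1 > files.lst
# copy just those files (add --dry-run first to check)
rclone copy source: dest: --files-from=files.lst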

You could also use the --metadata flag, which will then put the tier in the metadata.

You can then use metadata filtering to filter on that.

Perhaps something like

rclone copy --metadata --metadata-include "tier=standard" src: dst:

Test first with --dry-run or just use rclone lsf on the source.
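
For example, something along these lines (src: and dst: are placeholders for your remotes, and the exact tier value and its case are whatever your provider reports):

# dry-run the copy to see what would be transferred
rclone copy --metadata --metadata-include "tier=standard" src: dst: --dry-run -vv
# or just list which source objects the filter matches
rclone lsf src: --metadata-include "tier=standard" -vv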

I tried your option, and then again with uppercase: --metadata --metadata-include "tier=STANDARD", but neither worked.
At first I got an error that the --metadata flag doesn't exist, so I updated to the latest version. Any other solution you can think of?

Edit:
I just saw that --metadata is a new option. If filtering on tier isn't working, should we open another thread for a bug?

It's a nice workaround, but I'll keep it as my last option. Better to have a clean solution.

Yeah, using the new metadata feature is a cleaner solution.
Time to study up on metadata...

Post a full rclone debug log.

It would list my company's entire file list, so I can't do that.
But it keeps returning Excluded for every file, for example:

2023/04/27 16:12:38 DEBUG : rclone: Version "v1.62.2" starting with parameters ["rclone" "copy" "s3_aws:idan-glacier-test/" "/home/xdavidt/Downloads/temp" "--config=/home/xdavidt/.rclone.conf" "--metadata" "--metadata-include" "tier=GLACIER" "--dry-run" "--log-level" "DEBUG"]
2023/04/27 16:12:38 DEBUG : Creating backend with remote "s3_aws:idan-glacier-test/"
2023/04/27 16:12:38 DEBUG : Using config file from "/home/xdavidt/.rclone.conf"
2023/04/27 16:12:38 DEBUG : fs cache: renaming cache item "s3_aws:idan-glacier-test/" to be canonical "s3_aws:idan-glacier-test"
2023/04/27 16:12:38 DEBUG : Creating backend with remote "/home/xdavidt/Downloads/temp"
2023/04/27 16:12:38 NOTICE: S3 bucket idan-glacier-test: Switched region to "us-east-1" from "eu-west-1"
2023/04/27 16:12:38 DEBUG : pacer: low level retry 1/2 (error BucketRegionError: incorrect region, the bucket is not in 'eu-west-1' region at endpoint '', bucket is in 'us-east-1' region
	status code: 301, request id: 0B2M3ZHT7SJXE33D, host id: Ney9N3TQOnRET7cFD+Ru7aXYdO6CT67cs0Qa8yA/Y9+2YRjRmrTlO0dtu+c/YSjYFHu1EDV//nU=)
2023/04/27 16:12:38 DEBUG : pacer: Rate limited, increasing sleep to 10ms
2023/04/27 16:12:39 DEBUG : pacer: Reducing sleep to 0s
2023/04/27 16:14:30 DEBUG : path/file.png: Excluded

That needs fixing.

That's only a region error; rclone ignores it and fixes the region automatically. Here is the fixed version:

2023/04/27 16:26:28 DEBUG : rclone: Version "v1.62.2" starting with parameters ["rclone" "copy" "--metadata" "--metadata-include" "tier=standard" "s3_aws_idan:idan-glacier-test/" "/home/xdavidt/Downloads/temp" "--config=/home/xdavidt/.rclone.conf" "--dry-run" "--log-level" "DEBUG"]
2023/04/27 16:26:28 DEBUG : Creating backend with remote "s3_aws_idan:idan-glacier-test/"
2023/04/27 16:26:28 DEBUG : Using config file from "/home/xdavidt/.rclone.conf"
2023/04/27 16:26:28 DEBUG : fs cache: renaming cache item "s3_aws_idan:idan-glacier-test/" to be canonical "s3_aws_idan:idan-glacier-test"
2023/04/27 16:26:28 DEBUG : Creating backend with remote "/home/xdavidt/Downloads/temp"

rclone lsf remote: --format=pT
books.20200409.145236.7z;DEEP_ARCHIVE
books.20200921.180322.7z;DEEP_ARCHIVE
books.20210224.192949.7z;DEEP_ARCHIVE
books.20220321.185251.7z;DEEP_ARCHIVE
file.ext;STANDARD

rclone lsf remote: --metadata-include tier=DEEP_ARCHIVE -vv
2023/04/27 09:45:44 DEBUG : rclone: Version "v1.62.2" starting with parameters ["rclone" "lsf" "remote:" "--metadata-include" "tier=DEEP_ARCHIVE" "-vv"]
2023/04/27 09:45:44 DEBUG : Creating backend with remote "remote:"
2023/04/27 09:45:44 DEBUG : Using config file from "/home/user01/.config/rclone/rclone.conf"
books.20200409.145236.7z
books.20200921.180322.7z
books.20210224.192949.7z
books.20220321.185251.7z

I think the problem here is that the tier metadata isn't actually being set, because the storage class on a STANDARD object isn't actually set.

Try this with --metadata --metadata-include "tier=STANDARD"

v1.63.0-beta.6985.b7f62e96c.fix-s3-metadata-tier on branch fix-s3-metadata-tier (uploaded in 15-30 mins)

Looks like the beta worked:

./rclone lsf aws01:vserver03.en07.rcloner/en07.rcloner/rclone/backup/books/zip --format=pT
books.20200409.145236.7z;DEEP_ARCHIVE
books.20200921.180322.7z;DEEP_ARCHIVE
books.20210224.192949.7z;DEEP_ARCHIVE
books.20220321.185251.7z;DEEP_ARCHIVE
file.ext;STANDARD

./rclone lsf aws01:vserver03.en07.rcloner/en07.rcloner/rclone/backup/books/zip --metadata-include tier=STANDARD -vv
2023/04/27 12:44:10 DEBUG : rclone: Version "v1.63.0-beta.6985.b7f62e96c.fix-s3-metadata-tier" starting with parameters ["./rclone" "lsf" "aws01:vserver03.en07.rcloner/en07.rcloner/rclone/backup/books/zip" "--metadata-include" "tier=STANDARD" "-vv"]
2023/04/27 12:44:10 DEBUG : Creating backend with remote "aws01:vserver03.en07.rcloner/en07.rcloner/rclone/backup/books/zip"
2023/04/27 12:44:10 DEBUG : Using config file from "/home/user01/.config/rclone/rclone.conf"
2023/04/27 12:44:10 DEBUG : books.20200409.145236.7z: Excluded
2023/04/27 12:44:10 DEBUG : books.20200921.180322.7z: Excluded
2023/04/27 12:44:10 DEBUG : books.20210224.192949.7z: Excluded
2023/04/27 12:44:10 DEBUG : books.20220321.185251.7z: Excluded
file.ext
2023/04/27 12:44:10 DEBUG : 4 go routines active

./rclone lsf aws01:vserver03.en07.rcloner/en07.rcloner/rclone/backup/books/zip --metadata-include tier=DEEP_ARCHIVE -vv
2023/04/27 12:44:11 DEBUG : rclone: Version "v1.63.0-beta.6985.b7f62e96c.fix-s3-metadata-tier" starting with parameters ["./rclone" "lsf" "aws01:vserver03.en07.rcloner/en07.rcloner/rclone/backup/books/zip" "--metadata-include" "tier=DEEP_ARCHIVE" "-vv"]
2023/04/27 12:44:11 DEBUG : Creating backend with remote "aws01:vserver03.en07.rcloner/en07.rcloner/rclone/backup/books/zip"
2023/04/27 12:44:11 DEBUG : Using config file from "/home/user01/.config/rclone/rclone.conf"
books.20200409.145236.7z
books.20200921.180322.7z
books.20210224.192949.7z
books.20220321.185251.7z
2023/04/27 12:44:11 DEBUG : file.ext: Excluded
2023/04/27 12:44:11 DEBUG : 4 go routines active

Thanks for testing :slight_smile: I've merged this to master now, which means it will be in the latest beta in 15-30 minutes and released in v1.63.

Well, it looks better in the beta!
I'm not sure what the benefit of --metadata in addition to --metadata-include is?

I still have some errors, and I'm not sure whether I need to ignore them or not (they weren't shown on the stable version, only on the beta):

xdavidt@David-Linux:~$ rclone copy s3_aws_idan:idan-glacier-test/path/ ~/Downloads/temp/ --config ~/.rclone.conf --metadata-include tier=STANDARD --dry-run 
2023/04/30 09:24:02 ERROR : : Entry doesn't belong in directory "" (same as directory) - ignoring
2023/04/30 09:24:10 NOTICE: file1.csv: Skipped copy as --dry-run is set (size 157)
2023/04/30 09:24:10 NOTICE: file2.csv: Skipped copy as --dry-run is set (size 2.149Ki)
2023/04/30 09:24:10 NOTICE: Description.json: Skipped copy as --dry-run is set (size 19.031Ki)
2023/04/30 09:24:10 NOTICE: Recommendations.csv: Skipped copy as --dry-run is set (size 157)
2023/04/30 09:24:10 ERROR : Files: Entry doesn't belong in directory "Files" (too short) - ignoring
2023/04/30 09:24:10 ERROR : Jobs: Entry doesn't belong in directory "Jobs" (too short) - ignoring
2023/04/30 09:24:10 ERROR : Conf: Entry doesn't belong in directory "Conf" (too short) - ignoring

Good

You need the --metadata flag to turn on the metadata feature. --metadata-include could do this too I guess, but it doesn't!

Not sure what these are - maybe directory markers? You can probably ignore those if so.
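
So for the original use case (AWS S3 source to a local NetApp S3 bucket), the full command would look roughly like this, with s3_aws:bucket and netapp:bucket as placeholder remote/bucket names and a build that includes the tier fix (v1.63 or a recent beta):

# copy only STANDARD-class objects; drop --dry-run once the listing looks right
rclone copy --metadata --metadata-include "tier=STANDARD" s3_aws:bucket netapp:bucket --dry-run -vv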

Is there any option to enable this filter on the backend?
I have a bucket in which 10% of the files are in the Standard class and the other 90% are in Glacier, which makes this copy take much longer.
If we had a backend flag for S3, we would get a list of only the Standard files.

We do get the StorageClass in the listings, so it would be possible to avoid the HEAD requests if this were a backend flag. That would certainly make it a lot quicker!

@David_Tayar this would need implementing though. Is this something you'd like to do? Or sponsor maybe?
