Using rclone copy, I know it's possible to use the global flags for filtering, but I couldn't find the right way to filter by storage class.
My source: AWS S3 bucket
Dest: local NetApp S3 bucket
What is the right way to copy from source to dest, but only the files in the STANDARD storage class?
I've tried things like:
--s3-storage-class STANDARD --include "*"
But that's not the right way, since this flag only applies to newly written objects.
asdffdsa
(jojothehumanmonkey)
April 27, 2023, 11:35am
2
could do something like this
rclone lsf source: --format="pT" | grep "STANDARD" > files.lst
remove the ;STANDARD suffix from files.lst
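for instance, assuming GNU sed is available, this strips the suffix in place:
# strip the ";STANDARD" suffix, leaving just the paths
sed -i 's/;STANDARD$//' files.lst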
rclone copy source: dest: --files-from=files.lst
ncw
(Nick Craig-Wood)
April 27, 2023, 11:53am
3
You could also use the --metadata flag which will then put the tier in the metadata.
You can then use metadata filtering to filter on that.
Perhaps something like
rclone copy --metadata --metadata-include "tier=standard" src: dst:
Test first with --dry-run or just use rclone lsf on the source.
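If you want to check what the tier metadata actually looks like first, rclone lsjson with metadata enabled should show it (the remote and path here are just placeholders):
# -M / --metadata adds the object metadata (tier, mtime, content-type, ...) to the JSON output
rclone lsjson -M --files-only src:bucket/path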
I tried your option, and after that with uppercase: --metadata --metadata-include "tier=STANDARD",
but neither worked.
At first I got an error that there is no flag metadata,
so I updated to the latest version. Any other solution you can think of?
Edit:
Just saw that --metadata is a new option. If the filtering is not working with tier, do we need another thread for a bug?
It's a nice workaround, but I'll keep it as my last option. Better to have a clean solution.
asdffdsa
(jojothehumanmonkey)
April 27, 2023, 1:08pm
6
yeah, using the new metadata feature is a cleaner solution.
time to study up on metadata.....
asdffdsa
(jojothehumanmonkey)
April 27, 2023, 1:09pm
7
post a full rclone debug log.
That would list all my company's files, so I can't do that.
But it keeps returning Excluded for every file, for example:
2023/04/27 16:12:38 DEBUG : rclone: Version "v1.62.2" starting with parameters ["rclone" "copy" "s3_aws:idan-glacier-test/" "/home/xdavidt/Downloads/temp" "--config=/home/xdavidt/.rclone.conf" "--metadata" "--metadata-include" "tier=GLACIER" "--dry-run" "--log-level" "DEBUG"]
2023/04/27 16:12:38 DEBUG : Creating backend with remote "s3_aws:idan-glacier-test/"
2023/04/27 16:12:38 DEBUG : Using config file from "/home/xdavidt/.rclone.conf"
2023/04/27 16:12:38 DEBUG : fs cache: renaming cache item "s3_aws:idan-glacier-test/" to be canonical "s3_aws:idan-glacier-test"
2023/04/27 16:12:38 DEBUG : Creating backend with remote "/home/xdavidt/Downloads/temp"
2023/04/27 16:12:38 NOTICE: S3 bucket idan-glacier-test: Switched region to "us-east-1" from "eu-west-1"
2023/04/27 16:12:38 DEBUG : pacer: low level retry 1/2 (error BucketRegionError: incorrect region, the bucket is not in 'eu-west-1' region at endpoint '', bucket is in 'us-east-1' region
status code: 301, request id: 0B2M3ZHT7SJXE33D, host id: Ney9N3TQOnRET7cFD+Ru7aXYdO6CT67cs0Qa8yA/Y9+2YRjRmrTlO0dtu+c/YSjYFHu1EDV//nU=)
2023/04/27 16:12:38 DEBUG : pacer: Rate limited, increasing sleep to 10ms
2023/04/27 16:12:39 DEBUG : pacer: Reducing sleep to 0s
2023/04/27 16:14:30 DEBUG : path/file.png: Excluded
That's only a region error; rclone ignores and fixes it automatically. Here is the fixed version:
2023/04/27 16:26:28 DEBUG : rclone: Version "v1.62.2" starting with parameters ["rclone" "copy" "--metadata" "--metadata-include" "tier=standard" "s3_aws_idan:idan-glacier-test/" "/home/xdavidt/Downloads/temp" "--config=/home/xdavidt/.rclone.conf" "--dry-run" "--log-level" "DEBUG"]
2023/04/27 16:26:28 DEBUG : Creating backend with remote "s3_aws_idan:idan-glacier-test/"
2023/04/27 16:26:28 DEBUG : Using config file from "/home/xdavidt/.rclone.conf"
2023/04/27 16:26:28 DEBUG : fs cache: renaming cache item "s3_aws_idan:idan-glacier-test/" to be canonical "s3_aws_idan:idan-glacier-test"
2023/04/27 16:26:28 DEBUG : Creating backend with remote "/home/xdavidt/Downloads/temp"
asdffdsa
(jojothehumanmonkey)
April 27, 2023, 1:36pm
11
rclone lsf remote: --format=pT
books.20200409.145236.7z;DEEP_ARCHIVE
books.20200921.180322.7z;DEEP_ARCHIVE
books.20210224.192949.7z;DEEP_ARCHIVE
books.20220321.185251.7z;DEEP_ARCHIVE
file.ext;STANDARD
rclone lsf remote: --metadata-include tier=DEEP_ARCHIVE -vv
2023/04/27 09:45:44 DEBUG : rclone: Version "v1.62.2" starting with parameters ["rclone" "lsf" "remote:" "--metadata-include" "tier=DEEP_ARCHIVE" "-vv"]
2023/04/27 09:45:44 DEBUG : Creating backend with remote "remote:"
2023/04/27 09:45:44 DEBUG : Using config file from "/home/user01/.config/rclone/rclone.conf"
books.20200409.145236.7z
books.20200921.180322.7z
books.20210224.192949.7z
books.20220321.185251.7z
ncw
(Nick Craig-Wood)
April 27, 2023, 2:32pm
12
David_Tayar:
I tried your option, and after that with uppercase: --metadata --metadata-include "tier=STANDARD",
but neither worked.
At first I got an error that there is no flag metadata,
so I updated to the latest version. Any other solution you can think of?
I think the problem here is that the metadata isn't actually being set, because the storage class on a STANDARD object is not set by S3.
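You can see this with the AWS CLI if you're curious - a head-object on a STANDARD object doesn't return a storage class header at all (the bucket and key below are just the ones from your log):
# S3 omits the x-amz-storage-class header for STANDARD objects,
# so there was no tier for rclone to put in the metadata
aws s3api head-object --bucket idan-glacier-test --key path/file.png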
Try this with --metadata --metadata-include "tier=STANDARD"
v1.63.0-beta.6985.b7f62e96c.fix-s3-metadata-tier on branch fix-s3-metadata-tier (uploaded in 15-30 mins)
asdffdsa
(jojothehumanmonkey)
April 27, 2023, 4:45pm
13
looks like the beta worked
./rclone lsf aws01:vserver03.en07.rcloner/en07.rcloner/rclone/backup/books/zip --format=pT
books.20200409.145236.7z;DEEP_ARCHIVE
books.20200921.180322.7z;DEEP_ARCHIVE
books.20210224.192949.7z;DEEP_ARCHIVE
books.20220321.185251.7z;DEEP_ARCHIVE
file.ext;STANDARD
./rclone lsf aws01:vserver03.en07.rcloner/en07.rcloner/rclone/backup/books/zip --metadata-include tier=STANDARD -vv
2023/04/27 12:44:10 DEBUG : rclone: Version "v1.63.0-beta.6985.b7f62e96c.fix-s3-metadata-tier" starting with parameters ["./rclone" "lsf" "aws01:vserver03.en07.rcloner/en07.rcloner/rclone/backup/books/zip" "--metadata-include" "tier=STANDARD" "-vv"]
2023/04/27 12:44:10 DEBUG : Creating backend with remote "aws01:vserver03.en07.rcloner/en07.rcloner/rclone/backup/books/zip"
2023/04/27 12:44:10 DEBUG : Using config file from "/home/user01/.config/rclone/rclone.conf"
2023/04/27 12:44:10 DEBUG : books.20200409.145236.7z: Excluded
2023/04/27 12:44:10 DEBUG : books.20200921.180322.7z: Excluded
2023/04/27 12:44:10 DEBUG : books.20210224.192949.7z: Excluded
2023/04/27 12:44:10 DEBUG : books.20220321.185251.7z: Excluded
file.ext
2023/04/27 12:44:10 DEBUG : 4 go routines active
./rclone lsf aws01:vserver03.en07.rcloner/en07.rcloner/rclone/backup/books/zip --metadata-include tier=DEEP_ARCHIVE -vv
2023/04/27 12:44:11 DEBUG : rclone: Version "v1.63.0-beta.6985.b7f62e96c.fix-s3-metadata-tier" starting with parameters ["./rclone" "lsf" "aws01:vserver03.en07.rcloner/en07.rcloner/rclone/backup/books/zip" "--metadata-include" "tier=DEEP_ARCHIVE" "-vv"]
2023/04/27 12:44:11 DEBUG : Creating backend with remote "aws01:vserver03.en07.rcloner/en07.rcloner/rclone/backup/books/zip"
2023/04/27 12:44:11 DEBUG : Using config file from "/home/user01/.config/rclone/rclone.conf"
books.20200409.145236.7z
books.20200921.180322.7z
books.20210224.192949.7z
books.20220321.185251.7z
2023/04/27 12:44:11 DEBUG : file.ext: Excluded
2023/04/27 12:44:11 DEBUG : 4 go routines active
ncw
(Nick Craig-Wood)
April 28, 2023, 1:33pm
14
Thanks for testing. I've merged this to master now, which means it will be in the latest beta in 15-30 minutes and released in v1.63.
Well, it looks better in the beta!
Not sure what the benefit of --metadata before --metadata-include is?
I still have some errors; not sure if I need to ignore them or not (they weren't shown on the normal version, only on the beta):
xdavidt@David-Linux:~$ rclone copy s3_aws_idan:idan-glacier-test/path/ ~/Downloads/temp/ --config ~/.rclone.conf --metadata-include tier=STANDARD --dry-run
2023/04/30 09:24:02 ERROR : : Entry doesn't belong in directory "" (same as directory) - ignoring
2023/04/30 09:24:10 NOTICE: file1.csv: Skipped copy as --dry-run is set (size 157)
2023/04/30 09:24:10 NOTICE: file2.csv: Skipped copy as --dry-run is set (size 2.149Ki)
2023/04/30 09:24:10 NOTICE: Description.json: Skipped copy as --dry-run is set (size 19.031Ki)
2023/04/30 09:24:10 NOTICE: Recommendations.csv: Skipped copy as --dry-run is set (size 157)
2023/04/30 09:24:10 ERROR : Files: Entry doesn't belong in directory "Files" (too short) - ignoring
2023/04/30 09:24:10 ERROR : Jobs: Entry doesn't belong in directory "Jobs" (too short) - ignoring
2023/04/30 09:24:10 ERROR : Conf: Entry doesn't belong in directory "Conf" (too short) - ignoring
ncw
(Nick Craig-Wood)
April 30, 2023, 4:22pm
16
Good
You need the --metadata flag to turn on the metadata feature. --metadata-include could do this too I guess, but it doesn't!
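So in practice you want both flags together, something like this (using the same paths from your run):
# --metadata turns the feature on; --metadata-include then filters on the tier value
rclone copy --metadata --metadata-include "tier=STANDARD" s3_aws_idan:idan-glacier-test/path/ ~/Downloads/temp/ --dry-run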
As for the errors, I'm not sure what those are - maybe directory markers? You can probably ignore them if so.
Is there any option to apply this filter on the backend?
I have a bucket in which 10% of the files are in the Standard class and all the others (90%) are in Glacier, which makes this copy take much longer.
If we had a backend flag for S3, we would get a list of only the Standard files.
ncw
(Nick Craig-Wood)
May 8, 2023, 4:05pm
18
We do get the StorageClass in the listings, so it would be possible to avoid the HEAD requests if this was a backend flag. That would certainly make it a lot quicker!
@David_Tayar this would need implementing though. Is this something you'd like to do? Or sponsor maybe?
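In the meantime, the lsf workaround from earlier in the thread should avoid the per-object HEAD requests, since the tier shown by --format=pT comes from the bucket listing rather than from metadata - roughly:
# the tier comes from the listing itself, so no HEAD per object is needed
rclone lsf src: --format="pT" --files-only -R | grep ";STANDARD$" | sed 's/;STANDARD$//' > files.lst
rclone copy src: dst: --files-from=files.lst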
system
(system)
Closed
July 7, 2023, 4:06pm
19
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.