I would like to copy the contents of a bucket to a local filesystem.
I have two folders that the data needs to end up in.
Archive is read-only once data is in there, but InUse gets data deleted from it quickly.
I need the data to end up in both, so I thought I'd run copy to InUse with --compare-dest=Archive,
then copy to Archive. But what guarantees that the second run won't find new files on S3 that were not there during the first run? Those files would land in Archive only, and InUse would not get them in the next run either, because by then they are already in Archive and --compare-dest skips them. (Unfortunately, the --min-age parameter only accepts a relative duration like 5 seconds, not a Unix timestamp that could be passed to both commands.)
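A possible workaround, sketched under assumptions: pin one absolute Unix timestamp up front, then convert it into the relative duration that --min-age expects just before each command runs, so both runs select roughly the same set of files. The helper min_age_since and the variable CUTOFF are hypothetical names. The cutoff is only reproduced to within a second or so, and the trick only holds if the modification times rclone sees on S3 reflect when objects were uploaded; if the uploader preserves older mtimes, a file uploaded between the two commands can still end up in Archive only. The rclone calls are left as comments because they need a configured remote.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert an absolute Unix timestamp into the
# relative duration that --min-age expects, measured from "now".
min_age_since() {
  local cutoff=$1   # seconds since the epoch
  local now
  now=$(date +%s)
  echo "$(( now - cutoff ))s"
}

# Pin the cutoff once and reuse it for both commands, so both runs
# (approximately) select files last modified before the same instant.
CUTOFF=$(date +%s)
# rclone copy bucket: InUse --compare-dest Archive --min-age "$(min_age_since "$CUTOFF")"
# rclone copy bucket: Archive --min-age "$(min_age_since "$CUTOFF")"
```

Files modified after CUTOFF are skipped by both runs and picked up by the next cycle, which pins a fresh cutoff.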
How should I set this up?
Is my best option really to lsd the source bucket and the destination Archive folder, do a diff,
and then do two copy operations based on the resulting file list, first InUse, then Archive?
Thanks in advance
Hello and welcome to the forum.
You could use rclone mount and then use Unix commands.
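To make the mount suggestion concrete, here is a minimal sketch assuming the bucket is already mounted (for example with rclone mount bucket: /mnt/bucket) and that Archive and InUse are ordinary local paths; copy_new_files and all paths are hypothetical. It walks the mounted tree and, for each file not yet in Archive, copies it to InUse and then to Archive before moving on, so a file either reaches both folders or is left for the next run. The one gap is an interruption between the two cp calls, in which case the file is copied to InUse again on the next run.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: SRC is a mounted bucket, ARCHIVE and INUSE are
# local folders. Files already present in ARCHIVE are skipped.
copy_new_files() {
  local src=$1 archive=$2 inuse=$3
  local rel
  # NUL-separated paths so unusual file names survive the pipe.
  ( cd "$src" && find . -type f -print0 ) |
  while IFS= read -r -d '' rel; do
    rel=${rel#./}                       # strip the leading "./"
    [ -e "$archive/$rel" ] && continue  # already archived: skip it
    mkdir -p "$(dirname "$inuse/$rel")" "$(dirname "$archive/$rel")"
    cp -p "$src/$rel" "$inuse/$rel"     # InUse first...
    cp -p "$src/$rel" "$archive/$rel"   # ...then Archive, same file
  done
}
```

Usage would look like copy_new_files /mnt/bucket /data/Archive /data/InUse, with the mount kept running in the background.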
I'm not sure I understand exactly what you are trying to achieve. How does InUse get stuff removed from it?
If you don't want two copies then sync with --backup-dir would do the job.
It would be a relatively easy change to make --min-age (and --max-age) take a timestamp too...
Well, customers use InUse: they take the data and delete each file once it has been copied over, in order to track the progress of the copy. (Don't ask me why they can't compare against their own folder etc.)
I hoped to find an easy way to copy data from S3 into both folders, using Archive to detect what needs to end up in InUse.
Both folders get the same data, but since files get deleted from InUse, I need to compare what has been downloaded so far against another folder, such as Archive.
Without the timestamp feature, there's no way to "atomically" copy to InUse based on both the source and the Archive folder, and then copy those same files into Archive as well.
If you want two copies, you'll have to run two rclone commands... There isn't anything which will duplicate a file at the moment.
Maybe you should do something like this:
rclone lsf --files-only -R Archive | sort > before
rclone sync bucket: Archive
rclone lsf --files-only -R Archive | sort > after
comm -13 before after > new-files # keep only lines unique to "after", i.e. files new in Archive
rclone copy --files-from new-files Archive InUse # no need to copy from bucket here
This makes Archive a complete copy of the bucket, and InUse receives only the files that were newly created in Archive.
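The diff step can be isolated into a small helper that is easy to test on its own (new_files is a hypothetical name). It uses comm -13 so that only paths present in the after listing are printed; comm -3 would additionally print paths that exist only in before (files the sync removed from Archive) and would indent the after-only paths with a tab. Both sort and comm run under LC_ALL=C so they agree on the ordering comm requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print paths that appear in the "after" listing
# but not in the "before" listing.
new_files() {
  local before=$1 after=$2
  # -1 hides lines only in "before" (files the sync deleted),
  # -3 hides lines common to both, leaving only the new paths.
  LC_ALL=C comm -13 <(LC_ALL=C sort "$before") <(LC_ALL=C sort "$after")
}
```

With the listings from the pipeline, new_files before after > new-files would be followed by the same rclone copy --files-from new-files Archive InUse.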
Yes, that's what I thought as well.
I think what makes this a bit harder is that Archive also gets its files expired after some days, but I can handle that with a --max-age trick.
lsf is extremely slow compared to regular ls. On a bucket with ~50k objects, ls takes 40 seconds, while lsf takes about 10 minutes.
That is strange; it should be the same speed, as it is pretty much the same code.
Can you post your rclone command line and a copy of your config (with secrets XXX-ed out)?
type = s3
provider = aws
env_auth = false
access_key_id = xx
secret_access_key = xx
region = eu-west-1
rclone ls aws:bucket/Archive
vs
rclone lsf aws:bucket/Archive
--fast-list doesn't make much difference either. It might be that the formatting is slow somehow inside rclone, not the listing itself.
In fact, it's even slower than I thought. It's infeasible to work with lsf at this speed.
Hmm, a bit of digging into that shows that
lsf is reading the MimeType for each object even when the user didn't ask for it, and that takes an extra transaction per object on the S3 backend.
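A rough back-of-envelope check using the numbers reported earlier (about 50k objects, 40 s for ls, about 10 minutes for lsf), assuming the whole difference comes from one extra request per object issued one after another:

$$
\frac{600\,\text{s} - 40\,\text{s}}{50\,000\ \text{objects}} = \frac{560\,\text{s}}{50\,000} \approx 0.011\,\text{s} \approx 11\,\text{ms per object}
$$

That per-object cost would be consistent with one additional sequential round trip to S3 per object; the actual latency depends on the network path to the bucket's region.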
I attempted to fix that here; can you have a go? This should make
lsf the same speed as
ls (provided you don't ask for the mime type).
https://beta.rclone.org/branch/v1.50.2-194-gbfd9f321-fix-ls-mime-type-beta/ (uploaded in 15-30 mins)
Wow man, that did the trick. One extra call per object, I guess...
When will this tweak be released?
I've merged this to master now, which means it will be in the latest beta in 15-30 mins and released in v1.51.
v1.51 will probably be released next weekend...
That's great. Thanks a lot for the help and the tweaking!