I would like to copy the contents of a bucket to a local filesystem.
I have two folders that the data needs to end up in:
/Archive
/InUse
Archive is read-only once data is in there, but InUse gets data deleted from it quickly.
I need the data to end up in both, so I thought I'd run a copy to InUse with --compare-dest=Archive, then a copy to Archive.
But what guarantees that the second run won't find additional files on S3, which would land in Archive and therefore be skipped for InUse on the next run? (Unfortunately the min-age parameter only accepts relative durations like 5 seconds, not a Unix timestamp that could be passed to both commands.)
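For reference, the two commands I had in mind look roughly like this (the remote and path names are placeholders):

# 1) copy only what Archive doesn't already have into InUse
rclone copy s3:bucket /InUse --compare-dest /Archive
# 2) then mirror the bucket into Archive; anything uploaded between
#    these two runs ends up in Archive without ever reaching InUse
rclone copy s3:bucket /Archive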
How should I set this up?
Is my best option really to list the source bucket and the destination Archive folder, do a diff,
and then do two copy operations based on those files, first to InUse, then to Archive?
Well, InUse is used by customers to take the data; they delete files once the data has been copied over, in order to track the progress of the copy. (Don't ask me why they can't compare against their own folder instead.)
I hoped to find an easy way to copy data from S3 into both folders, while detecting, based on Archive, what still needs to end up in InUse.
Both folders get the same data, but since InUse has data deleted from it, I need to compare what has been downloaded so far against another folder, namely Archive.
Without the timestamp feature there's no way to "atomically" copy to InUse based on both the source and the Archive folder, and then copy those same files into Archive as well.
If you want two copies, you'll have to run two rclone commands... There isn't anything which will duplicate a file at the moment.
Maybe you should do something like this:
rclone lsf --files-only -R Archive | sort > before
rclone sync bucket: Archive
rclone lsf --files-only -R Archive | sort > after
comm -13 before after > new-files # only the lines unique to "after", i.e. files the sync just added
rclone copy --files-from new-files Archive InUse # no need to copy from bucket here
This makes Archive a complete copy of the bucket, and the files copied to InUse are exactly those newly created in Archive.
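A minimal sketch of that sequence as one script, assuming a local /Archive and /InUse and an s3:bucket remote (all names are placeholders):

#!/usr/bin/env bash
set -euo pipefail

rclone lsf --files-only -R /Archive | sort > before
rclone sync s3:bucket /Archive
rclone lsf --files-only -R /Archive | sort > after

# files present only in "after" are the ones the sync just added
comm -13 before after > new-files

# copy just those files locally from Archive into InUse
if [ -s new-files ]; then
    rclone copy --files-from new-files /Archive /InUse
fi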
Yes, that's what I thought as well.
I think what makes this a bit harder is that Archive also gets its files expired after some days, but I can do a max-age trick for that.
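What I mean by the max-age trick is roughly this, assuming a 7-day expiry window (the remote name and the duration are just examples):

# only consider bucket objects newer than the expiry window,
# so files already expired from Archive aren't re-downloaded
rclone sync s3:bucket /Archive --max-age 7d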
Unfortunately lsf is extremely slow compared to a regular ls. On a bucket with ~50k objects, ls takes 40 seconds while lsf takes about 10 minutes.
Hmm, a bit of digging into that shows that lsf is reading the MimeType for each object even when the user didn't ask for it - that costs an extra transaction on an S3 backend.
I attempted to fix that here - can you have a go? This should make lsf the same speed as ls (provided you don't ask for the MIME type).