Can I download from S3 using a file list that spans MANY object folders in a bucket?

Hey everyone! I’m running into a problem that I just can’t get around. Truth in advertising, I’m brand new to rclone, so this could just be a new-user mistake.

I’m trying to use rclone with S3. I have access to a bucket where content is just synced from another location. To be brief, here is how the bucket is laid out (in essence):

bucket/folder1/./././
bucket/folder2/./././

Rinse and repeat that type of folder for ~40k folders. I pulled a text list of the full S3 URIs for the ~2,500 files I care about.

Here’s where my confusion comes in. I built a Python script to iterate over the text file, and it downloads everything successfully, but one file at a time. Is there a way with rclone to parallelize that?

I hope this makes sense. I’ll gladly clarify! Thank you!

hello and welcome to the forum,

yes, rclone can read the list of files from a text file:
https://rclone.org/filtering/#files-from-read-list-of-source-file-names
https://rclone.org/filtering/#files-from-raw-read-list-of-source-file-names-without-any-processing

keep in mind that in that text file, the file locations are relative to the root of the rclone remote you create.
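
for example, a minimal sketch, assuming a remote named s3remote and copying from s3remote:bucket (the remote name, paths, and list file name are made up):

    rclone copy s3remote:bucket /local/dest --files-from files.txt

where files.txt contains one bucket-relative path per line, e.g.

    folder1/subfolder/file1.dat
    folder2/subfolder/file2.dat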

I just wanted to say thank you! I guess my lack of patience was the failure... I did a dry run and it took 7 minutes 19 seconds to even start. I guess sifting through a 500TB bucket takes a hot minute.

imho, take a hot minute to read the documentation.
rclone calculates the md5 checksum of each file before upload.
https://rclone.org/s3/#hashes

and tweak --checkers, --transfers, and the multipart upload settings: https://rclone.org/s3/#multipart-uploads
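
for example (the values below are illustrative starting points, not tuned recommendations):

    rclone copy s3remote:bucket /local/dest --files-from files.txt --transfers 32 --checkers 32 --progress

--transfers sets how many files download in parallel, and --checkers sets how many files are compared at once.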

Also check out the S3 performance tuning and cost-reduction tips here:

https://rclone.org/s3/#reducing-costs
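
Since your file list is tiny compared to the bucket, --no-traverse may also be worth a try; it tells rclone to check the listed files individually instead of listing the destination, which can speed up startup when transferring a small number of files. A sketch, reusing the hypothetical remote from above:

    rclone copy s3remote:bucket /local/dest --files-from files.txt --no-traverse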

Thanks for this information, it also helped me.
