Can I download from S3 using a file list that spans MANY object folders in a bucket?

Hey everyone! I’m running into a problem that I just can’t get around. Truth in advertising, I’m brand new to rclone, so this could just be a new-user mistake.

I’m trying to use rclone with S3. I have access to a bucket where content is just synced from another location. To be brief, here is how the bucket is laid out (in essence):

bucket/folder1/./././
bucket/folder2/./././

Rinse and repeat that type of folder for ~40k folders. I pulled a text list of the full S3 URIs for the ~2,500 files I care about.

Here’s where my confusion comes in. I built a Python script to iterate over the text file, and it downloads everything successfully, but one file at a time. Is there a way with rclone to parallelize that?

I hope this makes sense. I’ll gladly clarify! Thank you!

hello and welcome to the forum,

yes, rclone can read the list of files from a text file:
https://rclone.org/filtering/#files-from-read-list-of-source-file-names
https://rclone.org/filtering/#files-from-raw-read-list-of-source-file-names-without-any-processing

keep in mind that in that text file, the file locations are relative to the root of the rclone remote you create.
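
for example, a minimal sketch, assuming a remote named s3remote and copying from s3remote:bucket (the remote name, paths, and list file name are made up):

    rclone copy s3remote:bucket /local/dest --files-from files.txt

where files.txt contains one bucket-relative path per line, e.g.

    folder1/subfolder/file1.dat
    folder2/subfolder/file2.dat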

I just wanted to say thank you! I guess my lack of patience was the failure... I did a dry run and it took 7 minutes 19 seconds to even start. I guess sifting through a 500TB bucket takes a hot minute.

imho, take a hot minute to read the documentation.
rclone calculates the md5 checksum of each file before upload.
https://rclone.org/s3/#hashes

and tweak --checkers, --transfers, and the multipart upload settings: https://rclone.org/s3/#multipart-uploads
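
for example (the values below are illustrative starting points, not tuned recommendations):

    rclone copy s3remote:bucket /local/dest --files-from files.txt --transfers 32 --checkers 32 --progress

--transfers sets how many files download in parallel, and --checkers sets how many files are compared at once.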

Also check out the S3 performance tuning and cost-reduction tips here:

https://rclone.org/s3/#reducing-costs
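
Since your file list is tiny compared to the bucket, --no-traverse may also be worth a try; it tells rclone to check the listed files individually instead of listing the destination, which can speed up startup when transferring a small number of files. A sketch, reusing the hypothetical remote from above:

    rclone copy s3remote:bucket /local/dest --files-from files.txt --no-traverse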

Thanks for this information, it also helped me.
