S3 to tar on local system

What is the problem you are having with rclone?

I need to move hundreds of TB, spread across millions of files, out of S3 to a local storage system. Each group of files needs to be tar'd for archiving purposes. Regular rclone commands work.

The archive storage isn't great at IOPS. Is there a way to go directly from S3 to tar without first going to local storage? That would avoid extra steps and metadata operations on the storage. A tar of each folder in the bucket is the final destination.

I tried using rclone cat but it didn't appear to do what I expected. I'm not sure if I should expect this to work at all, given that rclone runs in parallel. Even if it did buffer whole files in memory, files could be many GB in size and the system wouldn't have enough memory.

rclone cat  s3path:/ | tar -cvf /tmp/example.tar --files-from=-

What is your rclone version (output from rclone version)

v1.56.0-linux-amd64

Which OS you are using and how many bits (eg Windows 7, 64 bit)

CentOS 7.9 64bit

Which cloud storage system are you using? (eg Google Drive)

AWS S3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone cat  s3path:/ | tar -cvf /tmp/example.tar --files-from=-

So what you want to do is tar up the files on S3 and save as a local .tar file.

rclone cat won't do what you want as it will concatenate all the files in the s3 bucket which almost certainly is undesirable.

Your best bet is probably to use tar on an rclone mount.

I'd use these flags

rclone mount s3:bucket /mnt/path --use-server-modtime

The --use-server-modtime flag assumes you didn't upload the files with rclone, so you don't need to preserve their modtimes. If you did upload them with rclone, then remove that flag.

You should then be able to use tar directly on /mnt/path.
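
For example, something along these lines should work (the bucket name, mount point and folder name below are just placeholders):

rclone mount s3:bucket /mnt/path --use-server-modtime --read-only --daemon
tar -cf /tmp/folder1.tar -C /mnt/path folder1
fusermount -u /mnt/path

--read-only is optional, but it is a sensible safety net since you are only reading for the archive, and --daemon just backgrounds the mount.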

PS I did imagine an rclone tar command which would do exactly this, but decided not to implement it since there is no easy way of making an rclone untar...

Thanks, that's an interesting idea. We actually use rclone mounts for a few nightly backups with Bareos.

This would avoid the IOPS, but it would serialize access because tar only reads one file at a time. So it would likely be slower than our rclone --transfers=32 copy s3:path/ /localpath/ followed by a tar off the 7200 RPM drive array.
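
For reference, our current two-step flow looks roughly like this (the folder and archive paths are just placeholders):

rclone --transfers=32 copy s3:path/folder1 /localpath/folder1
tar -cf /archive/folder1.tar -C /localpath folder1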

Thanks for the idea though, we might try it. It's worth testing because we could parallelize across the folders in the bucket. How does rclone mount handle multiple IO requests? Does it do any serializing of requests? Or could it still use --transfers to support parallel access to files from multiple tars running in parallel, each on their own path in the bucket? We have over 100k paths to tar in the bucket, and each will be its own tar.

Very well 🙂

There are various directory and file locks which can serialize things depending on exactly what you are doing. If you are operating on separate paths on the same mount, I think you'll be fine running multiple tars in parallel and should see increased performance.
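
If you want to drive the parallelism from the folder list, something along these lines could be a starting point (the mount point, archive directory and level of parallelism are assumptions you'd want to adjust):

rclone lsf --dirs-only s3:bucket | \
  xargs -P 8 -I{} sh -c 'd="${1%/}"; tar -cf /archive/"$d".tar -C /mnt/path "$d"' _ {}

rclone lsf --dirs-only lists the top-level folders (with a trailing slash, which ${1%/} strips), and xargs -P runs up to 8 tars at a time against the mount.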
