Read ahead on S3

This is related to a post I made a while back.

What is the problem you are having with rclone?

I'm using rclone to mount an S3 bucket on a Linux EC2 instance (to support an app that requires a POSIX mount), specifically to read a large file (hundreds of GB). The app is a video encoder and is probably issuing many relatively small read IOs.

I can get okay performance (approx. 900 Mbps), but this is on a very large instance with a 25 Gbps network interface and many cores. That throughput is only enough to keep about 8 vCPUs busy.

I think I'd like rclone to read ahead more aggressively via multi-part download (I've got 32 vCPUs to use).
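To make "multi-part" read-ahead concrete, here is a minimal Python sketch of the general technique, not rclone's code: a byte range ahead of the reader is split into fixed-size parts, the parts are fetched concurrently, and the results are stitched back together in order. A local file stands in for the S3 object, and the helper names `read_range` and `parallel_read_ahead` are hypothetical; against S3, each `read_range` call would be a ranged GET.

```python
# Sketch of parallel (multi-part) read-ahead. A local file stands in for S3;
# read_range() is a hypothetical stand-in for an S3 ranged GET.
import os
from concurrent.futures import ThreadPoolExecutor


def read_range(path: str, offset: int, length: int) -> bytes:
    """Read up to `length` bytes starting at `offset` (short read at EOF)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def parallel_read_ahead(path: str, offset: int, total: int,
                        part_size: int, workers: int) -> bytes:
    """Fetch [offset, offset + total) as part_size pieces on `workers` threads."""
    end = offset + total
    parts = [(o, min(part_size, end - o)) for o in range(offset, end, part_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so the bytes come back in sequence
        chunks = pool.map(lambda p: read_range(path, *p), parts)
        return b"".join(chunks)
```

Part size and worker count are the two knobs: more workers keep more requests in flight at once, which is roughly how the AWS CLI's multi-part downloads get several Gbps out of a single object.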

When I use AWS EFS (basically a managed NFS server), I can use almost all my vCPUs and sustain several Gbps, so I'd like to match that if possible. I know S3 can go this fast too, since a simple multi-part download with the AWS CLI can sustain several Gbps.
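The figures above can be turned into a rough target. Assuming CPU usage scales linearly with input throughput (an assumption; encoders do not always behave that way), 900 Mbps for 8 busy vCPUs gives the per-vCPU rate, and from that the rate needed to feed all 32:

```python
# Back-of-envelope scaling, assuming busy vCPUs scale linearly with throughput.
observed_mbps = 900      # measured through the rclone mount
busy_vcpus = 8           # vCPUs that 900 Mbps keeps busy
total_vcpus = 32         # vCPUs on the instance
link_gbps = 25           # instance network interface

mbps_per_vcpu = observed_mbps / busy_vcpus            # 112.5 Mbps per vCPU
needed_gbps = mbps_per_vcpu * total_vcpus / 1000      # 3.6 Gbps to feed all vCPUs
observed_mb_per_s = observed_mbps / 8                 # 112.5 MB/s today
link_share = needed_gbps / link_gbps                  # ~0.14 of the NIC

print(f"{mbps_per_vcpu} Mbps per vCPU; target {needed_gbps} Gbps "
      f"({link_share:.0%} of the link); currently {observed_mb_per_s} MB/s")
```

About 3.6 Gbps is in line with the "several Gbps" seen on EFS and well within the 25 Gbps link, so the network itself is not the bottleneck.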

Run the command 'rclone version' and share the full output of the command.

rclone v1.58.1

  • os/version: amazon 2 (64 bit)
  • os/kernel: 4.14.281-212.502.amzn2.x86_64 (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.17.9
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

AWS S3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone mount aws_s3:${bucket} /s3/rclone/${bucket} --no-modtime --read-only --vfs-cache-mode full --vfs-read-ahead 256Mi &

The rclone config contents with secrets removed.

[aws_s3]
type = s3
provider = AWS
env_auth = true
region = ${REGION}
location_constraint = ${REGION}

A log from the command with the -vv flag

Not sure which logs would be useful.

hi,

about that app: to process a video, does it need to download 100% of the file, or only some percentage of it?

No, it streams the data in as it needs it and starts encoding right away. I think it has just a single thread doing IO, probably reading relatively small chunks like 64 KB. I have no idea how to find out, though :)
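If that guess is right, a single reader with one request in flight is bounded by read size divided by the time each read takes. A quick check, assuming 64 KiB reads and strictly serial IO (both assumptions taken from the guess above, not measured):

```python
# Serial-read budget: throughput = read_size / per_read_latency.
# Assumptions: 64 KiB per read and exactly one read in flight at a time.
READ_SIZE = 64 * 1024          # bytes per read (assumed)
observed_bps = 900e6           # 900 Mbps measured through the mount
target_bps = 3.6e9             # 900 Mbps / 8 vCPUs * 32 vCPUs, assuming linear scaling

reads_per_second = observed_bps / 8 / READ_SIZE               # ~1717 reads/s
latency_ms = 1000 / reads_per_second                          # ~0.58 ms per read today
target_latency_ms = latency_ms * observed_bps / target_bps    # ~0.15 ms per read needed
```

So each read currently completes in about 0.58 ms on average; quadrupling throughput with the same access pattern means every read must return in roughly 0.15 ms, which is only realistic when the data has already been read ahead into memory or the cache.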

sometimes, if i need to process a set of files in an rclone mount, i pre-load the files into the vfs file cache first, with something like:
rclone md5sum /s3/rclone/${bucket}/file.ext
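The same warm-up can come from any program that reads the file end to end through the mount: with `--vfs-cache-mode full`, that read is what fills the VFS cache, and the checksum is just a by-product. A minimal Python sketch, where `prewarm` is a hypothetical helper and the 8 MiB chunk size is an arbitrary choice:

```python
import hashlib


def prewarm(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Read `path` sequentially in large chunks and return its MD5 hex digest.

    Reading every byte through the mount is what populates the VFS cache;
    hashing along the way mirrors what `rclone md5sum` on the path does.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Keep reading until read() returns b"" at end of file
        while chunk := f.read(chunk_size):
            md5.update(chunk)
    return md5.hexdigest()
```

Called on the mounted path (e.g. `/s3/rclone/${bucket}/file.ext` with the real bucket substituted), it should return the same digest `rclone md5sum` prints for that file.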

Unfortunately I can't issue on-demand commands to the machine in question due to the control plane architecture. I can issue commands during boot-up (like, mounting the buckets I know I'll need), but once the system is running my only interaction with it is through a REST API. Long story.

Thanks for the suggestion, though :)

ok, understood.

rclone rcd: Run rclone listening to remote control commands only.

Remote control: rclone implements a simple HTTP-based protocol.
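Calling that HTTP API needs nothing rclone-specific on the client side: each command is a POST to `http://<host>:5572/<command>` (5572 is the default rc port) with the parameters as a JSON body. A minimal Python sketch that only builds the request; `rc_request` is a hypothetical helper, and it assumes the mount was started with `--rc` (or `rclone rcd` is running) and that the control plane can reach that port:

```python
import json
import urllib.request


def rc_request(command: str, params: dict,
               addr: str = "http://localhost:5572") -> urllib.request.Request:
    """Build (but do not send) a POST for rclone's remote control API.

    The command name becomes the URL path and the parameters are sent
    as a JSON object with a JSON content type.
    """
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        f"{addr}/{command}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending `urllib.request.urlopen(rc_request("rc/noop", {"ping": 1}))` should echo the parameters back, which is a cheap way to confirm the rc server is reachable; `core/stats` returns transfer statistics.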

edit: i just realized that you were the one who started that other topic.
i should have read that topic in more detail before i posted.

I had a sponsorship deal to implement this, but unfortunately it fell through.

Maybe your company would be interested in picking it up?
