I'm using rclone to mount an S3 bucket to a Linux instance in AWS EC2 (to support an app that requires a POSIX mount), and specifically trying to read a large file (hundreds of GB). The app is a video encoder and is likely issuing a lot of relatively small read IOs.

I can get okay performance (approx 900 Mbps), but this is on a very big instance with a 25 Gbps network interface and a lot of cores. This amount of network traffic is only enough to keep about 8 vCPUs busy.

I think I'd like rclone to read ahead more aggressively via multi-part download (I've got 32 vCPUs to use).

When I use AWS EFS (basically a managed NFS server), I can use almost all my vCPUs and sustain several Gbps, so I'd like to match that if possible. I know S3 can go this fast as well, since a simple multi-part download using the AWS CLI can sustain several Gbps.

rclone v1.58.1

  • os/version: amazon 2 (64 bit)
  • os/kernel: 4.14.281-212.502.amzn2.x86_64 (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.17.9
  • go/linking: static
  • go/tags: none

rclone mount aws_s3:${bucket} /s3/rclone/${bucket} --no-modtime --read-only --vfs-cache-mode full --vfs-read-ahead 256Mi &

type = s3
provider = AWS
env_auth = true
region = ${REGION}
location_constraint = ${REGION}

About that app: to process a video, does it need to download 100% of each file, or only some percentage of it?

No, it streams the data in as it needs it. It starts the encoding process right away. I think it just has a single thread doing IO, and probably using relatively small chunks like 64 KB or something. I have no idea how to find out tho :slight_smile:

If I need to process a set of files in an rclone mount, I pre-load the files into the VFS file cache first, with something like:
rclone md5sum /s3/rclone/${bucket}/file.ext
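
A minimal sketch of that pre-load step as a reusable function, assuming the mount is already up and was started with --vfs-cache-mode full. The name preload_files is hypothetical. It uses plain cat rather than rclone md5sum, on the assumption that any full sequential read through the mount populates the cache equally well:

```shell
#!/bin/sh
# Hypothetical helper: read each file from start to finish through the mount.
# Assumption: with --vfs-cache-mode full, data read this way is kept in the
# VFS file cache, so later reads by the app are served from local disk.
preload_files() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            # Discard the bytes; only the read through the mount matters.
            cat "$f" > /dev/null && echo "preloaded: $f"
        else
            echo "skipped (not a regular file): $f" >&2
        fi
    done
}
```

It could be called right after the mount comes up, for example as preload_files "/s3/rclone/${bucket}/file.ext" &, so the read happens in the background while the machine finishes booting.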

Unfortunately I can't issue on-demand commands to the machine in question due to the control plane architecture. I can issue commands during boot-up (like, mounting the buckets I know I'll need), but once the system is running my only interaction with it is through a REST API. Long story.

Thanks for the suggestion tho :slight_smile:

ok, understood.

rclone rcd: run rclone listening to remote control commands only.

Remote control / API: rclone implements a simple HTTP-based protocol.
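
As a rough sketch of what talking to that HTTP protocol could look like, assuming the mount (or rclone rcd) was started with --rc and is listening on localhost:5572. The helper names rc_url and rc_call and the RC_ADDR variable are made up for illustration, and curl is assumed to be installed:

```shell
#!/bin/sh
# Assumed address of the rc server; override RC_ADDR if --rc-addr differs.
RC_ADDR="${RC_ADDR:-localhost:5572}"

# Build the URL for an rc command such as rc/noop.
rc_url() {
    printf 'http://%s/%s\n' "$RC_ADDR" "$1"
}

# POST a JSON body (default: empty object) to an rc command and print the reply.
rc_call() {
    rc_body="${2:-}"
    [ -n "$rc_body" ] || rc_body='{}'
    curl -sS -X POST -H 'Content-Type: application/json' \
        --data "$rc_body" "$(rc_url "$1")"
}
```

With the server running, rc_call rc/noop '{"ping":"pong"}' should echo the parameters back, which is a quick way to check connectivity. Other commands may need credentials (--rc-user / --rc-pass) depending on how the server was started.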

I had a sponsorship deal to implement this, but unfortunately it fell through.

Maybe your company would be interested to pick it up?

