I'm using rclone to mount an S3 bucket to a Linux instance in AWS EC2 (to support an app that requires a POSIX mount), and specifically trying to read a large file (hundreds of GB). The app is a video encoder and is likely issuing a lot of relatively small read IOs.
I can get okay performance (approx 900 Mbps), but this is on a very big instance with a 25 Gbps network interface and a lot of cores. This amount of network traffic is only enough to keep about 8 vCPUs busy.
I think I'd like rclone to read ahead more aggressively via multi-part downloads (I have 32 vCPUs available).
When I use AWS EFS (basically a managed NFS server) I can use almost all my vCPUs and sustain several Gbps, so I'd like to match that if possible. I know S3 can go this fast as well since a simple multi-part download using the AWS CLI can sustain several Gbps.
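For reference, this is roughly how I benchmarked the raw S3 path with the AWS CLI (bucket and key names here are placeholders; the chunk size and concurrency values are just what I experimented with, not tuned optima):

```shell
# Raise the CLI's transfer parallelism (defaults are 10 concurrent
# requests and 8 MB chunks) so the download is split into many
# concurrent range requests.
aws configure set default.s3.max_concurrent_requests 32
aws configure set default.s3.multipart_chunksize 64MB

# Discard the data so local disk speed doesn't cap the measurement.
aws s3 cp s3://my-bucket/huge-file.bin /dev/null
```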
Run the command 'rclone version' and share the full output of the command.
rclone v1.58.1
os/version: amazon 2 (64 bit)
os/kernel: 4.14.281-212.502.amzn2.x86_64 (x86_64)
os/type: linux
os/arch: amd64
go/version: go1.17.9
go/linking: static
go/tags: none
Which cloud storage system are you using? (eg Google Drive)
AWS S3
The command you were trying to run (eg rclone copy /tmp remote:tmp)
rclone mount aws_s3:${bucket} /s3/rclone/${bucket} --no-modtime --read-only --vfs-cache-mode full --vfs-read-ahead 256Mi &
The rclone config contents with secrets removed.
[aws_s3]
type = s3
provider = AWS
env_auth = true
region = ${REGION}
location_constraint = ${REGION}
No, it streams the data in as it needs it and starts encoding right away. I think it has a single thread doing IO, probably with relatively small reads (64 KB or so), but I don't know how to verify that.
Unfortunately I can't issue on-demand commands to the machine in question due to the control plane architecture. I can issue commands during boot-up (like, mounting the buckets I know I'll need), but once the system is running my only interaction with it is through a REST API. Long story.
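Since I can run commands during boot-up, one thing I could bake into the boot sequence to answer the read-size question is launching the encoder under strace (the binary path and arguments below are placeholders):

```shell
# Log every read()/pread64() the encoder issues; the number before
# the "=" at the end of each logged line is the requested size,
# e.g.:  read(3, "..."..., 65536) = 65536
strace -f -e trace=read,pread64 -o /tmp/encoder-io.log \
  /path/to/encoder --input /s3/rclone/${bucket}/video.bin
```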