How to speed up sync task taking 24 hours (AWS S3 => B2 Cloud)

What is the problem you are having with rclone?

A sync from an AWS S3 bucket to Backblaze B2 Cloud Storage takes 24 hours to run. The source S3 bucket holds 4 TB made up of lots and lots of small files (approx. 16 million). Are there any additional flags or strategies to speed up the sync so it does not take 24 hours?

It runs on a dedicated EC2 m5.large instance (2 vCPUs and 8 GB of memory), so there are lots of resources to throw at it.

What is your rclone version (output from rclone version)

rclone v1.56.1

  • os/version: ubuntu 18.04 (64 bit)
  • os/kernel: 5.4.0-1056-aws (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.16.8
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

AWS S3 and Backblaze B2 Cloud Storage

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone --log-file=/var/log/rclone --log-level INFO sync --use-mmap --buffer-size=16M --checkers=128 --transfers 32 s3:acme-inc-cloud-storage crypt:

hello,

not sure about your use case, but one workaround is to run multiple commands.

  • per day - rclone copy s3:acme-inc-cloud-storage crypt: --max-age=24h
  • per week or as needed - full sync.
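
The split suggested above could be scripted roughly like this. It is a minimal sketch, not a tested setup: the function names and the daily/full arguments are made up for illustration, and the remotes are the ones from the original command. Note that rclone copy never deletes anything on the destination, so the periodic full sync is still what removes files that were deleted from the source.

```shell
#!/bin/sh
# Sketch only: function names and the "daily"/"full" arguments are hypothetical.
SRC="s3:acme-inc-cloud-storage"
DST="crypt:"

# Daily: copy only files modified within the last 24 hours.
# As noted in the thread, rclone still has to scan the source to apply the filter.
daily_copy() {
  rclone copy "$SRC" "$DST" --max-age=24h
}

# Weekly (or as needed): full sync, which also propagates deletions.
full_sync() {
  rclone sync "$SRC" "$DST"
}

# Pick the job from the first argument, e.g. from two separate cron entries.
case "${1:-}" in
  daily) daily_copy ;;
  full)  full_sync ;;
esac
```

Saved as a script, this would be called with daily from one cron entry and with full from another.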

and a look at the redacted config file would be helpful.

Thanks for the reply. Does using --max-age still need to scan every single file in the source AWS S3 bucket to check the date though?

Here is my config:

ubuntu@rclone1:~/.config/rclone$ cat rclone.conf
[b2]
type = b2
account = <hidden>
key = <hidden>
hard_delete = true

[s3]
type = s3
provider = AWS
env_auth = true
region = us-east-2
location_constraint = us-east-2
acl = private
server_side_encryption = AES256
storage_class = STANDARD

[crypt]
type = crypt
remote = b2:acme-inc-cloud-storage
filename_encryption = off
directory_name_encryption = false
password = <hidden>
password2 = <hidden>

yes, as far as i know, for any command, rclone has to check the source.
how else would rclone know what to copy or not to copy?

what is the concern about that, cost of api calls or what?

and have you tried the suggestions documented here

Well, mostly just the total amount of time, though the costs are a little high too. The Backblaze B2 Cloud Class C transactions come to around $200 a month.

wasabi, an s3 clone known for hot storage, does not charge for

  • api calls
  • egress transfer

i noticed that for crypt:, you are not using filename/directory encryption, just data encryption.
wasabi supports SSE-C

using your current sync command:

  • what is the use-case for the command, backup or what?
  • how often do you run the command?
  • how many files are transferred?
  • what is the total size of the files transferred?

Check out the reducing costs section in the docs

https://rclone.org/s3/#reducing-costs

Are you using the native b2 protocol or using their s3 gateway?

For this case I'd use the s3 gateway because then they have compatible checksums (both MD5).

Assuming s3 -> s3, use the --checksum flag; this will speed things up greatly. If you are using s3 -> b2, use --size-only, which isn't a perfect solution but will speed things up by a similar amount.
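
As a rough illustration of the two variants: the reducing-costs section linked above explains that on S3, reading the modification time costs an extra HEAD request per object, and both flags avoid that. This is only a sketch. The remote name b2s3: is hypothetical and stands for a B2 bucket configured through the S3-compatible gateway (it is not part of the config posted earlier), and the function names are made up.

```shell
#!/bin/sh
SRC="s3:acme-inc-cloud-storage"

# s3 -> s3 gateway: both sides report MD5 (per the reply above),
# so files are compared by size and checksum instead of modification time.
# "b2s3:" is a hypothetical remote pointing at B2's S3-compatible endpoint.
sync_via_s3_gateway() {
  rclone sync "$SRC" "b2s3:acme-inc-cloud-storage" --checksum
}

# s3 -> native b2 (crypt: wraps the native b2 remote in the posted config):
# compare by size only. A changed file that keeps the same size is missed.
sync_via_native_b2() {
  rclone sync "$SRC" "crypt:" --size-only
}
```

Only one of the two would be used, depending on how the B2 remote is configured.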

Assuming you've got enough memory, then use --fast-list. This will buffer all the objects in memory first which will take quite a few GB of memory. That will speed things up too.
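
To put a number on "quite a few GB": the rclone docs give a rule of thumb of roughly 1 KiB of memory per object held by --fast-list. Here is a back-of-the-envelope estimate for the 16 million files mentioned in the question, treating that figure as an assumption rather than a measurement. The function name is made up for illustration.

```shell
#!/bin/sh
# Rough estimate only; 1024 bytes per object is the docs' rule of thumb.
objects=16000000
bytes_per_object=1024
est_gib=$(( objects * bytes_per_object / 1024 / 1024 / 1024 ))
echo "estimated --fast-list memory: ~${est_gib} GiB"

# If that fits in RAM, --fast-list is simply added to the existing sync.
fast_list_sync() {
  rclone sync s3:acme-inc-cloud-storage crypt: --fast-list
}
```

By this estimate the listing would not fit in the 8 GB of the m5.large from the question, so a larger instance may be needed before trying --fast-list.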