Syncing an S3 bucket (4.5 TB, 15 M files): rclone crashes with OOM while listing

What is the problem you are having with rclone?

I am attempting to sync an S3 bucket with 4.5 TB and 15 M files to Backblaze B2. Rclone crashes with OOM while trying to list the files. I am running this on a dedicated AWS EC2 t3.large instance. I can scale the instance up, but I wonder whether I should instead be tweaking some rclone performance flags.

What is your rclone version (output from rclone version)

rclone v1.56.0
- os/version: ubuntu 18.04 (64 bit)
- os/kernel: 5.4.0-1054-aws (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.16.5
- go/linking: static
- go/tags: none

Which OS you are using and how many bits (eg Windows 7, 64 bit)

ubuntu 18.04 (64 bit)

Which cloud storage system are you using? (eg Google Drive)

Sync from AWS S3 to Backblaze B2.

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync --fast-list --progress --transfers 32 s3:acme-bucket crypt:

Missing rclone.conf
No log file.

Sorry @Animosity022, below is rclone.conf:

[b2]
type = b2
account = <not-shown-here>
key = <not-shown-here>
hard_delete = true

[s3]
type = s3
provider = AWS
env_auth = true
region = us-east-2
location_constraint = us-east-2
acl = private
server_side_encryption = AES256
storage_class = STANDARD

[crypt]
type = crypt
remote = b2:<not-shown-here>
filename_encryption = off
directory_name_encryption = false
password = <not-shown-here>
password2 = <not-shown-here>

I am going to try without --fast-list. I just read the documentation on it; the important parts:

  • It will use fewer transactions (important if you pay for them)
  • It will use more memory. Rclone has to load the whole listing into memory.
  • It may be faster because it uses fewer transactions
  • It may be slower because it can't be parallelized

If you pay for transactions and can fit your entire sync listing into memory then --fast-list is recommended. If you have a very big sync to do then don't use --fast-list otherwise you will run out of memory.

Also, the --use-mmap flag looks like a good candidate for me to enable, so memory is returned to the OS instead of being held by the internal Go memory allocator.

Removing --fast-list will help, unless you have millions of objects in a single directory.

@ncw are there any flags I should be looking at to speed up the second sync run after the huge initial sync completes? Trying to enable --fast-list still crashes after around 15 minutes, using 8+ GB of memory, so I removed --fast-list.

I already have --checkers=64, but the checking phase is still taking a very long time.

rclone sync --use-mmap --buffer-size=16M --checkers=64 --progress --transfers 64 s3:acme-bucket crypt:

If you were syncing disk -> cloud you could do a "top up sync" with rclone copy --max-age. However, this won't work very well cloud -> cloud, as rclone will still have to read the full listings at both ends.
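As a sketch, a disk -> cloud top-up run might look like this (the local path and the 24h window are illustrative placeholders, not values from this thread):

```shell
# Copy only files modified within the last 24 hours from local disk to the
# crypt remote; older files are skipped by the filter, so far fewer objects
# need to be checked. This only helps disk -> cloud, not cloud -> cloud.
rclone copy --max-age 24h --progress /local/data crypt:
```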

You could do a manual process where you use rclone lsf to list the source and destination to files, then sort the files and use a tool like comm to weed out identical entries. You can then use that info to figure out what needs to be transferred or deleted.

That isn't a particularly easy process though - it would be nice if rclone had a "low memory" sync option that did exactly that: wrote the listings to disk, sorted them, and compared them.
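A minimal sketch of that compare step, using hand-made stand-in listings (in a real run the two files would come from `rclone lsf -R` on each remote, e.g. `rclone lsf -R s3:acme-bucket | sort > src.txt`):

```shell
# Stand-in listing files; in practice these are the sorted outputs of
# `rclone lsf -R` on the source and destination remotes.
printf 'a.txt\nb.txt\nc.txt\n' > src.txt   # source listing
printf 'b.txt\nc.txt\nd.txt\n' > dst.txt   # destination listing

# comm requires sorted input. Column 1 is lines only in the source
# (candidates to copy); column 2 is lines only in the destination
# (candidates to delete). -23 / -13 suppress the unwanted columns.
comm -23 src.txt dst.txt > to_copy.txt
comm -13 src.txt dst.txt > to_delete.txt

cat to_copy.txt    # a.txt
cat to_delete.txt  # d.txt
```

The resulting to_copy.txt could then be fed back to rclone with something like `rclone copy --files-from to_copy.txt s3:acme-bucket crypt:`, avoiding a full in-memory listing of both sides at once. Note this compares names only, not sizes or hashes.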

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.