Unable to transfer data due to high memory usage: rclone is getting killed


What is the problem you are having with rclone?

We are about to transfer 40M files totalling 6 TB of data from Swift to S3. The rclone VM has 8 vCPUs and 16 GB of RAM. While transferring the data we observed that memory usage keeps growing until it is completely consumed, at which point the process is killed.
I tried --fast-list, which was terrible for my use case. I then added --use-mmap; memory now grows slowly rather than rapidly, but it still gets exhausted eventually. Are there any flags I should add so the transfer completes without being killed for running out of memory? (It has been killed almost 15 times now.)

Run the command 'rclone version' and share the full output of the command.

rclone v1.64.0

  • os/version: centos 7.9.2009 (64 bit)
  • os/kernel: 3.10.0-1160.45.1.el7.x86_64 (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.21.1
  • go/linking: static
  • go/tags: none

Are you on the latest version of rclone? You can validate by checking the version listed here: Rclone downloads
--> Yes

Which cloud storage system are you using? (eg Google Drive)

S3 middleware (Swift) and S3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

./rclone --progress --log-file=copy15.txt copy swift:std s3:std --transfers=20 --checkers=100 --use-mmap -vv

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

[swift]
type = s3
provider = Other
access_key_id = XXX
secret_access_key = XXX
endpoint = swift.com

[s3]
type = s3
provider = Other
access_key_id = XXX
secret_access_key = XXX
endpoint = http://s3.com


A log from the command that you were trying to run with the -vv flag

2024/05/23 19:48:40 DEBUG : rclone: Version "v1.64.0" starting with parameters ["./rclone" "--progress" "--log-file=copy15.txt" "copy" "swift:std" "s3:std" "--transfers=20" "--checkers=100" "--use-mmap" "-vv"]
2024/05/23 19:48:40 DEBUG : Creating backend with remote "swift:std"
2024/05/23 19:48:40 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2024/05/23 19:48:40 DEBUG : Resolving service "s3" region "us-east-1"
2024/05/23 19:48:40 DEBUG : Creating backend with remote "s3:std"
2024/05/23 19:48:40 DEBUG : Resolving service "s3" region "us-east-1"

It won't fly: rclone uses about 1 KB of RAM for every listed object, so for 40M files you need about 40 GB of memory.

Hi, how about the rclone rc vfs/forget command? Would this help? I'm not sure how to use it in conjunction with copy.

You are not using VFS so there is nothing to forget.

If you want to deal with 10s of millions of objects you have to scale your system accordingly.

OK, whoa. I just noticed that it checked 5M files and consumed almost 14 GB of memory. I would need about eight times that, so it may never succeed. If anyone has overcome this situation, please do help.

@ncw eager to hear from you with advice

One option is to split the list of files into parts and run rclone multiple times, using filters; see the sketch below.
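For instance, assuming the bucket's top-level directories can be grouped by prefix (the a*/b* prefixes here are purely illustrative):

./rclone copy swift:std s3:std --include "/a*/**" --transfers=20 --checkers=100
./rclone copy swift:std s3:std --include "/b*/**" --transfers=20 --checkers=100

Each run then only lists, and holds in memory, the objects matching its filter.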

Another option that has been used a number of times (sketched after the links):
How to sync S3 with millions of files at root - #2 by asdffdsa
Rclone sync S3 to S3 runs for hours and copy nothing - #23 by ncw
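The gist of those threads, roughly sketched (the chunk size and file names are illustrative): list the source once with lsf, split the listing, and feed each piece to rclone with --files-from-raw, so each copy run only handles a bounded number of objects:

./rclone lsf --files-only -R swift:std > all-files.txt
split -l 1000000 all-files.txt chunk.
for f in chunk.*; do
    ./rclone copy swift:std s3:std --files-from-raw "$f" --no-traverse --transfers=20
done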


I was thinking the same as asdffdsa is proposing. Depending on your data structure, divide the transfer into multiple runs to keep the file count manageable. Since this is a one-time data transfer, you just need to split it into workable chunks and compare at the end. You could run lsf on both backends once you are done and diff the two listings; a sketch follows.
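A rough sketch of that final comparison (the output file names are illustrative):

./rclone lsf -R swift:std | sort > source.txt
./rclone lsf -R s3:std | sort > dest.txt
diff source.txt dest.txt

An empty diff means both sides contain the same paths; rclone check can additionally verify sizes and hashes.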

Thank you @asdffdsa. Let me check further.

Another query: I have observed that in Swift the storage space occupied is 5.7 TB (although I used the S3 interface), but after moving the data to S3, the space occupied shows as 16 TB. I am not sure what the reason for this could be.

I used the command below:

./rclone --progress --log-file=copy6.txt copy tnsswift:tns_std tns:tns-std --transfers=20 --checkers=100 -vv

Could the above command have copied duplicates? If so, is there another flag to ensure that the source files are copied to the destination with checksums verified, and to delete the duplicates?

You might want to update rclone and test again.
