How to sync S3 with millions of files at root

What is the problem you are having with rclone?

Hi,

I have the same problem that has already been reported several times... yet maybe there is a workaround that I am not aware of:
I am trying to sync an S3 bucket to a local SSD. The problem is that there are millions of files (about 4 million) in the root of the remote. RAM keeps growing, and about 1 hour after launching the command, the process is killed.
What is the way to synchronize in such a situation (tons of files in one directory)?

thanks

Run the command 'rclone version' and share the full output of the command.

rclone v1.61.1

  • os/version: ubuntu 20.04 (64 bit)
  • os/kernel: 4.4.180+ (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.19.4
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

S3 (remote) - SSD (local)

The command you were trying to run (eg rclone copy /tmp remote:tmp)

sudo rclone sync --s3-disable-checksum --size-only minio:xxx/media/tracks /xxx/Tracks/

The rclone config contents with secrets removed.

Paste config here

A log from the command with the -vv flag

Paste log here

hello and welcome to the forum,

this approach should not have out-of-memory issues:

https://forum.rclone.org/t/recommendations-for-using-rclone-with-a-minio-10m-files/14472/4

rclone lsf -R source:bucket | sort > source-sorted
rclone lsf -R dest:bucket | sort > dest-sorted
comm -23 source-sorted dest-sorted > to-transfer
comm -12 source-sorted dest-sorted > to-delete
rclone copy --files-from to-transfer --no-traverse source:bucket dest:bucket
rclone delete --files-from to-delete --no-traverse dest:bucket

note: for testing,

  1. add --dry-run to those two commands, i.e. rclone copy|delete --dry-run ...
  2. use a debug log: --log-level=DEBUG --log-file=/path/to/rclone.log (both points are combined in the sketch below)
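
for illustration, a minimal sketch of the dry-run step, using the same placeholder remotes source:bucket and dest:bucket as above (the log paths are just examples):

rclone copy --files-from to-transfer --no-traverse --dry-run --log-level=DEBUG --log-file=/path/to/rclone-copy.log source:bucket dest:bucket
rclone delete --files-from to-delete --no-traverse --dry-run --log-level=DEBUG --log-file=/path/to/rclone-delete.log dest:bucket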

Very interesting, many thanks. Indeed, such a "manual" sync should do the trick.

Just a small remark

Should be
comm -13 source-sorted dest-sorted > to-delete
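
For reference, the comm flags suppress output columns: -23 keeps lines only in the first file (on the source but not the destination, so they need copying), -13 keeps lines only in the second file (on the destination only, so they need deleting), and -12 keeps lines common to both. A quick sketch using the listings above:

comm -23 source-sorted dest-sorted   # only on source -> to-transfer
comm -13 source-sorted dest-sorted   # only on dest   -> to-delete
comm -12 source-sorted dest-sorted   # on both        -> already in sync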

yes, i like that wording, sounds ok to me.

sorry, that is above my skill level...
though, if you tweak the script, then please post it here and i can link to it...

Note that rclone will use roughly 1 GB of RAM per million files, so you'll need 4 GB of RAM (maybe twice that) to sync 4 million files.

Otherwise it should work fine.

Here is the corrected script
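
Something along these lines, assuming the only change is the comm -13 fix for to-delete:

rclone lsf -R source:bucket | sort > source-sorted
rclone lsf -R dest:bucket | sort > dest-sorted
comm -23 source-sorted dest-sorted > to-transfer
comm -13 source-sorted dest-sorted > to-delete
rclone copy --files-from to-transfer --no-traverse source:bucket dest:bucket
rclone delete --files-from to-delete --no-traverse dest:bucket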

I think this command can be optimized, because it is currently taking hours to "copy" fewer than 400 small files.
Is there any flag that could be used to optimize it?

--- hard to be sure without seeing the debug log.
--- perhaps split to-transfer into smaller files and run rclone copy against each one; a sketch of that is below.
--- perhaps do not use --no-traverse:
"if you are copying a large number of files, especially if you are doing a copy where lots of the files under consideration haven't changed and won't need copying then you shouldn't use --no-traverse."
