Memory usage copying from an S3-alike to Glacier, without big directories

What is the problem you are having with rclone?

It's using too much memory, and then Linux kills it (at around 16 GB, which is all the server has).

Note that the maximum number of files in a single directory is 70000.
This is for a directory like:
dir1/dir2/dir3 (with 70000 files here)

Obviously dir1/dir2 contains more files because there are directories like:
dir1/dir2/dir3 (70000 files)
dir1/dir2/dir3-2 (with 50000 files or whatever)

I've read that rclone lists all the files of a directory in memory; I'm assuming this is one directory at a time, not including its subdirectories (otherwise "/" would be problematic...).

Also, I've tried --buffer-size 100M, but I think this only applies to the VFS, not to copying from bucket to bucket (same with the other --vfs options). I've also tried --buffer-size 16, with the same problem.

Some files might be big, but I assume rclone does not read the full files into memory before sending them to the destination bucket. No FUSE is involved here.

I'm going to try now with --s3-upload-concurrency and see what happens - but regardless of this, any other suggestions? I'll update the post in half an hour (it usually fails within 10 minutes with the current set of files).

What is your rclone version (output from rclone version)

rclone v1.50.2

  • os/arch: linux/amd64
  • go version: go1.13.4

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Red Hat Enterprise Linux Server release 7.7 (Maipo)
64 bit

Which cloud storage system are you using? (eg Google Drive)

Reading from an S3-alike (Scality) and copying to Amazon S3 with the DEEP_ARCHIVE storage class

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy --bwlimit 20M --config a_config_file --verbose bucket1-readonly://the_bucket_name aws-readwrite:data/

A log from the command with the -vv flag (eg output from rclone -vv copy /tmp remote:tmp)

Full log below.

The output says:
fatal error: runtime: out of memory

Not this time, but earlier, when I had this running, I saw in the logs:
Dec 27 00:02:19 servername kernel: Killed process 57456 (rclone), UID 1000, total-vm:43916380kB, anon-rss:14995024kB, file-rss:0kB, shmem-rss:0kB

Log:
Transferred: 9.300G / 665.502 GBytes, 1%, 19.849 MBytes/s, ETA 9h24m12s
Errors: 0
Checks: 3349 / 3349, 100%
Transferred: 91 / 10103, 1%
Elapsed time: 7m59.7s
Transferring:

  • data_admin/servers/ace…13-20-10-roundcube.sql: 80% /6.492M, 325.658k/s, 3s
  • data_admin/servers/ace…2-11-20-10-ace2016.sql:100% /1.781G, 2.175M/s, 0s
  • data_admin/servers/ace…2-12-20-10-ace2016.sql: 45% /1.801G, 9.383M/s, 1m48s
  • data_admin/servers/ace…2-13-20-10-ace2016.sql: 14% /1.819G, 7.940M/s, 3m20s

2019/12/27 22:55:34 DEBUG : data_admin/servers/ace-intranet/2017-02-13-20-10-roundcube.sql: MD5 = 7004d8342897e10350c966c0c01e4e67 OK
2019/12/27 22:55:34 INFO : data_admin/servers/ace-intranet/2017-02-13-20-10-roundcube.sql: Copied (new)
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x168c51d, 0x16)
/opt/hostedtoolcache/go/1.13.4/x64/src/runtime/panic.go:774 +0x72
runtime.sysMap(0xc78c000000, 0x140000000, 0x2410438)
/opt/hostedtoolcache/go/1.13.4/x64/src/runtime/mem_linux.go:169 +0xc5
runtime.(*mheap).sysAlloc(0x23f76a0, 0x140000000, 0xc000000001, 0x3004372a7)
/opt/hostedtoolcache/go/1.13.4/x64/src/runtime/malloc.go:701 +0x1cd
runtime.(*mheap).grow(0x23f76a0, 0xa0000, 0xffffffff)
/opt/hostedtoolcache/go/1.13.4/x64/src/runtime/mheap.go:1255 +0xa3
runtime.(*mheap).allocSpanLocked(0x23f76a0, 0xa0000, 0x2410448, 0x1c82ea0)
/opt/hostedtoolcache/go/1.13.4/x64/src/runtime/mheap.go:1170 +0x266
runtime.(*mheap).alloc_m(0x23f76a0, 0xa0000, 0x101, 0xc000103f20)
/opt/hostedtoolcache/go/1.13.4/x64/src/runtime/mheap.go:1022 +0xc2
runtime.(*mheap).alloc.func1()
/opt/hostedtoolcache/go/1.13.4/x64/src/runtime/mheap.go:1093 +0x4c
runtime.(*mheap).alloc(0x23f76a0, 0xa0000, 0xc002010101, 0xc000150a80)
/opt/hostedtoolcache/go/1.13.4/x64/src/runtime/mheap.go:1092 +0x8a
runtime.largeAlloc(0x140000000, 0x101, 0xc0001a7500)
/opt/hostedtoolcache/go/1.13.4/x64/src/runtime/malloc.go:1138 +0x97
runtime.mallocgc.func1()
/opt/hostedtoolcache/go/1.13.4/x64/src/runtime/malloc.go:1033 +0x46
runtime.systemstack(0x0)
/opt/hostedtoolcache/go/1.13.4/x64/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
/opt/hostedtoolcache/go/1.13.4/x64/src/runtime/proc.go:1146

goroutine 33 [running]:
runtime.systemstack_switch()
/opt/hostedtoolcache/go/1.13.4/x64/src/runtime/asm_amd64.s:330 fp=0xc00093b210 sp=0xc00093b208 pc=0x45d150
runtime.mallocgc(0x140000000, 0x138e9c0, 0xc0000c9501, 0x0)
/opt/hostedtoolcache/go/1.13.4/x64/src/runtime/malloc.go:1032 +0x895 fp=0xc00093b2b0 sp=0xc00093b210 pc=0x40c8e5
runtime.makeslice(0x138e9c0, 0x140000000, 0x140000000, 0xc0000c9500)
/opt/hostedtoolcache/go/1.13.4/x64/src/runtime/slice.go:49 +0x6c fp=0xc00093b2e0 sp=0xc00093b2b0 pc=0x445ccc
github.com/rclone/rclone/vendor/github.com/aws/aws-sdk-go/service/s3/s3manager.(*uploader).init.func1(0xc002bcf650, 0x0)
/home/runner/work/rclone/src/github.com/rclone/rclone/vendor/github.com/aws/aws-sdk-go/service/s3/s3manager/upload.go:400 +0x44 fp=0xc00093b318 sp=0xc00093b2e0 pc=0x1059724
sync.(*Pool).Get(0xc002bcf650, 0x19485c0, 0xc000494a90)
/opt/hostedtoolcache/go/1.13.4/x64/src/sync/pool.go:148 +0xa6 fp=0xc00093b360 sp=0xc00093b318 pc=0x46e856
github.com/rclone/rclone/vendor/github.com/aws/aws-sdk-go/service/s3/s3manager.(*uploader).nextReader(0xc002bcf5e0, 0x0, 0x0, 0x8, 0x8, 0xc001de80f8, 0x10, 0x18, 0xc00006bc00)
/home/runner/work/rclone/src/github.com/rclone/rclone/vendor/github.com/aws/aws-sdk-go/service/s3/s3manager/upload.go:461 +0x66 fp=0xc00093b420 sp=0xc00093b360 pc=0x1057636

Quick update: using --s3-upload-concurrency 1 does not fix the error (the same out of memory error as described in the original post).

I've also tried with the options:
rclone copy --s3-upload-concurrency 1 --buffer-size 0 --bwlimit 20M

no luck.

I'll try a memory profiler, either now or tomorrow...

At the moment the output of:
go tool pprof -text http://localhost:5572/debug/pprof/heap

is:
File: rclone
Type: inuse_space
Time: Dec 28, 2019 at 1:10am (UTC)
Showing nodes accounting for 5GB, 99.61% of 5.02GB total
Dropped 27 nodes (cum <= 0.03GB)
flat flat% sum% cum cum%
5GB 99.61% 99.61% 5GB 99.61% github.com/rclone/rclone/vendor/github.com/aws/aws-sdk-go/service/s3/s3manager.(*uploader).init.func1
0 0% 99.61% 5GB 99.61% github.com/rclone/rclone/backend/s3.(*Fs).Put
0 0% 99.61% 5GB 99.61% github.com/rclone/rclone/backend/s3.(*Object).Update
0 0% 99.61% 5GB 99.61% github.com/rclone/rclone/backend/s3.(*Object).Update.func2
0 0% 99.61% 5.01GB 99.79% github.com/rclone/rclone/fs.pacerInvoker
0 0% 99.61% 5GB 99.61% github.com/rclone/rclone/fs/operations.Copy
0 0% 99.61% 5GB 99.61% github.com/rclone/rclone/fs/sync.(*syncCopyMove).pairCopyOrMove
0 0% 99.61% 5GB 99.61% github.com/rclone/rclone/lib/pacer.(*Pacer).CallNoRetry
0 0% 99.61% 5.01GB 99.79% github.com/rclone/rclone/lib/pacer.(*Pacer).call
0 0% 99.61% 5GB 99.61% github.com/rclone/rclone/vendor/github.com/aws/aws-sdk-go/service/s3/s3manager.(*uploader).nextReader
0 0% 99.61% 5GB 99.61% github.com/rclone/rclone/vendor/github.com/aws/aws-sdk-go/service/s3/s3manager.(*uploader).upload
0 0% 99.61% 5GB 99.61% github.com/rclone/rclone/vendor/github.com/aws/aws-sdk-go/service/s3/s3manager.Uploader.UploadWithContext
0 0% 99.61% 5GB 99.61% sync.(*Pool).Get
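(For reference, the heap profile above was captured through rclone's remote control API - roughly these steps, assuming the default rc address of localhost:5572:)

# start the copy with --rc so rclone serves the pprof endpoints
rclone --rc copy --s3-upload-concurrency 1 --buffer-size 0 --bwlimit 20M --config a_config_file $origin $destination
# then, from another shell on the same machine
go tool pprof -text http://localhost:5572/debug/pprof/heap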

And top shows:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
58798 admin 20 0 21,2g 13,5g 5220 S 39,0 87,3 2:43.48 rclone

rclone just crashed again. It was using 3.7g of RES, then it spiked to 13.5g, and then it crashed (probably when it reached 16g).

This was with:
rclone --rc copy --s3-upload-concurrency 1 --buffer-size 0 --bwlimit

Thanks!

This is the AWS SDK upload manager and it looks like this problem

Are you trying to transfer very big files? If so how big are they?

Setting --transfers 1 should help.

Hi ncw,

I'm trying now with --transfers 1.
It has uploaded a series of files and is now uploading a 4.2G file, and it's already using lots of memory:

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND    
  9048 admin     20   0   15,9g  13,9g   4728 S   8,3 89,9   4:07.14 rclone     

I have a list of the files to be uploaded with their sizes, and I see that the biggest file is 193 GB, then some of 100 GB+, and then much smaller ones.

The 4.2 GB file has finished uploading. Now it is on a 116 GB file; let's see what happens. It's already using 90% of the memory.

The current command is:
rclone --rc copy --transfers 1 --s3-upload-concurrency 1 --buffer-size 0 --bwlimit

Plus the origin and destination (S3-alike to S3 Glacier, different buckets).

Thanks for any help, ideas, etc.

And it has just been killed:

fatal error: runtime: out of memory

It was uploading the 116 GB file.

As an idea, if nothing else comes up: I could use --max-size to upload files up to 4 GB with --transfers 1... then upload the remaining files using some other tool and verify them.

But I hope that you might have some other ideas? :slight_smile:

Thanks very very much! (we use rclone extensively with great success!)

Carles

I did:
rclone --max-size 500M --transfers 1 --bwlimit 20M ...

After a while I got a memory error while it was copying a 200 MB file:
fatal error: runtime: out of memory

The full information: https://pastebin.com/dk0gxgjm

As always, ideas, workarounds, etc. are appreciated :slight_smile:

Carles

I've just been trawling through bug reports in the AWS s3 sdk...

This one looks extremely relevant

I've pulled that fix into the latest beta - can you have a go with that?

I'll try - can you confirm that the file:
https://beta.rclone.org/rclone-beta-latest-linux-amd64.zip - 28/12/2019, 14:34:25

has the fix?
Just in case I need to wait for a new beta to be built...

Thanks very much!

Carles

No need to answer - I see a new build from today: 29/12/2019, 17:54:53. I'll try with this one :slight_smile:

No luck :frowning:

I used:

rclone v1.50.2-094-g207474ab-beta
os/arch: linux/amd64
go version: go1.13.5

It ends with:

fatal error: runtime: out of memory

I used:

rclone.beta --transfers 1 --bwlimit 20M $origin $destination

See the output in https://pastebin.com/D8rVQTY0

I'm happy to try other things... :slight_smile:

I pulled in the latest s3 sdk, upgrading it from v1.25.31 to v1.26.8.

Maybe this will have a fix in it - I'm not particularly hopeful, but it is worth a try if you don't mind!

https://beta.rclone.org/branch/v1.50.2-095-ge332d3e0-fix-s3-memory-beta/ (uploaded in 15-30 mins)

Can you describe the files you are copying? I'd like to get an idea of the size distribution and numbers of files to see if I can reproduce locally.

One other thing you could try is

export GOGC=20

This tells the Go garbage collector to work harder, at the cost of more CPU.
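For example, run it like this (just a sketch - substitute your own remotes and flags):

export GOGC=20
rclone copy --transfers 1 --bwlimit 20M --config path_to_config $origin $destination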

Hi,
Oops, I think I missed the email notification last night for your latest message. I'm trying the fix-s3-memory-beta now with export GOGC=20.

If it fails I'll provide a description of the files that I'm copying. Would you prefer the copied ones? Or the ones to be copied?

I'm using --transfers 1 (it's easier to see what's happening).

Oh, I'm watching this in real time. It copied a few small files (KB, or a few MB at most) and rclone's memory consumption was stable. I was looking at %MEM in top on a 16 GB system; it was less than 3%.

Then it started copying a 2 GB file and the memory consumption went to 15%, then 19%. When it finished copying, the memory consumption never went down.

Then it copied a few small files (MBs): not much change in the memory consumption that I could see.

Then it started copying an 800 MB file: memory consumption seemed to jump from 19% to 80% (don't quote me on that). It's now at 86%, and it seems it never releases memory (or not big amounts of it). It's currently copying tens of 25 MB files, with memory at 90.6% and increasing to 91.8%, etc.

Oops, now a video, 325 MB; it's stable at 92%, not going much higher or lower. But I'm pretty sure that with the next big file it will run out of memory - actually, it's already swapping.

Basically, over all this time I hardly ever see the memory usage decrease, and with GB-sized files the memory usage grows very fast and never goes down. As said, the files at the moment range from a few GB down to tens of MB.

I have a list of all the files; here I'm just watching the screen to see what is being copied, not the total size distribution.

Thanks ever so much for looking at this!

BTW, and this is out of curiosity, just to understand it better (and maybe to start thinking about how to copy the data from one bucket to another place): I have had, for at least 24 hours, another two rclones copying data (these rclones are older: v1.49.5, the Debian-packaged ones), copying files from a local file system to an S3. The files are around 2 MB each, and the rclones have copied many files (1.31 TB in total, although I know I stopped the rclones after starting the traffic counter).

These ones don't have memory problems.

Is the memory problem only happening on big (multipart) files? (I guess so) Or on Glacier? (I guess not) Or on S3 to S3? I'm also thinking about how to do the copy. It's not super urgent, but I'd like it to be done some time soon. In the past, before I knew rclone, I had some scripts that would help me do this, but I'd really like to avoid using those! I'm spoiled by rclone now (and those were for local to S3, not S3 to S3...).

Thank you,

Can you share your config file (with the sensitive parts XXX-ed out) and the command you are using, please?

From the memory dump above it looks like you aren't using a server-side copy.

Also a log with -vv from the copy would be helpful too.

The backtraces when it runs out of memory aren't particularly useful so I don't need any more of those.

I think the memory problem is only for large files - files larger than --s3-upload-cutoff. You can raise this limit to work around the problem - the maximum is 5G. Rclone uploads all files below this limit as a single-part upload, so avoiding multipart uploads for most files should help.
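So something like this should keep every file below 5G out of the multipart path (just a sketch - keep whatever other flags you are using):

rclone copy --s3-upload-cutoff 5G --transfers 1 --bwlimit 20M --config path_to_config $origin $destination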

Config file:

[destination-bucket-readwrite]
type = s3
provider = AWS
env_auth = false
access_key_id = xxxxx
secret_access_key = xxxxxx
region = eu-north-1
endpoint =
location_constraint = eu-north-1
acl = private
server_side_encryption =
storage_class = DEEP_ARCHIVE
chunk_size = 5G

[origin-bucket-readonly]
type = s3
env_auth = false
access_key_id = xxxxxx
secret_access_key = xxxxxx
region =
endpoint = https://xxxxxxx # This is an internal Scality object storage at the organisation that I'm working for
location_constraint =
acl = private
server_side_encryption =
storage_class =

Command line:

rclone.beta --transfers 1 --bwlimit 20M --config path_to_config -vv copy spibucket1-readonly://bucket_id destination-bucket-readwrite:bucket_id/

(I noticed that there is "://" for one bucket and a slightly different format for the other bucket. The only reason is that a colleague, the data manager, wrote the original commands and I'm investigating the problems... I guess that the "/" is not the problem :slight_smile: )

For the log: it's complicated because it contains file names. I can write a script to strip the file names if you want an exhaustive log. For now I'm doing a bit of copy-pasting and removing:
There are many lines with files like this:
2019/12/30 12:53:22 DEBUG : file_name.extension: Size and modification time the same (differ by 0s, within tolerance 1ns)
2019/12/30 12:53:22 DEBUG : file_name.extension: Unchanged skipping

Every now and then:
2019/12/30 12:53:37 DEBUG : directory/20170118_12/something: MD5 = the_md5 OK
2019/12/30 12:53:37 INFO : directory/20170118_12/something: Copied (new)

I have about 62000 lines like this.

Then:
Transferred: 1.531G / 2.064 TBytes, 0%, 6.536 MBytes/s, ETA 3d19h55m25s
Checks: 30682 / 30682, 100%
Transferred: 689 / 10698, 6%
Elapsed time: 3m59.8s
Transferring:

  • a file.csv:100% /6.546M, 0/s, -

Some small files...

Then I'm copy-pasting the final bit of the rclone output:

2019/12/30 12:58:01 INFO  : media/file1.MP4: Copied (new)
2019/12/30 12:58:21 INFO  : media/file1.MP4: Copied (new)
2019/12/30 12:58:22 INFO  : 
Transferred:   	    2.334G / 2.065 TBytes, 0%, 7.967 MBytes/s, ETA 3d3h24m7s
Checks:             30682 / 30682, 100%
Transferred:          704 / 10713, 7%
Elapsed time:     4m59.9s
Transferring:
 *       file2.MP4:  0% /576.242M, 0/s, -

2019/12/30 12:59:02 INFO  : file2.MP4: Copied (new)
2019/12/30 12:59:22 INFO  : 
Transferred:   	    3.259G / 2.065 TBytes, 0%, 9.274 MBytes/s, ETA 2d16h46m15s
Checks:             30682 / 30682, 100%
Transferred:          705 / 10714, 7%
Elapsed time:     5m59.8s
Transferring:
 *       file2.MP4: 15% /2.348G, 19.173M/s, 1m46s

2019/12/30 13:00:22 INFO  : 
Transferred:   	    4.394G / 2.065 TBytes, 0%, 10.719 MBytes/s, ETA 2d8h35s
Checks:             30682 / 30682, 100%
Transferred:          705 / 10714, 7%
Elapsed time:     6m59.8s
Transferring:
 *       file2.MP4: 63% /2.348G, 18.394M/s, 47s

2019/12/30 13:01:22 INFO  : 
Transferred:   	    5.244G / 2.065 TBytes, 0%, 11.191 MBytes/s, ETA 2d5h37m25s
Checks:             30682 / 30682, 100%
Transferred:          705 / 10714, 7%
Elapsed time:     7m59.8s
Transferring:
 *       file2.MP4:100% /2.348G, 7.136M/s, 0s

2019/12/30 13:01:52 INFO  : file2.MP4: Copied (new)
2019/12/30 13:02:17 INFO  : file3.MP4: Copied (new)
2019/12/30 13:02:22 INFO  : 
Transferred:   	    5.620G / 2.066 TBytes, 0%, 10.660 MBytes/s, ETA 2d8h18m1s
Checks:             30682 / 30682, 100%
Transferred:          707 / 10716, 7%
Elapsed time:     8m59.8s
Transferring:
 *       file3.MP4: 59% /146.130M, 20.959M/s, 2s

2019/12/30 13:02:25 DEBUG : file4.MP4: MD5 = 6e85121d42ac11609d9e2d99da13d83d OK
2019/12/30 13:02:25 INFO  : file4.MP4: Copied (new)
2019/12/30 13:02:25 DEBUG : file5.MP4: MD5 = 916e47ae01782b5007fd5493232db6d2 OK
2019/12/30 13:02:25 INFO  : file6.MP4: Copied (new)
2019/12/30 13:03:00 INFO  : file7.MP4: Copied (new)
fatal error: runtime: out of memory

I'm going to start with --s3-upload-cutoff 5G (and also --max-size 5G; if I can copy everything but the big files, that's a start at least).

I'm not familiar at all with bucket-to-bucket transfers, but since these are two different accounts with two different providers, I didn't expect a bucket-to-bucket transfer. This is what I meant when I said "Reading from an S3 alike (Scality) copying to Amazon S3 with DEEP_ARCHIVE backend" (I should have said writing to, or uploading to...).

Thanks ever so much!

Thanks for posting that. I just wanted to check how rclone was transferring the files between the two s3 instances. It is streaming the data from one to the other, so effectively it should be the same as downloading to a local disk drive and then uploading again, just without the disk usage.

That points the finger at the s3manager multipart upload code...

I'm going to noodle about with re-writing it which should fix this problem and others!
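The general idea is to read the source through a fixed set of chunk-sized buffers instead of letting the uploader allocate as much as it likes. A minimal sketch of that pattern (this is not rclone's actual code, and uploadPart is a hypothetical stand-in for the real S3 part-upload call):

package s3sketch

import (
	"io"
	"sync"
)

// uploadPart is a hypothetical stand-in for the real S3 UploadPart call.
func uploadPart(partNum int, data []byte) error {
	// ... send the part over HTTP ...
	return nil
}

// multipartUpload streams src through a fixed pool of `concurrency` buffers of
// chunkSize bytes each, so peak memory stays around concurrency*chunkSize
// no matter how big the source object is.
func multipartUpload(src io.Reader, chunkSize, concurrency int) error {
	bufs := make(chan []byte, concurrency)
	for i := 0; i < concurrency; i++ {
		bufs <- make([]byte, chunkSize)
	}

	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	setErr := func(err error) {
		mu.Lock()
		if firstErr == nil {
			firstErr = err
		}
		mu.Unlock()
	}
	for partNum := 1; ; partNum++ {
		buf := <-bufs // blocks until one of the buffers is free again
		n, err := io.ReadFull(src, buf)
		if n == 0 {
			bufs <- buf
			break // nothing left to read
		}
		wg.Add(1)
		go func(partNum int, part, buf []byte) {
			defer wg.Done()
			if err := uploadPart(partNum, part); err != nil {
				setErr(err)
			}
			bufs <- buf // hand the buffer back for reuse
		}(partNum, buf[:n], buf)
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			break // that was the last (possibly short) part
		}
		if err != nil {
			setErr(err)
			break
		}
	}
	wg.Wait()
	return firstErr
}

With that shape, the memory in flight is bounded by the number and size of the buffers, however large the object is.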

Hi Nick,

I'm happy to test anything that you do... like re-implementing s3manager :slight_smile:

At the moment I'm copying a bucket containing around 4 million files with --s3-upload-cutoff 5G --max-size 5G. There are 292 files bigger than 5 GB :slight_smile: so I'm happy to try any solution that you come up with. If the timing doesn't work out this time, I'll use the aws cp tool to copy these 292 files (hopefully it doesn't need to save the file to disk, otherwise I'll need some more work to have enough space for this). Not ideal, but not the end of the world.

Thanks ever so much for all of this (and the rest of rclone!)

OK I've had a go at re-implementing s3manager. I just adapted the code which does the multipart server side copies.

It passes my smoke tests and passes the s3 integration tests so should be good to go. If you are feeling paranoid it might be worth uploading a big file or two then downloading it again to check it is intact (I've done this already but a double check is always nice!).

This should (:crossed_fingers:) behave sensibly with memory! It should use exactly --s3-upload-concurrency * --s3-chunk-size * --transfers of memory and not go bonkers...
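For example, with the defaults (if I remember them right: --transfers 4, --s3-upload-concurrency 4, --s3-chunk-size 5M) that works out at roughly 4 * 4 * 5M = 80M of buffers, and with --transfers 1 --s3-upload-concurrency 1 it should only ever hold a single chunk in memory.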

https://beta.rclone.org/branch/v1.50.2-095-g43f26dac-fix-s3-manager-beta/ (uploaded in 15-30 mins)