When copying a large local file (600+ GB) to Amazon S3, the rclone process uses a lot of memory, 2 GB+. This makes it impossible to have multiple transfers going without hitting the OOM killer (at 3 GB).
What is your rclone version (output from rclone version)
rclone v1.50.2
os/arch: freebsd/386
go version: go1.13.4
Which OS you are using and how many bits (eg Windows 7, 64 bit)
FreeBSD 11.1-STABLE, 64-bit
Which cloud storage system are you using? (eg Google Drive)
Amazon S3
The command you were trying to run (eg rclone copy /tmp remote:tmp)
That is the s3 uploader's buffers... which looks a bit on the large side.
The concurrency by default is
--s3-upload-concurrency int Concurrency for multipart uploads. (default 4)
I think the chunk sizes will be quite large though to fit inside the 10,000 limit. I see in your log a 3.368TB file. That would need a chunk size of 3.368TB/10000 = 336.8MB and with a concurrency of 4 you'll need 4 of those so 1347MB.
I think this starts to explain where the memory usage is coming from. I'm not sure why exactly the s3 uploader uses such a lot of memory though.
To use less memory, set --s3-upload-concurrency lower.
Or if you want to use the least memory then you can set --s3-upload-cutoff 100T which will upload the files in a single part and not use any in memory buffers. This will work provided they are < 5TB and will use very little extra memory. However any failures will mean the whole thing is retried rather than just the chunk.
Well, there's no single 3TB file; there are 5 files of 600-800 GB, so it shouldn't be allocating an upload buffer for all the files when it's uploading one at a time. Right?
I also thought --s3-upload-cutoff had a max of 5 GB? At least, that's what https://rclone.org/s3/ says. And that's what my version of rclone says, too - it bombed out when I tried 100T.
Yeah, I can get it to fit under the memory limit by setting concurrency to 1, but it's rather slow (8 MB/s).
It doesn't look like a memory leak - while the memory can rise over time, it's very slow, and it can also go down, so I think it's more likely just standard garbage collection activity.
Thanks for your help, but I think we'll probably just use a different client. AWS has a cli that isn't quite as nice as rclone, but it looks like it'll have much better transfer rates in this environment.
Memory topped out (very briefly, likely just before garbage collection kicked in) at 2 GB, transferring files as large as 648 GB, and generally ran at 1 GB or less. These were the same files as before, just from a remote machine over NFS instead of on the file server.
So the apparent big memory object seems to be using roughly the same amount of memory in both cases, despite having --s3-upload-concurrency of 16 vs the default of 4. Which means the extra 2 GB the FreeBSD case was using both isn't tracked by the Go memory trace and isn't present in the Linux version, whatever it is.
After multiple complaints about s3 using too much memory for multipart uploads I've re-written the uploader. I was using the one from the AWS SDK but there are loads of reports of it using lots of memory.
Can you have a go with this, which should hopefully fix the problem?
Seems to be working. I'll have more details later. Unfortunately, I don't have any more 600 GB+ files to test with (I transferred them all already using the Linux machine), but I'm trying with four 200 GB+ files at once, and it's under 1 GB of total memory usage for now.
Memory usage has grown over time, but is still quite reasonable. Here's a trace over time of RES for the process as reported by top (size followed by minutes at that size):
It's a little concerning that it's only increasing over time, though, as the sizes of the files being uploaded have been fairly consistent over the run (roughly 225 GB each, uploading four at a time). On the other hand, it does seem to have mostly settled down, and it's probably just garbage collection accumulation (at around 2x the starting size).
Here's the go memory profile (I ran it earlier in the run as well, and results were about the same):
root@freenas5:~ # ./go/bin/go tool pprof -text http://localhost:5572/debug/pprof/heap
Fetching profile over HTTP from http://localhost:5572/debug/pprof/heap
Saved profile in /root/pprof/pprof.alloc_objects.alloc_space.inuse_objects.inuse_space.002.pb.gz
Type: inuse_space
Time: Jan 3, 2020 at 11:40am (CST)
Showing nodes accounting for 421.70MB, 98.70% of 427.23MB total
Dropped 62 nodes (cum <= 2.14MB)
flat flat% sum% cum cum%
368.50MB 86.25% 86.25% 369MB 86.37% github.com/rclone/rclone/backend/s3.(*Object).uploadMultipart
53.20MB 12.45% 98.70% 53.20MB 12.45% github.com/rclone/rclone/lib/pool.New.func1
0 0% 98.70% 369MB 86.37% github.com/rclone/rclone/backend/s3.(*Fs).Put
0 0% 98.70% 369MB 86.37% github.com/rclone/rclone/backend/s3.(*Object).Update
0 0% 98.70% 53.20MB 12.45% github.com/rclone/rclone/fs/asyncreader.(*AsyncReader).getBuffer
0 0% 98.70% 53.20MB 12.45% github.com/rclone/rclone/fs/asyncreader.(*AsyncReader).init.func1
0 0% 98.70% 369.50MB 86.49% github.com/rclone/rclone/fs/operations.Copy
0 0% 98.70% 369.50MB 86.49% github.com/rclone/rclone/fs/sync.(*syncCopyMove).pairCopyOrMove
0 0% 98.70% 53.20MB 12.45% github.com/rclone/rclone/lib/pool.(*Pool).Get