Memory usage S3-alike to Glacier without big directories

I made a few small tweaks to the code to tidy it up; here is hopefully the final version:

https://beta.rclone.org/branch/v1.50.2-095-gaf767f2c-fix-s3-manager-beta/ (uploaded in 15-30 mins)

I can't see any finished Linux builds at the link (https://beta.rclone.org/branch/v1.50.2-095-gaf767f2c-fix-s3-manager-beta/).

It seems to be related to this problem:
https://github.com/rclone/rclone/runs/368847029

We are happy to test with some big files when a Linux build is available (I'm tempted to install the Go compiler and have a go at fixing it myself, but no time right now :frowning: )

I thought of using rclone md5sum to test that the written file is correct. Does this make sense to you, or should we really download the file and run md5sum/sha1sum on it locally? Some files might not fit locally, but some would.
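For example (a sketch only; the remote and path names here are placeholders, not our real ones):

```
# Ask the destination for its stored MD5s, no download needed.
rclone md5sum dst:bucketB/path/bigfile

# Or stream the object through a local md5sum, so the file never has
# to fit on local disk (placeholder paths).
rclone cat dst:bucketB/path/bigfile | md5sum
```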

Thanks ever so much!

A lint error - how annoying! I run those tests locally, but I must have missed them this iteration :slight_smile:

Here is a fixed version building now...

https://beta.rclone.org/branch/v1.50.2-095-g5363afa7-fix-s3-manager-beta/ (uploaded in 15-30 mins)

Unfortunately, multipart uploaded files don't have MD5 sums calculated by AWS (single part files do), so rclone adds a bit of metadata to the file with the expected MD5 sum in it. That means rclone md5sum isn't as strong a check on multipart files on S3 as you would have liked.
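One possible middle ground (my sketch, not something prescribed above) is rclone check, which compares the hashes rclone knows about, or, with --download, compares the actual data:

```
# Compare sizes and hashes between the two remotes (for multipart S3
# objects this relies on the MD5 rclone stored in the metadata).
rclone check src:bucketA dst:bucketB

# Byte-for-byte comparison that doesn't trust stored hashes at all
# (downloads from both sides, so it is slow for big files).
rclone check --download src:bucketA dst:bucketB
```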

More to report: good news and questions. I'm going to write down the commands/steps that I did, what I saw, and what I expected.

1.- We've used v1.50.2-095-g5363afa7-fix-s3-manager-beta to copy a 13 GB file from bucketA to bucketB (two different providers) with the default chunk size, and then we downloaded the copied file: all good! Memory usage: all good too (0.5% or 1% of the machine's memory, stable, no problem).

2.- I wanted to see the memory usage go up when using the other (released) rclone. I did the same copy: memory usage was stable (weird, I thought it would have gone up... does the memory management fix only affect multiple files being uploaded, not a single file?)

3.- Then I added chunk_size = 5G to the configuration file for the destination (writing) bucket, which is how we had it originally (I had removed it in step 1 in order to force more chunking). With that, both the released rclone and v1.50.2-095-g5363afa7-fix-s3-manager-beta use loads of memory: it kept piling up to roughly 80% of the server's 16 GB of RAM according to "top". I'm pretty sure it used more than 5 GB (one copy for the read and another for the write thread?). The kind of command and config involved is sketched below.
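Roughly what steps 1 and 3 describe, with placeholder remote and bucket names (not our real config):

```
# Step 1: copy a single 13 GB file between the two providers with the
# default chunk size; the data flows through the machine running rclone.
rclone copy srcremote:bucketA/bigfile dstremote:bucketB/ -P

# Step 3: re-add the old setting under the destination remote in
# rclone.conf, which is what makes memory usage balloon:
#
#   [dstremote]
#   type = s3
#   ...
#   chunk_size = 5G
```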

Since the copied data was OK and we are not in a super-rush for this one, we are going to use v1.50.2-095-g5363afa7-fix-s3-manager-beta for copying the files and hope that, if it uses big amounts of memory, it will free it.

Did you expect that with a 5 GB chunk size it would use more than 5 GB of memory? And is the new fix supposed to release it properly afterwards? The file was uploaded correctly, but we haven't seen the memory freed yet in the test we've done.

Thanks very much!

Just to add to my latest comment: using the new v1.50.2-095-g5363afa7-fix-s3-manager-beta it held 40% of the machine's memory. It stayed there, stable, for some time (even though the minute-by-minute updates didn't show any big file being uploaded, so I thought the memory would have been released by then), and then later on it got terminated.
Maybe the chunk_size = 5G is the culprit somehow and we should lower this limit. Or maybe it just makes some bug more obvious...

I'd say a chunk size of 5 GB is much too big for something which is buffered in memory.

The default is 5 MB, which rclone will scale up as necessary to keep the number of parts below 10,000.
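As a rough worked example (my arithmetic, not from the post above): 5 MiB × 10,000 parts covers files up to about 48 GiB, so the 13 GB file earlier only needs around 2,500 parts at the default chunk size, while something like a 1 TiB upload would have its chunk size scaled up to roughly 105 MiB.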

AWS's recommendation is to upload lots of parts at once.

Increasing the chunk size will increase throughput a bit at the cost of memory. Increasing the concurrency will increase throughput too, up to a limit.

You should find that the defaults for both chunk size and concurrency are about right. S3 can cope with --transfers being higher though.
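As a back-of-the-envelope sketch of how those settings interact (my numbers, assuming the defaults documented for the s3 backend around this version, and that each transfer buffers upload_concurrency chunks; remote names are placeholders):

```
#   memory ≈ chunk_size × upload_concurrency × transfers
#   defaults:        5M × 4 × 4  ≈  80 MiB
#   with 5G chunks:  5G × 4 × 4  ≈  80 GiB  -- far more than 16 GB of RAM
rclone copy srcremote:bucketA dstremote:bucketB \
  --s3-chunk-size 5M --s3-upload-concurrency 4 --transfers 4 -P
```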

It should affect both. However, Go's memory management isn't straightforward as it has a garbage collector. It doesn't release any memory to the OS unless it has been unused for 5 minutes, if I remember correctly.

Probably - see above about garbage collector!
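If the memory pressure ever becomes a problem again, one generic Go runtime knob (an aside from the editor, not something recommended above) is GOGC, which trades a bit of CPU for a smaller heap by collecting more often:

```
# GOGC defaults to 100; lower values make Go's garbage collector run
# more aggressively (placeholder remote names).
GOGC=20 rclone copy srcremote:bucketA dstremote:bucketB -P
```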

Ha! The chunk_size = 5G was that big because, in a data copy earlier this year, we thought it was handy if the ETag matched the md5sum, so we increased the chunk_size to maximize the chance of the S3 ETag == md5sum (to help us when doing data verification). We also saw that the CloudBerry software uses a bigger chunk_size than the default 5 MB. I can't find their default on their website, but I did find https://support.cloudberrylab.com/portal/kb/articles/backup-for-windows-how-to-increase-upload-speed

Either way, we will let the current rclone keep copying and then use a smaller (probably the default) chunk size for the next run (which will contain many big files). We will let you know how it goes.

Thanks ever so much!

If you want the largest possible files with etag matching md5sum then you want --s3-upload-cutoff 5G rather than setting the chunk size :slight_smile: You can set this in the config file too with upload_cutoff = 5G if you want.
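For reference, a sketch of what that looks like in the config file (the remote name and provider are placeholders):

```
[dstremote]
type = s3
provider = AWS            # placeholder
upload_cutoff = 5G        # files up to 5G go as a single PUT, so ETag == MD5
# no chunk_size line: files above the cutoff use the 5M default,
# scaled up automatically for very large files
```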

Interested to hear how it turns out :slight_smile:

Thanks! Excellent! It's good that I shared why I had the chunk_size = 5G :slight_smile: (funnily enough, I also had s3-upload-cutoff on the command line)

We are now using the latest rclone (from this morning) with the fix that you provided: no chunk_size, and upload_cutoff = 5G in the configuration file. No max-size at the moment. Let's see how it evolves. Thanks!


A few hours later: still copying happily with the s3 fix (and without chunk_size = 5G). Before, it never lasted more than 10 or 15 minutes once it got to the big files.

What I don't know is: if I had removed the chunk_size = 5G without the new s3 manager, would it have worked? :slight_smile: But it seems that the new s3 manager is not leaking :slight_smile:!

Thanks again, and happy New Year's Eve! :slight_smile:

Great!

It probably would have done. However, re-writing the s3manager code needed doing for lots of reasons!

Happy new year to you too!