Performance w/small files B2 copy

What is the problem you are having with rclone?

Not a problem, but a performance issue.

What is your rclone version (output from rclone version)

rclone v1.49.3

  • os/arch: windows/amd64
  • go version: go1.12.3

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Windows 2016

Which cloud storage system are you using? (eg Google Drive)

BackBlaze

The command you were trying to run (eg rclone copy /tmp remote:tmp)

& 'rclone.exe' @('copy', "$Ruta\$FileName", "BackBlaze:$BucketName")

==========================================
Hi all
Thanks to all the team who maintain this forum and this tool.
I've been given the job of uploading our enterprise backups to BackBlaze B2.
I've been testing three different tools: rclone, b2 and cyberduck cli, and I decided on rclone.
My data set is about 4 TB, with file sizes ranging from roughly 300 MB to 600 GB.
My script sends one file at a time to the cloud...
With the biggest files, performance is great, about 80 Mbit/s :grinning:
But with the smaller ones (roughly 300 MB to 1 GB) performance is not good, about 30 Mbit/s :pensive:
What I have seen is that rclone opens more parallel connections for large files than for smaller ones. Am I right?
If that is the case, and knowing that the number of parallel connections is decisive for good performance, how can I instruct rclone to open more connections even when the files are not very large?
Right now my PowerShell command is very simple and I'm using the default config:

& 'rclone.exe' @('copy', "$Ruta\$FileName", "BackBlaze:$BucketName")

Regards

If you increase --transfers then rclone will increase the number of chunks transferred at once per file.

You can also increase

  --b2-chunk-size SizeSuffix   Upload chunk size. Must fit in memory. (default 96M)

Note that chunks are held in memory so you'll need approx --transfers * --b2-chunk-size of memory.
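
As a rough illustration of that memory rule, here is a minimal Python sketch. The function name and the value of 50 transfers are only illustrative, and it assumes the `M` suffix means MiB; it is an estimate of chunk buffers, not a measurement of what rclone actually allocates.

```python
MIB = 1024 * 1024  # assumption: rclone's "M" size suffix is treated as MiB


def estimated_buffer_bytes(transfers: int, chunk_size_bytes: int) -> int:
    """Approximate memory held by in-flight chunks: transfers * chunk size."""
    return transfers * chunk_size_bytes


# Default chunk size quoted above (96M) with an illustrative --transfers 50.
needed = estimated_buffer_bytes(50, 96 * MIB)
print(f"approx. {needed // MIB} MiB of chunk buffers")
```

Smaller chunks lower the per-transfer cost, so a higher --transfers value can fit in the same memory budget.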

Thank you, I will try it and I'll be back.

Hi @ncw I was running some tests with --transfers 50 in my upload command and then checking with netstat how many connections the rclone process had open.
In the best case there were 30 open connections at a time for a 2.5 GB file, and for a 1 GB file there were around 10 connections at a time.
Sadly, I could not verify that the --transfers parameter had any effect on the number of connections opened in parallel.
From my point of view, rclone "decides" how many connections to open based on the file size more than on any other parameter.
Then I went back to "Edit advanced config" for the given endpoint in case I was missing something, but the only interesting setting there was "chunk_size".
So if there is another parameter to include, that would be great; otherwise I'm thinking of sending two or three files in parallel to maximize bandwidth usage.
Regards

Just thinking out loud: could it be that the --transfers parameter takes effect on sync but not on copy?

Well you can't have more parallel transfers than file_size / chunk_size if that is what you mean?
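
To make that bound concrete, here is a small Python sketch under simplifying assumptions: the function name is made up for illustration, sizes are taken as GiB/MiB, and the model only captures the file_size / chunk_size limit capped by --transfers, ignoring anything else rclone may take into account.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def max_parallel_chunks(file_size: int, chunk_size: int, transfers: int) -> int:
    """Upper bound on chunks in flight for a single file."""
    chunks = -(-file_size // chunk_size)  # ceiling division
    return min(chunks, transfers)


# Default 96M chunks with --transfers 50: the file size is the limiting factor.
print(max_parallel_chunks(1 * GIB, 96 * MIB, 50))
print(max_parallel_chunks(5 * GIB // 2, 96 * MIB, 50))
# A smaller chunk size raises the bound for the same 1 GiB file.
print(max_parallel_chunks(1 * GIB, 15 * MIB, 100))
```

Under these assumptions the bounds for a 1 GiB and a 2.5 GiB file come out close to the connection counts reported with netstat above.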

OK, I think I understand the behavior now, so my guess was correct.
Maybe I was expecting behavior similar to the B2 upload tool, where you can use the --threads parameter and it will open as many connections as threads regardless of chunk size.
Evidently, if I want more connections then I have to reduce chunk_size.
Thank you.

Hi all
I'm updating this post to report that reducing the chunk_size greatly increased bandwidth utilization.
Now I'm using --transfers 100 and a chunk_size of 15M; with this setup I could get 100 connections in parallel and full utilization of the bandwidth.
Thanks again for helping me understand the behavior of these settings.
Regards.

No worries! Larger chunks help with larger files I think, however the setting of --b2-chunk-size is largely historical. B2 used to only support 100 MB chunks (100E6 bytes), which is where the default of 96 MiB comes from. I think chunks can be as small as 5 MB.

Do you think the default should change?

It is clear that increasing parallel connections increases throughput. No doubt.
What I think is most important is to maximize the number of connections. With that premise, reducing the chunk size worked great in my case (although the B2 documentation speaks of 100 MB, a size of 15 MB was well supported).
My suggestion is to make this clear in the documentation.
Regards.

I'd love to have a PR with doc updates, or maybe just posting suggestions here:

The correct file to patch would be: https://github.com/rclone/rclone/blob/master/docs/content/b2.md

Though maybe the help should be on the chunk_size parameter which is here

Great!
Many thanks.
Regards.

Hi all
After this configuration had been working very well for a while, yesterday I ran into a problem.
The B2 backend refused to upload some big files.

rclone.exe : 2019/10/06 19:26:32 ERROR : files01.vm-52863D2019-10-05T210044_70C4.vbk: Failed to copy: "files01.vm-52863D2019-10-05T210044_70C4.vbk" too big (298349092864 bytes) makes too many parts 18969 > 10000 - increase --b2-chunk-size

It seems that a file can't be split into more than 10000 parts.
I mention it just in case the documentation does not reflect this limit (I didn't check). If it is already in the docs, please disregard.
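
The numbers in that error can be reproduced with a little arithmetic. This is only a sketch: the function name is illustrative, the 10000-part limit is taken from the error message, and the chunk size is assumed to be 15 MiB (the 15M setting mentioned earlier).

```python
MIB = 1024 * 1024
MAX_PARTS = 10000  # limit as quoted in the error message


def part_count(file_size: int, chunk_size: int) -> int:
    """Number of chunks needed to cover the file (ceiling division)."""
    return -(-file_size // chunk_size)


size = 298349092864  # bytes, from the error message
print(part_count(size, 15 * MIB), "parts with 15 MiB chunks")
print(part_count(size, 96 * MIB), "parts with the default 96 MiB chunks")
```

With 15 MiB chunks the count is 18969, matching the error; with the default chunk size the same file stays well under the limit.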

Now I have a question: can the b2-chunk-size parameter be set on the command line when launching rclone, or can it only be configured beforehand with rclone config?

You can increase the --b2-chunk-size on the command line which should hopefully solve the problem.

Beware that B2 chunks are buffered in memory.
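
Since the script uploads one file at a time, one possible approach is to pick the chunk size per file. The sketch below is hypothetical: the helper names, the example path and bucket are made up, the 10000-part limit comes from the error above, and the 5 MiB floor is only the "I think" figure mentioned earlier in the thread. Only the --b2-chunk-size and --transfers flags are taken from the discussion.

```python
MIB = 1024 * 1024
MAX_PARTS = 10000  # limit quoted in the error message above
FLOOR_MIB = 5      # assumption: smallest chunk size B2 accepts


def min_chunk_size_mib(file_size: int) -> int:
    """Smallest whole-MiB chunk size that keeps the file within MAX_PARTS."""
    min_bytes = -(-file_size // MAX_PARTS)  # ceiling division
    min_mib = -(-min_bytes // MIB)          # round up to whole MiB
    return max(min_mib, FLOOR_MIB)


def rclone_copy_args(src: str, dest: str, chunk_mib: int, transfers: int) -> list:
    """Build an argument list for a single-file copy (paths are placeholders)."""
    return ["rclone", "copy", src, dest,
            "--b2-chunk-size", f"{chunk_mib}M",
            "--transfers", str(transfers)]


size = 298349092864  # the file from the error message
chunk = min_chunk_size_mib(size)
args = rclone_copy_args(r"D:\backup\example.vbk", "BackBlaze:example-bucket", chunk, 100)
print(args)
print(f"approx. {chunk * 100} MiB of chunk buffers at --transfers 100")
```

The last line is there as a reminder of the memory warning: a larger chunk size multiplied by a high --transfers value adds up quickly.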