Uploading with union

What is the problem you are having with rclone?

When using rclone copy to upload a file to a union (with policy "all"), it seems like rclone might be doing something like getting one block uploaded to all members of the union, and only then proceeding to try to upload the next block. Is this correct?

If so, it doesn't seem efficient (at least in some situations). For example, I'm currently uploading a pretty large file to a union with four members. If I upload to a member individually, I usually get a speed of about 4 MB/s for the fastest, but only about 1 or 2 MB/s for the slowest. But as a union, I'm getting more like 0.5 MB/s (which, given that there are 4 members, I assume really means I'm uploading at about 2 MB/s total). So, assuming I'm right about the upload strategy, it seems like the faster backends are just wasting wall clock time waiting for the slower backends to catch up to them. On large files, that can really add up.

If I am right about this, is there some sort of option to make it more efficient in this kind of situation?

To be explicit, I'm using --transfers 1 at the moment, but I'm pretty sure I've seen this same behavior when using the default number of transfers too. And even if not, if it were uploading serially, I would expect by this point to see the file on two or three of the backends, whereas it is not yet on any of them.

What is your rclone version (output from rclone version)

1.53.3

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Windows 10 64 bit

Which cloud storage system are you using? (eg Google Drive)

Amazon S3 and Microsoft OneDrive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy Wasabi:RedactedData3 Temp-Limited:RedactedData3 --transfers 1 -P

The rclone config contents with secrets removed.

[Raw-Dropbox]
type = dropbox
token = ...

[Raw-GSuite]
type = drive
client_id = ...
client_secret = ...
scope = drive
token = ...
root_folder_id = ...

[Raw-OneDrive]
type = onedrive
token = ...
drive_id = ...
drive_type = personal

[Raw-S3]
type = s3
provider = AWS
env_auth = false
access_key_id = ...
secret_access_key = ...
region = us-east-1
acl = private
server_side_encryption = AES256
storage_class = DEEP_ARCHIVE
bucket_acl = private

[Raw-Wasabi]
type = s3
provider = Wasabi
env_auth = false
access_key_id = ...
secret_access_key = ...
endpoint = s3.wasabisys.com

[Dropbox]
type = alias
remote = Raw-Dropbox:

[GSuite]
type = alias
remote = Raw-GSuite:RedactedFolder1

[GSuite-Deprecated]
type = alias
remote = Raw-GSuite:RedactedFolder2

[OneDrive]
type = alias
remote = Raw-OneDrive:

[OneDrive-Deprecated]
type = alias
remote = Raw-OneDrive:RedactedFolder3

[S3Ireland]
type = alias
remote = Raw-S3:RedactedFolder4

[S3Ireland-Deprecated]
type = alias
remote = Raw-S3:RedactedFolder5

[S3Sydney]
type = alias
remote = Raw-S3:RedactedFolder6

[S3Sydney-Deprecated]
type = alias
remote = Raw-S3:RedactedFolder7

[S3Virginia]
type = alias
remote = Raw-S3:RedactedFolder8

[S3Virginia-Deprecated]
type = alias
remote = Raw-S3:RedactedFolder9

[Wasabi]
type = alias
remote = Raw-Wasabi:RedactedFolder10

[Wasabi-Deprecated]
type = alias
remote = Raw-Wasabi:RedactedFolder11

[Size-Unlimited]
type = alias
remote = GSuite:

[Size-Big]
type = union
upstreams = S3Ireland: S3Sydney: S3Virginia:
action_policy = all
create_policy = all
search_policy = all

[Size-Medium]
type = alias
remote = Wasabi:

[Size-Small]
type = alias
remote = OneDrive:

[Size-Tiny]
type = alias
remote = Dropbox:

[MinSize-Tiny]
type = union
upstreams = Size-Tiny: MinSize-Small:
action_policy = all
create_policy = all
search_policy = all

[MinSize-Small]
type = union
upstreams = Size-Small: MinSize-Medium:
action_policy = all
create_policy = all
search_policy = all

[MinSize-Medium]
type = union
upstreams = Size-Medium: MinSize-Big:
action_policy = all
create_policy = all
search_policy = all

[MinSize-Big]
type = union
upstreams = Size-Big: MinSize-Unlimited:
action_policy = all
create_policy = all
search_policy = all

[MinSize-Unlimited]
type = alias
remote = Size-Unlimited:

[Data-RedactedData1]
type = alias
remote = MinSize-Small:RedactedData1

[Data-RedactedData2]
type = alias
remote = MinSize-Small:RedactedData2

[Data-RedactedData3]
type = alias
remote = MinSize-Small:RedactedData3

[Data-RedactedData4]
type = alias
remote = MinSize-Small:RedactedData4

[Data-RedactedData5]
type = alias
remote = MinSize-Big:RedactedData5

[Data-RedactedData6]
type = alias
remote = MinSize-Small:RedactedData6

[Data-RedactedData7]
type = alias
remote = MinSize-Tiny:RedactedData7

[Data-RedactedData8]
type = alias
remote = MinSize-Small:RedactedData8

[Data-RedactedData9]
type = alias
remote = MinSize-Small:RedactedData9

[Data-RedactedData10]
type = alias
remote = MinSize-Medium:RedactedData10

[Data-RedactedData11]
type = alias
remote = MinSize-Small:RedactedData11

[Data-RedactedData12]
type = alias
remote = MinSize-Medium:RedactedData12

[Temp-Limited]
type = union
upstreams = OneDrive: S3Ireland: S3Sydney: S3Virginia:
action_policy = all
create_policy = all
search_policy = all

[Temp-S3All]
type = union
upstreams = S3Ireland: S3Sydney: S3Virginia:
action_policy = all
create_policy = all
search_policy = all

Does the debug log show what it's doing?

I am not sure - I didn't run it with -vv, and I'm reluctant to stop/start it because it takes many hours, and it's mostly done. I'll try it next time I run a command like this (assuming the question hasn't been answered yet).

Yes that is correct.

Glossing over some of the details, this means that a block is read on the source and then written in parallel to the upstreams.
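To make that concrete, here is a minimal sketch of the "read once, write to every upstream in parallel" pattern. It is only an illustration of the idea described above, not rclone's actual code: the function name `fan_out_copy`, the tiny block size, and the in-memory `BytesIO` objects standing in for the source and the upstreams are all assumptions made for the example.

```python
import io
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # hypothetical, tiny block size so the example is easy to follow


def fan_out_copy(source, upstreams, block_size=BLOCK_SIZE):
    """Read each block from the source once and write it to every upstream.

    The writes for one block run in parallel, and the next block is not read
    until every upstream has accepted the current one.
    Returns the number of blocks read from the source.
    """
    blocks_read = 0
    with ThreadPoolExecutor(max_workers=len(upstreams)) as pool:
        while True:
            block = source.read(block_size)
            if not block:
                break  # end of source
            blocks_read += 1
            # Start one write per upstream, all at the same time.
            futures = [pool.submit(up.write, block) for up in upstreams]
            # Wait for every upstream; result() re-raises any write error.
            for future in futures:
                future.result()
    return blocks_read


# In-memory stand-ins for the source file and three union members.
source = io.BytesIO(b"0123456789")
upstreams = [io.BytesIO() for _ in range(3)]
blocks = fan_out_copy(source, upstreams)
print(blocks, [up.getvalue() for up in upstreams])
```

Because each block has to land on every upstream before the next one is read, the slowest upstream sets the pace for all of them, which would match what you are seeing.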

In general it is undesirable to read the source more than once. If copying from the cloud, reading it once per upstream would multiply your bandwidth use, and if copying from local disk it would create a lot more disk IO and disk seeking.

The operation can't complete until all the uploads have finished, so unless the upload is being really inefficient, it shouldn't make any difference to the total run time exactly how the files get uploaded.

In my tests, uploading to a union ran at the same speed as the slowest union member. Would you agree?
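A back-of-the-envelope model of that argument, as a rough sketch only. The file size and the per-member rates below are illustrative numbers loosely based on the ones in the question, and the model assumes each upstream has its own bandwidth (no shared bottleneck) and that the source can be read at least as fast as the slowest member can be written.

```python
def lockstep_time(size_mb, rates_mb_s):
    """Every block waits for the slowest member, so the whole copy
    proceeds at the slowest member's rate."""
    return size_mb / min(rates_mb_s)


def independent_time(size_mb, rates_mb_s):
    """Each member uploads at its own rate; the operation is only
    finished when the slowest member is finished."""
    return max(size_mb / rate for rate in rates_mb_s)


size_mb = 1000             # hypothetical file size in MB
rates = [4.0, 2.0, 2.0, 1.0]  # hypothetical per-member upload rates in MB/s

print(lockstep_time(size_mb, rates))
print(independent_time(size_mb, rates))
# Per-member finishing times if they were allowed to run independently.
print([size_mb / rate for rate in rates])
```

Under those assumptions both strategies finish at the same moment. The only difference is that with independent uploads the faster members would have their copy earlier, whereas in lockstep no member has the complete file until the end.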

Use create_policy = rand for ultimate performance.

Maybe I'm misunderstanding the documentation for "rand" - it says:

Calls all and then randomizes. Returns only one upstream.

I took that to mean that (in the context of create) it will upload the file to only one member of the union. Is that not so?

Yes, that's correct. Or do you want to upload to all the remotes?
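For anyone reading later, the difference between the two create policies can be pictured with a toy model like this. It is a simplified sketch based only on the documentation line quoted above, not rclone's implementation; `select_upstreams` is a made-up name, and the upstream names are taken from the Temp-Limited union in the config.

```python
import random


def select_upstreams(policy, upstreams, rng=random):
    """Toy model of which upstreams a new file is created on.

    "all"  -> every upstream gets a copy.
    "rand" -> start from the same candidates as "all", then keep one at random.
    """
    candidates = list(upstreams)
    if policy == "all":
        return candidates
    if policy == "rand":
        return [rng.choice(candidates)]
    raise ValueError(f"policy not covered by this sketch: {policy}")


members = ["OneDrive:", "S3Ireland:", "S3Sydney:", "S3Virginia:"]
print(select_upstreams("all", members))
print(select_upstreams("rand", members))
```

So rand is faster simply because it uploads a single copy; it does not leave the same file on every member of the union.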

Yes, I want to upload the same file to all of the remotes in the union.
