Issues with "union"


What is the problem you are having with rclone?

Irrespective of how I configure them, unions over OneDrive are failing with insufficient space/quota errors, even when the drives are shown as empty by the `rclone about` command. I have tried adding :ro to the "full" drive to prevent writes to it, and also tried changing the policy from none [the default] to others.
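To illustrate the check I mean - a sketch using the remote names from the config further down (remote-1 is one upstream, remote-union is the union):

```
rclone about remote-1:        # per-drive quota - shows the drive as essentially empty
rclone about remote-union:    # aggregated quota across the union upstreams
```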

Run the command 'rclone version' and share the full output of the command.

rclone v1.65.2

  • os/version: debian 11.8 (64 bit)
  • os/kernel: 5.10.0-27-amd64 (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.21.6
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

OneDrive/O365 business

The command you were trying to run (eg rclone copy /tmp remote:tmp)

```
rclone sync DB1-ENC:Masters/GDFI/ od-liam1-union:GDFI --max-transfer 5T -P -vv
```

I have tried the above without `--max-transfer`, and also added `--transfers 1` to see if it made a difference. It didn't.

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

It contains identifying information even when redacted, and this is not my first rodeo with rclone... Respectfully, I must decline to provide the full file...

Each remote is set up in the same way, e.g.

```
[remote-1]
type = onedrive
token = [whole string redacted]
drive_id = [string redacted]
drive_type = business
```

The drive is verified working with `rclone about remote-1: -vv` and `rclone lsd remote-1: -vv`.

Then the encryption layer:

```
[remote-1-enc]
type = crypt
remote = remote-1:enc
password = xxxxx
```

and then a chunker, due to a OneDrive size limitation [saw it mentioned somewhere]:

```
[remote-1-chu]
type = chunker
remote = remote-1-enc:chunk
chunk_size = 190Gi
```

The remote-1:enc dir was made with `rclone mkdir remote-1:enc`, and the chunker (chu) wrote its own directory, I guess.
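So, as I understand it, the layering per drive ends up as (names as above):

```
remote-1-chu  (chunker)  ->  remote-1-enc:chunk
remote-1-enc  (crypt)    ->  remote-1:enc
remote-1      (onedrive)
```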

A log from the command that you were trying to run with the -vv flag

```
2024/01/29 15:42:08 DEBUG : rclone: Version "v1.65.2" starting with parameters ["rclone" "sync" "DB1-ENC:Masters/GDFI/" "od-liam1-union:GDFI" "--max-transfer" "5T" "-P" "-vv"]
```

[Then all the remotes are set up without error. As there are over 20 and they contain identifying info, I've left them out. I have independently verified that each remote is valid, per the earlier info.]

Examples

```
2024/01/29 15:42:09 DEBUG : Creating backend with remote "L1-22-chu:"
2024/01/29 15:42:09 DEBUG : Creating backend with remote "L1-22-enc:chunk"
```

So L1-22's enc and chunk layers are set up.


This features a lot:

```
2024/01/29 15:42:11 DEBUG : Reset feature "ListR"
```

```
2024/01/29 15:42:15 DEBUG : union root 'GDXX': actionPolicy = *policy.EpAll, createPolicy = *policy.EpMfs, searchPolicy = *policy.FF
```

Then it starts to try and SYNC the data...

Then the errors come. I've sent the log to Pastebin; I hope all the private stuff is redacted... L/TITLEINFORMATION/TITLEINFORMATION/0-9/12 Years A Slave (2013) [US]/Archived.l - Pastebin.com

I've read and Googled to no avail. I noted earlier comments about parallel operations being difficult to tame when writing to multiple union drives, but setting `--transfers 1` should stop that issue, AND similarly, even restricting the max transfer size to fit one of the "drives" [as a test] made no difference.

The same happens on another OneDrive setup I have, but I have not tried (nor can I easily try) the exact same transfer there. The commands are basic, the content is different, but the scenario is the same [quota errors on a drive].

Have I missed something obvious? I'm happy to send ncw, or anyone he directs, the full `rclone config redacted` output if it really makes a difference, but I believe I've given the salient info.

Tks

You talk about OneDrive all the time, but then in your log I see:

```
2024/01/29 15:47:02 INFO : Dropbox root 'rc/enc/qva0g9r992u8v3fehqd9rvei98/1cpetpl5ol39kknsvovcm94r9k': Committing uploads - please wait...
```

So you are mixing many things here, I think. Let's start with the basics:

What is this union configuration?

and what is the result of:

```
rclone about od-liam1-union:
```

BTW - can you format your posts? The way you do it now is very difficult to read. Do not enclose all text in ``` but only the code parts.

Use ~~~ instead of ``` - they work the same. Maybe easier to type.

Above is just one line from your configuration. Can you post the result of:

```
rclone config redacted od-liam1-union:
```

As indicated, the union has been tried without any policy set [i.e. the rclone defaults] AND with the policies shown here. I suspect other variants too.

```
[od-liam1-union]
type = union
upstreams = "L1-3-chu:ro" "L1-1-chu:" "L1-2-chu:ro" "L1-23-chu:" "L1-4-chu:ro" "L1-5-chu:ro" "L1-6-chu:ro" "L1-7-chu:ro" "L1-8-chu:ro" "L1-9-chu:ro" "L1-10-chu:ro" "L1-11-chu:ro" "leeg-chu:" "L1-12-chu:ro" "L1-13-chu:ro" "L1-14-chu:ro" "L1-15-chu:ro" "L1-16-chu:ro" "meganb-chu:ro" "L1-17-chu:ro" "L1-18-chu:ro" "L1-19-chu:ro" "L1-20-chu:ro" "L1-21-chu:ro" "L1-22-chu:ro" "L1-23-chu:ro"
actionPolicy = *policy.EpAll
createPolicy = *policy.EpMfs
searchPolicy = *policy.FF
### Double check the config for sensitive info before posting publicly
```

The objective [other than getting it to save something!] is to maximise space usage, keeping one copy only.

But if it refuses to do even that with all but one drive marked read-only, I am confused.

The read-only remote syntax in a union configuration is:

```
L1-3-chu::ro
```

What you have written tries to use a directory named `ro` on the remote, and it does not exist.
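To illustrate, using your own remote names (only the first two upstreams shown):

```
# wrong - ":ro" is read as a directory called "ro" on the remote
upstreams = L1-3-chu:ro L1-1-chu:

# right - "::ro" marks the upstream as read-only
upstreams = L1-3-chu::ro L1-1-chu:
```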

For testing, I would suggest starting with maybe two OneDrive remotes and trying to copy one file - capture the full debug log and post it here.

I suspect that error was "just" introduced while trying to self-diagnose this evening by making every drive read-only but the first one. The error existed when all drives were not read-only. Before this I also tried making only the 2x "full" drives read-only, with no difference.

Edit: ah, double-colon usage rather than single. I will correct them now and re-run. It would not explain the original problem when zero drives were marked read-only, but one step at a time!

Revert shortly.

I use OneDrive union(s) all the time without any issues, so I am 99% sure you have just made some mistake.

We need data to try to identify what it is.

I suggest starting with a simple union - just two upstreams - and a simple operation - copying one file - so we have a small, easy-to-read log. This should probably tell us what is wrong.
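Something like this, as a sketch - the remote and file names are just examples:

```
# minimal test union in rclone.conf
[test-union]
type = union
upstreams = onedrive1:test onedrive2:test
```

and then copy a single file, capturing the debug log:

```
rclone copy /tmp/testfile test-union: -vv --log-file=union-test.log
```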

Undoubtedly. I accept my limitations and take advice gracefully. Sometimes an extra pair of eyes sadly is helpful, even if one tries to be as independent as possible. Hope to revert in c. 10 min.

Another observation - you could simplify the whole setup by using one chunker and one crypt:

```
[onedrive1]
type = onedrive

[onedrive2]
type = onedrive

[onedrive-union]
type = union
upstreams = onedrive1:mydata1 onedrive2:mydata2

[onedrive-union-crypt]
type = crypt
remote = onedrive-union:

[onedrive-union-crypt-chunker]
type = chunker
remote = onedrive-union-crypt:
```

But your way should work too - even if it requires three times more remotes :). I doubt it is the issue. It is just much more work to configure.

Also, for your purpose - a union to share space - use the following policies:

```
action_policy = epall
create_policy = mfs
search_policy = ff
```

The default create policy, epmfs, does not delete a directory which exists on multiple remotes (it does so only on one). This is what I figured out using a setup very similar to yours: multiple onedrives, union, crypt, chunker.
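Roughly the difference, as I understand it, sketched as comments:

```
# create_policy = epmfs (default): pick the upstream with the most free space
#   among those where the path already exists - so if the path exists only on
#   a full drive, writes keep targeting it and fail on quota
# create_policy = mfs: pick the upstream with the most free space, full stop
create_policy = mfs
```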

Ah, I had not understood that from the documentation and prior use [Dropbox unions and the like], but maybe I missed a clear signal if it was in the documentation. I got the impression it was a one-to-one mapping of everything.

As the "encryption" elemented drive is remote:enc [dir = enc] I had not thought I could pass one crypt to the "union of the remote1:enc remote2:enc] and so on. I have used the same secret/password anyway as I did not want to track a lot of different ones. So it can be a case of simplyfing later.

The original, with the correction made, is running at the moment and has got a little further - so I guess I will let it run [it is late here] and update this thread tomorrow. If it stops working I will try a simple union of two remotes. I have that ready in rclone.conf.

So until tomorrow, and thank you for your kind assistance thus far.


As promised, an update. The data continues to write so far - it has not hit the 5TB limit yet - but earlier it was failing way before that. So maybe I had added a small error I couldn't see, and then further attempts to fix it compounded matters. I don't know. So I will let it keep going and then try your "cleaner" implementation afterwards [after proving everything is stable]. Your assistance is valued and appreciated.

As you mentioned that your source is Dropbox, you should also:

  1. Make sure that your Dropbox remote is using your own client_id
  2. To prevent Dropbox throttling, always add `--tpslimit 12 --tpslimit-burst 0` to your rclone sync commands

And for the OneDrive crypt you should use `filename_encoding = base32768` if you are not doing so already. Otherwise OneDrive has a very limited max path length.
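For example, the throttling flags applied to your earlier command:

```
rclone sync DB1-ENC:Masters/GDFI/ od-liam1-union:GDFI \
  --tpslimit 12 --tpslimit-burst 0 --max-transfer 5T -P -vv
```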

Oh. Do I need to delete everything copied over so far, or will just adding that to the config "convert" the existing encrypted data as well as setting the base going forward?

You do not have to use it, but I would strongly advise considering it.

Changing it will not convert anything unfortunately.

In theory it is possible to convert the encoding server-side - not sure how it will play with chunker though. You would have to try:

What you do is set up a new crypt remote by copying the old one in the config file, renaming it, and adding `filename_encoding = base32768`. Choosing a different destination directory is a good idea here too, so the old and the new don't overlap.

You can then do a server-side move, something like:

```
rclone move --server-side-across-configs old: new:
```
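A sketch of the whole procedure, with hypothetical names old-crypt and new-crypt (untested - as said, you have to try):

```
# new-crypt is a copy of old-crypt pointing at a different directory,
# with the new filename encoding added; passwords stay the same
[new-crypt]
type = crypt
remote = remote-1:enc2
password = xxxxx
filename_encoding = base32768
```

```
rclone move --server-side-across-configs old-crypt: new-crypt: -P -vv
```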

With your setup (a different crypt for every union member), if it is not a lot of data, probably the easiest is to start from zero.

And the lesson is: before you start a massive data migration, it is worth testing and asking questions first - not halfway through :)

Also, this one is weird... a 190GiB chunk size? It makes your whole union extremely wasteful. When you union multiple smaller remotes, I would imagine you want to use the chunker to spread your data across them. With such a massive chunk size you have to maintain a huge amount of free space for this to work. I would go for something like 1GiB or even less.
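Back-of-envelope, assuming for illustration a 1 TiB upstream:

```
chunk_size = 190Gi -> the drive must keep >= 190 GiB free to accept one more
                      chunk; below that it is dead weight (~19% of 1 TiB stranded)
chunk_size = 1Gi   -> at most ~1 GiB per upstream can end up stranded
```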

Here you are, for reference, my setup for something very similar:

```
[onedrive1]
type = onedrive
client_id = xxx
client_secret = xxx
token = xxx
drive_id = xxx
drive_type = personal

[onedrive2]
type = onedrive
client_id = xxx
client_secret = xxx
token = xxx
drive_id = xxx
drive_type = personal

[onedriveN]
type = onedrive

[onedrive-union]
type = union
upstreams = onedrive1:a onedrive2:b ... onedriveN:x
action_policy = epall
create_policy = mfs
search_policy = ff
cache_time = 120

[onedrive-union-crypt]
type = crypt
remote = onedrive-union:
password = xxx
password2 = xxx
filename_encoding = base32768

[onedrive-union-crypt-chunker]
type = chunker
remote = onedrive-union-crypt:chunk
chunk_size = 500Mi
hash_type = sha1all
name_format = *.rcc###
```
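And then everything is written through the top of the stack, e.g.:

```
# the chunker splits, the crypt encrypts, the union spreads pieces across drives
rclone copy /local/data onedrive-union-crypt-chunker:data -P -vv
```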

I don't know the history, but I'm pretty sure I got that "recommendation" from reading, probably on this site. Then other information suggested I need not use it at all. I'm doing a test write on the new construction now, but not writing through the chunker. I can make a further test with your settings as well before it gets too far.

Perhaps someone thought they were being helpful with the information I found, and posted it in good faith. I guess that's always the risk: good-faith information turns out to be bad practice, and it becomes lore in certain quarters when doing research.

I just double-checked rclone's own docs [in case I found it there - I didn't] and they say something else too:

"Above this size files will be chunked - must be multiple of 320k (327,680 bytes) and should not exceed 250M (262,144,000 bytes) else you may encounter "Microsoft.SharePoint.Client.InvalidClientQueryException: The request message is too big." Note that the chunks will be buffered into memory."

All fun isn't it :slight_smile:

Recommendations have their reasons - and this one, I doubt, had anything to do with a union of OneDrives.

That quote has NOTHING to do with the chunker remote; it is the word "chunk" used in the description of some OneDrive upload features. You are mixing things up.
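To make the distinction concrete - values for illustration only, the 10Mi being the OneDrive default:

```
[onedrive1]
type = onedrive
chunk_size = 10Mi      # UPLOAD chunk size: API buffering; multiple of 320k, max 250M

[onedrive-union-crypt-chunker]
type = chunker
chunk_size = 500Mi     # STORED chunk size: how big each file piece is at rest
```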

If you explain in a few words what your objective is, then we can come up with some sensible recommendations.

It is very possible small stuff got missed, for the reasons given earlier. I can't find the reference for the original "recommendation" on chunker size, but it was, from memory, allegedly to get around a Microsoft restriction. It was a few weeks ago. So far the revised configuration, with your help, has been running all day without an issue, so unless OneDrive complains about chunking or something, it should be good to go. Plus I have your chunking suggestion to try later [loath to stop the transfer - it will already take a while, and the subscription to Dropbox is in its final stages].

Thank you again for all thus far.


It is your data - your decision :) But your setup as it is now is very bad and inefficient.

Your default crypt encoding introduces very serious limitations in terms of path length (probably about 170 characters max for the whole path).

Your chunker setup (it is in the wrong place and uses the wrong chunk size) does not bring any benefits. On the contrary, you might later have a situation where you still have many TB free in your union but won't be able to copy any bigger file - because you do not chunk across all remotes but for every union remote separately, which is pointless.

And all together, you have to create and manage 3x more remote configurations than needed - it only increases the chance of error.

Yes, it will work as it is. But it is very bad engineering :) You can drive a car on three wheels too. It will be moving.

Seems, on a very quick read back, that maybe one of my messages didn't get through. But my eyes are tired now after the day. I [thought I] had indicated I'd implemented your suggestions concerning the single union and single crypt. I then started it re-running while it was less clear about the chunker, as after rebuilding/erasing the existing crypt I need to get some stuff backing up. There's a lot to do...

So that leaves the chunker not operational [or used], and I have your notes for that. My intention is to put it into operation and change the sync's destination [I guess] to the chunker rather than the crypt [one reason for not doing it today is that everything is tired and I didn't want to make even more errors]. Maybe I'll get up, stop the sync, and do the final bit [chunker] in the morning. Rest assured, I prefer four wheels driving efficiently to three wheels wobbling all over the place.

So to check, using your examples:

```
[onedrive-union-crypt]
type = crypt
remote = onedrive-union:
password = xxx
password2 = xxx
filename_encoding = base32768

[onedrive-union-crypt-chunker]
type = chunker
remote = onedrive-union-crypt:chunk
chunk_size = 500Mi
hash_type = sha1all
name_format = *.rcc###
```

If you had synced to a directory onedrive-union-crypt:ABC, is it correct to then sync directly to the chunker, onedrive-union-crypt-chunker:ABC, and it "rewrites" if necessary to the correct ABC directory in the crypt? I must confess, being very tired, it is now confusing. Of course I can test, but if the error comes later, rewinding is different.

Looking at this thread, for example, seems to give two quite different solutions: Encrypted chunker - #10 by thestigma

So I'll happily take your guidance on your given example, and then on where to sync to, so that at the end of the day the data is stored in the equivalent of remote-crypt:ABC [a directory off the "root"], chunked if necessary.

Will look at this tomorrow now. Back to bed :slight_smile:
