Encrypted chunker

lawmanuk · November 2, 2019, 12:29am

Hi,

I want to be able to do this:

create remote1
create remote2 (i.e. a chunker for remote1)
create remote3 (i.e. a crypt that encrypts names/content/directories for remote2)

I expected to find:

chunks with filenames encrypted
each chunk's filename would look different, even for chunks of the same file (due to salt).
each chunk's extension would be encrypted too (i.e. ...rclone_chunk.001 would be encrypted).

What I found

yes
no. the filenames looked exactly the same. I thought this shouldn't happen if salted properly?
no. I could see ...rclone_chunk.001 etc. even though the encryption happened after the chunking.

Please advise how to achieve what I was seeking.

Thanks.

thestigma · November 2, 2019, 2:26am

You are no doubt linking your remotes like this:
OS --> crypt --> chunker --> cloudremote

that means the encrypting is happening first, then files get chunked (this is why they share names and have extensions).

You want
OS --> chunker --> crypt --> cloudremote

Go check your config file and look at what it says in:
remote =

the chunker should point to the name of your crypt
the crypt should point to the name of your cloudremote

questions?
If confused, post your config and I will fix it for you (but make sure to redact any sensitive info, like clientID, clientsecret, token and crypt keys)

lawmanuk · November 2, 2019, 10:09am

[pcld-ab]
type = pcloud
token = XXXXXX

[chnk-pcld-ab]
type = chunker
remote = pcld-ab:rchnk
chunk_size = 100k
hash_type = md5

[crpt-chnk-pcld-ab]
type = crypt
remote = chnk-pcld-ab:rchkcr
filename_encryption = standard
directory_name_encryption = true
password = XXX
password2 = XXX

I did the cloud remote first.
then chunked it
then crypted the chunked remote

Seems to be in same order you've suggested?

thanks

darthShadow · November 2, 2019, 11:17am

It should be cloudremote, then crypt, then chunker.

lawmanuk · November 2, 2019, 11:19am

that would chunk the crypted files, with ...rclone_chunk.001 filenames not being encrypted.

I wanted to encrypt the full filenames, even after chunking, so chunked first, then crypted as thestigma also seems to suggest?

But my plan didn't work properly, as the ..rclone_chunk.001 filename part still isn't encrypted. Only the initial encrypted filename.

[newuser@manjaro 20llts55ck6ugvfsi69steddkk]$ ls -all
total 516
drwxr-xr-x 5 newuser newuser   4096 Oct 31 22:29 .
drwxr-xr-x 3 newuser newuser   4096 Oct 31 22:29 ..
-rw-r--r-- 1 newuser newuser     76 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk
-rw-r--r-- 1 newuser newuser 102400 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk.rclone_chunk.001
-rw-r--r-- 1 newuser newuser 102400 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk.rclone_chunk.002
-rw-r--r-- 1 newuser newuser 102400 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk.rclone_chunk.003..tmp_1572560873
-rw-r--r-- 1 newuser newuser 102400 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk.rclone_chunk.004..tmp_1572560873
-rw-r--r-- 1 newuser newuser 102400 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk.rclone_chunk.005
-rw-r--r-- 1 newuser newuser  12448 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk.rclone_chunk.006
drwxr-xr-x 2 newuser newuser   4096 Oct 31 22:29 l5ef8n4b3r6e4h5oi47qm56c30
drwxr-xr-x 2 newuser newuser   4096 Oct 31 22:29 olt373rtb4bmgj8sri7nd494f0
drwxr-xr-x 3 newuser newuser   4096 Oct 31 22:29 v5s938sjhbj040ga3r1i31l808

Example is above.

Don't want the ...rclone_chunk.001 filename to be unencrypted.
Would prefer each subsequent chunk to have a different encrypted name.
Not sure why there are *.tmp files too
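The listing above mixes three kinds of names: one plain encrypted name, numbered chunks, and chunks with a `..tmp_` suffix (an error log later in this thread calls such files temporary chunks). Below is a minimal sketch that sorts names into those groups. The pattern is inferred only from the listing, not from rclone's source, and the short name `abc` is a hypothetical stand-in for the long encrypted name.

```python
import re

# Pattern inferred from the directory listing above (an assumption, not rclone's code):
#   <base>.rclone_chunk.<number>            -> finished chunk
#   <base>.rclone_chunk.<number>..tmp_<id>  -> temporary chunk
CHUNK_RE = re.compile(
    r"^(?P<base>.+)\.rclone_chunk\.(?P<num>\d+)(?:\.\.tmp_(?P<tmp>\d+))?$"
)


def classify(name):
    """Return (base, chunk_number, temp_id).

    chunk_number is None when the name is not a chunk at all;
    temp_id is None when the chunk has no ..tmp_ suffix.
    """
    match = CHUNK_RE.match(name)
    if match is None:
        return (name, None, None)
    return (match.group("base"), int(match.group("num")), match.group("tmp"))


if __name__ == "__main__":
    # "abc" is a hypothetical short stand-in for the encrypted name in the listing.
    for example in (
        "abc",
        "abc.rclone_chunk.001",
        "abc.rclone_chunk.003..tmp_1572560873",
    ):
        print(example, "->", classify(example))
```

Grouping by the first element of the result shows at a glance which stored objects belong to the same file, which is exactly the information the poster wanted to hide.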

darthShadow · November 2, 2019, 4:37pm

What you will see and what will be actually stored in the remote are total opposites. With the suggested method of CloudRemote -> Crypt -> Chunker -> OS (User), what actually will be stored in the remote are encrypted chunks.

lawmanuk · November 2, 2019, 4:39pm

what I've shown above is the actual remote, as it's mounted.

I'm encrypting the chunks, so don't expect to see above filenaming.

thestigma · November 2, 2019, 4:41pm

Sorry, I realize now I may have made a wrong assumption.

When you said you checked how the files looked, which remote did you use to look at them?

lawmanuk · November 2, 2019, 4:42pm

I tested this with pcloud backend. This has now been mounted and the file naming is as listed above.

thestigma · November 2, 2019, 4:56pm

Ok, then I did not misunderstand I think.

The configuration you have shown does not match what you are describing.
I suspect you are thinking about it in reverse.

In the end you will be mounting the crypt (with your current config).
your crypt points to the chunker
and your chunker points to the pcloud remote.

That will result in
OS --> crypt --> chunker --> pcloud
that will make encrypted chunks, but since the encryption is on the "inside" and the chunking is on the "outside" you will see the chunking format in the raw files.
This is not "wrong", but it does not obscure the data as well, since you can see which chunks belong together. I.e. it provides no real obscuring of the file size, if that is your goal.

If you reverse the chunker and the crypt so..
you mount the chunker, and point it to your crypt
and point the crypt to pcloud
then you will get
OS --> chunker --> crypt --> pcloud
then the files will be split first, then encrypted, resulting in raw files that are heavily obscured and cannot be recognized as groups (each chunk name is encrypted as a whole, so the names look unrelated and no extension reveals that a chunker was used).
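To make the effect of the ordering concrete, here is a toy sketch of how the stored names come out in each arrangement. The `toy_encrypt_name` function is a hypothetical stand-in built on SHA-256 and is not rclone's real filename encryption; only the `.rclone_chunk.NNN` suffix format is taken from the listing earlier in the thread.

```python
import hashlib


def toy_encrypt_name(name):
    # Hypothetical stand-in for crypt's filename encryption (NOT rclone's algorithm).
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:16]


def chunk_names(name, count):
    # Suffix format copied from the listing in this thread.
    return [f"{name}.rclone_chunk.{i:03d}" for i in range(1, count + 1)]


def stored_names_crypt_over_chunker(name, count):
    # OS --> crypt --> chunker --> cloud:
    # the name is encrypted first, then the chunker appends its visible suffix.
    return chunk_names(toy_encrypt_name(name), count)


def stored_names_chunker_over_crypt(name, count):
    # OS --> chunker --> crypt --> cloud:
    # the chunker names each chunk first, then every chunk name is encrypted whole.
    return [toy_encrypt_name(chunk) for chunk in chunk_names(name, count)]


if __name__ == "__main__":
    print("crypt over chunker:")
    for stored in stored_names_crypt_over_chunker("photo.jpg", 3):
        print("  ", stored)
    print("chunker over crypt:")
    for stored in stored_names_chunker_over_crypt("photo.jpg", 3):
        print("  ", stored)
```

In the first arrangement all names share one prefix and a readable suffix; in the second every stored name is a different opaque string.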

Here is what that sort of setup would look like:

[pcloud]
type = pcloud
token = XXXXXX

[pcloudCrypt]
type = crypt
remote = pcloud:
filename_encryption = standard
directory_name_encryption = true
password = XXX
password2 = XXX

[pcloudCryptChunk]
type = chunker
remote = pcloudCrypt:
chunk_size = 100k
hash_type = md5

To use all three chained you would use pcloudCryptChunk to mount or interact with.
Note the physical order of the remotes in the config does not matter. I just rearranged them to make traffic easier to visualize. Traffic from OS to cloud moves from bottom to top. Then the traffic from cloud to OS moves the other way, in reverse: top to bottom.
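Since the order of sections in the file does not matter, the only thing that defines the chain is each section's `remote =` value. Below is a minimal sketch that follows those references from the remote you mount down to the storage backend. It assumes references are written as `name:` or `name:path` (a value without a colon is treated as a local path and ends the chain), and the function name `remote_chain` is purely illustrative.

```python
import configparser

# Same layout as the example config above, with references written as "name:".
EXAMPLE_CONFIG = """
[pcloud]
type = pcloud
token = XXXXXX

[pcloudCrypt]
type = crypt
remote = pcloud:
filename_encryption = standard
directory_name_encryption = true
password = XXX
password2 = XXX

[pcloudCryptChunk]
type = chunker
remote = pcloudCrypt:
chunk_size = 100k
hash_type = md5
"""


def remote_chain(config_text, start):
    """Return [(remote_name, type), ...] from the mounted remote down to the backend."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(config_text)
    chain = []
    seen = set()
    name = start
    while name is not None and parser.has_section(name) and name not in seen:
        seen.add(name)  # guard against accidental reference loops
        section = parser[name]
        chain.append((name, section.get("type", "")))
        target = section.get("remote", "")
        # Only "name:" or "name:path" points at another remote in this sketch.
        name = target.split(":", 1)[0] if ":" in target else None
    return chain


if __name__ == "__main__":
    for layer, kind in remote_chain(EXAMPLE_CONFIG, "pcloudCryptChunk"):
        print(f"{layer} ({kind})")
```

Reading the result top to bottom gives the OS-to-cloud direction: chunker first, then crypt, then the pcloud backend.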

I hope I am not misteaching you. I have not extensively used the chunker myself yet, and I don't like it when darth disagrees with me, because I know he typically knows his stuff. Let me know if there are any special rules regarding chunker ordering that I am not aware of, darth.

darthShadow · November 2, 2019, 5:01pm

Excellent explanation. I don't think there should be any problems with the proposed solution.

lawmanuk · November 2, 2019, 5:02pm

works perfectly now.

In my mind it still makes more sense to crypt the chunked-remote, but it works perfectly to chunk the crypted-remote.

Just have to accept that my mind is wrong.

Thanks all

lawmanuk · November 2, 2019, 5:07pm

btw...

is there any way to have:

1. Chunks of 1M for everything
2. This includes all smaller files getting joined together into that 1M
3. Larger files would naturally get chunked down to 1M, with remaining excess of file perhaps less than 1M.

That way every single file would be 1M, except tail end of large files.

It's (2) that I don't think rclone can do at present, but I may be wrong?

thestigma · November 2, 2019, 5:49pm

Not yet unfortunately.

For that we will need some form of "merger" backend that combines smaller files transparently. This is something I plan to make a proposal for at some point, because it would have massive benefits for performance on clouds where the number of file accesses per second is limited (which is a lot of them).

I would also advise you generally that chunks that are too small may be detrimental to performance. 1M chunks would be REALLY bad on something like a Gdrive that can only initiate 2-3 new transfers per second. Slightly less bad on a cloud service that can do many more transfers, but still... not exactly ideal.

Since each small chunk works as its own transfer, it is going to need 1 API request each, and there will be a latency penalty for each new chunk that is requested. So downloading a 10GB file now becomes a massive operation involving roughly 10,000 API calls (which you may pay for depending on provider) + roughly 10,000 small pauses interspersed amongst your various TCP transfers. Lots of connections will mask that to some degree, but with such small chunks it would still have an impact.
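The arithmetic behind that estimate can be sketched as follows. It assumes one request per chunk and uses the "2-3 new transfers per second" figure quoted above as a rough rate limit; both function names are illustrative, and real transfers also spend time moving data, so this is only a lower bound on startup overhead.

```python
import math


def chunk_count(file_size, chunk_size):
    """Number of chunks needed for a file (the last chunk may be shorter)."""
    return math.ceil(file_size / chunk_size)


def min_start_seconds(chunks, new_transfers_per_second):
    """Lower bound on time spent just starting transfers at a given rate limit."""
    return chunks / new_transfers_per_second


if __name__ == "__main__":
    GIB = 1024 ** 3
    MIB = 1024 ** 2
    chunks = chunk_count(10 * GIB, 1 * MIB)
    print("chunks / API calls:", chunks)
    # 3 transfer starts per second is the optimistic end of the figure quoted above.
    print("seconds spent starting transfers:", round(min_start_seconds(chunks, 3)))
```

The same function reproduces the earlier listing: a 524448-byte file with `chunk_size = 100k` gives five full chunks plus one short one.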

Do note that crypted files are not the exact same size as the original file. They are slightly longer, and I think the padding is not a static size (I could be wrong), so this should make it at least non-trivial to accurately determine the size of the file (though it will be approximately the same).
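On the size question, a rough sketch may help. As far as I understand the file format described in rclone's crypt documentation, each encrypted file has a 32-byte header and the data is split into blocks of up to 64 KiB, each carrying a 16-byte authenticator. Treat those three constants as assumptions to check against the docs; if they hold, the overhead depends only on the original size, so the original size can be worked back from the stored size.

```python
# Constants as I understand rclone's crypt file format (assumptions, verify in the docs).
HEADER_BYTES = 32          # magic string plus nonce
BLOCK_DATA_BYTES = 64 * 1024
BLOCK_TAG_BYTES = 16       # per-block authenticator


def encrypted_size(plain_size):
    """Estimated stored size of a crypt-encrypted file of plain_size bytes."""
    blocks = -(-plain_size // BLOCK_DATA_BYTES)  # ceiling division
    return HEADER_BYTES + plain_size + blocks * BLOCK_TAG_BYTES


if __name__ == "__main__":
    for size in (1, 102400, 1024 * 1024):
        stored = encrypted_size(size)
        print(f"{size} bytes -> {stored} bytes (+{stored - size})")
```

Under these assumptions a full 100k chunk always encrypts to the same stored size, so equal-sized chunks remain recognizable as equal-sized even though their names are hidden.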

Maybe a "padder" backend would be the ideal solution for this specific concern, where files could be appended with some junk data before encryption to appear as a certain minimum size (the same size as your chunker chunks ideally). It would waste some storage space, but should not be hard to implement I think. Maybe you can go make a feature suggestion issue on something like that if you want:

Gerry33 · November 24, 2019, 5:18pm

Hi all,
I followed this discussion with great interest as I think this is a very common setup.
My setup is like the one outlined by @thestigma.

However I made one addition, which causes weird behaviour:

I would like to create another subtarget at the remote [pcloud]
e.g:
rclone sync pcloud:SomeSubDir …

That gives me the chance to create remote crypted subdirs.
However when doing so, lots of errors are produced:

2019-11-24 18:07:55 ERROR : nnnn.jpg.rclone_chunk.001..tmp_1574615275: Couldn't move: Copy call failed: 409
2019-11-24 18:07:55 ERROR : nnnn.jpg.rclone_chunk.001..tmp_1574615275: Failed to remove temporary chunk: 404
2019-11-24 18:07:55 ERROR : nnnn.jpg: Failed to copy: Copy call failed: 409

So what's wrong with this setup ?

thestigma · November 24, 2019, 5:39pm

I'm not sure what you are trying to do here.
A sync command needs both a source and a destination

Could you show me your full config? (please redact any sensitive info like clientID, clientsecret, token and any crypt keys).

Will check back after dinner

Gerry33 · November 24, 2019, 6:01pm

"Will check back after dinner"
Dinner is more important than this technical stuff ! Enjoy it !

Anyway, my command above was incomplete, sorry:
Here is what I want to achieve:

rclone sync SomeSrcDirA pcloud:SomeSrcDirA …
rclone sync SomeSrcDirB pcloud:SomeSrcDirB …

SomeSrcDirA and ..B have nothing in common.

That allows me to use the same target for all operations into one remote target, but still distinguish them on the remote.

Thanks for taking care.
Gerry

thestigma · November 24, 2019, 6:52pm

Yes that is very reasonable and there is no reason that should not work.
So does this problem only occur when you use a subdirectory and not otherwise? If so that is strange and I suspect some sort of simple syntax problem.

Is pcloud bucket-based?
non-bucket based remotes (like for example Gdrive) use this format:
Gdrive:directory/filename.txt
while bucket-based remotes (like Google Cloud) use this format:
Gcloud:bucket/directory/filename.txt

So I think that you may be either missing a ":" in the path (if it is not bucket based) or confusing a bucket with a folder (if it is bucket based). Unlike folders, buckets typically have to be pre-made to be used, rather than generated and deleted on the fly by rclone.
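A small sketch of how such a path string breaks down may make the two failure modes easier to spot. It is a simplification that ignores Windows drive letters such as `C:`, the function names are illustrative, and it assumes the `remote:path` form with forward slashes.

```python
def split_remote(path):
    """Split 'remote:path' into (remote, path).

    A string without ':' has no remote part in this sketch and is
    returned as (None, path), i.e. it would be read as a local path.
    Simplification: Windows drive letters like 'C:' are not handled.
    """
    if ":" not in path:
        return (None, path)
    remote, rest = path.split(":", 1)
    return (remote, rest)


def split_bucket(rest):
    """For bucket-based remotes, the first path component is the bucket."""
    bucket, _, key = rest.partition("/")
    return (bucket, key)


if __name__ == "__main__":
    print(split_remote("Gdrive:directory/filename.txt"))
    print(split_remote("SomeSubDir"))  # no colon: treated as local
    remote, rest = split_remote("Gcloud:bucket/directory/filename.txt")
    print(remote, split_bucket(rest))
```

A missing colon shows up as `None` for the remote, and on a bucket-based remote the first component after the colon is what has to exist beforehand.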

Does that make any sense to you? I think pcloud may be bucket-based from what I read, but I have never used it myself so I am not certain.

ncw · November 25, 2019, 11:39am

I think that should work...

409 errors are conflict errors. I'm not sure what would cause those. Could it be that the intermediate file name nnnn.jpg.rclone_chunk.001..tmp_1574615275 is too long for pcloud?

Gerry33 · November 25, 2019, 12:54pm

Thanks Nick for your response.

"Could it be that the intermediate file name nnnn.jpg.rclone_chunk.001..tmp_1574615275 is too long for pcloud?"

I'm not using 'pcloud' as remote target, rather I use WEBDAV. (testing instance on WIN10 using a TOMCAT-server)

The paths are really very long when encrypted, but the underlying NT file system should be able to handle that.

I will check the OS and server limits again.

Interesting facts:

All uploads succeeded
When issuing the same command a second time, no problems.

I will test again.

Thanks again for your kind assistance !
Gerry