Encrypted chunker

Hi,

I want to be able to do this:

  1. create remote1
  2. create remote2 (i.e. chunker for remote1)
  3. create remote3 (i.e. encrypt name/content/directory for remote2)

I expected to find:

  1. chunks with filenames encrypted
  2. each chunk's filename would look different from the others (due to salt).
  3. each chunk extension name would be encrypted too (ie. ...rclone_chunk.001 would be encrypted).

What I found

  1. yes
  2. no. the filenames looked exactly the same. I thought this shouldn't happen if salted properly?
  3. no. I could see ...rclone_chunk.001 etc. even though I thought the encryption happened after the chunking.

Please advise how to achieve what I was seeking.

Thanks.

You are no doubt linking your remotes like this:
OS --> crypt --> chunker --> cloudremote

that means the encrypting is happening first, then files get chunked (this is why they share names and have extensions).

You want
OS --> chunker --> crypt --> cloudremote

Go check your config file and look at what it says in:
remote =

the chunker should point to the name of your crypt
the crypt should point to the name of your cloudremote

questions?
If confused, post your config and I will fix it for you (but make sure to redact any sensitive info, like clientID, clientsecret, token and crypt keys).

[pcld-ab]
type = pcloud
token = XXXXXX

[chnk-pcld-ab]
type = chunker
remote = pcld-ab:rchnk
chunk_size = 100k
hash_type = md5

[crpt-chnk-pcld-ab]
type = crypt
remote = chnk-pcld-ab:rchkcr
filename_encryption = standard
directory_name_encryption = true
password = XXX
password2 = XXX

I did the cloud remote first.
then chunked it
then crypted the chunked remote

Seems to be in same order you've suggested?

thanks

It should be cloudremote, then crypt, then chunker.

that would chunk the crypted files, with ...rclone_chunk.001 filenames not being encrypted.

I wanted to encrypt the full filenames, even after chunking, so chunked first, then crypted as thestigma also seems to suggest?

But my plan didn't work properly, as the ..rclone_chunk.001 filename part still isn't encrypted. Only the initial encrypted filename.

[newuser@manjaro 20llts55ck6ugvfsi69steddkk]$ ls -all
total 516
drwxr-xr-x 5 newuser newuser   4096 Oct 31 22:29 .
drwxr-xr-x 3 newuser newuser   4096 Oct 31 22:29 ..
-rw-r--r-- 1 newuser newuser     76 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk
-rw-r--r-- 1 newuser newuser 102400 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk.rclone_chunk.001
-rw-r--r-- 1 newuser newuser 102400 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk.rclone_chunk.002
-rw-r--r-- 1 newuser newuser 102400 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk.rclone_chunk.003..tmp_1572560873
-rw-r--r-- 1 newuser newuser 102400 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk.rclone_chunk.004..tmp_1572560873
-rw-r--r-- 1 newuser newuser 102400 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk.rclone_chunk.005
-rw-r--r-- 1 newuser newuser  12448 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk.rclone_chunk.006
drwxr-xr-x 2 newuser newuser   4096 Oct 31 22:29 l5ef8n4b3r6e4h5oi47qm56c30
drwxr-xr-x 2 newuser newuser   4096 Oct 31 22:29 olt373rtb4bmgj8sri7nd494f0
drwxr-xr-x 3 newuser newuser   4096 Oct 31 22:29 v5s938sjhbj040ga3r1i31l808

Example is above.

  1. Don't want the ...rclone_chunk.001 filename to be unencrypted.
  2. Would prefer each subsequent chunk to have a different encrypted name.
  3. Not sure why there are *.tmp files too

What you will see and what will be actually stored in the remote are total opposites. With the suggested method of CloudRemote -> Crypt -> Chunker -> OS (User), what actually will be stored in the remote are encrypted chunks.

What I've shown above is the actual remote, as it's mounted.

I'm encrypting the chunks, so I didn't expect to see the file naming above.

Sorry, I realize now I may have made a wrong assumption.

When you checked how the files looked, what remote did you use to look at them?

I tested this with pcloud backend. This has now been mounted and the file naming is as listed above.

Ok, then I did not misunderstand I think.

The configuration you have shown does not match what you are describing.
I suspect you are thinking about it in reverse.

In the end you will be mounting the crypt (with your current config).
your crypt points to the chunker
and your chunker points to the pcloud remote.

That will result in
OS --> crypt --> chunker --> pcloud
That will make encrypted chunks, but since the encryption is on the "inside" and the chunking is on the "outside", you will see the chunking format in the raw files.
This is not "wrong", but it does not obscure the data as well, since you can see which chunks are grouped together. I.e. it provides no real obscuration of the size, if that is your goal.
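To check which order a config actually produces, you can follow the `remote =` pointers from the remote you mount down to the storage. Below is a minimal Python sketch that does this for the config posted earlier in the thread. The helper name `trace_chain` is made up for illustration, and it assumes every wrapper has a single `remote` key of the form `name:path`.

```python
import configparser

# The config as posted earlier in the thread (secrets already redacted).
CONFIG = """
[pcld-ab]
type = pcloud
token = XXXXXX

[chnk-pcld-ab]
type = chunker
remote = pcld-ab:rchnk
chunk_size = 100k
hash_type = md5

[crpt-chnk-pcld-ab]
type = crypt
remote = chnk-pcld-ab:rchkcr
filename_encryption = standard
directory_name_encryption = true
password = XXX
password2 = XXX
"""


def trace_chain(config_text, mounted):
    """Return the backend types from the mounted remote down to the storage.

    Hypothetical helper: follows each section's `remote = name:path` pointer
    until it reaches a section without one (the actual cloud backend).
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    chain = []
    name = mounted
    while name in parser:
        section = parser[name]
        chain.append(section["type"])
        target = section.get("remote")
        if target is None:
            break  # reached the base remote
        name = target.split(":", 1)[0]  # drop the path after the colon
    return chain


chain = trace_chain(CONFIG, "crpt-chnk-pcld-ab")
print(" --> ".join(["OS"] + chain))
```

Mounting `crpt-chnk-pcld-ab` puts crypt closest to the OS and the chunker closest to pcloud, which is why the chunk suffixes show up in the raw files.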

If you reverse the chunker and the crypt so..
you mount the chunker, and point it to your crypt
and point the crypt to pcloud
then you will get
OS --> chunker --> crypt --> pcloud
Then the files will be split first, then encrypted, resulting in raw files that are very obscured and cannot be recognized as groups (unique filenames due to salt, and no extensions revealing that a chunker was used).
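A toy model may help show why the order matters for the names that end up on the cloud. This is only a sketch: `toy_encrypt_name` is a hash-based stand-in and NOT rclone's real filename encryption, and the function names are invented for illustration. Only the `.rclone_chunk.NNN` suffix pattern is taken from the listing above.

```python
import hashlib


def toy_encrypt_name(name):
    # Stand-in for crypt's filename encryption. NOT rclone's real algorithm,
    # just something that makes the whole name unrecognisable.
    return hashlib.sha256(name.encode()).hexdigest()[:16]


def chunk_names(name, count):
    # Mimics the chunk naming seen in the listing: <name>.rclone_chunk.NNN
    return [f"{name}.rclone_chunk.{i:03d}" for i in range(1, count + 1)]


def crypt_then_chunker(name, count):
    # OS --> crypt --> chunker --> cloud:
    # the name is encrypted first, then the chunker appends a plain suffix.
    return chunk_names(toy_encrypt_name(name), count)


def chunker_then_crypt(name, count):
    # OS --> chunker --> crypt --> cloud:
    # the chunker names the parts first, then each full name is encrypted.
    return [toy_encrypt_name(part) for part in chunk_names(name, count)]


print("crypt mounted, chunker below it:")
for stored in crypt_then_chunker("video.mkv", 3):
    print("  ", stored)

print("chunker mounted, crypt below it:")
for stored in chunker_then_crypt("video.mkv", 3):
    print("  ", stored)
```

In the first case every stored name shares a prefix and carries a readable chunk suffix. In the second case the suffix is part of what gets encrypted, so each stored name looks unrelated to the others.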

Here is what that sort of setup would look like:

[pcloud]
type = pcloud
token = XXXXXX

[pcloudCrypt]
type = crypt
remote = pcloud:
filename_encryption = standard
directory_name_encryption = true
password = XXX
password2 = XXX

[pcloudCryptChunk]
type = chunker
remote = pcloudCrypt:
chunk_size = 100k
hash_type = md5

To use all three chained you would use pcloudCryptChunk to mount or interact with.
Note that the physical order of the remotes in the config does not matter. I just rearranged them to make traffic easier to visualize. Traffic from OS to cloud moves from bottom to top. Traffic from cloud to OS moves the other way, from top to bottom.

I hope I am not misteaching you. I have not extensively used the chunker myself yet, and I don't like it when darth disagrees with me, because I know he typically knows his stuff. Let me know if there are any special rules regarding chunker ordering that I am not aware of darth :slight_smile:

Excellent explanation. I don't think there should be any problems with the proposed solution.

works perfectly now.

In my mind it still makes more sense to crypt the chunked-remote, but it works perfectly to chunk the crypted-remote.

Just have to accept that my mind is wrong.

Thanks all

btw...

is there any way to have:

  1. Chunks of 1M for everything
  2. This includes all smaller files getting joined together into that 1M
  3. Larger files would naturally get chunked down to 1M, with remaining excess of file perhaps less than 1M.

That way every single file would be 1M, except tail end of large files.

It's (2) that I don't think rclone can do at present, but I may be wrong?

Not yet unfortunately.

For that we will need some form of "merger" backend that combines smaller files transparently. This is something I plan to make a proposal for at some point, because it would have massive benefits for performance on clouds where file accesses per second are limited (which is a lot of them).

I would also advise you generally that chunks that are too small may be detrimental to performance. 1M chunks would be REALLY bad on something like a Gdrive that can initiate 2-3 new transfers per second. Slightly less bad on a cloud service that can do many more transfers, but still... not exactly ideal.

Since each small chunk works as its own transfer, it is going to need 1 API request each, and there will be a latency penalty for each new chunk that is requested. So downloading a 10GB file now becomes a massive operation involving roughly 10,000 API calls (which you may pay for depending on provider) plus 10,000 small pauses interspersed amongst your various TCP transfers. Lots of connections will mask that to some degree, but with such small chunks it would still have an impact.
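To put rough numbers on that, here is a small Python sketch of the arithmetic. The rate of 2.5 new transfers per second is an assumption taken from the middle of the 2-3 figure mentioned above for Gdrive, and the function names are made up for illustration. It ignores the actual data transfer time and any parallelism.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def chunk_count(file_size, chunk_size):
    """Number of chunks (and so roughly the number of transfers) for one file."""
    return -(-file_size // chunk_size)  # ceiling division on integers


def startup_seconds(chunks, transfers_per_second):
    """Time spent only on initiating transfers at a given rate limit."""
    return chunks / transfers_per_second


ASSUMED_RATE = 2.5  # new transfers per second, an assumed Gdrive-like limit

for size in (1 * MIB, 64 * MIB, 1 * GIB):
    n = chunk_count(10 * GIB, size)
    minutes = startup_seconds(n, ASSUMED_RATE) / 60
    print(f"{size // MIB:>5} MiB chunks: {n:>6} transfers, "
          f"~{minutes:.1f} min just starting them")
```

With binary units a 10 GiB file at 1 MiB per chunk comes to 10240 chunks, in line with the roughly 10,000 figure above.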

Do note that crypted files are not the exact same size as the original file. They are slightly longer, and I think the padding is not a static size (I could be wrong), so this should make it at least non-trivial to accurately determine the size of the file (though it will be approximately the same).
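For what it's worth, here is a sketch of how the stored size could be estimated, assuming the file layout as I understand it from the rclone crypt documentation: a 32-byte header, then the data in blocks of up to 64 KiB, each with a 16-byte authenticator. Treat those constants as assumptions and check them against the docs for your rclone version. The function name is made up.

```python
HEADER_SIZE = 32          # assumed: 8-byte magic string + 24-byte nonce
BLOCK_DATA_SIZE = 64 * 1024  # assumed: plaintext bytes per encrypted block
BLOCK_OVERHEAD = 16       # assumed: authenticator bytes added per block


def encrypted_size(plain_size):
    """Estimate the stored size of a crypted file under the assumed layout."""
    blocks = -(-plain_size // BLOCK_DATA_SIZE)  # ceiling division
    return HEADER_SIZE + plain_size + blocks * BLOCK_OVERHEAD


print(encrypted_size(1))        # 49 under these assumptions
print(encrypted_size(100 * 1024))
```

If that layout holds, the overhead grows with the file but depends only on its length, so the stored size would still give away the original size fairly precisely.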

Maybe a "padder" backend would be the ideal solution for this specific concern, where files could be padded with some junk data before encryption to appear as a certain minimum size (ideally the same size as your chunker chunks). It would waste some storage space, but should not be hard to implement I think. Maybe you can go make a feature suggestion issue on something like that if you want: