Encrypted chunker

That would chunk the crypted files, with the ...rclone_chunk.001 part of the filenames not being encrypted.

I wanted to encrypt the full filenames even after chunking, so I chunked first, then crypted, as thestigma also seems to suggest?

But my plan didn't work properly, as the ..rclone_chunk.001 part of the filename still isn't encrypted; only the initial part of the filename is.

[newuser@manjaro 20llts55ck6ugvfsi69steddkk]$ ls -all
total 516
drwxr-xr-x 5 newuser newuser   4096 Oct 31 22:29 .
drwxr-xr-x 3 newuser newuser   4096 Oct 31 22:29 ..
-rw-r--r-- 1 newuser newuser     76 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk
-rw-r--r-- 1 newuser newuser 102400 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk.rclone_chunk.001
-rw-r--r-- 1 newuser newuser 102400 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk.rclone_chunk.002
-rw-r--r-- 1 newuser newuser 102400 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk.rclone_chunk.003..tmp_1572560873
-rw-r--r-- 1 newuser newuser 102400 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk.rclone_chunk.004..tmp_1572560873
-rw-r--r-- 1 newuser newuser 102400 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk.rclone_chunk.005
-rw-r--r-- 1 newuser newuser  12448 Jul 13  2018 ek7hllu7go400aneadiatucdrofg7oc97v90qmv75asrmdcm74n6ejh6cs7rlf7o89hm91i6fkmtk.rclone_chunk.006
drwxr-xr-x 2 newuser newuser   4096 Oct 31 22:29 l5ef8n4b3r6e4h5oi47qm56c30
drwxr-xr-x 2 newuser newuser   4096 Oct 31 22:29 olt373rtb4bmgj8sri7nd494f0
drwxr-xr-x 3 newuser newuser   4096 Oct 31 22:29 v5s938sjhbj040ga3r1i31l808

Example is above.

  1. I don't want the ...rclone_chunk.001 part of the filename to be unencrypted.
  2. I would prefer each subsequent chunk to have a different encrypted name.
  3. I'm not sure why there are *.tmp files either.

What you will see and what will actually be stored on the remote are total opposites. With the suggested method of CloudRemote -> Crypt -> Chunker -> OS (User), what will actually be stored on the remote are encrypted chunks.

What I've shown above is the actual remote, as it is mounted.

I'm encrypting the chunks, so I don't expect to see the file naming shown above.

Sorry, I realize now I may have made a wrong assumption.

You said you checked how the files looked - what remote did you use to look at them?

I tested this with the pcloud backend. It is now mounted and the file naming is as listed above.

OK, then I don't think I misunderstood.

The configuration you have shown does not match what you are describing.
I suspect you are thinking about it in reverse.

In the end you will be mounting the crypt (with your current config):
your crypt points to the chunker,
and your chunker points to the pcloud remote.

That will result in
OS --> crypt --> chunker --> pcloud
That will make encrypted chunks, but since the encryption is on the "inside" and the chunking is on the "outside", you will see the chunking format in the raw files.
This is not "wrong", but it does not obscure the data as well, since you can see which chunks are grouped together. I.e. it provides no real obscuration of the file sizes, if that is your goal.
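With your current config, the chain presumably looks something like this (remote names here are just illustrative - the key point is that the crypt points at the chunker):

[pcloud]
type = pcloud
token = XXXXXX

[pcloudChunk]
type = chunker
remote = pcloud
chunk_size = 100k
hash_type = md5

[pcloudChunkCrypt]
type = crypt
remote = pcloudChunk
filename_encryption = standard
directory_name_encryption = true
password = XXX
password2 = XXX

Mounting pcloudChunkCrypt then produces exactly what you listed: encrypted names, but with visible .rclone_chunk.NNN suffixes on the raw remote.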

If you reverse the chunker and the crypt, so that
you mount the chunker and point it to your crypt,
and point the crypt to pcloud,
then you will get
OS --> chunker --> crypt --> pcloud
The files will then be split first, then encrypted - resulting in raw files that are well obscured and cannot be recognized as groups (unique filenames due to the salt, and no extensions revealing that a chunker was used).

Here is what that sort of setup would look like:

[pcloud]
type = pcloud
token = XXXXXX

[pcloudCrypt]
type = crypt
remote = pcloud
filename_encryption = standard
directory_name_encryption = true
password = XXX
password2 = XXX

[pcloudCryptChunk]
type = chunker
remote = pcloudCrypt
chunk_size = 100k
hash_type = md5

To use all three chained, you would mount or otherwise interact with pcloudCryptChunk.
Note that the physical order of the remotes in the config does not matter. I just rearranged them to make the traffic easier to visualize. Traffic from the OS to the cloud moves from bottom to top, and traffic from the cloud to the OS moves the other way, in reverse - top to bottom.
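For example, you could then mount it or sync into it like this (the mount point and local path here are just illustrative):

rclone mount pcloudCryptChunk: /mnt/pcloud
rclone sync /home/newuser/Documents pcloudCryptChunk:Documents

Either way the data passes through the whole chunker -> crypt -> pcloud chain.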

I hope I am not misinforming you. I have not used the chunker extensively myself yet, and I don't like it when darth disagrees with me, because I know he typically knows his stuff. Let me know if there are any special rules regarding chunker ordering that I am not aware of, darth :slight_smile:

Excellent explanation. I don't think there should be any problems with the proposed solution.

Works perfectly now.

In my mind it still makes more sense to crypt the chunked-remote, but it works perfectly to chunk the crypted-remote.

Just have to accept that my mind is wrong.

Thanks all

btw...

Is there any way to have:

  1. Chunks of 1M for everything
  2. This includes all smaller files getting joined together into that 1M
  3. Larger files would naturally get chunked down to 1M, with the remaining tail of the file perhaps being less than 1M.

That way every single file would be 1M, except for the tail end of large files.

It's (2) that I don't think rclone can do at present, but I may be wrong?
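I assume (1) and (3) on their own would just be a matter of raising chunk_size on the chunker from the config above, something like (untested):

[pcloudCryptChunk]
type = chunker
remote = pcloudCrypt
chunk_size = 1M
hash_type = md5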

Not yet unfortunately.

For that we will need some form of "merger" backend that combines smaller files transparently. This is something I have some plans of making a proposal for at some point, because it would have massive benefits for performance on clouds where file accesses per second are limited (which is a lot of them).

I would also generally advise you that chunks that are too small may be detrimental to performance. 1M chunks would be REALLY bad on something like a Gdrive that can initiate 2-3 new transfers per second. Slightly less bad on a cloud service that can do many more transfers - but still... not exactly ideal.

Since each small chunk works as its own transfer, it's going to need 1 API request each, and there will be a latency penalty for each new chunk that is requested. So downloading a 10GB file now becomes a massive operation involving 10,000 API calls (which you may pay for depending on provider) + 10,000 small pauses interspersed amongst your various TCP transfers. Lots of parallel connections will mask that to some degree, but with such small chunks it would still have an impact.
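As a rough illustration, using the 2-3 new transfers per second figure from above and ignoring the actual data transfer time:

10 GB / 1 MB per chunk = 10,000 chunks
10,000 chunks / ~3 chunks started per second ≈ 3,300 seconds ≈ 55 minutes

of pure per-chunk startup overhead, before counting the bytes actually moved.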

Do note that crypted files are not the exact same size as the original file. They are slightly longer - and I think the padding is not a static size (I could be wrong), so this should make it at least non-trivial to accurately determine the size of the file (though it will be approximately the same).

Maybe a "padder" backend would be the ideal solution for this specific concern, where files could be padded with some junk data before encryption to appear as a certain minimum size (the same size as your chunker chunks, ideally). It would waste some storage space, but should not be hard to implement I think. Maybe you can make a feature suggestion issue for something like that if you want.

Hi all,
I followed this discussion with great interest as I think this is a very common setup.
My setup is like the one outlined by @thestigma.

However, I made one addition, which causes weird behaviour:

I'd like to create another sub-target on the remote [pcloud], e.g.:
rclone sync pcloud:SomeSubDir …

That gives me the chance to create remote crypted subdirs.
However when doing so, lots of errors are produced:

2019-11-24 18:07:55 ERROR : nnnn.jpg.rclone_chunk.001..tmp_1574615275: Couldn't move: Copy call failed: 409
2019-11-24 18:07:55 ERROR : nnnn.jpg.rclone_chunk.001..tmp_1574615275: Failed to remove temporary chunk: 404
2019-11-24 18:07:55 ERROR : nnnn.jpg: Failed to copy: Copy call failed: 409

So what's wrong with this setup?

I'm not sure what you are trying to do here.
A sync command needs both a source and a destination.

Could you show me your full config? (Please redact any sensitive info like clientID, client secret, token and any crypt keys.)

Will check back after dinner :slight_smile:

Will check back after dinner
Dinner is more important than this technical stuff! Enjoy it!

Anyway, my command above was incomplete, sorry:
Here is what I want to achieve:

rclone sync SomeSrcDirA pcloud:SomeSrcDirA …
rclone sync SomeSrcDirB pcloud:SomeSrcDirB …

SomeSrcDirA and ..B have nothing in common.

That allows me to use the same remote as the target for all operations, but still keep them separate on the remote.

Thanks for taking care of this.
Gerry

Yes, that is very reasonable and there is no reason it should not work.
So does this problem only occur when you use a subdirectory, and not otherwise? If so, that is strange and I suspect some sort of simple syntax problem.

Is pcloud bucket-based?
Non-bucket-based remotes (for example Gdrive) use this format:
Gdrive:directory/filename.txt
while bucket-based remotes (like Google Cloud) use this format:
Gcloud:bucket/directory/filename.txt

So I think you may either be missing a ":" in the path (if it is not bucket-based) or confusing a bucket with a folder (if it is bucket-based). Unlike folders, buckets typically have to be pre-made to be used, rather than generated and deleted on the fly by rclone.

Does that make any sense to you? I think pcloud may be bucket-based from what I read, but I have never used it myself so I am not certain.
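In other words, if it turns out not to be bucket-based, a sync into a subdirectory through the chained remote would look something like this (remote name taken from my earlier example config - substitute your own):

rclone sync /local/SomeSrcDirA pcloudCryptChunk:SomeSrcDirA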

I think that should work...

409 errors are conflict errors. I'm not sure what would cause those. Could it be that the intermediate file name nnnn.jpg.rclone_chunk.001..tmp_1574615275 is too long for pcloud?

Thanks Nick for your response.

Could it be that the intermediate file name nnnn.jpg.rclone_chunk.001..tmp_1574615275 is too long for pcloud?

I'm not using 'pcloud' as the remote target; rather I use WebDAV (a testing instance on WIN10 using a Tomcat server).

The paths are really very long when encrypted, but the underlying NT file system should be able to handle that.

I will check the OS and server limits again.

Interesting facts:

  • All uploads succeeded
  • When issuing the same command a second time, no problems.

I will test again.

Thanks again for your kind assistance !
Gerry

I wonder if it is a temporary glitch then...

Unfortunately not :slightly_frowning_face:
When cleaning the remote and starting over from scratch, the same errors occur.

By 'issuing the same command a second time' I meant another 'rclone sync ...'.
And even with changed local files, that sync then works and uploads without any problems. Very weird?!

I'll try with a simple "filesystem copy" as the final target (instead of WebDAV) when I'm at home.

Ok, I think I've got it!
It's a WebDAV server issue. I changed to another WebDAV server and there it's working fine.

I have no idea what exactly the cause was, as the Tomcat server does not report any errors in its log files. Maybe I have to adapt the log level or whatever.
My suspicion is that the WebDAV properties are not fully implemented, or only in a limited way.

Anyway,
thank you for your support and a big thanks for this wonderful tool.
A donation via GitHub will come.

Regards
Gerry

Glad you figured it out!

WebDAV isn't a wonderfully standardised protocol, alas, so there are often strangenesses when working with it.