Encryption and file size

Hello,
I would like to ask you to consider file size obfuscation as a feature of the encryption options that rclone already offers.
I think it is a very important feature, because knowing the size of specific files could make it easier to break the other encryption features.
Thanks for your incredible work.


Hi n3urale,

I am not a crypto expert, but I do know that it is very difficult to implement secure encryption correctly. It just takes a small unnoticeable mistake to make it relatively easy to break for somebody with the will and resources.

I therefore don’t think that file size obfuscation will make it harder to break; it may instead increase complexity and thereby introduce an implementation bug that makes it easier to break.

To me, the rclone encryption is a quick and simple privacy protection from the prying eyes of some storage providers and the average hacker. I would not rely on rclone encryption if access to the content was attractive to somebody determined and resourceful.

There could of course be situations where you consider the size of your files a privacy issue. If this is your situation then I guess you can combine encryption with the chunker remote (note: it is in beta).

Hi, thanks for your reply.
I am pretty sure that crypto is hard to implement flawlessly, and that the simpler the features, the easier the implementation, but I think that
1 a public cloud provider can easily find, among all the files saved on its systems by different users, a set of files that have exactly the same size
2 given an encrypted file with a known size, using the size as a clue can greatly increase the chance of matching it with an unencrypted version shared by someone else.
From there to rebuilding the secret keys that would allow the cloud provider to open all the encrypted files, I think the step is short. I hope that someone can refute this statement.
Thanks for the suggestion of using chunker; I did not know this feature and it can be an improvement, but I am afraid it is not the solution.
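
To make the size concern concrete, here is a minimal sketch of how the plaintext size maps to the encrypted size. It assumes the layout described in the rclone crypt docs (a 32-byte file header, then 64 KiB data blocks that each carry a 16-byte Poly1305 authenticator); the function names are made up for illustration and are not rclone's API.

```python
# Assumed layout, taken from the rclone crypt docs (not verified against the source here):
HEADER_SIZE = 32              # 8-byte magic string + 24-byte nonce
BLOCK_DATA_SIZE = 64 * 1024   # plaintext bytes per block
BLOCK_OVERHEAD = 16           # Poly1305 authenticator per block


def crypt_encrypted_size(plain_size: int) -> int:
    """Hypothetical helper: encrypted size for a given plaintext size."""
    if plain_size < 0:
        raise ValueError("size must be non-negative")
    blocks = -(-plain_size // BLOCK_DATA_SIZE)  # ceiling division
    return HEADER_SIZE + plain_size + blocks * BLOCK_OVERHEAD


def crypt_plain_size(encrypted_size: int) -> int:
    """Hypothetical inverse: recover the plaintext size from the encrypted size."""
    payload = encrypted_size - HEADER_SIZE
    if payload < 0:
        raise ValueError("too small to be an encrypted file")
    blocks = -(-payload // (BLOCK_DATA_SIZE + BLOCK_OVERHEAD))
    return payload - blocks * BLOCK_OVERHEAD


for size in (0, 1, 65536, 65537):
    enc = crypt_encrypted_size(size)
    print(size, "->", enc, "->", crypt_plain_size(enc))
```

Under these assumptions the mapping is deterministic and reversible, so anyone who sees the stored object can recover the exact plaintext size. That reveals the size only; it says nothing about the key.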

This sounds like somebody determined and resourceful looking for attractive/unwanted content, so this statement applies (with and without filesize obfuscation):

…and I would go to great lengths to keep these data away from that somebody.

I doubt it makes a real-life difference, do you have a reference to support this?

a public cloud provider is usually a deep-pocketed company with a lot of internal technical expertise, so i think this covers the "resourceful" part.
the fact that people use rclone to store big volumes of files (e.g. movies) on their cloud space can, i think, make them also "determined".
what i describe is referenced here and is known as a known-plaintext attack. i do not know if the encryption algorithm used by rclone is susceptible to it (today), but i think it's a good measure to make it less likely that it can be used in any case.

Exactly, that is my point.

That is why I wouldn’t use them for content where it may be attractive (=profit) to them to use their resources for search and decryption.

From the docs: “File content encryption is performed using NaCl SecretBox, based on XSalsa20 cipher and Poly1305 for integrity.”

Do you have a reference to a successful known-plaintext attack on XSalsa20? Or a calculation of the resources needed to do so?

no, i do not, but i like a "better safe than sorry" approach, file size obfuscation is not unprecedented and even if today an attack of this kind on the standards used by rclone were not likely, in 5 years... who knows, so better safe than sorry.

maybe i missed something... but what is the alternative? whoever sells you space can have an interest in understanding what you are saving on their infrastructure: to save space, to use your info, or because local law forces them to prevent pedophilia (see the recent apple "we scan all your pictures on apple devices to find out who is a pedophile")... storing a file whose content is publicly known can always happen, so why leave the opportunity?

Using a cloud service that has no interest in your content. Remember that they are companies that need to earn more than they spend. They would never spend $100 of CPU power to find information that is worth less than $200 to them. As an example: information that can be used for advertising has more value to Google than to Backblaze.

... or you can use your own server(s).

Personally, I prefer to stay on good terms with my service providers and to leave my data and activities somewhat accessible to law enforcement (with a subpoena) to avoid suspicion of being a paedophile, drug dealer, terrorist etc.

If you are like most of us, then by far the easiest way to get hold of your unencrypted content is to compromise your phone or computer (through the browser).

i do not agree with any part of your answer: i do not have the resources or expertise to run a bat cave with my own servers, and it's not (only) a matter of security but also of preventing data loss (redundancy, backup, monitoring)... if you are not the ceo of a cloud provider you do not know what is "of value" to those companies, and even if you are today... as usual, in 5 years the strategy can change or the company can change ceo.

i prefer to comply with contracts, be a good payer, and have my confidential information stay confidential (even pictures of my family and/or of my ass). if there is a reasonable suspicion that i am doing anything illegal, the law enforcement authorities can come to me and ask for or force access, and i can decide whether to provide it according to my rights. i do not want my data to be accessed without my knowledge just because the piece of software i used to encrypt it has a bug or is missing an easily imaginable security feature.

defending the devices i use to run rclone is another matter that does not concern rclone or its features; giving the user the best chance that the cloud provider cannot decrypt the stored data is a matter that does concern rclone.

No modern crypto system is susceptible to a known plaintext attack. That goes for rclone too :slight_smile:

You can read about the cryptanalysis of XSalsa20 on Wikipedia: Salsa20 - Wikipedia

But to summarise: XSalsa20 doesn't have any significant breaks.


hi,

double encrypt using two remotes.
for example, aws s3 and some of its clones support client side encryption.

[remote]
type = s3
provider = Wasabi
access_key_id = xxx
secret_access_key = xxx
endpoint = s3.us-east-2.wasabisys.com
sse_customer_algorithm = AES256
sse_customer_key = 11111111111111111111

[crypt]
type = crypt
remote = remote:bucket
filename_encryption = standard
directory_name_encryption = true
password = 
password2 = 

thanks, that is good to know.
in any case i hope you will accept the feature, even if probably not the most urgent.
thanks for your work

sorry, i did not understand what you are suggesting. from the conf you posted it looks like a standard encryption to me.

using a config like that a file would be encrypted twice.

  1. rclone crypts the file, client-side, using rclone encryption.
  2. rclone crypts that crypted file, client-side, using the aws s3 go module, which uses aes-256
  3. rclone uploads to wasabi.
    to be accurate, crypts/uploads chunks of a file, not the entire file.

notice from the remote config for wasabi, an s3 clone, that the file is client-side encrypted using aes-256

FWIW, some silly numbers...

As @ncw mentioned, there's no known plain text attack on a properly executed encryption. Further it's believed that symmetric encryption is quantum resistant. So someone wanting to decrypt data would have to do a brute force attack.

I'm gonna choose AES here, 'cos that gives really fast numbers 'cos of the AES-NI instruction set. On an Opteron chip it only takes around 30 cycles to do AES, so a 3 GHz clock would give 10^8 AES operations per second. Let's round up to a billion (10^9).

Let's also say we have a million cores all dedicated to this. That's 10^15 AES operations per second! Let's round up; 2^50 operations per second.

Now AES 256 has 256 bit keys. The best known attack on AES is Grover's algorithm which can reduce the problem space to the square root. So that means we've got 2^128 operations.

2^128/2^50 = 2^78. That's how many seconds it'd take these million cores to try and break the average AES256 encryption. That works out at 9,583,696,565,945,500 years.
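
The arithmetic above can be checked with a few lines. This is only a sketch of the same back-of-the-envelope estimate; it assumes a 365-day year, which is what reproduces the quoted figure, and the variable names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: 365-day year

ops_needed = 2**128       # search space after the square-root reduction
ops_per_second = 2**50    # a million cores at ~10^9 AES ops/s each, rounded up

seconds = ops_needed // ops_per_second   # 2^78 seconds
years = seconds / SECONDS_PER_YEAR

print(f"seconds: 2^{seconds.bit_length() - 1}")
print(f"years:   {years:,.0f}")
```

This gives roughly 9.58 × 10^15 years, in line with the number quoted above.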

Really it doesn't matter how many machines you throw at it; 10 million, 100 million, a million million cores... we're still in billions of years.

No CSP is gonna do that.

Attacks on AES are generally attacks on the key generation process. And typically that may rely on a human entering a password into a key derivation function. If your password is "hello" then that's the attack point :slight_smile:
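
To put rough numbers on the password point, here is a small sketch comparing search spaces. The alphabet sizes, lengths and helper name are illustrative assumptions, and it ignores dictionary attacks (which would find "hello" even faster) as well as the slowdown a key derivation function adds per guess.

```python
import math


def keyspace_bits(alphabet_size: int, length: int) -> float:
    """Bits of search space for a random string over the given alphabet."""
    return length * math.log2(alphabet_size)


weak_bits = keyspace_bits(26, 5)      # five lowercase letters, e.g. "hello"
strong_bits = keyspace_bits(62, 22)   # 22 random characters from [A-Za-z0-9]

print(f"5 lowercase letters:     {26**5:,} candidates (~{weak_bits:.1f} bits)")
print(f"22 alphanumeric chars:   ~{strong_bits:.0f} bits")
print("AES-256 key:             256 bits")
```

A five-letter lowercase password leaves only about 12 million candidates to try, so the attacker never needs to touch the cipher itself.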

So I wouldn't worry about CSPs breaking the encryption on client side encrypted data. It's just not that important to them!

(This sort of logic is also why governments want to ban or backdoor end-to-end encryption chat systems like Signal or WhatsApp; you just can't brute force it, but if you design in a weakness then you don't need to!)


thanks for the reply. i do not know wasabi and i am probably missing something: are you saying that double encryption makes it (even) less likely for someone to deduce the encryption key even if the plain text of a file is known, or that this double encryption in some way changes the size (or some of the sizes of the chunks) of the file saved on the cloud?
in any case double encryption looks like an increased effort on the client that can lead to some performance issues.

thanks for the numbers, it is really comforting that known plaintext cannot greatly help in breaking the keys.
in any case, i still think that file size obfuscation is a good feature to implement to increase confidentiality: if a single file with known size can only hint at the content, a directory with many files of known size can clearly point to the nature of the content (e.g. a tv series, where each file is an episode).
i really hope you will consider it.
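
For illustration only, one common form of size obfuscation is padding every file up to a bucket boundary. The sketch below is hypothetical and is not an rclone feature or option; the function name, the 1 MiB bucket and the example sizes are all assumptions.

```python
def padded_size(size: int, bucket: int = 1024 * 1024) -> int:
    """Round a size up to the next multiple of `bucket` (at least one bucket)."""
    if size < 0 or bucket <= 0:
        raise ValueError("size must be >= 0 and bucket must be > 0")
    buckets = max(1, -(-size // bucket))  # ceiling division, minimum 1
    return buckets * bucket


# Made-up episode sizes in bytes: distinct originals can collapse
# into the same padded size, which blurs exact-size matching.
episodes = [367_001_600, 367_526_912, 367_900_000]
for size in episodes:
    print(size, "->", padded_size(size))
```

The trade-off is wasted storage and bandwidth, and padding only coarsens the signal: a directory of similarly sized large files would still look like a directory of similarly sized large files.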

for myself, i use rclone crypt as is.
you seem to want an additional layer of protection, using free space, which rclone does not support.
so i was sharing a way for that additional layer.

wasabi is an s3 clone, and like aws s3, can encrypt files using aes-256, client-side.

in my testing, double encryption does not lead to any noticeable performance issues.
most cpus have hardware support for aes.

another case of double encryption is my use of keepass

  1. keepass is encrypted using aes-256
  2. stored on a bitlocker volume
  3. backed up to a rclone crypt.

in the end, as sweh pointed out, the weak link is me, the human/monkey.

I just wanted to add that if you use the chunker remote, which I think is a good idea if you are concerned about this, make sure to chunk BEFORE you encrypt. Otherwise, your file names will be <encrypted>.rclone_chunk.<chunk> and that is easily backed out. If you chunk before encrypting, they will be scrambled.

You also probably want to use a shorter name as the file names will get super long with what I suggested.
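
A toy sketch of why the order matters for file names. It assumes chunker's default `*.rclone_chunk.###` naming starting at 001, and it uses a truncated SHA-256 as a stand-in for crypt's filename encryption; that stand-in is NOT the real algorithm, and the helper names are hypothetical.

```python
import hashlib


def toy_encrypt_name(name: str) -> str:
    """Stand-in for crypt's filename encryption (illustration only)."""
    return hashlib.sha256(name.encode()).hexdigest()[:16]


def chunk_names(name: str, chunks: int) -> list:
    """Mimic chunker's assumed default naming: <name>.rclone_chunk.###"""
    return [f"{name}.rclone_chunk.{i:03d}" for i in range(1, chunks + 1)]


# Encrypt first, then chunk: the chunk suffix stays readable on the storage.
encrypt_then_chunk = chunk_names(toy_encrypt_name("movie.mkv"), 3)

# Chunk first, then encrypt: the whole chunk name is scrambled.
chunk_then_encrypt = [toy_encrypt_name(n) for n in chunk_names("movie.mkv", 3)]

print(encrypt_then_chunk)
print(chunk_then_encrypt)
```

In the first list the chunks of one file are trivially grouped by their shared prefix and numbered suffix; in the second, nothing in the names ties them together.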