This makes for deterministic encryption which is what we want - the same filename must encrypt to the same thing otherwise we can’t find it on the cloud storage system.
This means that
- filenames with the same name will encrypt the same
- filenames which start the same won’t have a common prefix
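To make the two properties above concrete, here is a minimal sketch. It uses HMAC-SHA256 as a keyed, deterministic stand-in; that is an assumption made so the example runs on the standard library alone, and it is not what crypt does (HMAC is not reversible, whereas crypt uses a real cipher). The key and the `obscure_name` helper are hypothetical.

```python
import hashlib
import hmac
from base64 import b32encode

# Hypothetical key for illustration; crypt derives its key from the user's password.
KEY = hashlib.sha256(b"example password").digest()


def obscure_name(name: str, key: bytes = KEY) -> str:
    """Deterministic keyed stand-in for name encryption (HMAC, so NOT reversible)."""
    tag = hmac.new(key, name.encode("utf-8"), hashlib.sha256).digest()
    # Lowercase, unpadded base32 so the result is safe on case-insensitive remotes.
    return b32encode(tag).decode("ascii").rstrip("=").lower()


a = obscure_name("report-2023.txt")
b = obscure_name("report-2023.txt")
c = obscure_name("report-2024.txt")

print(a == b)        # the same name always maps to the same output
print(a[:8], c[:8])  # names that start the same do not share an output prefix
```

The point is only the behaviour: no salt or randomness is involved, so the name can be recomputed locally and looked up on the remote without listing anything first.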
What I’m wondering is if, instead, we can generate a salt to encrypt the filename and store it in encrypted form along with the file (in the form <filename>.salt.bin for files and a .salt.bin file for directories). That way, we get stronger encryption of the file and folder names.
Of course, I presume this would lead to slower syncs if these files were only stored in the remote - I’m wondering if, as an option, we could store the salts in ~/.local/share/rclone/crypt/ or something so that we could read the salts without having to go to the remote.
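A rough sketch of the per-file salt idea, to show where the extra cost comes from. Everything here is an assumption for illustration: a temporary directory stands in for the remote, HMAC stands in for the name encryption, the salt is stored raw rather than encrypted, and `put` / `find` are hypothetical helpers, not rclone functions.

```python
import hashlib
import hmac
import os
import tempfile
from pathlib import Path
from typing import Optional, Tuple

KEY = hashlib.sha256(b"example password").digest()  # hypothetical key
SIDE = ".salt.bin"


def salted_name(name: str, salt: bytes, key: bytes = KEY) -> str:
    """Keyed stand-in for encrypting `name` under a per-file salt."""
    return hmac.new(key, salt + name.encode("utf-8"), hashlib.sha256).hexdigest()[:32]


def put(remote: Path, name: str, data: bytes) -> str:
    """Store a file under a salted name plus a <stored>.salt.bin sidecar."""
    salt = os.urandom(16)
    stored = salted_name(name, salt)
    (remote / stored).write_bytes(data)
    # The proposal stores the salt in encrypted form; it is left raw here for brevity.
    (remote / (stored + SIDE)).write_bytes(salt)
    return stored


def find(remote: Path, name: str) -> Tuple[Optional[str], int]:
    """Locate `name`; also return how many salt files had to be read."""
    reads = 0
    for sidecar in sorted(remote.glob("*" + SIDE)):
        reads += 1
        stored = sidecar.name[: -len(SIDE)]
        if salted_name(name, sidecar.read_bytes()) == stored:
            return stored, reads
    return None, reads


with tempfile.TemporaryDirectory() as tmp:
    dir_a, dir_b = Path(tmp, "a"), Path(tmp, "b")
    dir_a.mkdir()
    dir_b.mkdir()
    stored_a = put(dir_a, "Documents", b"x")
    stored_b = put(dir_b, "Documents", b"x")  # same name, different salt
    put(dir_a, "notes.txt", b"y")
    found, salts_read = find(dir_a, "Documents")
    missing, _ = find(dir_a, "absent.txt")

print(stored_a != stored_b, found == stored_a, salts_read)
```

Because the stored name can no longer be computed from the plaintext name alone, every lookup has to read salt files first. Keeping a local copy of the salts would remove those remote reads, at the price of a cache that has to stay in sync.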
There are probably a ton of other things to think through - this just came to my mind when I was reading through the crypt docs to get a better understanding of how it works.
To additionally speed it up, you could also store the original names in an encrypted <filename>.name.bin file (which stores the path however you deal with it in the crypt code), which means you can always recover the encrypted name given what you have on the remote and also (mostly) guarantees unique encrypted names even when the unencrypted name is something super common (say, Documents).
After re-reading the way this works, it seems I misunderstood the way the filename encryption works.
I’m more than happy to help code it, but I figured it might be valuable posting this idea here to see if people think it’s a good idea or if it’s overkill (we seem to have one vote in favor of the latter).
I thought max filename limits apply to the encrypted version? Say you have dir1/dir2/file.ext. We generate a salt each for dir1, dir2, and file.ext and encrypt dir1saltdir1, dir2saltdir2, and file.extsaltfile. As I understand it, given that we’re base32-encoding the resulting string, it shouldn’t matter that we’re appending the salt to dir1 and the rest, right?
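For a feel of the numbers, here is a small sketch that estimates the length of an encrypted path segment. It assumes, as I read the crypt docs, that each segment is PKCS#7-padded to a multiple of 16 bytes, encrypted with a length-preserving cipher, and then base32-encoded without padding. The 16-character salt is a hypothetical choice.

```python
from base64 import b32encode

BLOCK = 16  # assumed cipher block size in bytes


def encrypted_len(segment: str) -> int:
    """Estimated length of one encrypted, base32-encoded path segment."""
    raw = len(segment.encode("utf-8"))
    padded = (raw // BLOCK + 1) * BLOCK  # PKCS#7 always adds 1..16 bytes
    # The ciphertext is as long as the padded input, so encoding that many
    # zero bytes gives the right length without needing a real cipher.
    return len(b32encode(bytes(padded)).rstrip(b"="))


salt = "0123456789abcdef"  # hypothetical 16-character salt

for name in ("dir1", "file.ext"):
    print(name, encrypted_len(name), encrypted_len(name + salt))
```

Under these assumptions `dir1` goes from 26 to 52 encoded characters once the salt is appended, so the salt does count against the remote's name length limit.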
Also you want to create one extra file per real file? Drive uploads are already slow with small files. If I understand what your proposal is, you’d effectively double (if you’re doing this per directory that would triple or quadruple) the number of files which would considerably slow sync times. Also you then need to download 2 pieces of information for each file.
Maybe I’m not understanding what you’re proposing and what exactly you’re trying to solve.
Yeah…my initial proposal was based on a flawed understanding of the way filename encryption currently works. Given that things are encrypted with the key derived from the user’s password + IV, this probably isn’t an issue.
Huh, I wasn’t aware of that. I’ve only really used rclone with B2, which is fine for small uploads (still a larger overhead than with larger files, but speeds are good enough that it’s not really an issue).
All of this being said, it totally makes sense to disregard this.
From a purist’s point of view, the deterministic (or ECB) encryption of file names isn’t ideal.
Does it weaken the cipher? Absolutely not. The cipher is good against a known plaintext attack (like all modern ciphers), so if an attacker guessed a few filenames by analysing the crypted names, then they’ve guessed a few filenames and nothing else.
We could generate a salt and store it with the filename. Other encrypted file systems use a hash of the directory path as the IV for the file name encryption. That works quite well. I decided not to do that with rclone as I wanted files to be moveable in hierarchies without breaking their file names. We could store one salt per directory - that would work.
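The trade-off with a directory-derived IV can be sketched like this. HMAC is again a stand-in for the real name encryption, and `dir_iv` / `name_in_dir` are hypothetical helpers; the only point is that the output depends on the parent path.

```python
import hashlib
import hmac

KEY = hashlib.sha256(b"example password").digest()  # hypothetical key


def dir_iv(dir_path: str) -> bytes:
    """IV derived from a hash of the plaintext directory path."""
    return hashlib.sha256(dir_path.encode("utf-8")).digest()[:16]


def name_in_dir(dir_path: str, name: str, key: bytes = KEY) -> str:
    """Keyed stand-in for encrypting `name` under the directory's IV."""
    msg = dir_iv(dir_path) + name.encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()[:32]


original = name_in_dir("photos/2023", "cat.jpg")
again = name_in_dir("photos/2023", "cat.jpg")
moved = name_in_dir("photos/archive", "cat.jpg")

print(original == again)  # still deterministic within one directory
print(original == moved)  # but a move changes the stored name
```

Names stay deterministic inside a directory, yet the same file name encrypts differently elsewhere, so a move needs a rename of the encrypted name (and of everything below a moved directory) rather than a plain server-side move.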
There have been lots of ideas about how to make crypt more secure and if we go for one of those then I’d probably want to roll them up into something more like an index file per directory which would have:
- full file names (so they don’t get truncated)
- hashes of the plaintext
- name of the file in the directory
Then each file in the directory could just be stored as a UUID or something like that.
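As a sketch only, a per-directory index along those lines might look like the following. The field names, the JSON layout and the helpers are all hypothetical, and a real index would of course be encrypted before upload.

```python
import hashlib
import json
import uuid
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class IndexEntry:
    full_name: str         # untruncated plaintext name
    plaintext_sha256: str  # hash of the plaintext contents
    stored_name: str       # opaque name of the object in the directory


def add_file(index: List[IndexEntry], full_name: str, data: bytes) -> IndexEntry:
    """Record a file in the index under a fresh UUID-based stored name."""
    entry = IndexEntry(full_name, hashlib.sha256(data).hexdigest(), uuid.uuid4().hex)
    index.append(entry)
    return entry


def lookup(index: List[IndexEntry], full_name: str) -> Optional[str]:
    """Map a plaintext name to the stored object name, if present."""
    for entry in index:
        if entry.full_name == full_name:
            return entry.stored_name
    return None


def dump_index(index: List[IndexEntry]) -> str:
    """Serialise the index; this is the blob that would be encrypted and uploaded."""
    return json.dumps([asdict(e) for e in index], indent=2)


index: List[IndexEntry] = []
long_name = "x" * 300 + ".txt"  # longer than many remotes allow for a name
entry = add_file(index, long_name, b"hello")

print(len(entry.stored_name), lookup(index, long_name) == entry.stored_name)
```

The stored name has a fixed length regardless of how long the plaintext name is, which is what removes the truncation problem.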
It might be better to allow multiple index files per directory to act as a log, which rclone would garbage collect every now and again.
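One way to picture the log idea: each index file is an ordered list of operations, and garbage collection replays them all and writes back a single file. The operation format and the `replay` / `compact` helpers are assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

# (action, full_name, stored_name); action is "put" or "delete"
Op = Tuple[str, str, str]


def replay(segments: List[List[Op]]) -> Dict[str, str]:
    """Apply every log segment in order to get the current directory state."""
    state: Dict[str, str] = {}
    for segment in segments:
        for action, full_name, stored_name in segment:
            if action == "put":
                state[full_name] = stored_name
            elif action == "delete":
                state.pop(full_name, None)
    return state


def compact(segments: List[List[Op]]) -> List[List[Op]]:
    """Garbage collect: replace all segments with one holding only live entries."""
    live = replay(segments)
    return [[("put", name, stored) for name, stored in sorted(live.items())]]


segments: List[List[Op]] = [
    [("put", "a.txt", "id-1"), ("put", "b.txt", "id-2")],
    [("delete", "a.txt", ""), ("put", "c.txt", "id-3")],
    [("put", "b.txt", "id-4")],  # b.txt was re-uploaded
]
compacted = compact(segments)

print(replay(segments))
print(len(compacted), len(compacted[0]))
```

Appending a small segment per change avoids rewriting the whole index on every upload, and compaction keeps the number of files to read bounded.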
I’d probably want to embed the info within the file too so that rclone could regenerate the indexes if things went wrong.