Develop custom "crypt" remote to be used with a hardware security module

matteof93 · August 13, 2020, 2:40pm

Hello, I am working on a project involving a hardware security module (HSM) that comes with an open source SDK. The idea is to modify the RClone source code so that it automatically uses the HSM to encrypt the files uploaded to remotes (e.g. Google Drive).

I have never used RClone before. I have just started learning about its features, which is how I found out about the "crypt" remote.

My idea is to duplicate the code of the "crypt" remote, creating a new remote that works like "crypt" but uses the HSM to provide stronger security (i.e. no passwords are needed for the remote, because the keys are physically stored in the HSM).

Do you think this can be done without a gigantic effort? My fear is that developing a new remote could involve a lot of problems hidden in the source code; I don't think that working only on the code in \backend\crypt\ is enough.

sweh · August 13, 2020, 3:44pm

Are you talking about using the HSM to actually encrypt the data, or just store the secrets in the HSM?

If just storing the secrets then you could write a wrapper that spoke to the HSM and set environment variables that contained the necessary data. rclone would then use those variables and you don't need to store them in the config file.

See https://rclone.org/docs/#config-file for simple examples.
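A minimal sketch of such a wrapper, assuming a hypothetical `hsm-get-secret LABEL` command that prints a stored secret on stdout (substitute whatever CLI your HSM vendor provides). rclone can build a remote entirely from `RCLONE_CONFIG_<NAME>_<OPTION>` environment variables; the remote name `SECURE`, the underlying `gdrive:encrypted` path and the secret labels are all illustrative. crypt expects its `password` and `password2` values in obscured form, hence the pass through `rclone obscure -`, which reads the password from stdin.

```shell
#!/bin/sh
# Hypothetical wrapper: pull crypt secrets out of an HSM and hand them to rclone
# through environment variables, so they never appear in rclone.conf.
# ASSUMPTION: "hsm-get-secret LABEL" is a placeholder for your HSM vendor's CLI
# and prints the secret stored under LABEL on stdout.
HSM_GET_SECRET=${HSM_GET_SECRET:-hsm-get-secret}

export_crypt_env() {
    # Define a crypt remote called SECURE without touching the config file.
    RCLONE_CONFIG_SECURE_TYPE=crypt
    RCLONE_CONFIG_SECURE_REMOTE="gdrive:encrypted"   # illustrative wrapped remote
    # crypt stores passwords obscured, so run each secret through "rclone obscure -".
    pw=$("$HSM_GET_SECRET" crypt-password) || return 1
    RCLONE_CONFIG_SECURE_PASSWORD=$(printf '%s\n' "$pw" | rclone obscure -) || return 1
    salt=$("$HSM_GET_SECRET" crypt-salt) || return 1
    RCLONE_CONFIG_SECURE_PASSWORD2=$(printf '%s\n' "$salt" | rclone obscure -) || return 1
    export RCLONE_CONFIG_SECURE_TYPE RCLONE_CONFIG_SECURE_REMOTE \
        RCLONE_CONFIG_SECURE_PASSWORD RCLONE_CONFIG_SECURE_PASSWORD2
}

# When called with arguments, load the secrets and hand everything to rclone.
if [ "$#" -gt 0 ]; then
    export_crypt_env && exec rclone "$@"
fi
```

Invoked as, say, `./hsm-rclone copy /home/plaintext SECURE:backup` (script name hypothetical), the secrets exist only in the environment of that one rclone process.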

matteof93 · August 13, 2020, 4:06pm

I also want to use the HSM to encrypt the files before uploading them to cloud services. The HSM already has built-in capabilities to encrypt and sign files, providing authentication and integrity. Therefore the entire security layer provided by the "crypt" remote would be replaced by the HSM.

What I want to achieve is basically the ability to encrypt files with the HSM automatically before uploading them to a remote (decryption when downloading is not required).
I thought that creating something similar to the "crypt" remote would have been the best option, but it is probably too complicated.

The best option could be to work directly on the experimental GUI, adding a layer there that uses the HSM to encrypt the files before the command is actually sent to RClone. The same layer would then be responsible for decryption and other features I'd like to implement (e.g. using the HSM for SSO towards cloud services instead of storing those credentials in RClone).

sweh · August 13, 2020, 4:37pm

Since there's lots of different types of HSM, and they don't all support the same command sets, maybe a different approach would work.

A "program" remote that can call any external program to act as a pipe. This theoretical backend would be configured something like

[TST]
type = program
path = /path/to/mycode
remote = RealBackend:

Now mycode can take stdin, process it (eg send to the HSM for encryption/decryption, depending on the command line parameter) and return the results on stdout.

This could be very flexible because it would allow for multiple integrations (eg someone on AWS could call KMS; someone on Azure could call Azure Key Vault; someone with a Gemalto HSM could call that; a FutureX HSM...)

An extremely dumb example (that just base64 encodes the content) would be:

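#!/bin/sh
# Toy filter: reads data on stdin and writes the transformed data to stdout
case $1 in
    upload) base64 ;;      # "encrypt" on the way to the backend
  download) base64 -d ;;   # "decrypt" on the way back
         *) echo "Unknown option: $1" >&2; exit 1 ;;
esac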

This backend should be a lot simpler to write.

Unless @ncw can think of any objections!

matteof93 · August 14, 2020, 8:33am

Thanks. This "program" remote is a good idea; however, how does it interact with other remotes?
I mean: first do something (e.g. call an external program to encrypt the files specified by a command such as "rclone move"), then actually perform the action requested by the command on the real remote (any remote supported by RClone, e.g. Google Drive, Dropbox, Mega).

The workflow should be something like this:

1. The user issues a command on the "program" remote, e.g. rclone copy /home/plaintext program_remote:encrypted_backup
2. The "program" remote encrypts the files in /home/plaintext
3. The "program" remote issues the copy command from step 1 on the actual remote (e.g. Google Drive)

I found this page about "alias" remotes: https://rclone.org/alias/
So, if I understand correctly, the "program" remote is nothing more than an alias for any other remote I want to use (what you called "remote = RealBackend:" in your post), except that it lets me run an external program (what you called "path = /path/to/mycode") before performing any action on the real backend. Is this correct?

sweh · August 14, 2020, 10:35am

The backend remote [RealBackEnd] would handle the communication with the cloud storage (eg google drive, onedrive, S3) and, as previously discussed, that can be configured by environment variables set up by a call to the HSM before you run rclone.

Your script wouldn't talk to the cloud storage directly. It's just a filter that reads from stdin and writes to stdout.

rclone opens file -> TST encrypts stdin using "mycode" -> RealBackEnd sends to cloudstorage

The example "base64" script I gave would be a complete functional example. You can see it doesn't call back to rclone at all. It's inserted into the middle of the data flow.
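Until something like this exists, roughly the same data flow can be approximated one file at a time with commands rclone already has: `rclone rcat` uploads whatever it reads on stdin, and `rclone cat` writes a remote file to stdout. A sketch, assuming a filter with the same `upload`/`download` contract as the base64 script, plus the `/path/to/mycode` path and `RealBackend:` remote from the example config:

```shell
#!/bin/sh
# Approximate the proposed "program" remote with existing rclone commands.
# ASSUMPTION: FILTER follows the same contract as the base64 example:
#   FILTER upload   < plaintext  > ciphertext
#   FILTER download < ciphertext > plaintext
FILTER=${FILTER:-/path/to/mycode}
REMOTE=${REMOTE:-RealBackend:}

# Encrypt a local file on the fly and stream it to the remote.
# NOTE: in plain sh a pipeline reports only the last command's status, so a
# filter failing mid-stream can leave a truncated upload (bash: set -o pipefail).
put_filtered() { # $1 = local file, $2 = destination path on the remote
    "$FILTER" upload < "$1" | rclone rcat "$REMOTE$2"
}

# Stream a remote file back and decrypt it into a local file.
get_filtered() { # $1 = path on the remote, $2 = local destination file
    rclone cat "$REMOTE$1" | "$FILTER" download > "$2"
}
```

For example `put_filtered myfile myfile` and later `get_filtered myfile myfile.restored`. A real "program" backend would also cover listing, directories and sync, which this does not.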

ncw · August 14, 2020, 1:38pm

A whole remote is a lot of work.

I think people usually only store keys in HSMs, don't they? So a minimal implementation would be to use the HSM to store the crypt keys instead of storing them in the config file. That would be a significant upgrade in security.

If you want the HSM to do the encryption of the data you'll be bottlenecked by the HSM encryption speed. If you are OK with that then it would be relatively easy to swap out the block encryption (the data is encrypted in ~64k chunks) and send that via the HSM.
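To put a rough number on that bottleneck: if every ~64 KiB block (65,536 bytes) needs its own round trip to the HSM, the number of HSM calls per file is the file size divided by the block size, rounded up. A small sketch, assuming exactly one HSM call per data block:

```shell
#!/bin/sh
# Number of HSM round trips needed to encrypt a file block by block.
# ASSUMPTION: one HSM call per 64 KiB data block, matching the ~64k chunks above.
BLOCK_SIZE=65536

hsm_calls() { # $1 = file size in bytes
    # Ceiling division: a partial final block still needs its own call.
    echo $(( ($1 + BLOCK_SIZE - 1) / BLOCK_SIZE ))
}
```

`hsm_calls 1073741824` prints 16384, so a 1 GiB file needs 16,384 HSM calls; at a purely hypothetical 5 ms per call that is roughly 82 seconds of HSM time before any network transfer.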

The filename encryption is completely different, and I probably wouldn't touch that as it has some properties which are hard to come by...

matteof93 · August 14, 2020, 2:14pm

Yes, the idea is to keep the keys stored in the HSM. However, I have already developed two libraries (for the HSM) that implement a Key Management System (KMS) and an encryption layer that also provides integrity and authentication. These libraries work together: the encryption layer generates encrypted files in a special format that the KMS can read to identify which key was used to encrypt each file, so the user does not have to keep track of any low-level details.

I would like to use that encryption layer to encrypt files to be uploaded to a remote, e.g.:
rclone copy sample.txt mydrive:encrypted

At this point, I would like to use the HSM to encrypt sample.txt with my libraries, then copy the ciphertext to the remote as a normal file. From RClone's point of view it would simply be a binary file, so there is no need for the features of the "crypt" remote (e.g. name encryption or obfuscation), because my library already encrypts the file name.

I know this could easily be done with two terminal commands instead of one (one to encrypt the file with the HSM and one to upload the encrypted file), but I would like an all-in-one solution: I use RClone normally, and it automatically uses the HSM to encrypt everything sent to a specific remote.
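Short of writing a backend, the two commands can at least be folded into one script. A sketch, assuming a hypothetical `hsm-encrypt FILE` command wrapping the encryption library described above, which writes the ciphertext and prints its path; the destination `mydrive:encrypted` is the one from the example:

```shell
#!/bin/sh
# Hypothetical all-in-one upload: encrypt with the HSM, then copy the ciphertext.
# ASSUMPTION: "hsm-encrypt FILE" is a placeholder for a CLI around your library;
# it writes the ciphertext and prints its path on stdout.
HSM_ENCRYPT=${HSM_ENCRYPT:-hsm-encrypt}

encrypt_and_copy() { # $1 = plaintext file, $2 = rclone destination directory
    enc=$("$HSM_ENCRYPT" "$1") || { echo "HSM encryption failed: $1" >&2; return 1; }
    [ -f "$enc" ] || { echo "no ciphertext at: $enc" >&2; return 1; }
    # rclone only ever sees the ciphertext, as an opaque binary file.
    rclone copy "$enc" "$2"
}

# e.g.: ./hsm-copy sample.txt mydrive:encrypted
if [ "$#" -eq 2 ]; then
    encrypt_and_copy "$1" "$2"
fi
```

rclone itself stays unmodified; the script name `hsm-copy` is hypothetical.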

ncw · August 14, 2020, 2:21pm

It sounds like you might be better off with your own backend then.

What you want to do is make a wrapping backend (like crypt) which passes through most of the operations except for Put/Update/Open which are the read and write primitives.

If you weren't bothered about all the rclone tests passing that would be quite simple.

You can keep it out of tree and run it as a plugin (see CONTRIBUTING.md for details).

matteof93 · August 14, 2020, 2:40pm

The example does not work: even if I add the [TST] remote to the config file, there is still no "program" remote type. Also, being a total RClone noob, I still don't understand how the script is executed. Should it be executed by the "program" remote?

matteof93 · August 14, 2020, 2:45pm

Thanks, I will try with my own backend.

So Put/Update/Open are what read and write data from/to the wrapped backend (instead of the wrapped backend's own functions being called directly)?

ncw · August 14, 2020, 2:55pm

You implement Update to receive the data written to your backend, encrypt it, and pass the encrypted data to the wrapped backend's Update. Same with Open, except you read the data from the wrapped backend with its Open, decrypt it and return it.

sweh · August 14, 2020, 3:30pm

Remember this is just an idea; there's no actual remote of this type.

The idea is that if you did

rclone copy myfile TST:

then rclone would open myfile. It would then run mycode (because of the path in the config file), pass the data on stdin, read the results from stdout, and then pass that onto the [RealBackEnd] remote, similar to how crypt and chunker pass data to a backend remote.

Obviously this would need to be written before it could work.

gelsas · August 15, 2020, 10:58am

A feature like this would be really great to have!

ncw · August 16, 2020, 9:43am

You could almost certainly store the password to the rclone config using --password-command, providing there was a command line utility for getting secrets out of the HSM.
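A sketch of such a helper, assuming the HSM vendor ships a CLI that can print a stored secret (here the hypothetical `hsm-get-secret LABEL`). rclone runs the `--password-command` command and reads the config password from its stdout, so the helper only needs to print the secret and fail loudly otherwise:

```shell
#!/bin/sh
# hsm-rclone-pass: hypothetical helper for rclone's --password-command.
# rclone runs it and reads the config password from stdout.
# ASSUMPTION: "hsm-get-secret LABEL" stands in for your HSM vendor's CLI.
HSM_GET_SECRET=${HSM_GET_SECRET:-hsm-get-secret}

print_secret() { # $1 = label of the secret inside the HSM
    pw=$("$HSM_GET_SECRET" "$1") || { echo "HSM lookup failed: $1" >&2; return 1; }
    # Never hand rclone an empty password.
    [ -n "$pw" ] || { echo "HSM returned an empty secret: $1" >&2; return 1; }
    printf '%s\n' "$pw"
}

if [ "$#" -gt 0 ]; then
    print_secret "$1"
fi
```

Wired up as, e.g., `rclone --password-command "hsm-rclone-pass rclone-config" lsd mydrive:`, where `hsm-rclone-pass` and the `rclone-config` label are hypothetical names.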

matteof93 · September 2, 2020, 2:13pm

@ncw I am experimenting with the GUI instead of writing a new backend. Basically, I am customizing the GUI so that it allows me to create a remote with the option to use my HSM to encrypt the files before uploading them. The GUI offers the possibility to choose which file you want to upload, so I simply need to retrieve the path of the selected file (in the GUI), pass it to my encryption library and then replace the name of the file with the name of the encrypted one before the GUI sends the request to RClone.

The problem I have is that I cannot retrieve the full path of the file inside the GUI (for security reasons the browser does not allow you to get the full path but only the file name). Do you have any idea about a possible workaround? Thanks

ncw · September 2, 2020, 5:13pm

Where are you getting the file name from? From the rclone rc interface? If it is a local file then you should be able to get the full path fairly easily.

matteof93 · September 3, 2020, 6:15am

No, I am getting the file name directly from the GUI. I have downloaded the source code of the GUI and I am modifying it in order to use my HSM. If I remember correctly the source code of the GUI where I retrieve the file name is called uploadfilemodal.js or something like that (it is the source code of the new uploadfile feature).

Since I get the file name from the GUI source code, the browser does not let me retrieve the absolute path of the file, for security reasons. I think I could replace the drag&drop part and the select-file dialog box with a simple call to a library implemented in another language (e.g. C#) that opens a select-file dialog box giving me access to the absolute path of the file I want to upload, and then pass that value back to the GUI.

ncw · September 3, 2020, 4:37pm

Rclone returns the full path of the file to the gui. It uses operations/list - if you follow the link there to the lsjson page you'll see the format.

matteof93 · September 4, 2020, 7:35am

I don't think this is what I need. I want to be able to encrypt a file before it is uploaded to a remote for the first time, and I would like to implement this in the GUI in order to keep compatibility with standard versions of RClone. My idea was to modify the code of the Web-GUI to do something like this:

1. Intercept the absolute path of the file the user wants to upload, by modifying the fileUploadHandler() method in /rclone-webui/src/views/Base/FileOperations/FileUploadModal.js
2. Pass that absolute path to an external program that encrypts the file
3. The external program returns the absolute path of the encrypted file to the GUI
4. The GUI executes fileUploadHandler() using the encrypted file instead of the original (plaintext) file
5. The encrypted file is uploaded to the remote

The problem is that at stage 1 I can't retrieve the absolute path of the file, which I suppose is due to browser security. Since in my case the GUI runs in the user's browser, the idea is to replace the GUI's file selection dialog with a call to an external program that lets the user select the file, performs the steps listed above, and simply returns the path of the encrypted file to the GUI. Since I am not an expert in high-level programming, I still need to figure out how to launch the external program from the user's browser.

Note that in my case RClone and the GUI run on the same machine; the approach explained above probably would not work if the GUI were on a different machine.
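Given that rclone and the GUI run on the same machine, one way around the browser restriction, sketched here under assumptions, is to skip the browser upload entirely: encrypt the local file, then ask the rclone instance serving the GUI to copy the ciphertext through the rc API's `operations/copyfile` call, which takes `srcFs`/`srcRemote`/`dstFs`/`dstRemote` parameters. `hsm-encrypt FILE` is a hypothetical placeholder that prints the path of the ciphertext it writes, and the GUI would still need some way to trigger this script:

```shell
#!/bin/sh
# Hypothetical helper: encrypt a local file with the HSM, then ask the rclone
# instance serving the GUI (same machine) to upload the ciphertext via the rc
# API, so the browser never needs the file's absolute path.
# ASSUMPTION: "hsm-encrypt FILE" is a placeholder that prints the ciphertext path.
HSM_ENCRYPT=${HSM_ENCRYPT:-hsm-encrypt}

upload_encrypted() { # $1 = plaintext file, $2 = remote (e.g. mydrive:), $3 = dir on remote
    enc=$("$HSM_ENCRYPT" "$1") || return 1
    name=$(basename "$enc")
    # operations/copyfile copies a single file between two filesystems;
    # a local directory can be given directly as srcFs.
    rclone rc operations/copyfile \
        srcFs="$(dirname "$enc")" srcRemote="$name" \
        dstFs="$2" dstRemote="$3/$name"
}

if [ "$#" -eq 3 ]; then
    upload_encrypted "$1" "$2" "$3"
fi
```

If the rc server requires authentication (as the web GUI typically does), `rclone rc` also needs `--user` and `--pass`.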