Hello, I am working on a project involving a hardware security module (HSM) that comes with an open source SDK. The idea is to modify the RClone source code so that it automatically uses the HSM to encrypt the files that are uploaded to the remotes (e.g. Google Drive).
I have never used RClone; I have only just started learning about its features, which is how I found the "crypt" remote.
My idea is to duplicate the code of the "crypt" remote to create a new remote that works like "crypt" but uses the HSM to provide stronger security (i.e. no need to use passwords for the remote because the keys are physically stored in the HSM).
Do you think this is something that can be done without a gigantic effort? My fear is that developing a new remote could hide a lot of problems in the source code; I don't think that working only on the code in \backend\crypt\ is enough.
Are you talking about using the HSM to actually encrypt the data, or just store the secrets in the HSM?
If you are just storing the secrets, you could write a wrapper that talks to the HSM and sets environment variables containing the necessary data. rclone would then use those variables, so you don't need to store the secrets in the config file.
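A minimal sketch of such a wrapper, assuming a hypothetical hsm_fetch helper in place of the real HSM SDK call. rclone reads remote settings from environment variables named RCLONE_CONFIG_<REMOTE>_<OPTION>; the remote name SECRET, the label crypt-password and the wrapped path are only examples.

```shell
#!/bin/sh
# Hypothetical stand-in for the HSM SDK's command-line tool: prints the secret
# stored under the given label. Replace the body with a real HSM call.
hsm_fetch() {
    printf 'secret-for-%s' "$1"
}

# export_crypt_env REMOTE_NAME WRAPPED_REMOTE
# Exports a crypt remote definition as environment variables so that nothing
# sensitive has to be written to rclone.conf.
export_crypt_env() {
    remote_name=$1   # upper case, as it appears in the variable name
    wrapped=$2       # e.g. gdrive:encrypted
    export "RCLONE_CONFIG_${remote_name}_TYPE=crypt"
    export "RCLONE_CONFIG_${remote_name}_REMOTE=${wrapped}"
    # Assumption: the HSM already holds the password in the obscured form that
    # rclone expects for crypt passwords (see `rclone obscure`).
    export "RCLONE_CONFIG_${remote_name}_PASSWORD=$(hsm_fetch crypt-password)"
}
```

After calling `export_crypt_env SECRET gdrive:encrypted` in the same shell, a command such as `rclone copy /home/plaintext secret:backup` should pick the remote up from the environment instead of the config file.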
I also want to use the HSM to encrypt the files before uploading them to cloud services. The HSM already has built-in capabilities to encrypt and sign files, providing authentication and integrity. Therefore the entire "security" layer provided by the "crypt" remote would be replaced by the HSM.
What I want to achieve is basically the ability to encrypt files with the HSM automatically before uploading them to a remote (decryption when downloading is not required).
I thought that creating something similar to the "crypt" remote would be the best option, but it is probably too complicated.
The best option could be to work directly on the experimental GUI, adding a layer there that uses the HSM to encrypt the files before the command is actually sent to RClone. The same layer would then be responsible for decryption and other features I'd like to implement (e.g. using the HSM as an SSO toward cloud services instead of storing those credentials in RClone).
Since there are many different types of HSM, and they don't all support the same command sets, maybe a different approach would work:
a "program" remote that can call any external program to act as a pipe. This theoretical backend would be configured something like
[TST]
type = program
path = /path/to/mycode
remote = RealBackEnd:
Now mycode can take stdin, process it (eg send to the HSM for encryption/decryption, depending on the command line parameter) and return the results on stdout.
This could be very flexible because it would allow for multiple integrations (eg someone in Amazon could call the KMS; someone in Azure could call Azure Key Vault; someone with a Gemalto HSM could call that; a FutureX HSM...)
An extremely dumb example (that just base64 encodes the content) would be:
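#!/bin/sh
# $1 selects the direction; data is read from stdin and written to stdout
case $1 in
upload) base64 ;;
download) base64 -d ;;
*) echo "Unknown option" >&2; exit 1 ;;
esac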
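A quick way to check a filter like this is to pipe some data through "upload" and then through "download" and compare the result with the input. The sketch below wraps the same base64 logic in a shell function so it is self-contained; the names mycode and roundtrip are only illustrative.

```shell
#!/bin/sh
# The base64 filter from above, as a function instead of a separate script.
mycode() {
    case $1 in
        upload)   base64 ;;
        download) base64 -d ;;
        *) echo "Unknown option: $1" >&2; return 1 ;;
    esac
}

# roundtrip STRING
# Sends the string through both directions of the filter; for a correct filter
# the output is identical to the input.
roundtrip() {
    printf '%s' "$1" | mycode upload | mycode download
}
```

Any real filter (for example one that calls an HSM) should pass the same round-trip check before it is put in front of a backend.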
Thanks. This "program" remote is a good idea; however, how does it interact with other remotes?
What I mean is: first do something (e.g. call an external program to encrypt the files specified by a command such as "rclone move"), then actually perform the action requested by the command on the real remote (any remote supported by RClone, e.g. Google Drive, Dropbox, Mega, etc.).
The workflow should be something like this:
1. the user issues a command on the "program" remote, e.g. rclone copy /home/plaintext program_remote:encrypted_backup
2. the "program" remote encrypts the files in /home/plaintext
3. the "program" remote issues the copy command specified at step 1 on the actual remote (e.g. Google Drive)
I found this about "Alias" remotes https://rclone.org/alias/
So, if I understand correctly, the "program" remote is nothing more than an alias for any other remote I want to use (what you called in your post "remote = RealBackend:"), but it allows me to execute any external program (what you called "path = /path/to/mycode") before performing any action on the real backend. Is this correct?
The backend remote [RealBackEnd] would handle the communication with the cloud storage (eg google drive, onedrive, S3) and, as previously discussed, that can be configured by environment variables set up by a call to the HSM before you run rclone.
Your script wouldn't talk to the cloud storage directly. It's just a filter that reads from stdin and writes to stdout.
rclone opens file -> TST encrypts stdin using "mycode" -> RealBackEnd sends to cloudstorage
The example "base64" script I gave would be a complete functional example. You can see it doesn't call back to rclone at all. It's inserted into the middle of the data flow.
I think people usually only store keys in HSMs, don't they? So a minimal implementation would be to use the HSM to store the crypt keys instead of storing them in the config file. That would be a significant upgrade in security.
If you want the HSM to do the encryption of the data you'll be bottlenecked by the HSM encryption speed. If you are OK with that then it would be relatively easy to swap out the block encryption (the data is encrypted in ~64k chunks) and send that via the HSM.
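To get a feel for the bottleneck: if every chunk becomes one call to the HSM, the number of calls grows linearly with the file size. The sketch below assumes a chunk size of exactly 64 KiB (65536 bytes) purely for the sake of the arithmetic; chunk_count is an illustrative name.

```shell
#!/bin/sh
# chunk_count SIZE_IN_BYTES
# Prints how many 64 KiB chunks are needed to cover a file of the given size
# (integer division rounded up).
chunk_count() {
    chunk_size=65536
    echo $(( ($1 + chunk_size - 1) / chunk_size ))
}
```

For example, `chunk_count 1073741824` prints 16384, so a 1 GiB file would mean 16384 round trips to the HSM under this assumption.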
The filename encryption is completely different and I probably wouldn't touch that as it has some properties which are hard to come by...
Yes, the idea is to keep the keys stored in the HSM. However, I have already developed two libraries (for the HSM) that implement a Key Management System and an encryption layer which also provides integrity and authentication. These libraries work together, i.e. the encryption layer generates encrypted files with a special format that can be read by the KMS in order to identify which key has been used to encrypt the file, so that the user does not have to keep track of any low level detail.
I would like to use that encryption layer to encrypt files to be uploaded to a remote, e.g.: rclone copy sample.txt mydrive:encrypted
At this point, I would like to use the HSM to encrypt sample.txt using my libraries, then the ciphertext should be copied to the remote as a normal file. From the point of view of RClone that would simply be a binary file, with no need for the features of the "crypt" remote (such as name encryption or obfuscation, because the name of the encrypted file is already encrypted by my library).
I know this could easily be done using two commands on the terminal instead of one (a command to encrypt the file with the HSM and a command to upload the encrypted file), but I would like to develop an all-in-one solution so that I can use RClone normally and it automatically takes care of using the HSM to provide the encryption towards a specific remote.
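For reference, the two-command variant can be folded into a single shell function. This is only a sketch: hsm-encrypt is a hypothetical command-line front end to the HSM libraries (assumed to take an input file and an output directory), and both command names can be overridden through the HSM_ENCRYPT and RCLONE variables.

```shell
#!/bin/sh
HSM_ENCRYPT=${HSM_ENCRYPT:-hsm-encrypt}   # hypothetical CLI: hsm-encrypt INFILE OUTDIR
RCLONE=${RCLONE:-rclone}

# encrypt_and_copy LOCALFILE REMOTE:PATH
# Encrypts the file into a scratch directory, then uploads whatever the
# encryption tool produced with `rclone copy`.
encrypt_and_copy() {
    workdir=$(mktemp -d) || return 1
    if "$HSM_ENCRYPT" "$1" "$workdir"; then
        "$RCLONE" copy "$workdir" "$2"
        status=$?
    else
        status=1
    fi
    rm -rf "$workdir"   # do not leave scratch copies behind
    return "$status"
}
```

With this in place, `encrypt_and_copy sample.txt mydrive:encrypted` would stand in for the plain `rclone copy sample.txt mydrive:encrypted`.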
The example does not work: even if I add the [TST] remote to the config file, there is still no "program" remote. Also, being a total noob with RClone, I still don't understand how the script is executed. Should it be executed by the "program" remote?
In your backend you implement Update to write data: you encrypt the data and pass it to the Update of the wrapped backend. Open works the same way in reverse: you read the data from the wrapped backend with Open, decrypt it and return it.
Remember this is just an idea; there's no actual remote of this type.
The idea is that if you did
rclone copy myfile TST:
then rclone would open myfile. It would then run mycode (because of the path in the config file), pass the data on stdin, read the results from stdout, and then pass that onto the [RealBackEnd] remote, similar to how crypt and chunker pass data to a backend remote.
Obviously this needs to be written before it would work.
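Until such a backend exists, a similar data flow can be approximated from the outside with `rclone rcat`, which uploads whatever it reads from stdin, and `rclone cat`, which writes a remote object to stdout. A rough sketch, assuming the filter script lives at the placeholder path /path/to/mycode; the function names are illustrative.

```shell
#!/bin/sh
# Both defaults are placeholders; override them in the environment.
FILTER=${FILTER:-/path/to/mycode}
RCLONE=${RCLONE:-rclone}

# filtered_upload LOCALFILE REMOTE:PATH
# Streams the file through the filter and hands the result to `rclone rcat`.
filtered_upload() {
    "$FILTER" upload < "$1" | "$RCLONE" rcat "$2"
}

# filtered_download REMOTE:PATH LOCALFILE
# Reads the object with `rclone cat` and reverses the transform.
filtered_download() {
    "$RCLONE" cat "$1" | "$FILTER" download > "$2"
}
```

Note that a shell pipeline only reports the exit status of its last command, so a failing filter on upload would go unnoticed here; a real wrapper would need to check for that.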
@ncw I am experimenting with the GUI instead of writing a new backend. Basically, I am customizing the GUI so that it allows me to create a remote with the option to use my HSM to encrypt the files before uploading them. The GUI offers the possibility to choose which file you want to upload, so I simply need to retrieve the path of the selected file (in the GUI), pass it to my encryption library and then replace the name of the file with the name of the encrypted one before the GUI sends the request to RClone.
The problem I have is that I cannot retrieve the full path of the file inside the GUI (for security reasons the browser does not allow you to get the full path but only the file name). Do you have any idea about a possible workaround? Thanks
No, I am getting the file name directly from the GUI. I have downloaded the source code of the GUI and I am modifying it in order to use my HSM. If I remember correctly the source code of the GUI where I retrieve the file name is called uploadfilemodal.js or something like that (it is the source code of the new uploadfile feature).
Since I get the file name from the GUI source code, the browser does not allow me to retrieve the absolute path of the file for security reasons. I think that I could replace the drag-and-drop and the select file dialog box with a simple call to a library implemented in another language (e.g. C#) that opens a select file dialog box and gives me access to the absolute path of the file I want to upload; then I pass that value back to the GUI.
I don't think this is what I need. I want to be able to encrypt a file before it is uploaded to a remote for the first time, and I would like to implement this in the GUI in order to keep compatibility with standard versions of RClone. My idea was to modify the code of the Web-GUI in order to do something like this:
1. intercept the absolute path of the file that the user wants to upload, modifying the fileUploadHandler() method of /rclone-webui/src/views/Base/FileOperations/FileUploadModal.js
2. pass that absolute path to an external program that encrypts the file
3. the external program returns the absolute path of the encrypted file back to the GUI
4. the GUI executes the fileUploadHandler() function not using the original file (plain text) but using the encrypted file
5. the encrypted file is uploaded to the remote
The problem is that at step 1 I can't retrieve the absolute path of the file, I suppose because of browser security. Since in my case the GUI runs in the user's browser, the idea is to replace the file selection dialog box used by the GUI with a call to an external program that lets you select the file and performs the steps listed above, simply returning the path of the encrypted file to the GUI. Since I am not an expert in high-level programming, I still need to figure out how to launch the external program from the user's browser.
Notice that in my case RClone and the GUI run on the same machine; the approach explained above probably would not work if the GUI were placed on a different machine.