Alternative to Crypt for File Path Blinding

There is an existing flag which makes rclone use the server upload time instead of copying the modified time. However, this flag only works with swift/s3 at the moment.

  --use-server-modtime                   Use server modified time instead of object metadata

The drive backend doesn't support this flag at the moment, but it might be possible to support it.

The API docs are here: https://developers.google.com/drive/api/v3/reference/files

I think maybe modifiedByMeTime is the correct time to use, as it isn't writable.

However, rclone doesn't explicitly set the created date, so I think that will default to the time of upload. That means you could use --drive-use-created-date (or set use_created_date = true in the config), and I think that should work.
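If you go that route, the config entry would look something like this (the remote name here is just an example):

```ini
[gdrive]
type = drive
use_created_date = true
```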

Patching the drive remote worked for modtime, but I am considering going back to the original version of this idea: a separate overlay backend that can exist side by side with crypt. These thoughts are incomplete, but here is what I've jotted down so far.

Goals:

  • Store only the encrypted files on the remote
  • Leak as little information as possible with the encrypted files

Features:

  • Store a 'real' file table in a local database; all unencrypted metadata, hashes, etc. are kept here
  • Enable a many:many relationship between real and encrypted files
  • Pad the encrypted files to avoid leaking file size
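For the padding point, here's a minimal sketch of one possible scheme (the bucket sizes are an illustrative assumption, not a decided design): round every encrypted file up to the next power of two, with a minimum bucket so tiny files are indistinguishable from one another.

```go
package main

import "fmt"

// padded returns the size an encrypted file would be stored at:
// the next power of two at or above size, with a minimum bucket
// so small files all look the same size on the remote.
// The 4 KiB minimum is an illustrative assumption.
func padded(size int64) int64 {
	const minBucket = 4096
	b := int64(minBucket)
	for b < size {
		b *= 2
	}
	return b
}

func main() {
	for _, s := range []int64{100, 4096, 5000, 1 << 20} {
		fmt.Printf("%d -> %d\n", s, padded(s))
	}
}
```

The tradeoff is that power-of-two padding can waste up to ~50% of space per file, which is part of why bucketing small files together (below) matters.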

Limitations:

  • Losing the database is fatal to the data. The data on the remote is, by design, useless without the database.

For the database, I'm thinking old faithful sqlite. Considerations:

  • Embedded (for free): Avoid the logistics of a separate server process. Some embedded databases require commercial licenses, which rules them out here.
  • Parallel: Users may run multiple rclone processes at once. We want a database that handles multiple processes reading and manages its own write locks. This eliminates boltdb.
  • Maintained Go API: Avoid the complexity of maintaining the API ourselves. This eliminates lmdb: there are several Go wrappers, but none are maintained.
  • Graph Database: A graph database would be preferred for the file tree. Unfortunately this is the lowest priority, and I cannot think of a database that satisfies all of these. Open to ideas.

Bucketing small real files into a single encrypted file may be difficult:

  • The easiest solution would be to organize the real files into groups first. With rclone's design though, as far as I can tell, the backend is called on individual files even when uploading multiple files.
  • As some cloud storage systems do not support appending, we cannot easily add to an object without repeatedly re-downloading the file so far. Appending also doesn't solve all problems associated with small files. It may improve remote listing performance by having fewer files, but the timing of the writes still leaks file count and file size information.
  • It might be possible to redirect small writes to a cache and then flush them to the remote. This introduces a third layer (source, remote, cache) to worry about. This may be an area where it makes sense to reuse some of vfs or at least draw inspiration from it.
  • Unfortunately if we don't do some type of small file bucketing, we likely waste significant space on the padding of small files and leak significant information about the number of files.
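To make the grouping idea in the first bullet concrete, here's a first-fit sketch of packing small real files into fixed-size buckets before encryption (the 1000-byte bucket cap is an arbitrary assumption, and this ignores ordering/locality concerns entirely):

```go
package main

import "fmt"

// bucket groups file sizes into buckets of at most maxBucket bytes
// using first-fit. Files larger than maxBucket get a bucket of
// their own. This is only a sketch of the grouping step.
func bucket(sizes []int64, maxBucket int64) [][]int64 {
	var buckets [][]int64
	var free []int64 // remaining space per bucket
	for _, s := range sizes {
		placed := false
		for i, f := range free {
			if s <= f {
				buckets[i] = append(buckets[i], s)
				free[i] -= s
				placed = true
				break
			}
		}
		if !placed {
			buckets = append(buckets, []int64{s})
			free = append(free, maxBucket-s)
		}
	}
	return buckets
}

func main() {
	sizes := []int64{700, 300, 600, 200, 900}
	fmt.Println(bucket(sizes, 1000)) // [[700 300] [600 200] [900]]
}
```

The catch, as noted above, is that rclone hands the backend one file at a time, so a grouping pass like this would need either a cache layer or changes higher up the stack.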

Other design considerations:

  • It may be possible to extend the vfs directory cache to do all of this, but it's not immediately obvious to me how clean this would be. We'd need to support mapping of real files to encrypted files, plus persistence. I do not understand the vfs code well.
  • I have not thought of a good way to hide which encrypted files are parts of the same large real file. Any action that works on a single file at a time (upload or download) is likely to leak information. The only way to prevent this seems to be padding the traffic itself, which would be very inefficient.
  • Encrypted files' names and paths will be random. The real directory structure and encrypted directory structure are not related. However, we want to implement some controls on depth, items per directory, etc., as some remotes may impose limitations on these.

I've avoided sqlite as it is a C library. Linking to C libraries opens the CGO can of worms, which we've avoided so far...

The above sounds like a lot of work, alas.

It might be more effective to make an rclone backend for something like: https://github.com/cloudflare/utahfs
