Rclone, Amazon Cloud Drive encrypted remote, and hashes

Hello Nick and everyone,

I’ve been intensively testing rclone with ACD, and during the last 10 or so days I’ve uploaded 1.6TB of varied data using rclone over an encrypted remote on top of my ACD account, and it’s been working great.

I’ve seen some worrisome messages (like “Attempt 1/3 failed with 3 errors and: failed to authenticate decrypted block - bad password?”), but after the transfer finishes, I’m able to copy the data back from ACD, and when I check it with my (previously, locally calculated) MD5 checksum, it is perfectly OK – so rclone seems to be doing a great job recovering from ACD troubles.

In a single word, rclone is AWESOME, way better than all other alternatives I’ve checked so far. Many thanks to Nick for creating, releasing and maintaining it.

One thing I’ve been thinking about is how rclone sync (and copy, and check) can determine whether the local and the remote file are equal. As ACD doesn’t support ModTime, seems we’d be restricted to size and hash… but in an encrypted remote, rclone wouldn’t be able to do the remote checksum server-side directly as the hashes of the encrypted files would differ, so the only thing that could be checked would be the size.

Perhaps while copying files to the remote, rclone could ask the remote server for the hash of the (encrypted) file as soon as it’s uploaded and save it locally along with the local (unencrypted file) hash, and at the end of the transfer, upload those local/remote hashes to a special file (say, “.rclone_hashes”) in the same directory in the server. Then at the start of the next sync/copy/check, rclone could verify whether there’s such a file and if yes, download it first and use it to correlate remote (encrypted) hashes it could ask the remote server for, with the local hashes.
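To make the idea concrete, here is a minimal Python sketch of such a hash-mapping file. It is only an illustration of the proposal, not anything rclone actually does: the function names, the JSON layout and the `local_md5`/`remote_md5` keys are all assumptions made up for this example.

```python
import hashlib
import json


def md5_hex(data: bytes) -> str:
    """Return the MD5 of a byte string as lowercase hex."""
    return hashlib.md5(data).hexdigest()


def build_sidecar(entries) -> str:
    """Serialise (name, local_md5, remote_md5) tuples into the text that
    would be uploaded as the hypothetical ".rclone_hashes" file."""
    table = {
        name: {"local_md5": local, "remote_md5": remote}
        for name, local, remote in entries
    }
    return json.dumps(table, indent=2, sort_keys=True)


def is_unchanged(sidecar_json: str, name: str,
                 local_md5_now: str, remote_md5_now: str) -> bool:
    """True only if both the local (plaintext) hash and the hash the
    server reports for the encrypted file match the recorded pair."""
    record = json.loads(sidecar_json).get(name)
    if record is None:
        return False  # never recorded: treat as needing a transfer
    return (record["local_md5"] == local_md5_now
            and record["remote_md5"] == remote_md5_now)
```

On the next sync, the remote-side hash would come from the server's listing and the local-side hash from the file on disk, so no download or decryption would be needed to compare content.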

Just an idea… rclone is awesome as it is, but if it were able to check the content (i.e., using the hash) of local files against their copies in an encrypted remote, it would add a lot of peace of mind.

Cheers,

Durval.

:smile:

rclone does lots of retrying.

There is an open issue about that error message: https://github.com/ncw/rclone/issues/677 - if you had a log with -v with that error message happening I’d be interested to see it.

[quote=“durval, post:1, topic:51”]
One thing I’ve been thinking about is how rclone sync (and copy, and check) can determine whether the local and the remote file are equal. As ACD doesn’t support ModTime, seems we’d be restricted to size and hash… but in an encrypted remote, rclone wouldn’t be able to do the remote checksum server-side directly as the hashes of the encrypted files would differ, so the only thing that could be checked would be the size.
[/quote]

You are correct, yes.

I think this will be fixed as part of https://github.com/ncw/rclone/issues/637 where I’ll store the name, checksum, modtime etc in a metadata file on the remote which is a very similar idea to what you have proposed. That will make ACD support modtimes, long file names and hashes when using that encryption mode. Subscribe to that issue for updates!

Note that as part of encryption, rclone adds a very strong authenticator, so when you download the files you can know for definite whether they are OK.

Hello Nick,

[quote=“ncw, post:2, topic:51”]
rclone does lots of retrying.
There is an open issue about that error message: (…) if you had a log with -v with that error message happening I’d be interested to see it.[/quote]

I’ve just posted on the GitHub issue page you mentioned (not the logs yet, unfortunately); I will post any follow-ups there.

Thanks for the confirmation.

[quote=“ncw, post:2, topic:51”]
I think this will be fixed as part of (…) where I’ll store the name, checksum, modtime etc in a metadata file on the remote which is a very similar idea to what you have proposed. That will make ACD support modtimes, long file names and hashes when using that encryption mode.[/quote]

That would be just great, thanks! I just gave it a cursory read, and there are some great ideas in there. I will also post any follow-ups on the issue page.

A few days ago I started receiving email notifications for everything that happens under https://github.com/ncw/rclone/issues (not sure exactly how I turned it on, but it sure helps keep up with rclone) :wink:

Cheers,

Durval.

Hello,

Quoting myself as I just found something new:

According to the Amazon RESTful API Nodes page, it seems that there’s support for ModTime, quoting from there:

Files

Files are binary bits stored in Amazon Drive along with its metadata.

Resource Model

{
“id”: {string} “unique identifier of a file”, # string, max 50 characters
“name”: {string} “user friendly name of a file”, # string, max 256 characters
“kind”: “FILE”, # literal string, “FILE”
“version”: {long} metadata version of the file,
“modifiedDate”: {datetime} Last modified date (ISO8601 date with timezone offset),
“createdDate”: {datetime} First uploaded date (ISO8601 date with timezone offset),
“labels”: {string, string,…} List of Strings that are labeled to the file. Each label Max 256 characters. Max 10 labels.
“description”: {string} short description of the file. Max 500 characters.
“createdBy”: {string} Friendly name of Application Id which created the file (string),
“parents”: {String…} List of parent folder Ids.
“status”: {string} either “AVAILABLE”, “TRASH”, “PURGED” (string)
“tempLink”: {string} Pre authenticated link enables viewing the file content for limited times only; has to be specifically requested
}

Wouldn’t modifiedDate above fit the needs of rclone as to store (and later check) in the cloud the local ModTime for the file?

If I’m correct, should I open a github issue asking for that to be implemented in the ACD code?

Cheers,

Durval.

There is support for modification time, but unfortunately it isn’t settable. If you look down at the PATCH command you can see:

Partial Update File Metadata

To update partial file Metadata like name. Clients can optionally do a conditional partial update by passing an ETag, which was received in a previous response, in If-Match header.
The allowed fields to updates are name, labels, description.

I just tried it now, and it ignores any attempt to set modifiedDate. Also you can’t pass it in when you upload the file either :frowning:
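As a rough illustration of the behaviour described above, the following Python sketch models which parts of a metadata update would take effect. It does not call the Amazon Drive API; the function name is hypothetical, and the only fact taken from the documentation quoted above is the list of allowed fields (name, labels, description).

```python
# Fields the quoted documentation says a partial update may change.
ALLOWED_PATCH_FIELDS = {"name", "labels", "description"}


def split_patch(requested: dict):
    """Split a requested metadata update into the part that would be
    applied and the sorted list of field names that would be ignored."""
    accepted = {k: v for k, v in requested.items()
                if k in ALLOWED_PATCH_FIELDS}
    ignored = sorted(k for k in requested
                     if k not in ALLOWED_PATCH_FIELDS)
    return accepted, ignored
```

Passing a request that contains both `name` and `modifiedDate` returns the `name` change as accepted and lists `modifiedDate` as ignored, which is the reason the local ModTime cannot be stored in that field.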

So I think the only alternative is to store the modTime in the properties: https://github.com/ncw/rclone/issues/371. That is possible, but it is a bit annoying that there isn’t a standard way of doing this.

Hi Nick,

This is really crazy… so why have a modifiedDate field at all, if it can’t ever be set?! :confused:

Thanks for the reference. Interesting things, these properties; if I understand it correctly, rclone (or any app) could read/write an arbitrary list of {key, value} pairs for each file? This could also be a good way to store the “encrypted:unencrypted” hashes and other metadata I mentioned in my comment on issue #637.
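A minimal Python sketch of what I have in mind, assuming the properties really are plain string key/value pairs. The key names (`modTime`, `plainMD5`, `encryptedMD5`) and both functions are invented for this example; I have not checked what limits ACD places on property keys or values.

```python
from datetime import datetime, timezone


def to_properties(mod_time: datetime, plain_md5: str,
                  encrypted_md5: str) -> dict:
    """Pack per-file metadata into string key/value pairs.
    mod_time must be timezone-aware; it is stored as ISO 8601 in UTC."""
    return {
        "modTime": mod_time.astimezone(timezone.utc).isoformat(),
        "plainMD5": plain_md5,
        "encryptedMD5": encrypted_md5,
    }


def mod_time_from_properties(props: dict) -> datetime:
    """Recover the stored modification time as an aware datetime."""
    return datetime.fromisoformat(props["modTime"])
```

Storing the time as text in UTC keeps the round trip exact and independent of the machine's local time zone.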

I agree. The best place for ModTime would of course be modifiedDate, but ACD’s handling of this field makes no sense.

Cheers,

Durval.
