Rclone has been banned from Amazon Drive

Have you tried ZFS already?

Straw man. You were talking about the PBs of data from different customers in a data center.

Yes, if you're using the same encryption key for all the data, you may end up with non-random blocks that can be de-duplicated, depending on salting and other implementation details. In short: if the component that wants to de-duplicate is also in charge of encryption, the data may be as good as unencrypted.
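To illustrate the "depending on implementation details" part, here is a minimal Python sketch (using the third-party cryptography package; this is not how rclone or any particular provider works) of why deterministic encryption is dedup-friendly and randomized encryption is not:

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aes = AESGCM(key)
data = b"the same plaintext block"

# Deterministic: a fixed nonce makes identical plaintext encrypt to
# identical ciphertext -- dedup-friendly, but it weakens the encryption
# (never reuse a GCM nonce in real systems).
fixed_nonce = b"\x00" * 12
print(aes.encrypt(fixed_nonce, data, None) ==
      aes.encrypt(fixed_nonce, data, None))   # True

# Randomized: a fresh nonce per encryption makes the ciphertexts differ,
# so the storage provider sees nothing it can deduplicate.
print(aes.encrypt(os.urandom(12), data, None) ==
      aes.encrypt(os.urandom(12), data, None))  # False
```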

In the scenario we're discussing here, this isn't relevant, because encryption is done client-side with different keys.

1 Like

Nope. Exactly the same thing: block-level dedup.
And by the way, I thought that an IT manager/project manager would know more about infrastructure.

I am 100 percent with @jnfingerle on the matter. He is right.

The fact that you think he does not know his stuff just proves how limited your point of view is. All you do is throw around commercial products and results known to you as arguments.

I am out. Everything relevant has been said, even if you do not approve.

1 Like

Deduplication would work for Amazon on customer-encrypted data only for duplicated data that a customer uploads twice (the same content), encrypted with the same key - and only if the encryption doesn't use any per-block or per-file seed, like encfs can be configured to have.

For example, if a customer backed up two of his family's computers with rclone and the same encryption keys, and those computers held mostly the same content (say, 2 TB of family pictures), the encrypted data would contain a lot of identical files which Amazon could deduplicate.

Or if one customer ran

```
rclone copy my_stuff crypt:march2017
```

and then

```
rclone copy my_stuff crypt:april2017
```

(just because he has unlimited space while my_stuff is mostly unchanged), then yes, Amazon would deduplicate those too.
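To make the mechanics of that concrete, here is a toy sketch of content-hash deduplication as a provider might apply it (the byte strings are made up, not real rclone output): byte-identical uploads hash to the same key and are stored once.

```python
import hashlib

# Toy content-addressed store: uploads with identical bytes share one slot.
store = {}

def put(data):
    digest = hashlib.sha256(data).hexdigest()
    store.setdefault(digest, data)  # a duplicate upload costs nothing extra
    return digest

# Same key, same plaintext, no per-file seed -> byte-identical ciphertext.
march = b"ciphertext of my_stuff"
april = b"ciphertext of my_stuff"

put(march)
put(april)
print(len(store))  # 1: the April copy deduplicated against the March copy
```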

But that’s all. There would be no deduplication at all across customers.

3 Likes

One last time, more for the sake of others reading this thread: you cannot beat science. If the blocks are essentially random, you cannot deduplicate them - well, you can try, but you cannot save any space by doing so. Google "information theory" for more on the matter.

"Block-level dedup" isn't a magic phrase: it only saves space if the blocks aren't essentially random.

And by the way: I don’t like ad hominem attacks, therefore I will not answer any more of your messages.

1 Like

So I was just on the phone with the German Amazon customer support. According to them, they stopped supporting rclone, and won’t support it in the future. Bad news, but I don’t know if I should believe the hotline 100%, since the girl did not really know what she was talking about at first and had to ask a few people…

In case anyone has the problem of transferring data away from ACD to another cloud such as Google, HubiC (!!!) or some others, I've found a web-based service that could help us: MultCloud (https://www.multcloud.com)

It supports Google, Dropbox, OneDrive, FTP, Amazon, MEGA, Box, HubiC, SugarSync, pCloud, WebDAV, Yandex, ADrive, and some others.

The free version seems limited to 10 Mbps and 2 TB of free transfer. The paid one is $7.99/month with unlimited (???) transfer.

I’m currently using this to move away from ACD.

What xupetas is trying to explain is this: Amazon operates at the BLOCK level, NOT the file level. Blocks can only be so big. If you store enough data, even completely randomized files WILL have duplicate blocks, because there is a finite, measurable number of 1s and 0s that can fill a block. Given exabytes of data to work with, duplicates WILL appear. Pretend for a moment that a block is, say, 1000 1s and 0s (in a custom file system they might even be able to do that), then do the math on how many times you can fill that before something IS duplicated. Get it? With enough data it will happen if the block size isn't massive, encrypted or not.

What’s the status update on getting rclone back to amazon?

Yes, that's correct: if you store enough data and the block size is small enough, at some point there will be duplicates. So let's do the math on that, using @xupetas' numbers: a 4 MB block size and 15 EB of data in total. Dividing those gives 3.75 trillion stored blocks of data.

Now, if we're talking about encrypted data, the contents of those blocks are basically random (assuming the encryption algorithm used is not broken). So the question is: if you pick 3.75 trillion random blocks of 4 MB each, how many duplicates do you get on average? The answer is 0. Literally 0.

Also note that in order to deduplicate data reasonably efficiently, you need to store extra metadata for each block as well. That metadata will take orders of magnitude more space than anything gained in the miraculous case of actually finding a duplicate random block.

Also note that making the block size smaller changes nothing about this. To give a sense of scale: the estimated number of atoms in the universe is 10^80, and you can represent more unique values than that in 34 BYTES, not MB or KB, just 34 bytes. The number of unique 8 KB or 128 KB blocks (the sizes many dedup solutions use) is insanely, unimaginably huge. There will be no duplicates with random data, which is what encrypted data is unless the encryption is broken.
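For anyone who wants to check the arithmetic, here is a back-of-the-envelope sketch in Python using the standard birthday bound (expected colliding pairs ≈ n(n-1)/2 divided by 2^bits); the 4 MB / 15 EB figures are the ones from above:

```python
from math import log2

# Birthday bound: among n uniformly random values drawn from a space of
# 2**bits possibilities, the expected number of colliding pairs is about
# n*(n-1)/2 / 2**bits.  Work in log2, since the real value underflows
# any float.
def log2_expected_duplicate_pairs(n_blocks, bits):
    return 2 * log2(n_blocks) - 1 - bits

n = 3.75e12            # 15 EB / 4 MB = 3.75 trillion blocks
bits = 4 * 2**20 * 8   # bits per 4 MB block

print(log2_expected_duplicate_pairs(n, bits))
# ~ -33554349.5: the expected number of duplicate pairs is
# 2**-33554349.5 -- zero for every practical purpose.
```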

3 Likes

The thing is, it doesn't matter how large or small the block is - it can be 8 bits or even one bit (!) - you can't represent x-bit blocks in x-1 bits unless there is something particular about those blocks that makes half of them never appear at all.
This is why random or encrypted data in a file doesn't compress. A random file does contain a lot of duplicated 8-bit blocks, doesn't it? But it still takes 8 bits to represent each one…
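A quick way to see this for yourself, as a plain standard-library Python experiment (nothing rclone-specific):

```python
import os
import zlib

random_data = os.urandom(1 << 20)            # 1 MiB of random bytes
ratio = len(zlib.compress(random_data, 9)) / len(random_data)
print(ratio)  # slightly above 1.0: no savings, only format overhead

redundant = b"abcd" * (1 << 18)              # highly redundant 1 MiB
ratio = len(zlib.compress(redundant, 9)) / len(redundant)
print(ratio)  # far below 0.01: repetition compresses extremely well
```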

1 Like

Could you go into more detail, @ncw? Which encrypted secrets are the problem here? Correct me if I'm wrong, but aren't the secrets actually in the config file?

The config file contains access tokens which allow your rclone instance to access your Amazon Drive for you. However, in order for an application like rclone to request those access tokens from Amazon in the first place and send you to that Amazon "Application XY would like to access your account" page, a developer first needs to register an application with Amazon, which then gives them a client_id and a client_secret. Look into OAuth for more info if interested.

Those credentials were hardcoded into the rclone code and visible to everyone; however, as the name suggests, they are supposed to be secret and held only by the devs of the registered application. That way other people can't just take the keys and build their own apps with them, and Amazon can tell for certain which application is responsible for which requests.
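For the curious, the token exchange at the heart of this looks roughly like the following. This is a generic OAuth 2.0 sketch with placeholder values, not code or endpoints taken from rclone or Amazon:

```python
import requests

# Generic OAuth 2.0 authorization-code exchange (all values are
# placeholders).  After the user approves the app on the provider's
# consent page, the app trades the one-time code for tokens -- and must
# prove its own identity with the client_secret.  That is why a leaked
# secret lets anyone impersonate the app.
resp = requests.post(
    "https://login.example.com/oauth/token",       # placeholder endpoint
    data={
        "grant_type": "authorization_code",
        "code": "CODE_FROM_CONSENT_REDIRECT",      # one-time code
        "client_id": "REGISTERED_APP_ID",          # public app identifier
        "client_secret": "REGISTERED_APP_SECRET",  # meant to stay with the devs
        "redirect_uri": "http://localhost:53682/",
    },
)
tokens = resp.json()  # access_token (+ refresh_token) end up in your config
```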

Since they were public, anyone could just use them and make their own apps. It appears Amazon doesn't like that.

Also note that, unlike with almost all other cloud storage providers, you currently can't sign up and register a new ACD app: signup is closed, and only people who already had credentials can still make ACD apps. So it seems Amazon is cracking down on any way anyone could use the Cloud Drive API in a way they don't approve of.

The only real workaround I've seen suggested is a website hosted by the rclone devs that handles the process of getting the access token for you without revealing the client_secret. However, that would require you to trust the devs, since they could access all your data if they wanted. It is also only possible if Amazon decides to give them credentials again in the first place, which seems questionable, since they really don't seem to like third-party apps anymore :frowning:

Edit: also, that workaround would of course still allow anyone to do whatever they want with the API on any account they hold an access token for.

3 Likes

Ah, thank you @hennes11 for the detailed answer. I had only thought about the user-authentication part.

The only real workaround I've seen suggested is a website hosted by the rclone devs that handles the process of getting the access token for you without revealing the client_secret. However, that would require you to trust the devs, since they could access all your data if they wanted.

The user's authentication to Amazon can be done without submitting your password to the devs. If rclone got an authentication service, it could forward you to the Amazon login the same way it is done now.

So wouldn't it be sufficient to, for example, extract rclone authorize into a web service owned by rclone / @ncw?
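Something along those lines seems doable. A minimal sketch, assuming hypothetical endpoint URLs and using Flask for brevity (this is not an actual rclone design):

```python
from urllib.parse import urlencode

import requests
from flask import Flask, redirect, request

app = Flask(__name__)
CLIENT_ID = "..."      # issued to the rclone devs
CLIENT_SECRET = "..."  # stays on this server, never shipped in binaries

AUTHORIZE_URL = "https://login.example.com/oauth/authorize"  # placeholder
TOKEN_URL = "https://login.example.com/oauth/token"          # placeholder
CALLBACK_URL = "https://auth.rclone.example/callback"        # placeholder

@app.route("/authorize")
def authorize():
    # Forward the user to Amazon's own login/consent page, exactly as
    # rclone does locally today; the password never touches this server.
    query = urlencode({
        "client_id": CLIENT_ID,
        "response_type": "code",
        "redirect_uri": CALLBACK_URL,
    })
    return redirect(f"{AUTHORIZE_URL}?{query}")

@app.route("/callback")
def callback():
    # Exchange the one-time code for tokens server-side, keeping the
    # client_secret private; only the tokens go back to the user.
    resp = requests.post(TOKEN_URL, data={
        "grant_type": "authorization_code",
        "code": request.args["code"],
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "redirect_uri": CALLBACK_URL,
    })
    return resp.json()  # pasted into the user's rclone config
```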

Having an external web service (owned by rclone) may be one solution, but it is a VERY bad one:

  • it is a major security risk (what if the server is hacked? what if the owner of the server decides to log every single user's login/password? …)
  • it is a single point of failure: as soon as the server is down (domain name not renewed, and so on), you won't be able to use ACD with rclone

Your arguments are not valid.

The username and password are ALWAYS entered at an Amazon service; the third-party service just gets a token which gives it access to your ACD data. You can always revoke the token in your "apps" settings on ACD.

Yes, it is, but that's how it works. The same goes for every other tool accessing ACD.

2 Likes

OK, on second thought, you're right indeed.

But what about the second point (single point of failure)? No more auth service, whatever the reason, means no more new auth.

1 Like

@SR-G:

Like @Philip_Konighofer said, the login would be done directly on the Amazon site, not on the service hosted by rclone. As I said in my previous post:

If rclone got an authentication service, it could forward you to the Amazon login the same way it is done now.


But what about the second point (single point of failure)? No more auth service, whatever the reason, means no more new auth.

There is no other way to implement it without making the client_secret public. So if you want to use ACD, that is a restriction you'd have to get comfortable with.

You have to see it from a developer's perspective: having this server gives them the ability to block anyone who requests tokens on behalf of rclone and misuses them.

The single point of failure concern may be true, but if rclone is no longer maintained and such a service shuts down, it is only a question of time until rclone falls out of date with the API and stops working anyway. Since this is how Amazon designed the system and requires it to be used, there is no way around it: no auth service, no rclone on ACD, period. :slight_smile:

1 Like