How to properly create a GSuite app/project for rclone?

Hey,

I am trying to properly create the GSuite project/app for use with rclone.

What I mean by properly:

I need to access a single GSuite user's team drive/shared drive with rclone, using a service account. I don't want to create an absolute security disaster by allowing domain-wide delegation, so I need to make the project an internal marketplace app for my GSuite domain, and then allow the app to access that user's data.

I've done all this.

I've created the project, made it an internal marketplace app, and added the app to the user I'm looking to use rclone with. I created my rclone config, using the marketplace client ID as the rclone config client ID, and I created an API key and used it as the client secret for rclone. I've also added the appropriate Google Drive permissions to my OAuth consent screen. (What's weird is that when I look at the user I accepted it on, it says the app can access basic account information and nothing about the GDrive permissions I set, and I have uninstalled and reinstalled the app on the user multiple times.)

I used the impersonate option in the config to impersonate this user.

I am getting the client_unauthorized error when trying to access via rclone (I got it when I set the team drive in the config; I've tried copying files and it seems OK, but the files don't appear in GDrive, so I'm still getting it).

How do I properly set this up? What am I missing?

I am using the latest rclone for amd64 on Ubuntu.

EDIT: I ran the command wrong. When I run a copy command to the drive properly, it now fails 3 times with the client_unauthorized error (the same one I've been getting all along, of course). The error description is: "Client is unauthorized to retrieve access tokens using this method, or client not authorized for any of the scopes requested".

EDIT (a day later): I gave up trying to use the stupid service accounts and just made OAuth tokens and a project on each user, and then verified it via web browser. Works. Case closed.

I am willing to help you get this sorted.

First of all, you can use service accounts both within and outside the organization.
If you do not wish to have a service account with domain-wide delegation (i.e. access to all accounts on the domain), then you can still make a general service account that can be invited to share data.

To do this you can use any Google account (not necessarily the one you need to access the data on).
The best way to think of a service account (I will call this an SA from now on) is as a normal Google account. It has an email. It has an inbox. It even has a GDrive area. There is very little difference from any normal account. If you invite that SA's email address (found inside the SA file if you open it with any text editor), then it will have access - just like a normal user.

This makes it very easy to share a team drive (AKA "shared drive") with an SA, because you can simply invite the email address and give it read/write permissions.
Non-team-drives (AKA personal drives) can also be shared, but you must use the "share" function for this. It's a bit more cumbersome, but works fine as long as all the data that must be controlled is inside the shared area.
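For reference, a Drive remote set up that way might look roughly like this in rclone.conf (just a minimal sketch - the remote name, the path to the SA's JSON key file and the team drive ID are placeholder values, not taken from this thread):

[gdrive-sa]
type = drive
scope = drive
service_account_file = /path/to/sa-key.json
team_drive = 0AAbcDEFexampleID

The email address to invite to the shared drive is the client_email field inside that JSON key file.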

You should not over-focus on the impersonation feature. This is mainly for managing large organizations where you need the SA to perform maintenance across the organization. Sometimes that can be very useful - but if you just want to share data between account A and account B, it is not necessary at all (the rclone documentation perhaps overemphasizes this, which may lead to confusion).

An SA has a few advantages over OAuth. It will never time out. It can be transferred very easily from one system to another (because it's just a file). The downside is that it is arguably more vulnerable to data breaches: if someone gets hold of that file, they have control of that account - no need to re-authenticate anything.
Aside from that, OAuth is perfectly fine too.
As long as you have a personal API key, that is the only thing that matters from a performance perspective, and that is the case whether you use personal OAuth or an SA.

Let me know if I can clarify anything here - I know it can be confusing, but it really isn't once you wrap your head around it :slight_smile:

I kind of closed this up already, as I just started using the OAuth client ID and secret.

But I now have another question. Can I use the exact same rclone config (by running the rclone config command and setting everything the same) on two users on the same system and access the same files, and not have a problem? I'm not going to actually access the exact same files, but files in the same folder.

I've been trying to google this, but I haven't found anything. I remember scrolling through this forum and someone having a problem with this kind of setup? Or maybe I'm remembering wrong.

Thanks for your help, dude.

I replied to your answer, but my reply doesn't look like it was a reply - maybe I did something wrong lol. I'm just making sure I pressed the right reply button; I did with this message.

EDIT: Yes, I didn't press the right button.

It looks like a normal reply when I read it now, at least... (and I was notified of it being posted)

Yes, it should work fine to copy the setup for two remotes - at least if you go through the setup and authorization step twice, once for each. (I am not sure if copy/pasting causes problems, but it might cause confusion with the login token, so I would generally advise authorizing twice.)
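To sketch what that could end up looking like in rclone.conf (placeholder remote names and redacted values - the point is only that both remotes can share the same client_id/client_secret while each keeps the token from its own authorization run):

[backup-user1]
type = drive
client_id = 1234-example.apps.googleusercontent.com
client_secret = REDACTED
scope = drive
token = {"access_token":"...","refresh_token":"...","expiry":"..."}

[backup-user2]
type = drive
client_id = 1234-example.apps.googleusercontent.com
client_secret = REDACTED
scope = drive
token = {"access_token":"...","refresh_token":"...","expiry":"..."}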

The only "problem" with this is that all interactions to the server will be performed under the same user-account. This means a shared 10transactions/second API limit and a 750Gupload/10TBdownload limit. Also the "hidden" 2-3 files opened pr. second limit (specifically for Google Drive).
It does not sound like this is necessarily a problem for you.
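If those shared quotas ever do become a concern, rclone's global --tpslimit and --bwlimit flags can help you stay under them. A rough sketch (remote name and path are placeholders; 8M is simply a value that keeps a continuous upload at roughly 700GB/day, i.e. under the cap):

rclone copy /local/backup remote:backup --tpslimit 10 --bwlimit 8M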

If you need each setup to work as its own user (separate access controls and quotas), then you probably need to look into using service accounts (feel free to ask if needed).

To sum up - the main reason you would want to use service accounts is if you need separate access permissions for each, or your demands on the system require separate API and upload/download quotas. SAs are also generally much easier to port to other systems, as everything is baked into a single authorization file, whereas OAuth requires manual confirmation. However, this also makes a service account much easier to steal in a data breach on a less-than-secure system. Anyone who gets their hands on that file has the authorization for your files built in and needs no additional verification.

If in doubt - use OAuth. If you think you might benefit from a service account, then just ask :smiley:
I use SAs myself, but I am an "advanced" user who knows the upsides and downsides.

I have another question now. I know the chunk size should be larger if you're transferring big files, etc.

But the chunk size can be changed manually for each transfer, right? Yes, from what I've seen.

So is there a way to manually set the optimal chunk size each time you copy a file? Or actually, I don't need a way to do it - what's the calculation?

What's the math for the optimal chunk size for a transfer, based on the file size?

Because the file sizes I'm copying to the single remote are going to range from 6TB down to about 1MB for a single file.

Also, I have essentially unlimited RAM for this purpose (right? I don't actually know how much RAM this will use - do you know?). I have 64GB installed, and I can let this process use around 20-24GB.

Thanks for your help, dude. If you don't know, I'll start a new thread, but then I'll have to wait a long time before people discover it.

Also, you said that you saw the reply I made before sending the other reply? So it indeed notified you of my first reply, even though that reply doesn't look like a reply to you?

I'm just making sure!

It is always optimal to use the largest possible chunk size if you don't care about how much RAM you use. Larger upload chunks mean the upload pipe can be utilized fully for a bigger percentage of the transfer. It will never be faster to use a small chunk size.

But here are a few rules of thumb and useful information:

  • There is little to be gained past 128M (from my testing). 256M is slightly faster, but beyond that I cannot measure any real difference anymore - at least not on 200-300Mbit. Perhaps there is an appreciable difference at 1Gbit, but there is no reason to go overboard here...

  • For each doubling of the chunk size the benefit gets smaller, so I see very little reason to use more than 128-256M maximum.

  • The total RAM use for chunking will be (chunk size * number of transfers). For example (128M * the default 4 transfers = 512M).

  • Only relatively large files benefit from a large chunk size. If you are transferring a 5MB file it will not matter whether your chunk size is 8M or 128M, because the file is just 5MB and will fit 100% regardless (rclone will only use 5MB of RAM to chunk that file, no matter the setting).

  • The only good reason to use small chunks (other than having limited RAM) is if you have a very unstable connection, because if an error occurs a whole chunk has to be re-transmitted. This is rarely a problem these days, however...

So in short - just set the chunk size to 128M or 256M and there is no more to think about. That is as optimal as we can make it :slight_smile:
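In rclone.conf that is a single line under the Drive remote, or --drive-chunk-size 256M on the command line for a single run (a small sketch with a placeholder remote name):

[remote]
type = drive
chunk_size = 256M

With the default 4 transfers that works out to roughly 256M * 4 = 1GB of RAM for chunking, comfortably inside the 20-24GB you mentioned.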

If you want to understand why the chunk size for uploads makes the transfer faster, I will explain the basics in case you are interested...

Most internet traffic (including rclone's) uses the TCP protocol. This works by starting the transfer slowly and then doubling the amount of data sent in each burst until it finds the maximum sustainable speed (the point at which it just barely starts to get congestion errors). This means a TCP transfer can take a few seconds to actually use the upload speed 100%, because this auto-detection needs to happen each time.

When you use a small chunk size (for example the default 8MB), each chunk is transferred as its own unit. That means transferring 8GB would need 1000 chunks, and each of those will be inefficient for the first few seconds. This adds up when there are many chunks and causes less than ideal usage of the network. On a fast connection you probably never even manage to ramp up to 100% speed before the 8MB is already transferred and the same process starts again.

With a chunk size of, for example, 512M we only need 16 chunks, so that reduces this inefficiency to almost nothing (1.6% of what it would be at 8MB). 128M would still only be 6.4% of the inefficiency, so... that's not a lot either. As I said, there are diminishing gains. Even 64M is pretty good - and noticeably more efficient than the default 8MB. I think the reason rclone uses such a low value by default is that it is designed not to use a lot of resources out of the box. Not all small VMs, for example, have that much RAM to spare - and people would complain that "rclone randomly crashes" because they were not aware of these settings.

If you look at your bandwidth graph (in Task Manager, for example) you will see that with 1 transfer using small chunks, the bandwidth will look like "shark teeth" - going up and down and up again. With larger chunks it will stay at 100% most of the time (with some small dips here and there). That means a faster total transfer. This is easy to test for yourself and see the results visually :slight_smile:

Okay, thanks for the informative answer. I have a 1Gbit connection, but this is a server, so I can't use all of it for the backup uploading.

I set it to 64M initially; I'll bump it up to 256M. I'm currently getting 12MB/s while uploading a 300MB file - that's what I used for testing. But the thing is I'm encrypting the files with rclone crypt and also with openssl smime, so that may actually be what's limiting me here, not the chunk size.

I don't know whether that's how slow encryption with smime and crypt should be, but from what I can see it's only using one thread for the process, so that's probably the max for it, I guess. I didn't find any information on how to use multiple cores with smime or crypt, but I don't want to go down that road, because I shouldn't be wasting my time implementing stuff like multicore encryption. Maybe later.

Thanks for your help. I've got rclone totally sorted now for my production server, which was my most important thing.

I did the exact same configuration and exact same scripts on my home server, but I couldn't get the same results. It doesn't work, which is crazy - I have no idea how that's possible. It's the exact same command (capitalization will be wrong and probably some command flag names, I'm on my phone), this:

tar -cz --preserve-permissions /dir/to/backup | openssl smime -encrypt -binary -stream -outform DER public-key.pem | rclone rcat -P Backup-Drive:/backup/dir

I use this exact same command on the production server (it uses a different Google account and client ID and so on) to upload to GDrive, and as I said it works perfectly. But when I run it on my home server, which is set up identically but with different client IDs etc., this happens:

It says it's transferring the file, like it works, but the file shows as 100% transferred from the beginning. And it uploads only 64M - the chunk size I set! After that it stops uploading, but the command output doesn't go away. It gets stuck, the upload speed drops towards 0.00001MB/s as time passes, and it's not uploading anymore - just stuck.

This is really weird and I don't know what the heck could cause it - let me know if you have any idea. I'll probably make another post about that issue here on the forum later.

rclone crypt is so fast that I very much doubt it will bottleneck a 1Gbit connection - not unless you have a very old/weak CPU. It's not likely to be an issue unless you crypt locally to a very fast SSD, and even then I suspect a decent CPU would handle it. I have not done exact testing on this though - it has just never been a noteworthy problem for me to look into.

I cannot speak for smime - I have no idea what it uses. Check your CPU utilization, maybe. crypt's bottleneck will usually be CPU, if anything. Some encryption schemes can certainly be much heavier, and it also depends on what hardware support your CPU has for certain encryption standards and whether a given scheme uses it.

Note that on a fast connection you can't really expect to max out your speed on a single connection - not unless you are transferring inside the Google network at least (where the limit appears to be about 42MB/sec per connection, for example on a GCP VM). Thankfully, multiple connections scale almost linearly. You probably need 3-5 connections to fully saturate 1Gbit, from my experience at least. A 12-14MB/sec average sounds pretty familiar to me for a single connection. (rclone's default number of connections is 4. It can be increased with --transfers INT, though GDrive tends not to benefit from much more than 4-5 because it has an inherent limit of opening 2-3 new files per second. Note that this is not a limit on the maximum number of open files at a time, just on new file accesses per second.) It should also be noted that many very small files will never get great speeds, just so you are aware. That's not a bandwidth issue but a server-side limitation set by Google.
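As a hedged example of what that could look like on the command line (the remote name and paths are placeholders):

rclone copy /local/backup remote:backup --transfers 4 --drive-chunk-size 256M -P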

The command seems sensible enough, but since I have no experience with how smime operates, this is tough for me to answer. Exact same version of rclone and smime on both machines? What happens if you cut the smime part out of the command? Does that hit the same issue? It kind of sounds like something is going wrong in the handover between the programs in the pipe.
Maybe @ncw can spot something wrong here that I missed - my Linux knowledge is only so-so.
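For instance, a reduced version of the pipe with the smime stage removed (just a test sketch reusing the placeholder paths from your command):

tar -cz --preserve-permissions /dir/to/backup | rclone rcat -P Backup-Drive:/backup/dir

If that streams normally, the problem is likely in the smime stage or the pipe handover; if it still stalls after one chunk, it points more towards tar or rclone.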

Since this seems to break somewhere around the chunk size, I suppose you could also try testing what happens if you effectively disable chunking (to produce a single uninterrupted stream and thus possibly never trigger the problem).
The default setting is
upload_cutoff = 8M
but if you set that to a very large number like 999G it will never chunk.
Alternatively, override it with the flag
--drive-upload-cutoff 999G

Note that this tends to transfer slower than chunking. Not horribly slower, but noticeably so. I am still not exactly sure why that is (possibly some ingest caching system on Google's end? But that is a topic for another time :slight_smile: ). The upside is that it uses no RAM. While I wouldn't necessarily recommend this as a final solution, it may help you identify where exactly the problem is happening...
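Applied to your pipeline, that test could look roughly like this (same placeholder paths and key file as in your command; --drive-upload-cutoff is the flag mentioned above):

tar -cz --preserve-permissions /dir/to/backup | openssl smime -encrypt -binary -stream -outform DER public-key.pem | rclone rcat -P --drive-upload-cutoff 999G Backup-Drive:/backup/dir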

I've pinpointed the problem to tar - it doesn't do what it's supposed to do. I have no idea why, and this is happening on a VM I created too.

It seems that tar -cz doesn't work on that path because that path is used by an NFS file server. I don't see any other reason for it. And it keeps happening even on a different machine with the NFS share mounted there.

So I don't know what I should do about that lol. I'll have to try to figure it out.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.