How to mirror two teamdrives in real time?

Is it possible to mirror two teamdrives in real time, without excessive API usage, if both have a large number of files?

There shouldn't be lots of files to mirror each time, but there would be lots of checks, I assume.

You could schedule a sync between the two. Rclone doesn't do 'real time' sync as that's not its intended use case.

That would sync A->B or you can do B->A. There is no option in rclone to do a bidirectional sync.
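
If you go the scheduled route, cron is the usual tool. A minimal sketch, assuming two hypothetical remotes named tdA: and tdB: that you have already configured:

# crontab entry: one-way sync from tdA: to tdB: every 5 minutes
*/5 * * * * rclone sync tdA: tdB: --log-file /home/user/rclone-sync.log

The --log-file flag is optional, but handy for checking what each run actually did.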

But would there be issues if I ran the rclone sync every minute? How can I keep API hits to a minimum?

Depends on how many files and directories, really, as that is the driver for API hits. You get quite a lot of API hits per day.

Does rclone sync use the download quota? I'm having lots of download quota issues with individual files, and I worry that running rclone sync all the time counts as "downloading" the file when it checks, or something.

My teamdrives have about 30k files.

There are multiple quotas from Google.

You have a daily quota for API hits which is 1 billion.
You have a daily upload quota of 750GB.
You have a daily download quota of 10TB.

There are unpublished limits on downloading the same files and probably many more unpublished things.

If you copy something from A -> B, you use download quota on A and upload quota on B, and all those API hits count against your API quota for the day (which you'll pretty much never hit). The sync/copy/mount commands all use API calls to do things.
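
As a concrete illustration of how those quotas interact: copying a 100GB file from A to B consumes 100GB of A's 10TB daily download quota and 100GB of B's 750GB daily upload quota, while the handful of API calls involved barely dents the 1 billion daily API allowance.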

I hit the download limit on files every day. When rclone checks a file before deciding whether to sync it, does that count against the file's download limit?

You only count against the download and upload daily quotas when you download and upload a file. If you see it copying a file, you are using download and/or upload quota.

If you are not copying a file, you are not using that quota.

No. rclone will request listings (think large lists of text with basic name/modtime/size/hash data for all the files) and compare these.
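
You can see the kind of data rclone compares by listing a remote yourself. A quick illustration, assuming a hypothetical remote named tdA: (the output lines here are made up, but the format is what lsl prints - size, modtime, path):

rclone lsl tdA:
     4523 2020-03-01 12:00:00.000000000 docs/notes.txt
  7340032 2020-02-28 09:15:30.000000000 media/clip.mp4

Each sync pulls listings like this from both sides and diffs them - no file contents are downloaded for the comparison.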

And regarding how heavy syncing is: it will use some of your API quota during that 100-second quota window, but otherwise it's not heavy to run. Especially if you use --fast-list (on compatible cloud-provider remotes), as this will both speed it up and use far fewer API calls to do it.

Syncing every 1 minute seems a little excessive to me - but it would probably work just fine.
You could also use --tpslimit 5 or 3 or something like that, to allocate only up to a half or a third of your API quota to the sync at any given time, so it doesn't crowd out your other concurrent usage.
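
Putting those two flags together, a sketch of the scheduled command (same hypothetical tdA:/tdB: remotes as above):

rclone sync tdA: tdB: --fast-list --tpslimit 3

--fast-list trades a little memory for far fewer listing calls, and --tpslimit 3 caps the transactions per second so the sync can't crowd out anything else using the same API quota.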

Otherwise, if that timing is so critical to you, you might consider looking at the new multiwrite union and setting it up in a mirror mode that copies all outgoing data and changes to both places (at the cost of a double upload, compared to just server-side syncing from remote to remote). multiwrite union will soon be available in the general beta, but it is already available as a branch-beta.

What if I use --max-age=1d? Would that help with API hits? But then what happens if I modify a file or folder older than that... will it still sync?

No, won't matter.
Obviously rclone has to do the listing to even know what age a file is, and it will always skip any files that don't need updating. The minimum cost of doing a sync is listing all files on both sides - you have to do that just to check whether any changes are needed at all.

It's really not a big problem to sync often - but I'd probably give it like 5-10 minutes at least. If you actually need faster mirroring than that, you should probably use a custom setup with multiwrite union rather than trying to get "realtime" syncing out of excessively frequent syncs - because that just isn't a thing in the cloud.


Thanks for the information. I have read the documentation but I can't understand how I'd use that to achieve what I want.

Can I do this with the union:

  1. Upstream remote that is always up-to-date
  2. Backup remotes that will be a little behind but I want to keep as in sync as possible
  3. If the upstream remote is not available, or if files don't work because of quotas, then applications will transparently fall back to a backup remote

All this merged into a single folder, like the rclone mount of a single remote that we use today?

I only need to read from that folder, so there's no need to worry about any writes happening, if that makes it easier. I was planning to try something similar with mergerfs, but I wasn't sure I'd get point 3 working.

Thanks

This part you could do with anything - union or not. All that is needed is a periodic sync.

For this part you will need a union (for the fail-over aspect) - but I don't think there exists any sort of failover based on just quota restrictions. In fact I strongly doubt it, as multiwrite union was created to do what mergerFS on Linux does - and that program isn't made to communicate with clouds in any way. The default policy of "ff" should return your answer from whichever remote answers you first though, so it might work regardless, but I have not tested with quota errors specifically. I would give it a test - and if it doesn't work, ask @Max-Sum (the maintainer) whether this is something that can be tweaked in the "ff" policy to only regard successful replies (for all I know, that may already be how it works).

Multiwrite union should be able to mimic all the core functionality of mergerFS. Max-Sum basically implemented the same policy systems, using mergerFS as his model.

Please note that you won't find the documentation for the new multiwrite union on the main page yet (since it is not in a release yet - it will be in the general beta very soon). You can go here to see the docs:
https://tip.rclone.org/union/
Here is even more up-to-date info straight from the codebase:

This should be the latest compiled branch-beta for it:
https://beta.rclone.org/branch/v1.51.0-113-g98ad80be-pr-3782-union-beta/

I already use rclone beta, so I already have rclone v1.51.0-126-g45b63e2d-beta

So I fixed my issues 1 and 2 by changing my rclone mount to this remote:

[union]
type = union
remotes = 1: 2: 3:

This way I don't have to sync the two remotes like crazy, since my rclone mount will always be up to date even if 2: or 3: is behind.

My question is... in what order will each remote be read? And what happens if a file on remote 1: is unavailable for any reason, like quotas, but the same file is fine on remote 2:?

That depends on the policy you select.
The defaults, which are "ff" (first found) for most reading and searching, will ask all 3 and simply read from whichever drive responds first. This would be fine for a mirrored setup, as all files will be identical anyway.

But for example if you had a different setup where the union-members contained different data then you would probably want to use a search policy of "newest" or at least "epff".
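
For example, a hypothetical variant of your config with an explicit search policy (double-check the exact option name against the tip docs linked above, since this is still in beta):

[union]
type = union
remotes = 1: 2: 3:
search_policy = epff

With epff (existing path, first found), the union acts on the first upstream to reply where the path actually exists, rather than just whichever replies first.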

Why would 2 and 3 be behind? If you sync traditionally then they will indeed be a little behind, but if you use this then you can use a create policy that mirrors, i.e. when you upload a new file (or change something), that data goes out to all 3 at the same time. In a mirrored setup like this - which is the closest you'll get to a RAID-1-like setup on a cloud system - there is no reason to sync the drives at all (although it does mean you effectively upload all data in triplicate).

Please note, as I said, that this is not the default setting for CREATE. You will probably want to set this to the "all" policy, as sketched below. The other two (action and search) should be OK for you to leave at their defaults.
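
A sketch of what that mirrored setup could look like, extending the config you posted (again, verify the option name for your build against the union docs):

[union]
type = union
remotes = 1: 2: 3:
create_policy = all

With create_policy = all, every new file written through the mount goes to all three upstreams at once - which is what removes the need for a separate sync job, at the cost of uploading everything in triplicate.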

I do highly recommend reading up on the docs here to understand the policies and get the setup you want. It's one thing to take my recommendations and run with them, but it is always better to have some understanding of why it works the way it does - and what your other options would be.
I will be happy to clarify as best I can if something is difficult to understand.

I don't write anything to the mount on this server. It's read-only, so this is working fine for me now 🙂

I still prefer to upload to a single remote, then do server-side copies after. More efficient that way too.
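
For Google Drive remotes that copy can indeed stay server-side. A sketch, assuming the 1: and 2: remotes from the union above - note that server-side copies between two different Drive remotes need the server_side_across_configs option enabled (the flag form is shown here):

rclone copy 1: 2: --drive-server-side-across-configs --fast-list

The data then moves within Google rather than through your own connection, though Google still applies its own quotas to server-side copies.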

Is there a policy to make it search one remote first, then go to another? Asking all 3 remotes every time seems a bit excessive for API quotas.
