Mirror to remote using local metadata database

Hi, this is partially related to:

How to cache only metadata on Cache remote?

I'm evaluating rclone as a backup solution for diverse file types.

As a test I used rclone to back up a directory to Dropbox: about 1.6GB, roughly 60k files in 27k directories.
The initial upload took about 8h, which I expected.
I then ran a new sync without any changes; after 2h of watching the log, with rclone still just checking metadata against Dropbox, I gave up.

Given this, I would like to know if the following behaviour is already possible in rclone:

  1. On the first sync, save the metadata of uploaded files in a local database.
  2. On subsequent syncs, determine which operations are needed by checking against the local database, then perform those operations (uploads and deletions) and update the local database.

This way there would be no need to check metadata on the remote. I am assuming that the remote, in this case, is never modified by anything else.

If at any point I were unsure of the sanity of the remote, I would just do a full sync again.

If this is not possible I will have to implement it myself outside rclone (I'll be using rclone for the uploads anyway), along the lines of the sketch below. Or, I might even have a "go" at it and look at the rclone code to see if I could implement it there.
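For illustration, this is roughly what I have in mind: a bash sketch around rclone lsf / copy / delete. The paths, remote name and list file names are just placeholders, and it assumes nothing else ever writes to the remote:

    #!/bin/bash
    # Hypothetical sketch: use a saved listing as the "local metadata database".
    # Fields are separated by ';' (the rclone lsf default), so it breaks on paths containing ';'.

    [ -f last.lst ] || : > last.lst   # first run: start with an empty "database"

    # 1. Snapshot the local tree as path;size;modtime lines.
    rclone lsf -R --files-only --format "pst" /path/to/local | sort > current.lst

    # 2. Lines that are new or changed since the last snapshot: upload those paths only.
    comm -13 last.lst current.lst | cut -d';' -f1 > to-upload.txt
    [ -s to-upload.txt ] && rclone copy /path/to/local dropbox:dir \
        --files-from to-upload.txt --no-traverse

    # 3. Paths present in the last snapshot but gone locally: delete them on the remote.
    comm -23 <(cut -d';' -f1 last.lst | sort) <(cut -d';' -f1 current.lst | sort) > to-delete.txt
    [ -s to-delete.txt ] && rclone delete dropbox:dir --files-from to-delete.txt

    # 4. The new snapshot becomes the database for the next run.
    mv current.lst last.lst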

Many thanks in advance for your time,

Rui

I'm pretty sure the cache backend can do this but I don't remember how to configure it off hand - check the parameters to the cache backend here: https://rclone.org/cache/#standard-options

Thank you Nick.

I will of course try the "cache" solution with a chunk size of zero (or very small), but from the linked discussion I was not convinced it would implement the behaviour I want. Also, the documentation says that the "cache" backend will be phased out in favour of VFS, which led me to think that a better, more reliable alternative could be available.
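For reference, this is roughly the kind of configuration I was planning to test. The values are just guesses on my part, and the remote name is a placeholder; info_age is the cache option that controls how long the metadata stays cached:

    [dropbox-cache]
    type = cache
    remote = dropbox:backup
    # small data chunks, since I mostly care about the metadata caching
    chunk_size = 1M
    # keep file and directory metadata cached for a week
    info_age = 168h
    # cap on the local chunk cache
    chunk_total_size = 1G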

I thought the cache backend could not cache metadata only.

You can not use chunk_size = 0; it won't work. I have tried that multiple times, even with the latest version. You may try a smaller chunk size, but I think it will defeat the purpose, and of course server-side copy won't work either.

I might be wrong about that - maybe I got confused with what I wish it would do :wink:

Will VFS be able to implement this in any way?

The VFS layer already caches metadata in RAM, which for most purposes (except syncing) works extremely well.

Phase 2 of the VFS reworking will allow syncing to use the VFS layer to get some caching and the VFS layer will store metadata on disk.

So not yet, but soon!


One thing you can do is an incremental sync:

rclone copy /path/to/local dropbox:dir --max-age 25h --no-traverse

This will copy only files newer than 25h. You can run it once a day and be sure all the files are uploaded.

You should find an incremental like that is very quick.

Once a week you could do a full sync.
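For example, something like these cron entries would do it (times and paths are hypothetical):

    # daily at 03:00: incremental copy of anything changed in the last 25h,
    # without listing the remote
    0 3 * * * rclone copy /path/to/local dropbox:dir --max-age 25h --no-traverse

    # Sundays at 04:00: full sync to catch anything the incrementals missed
    # and to remove files deleted locally
    0 4 * * 0 rclone sync /path/to/local dropbox:dir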

I am on the new VFS beta. Can you provide an example of how to cache metadata only on disk?

Yes, I have already done it like that and it works.

Still... spending 4 or more hours on a full sync just checking metadata on the remote, when it could be avoided, is not a good solution. For this specific case (when you are sure the remote is sane), keeping the metadata from the last sync in a local database is the optimal solution.

Anyway, I'll be curious to see how it is going to work with VFS.

In the meantime, the cache remote is working for my case.

Thanks for your help and comments.


I haven't done phase 2 yet! So not yet - sorry!
