Concept: how to rclone sync new files without relying on timestamps, without flooding local storage

stealth.mode · September 11, 2021, 7:42am

Our approach to synchronizing new files from cloud to local may be of interest to others, so I'll share it.

Background:

Our application stores chunks of encrypted data as files in different cloud storages. The basic principles are:

encrypt the data in a quantum-computer-safe way
split the encrypted data into chunks
store the chunks with different legal entities (different providers in different countries).

File names are composed of hash values, and a file is never modified, as object store systems require.
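A minimal sketch of hash-based chunk naming, to illustrate why such files never need to be modified. It assumes the name is the SHA-256 digest of the chunk contents and uses a tiny hypothetical chunk size; the post does not say which hash or chunk size is actually used, and encryption is left out entirely.

```python
import hashlib

# Hypothetical chunk size, kept tiny so the example is easy to follow.
CHUNK_SIZE = 4


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut already-encrypted data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_name(chunk: bytes) -> str:
    """Assumption: the file name is the SHA-256 hex digest of the chunk contents."""
    return hashlib.sha256(chunk).hexdigest()


def name_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> dict[str, bytes]:
    """Map each file name to the chunk it would hold."""
    return {chunk_name(c): c for c in split_into_chunks(data, chunk_size)}
```

With names derived from content like this, a changed chunk gets a new name, so existing files are only ever created or deleted.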

Problem:

On clients we want new files to be downloaded and processed. Only the result of the processing and the metadata of the files should be stored locally. Cloud storages are used to sync data between multiple clients.

In the beginning we used rclone sync with the --max-age flag. But file creation time may differ from the time a file arrives in the cloud. And how do we handle wrong time settings on the clients? And how do we prevent customers from changing the time while syncing files? As always, relying on timestamps is no fun.

So we had the following idea:

How about syncing everything from remote to local, processing the synced data and then evacuating the processed files? That way we only have empty files locally, with the same file names as the ones in the cloud. We tell rclone to ignore modification times and MD5 values and to look at file names only.

We listen to file system events, so our Flutter code gets informed about new arrivals and about local (empty) files that get deleted because their counterpart in the cloud was deleted. We have lots of empty files which we use as an inventory (they are not actually empty: they keep metadata but hold no heavy payload).
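A rough sketch of the "evacuate" step, assuming the placeholder keeps a small JSON record in place of the payload. The function name and the metadata fields (`payload_size`, `payload_sha256`) are hypothetical and only stand in for whatever the application actually stores; the real processing step is omitted.

```python
import hashlib
import json
from pathlib import Path


def evacuate(path: Path) -> dict:
    """Replace a downloaded file's payload with a small metadata record.

    The file keeps its name, so it still counts as "present" locally.
    """
    payload = path.read_bytes()

    # Application-specific processing of `payload` would happen here.

    # Hypothetical metadata fields, chosen only for illustration.
    meta = {
        "payload_size": len(payload),
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
    }
    path.write_text(json.dumps(meta))
    return meta
```

After this step the local file no longer matches the remote one in size or checksum, which is why the sync has to compare names only.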

What we achieved:

We rely heavily on rclone sync. Because files only get created and deleted and are never modified, we advise rclone to skip all time-consuming comparison criteria. rclone sync then just pulls the files that are missing locally and removes the local files that are missing in the cloud. And when a Dido client's clock is set to the 6th of June 2013 (which is an important date), rclone sync still works as expected.
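The resulting behaviour can be pictured as two set differences over file names. This is only a model of the outcome described above, not rclone's implementation; `plan_sync` is a hypothetical helper.

```python
def plan_sync(remote_names: set[str], local_names: set[str]) -> tuple[set[str], set[str]]:
    """Model a name-only sync: no timestamps, sizes or checksums involved."""
    to_pull = remote_names - local_names    # present in the cloud, missing locally
    to_delete = local_names - remote_names  # present locally, gone from the cloud
    return to_pull, to_delete


# Example: "b" exists on both sides and is left alone, whatever its content.
to_pull, to_delete = plan_sync({"a", "b", "c"}, {"b", "d"})
```

Because nothing in this comparison depends on a clock, a wrong client time cannot change the result.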

stealth.mode · September 12, 2021, 4:15am

To be more precise, we use:
--ignore-existing
which is what makes processing so fast. Furthermore, because we call rclone sync often, we also use:

--fast-list
--retries 1
--low-level-retries 5
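Put together, an invocation with these flags could be assembled as in the sketch below. The remote name `remote:chunks` and the local path are placeholders, and the helper functions are hypothetical. As far as the rclone documentation goes, --ignore-existing makes rclone skip every file that already exists on the destination regardless of its content, which is what keeps the evacuated local files from being downloaded again.

```python
import subprocess


def build_sync_command(remote: str, local_dir: str) -> list[str]:
    """Assemble the rclone sync call using the flags listed in this post."""
    return [
        "rclone", "sync", remote, local_dir,
        "--ignore-existing",          # skip files that already exist locally
        "--fast-list",                # fewer listing calls on supporting remotes
        "--retries", "1",             # do not repeat the whole sync on failure
        "--low-level-retries", "5",   # retry individual failed operations
    ]


def run_sync(remote: str, local_dir: str) -> int:
    """Run the sync and return rclone's exit code (requires rclone on PATH)."""
    return subprocess.run(build_sync_command(remote, local_dir)).returncode


# Placeholder remote and path, for illustration only.
command = build_sync_command("remote:chunks", "/data/inventory")
```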

asdffdsa · September 12, 2021, 5:54pm

you seem to have a complex use case, so not sure this is helpful

use rclone check and feed that output to rclone copy
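One possible reading of that suggestion, sketched as two command lines: rclone check writes the names missing on the destination to a file via --missing-on-dst, and rclone copy then transfers only those names via --files-from. The remote, local path, list file and helper name are all placeholders.

```python
def build_check_then_copy(remote: str, local_dir: str, list_file: str) -> tuple[list[str], list[str]]:
    """Assemble a check call that lists missing files and a copy call that fetches them."""
    check = ["rclone", "check", remote, local_dir, "--missing-on-dst", list_file]
    copy = ["rclone", "copy", remote, local_dir, "--files-from", list_file]
    return check, copy


# Placeholder names, for illustration only.
check_cmd, copy_cmd = build_check_then_copy(
    "remote:chunks", "/data/inventory", "/tmp/missing.txt"
)
```

Note that rclone check is expected to exit with a non-zero code when it finds differences, so a caller would probably not want to treat that as a hard failure before running the copy. Unlike sync, this pair does not remove local files whose remote counterpart is gone.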

stealth.mode · September 14, 2021, 9:41am

Thank you for the idea of using rclone check. But using empty documents with rclone sync and making rclone ignore everything that already exists just fits our use case perfectly. I shared the approach because it is not an obvious solution.
