I know this has been discussed before, but I figured I'd drop by and see whether the interest level has changed regarding using a local database to track file size/date differences, as opposed to scanning multi-million-file remotes on every sync. I know some of you are using rclone for some very serious workloads, and for those it could reduce sync times by hours.
Maybe it could even come with a --scan-remote flag for an occasional full remote scan, to make sure the database isn't leading to neglected files.
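To make the idea concrete, here is a minimal sketch (not rclone's actual implementation — all names here are hypothetical) of what such a local index might look like: a small SQLite table of (path, size, mtime), and a function that diffs a fresh listing against the stored state so only changed or new files need to be transferred.

```python
import sqlite3

# Hypothetical sketch of a local sync index: store (path, size, mtime)
# so a sync only has to act on entries whose metadata changed, instead
# of treating every file in a multi-million-file remote as unknown.

def open_index(db_path=":memory:"):
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, size INTEGER, mtime INTEGER)"
    )
    return db

def changed_files(db, listing):
    """listing: iterable of (path, size, mtime) tuples.

    Returns paths that are new or whose size/mtime differ from the
    stored state, updating the index as it goes."""
    changed = []
    for path, size, mtime in listing:
        row = db.execute(
            "SELECT size, mtime FROM files WHERE path = ?", (path,)
        ).fetchone()
        if row != (size, mtime):  # None (new file) or metadata mismatch
            changed.append(path)
            db.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
                (path, size, mtime),
            )
    db.commit()
    return changed
```

With this, a second sync over an unchanged listing returns an empty list, which is the whole point: no per-file work on the remote unless something actually changed.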
Just curious what y'all think. Wonderful tool, regardless.
Then with respect, please consider this my gentle stoking of the embers of the matter. Nick mentioned that this is something that was started some time ago but then mostly abandoned. I would have thought it would garner more interest, so I figured I'd bring it up again to see if anything has changed.
I am working on a Python wrapper around rclone for backup that does the same kind of thing.
The problem is that rclone is stateless for most operations. This would be anything but stateless and could cause quite a few issues. Notably, you need to deal with cache invalidation, which is notoriously difficult.
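The invalidation problem is that the local state can silently drift from the remote (another client changes files, a sync is interrupted, clocks skew). One simple mitigation, matching the --scan-remote idea above, is to record when the last full scan happened and force one after a maximum age. This is just an illustrative sketch of that policy; the names and threshold are assumptions, not anything rclone does.

```python
import time

# Sketch of one cache-invalidation policy: trust the local index only
# for a bounded time, then fall back to a full remote listing.
FULL_SCAN_MAX_AGE = 24 * 3600  # e.g. force a full remote scan daily

def needs_full_scan(last_full_scan, now=None, max_age=FULL_SCAN_MAX_AGE):
    """True if the cached state is too old to trust on its own.

    last_full_scan: Unix timestamp of the last full scan, or None if
    no full scan has ever completed."""
    if last_full_scan is None:  # never scanned: nothing to trust yet
        return True
    now = time.time() if now is None else now
    return (now - last_full_scan) > max_age
```

This doesn't solve invalidation (nothing fully does), but it bounds how long an out-of-date index can neglect files.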
The reason I think this should be inherent in rclone's code is the whole nature of cloud storage: centrally managed hardware that allots resources to a large customer base, which means providers keep very close track of I/Os, either to maximize availability or to improve monetization. Microsoft business, for example, allows you roughly 30,000 I/Os before throttling you, then allows about 5,000 I/Os every 5 to 15 minutes. That is extremely stingy (especially since their consumer OneDrive allows hundreds of thousands of I/Os before throttling). I believe other cloud services charge you for I/Os, and this is likely to become more common in the future.
If rclone were "self-sufficient" in keeping track of changes, it could sync a multi-million-file remote with a total of 20 I/O requests. It would be magical.