It would allow everyone on these forums who actually wants to make solid, reliable backups to do so using archival Google Cloud Storage, Amazon Glacier, or other archival deep cold storage solutions. Plex fans wouldn't be helped, but I am not one of them.
The idea here is that one remote would store all the metadata needed to answer `lsl` and directory listings. This remote would be hot storage but cheap, because it would contain zero data; it would merely contain all the filenames, dates, and sizes (bonus points for hashes, but those wouldn't be required).
This would let someone use rclone to back up to their cold storage without waking the cold storage. The second remote would instead serve rclone the directory-listing data it needs about the cold storage, so the backup process could add files to cold storage without making any extra API calls against it and running up a high cold-storage bill.
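You can approximate this with today's rclone by caching a listing of the cold remote yourself and diffing against that instead of re-listing cold storage on every run. A sketch, with the remote name `cold:` hypothetical; the second half simulates the diff step with plain files so the logic is visible without an actual remote:

```shell
# Snapshot the cold remote's listing ONCE (this does wake the cold storage),
# then keep the snapshot somewhere hot (local disk, Google Drive, etc.):
#   rclone lsf --format "stp" cold:backup | sort > cold-index.txt
# Later, list the LOCAL source the same way and diff against the snapshot
# instead of re-listing the cold remote:
#   rclone lsf --format "stp" /data | sort > local-index.txt
# Lines only in the local index are new/changed files to upload.

# Simulated with plain files (same idea, no rclone needed):
printf '%s\n' "100;2024-01-01 00:00:00;a.txt" > cold-index.txt
printf '%s\n' "100;2024-01-01 00:00:00;a.txt" \
              "200;2024-06-01 00:00:00;b.txt" > local-index.txt
comm -13 cold-index.txt local-index.txt   # prints only the b.txt entry
```

The obvious caveat is the one already raised above: the cached index can go stale, so it needs a periodic manual refresh.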
Yes, after a long time the second remote's metadata could become stale and cause trouble, but that's the sort of thing a person could manually refresh once a year.
This feature is what companies like Google and Amazon WANT us to use. But as far as I can tell, no one provides this service or feature.
Does anyone have any ideas about how to do any of what I suggested with the current rclone? And how hard would this be to turn into a future feature? Sort-by-average-file-size is a feature I once recommended for ncdu, but that was trivial, and what I am asking for now sounds perhaps infeasible?
EDIT: A lot of the information on this Amazon S3 page is all about avoiding unneeded API calls, but if another remote could store the metadata instead, then generating these extra API calls wouldn't be costly. It might also be faster. The downside would be the risk of the secondary remote lying about its metadata, but ideally the user would manually re-update it perhaps once a year, or never, depending on their needs.
It looks like Amazon is trying to offer a product that does what I'm requesting, although it would still be nice if rclone handled it entirely?
S3 Intelligent-Tiering:
* S3 Intelligent-Tiering can store objects smaller than 128 KB, but auto-tiering has a minimum eligible object size of 128 KB. These smaller objects will not be monitored and will always be charged at the Frequent Access tier rates, with no monitoring and automation charge. For each object archived to the Archive Access tier or Deep Archive Access tier in S3 Intelligent-Tiering, Amazon S3 uses 8 KB of storage for the name of the object and other metadata (billed at S3 Standard storage rates) and 32 KB of storage for index and related metadata (billed at S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive storage rates).
As far as I can tell, S3 Intelligent-Tiering specifically does everything I asked: storing metadata as hot data and leaving cold data cold during metadata API checks.
I'd still prefer rclone to handle all of this rather than Amazon, so I can manage and manipulate my metadata entirely freely.
From what I can tell, some of Amazon's per-access metadata charges and monitoring fees are unnecessarily high? But I might be mathing it poorly, and Amazon offers intelligent monitoring that most people don't need when they could just manually designate which class of storage they want their data in; rclone could do that work on any remote that allows a high number of API requests, like Google Drive or Dropbox or whomever.
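To sanity-check the "mathing," here is a back-of-envelope comparison using illustrative us-east-1-ballpark prices (these are assumptions and change over time; check AWS's pricing page yourself): Intelligent-Tiering's monitoring fee for a million objects versus just keeping 8 KB of metadata per object on hot storage at S3 Standard rates.

```python
# Illustrative prices (assumptions, subject to change):
#   Intelligent-Tiering monitoring: ~$0.0025 per 1,000 objects per month
#   S3 Standard storage:            ~$0.023 per GB-month
objects = 1_000_000

# What Intelligent-Tiering's automation would charge per month:
monitoring_per_month = objects / 1_000 * 0.0025

# DIY alternative: ~8 KB of metadata per object, stored hot, no monitoring:
metadata_gb = objects * 8 * 1024 / 1024**3
metadata_per_month = metadata_gb * 0.023

print(f"monitoring fee: ${monitoring_per_month:.2f}/month")   # $2.50/month
print(f"hot metadata:   ${metadata_per_month:.2f}/month")     # about $0.18/month
```

Under these assumed prices, manually managed hot metadata comes out roughly an order of magnitude cheaper than the monitoring fee for the same million objects.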
TLDR: I want rclone to replace the intelligent, AI-style monitoring that S3 Intelligent-Tiering charges for and uses, and instead just manually designate all my backups as deep archive and all my metadata as hot, using a service that will happily accept my metadata's storage size and a large number of API requests against it. Does any of this make sense?
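For the "manually designate all my backups as deep archive" half, rclone can already do that today: the S3 backend has a `storage_class` option (equivalently the `--s3-storage-class` flag). A minimal config sketch; the remote name, region, and credentials are placeholders:

```ini
# rclone.conf sketch -- remote name and region are placeholders
[glacier]
type = s3
provider = AWS
region = us-east-1
storage_class = DEEP_ARCHIVE
```

With that in place, `rclone copy /data glacier:my-bucket` uploads straight into Deep Archive with no tiering automation involved.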
TLTLDR: I think rclone could easily provide a better service than Amazon's S3 Intelligent-Tiering does, but I am clueless as to how to make that so.
While I am at it, do any other services work like Amazon's S3 Intelligent-Tiering? Google Cloud Storage has a lot of similar buzzwords, but I do not think it actually stores metadata hot and the data itself cold. Instead, Google Cloud Storage seems to simply move infrequently used data from hot to cold, which is NOT AT ALL what I want. In order to make proper backups I need my data cold but that data's metadata HOT. Again, HOT metadata should be cheap, because metadata is 8 kilobytes per file instead of potentially gigabytes per file.
Everything under "management insights" is work similar to what rclone already does, only Amazon S3 bills for every little API call made. If a person could instead hand control of the metadata entirely to rclone, the management-insight fees would be almost eliminated. They could just store all their Amazon S3 Deep Archive data's metadata on, for example, Google Drive, do all their sync-comparison API calls via Google Drive unbilled, and merely add more cold data to S3 when required, without using S3 Intelligent-Tiering's intelligence services at all.
Note: You have to click "management insights" to see what I am talking about.
TLDR: Does any of what I am asking make sense? I hope I explained myself well enough but I fear I did not.
EDIT: To clarify, I am sure Amazon S3 API call charges are completely fair and reasonable, but a single user error on my end could generate millions of extra API calls, and if that error is carried forward for months or years without me noticing, billions or more. Not to mention my inept skill could lead to me trying to make hot-data API calls and instead making cold-data API calls, cranking the pricing from reasonable to astronomical. If rclone had total control of my metadata, and I put that metadata on a remote that did not charge per API call, then instead of a sudden bill spike all I'd notice would be slow performance or rate-limit sleep/retry feedback, which would be the best of both worlds. I want to have my cake and eat it too.
@kapitainsky Do you happen to know how well rustic works on Windows? It is listed as experimental but was easy enough to install via scoop. Has anyone else had much success with rustic?
This is my exact desire:
And our very own local forum hero **kapitainsky** gave this suggestion to solve the desire.
I wonder if this works better with true Glacier or something like S3 Intelligent-Tiering, but I guess I should ask all my follow-up questions on the rustic forum, unless someone replies here commenting about rustic…
What you are describing is a separation of storage into:

* a metadata service (indexes, listings, object map)
* a data service (bulk content)
This pattern is, as far as I know, common in distributed file systems and object storage systems.
In rclone terms, you would probably need to implement a new virtual remote/backend that:

* Exposes a normal filesystem view to rclone users.
* Stores all metadata (directory tree, object names, sizes, hashes, timestamps, maybe custom attributes) on one remote.
* Stores the actual object data on another remote.
* Automatically keeps both in sync whenever files are created, modified, renamed, or deleted via this virtual remote.
As far as I am aware, the main challenge would be ensuring that neither the "data" remote nor the "metadata" remote is modified directly by users or other tools, and that all changes go through the virtual backend so metadata and data never drift out of sync.
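A toy sketch of that split, with a plain dict standing in for the hot metadata remote and another for the cold data remote (everything here is hypothetical and not rclone's actual backend API): listings and sync decisions are answered entirely from the hot side, and the put/delete methods are the only write path, which is what keeps the two sides from drifting.

```python
import time

class SplitRemote:
    """Toy model: metadata lives on a 'hot' index; bytes live on a 'cold' store.
    Listings and stats never touch the cold store."""

    def __init__(self):
        self.meta = {}   # path -> {"size": int, "mtime": float}  (hot remote)
        self.cold = {}   # path -> bytes                          (cold remote)

    def put(self, path, data):
        # The ONLY write path: data and metadata are updated together,
        # so they cannot drift apart.
        self.cold[path] = data
        self.meta[path] = {"size": len(data), "mtime": time.time()}

    def delete(self, path):
        self.cold.pop(path, None)
        self.meta.pop(path, None)

    def listing(self):
        # Answered entirely from hot metadata -- zero cold-storage API calls.
        return sorted((p, m["size"]) for p, m in self.meta.items())

    def needs_upload(self, path, size):
        # The sync decision rclone would make, again without waking cold storage.
        m = self.meta.get(path)
        return m is None or m["size"] != size

r = SplitRemote()
r.put("backup/a.txt", b"hello")
print(r.listing())                        # [('backup/a.txt', 5)]
print(r.needs_upload("backup/a.txt", 5))  # False: already stored, same size
```

If anything writes to the cold store without going through `put`, the index silently lies, which is exactly the drift problem described above.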