The problem I am trying to solve is that not all backends support versioning, and rclone is hard to use with immutable storage. The proposal below is backend agnostic, generally approachable, and never deletes or moves files.
A "versioning" backend could look as follows:
Every file that gets transferred gets a suffix added, like <filename>.<timestamp>, where <timestamp> is something like the epoch nanosecond time (maybe encoded for compactness?).
When a new version of a file is written, it just gets a new timestamp suffix.
When a file is deleted, it gets an empty file named <filename>.<timestamp>.d
If the file is written again, the .d marker remains but a newer versioned file is added.
When a file is moved, a deletion marker (.d) is written at the old name and a new versioned file is created at the new name. Server-side copy can still be done, but server-side move must fall back as if the remote didn't support it.
With all of that, when files are listed, they are grouped by <filename> and only the latest timestamp for each is returned. If the latest version is a .d marker, the file is omitted from the listing.
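As a rough illustration, the grouping rule above could look something like this in Python (the regex, the plain `.<epoch-ns>` suffix, and the function name are all assumptions for the sketch, not rclone code):

```python
import re

# Hypothetical suffix format: <filename>.<epoch-ns>, with an optional
# trailing .d for a deletion marker.
VERSION_RE = re.compile(r"^(?P<name>.+)\.(?P<ts>\d+)(?P<deleted>\.d)?$")

def latest_versions(entries):
    """Group raw entries by base name and keep only the newest version.

    A file whose newest version is a .d deletion marker is dropped
    from the listing entirely.
    """
    latest = {}  # base name -> (timestamp, is_deleted)
    for entry in entries:
        m = VERSION_RE.match(entry)
        if not m:
            continue  # not a versioned name; a real backend would decide policy here
        name, ts = m.group("name"), int(m.group("ts"))
        deleted = m.group("deleted") is not None
        if name not in latest or ts > latest[name][0]:
            latest[name] = (ts, deleted)
    return sorted(name for name, (_, deleted) in latest.items() if not deleted)

entries = [
    "report.txt.1700000000000000000",
    "report.txt.1700000100000000000",  # newer version wins
    "old.log.1690000000000000000",
    "old.log.1695000000000000000.d",   # latest is a deletion marker -> hidden
]
```

With these entries, `latest_versions(entries)` lists only `report.txt`; writing a new `old.log` version after the marker would make it reappear, with the old .d left in place.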
In addition to versioning every file, this also means you can use immutable buckets without any loss of functionality (and avoid things like Wasabi's 90-day minimum storage policy).
I am not sure how hard this would be to implement. It doesn't seem too difficult, and if I ever get a chance to get proficient enough at Go, I may give it a shot. What do others think of this?
Additional note: you'd probably want a backend feature to purge, where you specify a date and it deletes all versions older than that date, as long as a newer one also exists. And/or maybe one to delete all but the last N versions.
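A sketch of what that purge rule might compute for a single file (the exact policy details here are my reading of the proposal, not a settled design):

```python
def versions_to_purge(timestamps, cutoff=None, keep_last=None):
    """Return the version timestamps of one file that are safe to delete.

    A version is purged only if a newer version survives (the newest is
    always kept), it is older than `cutoff` (if given), and it is not
    among the `keep_last` newest versions (if given).
    """
    ts = sorted(timestamps, reverse=True)  # newest first
    protected = max(1, keep_last or 1)     # always keep at least the newest
    purge = []
    for i, t in enumerate(ts):
        if i < protected:
            continue
        if cutoff is None or t < cutoff:
            purge.append(t)
    return purge
```

For example, with versions at timestamps 100, 200, and 300, a cutoff of 250 purges 200 and 100 (300 survives as the newer version), while `keep_last=2` purges only 100.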
I am well aware of the --suffix flag, which moves the to-be-overwritten file to have the specified suffix. But what I am proposing goes way beyond that.
You'd still want to support --suffix and --backup-dir, but you'd note in the docs that the versioning remote itself makes them redundant.
I'd probably use the naming scheme used by s3/b2 for versions for compatibility.
Makes sense, though the file names get really long. An 8- or 9-byte base32- or base64-encoded integer is much more compact, at the cost of human readability.
Probably the only fly in the ointment is that opening a file will first require a directory listing to find its latest version, which is potentially slow.
Hmm. Interesting point, but I have to wonder: how often is a file read without the directory already being listed? I guess some of this depends on where in the rclone code the abstraction happens, but the cases I can think of are:
copy(to), move(to) and sync (when just one file is listed).
Opening on a mount/serve
In a regular sync, the listings happen anyway. In serve, I thought listings were cached (but again, it depends on where in the abstraction). I guess you'd have to document this, and maybe also make things like --no-check-dest and --no-traverse do nothing.
Either way, it is more complicated than I originally figured. I don't imagine having time to learn enough to do it any time soon, but if someone else is interested, I'd love to help and test. Otherwise, I'll put it on my todo list (after, you know, learning Go. Alas, programming of nearly all sorts is a hobby for me rather than work, so my time is limited).
I like the idea, but I think everything that makes directory listings could become significantly slower: that is sync, copy, move, mount, serve, ...
Worst case example: a folder with 1000 files can typically be listed in 1 API call. If each file exists in 20 versions, listing it will require 20 API calls, each returning the typical upper limit of 1000 items: that is 20 times slower.
Best case example: A folder with only 50 files in 20 versions can still be listed in a single API call.
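The arithmetic behind those two cases (assuming a typical page size of 1000 items per call):

```python
import math

def listing_api_calls(num_files, versions_per_file, page_size=1000):
    """API calls needed to list a directory when every version (and
    deletion marker) is stored as a separate object."""
    return math.ceil(num_files * versions_per_file / page_size)

print(listing_api_calls(1000, 20))  # worst case: 20 calls instead of 1
print(listing_api_calls(50, 20))    # best case: still 1 call
```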
So it is important to have a good, well-integrated purge algorithm, probably using a combination of age in days and number of versions, like OneDrive, Google Drive, etc.
The problem with scripting this kind of backend in Python is that you can't batch up transfers, since the destination names always change.
Every transfer has to be its own rclone copyto src:file.ext dst:file.ext.<date>, which means that rclone has to do a lot of redundant work. You could speed it up with some flags, though, since by definition the destination won't yet exist.
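A sketch of what such an external script might do (the wrapper function and suffix format are hypothetical; --no-check-dest and --no-traverse are the real rclone flags mentioned above that skip work when the destination never exists yet):

```python
import subprocess
import time

def copy_versioned(files, src_remote, dst_remote, dry_run=True):
    """Copy each file individually with a timestamp suffix appended.

    Because every destination name is unique, nothing can be batched:
    it is one rclone invocation per file.
    """
    cmds = []
    for f in files:
        ts = time.time_ns()
        cmd = [
            "rclone", "copyto",
            "--no-check-dest", "--no-traverse",
            f"{src_remote}:{f}",
            f"{dst_remote}:{f}.{ts}",
        ]
        cmds.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return cmds
```

Each call still pays rclone's startup and remote-connection cost, which is the redundant work a native backend would avoid.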
As for restic, that is certainly a valid option and has some real benefits over an rclone-native versioning backend. But versions are for more than just backups. And whole files, rather than blocks with a database, have some real advantages (and disadvantages). Also, you can version any rclone remote, as opposed to pushing a local backup.
Also, can restic now work on immutable storage backends? I seem to recall that being a work in progress, but I may be incorrect.