Idea: Flexible fault‑tolerant virtual remote for distributed storage (Reed–Solomon based)

I’ve started work on a new virtual remote for rclone called raid3. It distributes data across three underlying remotes, so that if one part is corrupted, or one remote is unavailable or compromised, the data is still accessible.

While raid3 is simple and fast, I’d like to generalize the idea, as suggested by core rclone experts, by using Reed–Solomon erasure coding for a more flexible “k‑of‑n” scheme. There was a related discussion about this back in 2019: Creating PAR2 files for damage recovery?

Given that:

  • there is an excellent, high‑performance Reed–Solomon implementation in Go (github.com/klauspost/reedsolomon), and
  • rclone has a clean, extensible backend/virtual‑remote design,

a Reed–Solomon‑based virtual remote for distributed storage looks very feasible.


How Reed–Solomon works

For a single rclone “file”:

  1. Fragmentation: Split the file into k data shards.
  2. Expansion: Use Reed–Solomon to compute m parity shards, so total shards n = k + m.
  3. Distribution: Store each of the n shards on a different underlying remote.
  4. Reconstruction: Any k out of n shards are sufficient to reconstruct the original file.

This would generalize raid3 into flexible distributed, fault‑tolerant storage: up to m of the n shards can be lost or corrupted and the file can still be reconstructed, as the sketch below illustrates.
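To make the k‑of‑n property concrete, here is a minimal round‑trip sketch using github.com/klauspost/reedsolomon with k = 3 and m = 2. The simple byte‑slice API is shown and error handling is kept crude; in the real backend each shard would of course be streamed to its remote rather than held in memory.

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/klauspost/reedsolomon"
)

func main() {
	const k, m = 3, 2 // 3 data shards + 2 parity shards: any 3 of the 5 suffice

	enc, err := reedsolomon.New(k, m)
	if err != nil {
		panic(err)
	}

	original := []byte("example file contents to be distributed across remotes")

	// 1. Fragmentation: split into k data shards (the last shard is zero-padded).
	shards, err := enc.Split(original)
	if err != nil {
		panic(err)
	}

	// 2. Expansion: compute the m parity shards.
	if err := enc.Encode(shards); err != nil {
		panic(err)
	}

	// 3. Distribution: each of the k+m shards would go to a different remote.
	//    Here we just simulate losing two of them.
	shards[1], shards[4] = nil, nil

	// 4. Reconstruction: any k surviving shards rebuild the missing ones.
	if err := enc.Reconstruct(shards); err != nil {
		panic(err)
	}

	// Re-join the data shards and trim the padding back to the original size.
	var out bytes.Buffer
	if err := enc.Join(&out, shards, len(original)); err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(out.Bytes(), original)) // prints: true
}
```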


Metadata / format questions

To make this robust and self‑contained, each file’s shards must carry enough information for the file to be reconstructed even if the rclone config is lost or changed.

Core Reed–Solomon per‑file metadata (needed for decoding):

  • k – number of data shards
  • m – number of parity shards
  • padding info (how many bytes of padding in the last shard)

Potentially also algorithm options.
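As a rough sketch, the decode‑critical metadata could be captured in a small struct like this (field names and JSON tags are purely illustrative, not a proposed on‑disk format):

```go
// rsParams holds the minimum information needed to decode one file's shards.
// Everything here is illustrative, not a committed format.
type rsParams struct {
	DataShards   int    `json:"k"`       // k: number of data shards
	ParityShards int    `json:"m"`       // m: number of parity shards
	Padding      int    `json:"padding"` // bytes of zero padding in the last data shard
	Codec        string `json:"codec"`   // optional: algorithm/options identifier
}
```

Storing the original file size instead of (or alongside) the padding count would work equally well, and would also let size listings avoid reading shard data.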

Core rclone‑level metadata to preserve per file:

  • mtime (original modification time)
  • hashes of the original file, if available/needed
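Putting the two together, a self‑describing per‑file metadata record might look roughly like the sketch below. It is hypothetical; the shard index is an extra field not listed above, but reconstruction would need it in practice to put shards back in order.

```go
import "time"

// fileMeta is everything a shard (or its sidecar) would carry so the original
// object can be rebuilt even without the rclone config. Hypothetical sketch.
type fileMeta struct {
	RS      rsParams          `json:"rs"`               // Reed–Solomon decoding parameters
	ShardNo int               `json:"shard"`            // index of this shard in [0, k+m)
	ModTime time.Time         `json:"mtime"`            // original modification time
	Hashes  map[string]string `json:"hashes,omitempty"` // original file hashes, if available
}
```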

Config vs on‑disk format

The config for the virtual Reed–Solomon remote defines:

  • which underlying remotes are used,
  • the default number of parity shards m (with k presumably following from the number of configured remotes), and possibly other tuning parameters.
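For illustration only, such a config section might end up looking something like this (the type and option names are made up, nothing is implemented yet):

```
[rs]
type = reedsolomon
remotes = remote1:bucket remote2:bucket remote3:bucket remote4:bucket remote5:bucket
parity_shards = 2
```

With five remotes and m = 2 this would give k = 3, i.e. any two remotes could be lost.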

However, if the config is lost, the stored shards themselves should still be self‑describing enough for reconstruction.

This leads to a key design question:

  • Should the metadata be embedded in each shard (header/footer inside the object)?
  • Or should we use a sidecar object per file for metadata (with some recovery plan if the sidecar is lost)?

Embedded metadata gives per‑shard self‑containment and atomicity on object stores; sidecar metadata keeps shards “clean” but introduces extra failure modes.
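For the embedded variant, one possible layout is a small length‑prefixed JSON header in front of the shard payload. A hypothetical writer, reusing the fileMeta sketch above:

```go
import (
	"encoding/binary"
	"encoding/json"
	"io"
)

// writeShard writes one shard object in a hypothetical embedded-header layout:
// a 4-byte big-endian header length, the JSON-encoded fileMeta, then the raw
// shard bytes. Reading a shard simply reverses these steps.
func writeShard(w io.Writer, meta fileMeta, shard []byte) error {
	hdr, err := json.Marshal(meta)
	if err != nil {
		return err
	}
	var size [4]byte
	binary.BigEndian.PutUint32(size[:], uint32(len(hdr)))
	if _, err := w.Write(size[:]); err != nil {
		return err
	}
	if _, err := w.Write(hdr); err != nil {
		return err
	}
	_, err = w.Write(shard)
	return err
}
```

A fixed‑size trailer instead of a variable‑length header is another option; it would keep shard data at a known offset, which may make ranged reads simpler.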


Looking for feedback

Comments, design suggestions, or pointers to prior art are very welcome.