Erasure coding raid3 development update

Following up on the initial RFC, here is a status update on the raid3 virtual backend.

Status: Alpha / experimental. Core operations (Put, Get, Delete, List) are implemented and covered by tests. Degraded-mode reads, automatic heal, and full rebuild paths work. The backend can be exercised with local filesystem, MinIO (S3), and SFTP remotes. The implementation is available in the PR “Add erasure coding raid3 disaster-tolerant storage backend”.

Overview

raid3 is an erasure-coding scheme with a fixed 2+1 layout: two data particles and one parity particle per object. It is the first rclone virtual remote that splits each object into particles and distributes them in parallel across multiple underlying remotes, providing disaster-tolerant storage over distributed backends. Any two of three remotes are sufficient to serve reads; the third can be reconstructed.
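The 2+1 invariant can be sketched as classic byte-level RAID‑3. The helper names `split` and `reconstructEven` below are hypothetical; the even/odd byte striping is suggested by the `even`/`odd`/`parity` remote names in the config, but the real particle layout may differ:

```go
package main

import (
	"bytes"
	"fmt"
)

// split distributes a logical object into even-byte and odd-byte
// particles plus an XOR parity particle (a sketch, not the backend's
// actual code; the even particle may be one byte longer for
// odd-sized objects).
func split(data []byte) (even, odd, parity []byte) {
	even = make([]byte, (len(data)+1)/2)
	odd = make([]byte, len(data)/2)
	parity = make([]byte, len(even))
	for i, b := range data {
		if i%2 == 0 {
			even[i/2] = b
		} else {
			odd[i/2] = b
		}
	}
	for i := range parity {
		var o byte
		if i < len(odd) {
			o = odd[i] // pad the shorter odd particle with zero
		}
		parity[i] = even[i] ^ o
	}
	return
}

// reconstructEven recovers the even particle from the other two,
// which is why any 2 of 3 remotes suffice to serve reads.
func reconstructEven(odd, parity []byte) []byte {
	even := make([]byte, len(parity))
	for i := range parity {
		var o byte
		if i < len(odd) {
			o = odd[i]
		}
		even[i] = parity[i] ^ o
	}
	return even
}

func main() {
	data := []byte("hello raid3")
	even, odd, parity := split(data)
	rec := reconstructEven(odd, parity)
	fmt.Println(bytes.Equal(rec, even))
}
```

The same XOR relation recovers the odd or parity particle symmetrically.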

EC footer and on-disk format

Each particle carries a fixed EC footer (currently 94 bytes) at the tail of the object. The footer stores at least:

  • Logical object size
  • MD5 and SHA‑256 of the logical object
  • Logical modification time
  • Compression type
  • Shard index and layout parameters

Size, ModTime, and hashes can therefore be obtained with a single range read against one particle, without streaming the full object, which aligns with the RFC EC footer proposal. Because raid3 always rewrites data on Put/Update, the footer adds no extra object updates; it is simply part of the normal write path.

Spooling and write path

  • Spooling on by default: use_spooling = true. Particles are written to local temp files first and then uploaded with a known size.
  • Preflight in parallel: preflight checks and spooling run concurrently so preflight latency overlaps with local writes.

This improves behavior with SFTP, MinIO, and any backend that prefers or requires a known size, at the cost of ~1× object size in local temp space.

Compression

raid3 has optional, block-based compression (128 KiB blocks). Compression is end-to-end inside the EC scheme and recorded in the footer, so reads can transparently decompress without extra side metadata.

  • compression = none (default)
  • compression = snappy (fast)
  • compression = zstd (better ratio)

Both full and range reads work with compression enabled, including in degraded mode.

Block-level range reads

Partial reads (e.g. HTTP range, media streaming) are served at block granularity.

  • Only the required blocks are fetched from the underlying particles, not the full particle objects.
  • Works for compressed and uncompressed objects.
  • Works in healthy and degraded mode.

This keeps range-read performance acceptable even when one backend is down or slow.
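The mapping from a logical byte range to the blocks that must be fetched is simple arithmetic. A sketch of that mapping (not the backend's actual code):

```go
package main

import "fmt"

const blockSize = 128 * 1024 // 128 KiB logical blocks

// blocksForRange returns the first block index and the number of
// blocks needed to serve logical bytes [off, off+length); only these
// blocks are fetched from the underlying particles.
func blocksForRange(off, length int64) (first, count int64) {
	if length <= 0 {
		return 0, 0
	}
	first = off / blockSize
	last := (off + length - 1) / blockSize
	return first, last - first + 1
}

func main() {
	// which blocks serve a 1 MiB read starting at offset 300000
	first, count := blocksForRange(300_000, 1<<20)
	fmt.Println(first, count)
}
```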

Degraded mode, heal, rebuild

raid3 follows RAID‑3 semantics as a 2+1 scheme:

  • Reads: succeed with any 2 of 3 remotes; missing particles are reconstructed on the fly.
  • Writes / Deletes: require all 3 remotes.

Background maintenance:

  • Auto-heal: queues reconstruction and background upload of missing particles when degraded objects are detected.
  • Manual repair:
    • rclone backend heal raid3: heal degraded objects (2/3 particles).
    • rclone backend rebuild raid3: rebuild missing particles after replacing or re-pointing a backend.
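The invariant behind degraded reads and heal can be sketched as follows. The `heal` helper is hypothetical (the real queueing and upload logic lives in the backend), and equal-length particles are assumed for brevity, while real particles may differ by one byte for odd-sized objects:

```go
package main

import "fmt"

// xorBytes applies the 2+1 invariant: any missing particle is the XOR
// of the other two (even = odd ^ parity, odd = even ^ parity,
// parity = even ^ odd).
func xorBytes(a, b []byte) []byte {
	out := make([]byte, len(a))
	for i := range a {
		out[i] = a[i] ^ b[i]
	}
	return out
}

// heal re-creates whichever particle is nil and reports which one it
// reconstructed, mirroring what the auto-heal queue does in the
// background for degraded objects.
func heal(even, odd, parity []byte) (e, o, p []byte, healed string) {
	switch {
	case even == nil:
		return xorBytes(odd, parity), odd, parity, "even"
	case odd == nil:
		return even, xorBytes(even, parity), parity, "odd"
	case parity == nil:
		return even, odd, xorBytes(even, odd), "parity"
	}
	return even, odd, parity, "none"
}

func main() {
	even, odd := []byte{1, 2}, []byte{3, 4}
	parity := xorBytes(even, odd)
	// simulate a lost parity particle and heal it
	_, _, p2, healed := heal(even, odd, nil)
	fmt.Println(healed, p2[0] == parity[0] && p2[1] == parity[1])
}
```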

Rollback semantics

Put, Move, and Update implement rollback: on failure, partially written particles are cleaned up so the logical object either stays in its previous consistent version or disappears entirely. Any lost particle can then be restored via heal/rebuild from the remaining particles.

Backend commands

  • rclone backend status raid3: health status and rebuild guidance
  • rclone backend rebuild raid3: rebuild after backend replacement
  • rclone backend heal raid3: heal degraded objects (2/3 particles)

Configuration example

[raid3]
type         = raid3
even         = s3:bucket1/data
odd          = gdrive:backup/data
parity       = dropbox:parity
compression  = none
use_spooling = true

The backend is available in the fork Breschling/rclone on the add-raid3-backend branch. Feedback, test configurations, and failure-mode reports are welcome.
