RFC: File footer proposal for storing shards in erasure-code virtual remotes

Cloud storage providers use erasure coding to achieve very high durability. Best practice is to use dual-region or multi-region locations.

We are working on a first erasure-coding implementation for rclone, spreading shards across different regions AND different storage providers. We see the raid3 virtual remote as the first member of a broader family of erasure-coding virtual remotes. Reed–Solomon, the de facto standard, should follow next.

Erasure coding changes the stored form of data, so attributes like size and hashes cannot be derived correctly from the per-shard metadata of the underlying remotes alone. We therefore need to append this information to each stored shard. That gives:

  1. Hashes and size from a range read on the shards, without downloading the full object.
  2. Self-contained shards: shards that belong to the same logical object can be identified even when the original rclone config is not available.

We propose storing this metadata in a fixed-size footer at the end of each shard. The format is intended to work for a family of algorithms (e.g. raid3 and future Reed–Solomon), not only for raid3.

Proposed 86-byte footer

Offset  Size  Field           Type / encoding
0       9     Magic           "RCLONE/EC" (9 bytes)
9       2     Version         uint16 (e.g. 1)
11      8     Content length  int64 (logical file length)
19      16    MD5             16 bytes raw
35      32    SHA-256         32 bytes raw
67      8     Mtime           int64 Unix seconds
75      4     Algorithm       ASCII, null-padded (e.g. "R3\0\0", "RS\0\0")
79      1     Data shards     uint8
80      1     Parity shards   uint8
81      1     Current shard   uint8 (0-based index)
82      4     Reserved        4 bytes (zeros if unused)
86                            (end of footer)
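
To make the layout concrete, here is a minimal Go sketch of encoding and decoding the 86-byte footer. The type and function names (Footer, Marshal, ParseFooter) are hypothetical, and the byte order is an assumption: the table does not specify one, so big-endian is used here purely for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	footerSize = 86
	magic      = "RCLONE/EC"
)

// Footer mirrors the proposed layout. Field names are illustrative only.
type Footer struct {
	Version       uint16
	ContentLength int64 // logical file length
	MD5           [16]byte
	SHA256        [32]byte
	Mtime         int64   // Unix seconds
	Algorithm     [4]byte // ASCII, null-padded, e.g. "R3\0\0"
	DataShards    uint8
	ParityShards  uint8
	CurrentShard  uint8 // 0-based index
}

// Marshal writes the footer at the offsets given in the table.
// ASSUMPTION: multi-byte integers are big-endian (not stated in the proposal).
func (f Footer) Marshal() []byte {
	b := make([]byte, footerSize)
	copy(b[0:9], magic)
	binary.BigEndian.PutUint16(b[9:11], f.Version)
	binary.BigEndian.PutUint64(b[11:19], uint64(f.ContentLength))
	copy(b[19:35], f.MD5[:])
	copy(b[35:67], f.SHA256[:])
	binary.BigEndian.PutUint64(b[67:75], uint64(f.Mtime))
	copy(b[75:79], f.Algorithm[:])
	b[79] = f.DataShards
	b[80] = f.ParityShards
	b[81] = f.CurrentShard
	// b[82:86] is reserved and stays zero.
	return b
}

// ParseFooter validates the size and magic, then decodes the fields.
func ParseFooter(b []byte) (Footer, error) {
	var f Footer
	if len(b) != footerSize {
		return f, fmt.Errorf("footer must be %d bytes, got %d", footerSize, len(b))
	}
	if !bytes.Equal(b[0:9], []byte(magic)) {
		return f, errors.New("footer magic mismatch")
	}
	f.Version = binary.BigEndian.Uint16(b[9:11])
	f.ContentLength = int64(binary.BigEndian.Uint64(b[11:19]))
	copy(f.MD5[:], b[19:35])
	copy(f.SHA256[:], b[35:67])
	f.Mtime = int64(binary.BigEndian.Uint64(b[67:75]))
	copy(f.Algorithm[:], b[75:79])
	f.DataShards, f.ParityShards, f.CurrentShard = b[79], b[80], b[81]
	return f, nil
}
```

For a raid3 shard one would presumably set Algorithm to "R3" (null-padded), DataShards to 2 and ParityShards to 1; those values are a guess at how raid3 maps onto the fields, not something the proposal pins down.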

Example:
rclone hashsum MD5 ec-based-remote:path could be answered with a single range read of the last 86 bytes of one shard, which yields the MD5 (and SHA-256), instead of streaming the full object.
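
The range read in this example can be sketched in Go as follows. This is an illustration only: readFooterMD5 and memShard are hypothetical names, io.ReaderAt stands in for whatever ranged open the backend offers, and the code assumes the shard size is already known from a listing.

```go
package main

import (
	"encoding/hex"
	"errors"
	"io"
)

const (
	footerSize = 86
	magic      = "RCLONE/EC"
	md5Offset  = 19 // offset of the MD5 field inside the footer
)

// readFooterMD5 fetches only the trailing 86 bytes of a shard and returns
// the MD5 of the logical object as a hex string.
func readFooterMD5(shard io.ReaderAt, shardSize int64) (string, error) {
	if shardSize < footerSize {
		return "", errors.New("shard too small to contain a footer")
	}
	buf := make([]byte, footerSize)
	// The section reader limits the read to the footer range.
	tail := io.NewSectionReader(shard, shardSize-footerSize, footerSize)
	if _, err := io.ReadFull(tail, buf); err != nil {
		return "", err
	}
	if string(buf[0:len(magic)]) != magic {
		return "", errors.New("footer magic mismatch")
	}
	return hex.EncodeToString(buf[md5Offset : md5Offset+16]), nil
}

// memShard is an in-memory stand-in for a remote shard (illustration only).
type memShard []byte

func (m memShard) ReadAt(p []byte, off int64) (int, error) {
	if off < 0 || off >= int64(len(m)) {
		return 0, io.EOF
	}
	n := copy(p, m[off:])
	if n < len(p) {
		return n, io.EOF
	}
	return n, nil
}
```

Because the hashes are raw bytes at fixed offsets, no integer decoding is needed for this path, so the result does not depend on whichever byte order the final format chooses.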

We use a fixed-size footer at the tail of each shard so that (1) one range read is enough to read the whole footer, and (2) the format stays simple and predictable. The footer must be at the end of the shard because content length and hashes are only known after the full logical object has been processed.
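
The reason the metadata has to trail the data can be shown with a small Go sketch: length and hashes accumulate during the single pass over the logical object and are only final once the stream ends. The names streamSummary, copyAndSummarize and summarizeBytes are hypothetical, and dst merely stands in for the shard encoder, which is not modelled here.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"crypto/sha256"
	"io"
)

// streamSummary holds the values that are only known after the whole
// logical object has been read.
type streamSummary struct {
	Length int64
	MD5    [16]byte
	SHA256 [32]byte
}

// copyAndSummarize streams src to dst in one pass while hashing.
// dst is a placeholder for the component that splits data into shards.
func copyAndSummarize(dst io.Writer, src io.Reader) (streamSummary, error) {
	var s streamSummary
	md5Hash := md5.New()
	shaHash := sha256.New()
	n, err := io.Copy(io.MultiWriter(dst, md5Hash, shaHash), src)
	if err != nil {
		return s, err
	}
	// Only now are the footer values complete.
	s.Length = n
	copy(s.MD5[:], md5Hash.Sum(nil))
	copy(s.SHA256[:], shaHash.Sum(nil))
	return s, nil
}

// summarizeBytes is a convenience wrapper for trying the function out.
func summarizeBytes(data []byte) (streamSummary, error) {
	return copyAndSummarize(io.Discard, bytes.NewReader(data))
}
```

A header would need these values before the first data byte is written, which would force either buffering the whole object or a second pass; a footer avoids both.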

Please share your thoughts, questions, or suggestions in the replies.


Resources

Cloud storage and erasure coding

Resiliency, durability, and availability (Backblaze B2)

Current EC-based virtual backend
