Hello,
I've been a fan of swiftsync rclone for many years, and I'm also a fan of Kopia.
Kopia is a CLI backup tool (also written in Go) that performs block-level deduplication and compression on the contents. Data is managed as snapshots, which is quite a different architecture compared to rclone.
I've always wanted an rclone remote that can deduplicate data to maximize my storage usage.
Recently I took inspiration from how kopia splits contents to make my own rclone remote called "dedup."
The dedup backend is a new rclone overlay backend that wraps another remote and provides block-level deduplication and compression. It uses content-defined chunking (rolling hash) to split files into variable-size chunks, hashes each chunk with keyed BLAKE2b-128, compresses with zstd, and stores only unique chunks on the underlying remote. A JSON manifest file per logical file records the chunk list needed to reconstruct it.
This is similar to Kopia's content-defined chunking, but without snapshot management: it's a transparent overlay. Files appear normally, and deduplication happens automatically behind the scenes.
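For anyone who wants to see the idea in code, here is a minimal Python sketch of the pipeline described above: rolling-hash chunking, keyed BLAKE2b-128 chunk IDs, compression, unique-chunk storage, and a JSON manifest per file. It is an illustration under stated assumptions, not the backend's actual code. The chunk sizes, the key, the gear-style rolling hash, the manifest fields, and the function names are all made up for the example, a plain dict stands in for the underlying remote, and zlib stands in for zstd so the sketch only needs the standard library.

```python
import hashlib
import json
import zlib

# All constants below are illustrative assumptions, sized for small demo inputs.
MIN_CHUNK = 64         # never cut a chunk shorter than this (bytes)
MAX_CHUNK = 1024       # always cut once a chunk reaches this length
BOUNDARY_MASK = 0xFF   # cut when the low 8 bits of the rolling hash are zero
KEY = b"hypothetical-demo-key"  # key for the keyed BLAKE2b chunk hash

# Gear table: one deterministic pseudo-random 32-bit value per byte value.
GEAR = [
    int.from_bytes(hashlib.blake2b(bytes([i]), digest_size=4).digest(), "big")
    for i in range(256)
]


def split_chunks(data: bytes):
    """Yield variable-size chunks; boundaries depend on content, not offsets."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Gear-style rolling hash: older bytes shift out of the 32-bit window.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        at_boundary = length >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0
        if at_boundary or length >= MAX_CHUNK:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]  # trailing partial chunk


def chunk_id(chunk: bytes) -> str:
    """Keyed BLAKE2b with a 16-byte (128-bit) digest, hex encoded."""
    return hashlib.blake2b(chunk, key=KEY, digest_size=16).hexdigest()


def put_file(store: dict, data: bytes) -> str:
    """Store only unseen chunks (compressed) and return a JSON manifest."""
    ids = []
    for chunk in split_chunks(data):
        cid = chunk_id(chunk)
        if cid not in store:
            store[cid] = zlib.compress(chunk)  # zlib here; the backend uses zstd
        ids.append(cid)
    return json.dumps({"size": len(data), "chunks": ids})


def get_file(store: dict, manifest: str) -> bytes:
    """Rebuild a file by concatenating its chunks in manifest order."""
    ids = json.loads(manifest)["chunks"]
    return b"".join(zlib.decompress(store[cid]) for cid in ids)
```

Calling `put_file` twice with the same bytes adds nothing new to the store on the second call, and because boundaries follow the content, a file that differs only by a few bytes near the start should still share most of its chunks with the original.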
Results
Original data set, total size: 917.384 MiB
Dedup remote (no compression), total size: 690.422 MiB
Dedup remote + zstd 22, total size: 426.619 MiB
For comparison, the current rclone compress remote with maximum zstd compression, total size: 671.503 MiB
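To put the totals above in relative terms, here is a small Python snippet that turns each size into a percentage saved against the original data set. The sizes are the ones reported above; the function and label names are just for illustration.

```python
ORIGINAL_MIB = 917.384  # total size of the original data set


def saved_pct(stored_mib: float, original_mib: float = ORIGINAL_MIB) -> float:
    """Percentage of space saved relative to the original size."""
    return (1 - stored_mib / original_mib) * 100


results = {
    "dedup (no compression)": 690.422,
    "dedup + zstd 22": 426.619,
    "compress remote (max zstd)": 671.503,
}

for name, size_mib in results.items():
    print(f"{name}: {size_mib:.3f} MiB stored, {saved_pct(size_mib):.1f}% saved")
```

That works out to roughly 24.7% saved for dedup alone, 53.5% for dedup plus zstd 22, and 26.8% for the compress remote.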
I did wrap a compress remote with my dedup remote as a test, but I ran into a large number of file copy errors that I haven't diagnosed yet, so I don't have real results for that test.
To check the round trip, I used rclone copy to pull the data back into a new directory and ran a diff of the original directory against the newly downloaded one. They are identical.
I have admittedly done very little testing and consider this highly experimental, a proof of concept. I can't say how much data-loss risk the dedup remote carries, but I thought it was a cool result and wanted to share.
Kind regards,
Matt
EDIT: Here's a screenshot of the same results as above.
EDIT2: Here's a (boring) video of me interacting with the dedup remote a little.
https://asciinema.org/a/tv01yDe8ppJVBXBo
