Distributed Block Storage for Kubernetes

Replicated and erasure-coded persistent volumes with topology-aware placement, self-healing, and persistent brick-local storage. Built in Rust for performance and reliability.


Features

Flexible Protection Policies

Choose replicated or erasure-coded volumes. The policy is stored with the volume and exposed through the admin API, CLI, and operator-facing docs.

Topology-Aware Placement

The TopoHash algorithm spreads data across failure domains (datacenter, rack, and host), so when the cluster has enough racks, losing a single rack never takes out more shards than the volume can tolerate.

Self-Healing

Automatic brick failure detection via heartbeats, shard rebuild planning, and volume recovery. Degraded volumes are restored without operator intervention.

Operational Controls

Alert on degraded bricks and volumes, inspect placement dependencies, drain bricks deterministically with migration tracking, plan and apply rebalancing after topology changes, and guard removals with impact checks.

Kubernetes Native

CSI driver, Helm chart, CRDs for clusters, bricks, and volumes with active operator reconciliation, and node-local NVMe/TCP export reconciliation. Provision volumes declaratively or with StorageClasses.

Persistent Brick Storage

Bricks keep a stable local identity and store shard data in a persistent device-backed log, so restarts preserve both on-disk data and brick UUIDs.

Architecture

A layered system from Kubernetes workloads down to brick-local storage, with replicated or erasure-coded protection, topology-aware placement, and operator-visible recovery workflows.

Metadata Service

Cluster brain — manages volumes, placement maps, health monitoring, and rebuild orchestration. Port 9200.

Brick Servers

Chunk storage with heartbeat and auto-registration. Each brick manages local NVMe/SSD storage. Port 9100.

CSI Driver

Kubernetes CSI integration for PV/PVC provisioning. Creates volumes through the metadata service, then stages and publishes them on nodes. Port 9300.

Export Runtime

`hyperblock-nbd` provides the current compatibility bridge, while `hyperblock-nvmf` and the reference SPDK target-manager path materialize node-local NVMe/TCP exports for future native serving flows.

Operator

Watches HyperblockCluster, HyperblockBrick, and HyperblockVolume CRDs. Reconciles metadata StatefulSets, brick/CSI DaemonSets, brick registration, and volume provisioning.

CLI

Admin command-line interface for volume, brick, and cluster operations.

TopoHash

Hyperblock's topology-aware placement engine turns a volume ID, a stripe index, and a simple placement rule into a deterministic set of brick targets and a persisted placement map.

Input 1

Topology Tree

ClusterTopology is a real hierarchy: root -> datacenter -> rack -> host -> brick. Every placement decision starts from that tree.

Input 2

Placement Rule

PlacementRule is a list of level/count/mode steps such as "pick 6 distinct racks first, then fall back to any remaining unique bricks."

Input 3

Placement Key

compute_pg_id() hashes volume_id || stripe_index with xxh3, producing the deterministic key that drives domain and brick ranking.

Output

Placement Map

The metadata service stores an ordered set of brick targets in a PlacementMap, bumps the placement version, and copies that version onto the volume as placement_epoch.

Selection Flow

Candidate domains at rack level

rack-a: b1 b2
rack-b: b3 b4
rack-c: b5 b6
rack-d: b7 b8

TopoHash ranks all racks once, then ranks all leaf bricks inside each chosen rack. The first unique brick in each ranked domain wins that slot.

Resulting brick order

b3 b6 b7 b2 b5 b8

The first four picks satisfy rack spread. The last two come from the cluster-wide unique-brick fallback because only four racks exist but six targets are required.

Block To Placement Mapping

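volume offset -> chunk index -> stripe index -> pg_id -> PlacementMap

With the default 128 MiB chunk size, offsets 0-127 MiB map to chunk 0, 128-255 MiB to chunk 1, 256-383 MiB to chunk 2, and 384-511 MiB to chunk 3.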

Logical block I/O

split_io() converts a byte range into one or more ChunkId { volume_id, index } operations using the volume's logical chunk_size.
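A minimal sketch of the offset-to-chunk split that split_io() performs, assuming 64-bit volume IDs and a fixed chunk_size. ChunkOp, offset_in_chunk, and len are hypothetical names for this illustration, not Hyperblock's actual types; only ChunkId { volume_id, index } comes from the text.

```rust
/// Identifies one logical chunk of a volume (field names follow the text above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ChunkId {
    pub volume_id: u64,
    pub index: u64,
}

/// One per-chunk slice of a larger block I/O (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ChunkOp {
    pub chunk: ChunkId,
    pub offset_in_chunk: u64,
    pub len: u64,
}

/// Split the byte range [offset, offset + len) into per-chunk operations.
pub fn split_io(volume_id: u64, offset: u64, len: u64, chunk_size: u64) -> Vec<ChunkOp> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut ops = Vec::new();
    let mut pos = offset;
    let end = offset + len;
    while pos < end {
        let index = pos / chunk_size;
        let offset_in_chunk = pos % chunk_size;
        // Stop at whichever comes first: the end of this chunk or the end of the request.
        let chunk_end = (index + 1) * chunk_size;
        let take = end.min(chunk_end) - pos;
        ops.push(ChunkOp {
            chunk: ChunkId { volume_id, index },
            offset_in_chunk,
            len: take,
        });
        pos += take;
    }
    ops
}
```

With the default 128 MiB chunk size, a 2 MiB write at offset 127 MiB straddles a chunk boundary and becomes two 1 MiB operations, one at the tail of chunk 0 and one at the head of chunk 1.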

Placement key

The metadata layer hashes volume_id plus the chunk or stripe index to get the placement-group key that TopoHash evaluates.

Replicated volume

One logical chunk location stores multiple brick IDs.

ChunkLocation {
  chunk_id: 3,
  brick_ids: [b3, b6, b7]
}

Erasure-coded volume

One stripe expands into many shard locations, each with one brick target.

stripe 3
  shard0 -> b3
  shard1 -> b6
  shard2 -> b7
  shard3 -> b2
  parity0 -> b5
  parity1 -> b8

Build or update topology

Bricks are inserted under datacenter, rack, and host nodes. The tree stores weights and the physical path to every brick.
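A minimal sketch of such a tree, assuming integer weights and string node names. Node, Level, insert_brick, and path_to are hypothetical names; the sketch rolls weights up the tree and records the physical path to each brick, but like the current selection code it does not use weights to bias picks.

```rust
/// Levels of the hierarchy: root -> datacenter -> rack -> host -> brick.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Root,
    Datacenter,
    Rack,
    Host,
    Brick,
}

#[derive(Debug)]
pub struct Node {
    pub name: String,
    pub level: Level,
    /// Leaf weight for bricks; sum of child weights for inner nodes.
    pub weight: u64,
    pub children: Vec<Node>,
}

impl Node {
    pub fn new(name: &str, level: Level) -> Self {
        Node { name: name.to_string(), level, weight: 0, children: Vec::new() }
    }

    /// Insert a brick under datacenter/rack/host, creating missing inner nodes,
    /// then roll weights back up the tree.
    pub fn insert_brick(&mut self, dc: &str, rack: &str, host: &str, brick: &str, weight: u64) {
        let host_node = self
            .child_mut(dc, Level::Datacenter)
            .child_mut(rack, Level::Rack)
            .child_mut(host, Level::Host);
        let mut leaf = Node::new(brick, Level::Brick);
        leaf.weight = weight;
        host_node.children.push(leaf);
        self.roll_up_weights();
    }

    /// Return the named child, creating it at `level` if it does not exist yet.
    fn child_mut(&mut self, name: &str, level: Level) -> &mut Node {
        let existing = self.children.iter().position(|c| c.name == name);
        match existing {
            Some(i) => &mut self.children[i],
            None => {
                self.children.push(Node::new(name, level));
                self.children.last_mut().unwrap()
            }
        }
    }

    /// Recompute inner-node weights as the sum of their children's weights.
    fn roll_up_weights(&mut self) -> u64 {
        if self.level != Level::Brick {
            self.weight = self.children.iter_mut().map(|c| c.roll_up_weights()).sum();
        }
        self.weight
    }

    /// Physical path from this node down to the named brick, if present.
    pub fn path_to(&self, brick: &str) -> Option<Vec<String>> {
        if self.level == Level::Brick {
            return (self.name == brick).then(|| vec![self.name.clone()]);
        }
        for child in &self.children {
            if let Some(mut path) = child.path_to(brick) {
                path.insert(0, self.name.clone());
                return Some(path);
            }
        }
        None
    }
}
```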

Hash the placement key

compute_pg_id() creates a 64-bit placement key from volume_id and stripe_index. Same inputs always produce the same key.
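A sketch of the key derivation follows. compute_pg_id() uses xxh3; to stay dependency-free, this sketch substitutes FNV-1a, so its outputs will not match Hyperblock's real keys (a crate such as xxhash-rust provides xxh3 for a real build). The byte layout (little-endian volume_id followed by stripe_index) and the u64 types are assumptions.

```rust
/// FNV-1a (64-bit), used here only as a dependency-free stand-in for xxh3.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hash volume_id || stripe_index into a 64-bit placement key.
/// Same inputs always yield the same key, which is what makes placement reproducible.
pub fn compute_pg_id(volume_id: u64, stripe_index: u64) -> u64 {
    let mut key = [0u8; 16];
    key[..8].copy_from_slice(&volume_id.to_le_bytes());
    key[8..].copy_from_slice(&stripe_index.to_le_bytes());
    fnv1a_64(&key)
}
```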

Rank candidate domains

For each rule step, TopoHash collects every node at the requested level and deterministically ranks them from the placement key.

Rank bricks within each domain

Leaf bricks inside the chosen domain are ranked in deterministic order. The first brick not already used wins that slot.

Fallback if the ideal spread is impossible

If the rule cannot produce enough distinct domains, TopoHash ranks the entire brick set and fills the remaining slots with any unique bricks.
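The ranking and fallback steps can be sketched as follows, assuming a single rule step that spreads across racks and a rendezvous-style score (a hash of the placement key plus the candidate's name). The text does not say how TopoHash scores candidates, so score(), rank(), select_bricks(), and the FNV-1a stand-in hash are all hypothetical; hosts are omitted for brevity.

```rust
/// A rack and the leaf bricks under it (hosts are flattened away in this sketch).
pub struct Rack {
    pub name: &'static str,
    pub bricks: Vec<&'static str>,
}

/// FNV-1a stand-in hash; not TopoHash's actual scoring function.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Score a candidate for this placement key (rendezvous-style, an assumption).
fn score(pg_id: u64, name: &str) -> u64 {
    let mut buf = pg_id.to_le_bytes().to_vec();
    buf.extend_from_slice(name.as_bytes());
    fnv1a_64(&buf)
}

/// Rank names from highest to lowest score; ties fall back to name order.
fn rank<'a>(pg_id: u64, names: &[&'a str]) -> Vec<&'a str> {
    let mut ranked = names.to_vec();
    ranked.sort_by(|a, b| score(pg_id, b).cmp(&score(pg_id, a)).then_with(|| a.cmp(b)));
    ranked
}

/// Pick `count` distinct bricks: one per ranked rack first, then a cluster-wide fallback.
pub fn select_bricks(pg_id: u64, racks: &[Rack], count: usize) -> Vec<&'static str> {
    let mut chosen: Vec<&'static str> = Vec::new();

    // Rule step: rank every rack once, take the first unused brick in each.
    let rack_names: Vec<&'static str> = racks.iter().map(|r| r.name).collect();
    for name in rank(pg_id, &rack_names) {
        if chosen.len() == count {
            break;
        }
        let rack = racks.iter().find(|r| r.name == name).expect("rack exists");
        if let Some(b) = rank(pg_id, &rack.bricks).into_iter().find(|b| !chosen.contains(b)) {
            chosen.push(b);
        }
    }

    // Fallback: not enough racks, so rank all bricks and fill the remaining slots.
    let all: Vec<&'static str> = racks.iter().flat_map(|r| r.bricks.iter().copied()).collect();
    for b in rank(pg_id, &all) {
        if chosen.len() == count {
            break;
        }
        if !chosen.contains(&b) {
            chosen.push(b);
        }
    }
    chosen
}
```

With the four two-brick racks from the selection flow, asking for six bricks yields one brick per rack followed by two fallback picks. Because a larger count only extends the ranked sequence in this sketch, asking for three bricks returns a prefix of the six-brick answer.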

Persist the result

The metadata service writes a PlacementMap, increments the placement version, and clients consume that map for later reads, writes, drains, and rebalance plans.

Client write semantics

The client resolves the volume, reads or caches its PlacementMap, and splits a block I/O into one or more chunk operations.
For replicated volumes, the client looks up the matching ChunkGroup, takes the first logical ChunkLocation, and fans the same chunk out to every brick in that location's brick_ids.
The replicated write succeeds when at least write_quorum replicas acknowledge the chunk. Failed bricks are marked failed locally so later operations prefer healthier targets.
For erasure-coded volumes, the client encodes the chunk into data and parity shards, then writes one shard to each placement-map location for the stripe.
The current EC path expects all shard writes for the stripe to complete; otherwise the write fails and the operator-visible recovery path takes over.
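The two acceptance rules above can be sketched as pure functions over already-collected acknowledgements. The parallel gRPC fanout itself is omitted, and the function names, string brick IDs, and error messages are hypothetical.

```rust
use std::collections::HashSet;

/// Outcome of a replicated write once every replica has answered.
/// `acks` pairs each target brick with whether it acknowledged the chunk.
pub fn replicated_write_outcome(
    acks: &[(&'static str, bool)],
    write_quorum: usize,
    locally_failed: &mut HashSet<&'static str>,
) -> Result<usize, String> {
    let mut acked = 0;
    for &(brick, ok) in acks {
        if ok {
            acked += 1;
        } else {
            // Remember the failure so later operations prefer healthier targets.
            locally_failed.insert(brick);
        }
    }
    if acked >= write_quorum {
        Ok(acked)
    } else {
        Err(format!("only {acked} of {write_quorum} required replicas acknowledged"))
    }
}

/// Current EC rule: every shard write for the stripe must succeed.
pub fn ec_write_outcome(shard_acks: &[bool]) -> Result<(), String> {
    let failed = shard_acks.iter().filter(|ok| !**ok).count();
    if failed == 0 {
        Ok(())
    } else {
        Err(format!("{failed} shard write(s) failed; stripe write rejected"))
    }
}
```

With three replicas and write_quorum = 2, one failed brick still lets the write succeed, whereas a single failed shard rejects an erasure-coded stripe write.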

Client read semantics

The client uses the same chunk or stripe index to find the correct ChunkGroup in the cached placement map.
For replicated volumes with read_quorum = 1, the client tries replicas in order, skips locally failed bricks, and returns the first healthy response.
For replicated volumes with read_quorum > 1, the client reads from all healthy replicas in parallel and requires at least read_quorum byte-identical responses.
If quorum is reached but some replicas disagree, the client uses the majority response and triggers a background rewrite to repair divergent replicas.
For erasure-coded volumes, the client reads shard locations, reconstructs the stripe locally if enough shards survive, then truncates the decoded bytes back to the requested length.
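For the read_quorum > 1 case, the vote can be sketched as below, assuming responses from healthy replicas have already been gathered. QuorumRead and quorum_read are hypothetical names, and the tie-break on byte order is an assumption the text does not specify.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub struct QuorumRead {
    pub data: Vec<u8>,
    /// True when some replicas disagreed and should get a background rewrite.
    pub needs_repair: bool,
}

/// read_quorum > 1: pick the most common response among byte-identical replies.
pub fn quorum_read(responses: &[Vec<u8>], read_quorum: usize) -> Option<QuorumRead> {
    let mut votes: HashMap<&[u8], usize> = HashMap::new();
    for r in responses {
        *votes.entry(r.as_slice()).or_insert(0) += 1;
    }
    // Most common response wins; ties break on byte order so the choice is deterministic.
    let (&winner, &count) = votes
        .iter()
        .max_by(|a, b| a.1.cmp(b.1).then_with(|| b.0.cmp(a.0)))?;
    if count < read_quorum {
        return None;
    }
    Some(QuorumRead { data: winner.to_vec(), needs_repair: votes.len() > 1 })
}
```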

What the metadata service stores

PlacementMap {
  volume_id,
  version,
  groups: [
    ChunkGroup {
      stripe_index,
      locations: [
        ChunkLocation { chunk_id, brick_ids[...] }
      ]
    }
  ]
}
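The stored structure maps naturally onto plain Rust types. This is a sketch with u64 IDs and String brick IDs assumed, plus hypothetical helper methods for the lookups described in the client read and write paragraphs; it is not Hyperblock's actual definition.

```rust
/// Field names follow the PlacementMap outline; the field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct ChunkLocation {
    pub chunk_id: u64,
    pub brick_ids: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ChunkGroup {
    pub stripe_index: u64,
    pub locations: Vec<ChunkLocation>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct PlacementMap {
    pub volume_id: u64,
    pub version: u64,
    pub groups: Vec<ChunkGroup>,
}

impl PlacementMap {
    /// Find the group for a chunk or stripe index (linear scan for clarity).
    pub fn group(&self, stripe_index: u64) -> Option<&ChunkGroup> {
        self.groups.iter().find(|g| g.stripe_index == stripe_index)
    }

    /// Replicated volumes: every brick in the group's first ChunkLocation.
    pub fn replica_targets(&self, stripe_index: u64) -> Option<&[String]> {
        let first = self.group(stripe_index)?.locations.first()?;
        Some(first.brick_ids.as_slice())
    }

    /// Erasure-coded volumes: one brick per shard location, in shard order.
    pub fn shard_targets(&self, stripe_index: u64) -> Option<Vec<&str>> {
        let group = self.group(stripe_index)?;
        group
            .locations
            .iter()
            .map(|loc| loc.brick_ids.first().map(String::as_str))
            .collect()
    }
}
```

A Vec mirrors the outline above; a BTreeMap keyed by stripe_index would be a natural choice if lookups over large maps became hot.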

Current implementation boundaries

Selection is topology-aware and deterministic, but the current code uses simple ranked selection rather than a more elaborate weighted bucket algorithm.
Topology weights are stored and rolled up through the tree, but current selection does not yet bias picks by weight.
The metadata service computes and persists placement maps today; clients and gateways consume those maps rather than independently evaluating placement from raw topology state.
Placement operates on block-storage chunks and stripes, not on a separate object namespace.

How It Works

Write Path

Client Write

Application writes data to a volume via the CSI-mounted block device.

Protect The Data

The client either fans the chunk out to replicas or encodes it into data and parity shards, depending on the stored protection policy.

Placement Lookup

The client looks up the chunk index in the TopoHash-computed placement map to find target bricks spread across failure domains.

Parallel Fanout

All replicas or shards are written to their assigned brick servers in parallel via gRPC.

Read Path

Placement Lookup

The client resolves which bricks hold the replicas or shards for the requested chunk.

Parallel Fetch

Shards, or replicas when read_quorum > 1, are fetched from brick servers in parallel, skipping any failed bricks.

Replica Failover Or EC Reconstruct

Replicated volumes fail over to another healthy replica. Erasure-coded volumes reconstruct the stripe locally when enough shards survive.

Return Data

For erasure-coded volumes, the data shards are concatenated and truncated to the requested length; replicated volumes return the replica's data directly.
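The padding that this truncation removes comes from splitting a chunk into equal-sized data shards. The sketch below shows only that split-and-rejoin step; Reed-Solomon parity and reconstruction are omitted, and the function names and zero-padding scheme are assumptions.

```rust
/// Split `data` into `k` equal-length data shards, zero-padding the tail.
pub fn split_into_data_shards(data: &[u8], k: usize) -> Vec<Vec<u8>> {
    assert!(k > 0, "need at least one data shard");
    let shard_len = (data.len() + k - 1) / k; // ceil(len / k)
    (0..k)
        .map(|i| {
            let start = (i * shard_len).min(data.len());
            let end = ((i + 1) * shard_len).min(data.len());
            let mut shard = data[start..end].to_vec();
            shard.resize(shard_len, 0); // zero-pad the short final shard
            shard
        })
        .collect()
}

/// Concatenate the data shards in order and drop the padding.
pub fn join_data_shards(shards: &[Vec<u8>], requested_len: usize) -> Vec<u8> {
    let mut out: Vec<u8> = shards.concat();
    out.truncate(requested_len);
    out
}
```

For example, 10 bytes split four ways gives shards of 3 bytes each, with two bytes of padding in the last shard; concatenating yields 12 bytes, and truncation restores the original 10.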

Self-Healing

Heartbeat Monitor

Bricks send heartbeats every 10s. The health monitor scans every 15s for stale bricks (30s timeout).

Failure Detection

A brick whose last heartbeat is older than the 30s timeout is marked Down, and the affected stripes and volumes are identified.
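With the intervals quoted above, a brick is flagged only after missing roughly three consecutive heartbeats, and because the monitor scans every 15s, up to about 45s can pass between a brick's last heartbeat and the scan that marks it Down. The sketch below encodes that arithmetic; BrickState, scan(), and the strict "older than" comparison are assumptions, not Hyperblock's actual health monitor.

```rust
use std::time::Duration;

/// Intervals quoted in the heartbeat description.
pub const HEARTBEAT_INTERVAL: Duration = Duration::from_secs(10);
pub const SCAN_INTERVAL: Duration = Duration::from_secs(15);
pub const HEARTBEAT_TIMEOUT: Duration = Duration::from_secs(30);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BrickState {
    Up,
    Down,
}

/// One health-monitor pass over (brick, time since its last heartbeat).
pub fn scan(bricks: &[(&str, Duration)]) -> Vec<(String, BrickState)> {
    bricks
        .iter()
        .map(|&(name, since)| {
            // Assumed rule: strictly older than the timeout means stale.
            let state = if since > HEARTBEAT_TIMEOUT { BrickState::Down } else { BrickState::Up };
            (name.to_string(), state)
        })
        .collect()
}

/// Heartbeats a brick can miss before it times out (30s / 10s).
pub fn missed_heartbeats_before_timeout() -> u64 {
    HEARTBEAT_TIMEOUT.as_secs() / HEARTBEAT_INTERVAL.as_secs()
}

/// Worst case: the timeout expires just after a scan, so detection waits one more interval.
pub fn worst_case_detection() -> Duration {
    HEARTBEAT_TIMEOUT + SCAN_INTERVAL
}
```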

Rebuild Planning

Replacement bricks are selected from the healthy pool. The RebuildPlanner creates migration tasks.

Recovery

Shards are rebuilt or migrated to new bricks, and operators can inspect placement movement through the CLI before removing infrastructure.

Erasure Coding Profiles

Reed-Solomon erasure coding provides durability without full replication. Choose a profile that matches your fault tolerance and storage efficiency requirements.

Profile	Data Chunks	Parity Chunks	Total Shards	Fault Tolerance	Storage Overhead
`EC_4_2`	4	2	6	2 brick failures	1.5x
`EC_8_3`	8	3	11	3 brick failures	1.375x
`EC_8_4`	8	4	12	4 brick failures	1.5x

Default chunk size: 128 MiB. For comparison, 3x replication tolerates the same two brick failures as EC_4_2 but costs 3x storage, twice the overhead.
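The overhead column is total shards divided by data shards, while N-way replication tolerates N - 1 failures. A small sketch of that arithmetic (function names hypothetical) shows that matching EC_8_4's four-failure tolerance with replication would take five copies: 5x storage versus 1.5x.

```rust
/// Storage overhead of a k+m erasure-coded profile: (k + m) / k.
pub fn ec_overhead(data_chunks: u32, parity_chunks: u32) -> f64 {
    f64::from(data_chunks + parity_chunks) / f64::from(data_chunks)
}

/// N-way replication tolerates N - 1 failures, so tolerating `f` failures needs f + 1 copies.
pub fn replication_overhead_for(fault_tolerance: u32) -> f64 {
    f64::from(fault_tolerance + 1)
}
```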

Tech Stack

Rust

1.75+ MSRV

Tokio

Async runtime

Tonic + Prost

gRPC framework

OpenRaft

Consensus

Sled

Embedded KV store

Reed-Solomon

Erasure coding

xxHash (xxh3)

Fast hashing

kube-rs

Kubernetes client

Prometheus

Metrics

Clap

CLI framework

Docker

Multi-stage builds

Helm

K8s deployment

Quick Start

1. Build from Source

# Prerequisites: Rust 1.75+, protobuf-compiler
cargo build --workspace
cargo test --workspace       # 180+ tests
cargo clippy --workspace --all-targets   # Must be warning-free

2. Deploy to Kubernetes

# Install CRDs
kubectl apply -f deploy/helm/hyperblock/crds/

# Install with Helm
helm install hyperblock deploy/helm/hyperblock \
  --namespace hyperblock-system \
  --create-namespace \
  --set image.repository=myregistry/hyperblock \
  --set image.tag=0.1.0

3. Create a Volume

# CLI examples
hyperblock-cli volume create --name analytics-ec --size 100GiB --data-chunks 4 --parity-chunks 2
hyperblock-cli volume create --name postgres-r3 --size 50GiB --replicas 3

# Or use a PVC through CSI
kubectl apply -f pvc.yaml
