
Fast State Commitments: A Survey of the Landscape Before AlDBaran

Written by
Eclipse Labs
Published on
July 24, 2025

A few weeks ago, we introduced AlDBaran, the state commitment engine for Eclipse that sustains over 48 million state updates per second on a 96-core machine. By decoupling the in-memory hot path (Pleiades) from the asynchronous historical proof store (Hyades), AlDBaran removes the state commitment bottleneck that has long constrained high‑throughput blockchains.

But this leap forward was not made in a vacuum. It was the result of a comprehensive investigation into the existing landscape of Authenticated Data Structures (ADS). To achieve our GigaCompute goal of 1M TPS and above, the Eclipse client requires a state engine capable of handling at least 3 million state updates per second, based on how much state is typically touched by a transaction. Before building our own solution, we compared and benchmarked several candidate authenticated database engines to see whether any could meet this demanding target.

This blog provides a deep-dive into that related work. To fully appreciate why AlDBaran’s design is a necessity for Eclipse, it is crucial to understand the architectures and inherent limitations of the systems that came before it. We will survey five leading designs, from layered trie approaches to unified, high-throughput architectures, and show why, despite their innovations, a new solution was required to unlock the next level of blockchain performance for Eclipse.

The Challenge: Beyond Layered Architectures

Blockchain state management traditionally combines a proof layer (an ADS) with a storage layer (a key-value store). The classic implementation, seen in Ethereum with its Merkle Patricia Trie (MPT) on top of LevelDB or RocksDB, suffers from significant I/O overhead. This compaction‑based model introduces substantial write amplification, often 5x–10x more physical writes per logical update, and read amplification, since each operation must touch multiple LSM levels and trie nodes. Each state update can cause a cascade of disk operations, leading to an O((log N)²) I/O cost that makes the execution layer I/O-bound.
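
To see where that bound comes from, here is the standard back-of-the-envelope argument (a simplification that ignores caching and write batching):

```latex
% A single state update rewrites every trie node on the root-to-leaf path:
\text{trie nodes touched per update} = O(\log N)

% Each node read or write through a leveled LSM store crosses O(log N) levels:
\text{I/O operations per node} = O(\log N)

% Combined cost per logical state update:
O(\log N) \cdot O(\log N) = O((\log N)^2)
```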

Emerging blockchain architectures demand ADS engines that rethink the separation between “proof” and “storage.” The ideal design minimizes write amplification by pruning outdated data in place or by batching updates. It targets O(1) or low-constant I/O complexity to avoid I/O‑bound storage layers. Memory footprint should remain reasonable, trading off in‑memory hashing against on‑disk commitments where appropriate. Proof capabilities should extend beyond simple inclusion proofs to support range proofs, change proofs, and historical queries for archival or analytic use cases. Finally, production-readiness, based on code maturity, audit history, and ecosystem integration, determines how readily an existing design can be adopted in live networks.

Addressing these challenges, a new generation of integrated ADS engines has emerged. Each engine offers a unique approach to overcoming these bottlenecks.

Avalanche MerkleDB

Overview

Of the modern authenticated databases, Avalanche's MerkleDB is the most "traditional", as it retains the classic design of layering a Merkle trie on top of a generic database like RocksDB, representing a highly optimized version of the layered approach. It has served as the backbone of the Avalanche network, proving itself capable of handling high-throughput workloads and sub-second finality.

This conservative evolution leverages copy‑on‑write “Views” for provisional state mutations and batches trie commits to reduce round‑trip I/Os. Compression algorithms like Snappy and Zstandard mitigate storage growth, while RocksDB’s mature compaction tuning enables sustained sub‑second finality.

MerkleDB reliably supports several thousand TPS in production and was audited by OpenZeppelin in March 2023, signaling high confidence in its stability. However, because it preserves the underlying LSM design, MerkleDB retains a non‑trivial write amplification factor and background compaction overhead, making it less I/O‑efficient than fully integrated engines.

Architecture and Data Model

MerkleDB implements a Merkle Radix Trie layered on top of a standard key-value store like RocksDB. It is written in Go and is tightly integrated into the AvalancheGo client. In this model, each node of the trie is stored as a distinct entry in the database, keyed by the node's hash. To improve efficiency, MerkleDB separates value-bearing leaf nodes from intermediate branch nodes by using different key prefixes within RocksDB, which simplifies iterating over the actual state data.
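
As a rough illustration, here is a minimal Rust sketch of that prefix-keyed layout (the prefixes, types, and encoding are assumptions for illustration; MerkleDB itself is written in Go against RocksDB):

```rust
use std::collections::BTreeMap;

// Illustrative key prefixes that separate value-bearing leaves from branch nodes.
const BRANCH_PREFIX: u8 = 0x00;
const LEAF_PREFIX: u8 = 0x01;

type Hash = [u8; 32];

/// Stand-in for the backing key-value store (RocksDB in the real system).
#[derive(Default)]
struct NodeStore {
    kv: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl NodeStore {
    /// Persist an intermediate branch node under its hash, namespaced by prefix.
    fn put_branch(&mut self, hash: Hash, encoded_node: Vec<u8>) {
        let mut key = vec![BRANCH_PREFIX];
        key.extend_from_slice(&hash);
        self.kv.insert(key, encoded_node);
    }

    /// Persist a value-bearing leaf under its own prefix so that scanning the
    /// actual state data never has to skip over internal trie nodes.
    fn put_leaf(&mut self, hash: Hash, encoded_leaf: Vec<u8>) {
        let mut key = vec![LEAF_PREFIX];
        key.extend_from_slice(&hash);
        self.kv.insert(key, encoded_leaf);
    }

    /// Iterate only the leaves: a single range scan over the leaf prefix.
    fn iter_leaves(&self) -> impl Iterator<Item = (&Vec<u8>, &Vec<u8>)> + '_ {
        self.kv
            .range(vec![LEAF_PREFIX]..)
            .take_while(|(k, _)| k.first() == Some(&LEAF_PREFIX))
    }
}

fn main() {
    let mut store = NodeStore::default();
    store.put_branch([0xAA; 32], b"branch bytes".to_vec());
    store.put_leaf([0xBB; 32], b"leaf bytes".to_vec());
    assert_eq!(store.iter_leaves().count(), 1); // only state data, no branch nodes
}
```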

Proof Scheme

MerkleDB is built on a standard radix‑16 Merkle Trie (akin to Ethereum’s Patricia Trie). Keys are split into 4‑bit nibbles, so each lookup or update walks at most ⌈key_length/4⌉ levels, and a proof contains one node per level, each carrying up to 15 sibling hashes. This yields O(d)-sized inclusion and non‑inclusion proofs (where d is the trie depth) with modest proof overhead and cheap hash verification, ensuring both compactness and cryptographic soundness.
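
A small Rust sketch of the nibble-path arithmetic behind those bounds (illustrative only; the 32-byte hash width and the example key are assumptions):

```rust
/// Split a key into 4-bit nibbles, as a radix-16 trie does.
fn nibbles(key: &[u8]) -> Vec<u8> {
    key.iter()
        .flat_map(|b| [*b >> 4, *b & 0x0F])
        .collect()
}

fn main() {
    let key = b"account:0xdeadbeef";
    let path = nibbles(key);

    // A lookup or update walks at most one level per nibble...
    let max_depth = path.len(); // = 2 * key.len() = ceil(key_length_in_bits / 4)

    // ...and an inclusion proof carries one node per level, each holding
    // up to 15 sibling hashes (32 bytes apiece in this sketch).
    let worst_case_proof_bytes = max_depth * 15 * 32;

    println!("nibble path length: {max_depth}");
    println!("worst-case proof size: ~{worst_case_proof_bytes} bytes");
}
```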

A key architectural feature of MerkleDB is its use of copy-on-write "Views." A View is a lightweight, in-memory snapshot of the trie where modifications can be applied speculatively. This is crucial for block execution, as it allows the system to process transactions and even stage changes for pending blocks without immediately mutating the persistent, on-disk state. Only when a block is finalized are the changes from its corresponding View committed and merged into the base trie in a single, batched operation.
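
A minimal Rust sketch of this lifecycle (hypothetical types; the real implementation is in Go and differs in detail):

```rust
use std::collections::HashMap;

type Key = Vec<u8>;
type Value = Vec<u8>;

/// Persistent, on-disk trie state (stubbed as a plain map here).
#[derive(Default)]
struct BaseTrie {
    state: HashMap<Key, Value>,
}

impl BaseTrie {
    /// Single batched commit, applied only once a block is finalized.
    fn apply(&mut self, changes: HashMap<Key, Option<Value>>) {
        for (key, change) in changes {
            match change {
                Some(value) => { self.state.insert(key, value); }
                None => { self.state.remove(&key); }
            }
        }
    }
}

/// A lightweight in-memory overlay: reads fall through to the base trie,
/// writes stay local until the block is finalized.
struct View<'a> {
    base: &'a BaseTrie,
    pending: HashMap<Key, Option<Value>>, // None marks a deletion
}

impl<'a> View<'a> {
    fn new(base: &'a BaseTrie) -> Self {
        Self { base, pending: HashMap::new() }
    }

    fn get(&self, key: &[u8]) -> Option<&Value> {
        match self.pending.get(key) {
            Some(change) => change.as_ref(),  // speculative write wins
            None => self.base.state.get(key), // otherwise read committed state
        }
    }

    fn insert(&mut self, key: Key, value: Value) {
        self.pending.insert(key, Some(value));
    }

    /// Hand the buffered changes back so the base trie can merge them at once.
    fn into_changes(self) -> HashMap<Key, Option<Value>> {
        self.pending
    }
}

fn main() {
    let mut base = BaseTrie::default();

    let mut view = View::new(&base);
    view.insert(b"balance:alice".to_vec(), 100u64.to_le_bytes().to_vec());
    assert!(view.get(b"balance:alice").is_some()); // visible inside the View
    assert!(base.state.is_empty());                // on-disk state untouched

    let changes = view.into_changes(); // block finalized: the View is consumed
    base.apply(changes);               // one batched merge into the base trie
    assert!(base.state.contains_key(b"balance:alice".as_slice()));
}
```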

Performance and Inherited Trade-offs

The View mechanism allows for a degree of optimistic concurrency, but commits are serialized through a commitLock to ensure atomicity, effectively turning the storage layer into a single-writer, high-contention system during block application.

While MerkleDB is battle-tested and production-ready, it cannot escape the fundamental trade-offs of its layered design: every trie update still flows through RocksDB's LSM write path, with the attendant write amplification and compaction overhead.

In summary, MerkleDB’s use of a radix‑16 Merkle Trie guarantees compact, depth‑bounded proofs and O(d) proof generation, offering stronger verifiability than schemes with larger or variable‑sized proof structures. Its straightforward Merkle design delivers both efficiency and robust cryptographic guarantees.

MerkleDB represents a mature, reliable solution, but ultimately an inefficient one. It pushed the layered model to its practical limits and, in doing so, highlighted its inherent inefficiencies, paving the way for its successor, Firewood.

Avalanche Firewood

Overview

Developed by Ava Labs as the next-generation storage engine for the Avalanche ecosystem, Firewood represents a radical departure from the layered design of MerkleDB. It is a purpose-built, integrated authenticated database that seeks to eliminate the overhead of the key-value abstraction entirely.

Firewood reimagines the state trie as the on-disk index itself, arranging nodes in a B⁺‑tree–inspired, compaction‑less layout. Outdated trie revisions are pruned in place using a “Future‑Delete Log,” avoiding the need for background compactions or layered KV abstractions. By treating the trie as the primary data structure, Firewood slashes write amplification and reduces random I/O. While proof-system integration is on the roadmap, Firewood’s primary advantage today is its direct alignment of trie I/O with SSD semantics, eliminating the inefficiencies of LSM compactions.

Eliminating the Layers

Firewood's core innovation is simple but profound: the on-disk index is the Merkle trie itself. Instead of flattening the trie structure into key-value pairs, Firewood stores trie nodes natively on disk in a B⁺-tree-like layout. This eliminates the expensive serialization and abstraction layers, aligning the physical storage directly with the logical data structure. It is a compaction-less engine written in Rust, designed from the ground up to minimize I/O and maximize throughput.

Architecture and Storage Management

Firewood performs in-place updates on the active state and prunes outdated data on the fly, avoiding the write amplification that plagues LSM-based systems like RocksDB. When a block is committed, new trie nodes are written to disk, and any nodes that become obsolete are recorded in a Future-Delete Log (FDL). When a state version (or "revision") expires and falls outside the configured retention window, Firewood processes the FDL to reclaim the space occupied by those stale nodes. This clever mechanism avoids the need for a separate, resource-intensive pruning or compaction process.
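
A simplified Rust sketch of that future-delete bookkeeping (hypothetical structures and field names, not Firewood's actual on-disk format):

```rust
use std::collections::VecDeque;

/// Disk address of a trie node in the node file (illustrative).
type NodeAddr = u64;

/// One record in the Future-Delete Log: nodes made obsolete by a revision,
/// which can only be reclaimed once that revision leaves the retention window.
struct FdlRecord {
    obsoleted_at_revision: u64,
    stale_nodes: Vec<NodeAddr>,
}

struct FutureDeleteLog {
    records: VecDeque<FdlRecord>,
}

impl FutureDeleteLog {
    fn new() -> Self {
        Self { records: VecDeque::new() }
    }

    /// Called at commit time: remember which nodes this revision replaced.
    fn record(&mut self, revision: u64, stale_nodes: Vec<NodeAddr>) {
        self.records.push_back(FdlRecord { obsoleted_at_revision: revision, stale_nodes });
    }

    /// Called as revisions expire: reclaim space in place, with no compaction pass.
    fn reclaim_expired(&mut self, oldest_retained_revision: u64, free: &mut impl FnMut(NodeAddr)) {
        while let Some(front) = self.records.front() {
            if front.obsoleted_at_revision >= oldest_retained_revision {
                break; // still inside the retention window
            }
            let record = self.records.pop_front().unwrap();
            for addr in record.stale_nodes {
                free(addr); // return the node's space to the allocator
            }
        }
    }
}

fn main() {
    let mut fdl = FutureDeleteLog::new();
    fdl.record(100, vec![0x1000, 0x1040]);
    fdl.record(101, vec![0x2000]);

    let mut freed = Vec::new();
    // Retention window now starts at revision 101, so revision 100's stale nodes expire.
    fdl.reclaim_expired(101, &mut |addr| freed.push(addr));
    assert_eq!(freed, vec![0x1000, 0x1040]);
}
```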

The design is highly optimized for modern SSDs. By co-locating nodes that are frequently accessed together (e.g., nodes along a path in the trie), Firewood improves cache locality and reduces disk seeks, a technique known as location-aware storage.

Performance and Goals

Firewood is engineered to meet the high‑throughput requirements of Avalanche and its subnets by storing the trie itself in a compaction‑less B⁺‑tree layout. Crash recovery is managed via a standard write‑ahead log (WAL), and by eschewing an LSM‑tree design, Firewood avoids the write stalls and throughput degradation typically caused by compactions. Some performance comparisons against Geth exist, but we could not find any published TPS numbers from Ava Labs for Firewood itself.

State History, Licensing, and Status

Firewood maintains a partial state history through its "revision" system, retaining a configurable number of recent state versions while garbage-collecting older ones. This provides fast access to recent states for rollbacks or syncing without requiring unbounded disk growth.

Currently in an alpha developer preview, Firewood is still under active development. A significant consideration for the broader community is its licensing. The code is available under the "Ava Labs Ecosystem License v1.1," which restricts its use to the Avalanche public networks. This source-available but restrictive model is designed to give the Avalanche ecosystem a competitive advantage.

In summary, Firewood is a specialized state engine that trades generality and permissive licensing for a massive leap in single-node performance, showcasing the power of a truly integrated ADS design.

Nearly‑Optimal Merklization (NOMT)

Overview

Developed as a Rust‑based research prototype, NOMT pairs a binary Merkle Trie with a flat, page‑aligned key–value store to reduce random I/O on SSDs. By aligning nodes to flash page boundaries and avoiding pointer‑chasing within the trie, NOMT achieves approximately 43,000 state updates per second per thread, fully saturating modern NVMe throughput at multi‑gigabyte scales. While NOMT cannot eliminate the fundamental O(logN) trie height, it reduces constant factors dramatically, yielding an order‑of‑magnitude improvement over naive MPT implementations.

Nearly-Optimal Merklization (NOMT) is less a specific product and more a design philosophy that has deeply influenced the development of modern authenticated databases. Championed by researchers like Preston Evans and Polkadot's Robert Habermeier, NOMT's central thesis is that performance gains come from aligning the data structure's physical layout with the characteristics of the underlying hardware, particularly SSDs.

The Core Idea

NOMT proposes two fundamental shifts. First, it moves from a high-arity radix trie (like MerkleDB's radix-16) to a simple binary Merkle tree (arity 2). While binary trees are deeper, their proofs are smaller, requiring only one sibling hash per level instead of up to 15.

Second, and more importantly, NOMT decouples the logical trie structure from its physical on-disk layout. It achieves this by packing multiple binary trie nodes into a single, fixed-size disk page (e.g., 4KB), which is the native block size for SSDs; a single 4KB page can hold a complete binary sub-trie of depth 6.
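
A toy Rust sketch of the page-addressing idea (the page-numbering scheme and parameters are illustrative assumptions, not NOMT's exact layout):

```rust
/// With 6 binary-trie levels packed into every 4KB page, six key bits pick a
/// position inside a page and the remaining prefix bits pick which page to load.
const LEVELS_PER_PAGE: u32 = 6;

/// Which pages must be read to walk a 256-bit key down `trie_depth` levels.
/// Every page index is derived purely from the key's bits, so all of the reads
/// can be issued to the SSD in parallel rather than one dependent hop at a time.
fn pages_for_key(key_bits: &[u8; 32], trie_depth: u32) -> Vec<u64> {
    let mut pages = Vec::new();
    let mut page_id: u64 = 0; // the root page
    let mut consumed = 0;
    while consumed < trie_depth {
        pages.push(page_id);
        // The next LEVELS_PER_PAGE key bits choose which child page to descend into.
        let child = take_bits(key_bits, consumed, LEVELS_PER_PAGE) as u64;
        page_id = page_id * (1u64 << LEVELS_PER_PAGE) + 1 + child;
        consumed += LEVELS_PER_PAGE;
    }
    pages
}

/// Read `count` bits of `bits`, starting at bit offset `start` (most significant first).
fn take_bits(bits: &[u8; 32], start: u32, count: u32) -> u32 {
    let mut out = 0u32;
    for i in 0..count {
        let bit_index = (start + i) as usize;
        let bit = (bits[bit_index / 8] >> (7 - (bit_index % 8))) & 1;
        out = (out << 1) | bit as u32;
    }
    out
}

fn main() {
    let key = [0xAB; 32]; // e.g., a hashed account address
    // A trie over ~2^43 keys is ~43 levels deep; at 6 levels per page that is
    // at most 8 page fetches instead of 43 dependent node reads.
    let pages = pages_for_key(&key, 43);
    println!("page reads needed: {} -> {:?}", pages.len(), pages);
}
```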

This was a significant step. A lookup or update in a traditional trie might require dozens of dependent, random disk reads to traverse the tree from root to leaf. With NOMT, that same operation might only require fetching one or two 4KB pages from disk, an order-of-magnitude reduction in random I/O.

Performance and Data Model

Because the location of the required pages can be calculated directly from the key's bits, the system can issue all necessary page reads in parallel, fully exploiting the capabilities of modern NVMe SSDs. Early benchmarks of a NOMT prototype demonstrated roughly 50,000 trie updates per second, a significant speedup over naive implementations.

To handle the variable-length keys common in blockchains, the Polkadot adaptation of NOMT uses a padding scheme to create uniform-length keys and introduces extension nodes to efficiently represent long, shared key prefixes without breaking the page-aligned layout.

Status and Influence

NOMT is primarily a set of research and development concepts rather than a deployed product. However, its ideas are highly influential and have been cited by teams working on other high-performance databases, including MegaETH. It represents a fundamental rethinking of how to persist authenticated data structures, prioritizing I/O efficiency above all else. By trading a small amount of space inefficiency (not all pages will be full) for a massive reduction in disk seeks, NOMT provides a blueprint for saturating modern storage hardware.

Layered Versioned Multipoint Trie (LVMT)

Overview

LVMT couples an append‑only Merkle tree with an Authenticated Multipoint Trie underpinned by algebraic vector commitments. Instead of hashing every branch upon each update, LVMT stores compact commitment data in the trie and defers the bulk of cryptographic work to vector‑commitment operations that run in amortized O(1) time. In Ethereum‑like benchmark scenarios, LVMT delivers up to 6x faster read/write performance and 2.7x higher overall transaction throughput compared to conventional MPT on LSM storage. Its primary trade‑off is the complexity of the underlying commitment scheme, which demands specialized libraries and, in some variants, a trusted setup.

While NOMT tackles the performance problem at the hardware layout level, the Layered Versioned Multipoint Trie (LVMT) approaches it from a cryptographic and algebraic angle. Presented in a 2023 OSDI paper, LVMT is a novel design that leverages more advanced cryptographic primitives to change the asymptotic complexity of state updates.

Vector Commitments

LVMT's architecture is built on a different type of cryptographic accumulator called a vector commitment. Specifically, it uses an Authenticated Multipoint Evaluation Tree (AMT) at its base layer. An AMT commits to a large array of values in a way that permits extremely efficient updates: instead of re-hashing O(log N) nodes up to the root for every change, it can often update its commitment in constant time using techniques like polynomial commitments.
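
As a simplified example of how a vector commitment can be updated in constant time, consider a Pedersen-style commitment (one common construction; LVMT's actual AMT scheme is more involved):

```latex
% Commit to a vector (v_1, ..., v_n) using fixed public group elements G_1, ..., G_n:
C = \sum_{i=1}^{n} v_i \cdot G_i

% Updating a single position i from v_i to v_i' only needs the delta,
% independent of n -- an O(1) commitment update:
C' = C + (v_i' - v_i) \cdot G_i
```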

Architecture

LVMT is not a single monolithic structure. It intelligently layers an append-only Merkle tree on top of these powerful AMT structures. The bottom layer consists of multiple AMTs, each managing a segment of the total key space. The upper layers of the trie do not store raw data or even full hashes of the layers below. Instead, they store much more compact data, such as simple version numbers.

When a value in a bottom-layer AMT is updated, the expensive cryptographic work is contained within that AMT. The change propagates up to the higher layers merely as an incremented version number, not a cascade of new hashes. This clever design avoids performing expensive elliptic curve operations or other heavy cryptography across the entire tree for every small update.
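
A toy Rust sketch of that propagation rule (hypothetical structures, heavily simplified relative to the paper's design):

```rust
/// Bottom layer: one AMT per key-space shard. The expensive algebraic
/// commitment update is confined here.
struct AmtShard {
    commitment_version: u64, // bumped on every update to this shard
    // (the actual vector-commitment state is elided in this sketch)
}

/// Upper layer: instead of child hashes, it stores only the shards'
/// version numbers, so an update does not re-hash the whole tree.
struct UpperNode {
    child_versions: Vec<u64>,
}

fn update_value(shards: &mut [AmtShard], upper: &mut UpperNode, shard_idx: usize) {
    // 1. Heavy work stays inside one shard: update its vector commitment
    //    (elided here; amortized O(1) in LVMT's construction).
    shards[shard_idx].commitment_version += 1;

    // 2. Propagation upward is just a version bump, not a hash cascade.
    upper.child_versions[shard_idx] = shards[shard_idx].commitment_version;
}

fn main() {
    let mut shards = vec![
        AmtShard { commitment_version: 0 },
        AmtShard { commitment_version: 0 },
    ];
    let mut upper = UpperNode { child_versions: vec![0, 0] };
    update_value(&mut shards, &mut upper, 1);
    assert_eq!(upper.child_versions, vec![0, 1]); // only one version number changed
}
```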

Asymptotic Breakthrough and Performance

This layered, versioned design leads to a profound performance breakthrough: amortized O(1) root commitments. The cost of updating the state commitment becomes nearly constant, regardless of the total state size, allowing LVMT to sidestep the O(log N) hashing bottleneck of traditional Merkle tries.

The practical results are impressive. In experiments running on Ethereum-like workloads, LVMT delivered up to 6x faster read/write operations and boosted overall transaction throughput by 2.7x compared to a conventional MPT on LSM storage. The design is also inherently versioned, making it naturally suited for historical state queries. A Rust implementation of the underlying AMT is being developed in the Conflux ecosystem, suggesting a path toward production use.

LVMT represents the frontier of authenticated data structures, showcasing how a shift in the underlying cryptographic tools can fundamentally alter the performance landscape.

Quick Merkle Database (QMDB)

Overview

QMDB unifies key–value storage and Merkleization in fixed‑size, append‑only “twig” structures, each batching 2,048 entries. By grouping updates into twigs, it achieves O(1) SSD I/O per state change and a single SSD read for lookups, while maintaining in‑memory Merkle hashing at just 2.3 bytes of RAM per entry. Benchmarks show sustained throughput of up to 2.28 million updates per second, supporting upwards of 1M TPS in ideal conditions, and practical scaling to over 15 billion entries. QMDB also provides built-in historical proofs, enabling past-state queries for auditing and analytics.

If the previous designs represent powerful new takes on specific components of the state problem, the Quick Merkle Database (QMDB) represents their synthesis, a hyper-optimized engine that unifies the key-value store, Merkle tree, and versioning system into a single, cohesive architecture. Developed by LayerZero Labs and open-sourced, QMDB sets a new standard for performance in authenticated databases.

Architecture

QMDB completely collapses the traditional two-layer stack into a single, integrated data structure. There is no separate KV store; keys, values, and all Merkle metadata are persisted together in one unified, append-only log structure. This eliminates redundant storage and the communication overhead between layers, which is a major source of its 6-8x performance gain over systems like RocksDB+MPT.

The central innovation in QMDB's design is the "twig". A twig is a fixed-size Merkle subtree that holds a batch of 2,048 leaf entries. All state updates are first buffered into a twig in memory. The Merkle root of this small twig is updated on the fly, entirely in RAM. Only when a twig is full is its content flushed to disk as a single, contiguous, append-only block.
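
A minimal Rust sketch of the twig-buffering idea (illustrative; the entry encoding and the running digest are stand-ins for QMDB's real format and Merkle hashing):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const TWIG_CAPACITY: usize = 2048;

/// Dependency-free stand-in for a cryptographic hash (a real engine would
/// use something like SHA-256).
fn toy_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

struct Twig {
    entries: Vec<Vec<u8>>, // encoded (key, value, metadata) records
    digest: u64,           // running digest standing in for the twig's Merkle root, kept in RAM
}

impl Twig {
    fn new() -> Self {
        Self { entries: Vec::with_capacity(TWIG_CAPACITY), digest: 0 }
    }

    /// Buffer an update entirely in memory and refresh the twig digest on the fly.
    /// Returns the batch of entries once the twig is full and ready to flush.
    fn append(&mut self, entry: Vec<u8>) -> Option<Vec<Vec<u8>>> {
        let mut buf = self.digest.to_le_bytes().to_vec();
        buf.extend_from_slice(&entry);
        self.digest = toy_hash(&buf);
        self.entries.push(entry);

        if self.entries.len() == TWIG_CAPACITY {
            // Flush: one contiguous, append-only write to disk, so the I/O cost
            // is O(1) amortized over 2,048 state updates.
            let full = std::mem::take(&mut self.entries);
            self.digest = 0; // a fresh twig starts accumulating after the flush
            Some(full)
        } else {
            None
        }
    }
}

fn main() {
    let mut twig = Twig::new();
    let mut flushed_twigs = 0;
    for i in 0..10_000u32 {
        if twig.append(i.to_le_bytes().to_vec()).is_some() {
            flushed_twigs += 1;
        }
    }
    println!("{flushed_twigs} full twigs flushed, {} entries still buffered", twig.entries.len());
}
```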

Performance and Efficiency

This design has great performance implications:

  • Optimal I/O: State updates require only O(1) disk I/Os (a single sequential write per flushed twig), and lookups require at most a single SSD read. This is the theoretical minimum I/O complexity.
  • In-Memory Merkleization: All Merkle hashing is done in memory, meaning there is zero disk I/O overhead for calculating state roots during block execution.
  • Minimal Memory Footprint: This is achieved with an incredibly low memory footprint of just ~2.3 bytes of DRAM per entry for the Merkle state, thanks to a highly compressed representation of the twigs' metadata in memory.
  • Extreme Throughput & Scalability: In benchmarks, QMDB achieved up to 2.28 million updates per second and sustained 1 million TPS on a token transfer workload. It has been tested with over 15 billion entries, ten times the size of Ethereum's current state, on a single server.

Historical Proofs

QMDB is a fully versioned database from the ground up. Every entry contains pointers to its previous version, forming a linked list of changes over time. This allows QMDB to offer a powerful and unique feature, historical proofs. A user can query the database for the value of a key at any past block height and receive a valid Merkle proof against that historical state root, all while querying the most recent version of the database. This capability unlocks new application possibilities, such as on-chain verification of past events, without relying on trusted archival nodes.
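
A simplified Rust sketch of the versioned-entry walk (hypothetical layout; QMDB's real entries carry more metadata and return accompanying Merkle proofs):

```rust
/// Each appended entry records the block height at which it was written and the
/// log offset of the key's previous version, forming a per-key linked list back
/// through history.
struct Entry {
    key: Vec<u8>,
    value: Vec<u8>,
    height: u64,
    prev_offset: Option<usize>, // None for the key's first-ever version
}

/// The append-only log (a Vec stands in for the on-disk entry file).
struct Log {
    entries: Vec<Entry>,
}

impl Log {
    /// Walk back from the key's newest version until we reach the version that
    /// was live at `height`. In the real store, the entry found this way also
    /// anchors a Merkle proof against that historical state root.
    fn value_at(&self, newest_offset: usize, height: u64) -> Option<&Entry> {
        let mut cursor = Some(newest_offset);
        while let Some(offset) = cursor {
            let entry = &self.entries[offset];
            if entry.height <= height {
                return Some(entry);
            }
            cursor = entry.prev_offset;
        }
        None // the key did not exist yet at that height
    }
}

fn main() {
    let log = Log {
        entries: vec![
            Entry { key: b"k".to_vec(), value: b"v1".to_vec(), height: 10, prev_offset: None },
            Entry { key: b"k".to_vec(), value: b"v2".to_vec(), height: 25, prev_offset: Some(0) },
        ],
    };
    // State as of block 12: the walk skips v2 (written at height 25) and lands on v1.
    let entry = log.value_at(1, 12).expect("key existed at height 12");
    assert_eq!(entry.value, b"v1");
}
```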

QMDB stands as the current state-of-the-art, demonstrating what is possible when the data structure, storage layout, and cryptographic process are co-designed for maximum efficiency on modern hardware.

AlDBaran vs. QMDB

QMDB was long considered the state‑of‑the‑art ADS engine, with a claimed 2.28 M updates per sec; when we re‑ran their v0.2.0 code on the same AWS setup, we only saw 1.28 M updates per sec (and 0.803 M updates per sec on the June 26 2025 release). By contrast, AlDBaran’s Pleiades hot‑path consistently outperforms both versions by over 2x (see Part 1 for the full benchmark matrix), underscoring its superior scalability and reduced contention.

Even with a more frequent commit period, AlDBaran is over an order of magnitude faster than the highest throughput we could measure from the public QMDB code. This direct comparison solidified our conclusion: to meet the demands of a 1M TPS rollup, a new architecture was not just an option, but a necessity.

Comparative Analysis

| Dimension | Firewood (Avalanche) | MerkleDB (Avalanche) | NOMT (Nearly-Optimal Merklization) | LVMT (Layered Versioned Multipoint Trie) | QMDB (Quick Merkle Database) |
| --- | --- | --- | --- | --- | --- |
| Architecture & Data Model | Integrated on-disk Merkle trie: stores trie nodes directly in a B⁺-tree-like layout | Layered Merkle Radix Trie on top of RocksDB | Binary Merkle trie + separate flat KV store for values | Vector-commitment Merkle tree + Authenticated Multipoint Tree (AMT) overlay | Append-only “twigs” (fixed-size subtrees) unifying KV + Merkle storage |
| Storage Integration & I/O | Compaction-less, purely sequential writes; eliminates LSM write amplification | RocksDB LSM with leveled compaction; each trie op → O(log N) I/Os | Flash-native layout; predictable 1–2 I/Os per lookup/update | Constant-time root update via vector commitments; still multi-I/O for proofs | O(1) writes (batched by 2,048 entries), 1 SSD read per lookup |
| Merkle Computation | On-disk trie writes and in-place hashing; WAL for atomic commits | Merkle nodes persisted as KV pairs; proof generation incurs disk reads | Standard Merkle updates with multiproof support | Merkle root updates in O(1) via vector commitments; proofs via AMT lookups | Fully in-memory Merkleization of twigs; no disk I/O for hashing |
| Versioning & History | Tracks only recent revisions (configurable N); older states pruned in place | Copy-on-write Views + RocksDB snapshots; GC of old nodes | Single-state design; no built-in historical queries | Native support for exclusion/latest-value proofs via AMT, but not full history | Full historical proofs (inclusion/exclusion at any block) via OldId pointers |
| Peak Throughput | Targets high TPS with very low latency in alpha | Handles Avalanche’s high TPS; bounded by RocksDB compaction spikes | ~43k updates/sec per thread in PoC; SSD-bound benchmarks | Benchmarks show ~30% uplift vs MPT in the average case; constant-time roots | Up to 2.28M updates/sec on NVMe RAID; 1M TPS in token tests |
| Maturity & Production | Alpha/dev preview; planned mainnet integration; restrictive license | In production in AvalancheGo since late 2023; BSD-licensed | Proof-of-concept; early Rust prototype (MIT/Apache) | Research prototype (OSDI ’23); full integration pending | Research prototype; lab benchmarks only to date; MIT/Apache license |
| Reusability | General-purpose Rust crate; could serve any Merkle-trie chain | Embedded in Go Avalanche nodes; reusable within the Go ecosystem | Pluggable in Substrate/rollups; unopinionated on key format | Applicable to any blockchain needing fast proofs; vector commitments portable | Generic KV+ADS library; twig concept portable to other contexts |

When viewed side‑by‑side, these ADS engines occupy distinct regions of the design space: 

  • Firewood and QMDB emphasize minimal I/O amplification: Firewood via a compaction‑less trie layout, QMDB via O(1) twig batching.
  • NOMT and LVMT target algorithmic improvements to trie height and hash amortization.
  • MerkleDB remains perhaps the most production‑ready, offering audit‑backed stability at moderate I/O cost.

In practice, rollup deployments sensitive to proof size and verification cost may prefer LVMT’s algebraic commitments, whereas high‑frequency execution layers with ample RAM may lean on QMDB’s twig architecture. Platforms seeking an incremental upgrade from Ethereum might adopt MerkleDB immediately, while research‑oriented teams could explore NOMT for flash‑native performance or Firewood for future-proof compaction‑less designs. Ultimately, though, Eclipse’s AlDBaran remains the standout choice, delivering unmatched throughput, seamless scalability, and full verifiability for teams that demand the very highest performance.

Conclusion

Our survey of Firewood, MerkleDB, Nearly‑Optimal Merklization (NOMT), LVMT, and QMDB reveals significant advances over legacy MPT+LSM stacks, yet each falls short of the 3 million‑state‑updates‑per‑second requirement underpinning the goals of GigaCompute for a 1M‑TPS+ rollup. Firewood’s compaction‑less trie, MerkleDB’s batched views, NOMT’s flash‑native layout, LVMT’s algebraic commitments, and QMDB’s twig architecture each improve I/O efficiency or proof flexibility, but none deliver sustained multi‑million updates per second in a production context.

This persistent performance gap is exactly what motivated the AlDBaran project. By decoupling in‑RAM hot‑path updates (Pleiades) from append‑only proof storage (Hyades) and leveraging SIMD batching, twig buffering, and deterministic layouts, AlDBaran consistently clears 24 million updates per second with historization and over 48 million without, as shown in Part 1. In Part 2, we showed how we reached 48 million updates per second by tuning snapshot frequency, subtree root counts, and memory prefetching.

As a result, Eclipse now meets, and vastly exceeds, its 1M‑TPS goal, turning state‑commitment from bottleneck to spare capacity.
