Cube

What is a Merkle Proof?

Learn what a Merkle proof is, how it works, why it enables compact verification, and how Bitcoin, Ethereum, Cosmos, and Substrate use it.

What is a Merkle Proof? hero image

Introduction

Merkle proof is the cryptographic technique that lets someone prove that a particular piece of data is included in a much larger collection while revealing only a tiny fraction of that collection. The reason this matters is practical, not just elegant: blockchains, databases, light clients, and proof systems constantly need a way to check one fact about a large state without downloading or trusting all of it. A Merkle proof is the compression trick that makes that possible.

The puzzle it solves is easy to state. Suppose a system has committed to a million records. Later, you want to check whether one record is really part of that committed set. Sending the full million records would work, but it destroys the point of having a compact commitment in the first place. Sending only the record itself is not enough, because anyone can invent a record. A Merkle proof sits in the middle: it is much smaller than the full dataset, but still strong enough to let an independent verifier recompute the same commitment.

That is the central idea to keep in mind throughout this topic: a Merkle proof is not proving that data is true; it is proving that data is consistent with a previously committed root hash. Whether the underlying data is meaningful, current, or honest depends on the larger system. The proof only answers a narrower and very valuable question: *does this leaf belong under this root? *

How does one Merkle root commit many leaves?

A Merkle proof makes sense only after you see what a Merkle tree is doing. Start with many pieces of data, often called leaves. Hash each leaf. Then pair adjacent hashes and hash each pair to form the next layer. Repeat this process until only one hash remains. That final hash is the root.

The root is useful because it acts as a compact commitment to all the leaves below it. If any leaf changes, its hash changes. That altered hash changes its parent, which changes the next parent, and so on all the way up to the root. So the root is sensitive to every leaf, even though it is only one digest long.

This sensitivity depends on the security properties of the underlying hash function.

NIST’s Secure Hash Standard describes secure hash functions as giving message-integrity and as computationally infeasible to invert or collide in the usual sense.

  • changing a message will
  • with very high probability
  • change its digest

Those properties are exactly what a Merkle construction leans on. If an attacker could cheaply find a different leaf arrangement that led to the same root, the commitment would lose its force.

Here is the basic compression point: the root commits to everything, but a proof only needs to show the hashes on one path from a leaf to that root. That is why proofs stay small even when the dataset is large.

Imagine four leaves: A, B, C, and D. First compute their leaf hashes: hA, hB, hC, and hD. Then combine pairs: hash hA || hB to get hAB, and hash hC || hD to get hCD, where || means concatenation. Finally hash hAB || hCD to get the root hRoot.

Now suppose you want to prove that C is included in the tree committed by hRoot. You do not need to reveal A and B and D in full. The verifier already has, or is willing to trust, the root hRoot. What they need is just enough information to rebuild the unique path from C upward.

So the proof provides the sibling hash next to C, which is hD, and then the sibling hash next to hCD, which is hAB. The verifier hashes C to get hC, combines it with hD in the correct order to reconstruct hCD, then combines hAB with hCD in the correct order to reconstruct hRoot. If the recomputed root matches the trusted root, the verifier accepts that C was included.

Notice what happened. The verifier never saw the full dataset. They only saw the target leaf plus the small number of sibling hashes needed to bridge that leaf to the root. In a balanced binary tree with n leaves, that path length grows roughly like log2(n), not n. This is the entire economic advantage of Merkle proofs: proof size grows slowly while the committed dataset can grow enormously.

What data and ordering must a Merkle proof include to verify a leaf?

ElementPurposeRepresentationIf missing
Sibling hashesRebuild branch hashesOrdered hash listCannot recompute root
Position / orderCorrect concatenation orderLeft/right flags or orderWrong hash result
Node encodingDomain separationPrefixed or typed bytesAmbiguous node meaning
Path structureLocate leaf in treeIndexed path elementsWrong sibling selection
Figure 34.1: Key fields a Merkle proof must contain

A common misunderstanding is that a Merkle proof is “just a list of hashes.” It is more precise to say it is a list of the hashes needed to reconstruct the path with the right structure. The verifier must know not only the sibling hashes but also how to combine them.

In a simple binary Merkle tree, that usually means the proof must preserve left-versus-right position at each level. If the verifier knows only the sibling hash but not whether it belongs on the left or right of the current node, they may hash in the wrong order and get a different result. Ordering is not a cosmetic detail. Since hash functions are sensitive to input order, Hash(x || y) and Hash(y || x) are generally different.

There is also an important design choice around how leaves and internal nodes are encoded before hashing. If a system is careless about domain separation (for example, if it does not distinguish a leaf hash from an internal-node hash) then ambiguity can appear. Good constructions prevent this by defining clearly what gets hashed at each stage. The exact encoding is a protocol rule, not an afterthought. Two implementations using different encodings can produce different roots from the same apparent data.

This is where neighboring concepts become relevant. A Merkle proof depends on a cryptographic hash function, but it also depends on a canonical tree-building rule. The hash gives integrity; the tree rules give unambiguous structure.

How do you verify a transaction using a Merkle proof (step‑by‑step example)?

Suppose a block contains many transactions, and a block header stores the Merkle root of those transactions. You are a lightweight client that does not want to download the whole block. You only care whether transaction T was actually included.

A full node sends you T, the Merkle proof for T, and the block header containing the trusted root. You first hash the serialized transaction exactly the way the protocol specifies. That gives you the leaf hash for T. The proof then gives you the sibling hash at the first tree level, so you combine the two in the specified order and hash again. That produces the parent hash. The proof gives you the next sibling hash one level up, and you repeat the same operation. After enough repetitions, you arrive at a candidate root.

If that candidate root is byte-for-byte equal to the Merkle root committed in the block header, you have learned something specific and limited: this transaction is part of the transaction set committed by that header. You have not proven that the transaction is valid in every broader sense, nor that the chain itself is the best chain, nor that no reorganization will happen. Those questions belong to consensus and chain-selection logic. The Merkle proof has done its own job cleanly: it linked one transaction to one commitment.

This separation of responsibilities is why Merkle proofs appear so often in light-client designs. They let the client avoid full data download while still checking inclusion against a compact on-chain commitment.

Why are Merkle proofs compact and cheap to verify?

The efficiency gain comes from tree shape. If a tree has 2^k leaves, a proof path has k sibling hashes. So doubling the number of leaves adds only about one more hash to the proof. That logarithmic growth is why Merkle proofs scale well.

The verifier’s work is also small. They perform one leaf hash and then one hash per level of the path. Compared with downloading or recomputing the whole dataset, that is tiny. This makes Merkle proofs especially attractive for devices with limited bandwidth or storage. EIP-1186 in Ethereum was motivated in part by this exact need: constrained clients can request account and storage proofs and verify them offline against a trusted block header.

But the proof is only compact because the system has already paid the cost of building or maintaining the commitment structure. A full node, database, or prover usually stores the tree or enough intermediate information to answer inclusion queries efficiently. So the mechanism is not “free compression.” It shifts cost from every verifier to the party maintaining the committed structure.

How do inclusion and non‑inclusion proofs differ across data structures?

StructureSupports inclusionNon-inclusion methodProof complexityTypical use
Binary Merkle treeYesNo direct absence proofO(log n) hashesTransaction inclusion
Ordered key-value treeYesShow neighbor keys or gapsO(log n) nodesKV existence / absence proofs
Patricia trieYesPath mismatch and node dataPath-length dependentAccount/storage state proofs
Merkle MMR / range proofsYesRange endpoints and digestsLogarithmic to range sizeBatch or historical proofs
Figure 34.2: How inclusion and non-inclusion proofs differ across structures

In the simplest binary Merkle tree, inclusion proofs are the clearest case. Proving non-inclusion is more subtle, because absence is not a leaf you can point to. Different systems solve this in different ways.

Ordered key-value trees can prove non-existence by showing the neighboring committed structure around the missing key. ICS23, the proof format used in the Cosmos and IBC ecosystem, explicitly supports both proofs of existence and proofs of non-existence. That works because the underlying stores are structured and ordered in a way that makes absence meaningful and checkable.

Ethereum shows why the phrase “Merkle proof” often hides structural variation. Ethereum’s state is not stored in a simple binary Merkle tree but in a modified Merkle Patricia trie. The block header commits roots such as stateRoot, transactionsRoot, and receiptsRoot, all based on trie roots rather than plain binary trees. A proof there is not just a binary list of sibling hashes. It is a sequence of RLP-serialized trie nodes along a key path derived from Keccak-256 of the address or storage key. EIP-1186 standardizes an eth_getProof RPC method that returns accountProof and storageProof in exactly this form.

The important continuity is conceptual, not structural: in each case, a small proof lets a verifier recompute a committed root from a local target value plus a narrow slice of authenticated structure. The exact node format changes. The commitment logic does not.

How do Merkle proofs differ between Bitcoin, Ethereum, Cosmos, and Substrate?

ChainStructureProof formatHow to requestMain caveat
BitcoinBinary Merkle treeMerkle branch (sibling hashes)gettxoutproof RPCHeader must be trusted
EthereumModified Merkle Patricia trieRLP-serialized trie nodesethgetProof RPCTrie-specific path parsing
CosmosOrdered key-value Merkle treeICS23 / ProofOps binary opsABCI query with Prove=trueRequires lexicographic ordering
SubstrateTrie or Merkle-backed storageReadProof / RPC bytesstategetReadProof RPCNode config affects availability
Figure 34.3: How Bitcoin, Ethereum, Cosmos, and Substrate proofs differ

Bitcoin is the cleanest place to picture the classic form. A block header commits to a transaction Merkle root, and a light client can verify a transaction’s inclusion using that root plus a branch. Bitcoin Core’s gettxoutproof RPC returns a serialized proof that a transaction was included in a block. This is the canonical “membership in a committed set” use case.

Ethereum uses the same commitment principle for richer state, but because account and storage state are key-value mappings rather than a simple list, the underlying structure is a trie. The proof follows path fragments and encoded nodes rather than only left-right sibling hashes. The verifier still starts from a trusted root in the block header and checks whether the path and node hashes reconstruct it.

Cosmos systems often expose proofs through ABCI queries when Prove = true, and the returned proof material can be represented with proof operations designed for external verification. ICS23 exists precisely because ecosystems that exchange proofs across languages and chains need a standardized binary representation, not just an internal tree implementation.

Substrate-based chains expose storage read proofs through RPC methods such as state_getReadProof. Again, the details of encoding and storage layout are chain-specific, but the pattern is familiar: the chain header commits state, and the node can return a compact proof that selected storage entries match that committed state at a particular block.

These examples matter because they show what is fundamental and what is conventional. Fundamental: a root commits to a large structure, and a proof reveals enough local structure to rebuild that root for one query. Conventional: whether the structure is binary, Patricia, radix-based, lexicographically ordered, or serialized in a particular wire format.

What must a verifier trust when checking a Merkle proof?

A Merkle proof is only as useful as the root it is checked against. If the verifier accepts an attacker-chosen root, the attacker can prove almost anything relative to that fake commitment. So the real trust anchor is not the proof itself but the source of the root.

In blockchains, that source is usually a block header already accepted through consensus or light-client logic. In a proof-of-reserves system, it may be a publicly announced root signed by an exchange. In a database replication protocol, it may be a root already authenticated through another secure channel.

This is the most common conceptual mistake: people sometimes speak as if a Merkle proof by itself creates trust. It does not. A Merkle proof creates conditional verifiability: if you trust this root, then you can independently verify whether this leaf belongs under it.

That conditional structure is why Merkle proofs pair naturally with light clients. The client spends effort deciding which headers or roots are authoritative, then uses Merkle proofs to answer many specific state questions cheaply.

What security assumptions do Merkle proofs rely on, and how can they fail?

The security of a Merkle proof rests on several layers, and each layer has its own failure mode.

The first layer is the hash function. If collisions or second-preimage attacks became practical against the chosen hash in the relevant setting, the binding property of the root would weaken. NIST describes secure hash algorithms as making it computationally infeasible to find a message for a given digest or two distinct messages with the same digest. Merkle commitments inherit their credibility from those properties.

The second layer is canonical encoding. If different implementations serialize leaves or internal nodes differently, they may disagree about the root. This is not a cryptanalytic break; it is an interoperability break. In practice, many proof bugs come from format mismatches, endianness mistakes, omitted prefixes, or inconsistent path interpretation.

The third layer is authenticated root distribution. A perfectly valid proof against the wrong root is still misleading. In blockchain settings, this means proof verification is inseparable from header verification. SPV-style systems make this explicit: the Merkle branch proves transaction inclusion in a block, while the consensus logic decides whether that block belongs to the authoritative chain.

The fourth layer is data availability and historical retention. A root may commit to data that is not easy to retrieve later. Ethereum’s specification notes that implementations may prune older trie nodes, which means historical proofs may depend on archive data or checkpoints being retained. Avalanche’s MerkleDB documentation makes a related point for change proofs: some proofs require retained historical state and cannot be verified standalone without it. So a commitment can stay valid even when generating or checking certain proofs becomes operationally harder.

Why is a single universal Merkle‑proof format difficult to create?

It is tempting to think there should be one universal Merkle proof format. In practice, that is difficult because different authenticated data structures expose different invariants.

ICS23 is an instructive example. It aims to define a generic, cross-language binary representation of Merkle proofs for systems such as IBC. But it also explicitly notes a limitation: tries like Ethereum’s are not naturally supported because the key is not present in the leaf in the way ICS23 expects, and verification would require much more database-specific path logic. That is not a failure of the idea. It is a reminder that “Merkle proof” names a family resemblance, not always a byte-for-byte standard.

The deeper reason is that proofs are really explanations of why this root follows from this query under this exact commitment scheme. Change the scheme, and the explanation format often has to change too.

How do Merkle proofs enable light clients and SPV?

Merkle proofs become especially important when verification and storage are separated. A full node stores large amounts of state and can answer queries. A light client stores much less and asks for proofs. The proof closes the trust gap.

In Bitcoin-style SPV, the client downloads block headers and receives transaction inclusion proofs relative to transaction Merkle roots. In Ethereum, a light or external verifier can use eth_getProof to request account or storage proofs and check them against a trusted stateRoot. In Cosmos and Substrate ecosystems, query interfaces expose proof-carrying responses so remote state reads can be independently checked.

This pattern scales beyond current mainstream light clients. Research such as Flyclient uses Merkle Mountain Range commitments and probabilistic sampling to make chain verification itself more succinct than ordinary SPV. The common thread is still commitment plus compact proof. What changes is what exactly is being committed; transactions in one block, a global state trie, or even a compressed representation of chain history.

What can’t a Merkle proof prove?

It helps to end by being strict about scope. A Merkle proof does not prove freshness unless tied to a recent trusted root. It does not prove semantic validity beyond membership in the committed structure. It does not replace consensus, signatures, or data-availability assumptions. And it does not remove the need to understand the exact encoding rules of the structure being proven against.

What it gives you is narrower and more powerful than it first appears: a way to verify one local claim against one global commitment with very little data.

Conclusion

A Merkle proof works because a hash tree turns a large dataset into one root, and a short path of sibling information is enough to reconnect a chosen leaf back to that root. The proof is small because it reveals only the hashes along one branch, not the whole structure. If you remember one thing tomorrow, remember this: a Merkle proof is a compact argument that “this piece belongs to that commitment”; and that simple idea is one of the main reasons modern cryptographic systems can be both verifiable and scalable.

What should I understand about Merkle proofs before trading, depositing, or transferring crypto?

Before you rely on Merkle proofs for deposits, withdrawals, or on‑chain checks, understand that a proof only shows membership relative to a trusted root and that the root’s provenance and chain finality matter. On Cube Exchange, use your deposit and trading workflows only after you check the chain’s finality rules and verify any provided inclusion proof or block header.

  1. Check the blockchain’s finality rule for the asset and note the confirmation count or finality signal you will wait for (e.g., 6 confirmations for many Bitcoin use cases, or a finalized epoch for Ethereum).
  2. Fetch the trusted root or block header for the deposit/transfer from a reputable source (a full node you control or a well-known block explorer) and record its block hash/root.
  3. If an inclusion proof is provided, verify it locally: hash the serialized leaf per protocol rules, then combine sibling hashes in the specified left/right order to reconstruct the root (use gettxoutproof for Bitcoin branches or eth_getProof for Ethereum account/storage proofs).
  4. Fund your Cube Exchange account via the deposit flow or fiat on‑ramp, wait for the confirmations/finality threshold you recorded in step 1, and only then place trades or initiate withdrawals that depend on that on‑chain state.

Frequently Asked Questions

How can a Merkle proof demonstrate that a key or item is NOT present in a committed dataset?
+
Ordered key-value trees can prove non-existence by returning the committed neighboring entries (showing where the missing key would sit), and ICS23 explicitly supports existence and non-existence proofs; simple unstructured binary trees cannot directly show absence because there is no canonical ‘gap’ to point at.
If I verify a Merkle proof, what other trust assumptions am I making?
+
You must trust the root you check the proof against — for example a block header accepted via consensus or a root signed by an authoritative party — because a valid proof only shows membership relative to that root and is meaningless if the root itself was attacker-chosen.
Can two implementations produce different Merkle roots for the same data, and why would that happen?
+
Different serialization or node-encoding rules can produce different roots from the same apparent data, so canonical encoding and explicit left/right ordering at each level are required; without those rules, implementations can disagree and proofs will not interoperate.
What breaks in a Merkle proof if the underlying hash function is no longer secure?
+
If practical collisions or second-preimage attacks become feasible against the chosen hash, the root’s binding to its leaves weakens and an attacker could forge alternate data that hashes to the same root; standards bodies like NIST describe the hash properties Merkle trees rely on and recommend secure hash choices.
Are Merkle proofs in Ethereum the same as the Merkle proofs used by Bitcoin?
+
No — Bitcoin uses a simple binary Merkle tree for transaction branches, while Ethereum’s state uses a modified Merkle Patricia trie whose proofs are sequences of serialized trie nodes (RLP-encoded) and follow different path logic; both apply the same commitment-plus-proof idea but with different node formats and verification rules.
Why are Merkle proofs so small and who bears the computational/storage cost for them?
+
Proof size grows roughly logarithmically with the number of leaves (one sibling hash per tree level), so verifiers do only a leaf hash plus one hash per level; the efficiency is achieved by shifting the work of building and storing the authenticated structure to the prover or full node rather than to every verifier.
Can a Merkle proof by itself prove that a transaction was valid or that the chain won’t reorganize?
+
A Merkle proof only proves that a value was included under a particular committed root; it does not prove the transaction’s semantic validity, freshness, or that the block/header is canonical — those questions are answered by consensus, signatures, or the verifier’s header-selection logic.
Why hasn't a single universal Merkle-proof format become standard across blockchains and databases?
+
Different authenticated data structures expose different verification invariants, so a single universal byte-level proof format is hard: ICS23 standardizes proofs for many key-value merklized stores but explicitly notes it does not naturally support Ethereum’s Patricia trie without extra per-database logic.

Your Trades, Your Crypto