
What is a Hash Commitment?

Learn what hash commitments are, how they provide hiding and binding, why plain hashing is often insufficient, and how commit–reveal schemes work.


Introduction

Hash commitments are a way to lock in a value now and reveal it later. They matter because many cryptographic protocols need exactly that: a participant must choose something before seeing everyone else’s choice, but should not have to reveal it immediately. If you cannot hide the choice, others can react to it. If you cannot lock it in, the chooser can change it later. A commitment scheme exists to give you both properties at once, as much as the underlying assumptions allow.

The basic puzzle is easy to state and surprisingly deep. Suppose Alice wants Bob to pick heads or tails for a coin flip over a network. Bob should choose before he sees Alice’s result, so that he cannot adapt to it. But if Bob simply sends “heads,” Alice learns his choice before she responds and can adapt to it instead. If Bob waits and reveals later, Alice has no proof he did not change his mind after seeing her response. A commitment solves this by giving Bob a digital equivalent of a sealed envelope: he sends something now that fixes his choice, but does not yet reveal it.

A hash commitment is the simplest and most widely used way to build that envelope. Bob combines his message with some randomness, computes a cryptographic hash, and sends only the digest. Later he reveals the original message and the randomness. Anyone can recompute the hash and check that it matches. If the hash function behaves as expected, Bob cannot easily open the same commitment to two different messages, and observers cannot efficiently recover the message from the digest alone.

That sounds almost too simple, and that is exactly where misunderstandings begin. A plain hash of a message is often not a good commitment. Hiding may fail if the message comes from a small set, like “yes/no” or a short bid. Binding depends on collision resistance, but practical safety also depends on how you encode inputs, how much randomness you add, whether you truncate outputs, and whether you separate one application domain from another. The useful idea is simple; the safe construction is more careful.

What are hiding and binding in a commitment scheme?

| Property | What it prevents | Security basis | Typical failure | Design fix |
| --- | --- | --- | --- | --- |
| Hiding | Learning value before reveal | Fresh randomness / preimage resistance | Small message spaces or reused salt | Add fresh high-entropy r; salt inputs |
| Binding | Changing value after commit | Collision resistance of hash | Hash collisions or ambiguous encoding | Use strong hash; unambiguous encoding |
Figure 35.1: Hiding vs binding: what each enforces

A commitment scheme is built to create an asymmetric timeline. Before the reveal, the commitment should tell the verifier essentially nothing useful about the committed value. After the reveal, the committer should be stuck with what they chose earlier. These goals are called hiding and binding.

Hiding means that seeing the commitment should not let an observer learn the committed message, except perhaps what inevitably leaks from conventions such as length if the construction does not conceal it. This property matters because the whole point of committing early is to withhold the actual choice until the right moment. If the committed value is a coin flip outcome, a password guess, a sealed bid, or a secret witness in a proof, even partial leakage can be enough to break the protocol’s purpose.

Binding means that once the commitment has been published, the committer should not be able to open it in two different ways. In other words, there should not be an efficient way to produce a commitment c together with two different openings that both verify. This is what makes the commitment a lock rather than a vague promise.

These two goals pull in different directions. Hiding wants the commitment to reveal as little structure as possible. Binding wants it to be rigid enough that only one message matches. Different commitment constructions balance these in different ways. Hash commitments usually aim for computational security on both sides: hiding because the randomness makes brute force infeasible, and binding because finding alternate openings would require breaking the hash function’s collision resistance or a closely related property.

The sealed-envelope analogy helps explain the purpose, but it has limits. A physical envelope is information-theoretically hiding if it is opaque, and physically binding if tamper evidence works. A hash commitment is different: both properties rest on assumptions about computation and implementation. If the message space is tiny, hiding may fail even with a strong hash. If the hash function or input encoding is flawed, binding may fail in practice even when the abstract idea is sound.

How does a hash commitment use randomness to hide and lock a message?

| Construction | Hiding | Binding | When to use |
| --- | --- | --- | --- |
| Plain H(m) | Fails for small domains | Only collision-based | Only when m is high-entropy |
| Salted H(encode(m,r)) | Good if r is high-entropy | Collision resistance required | Default noninteractive commitment |
| HMAC_r(m) / KMAC | Good if r/key secret | Binding via MAC/PRF properties | When keyed secrecy or PRF needed |
| TupleHash / cSHAKE | Depends on salt and r | Unambiguous tuple encoding | Structured tuples / domain separation |
Figure 35.2: Practical hash commitment constructions compared

The most important idea in a practical hash commitment is this: you usually do not commit to m by publishing H(m). You commit by choosing fresh random data r and publishing something like c = H(encode(m, r)), where H is a cryptographic hash function, m is the message, and r is a nonce or salt revealed later.

Why add r? Because a deterministic hash of the bare message is only as hidden as the message space is large. If Bob commits to one of two possibilities, H("heads") or H("tails"), Alice can compute both hashes immediately and learn his choice. The hash has not hidden anything; it has only compressed it. The randomness changes the problem. Now Alice would need to guess both m and r, and if r is long and fresh, exhaustive search becomes infeasible.

Why reveal r later? Because the verifier needs a way to check the opening. When Bob eventually reveals (m, r), Alice computes the same encoding, hashes it, and compares the result to c. If they match, the opening is accepted. The commitment phase and opening phase are mechanically simple, which is why hash commitments are so attractive in both protocol design and software.

A worked example makes the mechanism clearer. Imagine a sealed-bid auction where Bob wants to commit to a bid of 37 without revealing it during the bidding period. If he publishes H(37), anyone can hash all plausible bids and recover his value. So Bob instead samples a fresh 256-bit random string r, encodes the pair (37, r) unambiguously, computes c = H(encode(37, r)), and publishes c. During the reveal phase, he publishes 37 and r. The auction contract or auctioneer recomputes the hash and verifies that the result equals c. Bob could not plausibly have searched for a second valid opening after the fact if the hash is collision-resistant, and observers could not plausibly have guessed r in advance.
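The auction example can be sketched in a few lines of Python. This is a minimal illustration, not a production design: SHA-256, the 8-byte length prefixes, and the function names encode, commit, and verify are all assumptions chosen for the sketch.

```python
import hashlib
import hmac
import secrets


def encode(message: bytes, nonce: bytes) -> bytes:
    """Length-prefix each field so the (message, nonce) boundary is explicit."""
    return b"".join(len(part).to_bytes(8, "big") + part for part in (message, nonce))


def commit(message: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, nonce); the nonce stays private until the reveal."""
    nonce = secrets.token_bytes(32)  # fresh 256-bit randomness for every commitment
    return hashlib.sha256(encode(message, nonce)).digest(), nonce


def verify(commitment: bytes, message: bytes, nonce: bytes) -> bool:
    """Recompute the digest from the opening and compare it to the commitment."""
    expected = hashlib.sha256(encode(message, nonce)).digest()
    return hmac.compare_digest(expected, commitment)


bid = (37).to_bytes(8, "big")  # fixed-width encoding of the bid (an assumption)
c, r = commit(bid)             # Bob publishes c during the bidding period
assert verify(c, bid, r)       # reveal phase: the honest opening is accepted
assert not verify(c, (38).to_bytes(8, "big"), r)  # a changed bid is rejected
```

The verifier needs nothing beyond c and the revealed pair, which is why the check is easy to run in an auction contract or by an auctioneer.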

That example also shows a subtle point: the commitment is not hiding because hashes are “encryption-like.” It is hiding because a fresh random r enlarges the effective search space. The hash mainly gives you a compact, publicly checkable digest for which a second valid opening is hard to find.

Which hash properties (collision, preimage) provide binding and hiding?

To understand hash commitments from first principles, it helps to separate the two security jobs the hash is involved in.

For binding, the key intuition is collision resistance. If a committer could find two different encodings x and y such that H(x) = H(y), they could publish that shared digest as the commitment and later choose which opening to reveal. In the common construction where x = encode(m, r), this would mean two distinct pairs (m, r) and (m', r') opening to the same c. So the inability to find collisions is what makes “I already committed” meaningful.

For hiding, collision resistance is not enough. A function can be collision-resistant and still leak information about its input. Halevi and Micali make this point sharply: naive ideas like sending MD(M) or even MD(M || R) do not automatically give good secrecy from collision-freeness alone, because the digest may still reveal structural information about M. In practical engineering, the usual defense is to use strong modern hash functions together with sufficient fresh randomness and careful encoding. But the conceptual lesson is important: binding and hiding come from different aspects of the construction, not from one magic property called “secure hash.”

This is why textbooks distinguish several hash properties rather than treating “hash security” as a single blob. Collision resistance is central for binding. Preimage resistance matters for brute-force recovery, though for hiding the dominant issue is often the size of the message space combined with whether salting is used. In some constructions, weaker notions such as target collision resistance can be relevant. The exact property you need depends on the protocol and the adversary model.

NIST’s guidance gives a practical way to think about parameter sizes. If a hash output has L bits, then its expected collision resistance is about L/2 bits. So a 256-bit digest gives roughly 128 bits of collision resistance. That matters directly for binding. If you truncate the output to λ bits, the collision resistance falls to about λ/2 bits. A commitment that keeps only 128 bits of a digest is not “128-bit binding”; it is closer to 64-bit collision security. That distinction is easy to miss and costly to ignore.
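The halving effect is easy to observe on a deliberately tiny digest. The sketch below is an illustration, not part of NIST's guidance: it truncates SHA-256 to 24 bits and runs a birthday search, which should succeed after a number of hashes on the order of 2**12 rather than 2**24. The helper names are hypothetical.

```python
import hashlib
from itertools import count


def truncated_digest(data: bytes, bits: int) -> bytes:
    """First `bits` bits of SHA-256 (bits must be a multiple of 8 in this sketch)."""
    return hashlib.sha256(data).digest()[: bits // 8]


def find_collision(bits: int) -> tuple[bytes, bytes, int]:
    """Birthday search: hash distinct inputs until two truncated digests match."""
    seen: dict[bytes, bytes] = {}
    for i in count():
        data = i.to_bytes(8, "big")
        tag = truncated_digest(data, bits)
        if tag in seen:
            return seen[tag], data, i + 1
        seen[tag] = data


x, y, tries = find_collision(24)
assert x != y and truncated_digest(x, 24) == truncated_digest(y, 24)
print(f"24-bit collision after {tries} hashes (birthday estimate: about 2**12)")
```

The same scaling is what makes a 128-bit truncated commitment closer to 64-bit binding.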

When is publishing H(m) insecure as a commitment?

The most common misunderstanding is to assume that because a hash output looks random, it must hide the input. That is false whenever the possible messages are guessable.

Suppose a smart contract asks users to commit to whether they want option A or option B. If the commitment is just keccak256("A") or keccak256("B"), anyone can precompute both values. The commitment becomes public plaintext in disguise. The same problem appears for small bids, yes/no votes, short passwords, game moves, lottery numbers, and many zero-knowledge witnesses drawn from small domains.
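A short sketch makes the precomputation attack concrete. It uses SHA-256 from Python's standard library as a stand-in, since hashlib does not provide Ethereum's keccak256; the attack is the same for any deterministic hash. The names plain_commit, salted_commit, and guess are hypothetical.

```python
import hashlib
import secrets
from typing import Optional

CHOICES = [b"A", b"B"]


def plain_commit(choice: bytes) -> bytes:
    return hashlib.sha256(choice).digest()


def salted_commit(choice: bytes, nonce: bytes) -> bytes:
    # The nonce has a fixed 32-byte length, so the field boundary is unambiguous.
    return hashlib.sha256(nonce + choice).digest()


def guess(commitment: bytes) -> Optional[bytes]:
    """Observer's attack: hash every candidate and look for a match."""
    table = {plain_commit(c): c for c in CHOICES}
    return table.get(commitment)


assert guess(plain_commit(b"B")) == b"B"  # the unsalted commitment is readable
nonce = secrets.token_bytes(32)
assert guess(salted_commit(b"B", nonce)) is None  # the lookup table no longer helps
```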

Adding a random nonce fixes the basic problem, but only if the nonce is fresh and secret until reveal. Reusing the same r across commitments can create linkability. If the same message is committed twice with the same randomness, the commitments match. Even if the messages differ, opening one commitment reveals r, and an attacker can then brute-force any other commitment that used the same r by hashing each candidate message with it. Fresh randomness is not cosmetic metadata; it is part of the security mechanism.

There is also a formatting problem. The verifier checks whether the revealed opening reproduces the exact committed bytes. So encode(m, r) must be unambiguous. Simple concatenation can go wrong when different pairs produce the same byte string, such as (ab, c) and (a, bc) under a naive encoding. This is why structured encodings, length prefixes, or tuple-hashing constructions matter. NIST SP 800-185’s TupleHash is built precisely to avoid ambiguity when hashing a sequence of strings.
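The (ab, c) versus (a, bc) ambiguity can be reproduced directly. The sketch below compares raw concatenation with a simple length-prefixed encoding; the length-prefix scheme is an assumption for illustration and is not TupleHash itself, though it serves the same goal.

```python
import hashlib


def naive_encode(*fields: bytes) -> bytes:
    return b"".join(fields)


def length_prefixed_encode(*fields: bytes) -> bytes:
    # An 8-byte big-endian length before each field makes the boundaries explicit.
    return b"".join(len(f).to_bytes(8, "big") + f for f in fields)


def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Two different tuples, one byte string: the commitments are identical.
assert h(naive_encode(b"ab", b"c")) == h(naive_encode(b"a", b"bc"))
# With explicit lengths the tuples hash to different values.
assert h(length_prefixed_encode(b"ab", b"c")) != h(length_prefixed_encode(b"a", b"bc"))
```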

A related practical issue is domain separation. If the same hash function is reused across different purposes (say commitments, signatures, Merkle tree leaves, and user identifiers) you do not want two distinct application domains to accidentally share the same hash namespace. cSHAKE’s customization string S exists to separate domains cleanly. The principle is simple: when two hashes mean different things, make them syntactically different before hashing.
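Where cSHAKE is not available, an explicit prefix can play a similar role. The sketch below is an assumption-laden stand-in, not cSHAKE: it prepends a length-prefixed domain tag to the payload before SHA-256, and the tag strings are made up for the example.

```python
import hashlib


def tagged_hash(domain: str, payload: bytes) -> bytes:
    """SHA-256 over a length-prefixed domain tag followed by the payload."""
    tag = domain.encode("utf-8")
    return hashlib.sha256(len(tag).to_bytes(2, "big") + tag + payload).digest()


payload = b"\x00" * 32
commitment = tagged_hash("example/commitment/v1", payload)
leaf = tagged_hash("example/merkle-leaf/v1", payload)
assert commitment != leaf  # same bytes, different meaning, different digest
```

Length-prefixing the tag keeps one tag from being a prefix-extension of another, so the two roles stay syntactically distinct.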

How are commitment schemes defined and proven (hiding/binding games)?

Once the intuition is in place, the abstract interface is straightforward. A commitment scheme has a commit algorithm and an open/verify algorithm.

The commit algorithm takes a message m and randomness r, and outputs a commitment c plus whatever opening data must later be revealed. In a simple hash commitment, the opening data is just m and r, and c = H(encode(m, r)).

The verify algorithm takes c and a proposed opening (m, r). It recomputes H(encode(m, r)) and accepts if the value equals c.

Security is then stated as two games. For hiding, an adversary should not be able to distinguish commitments to two chosen messages better than allowed by the security definition. For binding, an adversary should not be able to output one commitment and two distinct valid openings for it. In some papers the hiding guarantee is computational; in others it is statistical. Halevi and Micali study a non-interactive string commitment with secrecy distance bounded by about 2^{-k} under their model, showing that stronger secrecy statements are possible with more structure than the naive salted-hash approach.
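Written out, the two games look roughly as follows. This is a generic textbook-style formulation under assumed notation (Commit, Verify, an efficient adversary A, and a negligible bound ε); it is not the specific definition from Halevi and Micali.

```latex
% Hiding: for any two messages m_0, m_1 chosen by A, with r sampled uniformly,
\mathrm{Adv}^{\mathrm{hide}}_{A}
  = \Bigl| \Pr\bigl[ A(\mathsf{Commit}(m_0; r)) = 1 \bigr]
         - \Pr\bigl[ A(\mathsf{Commit}(m_1; r)) = 1 \bigr] \Bigr| \le \varepsilon

% Binding: A cannot output one commitment with two valid openings to different messages.
\Pr\bigl[ (c, m, r, m', r') \leftarrow A \;:\;
          m \ne m' \,\wedge\,
          \mathsf{Verify}(c, m, r) = \mathsf{Verify}(c, m', r') = 1 \bigr] \le \varepsilon
```

For the salted-hash construction, Commit(m; r) is H(encode(m, r)) and Verify recomputes it, so a binding break is exactly a collision in H on two distinct encodings.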

This formal framing clarifies what is fundamental and what is convention. The two-phase interface and the hiding/binding goals are fundamental. The choice to instantiate them with SHA-256, Keccak, HMAC, cSHAKE, or a field-friendly hash is a design choice driven by context.

How does a Merkle root act as a commitment to a dataset?

A single hash commitment locks one message. A Merkle tree extends the same idea to an ordered collection of messages. You hash each leaf, combine neighboring hashes up the tree, and publish the root. That root is a compact commitment to the entire dataset.

The mechanism matters more than the vocabulary. If one leaf changes, the hash on that branch changes, which changes its parent, and so on until the root changes. So the root binds the whole structure. Yet the tree also gives efficient selective opening: to prove that one leaf belongs in the committed set, you reveal only the leaf and the sibling hashes along the path to the root. The verifier recomputes upward and checks that the same root appears.
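The root computation and the selective opening can be sketched together. The choices here are assumptions made for the example: SHA-256, a one-byte prefix that distinguishes leaves from inner nodes, and promoting an unpaired node to the next level unchanged. Real systems differ in these details.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()  # 0x00 marks a leaf


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()  # 0x01 marks an inner node


def merkle_root_and_proof(leaves: list[bytes], index: int):
    """Return the root and the sibling path for leaves[index] (leaves must be non-empty)."""
    level = [leaf_hash(x) for x in leaves]
    proof = []
    while len(level) > 1:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((level[sibling], sibling < index))
        nxt = [node_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired node is promoted unchanged
        level, index = nxt, index // 2
    return level[0], proof


def verify_proof(root: bytes, leaf: bytes, proof) -> bool:
    """Recompute upward from the leaf using the sibling hashes."""
    acc = leaf_hash(leaf)
    for sibling, sibling_is_left in proof:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root


data = [b"tx-a", b"tx-b", b"tx-c"]
root, proof = merkle_root_and_proof(data, 2)
assert verify_proof(root, b"tx-c", proof)
assert not verify_proof(root, b"tx-x", proof)
```

The proof holds one hash per level that has a sibling, so its size grows with the logarithm of the number of leaves rather than with the dataset.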

This is why Merkle roots show up everywhere in blockchains and distributed systems. A block header can commit to all transactions in the block. An allowlist contract can commit to all eligible accounts with a single root stored on-chain. A zero-knowledge system can commit to a large witness or state table and later prove facts about selected positions. The root is just a hash commitment, but one whose internal structure supports succinct proofs.

Implementation details still matter here. OpenZeppelin warns against certain leaf encodings, including using 64-byte raw leaves with sorted-pair tree constructions under keccak256, because a concatenation of two internal nodes may be misinterpreted as a leaf. That is not a failure of the Merkle idea; it is an encoding ambiguity. Again, the real mechanism is “commit to a precisely encoded structure,” not “hashes are magical.”

How do commit–reveal patterns work in smart contracts and what can go wrong?

In blockchain systems, hash commitments often appear as commit–reveal patterns. A user first submits a hash commitment on-chain, then later reveals the underlying value. This is used to reduce front-running in auctions, games, naming systems, and randomness-dependent actions.

The logic is simple. During the commit phase, anyone watching the mempool can see that a user is participating, but not what they chose, because only the digest is public. After the commitment is included in a block, the user reveals the preimage. Since the committed value was hidden at the time others might have reacted, the protocol gets a chance to preserve fairness.

But the mechanism only works if the commitment binds the right thing. A notable ENS audit found that a commit/register flow intended to prevent front-running did not bind the commitment to a specific owner. That meant an attacker who saw the reveal transaction could reuse the same revealed data with their own preferred recipient and race the original user. The lesson is sharp: the commitment must cover every field whose later substitution would matter. If ownership matters, bind the owner. If contract address, chain, round number, or sender matters, bind those too.

This is where domain separation and structured encoding stop being abstract hygiene and become protocol correctness. In a smart contract, a commitment should often be over a typed tuple such as (action, user, asset, amount, nonce, chain_id), not just over the obvious “secret” field. EIP-712, while designed for typed-data signing rather than commitments directly, reflects the same principle: the meaning of hashed data must be explicit.
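The effect of binding the owner can be shown with a toy registry. This is a plain-Python stand-in for a contract, not the ENS code: the Registry class, the field order, and the "example/register/v1" tag are all hypothetical, and timing windows and fees are not modelled.

```python
import hashlib
import secrets


def encode(*fields: bytes) -> bytes:
    return b"".join(len(f).to_bytes(8, "big") + f for f in fields)


def make_commitment(name: bytes, owner: bytes, secret: bytes) -> bytes:
    # The owner is inside the preimage, so the opening only works for that owner.
    return hashlib.sha256(encode(b"example/register/v1", name, owner, secret)).digest()


class Registry:
    """Toy commit-reveal registry; a real contract would also enforce delays."""

    def __init__(self) -> None:
        self.commitments: set[bytes] = set()
        self.owners: dict[bytes, bytes] = {}

    def commit(self, commitment: bytes) -> None:
        self.commitments.add(commitment)

    def reveal(self, name: bytes, owner: bytes, secret: bytes) -> bool:
        c = make_commitment(name, owner, secret)
        if c not in self.commitments or name in self.owners:
            return False
        self.commitments.remove(c)
        self.owners[name] = owner
        return True


registry = Registry()
secret = secrets.token_bytes(32)
registry.commit(make_commitment(b"alice.example", b"alice", secret))
# An attacker who copies the revealed name and secret cannot swap in a new owner.
assert not registry.reveal(b"alice.example", b"mallory", secret)
assert registry.reveal(b"alice.example", b"alice", secret)
```

If owner were left out of make_commitment, the first reveal call would succeed for the attacker, which is the substitution problem described above.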

Commit–reveal also has unavoidable costs. It requires at least two transactions and time between them. That increases gas cost, latency, and UX complexity. It helps against mempool-based observation, but it does not remove every ordering issue in systems with sophisticated block building or private order flow. A hash commitment solves a narrow timing problem well; it does not solve all MEV.

Which hash should I use for on‑chain checks vs inside SNARK/STARK proofs?

| Hash / Class | Verification environment | Circuit cost | Best for |
| --- | --- | --- | --- |
| SHA-256 / Keccak | Software and on-chain EVM | Expensive in arithmetic circuits | General on-chain commits, signatures |
| Poseidon / Reinforced Concrete | Arithmetic ZK circuits | Low prover constraint cost | SNARK/STARK-friendly commitments |
| cSHAKE / KMAC (XOF/Keyed) | Software with domain needs | Moderate (depends on use) | Domain separation or keyed commits |
Figure 35.3: Which hash to use for commitments

For ordinary software and most on-chain uses, commitments are often built from standard hashes such as SHA-256 or Keccak-256. The reason is straightforward: these functions are widely implemented, heavily analyzed, and efficient on conventional hardware or in the target execution environment.

If the commitment will be checked inside a zero-knowledge proof circuit, the cost model changes. Standard bit-oriented hashes can be expensive inside arithmetic circuits over prime fields. That is why systems in the SNARK ecosystem often use field-friendly hashes such as Poseidon. Poseidon is designed as a sponge over GF(p) and aims to reduce circuit constraints compared with alternatives like Pedersen hash in that environment. The point is not that hash commitments become a different concept in zero-knowledge systems. The point is that the same commitment interface can be instantiated with a different hash because the dominant cost has moved from CPU time to proof constraints.

This helps explain an apparent contradiction beginners sometimes notice. In one setting, people say “use SHA-256 or Keccak.” In another, they say “use Poseidon.” Both can be right. The commitment abstraction is stable; the efficient hash primitive depends on the computational substrate.

When should I use hash commitments vs Pedersen or KZG commitments?

It is useful to contrast hash commitments with neighboring constructions at the moment the comparison becomes meaningful. A hash commitment is attractive because it is simple, non-interactive, and rests on familiar hash assumptions. But it does not give every feature you might want.

A Pedersen commitment, for example, is built from group operations rather than hashes. It is especially important because it is information-theoretically hiding with fresh randomness while being computationally binding under discrete-log assumptions. It also has an additive homomorphism that hash commitments usually do not have. That algebraic structure is why Pedersen commitments show up in confidential transactions and many zero-knowledge protocols.
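The additive homomorphism is easiest to see with numbers. The sketch below uses toy parameters that are far too small to be secure, and it picks the second generator by hand; in a real deployment nobody may know the discrete log of H with respect to G. All constants are assumptions for illustration.

```python
import secrets

# Toy group: P = 2*Q + 1 with Q prime; G and H both lie in the subgroup of order Q.
P, Q = 23, 11
G, H = 4, 9


def pedersen_commit(m: int, r: int) -> int:
    """C = G^m * H^r mod P, with m and r taken modulo Q."""
    return (pow(G, m, P) * pow(H, r, P)) % P


m1, r1 = 3, secrets.randbelow(Q)
m2, r2 = 5, secrets.randbelow(Q)

# Multiplying two commitments yields a commitment to the sum of the messages.
product = (pedersen_commit(m1, r1) * pedersen_commit(m2, r2)) % P
assert product == pedersen_commit((m1 + m2) % Q, (r1 + r2) % Q)
```

A salted hash commitment has no comparable relation between c1, c2, and a commitment to m1 + m2, which is the structural difference the paragraph above describes.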

At the other end, polynomial commitments such as KZG are designed for a different job: committing to a polynomial and later proving evaluations succinctly. If you need “I committed to this value and can reveal it later,” a hash commitment may be enough. If you need “I committed to a function and can prove many local facts about it efficiently,” you are in a different part of the commitment landscape.

So the right question is not “which commitment is best?” but “what exact invariant must the commitment preserve?” If the invariant is simply hidden-now, fixed-later, a hash commitment is often the simplest answer.

How are hash‑commitment steps used in threshold signing and MPC?

Commitments often appear inside larger protocols where multiple parties must coordinate without trusting one another fully. In threshold signing and multi-party computation, parties frequently commit to intermediate values before revealing or combining them, precisely to stop adaptive cheating.

A concrete real-world example is Cube Exchange’s decentralized settlement design, which uses a 2-of-3 threshold signature scheme: the user, Cube Exchange, and an independent Guardian Network each hold one key share; no full private key is ever assembled in one place, and two of the three shares are required to authorize a settlement. The threshold-signing mechanism itself is not “a hash commitment,” but protocols of this kind routinely depend on commitment-style steps so one party can lock in a nonce share, message-related value, or intermediate transcript component before learning the others’ values. The same hidden-now, fixed-later logic is what prevents one participant from adaptively choosing a contribution after seeing everyone else’s.
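A generic commit-then-reveal round can be sketched as follows. This is not Cube Exchange's protocol or any specific threshold-signing scheme; it only illustrates the ordering idea, with three simulated parties, SHA-256, and an XOR combiner chosen as assumptions. Handling a party that refuses to reveal is out of scope here.

```python
import hashlib
import secrets
from functools import reduce


def commit(party_id: int, value: bytes, nonce: bytes) -> bytes:
    # All three fields have a fixed length here, so plain concatenation is unambiguous.
    return hashlib.sha256(party_id.to_bytes(4, "big") + value + nonce).digest()


# Round 1: each party picks a 32-byte contribution and broadcasts only its commitment.
openings = {pid: (secrets.token_bytes(32), secrets.token_bytes(32)) for pid in range(3)}
commitments = {pid: commit(pid, v, n) for pid, (v, n) in openings.items()}

# Round 2: once every commitment is in, parties reveal and everyone checks each opening.
for pid, (value, nonce) in openings.items():
    assert commit(pid, value, nonce) == commitments[pid], f"party {pid} changed its value"

# Only verified contributions are combined, here with a bytewise XOR.
combined = reduce(
    lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
    (value for value, _ in openings.values()),
)
```

Because every commitment is fixed before any value is revealed, no party can pick its contribution as a function of the others.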

That connection is worth seeing because it shows what commitment schemes are really for. They are less about storing secrets than about enforcing fair sequencing in distributed protocols.

What practical rules prevent hash‑commitment failures (encoding, salt, domain separation)?

Most failures of hash commitments do not come from exotic breaks of SHA-256. They come from getting the surrounding structure wrong.

Use fresh, high-entropy randomness for every commitment when hiding matters. If the message space is small and you omit randomness, assume the commitment is readable. If you reuse randomness, assume commitments may become linkable.

Encode data unambiguously. Do not rely on raw concatenation unless lengths or boundaries are explicit. If you are hashing a tuple, treat it as a tuple in the encoding, not as an accidental string.

Bind every field that matters to later correctness. If a reveal should only be valid for a specific user, contract, round, or chain, include those values in the commitment preimage. Otherwise someone may replay or repurpose the opening in a context you did not mean.

Choose output length with collision security in mind. Truncation weakens binding faster than many people expect because collision resistance scales like half the output length. If you want roughly 128-bit binding strength from a standard hash-based commitment, a 256-bit digest is the natural baseline.

Use domain separation when the same underlying hash function serves multiple roles. cSHAKE customization strings, explicit prefixes, or typed encodings can all serve this purpose. What matters is that two semantically different uses are not fed into the hash in the same format.

And finally, remember what assumption is carrying which guarantee. The random salt helps hiding. Collision resistance helps binding. Clear encoding prevents ambiguity. These are separate pieces of the mechanism.

Conclusion

A hash commitment is a compact way to make a value hidden now and checkably fixed for later. The essential construction is to hash an unambiguous encoding of the message together with fresh randomness, then reveal both later for verification.

What makes the idea click is that a commitment is not “just a hash.” It is a timing tool. The hash provides a short public handle; the randomness protects secrecy; the encoding defines exactly what was locked in; and the hash function’s collision resistance supports the claim that the value cannot be changed afterward. If you remember that division of labor, most of the design choices around hash commitments become much easier to reason about.

What should you understand before using this part of crypto infrastructure?

Hash commitments should change what you verify before you fund, transfer, or trade related assets on Cube Exchange. Treat them as an operational check on network behavior, compatibility, or execution timing rather than a purely academic detail.

  1. Identify which chain, asset, or protocol on Cube is actually affected by this concept.
  2. Write down the one network rule hash commitments change for you, such as compatibility, confirmation timing, or trust assumptions.
  3. Verify the asset, network, and transfer or execution conditions before you fund the account or move funds.
  4. Once those checks are clear, place the trade or continue the transfer with that constraint in mind.

Frequently Asked Questions

Why do hash commitments include a random nonce r instead of just hashing the message m?
Fresh randomness (a nonce or salt) is needed because H(m) alone only compresses m and leaks m whenever the message space is guessable; publishing c = H(encode(m,r)) forces an attacker to search the much larger space of (m,r) instead of m alone, so hiding depends on r being long and secret until reveal.
What goes wrong if the message space is tiny—can't a hash still hide the value?
If the possible messages are small (e.g., yes/no, short bids, or few moves) an adversary can precompute H(m) for every candidate and learn the committed value, so hiding fails unless you enlarge the search space (for example by adding fresh randomness).
How should I encode message and randomness so different openings can't collide due to formatting?
You must encode the tuple unambiguously (length prefixes, structured tuple-hashing, or TupleHash/cSHAKE-style approaches) because naive concatenation can make different (m,r) pairs produce the same byte string and thus allow equivocation or accidental acceptance of wrong openings.
If I shorten the hash output to save space, how does that affect security?
Truncating a digest weakens binding disproportionately: an L-bit output gives about L/2 bits of collision resistance, so truncating to λ bits reduces collision resistance to roughly λ/2 bits and can quickly make collisions feasible if chosen too small.
Can I reuse the same nonce r for many commitments to save storage or gas?
Reusing the same randomness across commitments creates linkability: identical (or structurally related) messages committed with the same r produce correlated commitments that an adversary can detect or exploit, so use fresh, high-entropy r for every commitment when hiding matters.
When using commit–reveal in smart contracts, what else besides the secret should I include in the commitment?
Commitments must include every field whose substitution would change correctness. If ownership, target contract, chain id, sender, or round number matter, bind them into the committed tuple. An ENS audit found that a commit/register flow which did not bind the owner let an attacker reuse the revealed data with a different recipient and race the original user.
Why do some systems recommend Poseidon or other exotic hashes instead of SHA‑256 for commitments?
Choose the hash primitive to match where verification happens: use standard hashes like SHA‑256/Keccak for software or on‑chain checks, but prefer field‑friendly hashes (e.g., Poseidon) inside SNARK/STARK arithmetic circuits because they are far cheaper in that cost model.
Is a Merkle root just a big hash commitment, and are there pitfalls when building Merkle trees?
A Merkle root is a commitment to an ordered dataset, but leaf/node encoding matters: certain encodings (for example 64‑byte raw leaves with some sorted-pair constructions) can make concatenations ambiguous and allow malleability, so follow recommended leaf encodings to avoid misinterpretation.
When should I use a hash commitment versus Pedersen or KZG/polynomial commitments?
Hash commitments are simple and non-interactive, but they are not the only option. Pedersen commitments are information-theoretically hiding (with randomness), computationally binding under discrete-log assumptions, and additively homomorphic, while polynomial commitments (KZG, IPA) are tailored for succinct evaluation proofs. Pick the primitive that matches the protocol invariants you need.
Does commit–reveal fully stop front‑running on blockchains, and what are the practical costs?
Commit–reveal reduces some mempool front‑running risk but has costs and limits: it requires at least two transactions (higher gas and latency), does not eliminate all MEV or sophisticated ordering attacks, and its effectiveness depends on reveal delays and the surrounding execution model.
