A hash tree or Merkle tree is a tree in which every non-leaf node is labelled with the hash of the labels or values (in case of leaves) of its child nodes. Hash trees are useful because they allow efficient and secure verification of the contents of large data structures.
Hash trees are used in the IPFS file system, BitTorrent protocol, Git, Bitcoin, Ethereum, and a number of NoSQL systems like Apache Cassandra and Riak.
They are used for many kinds of verification, especially of large chunks of data.
Building a Merkle Tree
Say we’re building a file sharing system such as Bittorent, which involves sending blocks of data over the network. We want to verify the integrity and authenticity of our blocks. Blocks should arrive uncorrupted, and not altered by a malicious third party along the way.
How do we use the Merkle tree to efficiently do this?
The above diagram shows a binary Merkle tree. Each
Node in a binary tree has two children.
We have some blocks of data
A Merkle tree only contains hashes. To build our tree, we first create leaf nodes by hashing our data blocks with a
hash_function such as
SHA-2. This gives us 4 leaf nodes with values
Next, we need to recursively build parent nodes whose values are equal to the hashes of its immediate children’s values. At each step, the number of nodes we have to work with will halve until only a single root node remains.
We calculate the hash of parent node by hashing the concatenation of its children’s values:
hash(left_child.value <> right_child.value).
Below is an excerpt of the tree building code:
In the above code, we have a public
build method that creates the initial
leaves leaf nodes from our
blocks, and passes it into the
build_tree private function. The
build_tree function recursively builds parent nodes until only a single
root node is left, at which point
root is returned.
Study the recursion in
Note that the implementation above is more general as it can accept trees with any arity, not just binary. Instead of doing
left.value <> right.value, we use
Enum.reduceto concatenate any number of
Looking at the above tree, we can see that only the root hash is necessary to verify that all the data blocks
L1..L4 are valid. Consequently, the root hash can be sent through a trusted source.
We can then use this root hash to verify the integrity and authenticity of blocks. Once the root hash is available, the Merkle hash tree itself can be received from any non-trusted sources such as a peer in a P2P network. The received tree is then compared against the root hash. If the match fails, we move on to other peers and continue checking until we find a valid tree.
To verify if any individual data blocks are corrupted during transport or have been tampered with, we only need hashes on the
log(n) path of nodes from a corrupted leaf up to the root will differ. In other words, we only need to ensure that the hashes of adjacent siblings/uncles are the same along the way up the tree. If any don’t match, either the leaf node is invalid or a sibling node along the
log(n) path is invalid.
Application of Merkle proofs in Bitcoin
In Bitcoin, every transaction has a hash associated with it.
In a block, all of the transaction hashes in the block are themselves hashed (sometimes several times – the exact process is complex), and the result is the Merkle root. In other words, the Merkle root is the hash of all the hashes of all the transactions in the block. The Merkle root is included in the block header. With this scheme, it is possible to securely verify that a transaction has been accepted by the network (and get the number of confirmations) by downloading just the tiny block headers and Merkle tree – downloading the entire block chain is unnecessary.
Merkle trees allows for you to verify transactions as needed and not include the body of every transaction in the block header, while still providing a way to verify the entire blockchain (and therefore proof of work) on every transaction.
Application of Merkle proofs in IPFS
The InterPlanetary File System (IPFS) uses
merkledag - a more generalized form of a Merkle tree. You can read more about it here.
Appendix: Elixir library
I’ve written a
merkle_tree implementation in pure Elixir, making it easy for you to start using Merkle proofs in your applications.
merkle_tree to your list of dependencies in
You can create trees like so:
Appendix: Cryptographic hashing
merkle_tree pure Elixir implementation uses
SHA-2 as the default
hash_function used and calls Erlang’s
:crypto module for its hash implementations.
The Merkle hash tree is an incredible useful data structure used in a wide range of modern applications, such as peer-to-peer file sharing, cryptocurrencies, modern file systems, and more. They are used in many kinds of verification, especially of large chunks of data.
📬 Get updates straight to your inbox!
Subscribe to my newsletter to make sure you don't miss anything.