Merkle Trees in Elixir
Thursday, 19 May 2016 · 15 min read · elixirA hash tree or Merkle tree is a tree in which every non-leaf node is labelled with the hash of the labels or values (in case of leaves) of its child nodes. Hash trees are useful because they allow efficient and secure verification of the contents of large data structures.
Hash trees are used in the IPFS file system, BitTorrent protocol, Git, Bitcoin, Ethereum, and a number of NoSQL systems like Apache Cassandra and Riak.
They are used for many kinds of verification, especially of large chunks of data.
📬 Get updates straight to your inbox.
Subscribe to my newsletter so you don't miss new content.
Building a Merkle Tree
Say we’re building a file sharing system such as Bittorent, which involves sending blocks of data over the network. We want to verify the integrity and authenticity of our blocks. Blocks should arrive uncorrupted, and not altered by a malicious third party along the way.
How do we use the Merkle tree to efficiently do this?
The above diagram shows a binary Merkle tree. Each Node
in a binary tree has two children.
We have some blocks of data L1...L4
.
A Merkle tree only contains hashes. To build our tree, we first create leaf nodes by hashing our data blocks with a hash_function
such as SHA-2
. This gives us 4 leaf nodes with values hash(L1)...hash(L4)
respectively.
Next, we need to recursively build parent nodes whose values are equal to the hashes of its immediate children’s values. At each step, the number of nodes we have to work with will halve until only a single root node remains.
We calculate the hash of parent node by hashing the concatenation of its children’s values: hash(left_child.value <> right_child.value)
.
Below is an excerpt of the tree building code:
In the above code, we have a public build
method that creates the initial leaves
leaf nodes from our blocks
, and passes it into the build_tree
private function. The build_tree
function recursively builds parent nodes until only a single root
node is left, at which point root
is returned.
Study the recursion in build_tree
carefully.
Note that the implementation above is more general as it can accept trees with any arity, not just binary. Instead of doing
left.value <> right.value
, we useEnum.reduce
to concatenate any number ofchildren
.
Verification
Looking at the above tree, we can see that only the root hash is necessary to verify that all the data blocks L1..L4
are valid. Consequently, the root hash can be sent through a trusted source.
We can then use this root hash to verify the integrity and authenticity of blocks. Once the root hash is available, the Merkle hash tree itself can be received from any non-trusted sources such as a peer in a P2P network. The received tree is then compared against the root hash. If the match fails, we move on to other peers and continue checking until we find a valid tree.
To verify if any individual data blocks are corrupted during transport or have been tampered with, we only need hashes on the log(n)
path of nodes from a corrupted leaf up to the root will differ. In other words, we only need to ensure that the hashes of adjacent siblings/uncles are the same along the way up the tree. If any don’t match, either the leaf node is invalid or a sibling node along the log(n)
path is invalid.
Application of Merkle proofs in Bitcoin
In Bitcoin, every transaction has a hash associated with it.
In a block, all of the transaction hashes in the block are themselves hashed (sometimes several times – the exact process is complex), and the result is the Merkle root. In other words, the Merkle root is the hash of all the hashes of all the transactions in the block. The Merkle root is included in the block header. With this scheme, it is possible to securely verify that a transaction has been accepted by the network (and get the number of confirmations) by downloading just the tiny block headers and Merkle tree – downloading the entire block chain is unnecessary.
Merkle trees allows for you to verify transactions as needed and not include the body of every transaction in the block header, while still providing a way to verify the entire blockchain (and therefore proof of work) on every transaction.
Application of Merkle proofs in IPFS
The InterPlanetary File System (IPFS) uses merkledag
- a more generalized form of a Merkle tree. You can read more about it here.
Appendix: Elixir library
I’ve written a merkle_tree
implementation in pure Elixir, making it easy for you to start using Merkle proofs in your applications.
Add merkle_tree
to your list of dependencies in mix.exs
:
You can create trees like so:
Appendix: Cryptographic hashing
The merkle_tree
pure Elixir implementation uses SHA-2
as the default hash_function
used and calls Erlang’s :crypto
module for its hash implementations.
Conclusion
The Merkle hash tree is an incredible useful data structure used in a wide range of modern applications, such as peer-to-peer file sharing, cryptocurrencies, modern file systems, and more. They are used in many kinds of verification, especially of large chunks of data.
📬 Get updates straight to your inbox.
Subscribe to my newsletter so you don't miss new content.