Merkle trees are one of the areas of Bitcoin I found hardest to understand.
The biggest problem was every time I read about them I just saw diagrams that didn’t make any sense with lots of multiplication and layers. Ignore the terminology, don’t worry about leaves, branches, roots or trees – these are words concerned with the technical process of creating one and will just confuse you.
Perhaps more helpful than understanding exactly how a Merkle tree is created, is understanding why they are useful. For the techies, I apologise for the over simplifications.
If you haven’t already, check out my article where I explain hashing. Hashing is such a simple concept, and it has very cleverly been integrated into Bitcoin in a number of ways.
Basically, a hash is a way to convert any information at all into a completely random but consistent value.
There are many different hashing methods, Bitcoin uses SHA-256 which enables you to convert any data into a 256 bit (32 byte) value.
If I hash the text ‘seebitcoin.com’ I get the 256 bit hash: e37695b5a5f8671d24e4160ec69755f73061bbc01d319403ce5ba5034bd57dc0.
If you do the same you will get the same result. A tiny change and the entire hash will be different, for example ‘seebitcoin.nom’ always gives the hash: cebe400cd1a7319a6c1e881d85322120f03c95c4dc00d996b1ae3472398753fa.
This allows us to do some really useful things. You can take the entire contents of a Shakespeare book and convert it into a 256 bit hash. If you have a hash for all your Shakespeare books, and your friend gives you a hash for their favourite Shakespeare book, you can know whether you have exactly the same copy.
If anything had changed, even as small as say an uppercase letter becoming lower case, the hash would completely change and you could know for certain the books were not identical.
We could take the complete works of Shakespeare, and for each piece of work we could create a hash. This means to have enough information to be able to verify 884,647 words, we’d only need 196 hashes total (38 plays, 154 sonnets and 4 poems).
This verification information could be stored on a single page of paper, while the complete works of Shakespeare could fill a bookcase.
While people probably aren’t too worried about the words in their copy of Hamlet being tampered with, in the Bitcoin world people want to know with certainty that none of the previous transactions they have stored have been altered, as this would introduce the possibility of fraud.
Like the complete works of Shakespeare, the Bitcoin blockchain is huge. There are over 420,000 blocks which can contain as many as 1MB worth of transactions. Currently the blockchain is over 75GB in size.
Just like all the words in a Shakespeare book can be converted into a 256 bit hash, all the transactions that took place in a block can be converted into a single 256 bit hash. This hash, in Bitcoin terminology, is called the Merkle root (the final value when calculating a Merkle tree).
This means for 420,000 blocks, you could have enough information to verify with certainty every transaction with just 13MB of data. Considering that’s over 75GB worth of transactions this is extremely useful.
In a Bitcoin block, the header data (all the important information about the block) and the transaction data (all the actual transactions, the bulk of the data) are separate.
This means that the software only needs to use the data in the block headers to do most things, which is far more efficient. Another benefit is that your smart phone only needs to download the block header data (around 32MB for 420,000 blocks) to run a Bitcoin client, and can just request additional data as needed without having to download and store an entire copy of the 75GB blockchain.
Merkle trees are what makes these things possible, and are basically just a method to convert all of a block’s transactions into a single hash to be used to verify no information has been manipulated.
Any requests for other explanation articles? Let me know in the comments.
Correction: I erroneously suggested a hash was 256 bytes instead of bits, values have now been corrected.