Christian

Posted on Mar 03, 2022

Bitcoin Archaeology

One of the apps I built to interface with a cryptocurrency or blockchain, and have since lost access to, was a tool that took arbitrary text input and pushed it to the Bitcoin blockchain by way of the OP_RETURN opcode. It was a college side project, built at a time when my interest in cryptocurrency was still centered on Bitcoin and what could be done with it. I was also interested in building desktop applications back then, and crypto tools seemed like the easiest way to combine the two interests. I did complete the application (although the source is nowhere to be found) and distinctly remember successfully pushing a single message to the blockchain sometime in 2018: it read ‘far_above_the_neighboring_hilltops’, the opening line of my high school’s alma mater. This week, after spontaneously remembering its existence in the middle of the night, I went on a journey to recover this message and find the block in which it was written.

Bitcoin, unlike Ethereum, does not have a Turing-complete execution environment built into its client; in other words, there is no general-purpose virtual machine like the EVM. Instead, Bitcoin comes with a lightweight, stack-based programming language called Script, similar to Forth, which can be used to construct semi-complex transactions with simple spending logic. Familiar examples of Script programs include regular transactions (pay-to-public-key-hash), pay-to-many transactions, multisignature transactions and timelocked transactions. To give an example of what these simple programs look like, here is a standard P2PKH (pay-to-public-key-hash) transaction from the Bitcoin wiki:

scriptPubKey: OP_DUP OP_HASH160 <pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG
scriptSig: <sig> <pubKey>
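To make the stack-based execution concrete, here is a toy Python evaluator for just the P2PKH opcodes above. It is a sketch under loud assumptions: HASH160 (really RIPEMD-160 of SHA-256) is replaced by a truncated SHA-256 because RIPEMD-160 is not always available in Python’s hashlib, and OP_CHECKSIG calls a hypothetical caller-supplied `verify_sig` function instead of doing real signature verification. The names `run_script` and `toy_hash160` are made up for illustration.

```python
import hashlib


def toy_hash160(data: bytes) -> bytes:
    # ASSUMPTION: stand-in for HASH160. Real Bitcoin uses RIPEMD160(SHA256(data));
    # here we take the first 20 bytes of SHA-256 so the sketch runs everywhere.
    return hashlib.sha256(data).digest()[:20]


def run_script(script_sig, script_pubkey, verify_sig):
    """Toy stack machine for the P2PKH opcodes only.

    Items that are bytes are data pushes; strings are opcode names.
    verify_sig(sig, pubkey) is a hypothetical stand-in for real signature checks.
    Returns True if no check fails and the final top stack item is non-empty.
    """
    stack = []
    # scriptSig runs first, then scriptPubKey operates on the same stack
    for item in list(script_sig) + list(script_pubkey):
        if isinstance(item, bytes):
            stack.append(item)                      # push data
        elif item == "OP_DUP":
            stack.append(stack[-1])                 # copy the top item (the pubkey)
        elif item == "OP_HASH160":
            stack.append(toy_hash160(stack.pop()))  # replace top item with its hash
        elif item == "OP_EQUALVERIFY":
            a, b = stack.pop(), stack.pop()
            if a != b:
                return False                        # fail immediately on mismatch
        elif item == "OP_CHECKSIG":
            pubkey, sig = stack.pop(), stack.pop()
            stack.append(b"\x01" if verify_sig(sig, pubkey) else b"")
        else:
            raise ValueError(f"unsupported opcode in this toy: {item}")
    return bool(stack) and stack[-1] != b""


if __name__ == "__main__":
    pubkey = b"placeholder-pubkey"   # hypothetical bytes, not a real key
    sig = b"placeholder-sig"
    script_pubkey = ["OP_DUP", "OP_HASH160", toy_hash160(pubkey),
                     "OP_EQUALVERIFY", "OP_CHECKSIG"]
    ok = run_script([sig, pubkey], script_pubkey,
                    verify_sig=lambda s, p: s == sig and p == pubkey)
    print(ok)  # True
```

Walking through it: the scriptSig leaves `<sig> <pubKey>` on the stack, OP_DUP copies the key, OP_HASH160 hashes the copy, OP_EQUALVERIFY compares that hash against `<pubKeyHash>`, and OP_CHECKSIG consumes the key and signature.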

Bitcoin is not Turing-complete on purpose. With a Turing-complete language there is, in general, no way to know in advance how much time and computation a program will need to finish, or whether it will finish at all. This creates problems for decentralized networks, because a single faulty program, such as one stuck in an infinite loop, could bring every node that executes it to a halt. Ethereum solves this problem with gas: within the EVM, every program’s computation is metered in units of gas, any execution that exceeds the gas it was given is automatically terminated, and each block has a gas limit capping total computation. This means developers have to be careful not to make their programs too complex, or they will not run on the network. The block gas limit is periodically raised to keep up with the evolving state of dapps and hardware. Bitcoin, on the other hand, sidesteps the problem by simply not allowing Turing-complete programs in its system (natively). Script has no loops, so there is no need for a gas limit: a node can bound, before execution, how much computational effort a script will need.
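The difference can be sketched with two toy Python functions (illustrative only, not real Script or EVM semantics): a loop-free script’s worst-case cost is simply its length, while a program that can jump backwards has to be metered as it runs and cut off when its gas budget runs out. The instruction format and the `OutOfGas` exception are hypothetical.

```python
from typing import List, Optional, Tuple


class OutOfGas(Exception):
    """Raised when a metered toy program exceeds its gas budget."""


def static_cost(script: List[str]) -> int:
    # Loop-free script: each opcode runs at most once, so the script's length
    # is an upper bound on the work, known before anything executes.
    return len(script)


def run_metered(program: List[Tuple[str, Optional[int]]], gas_limit: int) -> int:
    """Run a toy program that may loop, charging 1 gas per step executed.

    Instructions are hypothetical ("NOP", None) or ("JUMP", target) pairs;
    a backwards JUMP is what makes an unbounded loop possible.
    Returns the gas used, or raises OutOfGas once the budget is exceeded.
    """
    ip, gas_used = 0, 0
    while ip < len(program):
        gas_used += 1
        if gas_used > gas_limit:
            raise OutOfGas(f"halted after {gas_limit} gas")
        op, arg = program[ip]
        ip = arg if op == "JUMP" else ip + 1
    return gas_used


if __name__ == "__main__":
    p2pkh = ["OP_DUP", "OP_HASH160", "<pubKeyHash>", "OP_EQUALVERIFY", "OP_CHECKSIG"]
    print(static_cost(p2pkh))  # 5
    try:
        run_metered([("NOP", None), ("JUMP", 0)], gas_limit=1000)
    except OutOfGas as err:
        print(err)  # halted after 1000 gas
```

The infinite loop never finishes on its own; the only thing that stops it is the budget, which is exactly the role gas plays.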

OP_RETURN is an opcode used in transaction outputs. It marks an output as provably unspendable, so it can be used to burn coins. An additional feature is that one can attach a small amount of arbitrary data after the opcode (up to 40 bytes when it was first made standard, later raised to 80) and have that data written to the blockchain forever. The Core developers have always been hesitant about this opcode, because Bitcoin was never intended to become an arbitrary data storage layer. However, OP_RETURN was considered an improvement over earlier schemes for storing data on chain, and was meant to keep the growth of Bitcoin’s UTXO set in check by making these unspendable outputs prunable. The Bitcoin Core 0.9.0 release notes included this caveat:

“This change is not an endorsement of storing data in the blockchain. The OP_RETURN change creates a provably-prunable output, to avoid data storage schemes – some of which were already deployed – that were storing arbitrary data such as images as forever-unspendable TX outputs, bloating bitcoin's UTXO database. Storing arbitrary data in the blockchain is still a bad idea; it is less costly and far more efficient to store non-currency data elsewhere.”
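Concretely, an OP_RETURN output’s locking script is just the OP_RETURN byte (0x6a) followed by a data push. The minimal Python sketch below builds one; the 80-byte cap is an assumption standing in for Bitcoin Core’s default relay policy (a standardness rule, not a consensus rule), and `op_return_script` is a hypothetical helper name.

```python
OP_RETURN = 0x6A
OP_PUSHDATA1 = 0x4C
# ASSUMPTION: 80 bytes mirrors Bitcoin Core's default relay policy for OP_RETURN
# data; it is a standardness rule, not a consensus rule, and nodes may configure it.
MAX_DATA_BYTES = 80


def op_return_script(data: bytes) -> bytes:
    """Build a scriptPubKey of the form: OP_RETURN <data>."""
    if len(data) > MAX_DATA_BYTES:
        raise ValueError(f"{len(data)} bytes exceeds the {MAX_DATA_BYTES}-byte default limit")
    if len(data) <= 75:
        push = bytes([len(data)])                # opcodes 0x01-0x4b push that many bytes
    else:
        push = bytes([OP_PUSHDATA1, len(data)])  # 76-80 bytes need OP_PUSHDATA1 + length
    return bytes([OP_RETURN]) + push + data


if __name__ == "__main__":
    script = op_return_script("far_above_the_neighboring_hilltops".encode())
    print(len(script))        # 36
    print(script[:2].hex())   # 6a22
```

The 34-byte message fits in a single direct push, so the whole script is 36 bytes: one for OP_RETURN, one for the push length, and the text itself.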

The reluctance to encourage data storage in this way (which some forks of Bitcoin, including BSV, would later embrace) came down to two main reasons: 1) chain bloat, which raises the cost of running a node faster than it would otherwise grow, and 2) Bitcoin was created as a transaction ledger, so generalized data storage falls outside the scope of the network. Today we have far better storage options for this purpose, like Filecoin and Arweave. Back then, however, Filecoin was still a remnant of the ICO boom, nowhere near mainnet, and Arweave was still in its infancy. Intrigued by this idea of permanent storage, I created a simple Java app that ran on top of a bitcoinj light client (bitcoinj is a Java library implementing the Bitcoin protocol), constructed and broadcast the simplest version of an OP_RETURN transaction from some text input, then returned the transaction hash to the user. I sent my transaction, which cost me about $3 in fees, and forgot about it for the next 3 years.
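The transaction hash such a tool hands back is the transaction ID, which is the double SHA-256 of the serialized transaction, displayed byte-reversed. This is not the original app’s code (which is lost); it is a minimal Python sketch that assumes you already have the raw serialized transaction, and for SegWit transactions that must be the serialization without witness data. The `txid` function name is illustrative.

```python
import hashlib


def txid(raw_tx: bytes) -> str:
    """Transaction ID from a serialized transaction.

    Double SHA-256 of the bytes, then reversed for display, which is the
    byte order block explorers and Bitcoin Core's RPC use. For SegWit
    transactions, raw_tx must be the serialization without witness data.
    """
    digest = hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest()
    return digest[::-1].hex()


if __name__ == "__main__":
    # Hypothetical input: substitute a real raw transaction (e.g. hex from an RPC call).
    raw = bytes.fromhex("0100000000")  # placeholder bytes, not a valid transaction
    print(txid(raw))
```

The reversal is easy to forget: the raw digest and the ID an explorer displays are the same 32 bytes in opposite order.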

How do you find such a transaction years later? Block explorers generally do not index transactions by the data inside them, and without the exact block number I had no way to narrow down the search. After some searching online, I came across Bitcoin-Data-Parser, which runs on top of Bitcoin Core and indexes precisely the data I needed: OP_RETURN transactions! All it required was a Bitcoin node, an installation of the program, a PostgreSQL database, and a web browser.
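Roughly, what an OP_RETURN indexer has to do is walk each block’s outputs, pick out scripts that start with OP_RETURN (0x6a), and decode the pushed bytes. The Python sketch below is not Bitcoin-Data-Parser’s implementation; it assumes blocks shaped like Bitcoin Core’s `getblock` output at verbosity 2 (a `tx` list whose entries carry `txid` and `vout[*].scriptPubKey.hex`), and it only handles the direct-push and OP_PUSHDATA1 encodings, which cover standard payloads up to 80 bytes. The function names are hypothetical.

```python
from typing import Iterator, Optional, Tuple


def op_return_payload(script_hex: str) -> Optional[bytes]:
    """Return the data pushed after OP_RETURN, or None for any other script."""
    script = bytes.fromhex(script_hex)
    if not script or script[0] != 0x6A:
        return None
    data, i = b"", 1
    while i < len(script):
        op = script[i]
        if 1 <= op <= 75:                          # direct push of `op` bytes
            data += script[i + 1:i + 1 + op]
            i += 1 + op
        elif op == 0x4C and i + 1 < len(script):   # OP_PUSHDATA1: next byte is the length
            n = script[i + 1]
            data += script[i + 2:i + 2 + n]
            i += 2 + n
        else:
            break                                  # other encodings: out of scope here
    return data


def find_text(block: dict, needle: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (txid, payload) for every OP_RETURN output whose data contains `needle`."""
    target = needle.encode()
    for tx in block.get("tx", []):
        for out in tx.get("vout", []):
            payload = op_return_payload(out["scriptPubKey"]["hex"])
            if payload is not None and target in payload:
                yield tx["txid"], payload
```

Run over every block, this is the expensive part; an indexer does it once and stores the results (here, presumably in PostgreSQL) so later text searches are cheap.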

First, syncing a full node. I have a dual Raspberry Pi node housed in a wooden box that Budweiser once sold promotional special-edition beers in. I came to own one while working at a liquor store a few years ago: noticing that the bottles were years past drinkable, I asked if I could just keep the box. I filed down some plastic Pi racks to fit inside, mounted a fan and a drive, and it became my home server. Sadly, whilst moving I jostled the hard drive and corrupted it, forcing me to resync the Bitcoin data directory.

moments before the sad discovery that the drive at the bottom of the screen no longer had readable data on it

The Bitcoin-Data-Parser program just needs the RPC endpoint of a full node, so I pointed it at the node’s local network IP and built the neat TypeScript parser on my laptop. From there, you pass the text you want to search for to the API it serves via Express, right from your web browser, and the results come back to you. After some searching, bingo! I found the block I was looking for. Here is the actual transaction, pushed in December 2018:

The transaction has been found!
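Talking to the node’s RPC endpoint directly is also straightforward. This Python sketch (not Bitcoin-Data-Parser’s code) sends JSON-RPC requests with HTTP basic auth to Bitcoin Core and fetches a block by height using `getblockhash` and `getblock`. The IP address and credentials are hypothetical placeholders; 8332 is Bitcoin Core’s default mainnet RPC port.

```python
import base64
import json
import urllib.request

# Hypothetical node address and credentials (replace with your own rpcuser/rpcpassword).
RPC_URL = "http://192.168.1.50:8332/"
RPC_USER, RPC_PASSWORD = "rpcuser", "rpcpassword"


def build_rpc_request(method: str, params: list, url: str = RPC_URL,
                      user: str = RPC_USER, password: str = RPC_PASSWORD) -> urllib.request.Request:
    """Build a JSON-RPC 1.0 POST request in the format Bitcoin Core's RPC server accepts."""
    body = json.dumps({"jsonrpc": "1.0", "id": "archaeology",
                       "method": method, "params": params}).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


def rpc(method: str, params: list):
    """Send the request and return its "result" (raises on HTTP or RPC errors)."""
    with urllib.request.urlopen(build_rpc_request(method, params)) as resp:
        reply = json.load(resp)
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply["result"]


def fetch_block(height: int) -> dict:
    # getblockhash maps a height to a block hash; verbosity 2 returns decoded transactions
    block_hash = rpc("getblockhash", [height])
    return rpc("getblock", [block_hash, 2])
```

Calling `fetch_block(height)` over a range of heights and scanning each block’s outputs is the brute-force version of what an indexer automates and stores in a database.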

In some ways it feels odd to recover a relic of the past by sifting through on-chain data. Like an archaeologist digging through layers of clay and silt, we can treat the blockchain as another (digital) record to sift through when answering questions about the past. To me, this is the most important aspect of the technology. But the hardest thing to convey to normies about the technology we work with every day is its permanence: more than 10,000 computers around the world store this information and have verified that it is valid. They are not compelled to do so; they choose to maintain this great store of transaction (and other) data in perpetuity. Despite the churn of operators entering and exiting, the decentralized nature of Bitcoin and other blockchains allows a coherent truth to be maintained over time, without the intervention of a central authority. All of us know this, but I believe we usually fail to articulate why it matters. A simple example is often the best way to get the point across, because it tends to have the most profound impact.

For me, this transaction represents the beginning of the end of any career aspirations outside of crypto; a few months later I was working on my thesis, sitting in a telescope operations room around 4 am in the middle of the desert. At that moment I realized I couldn’t do traditional science anymore, and that it was time to double down on crypto. Everyone’s double-down moment is encoded somewhere, as some type of transaction, on one chain or another.
