An introduction to evm-trace, a fast and correct Python library to work with Ethereum Virtual Machine traces.
There are two common trace formats, which we’ll refer to as Geth-style and Parity-style traces. Geth traces provide opcode-level resolution, while the most commonly used Parity trace format provides a call tree.
Our goal today would be to make opcode-resolution traces 100x faster, making them from a little-known Parity trace format. We’ll also make them compatible with Geth traces so they can be used as a drop-in replacement.
Geth traces return frames containing the full state of the virtual machine at each step of the execution, including the program counter, opcode, remaining gas, gas cost of the operation, call depth, items on the stack, a snapshot of the memory, and a storage diff.
Each subcall starts with a fresh and separate memory, stack, and storage changes.
It’s a powerful but a bit unwieldy format, since you need to do extra work to know things like what’s the current contract being called or what’s the calldata.
I started digging into this when I stumbled upon several transactions that crash Erigon if you request a debug trace. It appeared that they generated very large traces, and soon enough I was able to confirm why when Erigon re-enabled streaming for heavy RPC endpoints.
In Geth format each frame contains the full copy of the EVM memory, which can lead to traces that don’t fit in the system memory. A contract expanding the memory to 100kb could generate traces as large as 70 GB.
This can be addressed by streaming and iteratively decoding the response as you get it from the Ethereum client. This is what ape does by default with Geth traces when you call
If we look up the memory expansion gas cost formulas at evm.codes, it becomes apparent that EVM memory is not that expensive and we can possibly construct or find transactions with even larger traces.
After sampling peak memory usage over 10,000 most recent blocks, we find even larger traces. For one of those, Erigon hangs up on us after 30 minutes, after returning 460 GB and 898,836 frames.
We can patch Erigon to increase the write timeout, and learn that the full trace consists of 960,198 frames, takes up 524 GB and 34m7s to fetch, before deserialization or processing. We can calculate that we got this trace at 275 MB/s and 469 frames/s.
Keep that number in mind, because with a method presented below, we can recompute the same trace using 137 MB of data (3916x less), which we can fetch in 5.6s, deserialize in 7.1s and replay in 4.6 seconds, with full processing taking 117x less time than just fetching a Geth trace.
Parity traces are quite different from Geth, the ad hoc traces come in three different flavors:
Parity traces are supported in Erigon, Nethermind, Besu, Foundry. They are not supported in Geth, Hardhat, Ganache. Anything beyond
tracetype requires an archive node.
trace is the most widely used mode, it returns a call tree, similar to what you can see on ethtx.info or when calling
receipt.show_trace() in ape. If you need a raw call tree, you can get it from
chain.provider.get_call_tree(tx). You can also provide a custom display class and only display the parts you care about.
stateDiff mode shows the changes of balance, code, nonce, and storage grouped by account.
A more interesting problem here is how to map storage keys to readable names like
balanceOf[account]. I’ve seen it solved in Tenderly, which is a proprietary tool.
Maybe you, the reader, could be the one to tackle this problem and release it as an open source tool for everyone to use.
vmTrace mode is one of the most enigmatic and rarely used, mainly because it was never really documented. This mode will be the main focus of this article and I’ll show you how you can make a drop-in replacement for Geth traces that requires 1000x less data and works 100x faster.
vmTrace returns a nested structure of incremental updates which we can use to replay the transaction, while tracking stack, memory, storage at each step of the execution. Of course, the incremental format doesn’t come for free and requires some legwork.
Here is what we get in a
VMTracerepresents a call and contains all subcalls
codeEVM bytecode to be executed
VMOperationto be executed
VMOperationrepresents a step of the execution
costgas cost of the operation
exthe result of the execution, could be
nullif the operation has reverted
idxindex in the call tree
VMExecutedOperationrepresents side effects of executing the operation
usedincorrectly named, shows the remaining gas
pushthe items to be placed on the stack
memthe memory delta
storethe storage delta
MemoryDiffrepresents a memory update
offmemory offset where to write the data
datathe bytes to write starting at the offset
StorageDiffrepresents a storage write
keystorage key to write to
valvalue to write
I defined this structure using msgspec to get fast deserialization and type validation.
Now we need to replay the transaction, applying the incremental updates and untangling the nested structure into a flat execution order.
I borrowed the real Stack and Memory implementations from py-evm, which is a great EVM reference itself.
I wrote a small harness to do pairwise comparison of each frame of my synthetic trace with Geth trace frames, which exited after the first divergence in stack or memory. I learned that Geth sends frames after memory expansion but before anything from
ex has been applied.
Memory was no problem, but the stack was missing a few important pieces. We had
push, but no info about the number of items popped from the stack by each opcode.
I’ve assembled this list by hand using this file from
ethereumjs as a reference. These opcodes pop a fixed amount of items:
These opcodes can be expressed as a formula:
LOG0-LOG4pops 2-6 items
SWAP1-SWAP16pops¹ 2-17 items
DUP1-DUP16pops¹ 1-16 items
¹ not really, this is only true for how the vmTrace format is served
Some transactions reconciled, but the stack still diverged, especially for large transactions. Using foundry’s debugger I found that the incremental updates have missed occasional pushes for several opcodes, namely
COINBASE. I fixed this in Erigon and the patch has already made it to
This finding has made me question a lot of things. Most people fully trust the data they get from the node, but Ethereum is a very complex system and bugs happen. It’s great having so many clients and references and different approaches to check on each other.
I’ve added full block tracing with opcode-level resolution and an option to return a
memoryview of the memory
bytearray instead of copying it for each frame. It works like a pointer and comes with a caveat that you must process the frame immediately after receiving it from the iterator, otherwise the value may be mutated. It should work for most streaming workflows with traces, since you are usually scanning for some specific frames and don’t look back.
I’ve benchmarked evm-trace against the largest traces I could identify. Most of them were replayed in under 3 seconds, with no significant slowdown even when a lot of memory is allocated. I’ve also benchmarked it against 1000 most recent Ethereum mainnet blocks. It was pleasant seeing the needle has moved from seconds per trace to traces per second.
When dealing with full blocks, you can expect to fetch a median of 27.4 transactions per second from the node and replay 108.8 transactions per second at around 252,000 frames per second. When requesting traces of individual transactions, your client replays all transactions in the block up to the requested transaction index to get to the correct state and discards all other traces. To give you the intuition, here is how many frames you would usually encounter in a block:
Give it a try
This lightning-fast drop-in replacement for Geth-style traces is available in the latest evm-trace and soon will be integrated into ape, your new favorite Ethereum development and data science framework.
- Rust web3 VMTrace type
- Erigon’s instructions.go and trace_adhoc.go
- EVM chapter from ethereumbook
- EVM. How does it work? Part 2.
- Getting Deep Into EVM: How Ethereum Works Backstage