Kakarot zkEVM 101 - CYNIC

TL;DR

A virtual machine (VM) is a software emulation of a computer system that provides an execution environment for programs. It can simulate various hardware devices and allow programs to run in a controlled and compatible environment. The Ethereum Virtual Machine (EVM) is a stack-based VM used to execute Ethereum smart contracts.

zkEVM is an EVM that incorporates zero-knowledge proof (validity proof) technology. It allows the EVM's execution to be verified with these proofs, without requiring every validator to re-execute the EVM. There are various zkEVM products on the market, each with its own approach and design.

The need for zkEVM arises from the demand for a virtual machine on Layer 2 that can execute smart contracts. In addition, some projects use zkEVMs to tap into the EVM's extensive user ecosystem while designing underlying instruction sets that are friendlier to zero-knowledge proofs.

Kakarot is a zkEVM implemented on Starknet in the Cairo language. It implements the EVM's stack, memory, execution, and other components as Cairo smart contracts. Kakarot faces three challenges: compatibility with Starknet's account system, cost optimization, and stability, the last stemming from the experimental nature of the Cairo language.

Warp is a translator that converts Solidity code into Cairo code, providing compatibility at the high-level language level. Kakarot, on the other hand, provides compatibility at the EVM level by implementing EVM opcodes and precompiles.

What is a virtual machine?

To explain what a virtual machine is, we first need to understand the execution process of computers under the mainstream von Neumann architecture. Various programs running on a computer are typically written in high-level languages and undergo multiple layers of transformation to ultimately generate machine code that can be executed. Based on the different ways of converting to machine code, high-level languages can be roughly divided into compiled languages and interpreted languages.

A compiled language is one whose source code must first be processed by a compiler, which converts the high-level code into machine code and produces an executable file. After a single compilation, that file can be executed many times efficiently. Compiled languages run fast because the code has already been converted to machine code, and the resulting executables can run on machines without a compiler installed, so users don't need to install additional software. Common compiled languages include C, C++, and Go.

Interpreted languages, by contrast, are executed statement by statement by an interpreter, and the program must be translated again every time it runs. Interpreted languages offer high development efficiency and easy debugging, but they generally execute more slowly. Common interpreted languages include Python, JavaScript, and Ruby.

It's important to note that being compiled or interpreted is not an inherent property of a language; a language merely leans one way or the other because of its original design. For example, C/C++ is usually compiled, but it can also be interpreted (e.g., Cint, Cling). Many traditionally interpreted languages, such as Python and Lua, now compile source code into intermediate bytecode that is executed on a virtual machine.
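CPython makes the last point concrete: the standard-library `compile` function turns source text into a code object holding bytecode, `eval` has CPython's virtual machine execute it, and the `dis` module lists the bytecode instructions. This is a minimal example; the exact `dis` listing differs between Python versions, so it is not reproduced here.

```python
import dis
import types

# Step 1: compile source text into a code object (intermediate bytecode).
source = "a + b * 2"
code = compile(source, "<example>", "eval")

# Step 2: CPython's virtual machine interprets that bytecode.
result = eval(code, {"a": 1, "b": 3})
print(result)  # 7

# Inspect the bytecode instructions; the listing varies by Python version.
dis.dis(code)
```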

Now that we understand the execution process of a physical machine, let's talk about virtual machines.

Why do we need virtual machines?

Firstly, a virtual machine provides an execution environment. To achieve its original vision of a "world computer," Ethereum needs a Turing-complete execution environment that supports programming. Ethereum uses a virtual machine, rather than compiling contracts directly to machine code for each kind of hardware, possibly for the following reasons:

Compatibility: Using a virtual machine, similar to the JVM, allows for better compatibility across different systems.
Controlled execution environment: With a virtual machine, every instruction is interpreted and executed by the virtual machine, which means that each step of the program's execution can be controlled. The virtual machine acts as a sandbox environment, providing benefits such as permission control and debugging. Especially in terms of permission control, it can prevent malicious programs from damaging the underlying system.
Gas support: One of the most important reasons is that native hardware has no notion of Gas. Charging Gas for every instruction is crucial for keeping the system running and preventing DDoS attacks, because any program, even an infinite loop, halts once its Gas runs out.
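The controlled-execution and Gas points can be illustrated together with a toy interpreter that charges gas before every instruction. The instruction set (`INC`, `JUMP`, `HALT`), its costs, and the `run`/`OutOfGas` names are hypothetical and unrelated to real EVM gas prices; the sketch only shows why metering lets a VM stop a program that would otherwise loop forever.

```python
class OutOfGas(Exception):
    """Raised when a program exhausts its gas budget."""


# Hypothetical per-instruction costs for a toy instruction set (not EVM prices).
GAS_COST = {"INC": 1, "JUMP": 2, "HALT": 0}


def run(program: list[tuple[str, int]], gas_limit: int) -> tuple[int, int]:
    """Interpret a toy program, charging gas before each instruction.

    Returns (counter value, gas used); raises OutOfGas if the budget runs out.
    """
    pc, counter, gas_used = 0, 0, 0
    while pc < len(program):
        op, arg = program[pc]
        cost = GAS_COST[op]
        if gas_used + cost > gas_limit:
            raise OutOfGas(f"needed {cost} more gas at pc={pc}, used {gas_used}")
        gas_used += cost
        if op == "INC":
            counter += arg
            pc += 1
        elif op == "JUMP":
            pc = arg
        elif op == "HALT":
            break
    return counter, gas_used


# A terminating program: increment twice, then halt.
print(run([("INC", 1), ("INC", 1), ("HALT", 0)], gas_limit=10))  # (2, 2)

# An infinite loop: without metering it would never stop; here it is cut off.
try:
    run([("INC", 1), ("JUMP", 0)], gas_limit=100)
except OutOfGas as exc:
    print("halted:", exc)
```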

A virtual machine typically provides a virtualized computing environment by simulating various hardware devices. Different virtual machines may simulate different hardware devices, but they generally include a CPU, memory, disk, network interface, etc.

Taking the Ethereum Virtual Machine (EVM) as an example, the EVM is a stack-based virtual machine used to execute Ethereum smart contracts. The EVM provides a virtual computing environment by simulating hardware devices such as CPU, memory, storage, and stack.

Specifically, the EVM uses a stack to hold data while executing instructions. Its instruction set consists of opcodes covering arithmetic, logical, storage, and jump operations, among others. Executing these instructions on the EVM's stack carries out a smart contract's logic.
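To make the relationship between opcodes and bytecode concrete, here is a minimal Python disassembler for a small subset of the instruction set. The byte values (for example `0x01` ADD, `0x55` SSTORE, and `0x60` to `0x7F` for PUSH1 to PUSH32) are the EVM's own; the `disassemble` helper and the example bytecode are illustrative and cover only a handful of opcodes.

```python
# A small subset of real EVM opcode byte values, grouped by category.
OPCODES = {
    0x00: "STOP",
    0x01: "ADD", 0x02: "MUL", 0x03: "SUB", 0x04: "DIV",            # arithmetic
    0x10: "LT", 0x11: "GT", 0x14: "EQ", 0x16: "AND", 0x17: "OR",    # comparison / logic
    0x51: "MLOAD", 0x52: "MSTORE", 0x54: "SLOAD", 0x55: "SSTORE",   # memory / storage
    0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST",                  # control flow
}


def disassemble(bytecode: bytes) -> list[str]:
    """Turn raw EVM bytecode into human-readable mnemonics."""
    out, pc = [], 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry 1..32 immediate bytes
            n = op - 0x5F
            data = bytecode[pc + 1 : pc + 1 + n]
            out.append(f"PUSH{n} 0x{data.hex()}")
            pc += 1 + n
        else:
            out.append(OPCODES.get(op, f"UNKNOWN(0x{op:02x})"))
            pc += 1
    return out


# Stores 2 + 3 in storage slot 0.
print(disassemble(bytes.fromhex("6002600301600055")))
# ['PUSH1 0x02', 'PUSH1 0x03', 'ADD', 'PUSH1 0x00', 'SSTORE']
```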

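The memory and storage simulated by the EVM hold the data and state of smart contracts. The EVM treats them as two separate regions: memory is a temporary, byte-addressable scratch area that starts empty on every call, while storage is a persistent key-value store (256-bit keys to 256-bit values) that keeps contract state across transactions. Contracts access their data and state by reading from and writing to these two regions.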
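The difference in lifetime between the two regions can be sketched with a toy model. It is a simplification under stated assumptions: `ToyContract`, `mstore`, and `call` are hypothetical names, one `call` stands in for one transaction, and only memory growth in 32-byte words and the persistence of storage are modeled (no gas, no real SSTORE semantics).

```python
class ToyContract:
    """Simplified model of how EVM memory and storage differ in lifetime."""

    def __init__(self) -> None:
        # Storage: persistent key-value slots that survive across calls.
        self.storage: dict[int, int] = {}

    @staticmethod
    def mstore(memory: bytearray, offset: int, value: int) -> None:
        """Write a 32-byte big-endian word, growing memory in 32-byte words like MSTORE."""
        end = offset + 32
        new_size = -(-end // 32) * 32  # round up to a whole number of words
        if len(memory) < new_size:
            memory.extend(bytes(new_size - len(memory)))
        memory[offset:end] = value.to_bytes(32, "big")

    def call(self, value: int) -> tuple[int, int]:
        """One 'transaction': uses fresh memory, updates persistent storage slot 0."""
        memory = bytearray()  # memory starts empty on every call
        self.mstore(memory, 0, value)
        word = int.from_bytes(memory[0:32], "big")
        self.storage[0] = self.storage.get(0, 0) + word  # SSTORE-like accumulate
        return len(memory), self.storage[0]


contract = ToyContract()
print(contract.call(5))  # (32, 5)
print(contract.call(7))  # (32, 12): memory was reset, storage kept the earlier 5
```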

The stack simulated by the EVM stores the operands and results of instructions. Each stack item is a 256-bit word, and the stack holds at most 1024 items. Most instructions in the EVM's instruction set are stack-based: they pop their operands from the stack and push their results back onto it.
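A minimal stack machine shows how stack-based instructions consume and produce values. The opcode bytes follow the EVM (`0x00` STOP, `0x01` ADD, `0x02` MUL, `0x03` SUB, `0x60` to `0x7F` PUSHn), as do the 256-bit wraparound and the 1024-item limit, but the `Stack` class and `execute` function are a hypothetical sketch that supports only these few opcodes and charges no gas.

```python
UINT256 = 2**256
MAX_DEPTH = 1024  # EVM stack limit


class Stack:
    """A stack of 256-bit words with the EVM's depth limit."""

    def __init__(self) -> None:
        self.items: list[int] = []

    def push(self, value: int) -> None:
        if len(self.items) >= MAX_DEPTH:
            raise OverflowError("stack overflow")
        self.items.append(value % UINT256)  # wrap to 256 bits

    def pop(self) -> int:
        if not self.items:
            raise IndexError("stack underflow")
        return self.items.pop()


def execute(bytecode: bytes) -> list[int]:
    """Run a tiny subset of EVM bytecode and return the final stack (bottom first)."""
    stack, pc = Stack(), 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if op == 0x00:  # STOP
            break
        elif 0x60 <= op <= 0x7F:  # PUSHn: push the next n bytes as one word
            n = op - 0x5F
            stack.push(int.from_bytes(bytecode[pc + 1 : pc + 1 + n], "big"))
            pc += n
        elif op == 0x01:  # ADD
            a, b = stack.pop(), stack.pop()
            stack.push(a + b)
        elif op == 0x02:  # MUL
            a, b = stack.pop(), stack.pop()
            stack.push(a * b)
        elif op == 0x03:  # SUB: top of stack minus the item below it
            a, b = stack.pop(), stack.pop()
            stack.push(a - b)
        else:
            raise NotImplementedError(f"opcode 0x{op:02x} not in this sketch")
        pc += 1
    return stack.items


# (2 + 3) * 4: PUSH1 2, PUSH1 3, ADD, PUSH1 4, MUL, STOP
print(execute(bytes.fromhex("600260030160040200")))  # [20]
```

For SUB, the EVM takes the top item as the first operand, so pushing 1 and then 0 computes 0 - 1, which wraps around to 2**256 - 1.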

In summary, the EVM provides a virtual computing environment by simulating a CPU, memory, storage, and a stack, which lets it execute smart contract instructions and store contract state and data. In practice, the EVM loads a contract's bytecode and carries out the contract's logic by executing its instructions one by one. The EVM essentially replaces the operating system and hardware components shown in the diagram.

The design of the EVM clearly follows a bottom-up approach. It first fixes the simulated hardware environment (stack, memory) and then designs its own assembly instruction set (opcodes) and bytecode around that environment. Although assembly is human-readable, it exposes low-level details and places high demands on developers, making development complex. Higher-level languages are therefore needed to hide the intricate low-level calls and give developers a better experience. Because the EVM's instruction set is custom-designed, traditional high-level languages cannot easily target it directly, so new high-level languages were created for this virtual machine.

In the Ethereum community, two compiled high-level languages, Solidity and Vyper, were designed to run efficiently on the EVM. Solidity is well known, while Vyper, developed by Vitalik to address certain flaws in Solidity, did not gain significant adoption in the community and gradually faded from the spotlight.
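As a toy illustration of what a high-level language hides, the sketch below compiles arithmetic expressions that use only `+`, `*`, and small integer constants into EVM bytecode, using Python's standard `ast` module for parsing. The `compile_expr` function is hypothetical and is not how the Solidity or Vyper compilers work; real compilers also handle variables, control flow, memory layout, and ABI encoding.

```python
import ast


def compile_expr(src: str) -> bytes:
    """Compile an integer expression using + and * into EVM bytecode (toy example)."""

    def emit(node: ast.AST) -> bytes:
        if isinstance(node, ast.Constant) and type(node.value) is int:
            if not 0 <= node.value < 256:
                raise ValueError("toy compiler only supports PUSH1 constants 0..255")
            return bytes([0x60, node.value])  # PUSH1 <value>
        if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
            opcode = 0x01 if isinstance(node.op, ast.Add) else 0x02  # ADD or MUL
            # Push both operands onto the stack, then apply the operator.
            return emit(node.left) + emit(node.right) + bytes([opcode])
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return emit(ast.parse(src, mode="eval").body)


print(compile_expr("(2 + 3) * 4").hex())  # 6002600301600402
```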

What is zkEVM?

Simply put, zkEVM is an EVM (Ethereum Virtual Machine) that incorporates zero-knowledge proof (validity proof) technology, allowing the EVM's execution to be verified more efficiently and at lower cost with a proof, without requiring every validator to re-execute the EVM.

There are numerous zkEVM products on the market, and it is a hotly contested field. The main players include Starknet, zkSync, Scroll, Taiko, Linea, and Polygon zkEVM (formerly Polygon Hermez). Vitalik Buterin has categorized zkEVMs into five types (1, 2, 2.5, 3, 4); for details, refer to his blog post:

https://vitalik.ca/general/2022/08/04/zkevm.html

Why do we need zkEVM?

This question needs to be considered from two perspectives.

Early zk Rollups, such as zkSync Lite and Loopring, could support only relatively simple transfers and trades. However, users accustomed to the Turing-complete EVM on Ethereum could not build diverse applications through programming on these Rollups, so they began calling for a Layer 2 virtual machine on which to write smart contracts. This is the first reason.

Because some aspects of the EVM's design are not conducive to generating zero-knowledge/validity proofs, some projects chose instruction sets that are more proof-friendly at the underlying level, such as Starknet's Cairo Assembly and zkSync's Zinc Instruction. Yet these projects are reluctant to give up the EVM's extensive user ecosystem, so they remain compatible with the EVM at a higher layer, which corresponds to Type 3 and Type 4 zkEVMs. Other projects stick to the EVM's traditional instruction set (opcodes) and focus on generating more efficient proofs for those opcodes, which corresponds to Type 1 and Type 2 zkEVMs. The EVM's vast ecosystem is the second reason.

Kakarot: A virtual machine on a virtual machine?

Why can one virtual machine run on top of another? This is familiar to computer professionals but may be less obvious to other users, though it is actually easy to understand. It's like building with blocks: as long as the lower layer is solid enough (provides a Turing-complete execution environment), you can keep stacking blocks indefinitely. However, no matter how many layers you stack, execution is ultimately carried out by the underlying physical hardware, so each added layer tends to reduce efficiency. In addition, because different blocks (virtual machines) have different designs, the higher you stack, the more likely the tower is to collapse (runtime errors), which demands greater technical expertise.

Kakarot is an EVM implemented on Starknet in the Cairo language; it implements the EVM's stack, memory, execution, and other components as Cairo smart contracts. Implementing an EVM is, relatively speaking, not difficult. Besides the most widely used implementation, written in Go as part of Go-Ethereum, there are EVM implementations in Python, Java, JavaScript, Rust, and other languages.

The technical challenges of the Kakarot zkEVM stem from the fact that the protocol exists as a contract on Starknet, which raises three key issues:

Compatibility: Starknet uses a completely different account system from Ethereum. Ethereum divides accounts into EOAs (Externally Owned Accounts) and CAs (Contract Accounts), whereas Starknet supports native account abstraction, so every account is a contract account. In addition, because the two networks use different cryptographic algorithms, users cannot derive the same address on Starknet and Ethereum from the same entropy.
Cost: Because the Kakarot zkEVM exists as an on-chain contract, its code must be heavily optimized, with a focus on reducing gas usage and interaction costs.
Stability: Unlike mature high-level languages such as Golang, Rust, and Python, Cairo is still experimental. From Cairo 0 to Cairo 1 and now Cairo 2 (or, if you prefer, Cairo 1 version 2), the official team is still changing the language's features. At the same time, the Cairo VM has not undergone sufficient testing, and large-scale rewrites remain possible.

The Kakarot protocol consists of five main components (the GitHub documentation lists four, leaving out the EOA; it is added here to make the design easier to follow):

Kakarot (Core): Responsible for executing Ethereum-style transactions and providing corresponding Starknet accounts for Ethereum users.
Contract Accounts: Corresponding to Ethereum's CAs, responsible for storing contracts' bytecode and state variables.
Externally Owned Accounts (EOA): Corresponding to Ethereum's EOA, responsible for forwarding Ethereum transactions to Kakarot Core.
Account Registry: Stores the mapping between Ethereum accounts and Starknet accounts.
Blockhash Registry: BLOCKHASH is a special opcode that returns the hash of a past block (one of the 256 most recent complete blocks). Kakarot cannot obtain this data directly on chain, so this component stores a block_number -> block_hash mapping, written by an administrator and provided to Kakarot Core.
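The Blockhash Registry's role can be sketched as follows. In the EVM, BLOCKHASH returns the hash of one of the 256 most recent complete blocks and 0 for anything outside that window; the sketch mirrors that rule. The `BlockhashRegistry` class, its method names, and the choice to return 0 for a block the administrator has not yet written are assumptions for illustration, not Kakarot's Cairo interface.

```python
class BlockhashRegistry:
    """Hypothetical admin-written block_number -> block_hash mapping."""

    def __init__(self, admin: str) -> None:
        self.admin = admin
        self.hashes: dict[int, int] = {}

    def set_blockhash(self, caller: str, block_number: int, block_hash: int) -> None:
        """Only the administrator may record block hashes."""
        if caller != self.admin:
            raise PermissionError("only the administrator may write block hashes")
        self.hashes[block_number] = block_hash

    def blockhash(self, current_block: int, requested: int) -> int:
        """Mirror the EVM rule: only the 256 most recent complete blocks are visible."""
        if not (current_block - 256 <= requested < current_block):
            return 0
        # Assumption of this sketch: an unrecorded block also yields 0.
        return self.hashes.get(requested, 0)


registry = BlockhashRegistry(admin="admin")
registry.set_blockhash("admin", 999, 0xABC)
print(hex(registry.blockhash(current_block=1000, requested=999)))  # 0xabc
print(registry.blockhash(current_block=1300, requested=999))       # 0: outside the window
```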

According to Elias Tazartes, CEO of Kakarot, the team has dropped the Account Registry design in the latest version and instead stores the correspondence directly, as a mapping from 31-byte Starknet addresses to 20-byte EVM addresses. In the future, to improve interoperability and allow Starknet contracts to register their own EVM addresses, the team may revisit the Account Registry design.
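A minimal sketch of such a direct mapping, assuming a simple two-way dictionary. The 31-byte and 20-byte widths come from the text; the `AddressMap` class, its `register` method, and the rule that each address may be mapped only once are hypothetical.

```python
EVM_ADDRESS_BYTES = 20
STARKNET_ADDRESS_BYTES = 31  # width quoted in the text; an assumption of this sketch


class AddressMap:
    """Hypothetical direct two-way mapping between Starknet and EVM addresses."""

    def __init__(self) -> None:
        self.starknet_to_evm: dict[int, bytes] = {}
        self.evm_to_starknet: dict[bytes, int] = {}

    def register(self, starknet_address: int, evm_address: bytes) -> None:
        """Record one Starknet <-> EVM pair after basic width checks."""
        if len(evm_address) != EVM_ADDRESS_BYTES:
            raise ValueError("EVM addresses must be exactly 20 bytes")
        if not 0 <= starknet_address < 2 ** (8 * STARKNET_ADDRESS_BYTES):
            raise ValueError("Starknet address does not fit in 31 bytes")
        if starknet_address in self.starknet_to_evm or evm_address in self.evm_to_starknet:
            raise ValueError("address is already mapped")
        self.starknet_to_evm[starknet_address] = evm_address
        self.evm_to_starknet[evm_address] = starknet_address


mapping = AddressMap()
evm = bytes.fromhex("00" * 19 + "01")
mapping.register(0x1234, evm)
print(mapping.evm_to_starknet[evm] == 0x1234)  # True
```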

Compatibility with EVM on Starknet: What are the differences between Warp and Kakarot?

In terms of the zkEVM types defined by Vitalik, Warp belongs to Type-4, while Kakarot currently falls under Type-2.5.
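Vitalik's five types can be written down as a small data structure. The one-line descriptions paraphrase his post (from Type 1, fully Ethereum-equivalent, to Type 4, high-level-language equivalent); the `ZkEvmType` enum and the helper function are hypothetical, and the project classification is the one given in this article.

```python
from enum import Enum


class ZkEvmType(Enum):
    """Vitalik Buterin's zkEVM types; a lower number means closer to Ethereum itself."""
    TYPE_1 = 1.0    # fully Ethereum-equivalent
    TYPE_2 = 2.0    # fully EVM-equivalent
    TYPE_2_5 = 2.5  # EVM-equivalent, except for gas costs
    TYPE_3 = 3.0    # almost EVM-equivalent
    TYPE_4 = 4.0    # high-level-language equivalent


# Classification as stated in this article.
PROJECTS = {"Warp": ZkEvmType.TYPE_4, "Kakarot": ZkEvmType.TYPE_2_5}


def more_ethereum_compatible(a: str, b: str) -> str:
    """Return whichever project sits at the lower (more Ethereum-compatible) type."""
    return a if PROJECTS[a].value <= PROJECTS[b].value else b


print(more_ethereum_compatible("Warp", "Kakarot"))  # Kakarot
```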

Warp is a translator that converts Solidity code into Cairo code. It is called a translator rather than a compiler because the output Cairo code is still in a high-level language. Through Warp, Solidity developers can maintain their existing development workflow without having to learn a new language like Cairo. For many projects, Warp lowers the barrier to entry into the Starknet ecosystem as they don't need to rewrite a significant amount of code using Cairo.

While the idea of translation is simple, it offers the lowest compatibility. Some Solidity code does not translate well into Cairo, and the source code must be modified to handle differences such as the account system and cryptographic algorithms. The unsupported features are listed in the Warp documentation. For example, many projects use different execution logic for externally owned accounts (EOAs) and contract accounts, but on Starknet all accounts are contract accounts, so that code must be modified before it can be translated.
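The EOA/contract distinction is often made with a code-size check (in Solidity, `address.code.length > 0`, which relies on EXTCODESIZE). The Python sketch below is hypothetical, with made-up account data, and shows why such logic breaks when every account is a contract account, as on Starknet.

```python
from collections.abc import Mapping


def looks_like_contract(code_by_address: Mapping[str, bytes], address: str) -> bool:
    """Common EVM heuristic: deployed code means a contract, no code means an EOA."""
    return len(code_by_address.get(address, b"")) > 0


# Hypothetical account states, for illustration only.
ethereum_code = {"alice_wallet": b"", "token": b"\x60\x80"}
starknet_code = {"alice_wallet": b"<account contract>", "token": b"<token contract>"}

print(looks_like_contract(ethereum_code, "alice_wallet"))  # False: treated as an EOA
print(looks_like_contract(starknet_code, "alice_wallet"))  # True: the wallet is itself a contract
```

Any branch that treats "no code" as "human user" therefore never fires on Starknet, which is why such code has to be rewritten before translation.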

Warp provides compatibility at the high-level language level, while Kakarot provides compatibility at the EVM level.

Kakarot's complete rewrite, implementing opcodes and precompiles one by one, gives it a higher level of native compatibility. After all, executing code in the same virtual machine (EVM) will always be more compatible than executing it in different virtual machines (Cairo VM). The Account Registry and Blockhash Registry cleverly shield the differences between different systems, minimizing the friction in user migration.
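Implementing precompiles one by one amounts to dispatching calls made to fixed addresses. Below is a minimal Python sketch covering only two precompiles whose addresses are defined by the EVM (`0x02` SHA-256 and `0x04` identity); the dispatch table and function names are hypothetical, no gas is charged, and this is not Kakarot's Cairo code.

```python
import hashlib


def precompile_sha256(data: bytes) -> bytes:
    """Precompile at address 0x02: SHA-256 digest of the input."""
    return hashlib.sha256(data).digest()


def precompile_identity(data: bytes) -> bytes:
    """Precompile at address 0x04: return the input unchanged."""
    return bytes(data)


# Hypothetical dispatch table keyed by precompile address.
PRECOMPILES = {0x02: precompile_sha256, 0x04: precompile_identity}


def call_precompile(address: int, data: bytes) -> bytes:
    """Route a call to the handler registered for this address."""
    handler = PRECOMPILES.get(address)
    if handler is None:
        raise NotImplementedError(f"no precompile at 0x{address:02x} in this sketch")
    return handler(data)


print(call_precompile(0x04, b"hello"))      # b'hello'
print(call_precompile(0x02, b"").hex()[:8])  # e3b0c442
```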

Kakarot Team

Thanks to the Kakarot team for their valuable comments on this article, especially Elias Tazartes. Thank you, sir!