
Posted on Feb 14, 2023

How the Sovereign SDK works

In our announcement blog post, we sketched out a vision of a future in which creating a zk-rollup is as easy as launching a dapp. This blog post seeks to explain how we’ll realize that vision. Before we dive in, a quick warning - this post will be somewhat technical but not completely rigorous. If you want a non-technical introduction to the Sovereign SDK, see here instead. If you want full technical rigor, you can find the draft specs and implementation of the Sovereign SDK on GitHub or reach out on Discord. With that out of the way, let’s get started!

Why ZK-Rollups?

In this section, we’ll try to explain why we think zk-rollups are the “endgame” for blockchains, from first principles.

The Price of Atomic Composability

We start from the thesis that Ethereum’s killer feature isn’t the EVM itself, it’s atomic composability. Any smart-contract can interact with any other one, and the system guarantees that the caller won’t get stuck in some intermediate state - even if something goes wrong further up the call stack. This drastically simplifies the programming model, making Ethereum one of the friendliest environments for developers. But atomic composability comes at a price. In order to execute a call atomically, full nodes need to access the relevant state of all contracts in the call stack in real time. That means that they have to have all of the relevant state stored locally, and they need to be able to load it very rapidly. In other words, full nodes need to buy SSDs, and they need to buy enough of them to store the entire chain state.

To maximize scalability, systems like Near attempt to relax these requirements. Instead of atomic composability, Near only allows asynchronous cross-contract calls. This setup allows much more scalability, since each contract’s state can be stored only on a dedicated subset of full nodes. But it comes at great cost to the programmer - who has to reason through a combinatorial explosion of code paths.

Best of Both Worlds

Our vision is the synthesis of these two extremes. We believe that no fully atomic system can scale to the needs of the entire world. But at the same time, we recognize the central role that atomicity plays in developing simple and composable smart contracts. And so, we seek to achieve the best of both worlds: an ecosystem of optimized chains that each provide atomic composability, connected by fast (but asynchronous) cross-chain bridges across the ecosystem.

Within a single chain, developers retain atomic composability. Contract designs can be relatively simple, and users can know the outcomes of their transactions in near-real time. But at the same time, no set of full nodes is responsible for storing and updating the state of the entire world.

Build Rollups, not Blockchains

We certainly aren’t the first people to think along these lines. As far as we know, the first attempt to create an integrated multi-chain system was Cosmos. Much like our vision, Cosmos is a bottom-up ecosystem of heterogeneous chains with fast and reliable bridging. Where Cosmos falls short is in the ability to create new chains. Before a chain becomes usable, it needs a “validator set” to throw meaningful economic weight behind its operation. Costs of capital being what they are, it’s extremely difficult to attract a validator set to a chain until the chain has enough usage to create significant revenues. But of course, no user wants to risk their money on an insecure chain. This leads to a chicken-and-egg scenario, and makes launching new chains a herculean undertaking.

In addition, Cosmos chains face fundamental difficulties in implementing bridges. Their light client bridges (IBC) are expensive to maintain because headers have to be processed on-chain. This results in a sparse bridge graph, which translates into a bad user experience. Even worse, funds can be lost if a chain suffers a long-distance reorg or its validator set executes an invalid state transition.

Our solution to these problems is simple: share a validator set with many other chains. In other words, don’t build blockchains; build rollups.

This, at last, leads us to our offering: the Sovereign SDK.

What is the Sovereign SDK?

The Sovereign SDK is a toolkit for building asynchronously-proven, sovereign zk-rollups. That’s quite a mouthful, so let’s go through it piece by piece.

First, what does it mean that transactions are proven asynchronously? In plain English, it simply means that raw transaction data is posted to the L1 by a sequencer in real-time, and proofs are generated later. This is in contrast to rollups like StarkNet, where proofs are generated off-chain before any data is posted on chain. The advantage of asynchronous proving is that it allows transactions to be finalized in real-time, giving the rollup responsiveness like that of an Optimistic Rollup.

Second, what does it mean for a rollup to be sovereign? This is easiest to understand by contrast with a “smart-contract” rollup. In a smart-contract rollup, the L2 state is only final (in the view of light clients) when it is accepted by a smart-contract on the L1. Because today’s L1 chains have such limited throughput, smart-contract rollups are forced to update their canonical on-chain state relatively infrequently. On the other hand, light clients of sovereign rollups are responsible for deciding for themselves whether a particular block should be accepted. This allows sovereign rollups to finalize much more quickly, since they don’t have to throttle their update frequency to accommodate congested L1s.

How Does it Work?

The Sovereign SDK works by abstracting the functionality of a zk-rollup into a hierarchy of interfaces. At each level of the hierarchy, developers are free to use one of the pre-packaged implementations provided by the SDK or build their own functionality from scratch. By encapsulating logic behind well-defined interfaces, we’re able to provide pluggable components that create a simple developer experience without sacrificing flexibility.

The Core APIs

At the highest level of abstraction, each Sovereign SDK chain combines three distinct elements:

  1. An L1 blockchain, which provides data availability (DA) and consensus

  2. A state transition function (written in Rust), which implements the "business logic" of the chain

  3. A zero-knowledge proof system capable of (1) recursion and (2) running a subset of Rust

interface DaLayer {
  // Gets all transactions from a particular da layer block that are relevant to the rollup
  // Used only by full-nodes of the rollup
  function get_relevant_txs(header: DaHeader) -> Array<DaTxWithSender>
  // Gets all transactions from a particular da layer block that are relevant to the rollup,
  // along with a merkle proof (or similar) showing that each transaction really was included in the da layer block.
  // Depending on the DA layer, may need to include auxiliary information to show that no 
  // relevant transactions were omitted.
  // Used by provers
  function get_relevant_txs_with_proof(header: DaHeader) -> (Array<DaTxWithSender>, DaMultiProof, CompletenessProof)
  // Verifies that a list of DA layer transactions provided by an untrusted prover is both 
  // complete and correct.
  // Used by the "verifier" circuit in the zkVM
  function verify_relevant_tx_list(txs: Array<DaTxWithSender>, header: DaHeader, witness: DaMultiProof, completeness_proof: CompletenessProof)
}

// The interface to a state transition function, inspired by Tendermint's ABCI
interface StateTransitionFunction {
  // Called once at rollup genesis to set up the chain
  function init_chain(config: Config)
  // A slot is a DA layer block, and may contain 0 or more rollup blocks
  function begin_slot(slot_number: u64)
  // Applies a batch of transactions to the current rollup state, returning a list of the
  // events from each transaction, or slashing the sequencer if the batch was malformed
  function apply_batch(batch: Array<Transaction>, sequencer: Bytes): Array<Array<Event>> | ConsensusSetUpdate
  // Process a zero-knowledge proof, rewarding (or punishing) the prover
  function apply_proof(proof: RollupProof): Array<ConsensusSetUpdate>
  // Commit changes after processing all rollup messages
  function end_slot(): StateRoot
}

interface ZkVM { 
  // Runs some code, creating a proof of correct execution
  function run(f: Function): Proof
  // Verifies a proof, returning its public outputs on success
  function verify(p: Proof): Result<Array<byte>>
}

Generalized Full Node

Using the core interfaces we just described, the Sovereign SDK will provide a generalized full-node implementation capable of running almost any state transition function over any data availability layer. In broad terms, the full node works by encapsulating rollup functionality behind those interfaces and treating rollup data as opaque byte strings that can be stored and transmitted. To give you a sense of how the full node will operate, here’s a rough illustration of what the core event loop will look like.

// A pseudocode illustration of block execution. Plays fast and loose
// with types for the sake of brevity.
function run_next_da_block(self, prev_proof: Proof, db: Database) {
  // Each proof commits to the previous state as a public output. Use that state 
  // to initialize the state transition processor
  let prev_state = deserialize(prev_proof.verify());
  let current_da_header = db.get_da_header(prev_state.slot_number + 1);
  let processor = stf::new(db, current_da_header, prev_state);
  
  // Fetch the relevant transactions from the DA layer
  let (da_txs, inclusion_proof, completeness_proof) = da::get_relevant_txs_with_proof(current_da_header);
  da::verify_relevant_tx_list(da_txs, current_da_header, inclusion_proof, completeness_proof);
  
  // Process the relevant DA transactions 
  processor.begin_slot(prev_state.slot_number + 1);
  for da_tx in da_txs { 
    let msg = parse_msg(da_tx);
    if msg.is_batch() { 
      processor.apply_batch(msg);
    } else {
      processor.apply_proof(msg);
    }
  }
  processor.end_slot();
}

Pluggable Components

Because the full node implementation is completely general, developers can easily swap out modules to change the characteristics of their chain. For example, a developer who was very concerned about the cost of calldata would probably use Jupiter, our integration with the Celestia blockchain. On the other hand, a developer who cared primarily about access to liquidity might build on top of Ethereum or EigenDA (we’ll have a pluggable module for this too, but it won’t be developed until after the prototype stage).
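To make the pluggability concrete, here’s a minimal TypeScript-flavored sketch of how a full node might be wired up from interchangeable components. All of the names below are illustrative - they aren’t the SDK’s actual API - but they show the basic idea: the node only talks to its DA layer, state transition function, and zkVM through the core interfaces, so any conforming implementation can be dropped in.

// Illustrative only: simplified versions of two core interfaces, with the
// node generic over both. Real SDK types and names will differ.
interface DaLayer { getRelevantTxs(daHeader: Uint8Array): Uint8Array[] }
interface StateTransitionFunction { applyBatch(blob: Uint8Array): void }

class FullNode {
  constructor(
    private da: DaLayer,                  // e.g. a Celestia (Jupiter) or EigenDA adapter
    private stf: StateTransitionFunction, // the rollup's business logic
  ) {}

  // Process one DA-layer block: fetch the relevant blobs and apply each one
  processSlot(daHeader: Uint8Array): void {
    for (const blob of this.da.getRelevantTxs(daHeader)) {
      this.stf.applyBatch(blob);
    }
  }
}

// Swapping the DA layer (or, analogously, the zkVM) is just a different
// constructor argument: new FullNode(new CelestiaDa(...), new MyStf(...))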

Similarly, developers are free to choose their own zkVM - as long as it supports no_std Rust and a few shim APIs. Developers who prefer to use a standard compiler toolchain and VM interface will likely pick Risc0, while those who value small proof sizes might pick another VM like that of the =nil; foundation.

Default Modules

Jumping down a layer of abstraction, the Sovereign SDK also aims to simplify the process of creating state transition functions. Toward this end, we’ll provide a composable module system loosely inspired by the Cosmos SDK. Although we’ll continue building out functionality over time, we plan to launch with at least the following modules: storage, bridging, (fungible) tokens, and proving/sequencing.

As of this writing, the design of the module system is still in its early stages - so everything you read in this section is subject to rapid change. But we still think it’s useful to sketch out some of the designs we’ve been working on, just to give you a taste of what’s to come.
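Purely as an illustration (none of these names exist in the SDK today), one possible shape for a module interface is sketched below: each module encapsulates one slice of chain logic behind a common interface, and a state transition function is assembled by routing each transaction to the module it targets.

// Hypothetical sketch of a module interface - every name here is a placeholder
interface ModuleEvent { key: string; value: Uint8Array }

interface Module {
  // Called once at rollup genesis to set up this module's state
  genesis(config: unknown): void;
  // Handle a message addressed to this module, returning the events it emitted
  call(message: Uint8Array, sender: Uint8Array): ModuleEvent[];
}

// A state transition function assembled from modules (e.g. storage, bridging,
// tokens, proving/sequencing), dispatching each transaction by target name
class ModularStf {
  constructor(private modules: Map<string, Module>) {}

  applyTx(target: string, message: Uint8Array, sender: Uint8Array): ModuleEvent[] {
    const module = this.modules.get(target);
    if (module === undefined) {
      throw new Error(`unknown module: ${target}`);
    }
    return module.call(message, sender);
  }
}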

The Bridging Module

First up, bridging. Unlike bridging between separate chains (which is known to be impossible without strong trust assumptions), bridging between rollups on a shared DA layer is not fundamentally difficult. In the Sovereign SDK, we plan to take full advantage of this fact to provide low-latency, trust-minimized bridges by default.

It works like this: assume you had 50 rollups running on a shared DA layer. To run trust-minimized bridges between each of them in the standard paradigm, you would need every rollup to run an on-chain light client of every other rollup. Doing the math, that’s 50 * 49 = 2450 on-chain light clients. Since this would be prohibitively expensive to actually execute, you’d fall back to using multi-hop bridges instead. You’d have each chain connect to (say) 3 other chains, and - assuming your network had a reasonable topology - you’d be able to route messages between any pair of chains using 3 or 4 hops.

With the power of zk, we can do much better. Recall that any two zero-knowledge proofs (in the VM model) can be aggregated into a third proof, and that this new proof is no more expensive to verify than the originals were. Leveraging this insight, we can recursively aggregate all 50 rollup proofs off-chain, and then verify that aggregate proof on each of the rollups. So instead of maintaining 2450 light client connections, we can do one off-chain proof aggregation and 50 on-chain proof verifications. In other words, we’ve reduced the communication complexity of all-to-all bridging from O(n^2) to O(n).
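To sketch the idea in code (using a simplified stand-in for the ZkVM interface above - the real aggregation circuit and proof types will look different):

// Illustrative only: recursive proof aggregation. A single zkVM program
// verifies all n rollup proofs, producing one aggregate proof that each
// rollup verifies on-chain - n verifications total instead of n * (n - 1)
// pairwise light clients.
interface Proof { publicOutputs: Uint8Array }

interface ZkVm {
  run(program: () => void): Proof;   // prove correct execution of `program`
  verify(proof: Proof): Uint8Array;  // returns public outputs, throws if invalid
}

function aggregate(vm: ZkVm, rollupProofs: Proof[]): Proof {
  return vm.run(() => {
    // Inside the zkVM: verifying each child proof makes the aggregate proof
    // attest to the validity of all of them at once.
    for (const proof of rollupProofs) {
      vm.verify(proof);
    }
  });
}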

The Storage Module

A key component of efficient zk-rollups is state management. When designing an authenticated data store, developers have to juggle several competing concerns. On the one hand, they want something that’s efficient to represent in-circuit, since every read and write to state has to be authenticated inside the zero-knowledge proof. On the other hand, they want something that’s fast and lightweight during native execution so that it doesn’t bottleneck full-nodes and sequencers.

In the Sovereign SDK, we plan to provide a default storage module which balances these two design goals. For efficiency outside of the zkVM, we’ll use the Jellyfish Merkle Tree (JMT), originally developed by Diem, to store authenticated state data. For rapid access, we’ll also maintain a flat on-disk data structure to hold raw state. For efficiency in-circuit, we’ll make our JMT generic over its hash function so that developers can easily plug in a hash tailored to their choice of zkVM.
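As a rough sketch of that two-tier design (types are illustrative, not the SDK’s actual storage API): reads and writes hit a flat key-value store during native execution, while a hash-generic authenticated tree maintains the commitment that provers open in-circuit.

// Illustrative two-tier storage: a flat map for fast native access plus an
// authenticated tree (stand-in for the JMT) that is generic over its hash.
type Hasher = (data: Uint8Array) => Uint8Array;

// Minimal stand-in for a Jellyfish Merkle Tree; the hash function is injected
// so a zkVM-friendly hash can be plugged in.
class AuthenticatedTree {
  constructor(private hash: Hasher) {}
  update(key: string, value: Uint8Array): void {
    // insert (key, value) into the tree; omitted in this sketch
  }
  root(): Uint8Array {
    return this.hash(new Uint8Array()); // placeholder commitment
  }
}

class RollupStorage {
  private flat = new Map<string, Uint8Array>(); // raw state for fast native reads

  constructor(private tree: AuthenticatedTree) {}

  // Native execution reads the flat store; the prover additionally records a
  // merkle witness against the tree so the read can be re-checked in-circuit.
  get(key: string): Uint8Array | undefined {
    return this.flat.get(key);
  }

  set(key: string, value: Uint8Array): void {
    this.flat.set(key, value);
    this.tree.update(key, value);
  }

  stateRoot(): Uint8Array {
    return this.tree.root();
  }
}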

Putting it All Together: the Block Lifecycle

Adding a block to a Sovereign chain happens in three steps. First, a sequencer posts a new blob of transaction data onto the L1 chain. As soon as the blob is finalized on the L1, a new logical rollup state is finalized as well.

As each L1 block is finalized, full nodes of the rollup scan through it and identify the blobs that are relevant to the rollup’s state transition function. Then, they apply the rollup’s STF to each of these blobs in order, computing the new rollup state root. At this point, the block is subjectively finalized from the perspective of the full nodes.

Next, prover nodes (full nodes running inside of a zkVM) perform the same process as full nodes - scanning through the DA block and processing all of the data blobs in order. Unlike full nodes, however, provers race against one another to generate proofs and post them on chain. The first eligible node to create a valid proof is rewarded with tokens corresponding to (some portion of) the rollup’s gas fees. Once a proof for a given batch has been posted on-chain, the batch is subjectively final to light clients.
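For illustration, here’s roughly what light-client acceptance could look like (the proof format and helper names are hypothetical): the client checks that the proof verifies, that it builds on the state root the client already holds, and that it refers to a DA block the client’s DA light client has finalized - no L1 smart contract involved.

// Hypothetical light-client logic - field and helper names are illustrative
interface RollupProof {
  daHeaderHash: Uint8Array;  // DA block whose blobs this proof covers
  prevStateRoot: Uint8Array; // rollup state root before those blobs
  newStateRoot: Uint8Array;  // rollup state root after those blobs
}

class LightClient {
  constructor(
    private stateRoot: Uint8Array,
    private verifyProof: (proof: RollupProof) => boolean,     // zkVM verifier
    private daIsFinal: (daHeaderHash: Uint8Array) => boolean, // DA-layer light client
  ) {}

  // Accepts the proof (and advances the trusted state root) only if all checks pass
  tryAccept(proof: RollupProof): boolean {
    if (!this.daIsFinal(proof.daHeaderHash)) return false;
    if (!this.verifyProof(proof)) return false;
    if (!bytesEqual(proof.prevStateRoot, this.stateRoot)) return false;
    this.stateRoot = proof.newStateRoot;
    return true;
  }
}

function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  return a.length === b.length && a.every((byte, i) => byte === b[i]);
}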

Finally, a relayer may choose to submit a recent rollup proof to a bridge smart-contract on another chain - either the underlying DA layer or another rollup. For efficiency, relayers will probably use proof aggregation to combine validity proofs of many rollups into a single proof before submitting updates.

Conclusion

If you’ve made it this far, you should now have a good high-level understanding of the Sovereign SDK. If you’re interested in learning even more, dive into our GitHub (contributions welcome!), reach out on Twitter, or come hang out on Discord! Also, we’re hiring!