Mike Alliegro

Posted on Apr 05, 2022

Scaling Layer 1: ZK-rollups and Sharding

Overview: Below is a look at the fundamental problem with scaling L1s and how the combination of ZK-rollups and sharding will lead to massive improvements in scalability.

Fundamental Problem with Scaling Layer 1

Blockchains are responsible for three primary tasks: execution, security (or consensus), and data availability.

  • Execution — Processing transactions, which involves computational work (given State N and a state transition, compute State N+1)
  • Security (or Consensus) — Defining how a block is added to the chain and how participants agree that a block is the correct one
  • Data Availability — Data is stored on each blockchain node. This data should be available to anyone and everyone who uses the blockchain

Monolithic blockchains that seek to perform all three of these tasks are inherently constrained by the blockchain trilemma, for the following reason. All nodes must store the entire state of the ledger or (in the case of light nodes) have a mechanism by which they can be reasonably assured that the validator who published a block also published the entirety of the blockchain data. This requirement exists because validators and full nodes must be able to re-execute the transactions in a proposed block against the previously stored state in order to ensure that block producers followed the rules of the blockchain (e.g. did not include invalid transactions). Validators must therefore have sufficient disk space (data storage), processing speed (CPU), and memory (RAM) to fulfill these obligations.
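As a rough illustration of that re-execution requirement, here is a minimal Python sketch of how a node might check a proposed block against the previous state. The account model, function names, and data structures are toy illustrations, not code from any real client.

```python
# Toy model of block verification: a node re-executes every transaction in a
# proposed block against State N and checks that it arrives at the state the
# block producer claims. Names and data structures are illustrative only.

def apply_transaction(state: dict, tx: dict) -> dict:
    """Toy state transition: move `value` between account balances."""
    new_state = dict(state)
    sender_balance = new_state.get(tx["from"], 0)
    if sender_balance < tx["value"]:
        raise ValueError("invalid transaction: insufficient balance")
    new_state[tx["from"]] = sender_balance - tx["value"]
    new_state[tx["to"]] = new_state.get(tx["to"], 0) + tx["value"]
    return new_state

def verify_block(prev_state: dict, block_txs: list, claimed_state: dict) -> bool:
    """Re-execute the block's transactions and compare against the claimed result."""
    state = prev_state
    for tx in block_txs:
        state = apply_transaction(state, tx)
    return state == claimed_state

# Without the full previous state on disk, a node simply cannot run this check.
```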

With these hardware requirements in mind, throughput can be increased at the base layer primarily through two levers:

  • Increase block production speed: In the case of Ethereum, block production speed can be increased by increasing the number of slots per epoch (which may require a faster consensus mechanism to ensure liveness)
  • Increase block size: Block size can be increased by raising the block gas limit (currently 30 million)
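To make the two levers concrete, here is a back-of-the-envelope throughput calculation in Python. Only the 30 million gas limit comes from the text above; the gas-per-transaction and block-time figures are rough assumptions for illustration.

```python
# Rough throughput estimate from the two levers: block size (gas limit)
# and block production speed (seconds per block). Per-transaction gas
# figures below are illustrative assumptions, not protocol constants.

GAS_LIMIT = 30_000_000        # current Ethereum block gas limit
BLOCK_TIME_SECONDS = 13       # approximate average block interval at the time of writing

def estimated_tps(gas_limit: int, block_time: float, avg_gas_per_tx: int) -> float:
    """Transactions per block divided by seconds per block."""
    return (gas_limit / avg_gas_per_tx) / block_time

# A simple ETH transfer costs 21,000 gas; typical mainnet transactions
# (token transfers, swaps) average far more, which is why observed TPS
# sits around 10-15.
print(estimated_tps(GAS_LIMIT, BLOCK_TIME_SECONDS, 21_000))   # ~110 TPS upper bound
print(estimated_tps(GAS_LIMIT, BLOCK_TIME_SECONDS, 150_000))  # ~15 TPS with heavier transactions

# Either lever -- a larger gas limit or a shorter block time -- raises TPS,
# but both force nodes to process and store more data per unit time.
```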

Both adjustments, however, increase node hardware requirements. If blocks are larger or produced faster, nodes must increase their processing speed and memory to “keep up” with the chain. As a second-order effect, nodes must also increase disk space, or the network will reach state bloat faster (caveat: regardless of block size and speed, state bloat will eventually be reached without new implementations such as sharding or state expiry). Raising node hardware requirements reduces decentralization by crowding out slower computers. To highlight this issue, let’s look briefly at Solana:

  • Validator hardware requirements: 2.8GHz 12-core CPU, 256GB RAM, 1TB of storage
  • Number of validators: 1,500
  • Stated TPS: 65k

In June 2021, using a computer with a 3.2GHz 6-core CPU, 32GB RAM, and 1TB of storage, one of the co-founders of Cypherpunk tested full validation sync performance for various blockchains. Using trusted snapshots (i.e. not even downloading and validating the entire transaction history) to sync with Solana, his computer crashed after 54 minutes. In fact, with 300KB blocks arriving every 0.4 seconds, a relatively powerful computer could not even catch up with the chain, regardless of storage capacity. See more node sync tests here: https://blog.lopp.net/2021-altcoin-node-sync-tests/

Clearly, we have a fundamental problem scaling blockchains at the base layer. Higher hardware requirements reduce validator and full node participation in the network, which means there are fewer parties that can check whether block producers followed the rules of the blockchain. While it is difficult to estimate the level of decentralization that is sufficient for security purposes, there exist other routes to scalability (e.g. rollups) that allow blockchains to optimize *only* for decentralization and security at the base layer.

There have been methods proposed in the past — such as Plasma — to bring execution off-chain and circumvent the hardware requirement limitations of an L1. These methods, however, require data availability to be maintained off-chain (i.e. participants in a Plasma chain must store the entire transaction history), which has security implications.

ZK-Rollups: Bringing Execution Off-Chain

[Image: ZK-Rollup Landscape]

ZK-rollups bring execution off-chain without the need to store the entire L1 ledger. ZK-rollups have two primary features that enable L1 scaling: data compression and validity proofs.

ZK-rollups leverage a smart contract on-chain to maintain a state root of the rollup (essentially, a cryptographically compressed commitment to the rollup ledger). When the rollup receives transactions, rollup validators (or relayers) publish compressed batches of transactions, along with the previous state root and the new state root, to the rollup contract on-chain. This process is essentially the same as execution on the main-chain (given State N and a state transition, compute State N+1), but with the addition of data compression. By doing so, the computational work needed to perform a state transition is moved off-chain, removing part of the execution burden from the main-chain.
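A minimal sketch of the off-chain side of this process, assuming a toy balance-transfer state model and using a plain SHA-256 hash as a stand-in for a real Merkle state root:

```python
import hashlib
import json

def state_root(state: dict) -> str:
    """Stand-in for a Merkle root: hash of the canonically serialized state."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

def apply_transfer(state: dict, tx: dict) -> dict:
    """Toy rollup transaction: move `value` between account balances."""
    new_state = dict(state)
    sender_balance = new_state.get(tx["from"], 0)
    if sender_balance < tx["value"]:
        raise ValueError("invalid rollup transaction")
    new_state[tx["from"]] = sender_balance - tx["value"]
    new_state[tx["to"]] = new_state.get(tx["to"], 0) + tx["value"]
    return new_state

def build_submission(prev_state: dict, batch: list) -> dict:
    """Execute the batch off-chain and package what gets posted on-chain."""
    new_state = prev_state
    for tx in batch:
        new_state = apply_transfer(new_state, tx)
    return {
        "prev_root": state_root(prev_state),
        "new_root": state_root(new_state),
        "compressed_batch": batch,  # in practice, tightly packed bytes (see compression below)
    }
```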

While the smart contract on the main-chain ensures that the previous state root of the rollup is correct, without validity proofs there is no way to ensure that the computational work performed by the rollup to transition to the new state root was done correctly. In addition to the compressed transaction data, previous state root, and new state root, validators post cryptographic proofs called ZK-SNARKs that prove to the smart contract on-chain that the new state root is the correct result of executing the batch.
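Continuing the sketch, the on-chain contract's bookkeeping might look like the following (Python rather than Solidity for readability; `verify_snark` is a placeholder for a real proof verifier, not an actual library call):

```python
def verify_snark(proof: bytes, prev_root: str, new_root: str, batch: bytes) -> bool:
    """Placeholder: a real verifier checks the ZK-SNARK against these public inputs."""
    return True  # stand-in only; actual verification is cryptographic

class RollupContract:
    """Toy model of the on-chain rollup contract: it stores only the state root."""

    def __init__(self, genesis_root: str):
        self.state_root = genesis_root

    def submit_batch(self, prev_root: str, new_root: str,
                     compressed_batch: bytes, proof: bytes) -> None:
        # The contract never re-executes the batch. It only checks that the
        # claimed transition builds on the current root and is proven correct.
        if prev_root != self.state_root:
            raise ValueError("batch does not build on the current state root")
        if not verify_snark(proof, prev_root, new_root, compressed_batch):
            raise ValueError("invalid validity proof")
        self.state_root = new_root  # state transition accepted
```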

Since a ZK-rollup provides this cryptographic proof up-front (as opposed to Optimistic Rollups), these rollups can remove certain transaction data entirely (e.g. nonces) or compress other fields, reducing the data burden on the main-chain. For example, the gas prices of individual transactions can be compressed into a batch gas price, and signatures can be almost entirely removed, because the ZK-SNARK already proves that signature data was verified (e.g. that signatures matched senders and receivers). It is estimated that the data of a simple Ethereum transaction can be compressed from ~110 bytes to ~12 bytes using a rollup.
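The rough per-field arithmetic below shows where those bytes go. The figures loosely follow the breakdown in Vitalik's rollup guide (citation 3); exact savings vary by rollup design.

```python
# Approximate per-field byte counts for a simple transfer, on L1 versus inside
# a ZK-rollup batch. Rough illustrative figures only; real rollups differ.
L1_TX_BYTES = {
    "nonce": 3, "gasprice": 8, "gas": 3, "to": 21, "value": 9, "signature": 68,
}
ROLLUP_TX_BYTES = {
    "nonce": 0,        # proven by the SNARK, so it can be omitted entirely
    "gasprice": 0.5,   # compressed, e.g. folded into a batch-level fee parameter
    "gas": 0.5,
    "to": 4,           # an index into a registry instead of a 20-byte address
    "value": 3,        # compact encoding of common amounts
    "signature": 0.5,  # almost removed: the proof already attests the signatures checked out
    "from": 4,         # must now be stated explicitly (no signature to recover it from)
}

print(sum(L1_TX_BYTES.values()))      # ~112 bytes
print(sum(ROLLUP_TX_BYTES.values()))  # ~12.5 bytes
```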

Due to this data compression, the fact that on-chain proof verification cost does not scale with batch size, and the trustless assumptions of a ZK-rollup, rollup validators can maximize throughput without posing security risks to the main-chain. However, rollup validators can be slashed for posting invalid proofs, so a decentralized validator set is still preferred to ensure liveness.

Sharding: Scaling Data Availability

ZK-rollups remove the execution constraints implicit in the blockchain trilemma and make significant headway in reducing main-chain data storage requirements. However, using a Geth execution client, the full node sync size is still well over 1TB (OpenEthereum is about 575GB) and growing, so there remains a need to scale the L1 data layer. Despite this constraint, rollups without improvements to the data layer can still increase Ethereum TPS from 10–15 to 2–3k (although state bloat would be reached more quickly). Ethereum’s sharding rollout, targeted for 2023, will allow the network to handle an estimated 100k TPS, which is likely necessary for mass adoption.
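Once execution is off-chain, throughput is effectively bounded by how many bytes of compressed transaction data the base layer can absorb per second. The sketch below makes that relationship explicit; the data-capacity figures are illustrative assumptions chosen to line up with the estimates above, not protocol specifications.

```python
# Rollup TPS as a function of base-layer data capacity, assuming ~12 bytes of
# compressed call data per rollup transaction. Capacity numbers are rough
# order-of-magnitude assumptions for illustration.

BYTES_PER_ROLLUP_TX = 12

def rollup_tps(data_bytes_per_second: float) -> float:
    return data_bytes_per_second / BYTES_PER_ROLLUP_TX

# Tens of kB/s of usable data today versus on the order of 1 MB/s with sharding.
print(rollup_tps(30_000))     # ~2,500 TPS, in line with the 2-3k estimate
print(rollup_tps(1_300_000))  # ~100,000+ TPS with sharded data availability
```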

Ethereum’s sharding upgrade will split storage of the blockchain ledger into 64 shards, which all “report” to the beacon chain. The beacon chain will be responsible for consensus and validator assignment but will not store state; in fact, each shard will have its own state (i.e. user accounts will be specific to a shard) and transaction history. Validators will be randomly assigned, and periodically reassigned, to shards to verify shard blocks through voting. The beacon chain will verify that shard blocks have acquired the requisite number of votes before providing finality. There is significantly more complexity to sharding, particularly around cross-shard communication, that is beyond the scope of this article. Regardless, one can see that sharding will significantly reduce the data storage requirements on validators, who will only need to hold their assigned shard’s state.
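A toy sketch of the random committee assignment described above, using Python's `random` module as a stand-in for the beacon chain's verifiable on-chain randomness:

```python
import random

# Toy committee assignment: the beacon chain periodically shuffles the validator
# set and splits it across the 64 shards, so no fixed group controls any one
# shard for long. Real assignment uses a verifiable shuffle seeded by on-chain
# randomness; random.Random here is only an illustrative stand-in.

NUM_SHARDS = 64

def assign_validators(validator_ids: list, epoch_seed: int) -> dict:
    rng = random.Random(epoch_seed)
    shuffled = list(validator_ids)
    rng.shuffle(shuffled)
    committees = {shard: [] for shard in range(NUM_SHARDS)}
    for i, validator in enumerate(shuffled):
        committees[i % NUM_SHARDS].append(validator)
    return committees

# Reassignment each epoch just means re-running the shuffle with a new seed.
committees = assign_validators(list(range(10_000)), epoch_seed=42)
print(len(committees[0]))  # roughly 10_000 / 64 validators per shard committee
```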

However, simply dividing the data storage requirements by 64 (caveat: this is not exactly how it works) is still not sufficient in the long run to handle many ZK-rollups posting hundreds of thousands of even compressed transactions to the main-chain. Crucially, though, nodes do not need to actually store this transaction data; they just need to be reasonably assured that the transaction data was made available by the rollup (e.g. so that users have access to their balances). Therefore, nodes in a sharded network will leverage data availability sampling to obtain these reasonable assurances without having to download all the data. Before posting to the main-chain, ZK-rollups will expand the compressed transaction data with redundant pieces using a technique called erasure coding. Because of this redundancy, nodes on the main-chain only need to sample the data to be reasonably assured of its availability. If the sampled data is unavailable, nodes can reject the block proposed by the rollup. This combination of sharding and data availability sampling will dramatically reduce the data storage requirements of the base layer.
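A back-of-the-envelope sketch of why sampling gives strong assurances: assuming a standard 2x erasure-coding extension, a block producer must withhold at least half of the extended data to make anything unrecoverable, so each random sample has at least a 50% chance of exposing the withholding.

```python
# Probability that a node sampling `samples` random chunks fails to notice that
# a fraction `missing_fraction` of the extended data was withheld. The 50%
# default matches the usual 2x erasure-coding assumption.

def undetected_withholding_probability(samples: int, missing_fraction: float = 0.5) -> float:
    """Chance that every one of `samples` random chunks happens to be available."""
    return (1.0 - missing_fraction) ** samples

for k in (10, 20, 30):
    print(k, undetected_withholding_probability(k))
# 10 samples -> ~0.1% chance of missing the withholding; 30 samples -> ~1e-9
```

A few dozen tiny samples per node thus stand in for downloading the entire shard's data, which is what lets storage requirements fall without weakening the availability guarantee.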

Wrap-up

While ZK-rollups take execution off-chain and may 100x the transaction processing capacity of the Ethereum network, we still run into problems with data storage and availability. Sharding will expand the data layer, but only when combined with data availability sampling will the network be able to fully utilize the execution capacity of rollups.

Citations

  1. https://docs.solana.com/running-validator/validator-reqs
  2. https://blog.lopp.net/2021-altcoin-node-sync-tests/
  3. https://vitalik.ca/general/2021/01/05/rollup.html
  4. https://etherscan.io/chartsync/chaindefault
  5. https://vitalik.ca/general/2021/04/07/sharding.html
  6. Image Credit: Finematics