tl;dr

For light-client data availability, there is broad agreement that erasure codes solve the problem; the approaches differ in how they ensure that the data has been erasure-coded correctly. Polygon Avail and Danksharding use KZG commitments, while Celestia uses fraud proofs.

For Rollup data availability, if a Data Availability Committee (DAC) is understood as a consortium blockchain, then what Polygon Avail and Celestia have done is to make the data availability layer more decentralized by providing a “DA-Specific” Layer1, thereby increasing the level of trust.

We believe that in the next 3 to 5 years, blockchain architecture will inevitably evolve from monolithic to modular, with loose coupling between layers. Rollup-as-a-Service (RaaS), Data Availability-as-a-Service (DAaaS), and many other modular components may emerge to make ‘LEGO’-style composability of blockchain architecture possible. Modular blockchains are one of the key narratives underpinning the next cycle.

Among these layers, the giants of the execution layer (i.e. Rollups) have already divided up the market, leaving little room for latecomers, and the consensus layer (i.e. the Layer1s) is quite crowded. With newer Layer1s such as Aptos and Sui emerging, the Layer1 competitive landscape has not yet settled, but its narrative is old wine in new bottles, which makes it difficult to find reasonable investment opportunities.

And the value of the data availability layer is still to be explored.

Modular Blockchain

Before we talk about data availability, let’s take a moment for a brief review of modular blockchain.

Source: IOSG Ventures, adapted from tweets of Peter Watts

There is no strict definition of modular blockchain layering, with some layering approaches starting from Ethereum and others leaning towards a generalized perspective, depending mainly on the context in which they are discussed.

Execution layer: Two things happen at the execution layer. For a single transaction, the transaction is executed and a state change occurs; for a batch of transactions, the state root of the batch is calculated. Part of Ethereum’s execution work has been offloaded to Rollups such as StarkNet, zkSync, Arbitrum, and Optimism.
Settlement layer: The Rollup contracts on Ethereum Layer1 verify the validity of state roots (zkRollup) or process fraud proofs (Optimistic Rollup).
Consensus layer: Whether PoW, PoS, or another consensus algorithm is used, the job of the consensus layer is to agree on something in a distributed system, i.e., to reach consensus on the validity of state transitions. In the context of modularity, the settlement layer and the consensus layer serve similar purposes, so some researchers treat them as a single layer.
Historical state layer: Proposed by Polynya (for Ethereum only). Once Proto-Danksharding is introduced, Ethereum will keep data available on-chain only for a certain time window, after which it prunes the data and leaves storage to others. Portal Network and other third parties that store this data can be classified in this layer.
Data availability layer: What are the problems with data availability, and what are the solutions to each? This is the question this article focuses on, so we will not summarize it here.

Source: IOSG Ventures

Back in 2018 and 2019, data availability was discussed mostly in the context of light clients; later, from the Rollup perspective, it took on a different meaning. In this article, we explain data availability in two contexts, “Nodes” and “Rollup”, in turn.

DA in Nodes

Source: https://medium.com/metamask/metamask-labs-presents-mustekala-the-light-client-that-seeds-data-full-nodes-vs-light-clients-3bc785307ef5

Let’s first look at the concept of full nodes and light clients.

Since full nodes download and verify every transaction in every block themselves, they need no honesty assumptions to be sure that the state is executed correctly, which gives them strong security guarantees. However, running a full node requires storage, computing power, and bandwidth, and ordinary users and applications, unlike miners, have no incentive to run one. Moreover, if a node only needs to verify some information on the blockchain, running a full node is unnecessary.

This is where light clients come in. Unlike full nodes, light clients often do not interact directly with the blockchain; they rely on neighboring full nodes as intermediaries to request the information they need, for example to download block headers or to verify account balances.

As a node, a light client can synchronize with the blockchain quickly because it downloads and verifies only block headers. In the cross-chain bridge model, the light client takes the form of a smart contract: the light client on the destination chain only needs to verify that the tokens on the source chain are locked, rather than verifying every transaction on the source chain.

What’s the problem?

However, there is an implicit problem: because light clients only download block headers from full nodes, rather than downloading and verifying each transaction themselves, a malicious full node (block producer) can construct a block containing invalid transactions and send it to light clients to fool them.

It is tempting to think of a “fraud proof” solution to this problem: it takes only one honest full node that monitors the validity of blocks, constructs a fraud proof when it finds an invalid block, and sends the proof to light clients to warn them. Alternatively, after receiving a block, the light client can ask the whole network whether a fraud proof exists; if it receives none after a while, it can assume that the block is valid. In this way, light clients can achieve almost the same level of security as full nodes (while still relying on honesty assumptions).

The discussion above, however, assumes that the block producer always publishes all of the block data, which is the basic premise for generating a fraud proof. A malicious block producer may instead hide some of the data when it publishes the block. A full node that tries to download the block will notice that data is missing and refuse to accept it, but a light client, by its nature, cannot do so. And because the data is missing, full nodes cannot generate a fraud proof to warn light clients either.

Moreover, some data may arrive late because of network congestion, and we cannot tell whether missing data is due to network conditions or to deliberate withholding by the block producer, so the reward and punishment mechanism for fraud proofs cannot work.

This is the data availability problem in the context of nodes.

Source: https://github.com/ethereum/research/wiki/A-note-on-data-availability-and-erasure-coding

There are two cases in the figure above. In the first, a malicious block producer posts a block with missing data, an honest full node raises a warning, and the producer then publishes the rest of the data. In the second, an honest block producer publishes a complete block, and a malicious full node then raises a false warning. In both cases, the block data that others in the network see after T3 is complete, so they cannot tell which party misbehaved.

In this sense, fraud proofs alone cannot ensure data availability for light clients.

The Solution

In September 2018, Mustafa Al-Bassam (now the CEO of Celestia) and Vitalik co-authored a paper that proposed using multidimensional erasure codes to check data availability. The light client simply downloads and verifies a random portion of the data to gain high confidence that all of the block data is available; if necessary, the full data can be reconstructed from the coded pieces.
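To see why a handful of random samples can be enough, consider the following back-of-the-envelope sketch. It assumes a simplified model that this article does not spell out: to make a block unrecoverable, the producer must withhold at least a fraction `withheld` of the coded chunks, and the light client queries chunks uniformly at random, independently of one another. The function names and the 25% figure used below are illustrative assumptions.

```python
import math


def detection_probability(withheld: float, samples: int) -> float:
    """Probability that at least one of `samples` random queries hits a withheld chunk."""
    if not 0.0 <= withheld <= 1.0:
        raise ValueError("withheld must be a fraction between 0 and 1")
    if samples < 0:
        raise ValueError("samples must be non-negative")
    # Each query independently lands on an available chunk with probability 1 - withheld.
    return 1.0 - (1.0 - withheld) ** samples


def samples_needed(withheld: float, confidence: float) -> int:
    """Smallest number of queries whose detection probability reaches `confidence`."""
    if not 0.0 < withheld < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - withheld))
```

Under these assumptions, if a quarter of the chunks must be withheld to block reconstruction, `samples_needed(0.25, 0.99)` returns 17. In this model the number of samples depends only on the withheld fraction and the target confidence, not on the size of the block.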

There is little disagreement about using erasure codes to solve the data availability problem for light clients: Polygon Avail, Celestia, and Ethereum’s Danksharding all use Reed-Solomon erasure codes.

The difference is how to ensure that the data is correctly erasure-coded: Polygon Avail and Danksharding use KZG commitments, while Celestia uses fraud proofs. Each approach has advantages and disadvantages. KZG commitments are not quantum-resistant, while fraud proofs depend on certain honesty and synchrony assumptions.

In addition to KZG commitments, STARKs and FRI (proposed by Vitalik) can also be used to prove the correctness of the erasure code.
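The idea behind a Reed-Solomon code can be illustrated with a toy example. The sketch below treats k data symbols as values of a polynomial of degree below k, extends them to n shares by evaluating the same polynomial at more points, and recovers the data from any k shares by Lagrange interpolation. The prime 257, the function names, and the one-dimensional layout are assumptions chosen for readability; production systems use optimized implementations with different parameters.

```python
P = 257  # toy prime field, just large enough to hold byte values 0..255


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through `points`, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Extend k data symbols to n shares; share i is the polynomial's value at x = i."""
    k = len(data)
    if not 0 < k <= n <= P:
        raise ValueError("need 0 < k <= n <= P")
    base = list(enumerate(data))
    # The first k shares are the data itself (a systematic code); the rest are parity.
    return [(x, data[x] if x < k else _interpolate(base, x)) for x in range(n)]


def recover(shares, k):
    """Rebuild the k original symbols from any k distinct shares."""
    if len(shares) < k:
        raise ValueError("not enough shares to reconstruct")
    subset = shares[:k]
    return [_interpolate(subset, x) for x in range(k)]
```

For example, `encode([72, 105, 33, 7], 8)` yields eight shares, and `recover` returns the original four symbols from any four of them, including the four parity shares alone. The sketch does not address whether the producer encoded the data honestly, which is a separate question.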

DA in Rollup

Data availability means something slightly different for each kind of Rollup. In a zkRollup, it lets everyone rebuild the Layer2 state on their own, which ensures censorship resistance. In an Optimistic Rollup, it ensures that all Layer2 data is published, which is a prerequisite for constructing a fraud proof. So what’s the problem?

When we look at Layer2’s fee structure, in addition to the fixed cost, the variables that scale with the number of transactions per batch are Layer2’s gas cost and the cost of on-chain data availability. The former has little impact; the latter requires a constant payment of 16 gas per (non-zero) byte of calldata, accounting for as much as 80%-95% of a Rollup’s cost.
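The per-byte pricing can be turned into a quick estimate. The sketch below is illustrative only: it takes the 16 gas per byte figure from the text, assumes the lower price of 4 gas for zero bytes from Ethereum’s calldata pricing (EIP-2028), and ignores the fixed per-transaction cost. The function names and the sample numbers in the usage note are hypothetical.

```python
GAS_PER_NONZERO_BYTE = 16  # the per-byte figure quoted in the text
GAS_PER_ZERO_BYTE = 4      # assumption: zero bytes are cheaper under EIP-2028 pricing


def calldata_gas(payload: bytes) -> int:
    """Gas charged for the bytes of `payload` when posted as calldata (data cost only)."""
    zero_bytes = payload.count(0)
    nonzero_bytes = len(payload) - zero_bytes
    return zero_bytes * GAS_PER_ZERO_BYTE + nonzero_bytes * GAS_PER_NONZERO_BYTE


def data_share(data_gas: int, other_gas: int) -> float:
    """Fraction of a batch's total gas that is spent on data availability."""
    total = data_gas + other_gas
    if total <= 0:
        raise ValueError("total gas must be positive")
    return data_gas / total
```

For a hypothetical batch of 1,000 non-zero bytes, `calldata_gas` returns 16,000 gas; if the rest of the batch cost 4,000 gas, `data_share(16000, 4000)` puts the data portion at 80% of the total, the low end of the range quoted above.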

(On-chain) Data availability is expensive. How to deal with this?

One is to reduce the cost of storing data on chain, which is the job of the protocol layer. In IOSG’s Insight “The Merge Is Coming: A Detailed Catch-Up For Ethereum Roadmap”, we mentioned that Ethereum is considering introducing Proto-Danksharding and Danksharding to provide larger blocks for Rollups, that is, more data availability space, and adopting erasure coding and KZG commitments to reduce the burden of running a node. But from the perspective of a Rollup, it is unrealistic to wait passively for Ethereum to adapt.

The other is to put data off-chain. The following figure lists the current off-chain data availability solutions. The generalized solutions include Celestia and Polygon Avail; the user-selectable options in Rollup include StarkEx, zkPorter, and Arbitrum Nova.

Source: IOSG Ventures

(Note: Validium originally refers to the scaling solution combining zkRollup and off-chain data availability. For convenience, Validium used in this article refers to the off-chain data availability solutions, and will be compared with the others.)

Let’s take a look at these options in detail.

DA Provided by Rollup

In the simplest Validium, a centralized data operator is responsible for ensuring data availability, and users need to fully trust the operator. The benefit of this setup is low cost, but it offers practically no security guarantees.

As a result, in 2020 StarkEx proposed a Validium maintained by a Data Availability Committee (DAC). The members of the DAC are well-known individuals or organizations within a legal jurisdiction, and the trust assumption is that they will not collude or act maliciously.
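The committee model can be pictured as a quorum check. The following is a minimal sketch under stated assumptions, not the scheme of any particular project: each attestation is modeled as a plain (member, data hash) pair, whereas a real DAC would use cryptographic signatures verified on-chain, and the `threshold` parameter and all names are hypothetical.

```python
import hashlib


def data_hash(batch: bytes) -> str:
    """Fingerprint of the off-chain batch that committee members attest to."""
    return hashlib.sha256(batch).hexdigest()


def quorum_reached(attestations, members, threshold, expected_hash) -> bool:
    """True if at least `threshold` distinct committee members attested to `expected_hash`.

    `attestations` is an iterable of (member_id, attested_hash) pairs. Pairs from
    unknown members or for a different hash are ignored, and duplicates count once.
    """
    signers = {
        member
        for member, attested in attestations
        if member in members and attested == expected_hash
    }
    return len(signers) >= threshold
```

The security of such a scheme rests entirely on the choice of members and threshold: if enough members collude or lose their keys, the check still passes while the data may be gone.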

Arbitrum proposed AnyTrust this year, which also adopted DAC to ensure data availability, and built Arbitrum Nova based on AnyTrust.

zkPorter, on the other hand, proposed that data availability should be maintained by Guardians (zkSync Token holders), who are required to stake zkSync Token, and if a data availability failure occurs, their funds will be slashed.

All three solutions provide an option called Volition: users can choose on-chain or off-chain data availability as needed, according to specific usage scenarios.

Source: https://blog.polygon.technology/from-rollup-to-validium-with-polygon-avail

General DA Scenarios

The solutions above rest on the idea that, since the reputation of an ordinary operator is not strong enough, a more authoritative committee should be introduced to improve credibility.

Is a small committee safe enough? Two years ago, the Ethereum community raised the possibility of a ransomware attack on Validium: if enough committee members’ private keys are stolen, the attacker can make the off-chain data unavailable and hold users to ransom, allowing them to withdraw their money from Layer2 only after they pay. Given the attacks on Ronin Bridge and Harmony Horizon Bridge, we cannot ignore this possibility.

Since the off-chain data availability committee is not sufficiently secure, what if a blockchain is introduced as a trusted party to ensure off-chain data availability?

If we take the DAC as a consortium blockchain, then what Polygon Avail and Celestia do is to make the data availability layer more decentralized — equivalent to providing a “DA-Specific” Layer1 with a series of validators, block producers, and consensus mechanisms to enhance trust.

In addition to the improvement of security, if the data availability layer itself is a blockchain, then it can also be used as a generalized solution, not limited to providing data availability for a specific Rollup or blockchain.

Source: https://blog.celestia.org/celestiums/

We explain this with the example of Quantum Gravity Bridge, Celestia’s application for Ethereum Rollups. In this scenario, the L2 Contract on Ethereum Layer1 verifies the validity proof or fraud proof as before. The difference is that data availability is provided by Celestia. There are no smart contracts and no computation on Celestia’s Layer1; Celestia only ensures that the data is available.

The L2 Operator publishes the transaction data to Celestia’s Layer1. Celestia’s validators sign the Merkle Root of the DA Attestation and send it to the DA Bridge Contract on Ethereum Layer1 for verification and storage.

In effect, the Merkle Root of the DA Attestation stands in for the availability of all the data, and the DA Bridge Contract on Ethereum Layer1 only needs to verify and store this Merkle Root, so the cost is greatly reduced.
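Why storing a single root is enough can be shown with a generic Merkle tree. The sketch below is an assumption-laden illustration, not Celestia’s actual data structure or the bridge contract’s logic: it builds a plain binary SHA-256 tree over data chunks, duplicates the last node on odd-sized levels, and checks an inclusion proof against the 32-byte root. All function names are hypothetical.

```python
import hashlib


def _hash_leaf(chunk: bytes) -> bytes:
    # Prefix leaves and inner nodes differently so one cannot be passed off as the other.
    return hashlib.sha256(b"\x00" + chunk).digest()


def _hash_pair(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _levels(chunks):
    """Return every level of the tree, from the hashed leaves up to the single root."""
    if not chunks:
        raise ValueError("need at least one chunk")
    levels = [[_hash_leaf(c) for c in chunks]]
    while len(levels[-1]) > 1:
        level = list(levels[-1])
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        levels.append([_hash_pair(level[i], level[i + 1]) for i in range(0, len(level), 2)])
    return levels


def merkle_root(chunks) -> bytes:
    return _levels(chunks)[-1][0]


def merkle_proof(chunks, index: int):
    """Sibling hashes from leaf to root, each tagged with whether the sibling is on the left."""
    if not 0 <= index < len(chunks):
        raise IndexError("chunk index out of range")
    proof = []
    for level in _levels(chunks)[:-1]:
        sibling = index ^ 1
        # On an odd-sized level the last node is paired with a copy of itself.
        sibling_hash = level[sibling] if sibling < len(level) else level[index]
        proof.append((sibling_hash, sibling < index))
        index //= 2
    return proof


def verify_inclusion(root: bytes, chunk: bytes, proof) -> bool:
    """Recompute the path from `chunk` to the root and compare with the stored root."""
    node = _hash_leaf(chunk)
    for sibling_hash, sibling_is_left in proof:
        node = _hash_pair(sibling_hash, node) if sibling_is_left else _hash_pair(node, sibling_hash)
    return node == root
```

However large the batch, the verifier stores only the 32-byte root, and a proof for one chunk grows with the logarithm of the number of chunks, which is why keeping the root on Layer1 is much cheaper than keeping the data.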

(Note: There are some other data availability schemes, including Adamantium and EigenLayr. In Adamantium, users can choose to host their off-chain data and sign to confirm their off-chain data availability after each state transition, otherwise, the funds will be automatically sent back to the main chain to ensure security, or users are free to choose their data availability provider. EigenLayr is an academic solution, proposing Coded Merkle Tree and data availability oracle ACeD.)

Wrap it up

Source: IOSG Ventures, adapted from Celestia Blog

Having discussed the solutions one by one, let’s compare them side by side in terms of security/decentralization and gas cost. Note that this chart represents the author’s personal understanding and serves as a rough, approximate ranking rather than a quantitative comparison.

Pure Validium has the lowest security/decentralization and gas cost.

The solutions discussed above sit in the middle. In my opinion, the security/decentralization of zkPorter, with its set of Guardians, is slightly higher than that of a DAC, while the DA-Specific blockchain solutions are slightly higher than zkPorter. Gas cost increases accordingly. Again, this is only a very rough comparison.

We also have the on-chain data availability scenarios here, which have the highest level of security/decentralization and gas cost. All three schemes have equal security/decentralization since their data availability is provided by the Ethereum Layer1. The pure Rollup scheme has lower gas cost compared to monolithic Ethereum, and with Proto-Danksharding and Danksharding, the cost of data availability will be further reduced.

Note: Most of the data availability contexts discussed in this article are under Ethereum. It should be noted that Celestia and Polygon Avail are generalized schemes and are not limited to Ethereum itself.

We conclude with a summary of the above schemes in a table.

Source: IOSG Ventures

Closing Thoughts

After discussing the above data availability issues, we find that all the solutions are essentially trade-offs under the mutual constraints of the scalability trilemma, and the difference between the solutions lies in the “granularity” of the trade-offs.

From the user’s perspective, it makes sense for the protocol to offer the option of on-chain and off-chain data availability. This is because the user’s sensitivity to security and cost varies among different application scenarios or user groups.

This article has focused on the data availability layer as it supports Ethereum and Rollups. In the cross-chain context, Polkadot’s relay chain provides native data availability security for its parachains, while Cosmos IBC relies on a light client model, so it is critical that light clients can verify data availability on both the source and destination chains.

The benefit of modularity is pluggability and the flexibility to adapt protocols on demand: for example, to unburden Ethereum of data availability while preserving security and trust, or to raise the security level of the light client model and reduce trust assumptions in a multi-chain ecosystem. Data availability is not limited to Ethereum; it can be useful in multi-chain ecosystems and in even more application scenarios in the future.

We believe that in the next 3 to 5 years, blockchain architecture will inevitably evolve from monolithic to modular, with loose coupling between layers. Rollup-as-a-Service (RaaS), Data Availability-as-a-Service (DAaaS), and many other modular components may emerge to realize ‘LEGO’-style composability of blockchain architecture. Modular blockchains are one of the key narratives underpinning the next cycle.

And the value of the data availability layer is still to be explored.