atom_crypto

Posted on Sep 29, 2022

The Data Availability of Celestia

Overview

Author introduction: W3 Hitchhiker is a core contributor to CFG Labs and an independent crypto research team that aims to discover cutting-edge tech and innovative projects through first-principles thinking and on-chain data. We previously focused on the secondary market with a subjective, non-hedging strategy, gained a lot of experience from DeFi, and now focus more on the primary market. We build our investment theses by communicating directly with top founders in the space and by utilizing on-chain data.

We have a team of 50+ professionals, including PhDs and CFAs. We have three departments: 1) Tech and Product, 2) On-chain Analysis, and 3) Investment & Research. We are most interested in the infrastructure layer, such as DA, rollups, ZK and other technological innovations for the next cycle. We began talking to Celestia early on and have contributed to the community ever since. We spent half a month translating the 200+ page PhD dissertation by Mustafa (founder of Celestia), which also incorporates the LazyLedger white paper (the predecessor of Celestia). We have also met other contributors, including Chloe and Frank from CFG Labs, who invited us to share our experience with Celestia so far; the speakers today are Rex, Hongyi Ren and Bicheng Liu.

Data Availability

The topic of DA is gaining popularity. Ethereum's performance bottleneck shows up not only in its confirmation time (a dozen or so seconds) but also in high transaction fees and more, so improving blockchain performance has become a common goal in the space. The research areas include:

  1. L2/L3, which use rollups for execution and computation. Running rollups (execution layers) in parallel should improve overall efficiency;

  2. Scaling the chain by increasing the block size, which is seen as the most effective way to deal with state-bloat problems. Vitalik also outlined these ideas in his "Endgame" post on how to better utilize the network. However, the improved performance is a double-edged sword that adds burden on validators: full nodes / block producers become resource-heavy and tend to centralize, while light nodes/clients verify in a more decentralized way.

Data availability fits these requirements very well. Celestia is the trailblazer in this space. Ethereum is also pushing EIP-4844 (Proto-Danksharding) after the Merge as a step before full Danksharding, and that plan likewise emphasizes the importance of data availability.

So what is DA? The official explanation is that it ensures the availability of data through a data-sampling process. During DA sampling, light nodes do not need to store all the data or keep up with the state of the entire network; instead, they establish that the data is available and correct in an efficient way.

Next, we will introduce the difference between DA and consensus in terms of data security. The core of a blockchain lies in the immutability of its data: the open, permissionless ledger keeps the data consistent across the network. To preserve performance, consensus nodes tend to operate in a more centralized way, and other nodes can obtain the data confirmed by consensus through DA. Note that consensus in Celestia (agreement on transaction content and transaction order) is not exactly the same as in other networks (transaction ordering, verification, etc.).

Celestia

Celestia follows the spirit of Cosmos, which is open and sovereign, and is building a modular blockchain for the DA layer. It uses Tendermint consensus but has no execution environment. It has the following characteristics:

  1. Provides data availability for rollups;

  2. Separates the consensus layer and the settlement layer. Rollups built on top of Celestia either rely on a third-party settlement layer such as Ethereum or Dymension, or settle by themselves through their own consensus;

  3. Solves data availability with two-dimensional Reed-Solomon encoding and fraud proofs;

  4. Provides a highly secure service for light nodes: using fraud proofs, light nodes can efficiently obtain data verified by consensus.

Celestia workflow

Consensus, the P2P network and so on work much like other networks, so we will not cover them in depth here and will focus on the differences instead. We will divide this walkthrough into three parts.

  1. Some differences in block construction. First, let's define what a share is. Shares contain transaction data and the proofs associated with those transactions. In Celestia, consensus is separated from the execution that Tendermint / Cosmos SDK chains carry (staking, governance, the account system). Because Celestia has no execution or settlement layer, the relationship between transactions and state differs from Ethereum: in Ethereum the entire state tree is updated after a transaction is executed, while in Celestia the "state" is not about transaction execution but about the transaction data being stored on chain. Shares are critical because they are needed both for proofs and for DA sampling. A share can therefore be understood as transaction data plus transaction-related proof, packed into a fixed-length, fixed-format data block.

After introducing shares, let's talk about the key difference between a Celestia block and the blocks of other chains: its data root. We will describe the dataRoot according to the whitepaper (subject to change in the actual implementation). Picture a 2K*2K matrix. How does this matrix come about? We start with a K*K matrix, where K is a preset parameter that can be modified. We place the shares containing transaction-related data into this K*K matrix; if there are not enough shares to fill it, we pad it with placeholder data, and if there are too many, the extra shares wait for the next block. K thus determines the maximum transaction capacity of a single block, i.e. the block capacity of Celestia. A share can contain a single transaction, and multiple transactions can also be packed into the same share; "fixed length" means the amount of data per share is capped.

Once the shares are placed in the K*K matrix, we extend it horizontally with Reed-Solomon coding from K*K to K*2K, so the new K*K half is parity data derived from the original. We then extend the original data vertically to obtain another K*K parity quadrant, and extend horizontally once more to fill the final quadrant. Through this encoding we end up with a 2K*2K matrix of erasure-coded shares.

Then what is the dataRoot? Given the 2K*2K matrix, we build a Merkle tree over each row and each column, obtaining 2K + 2K = 4K Merkle roots. We then build a Merkle tree over these 4K roots, and its root is the dataRoot. The dataRoot is placed in the block header. Celestia's DA revolves around the dataRoot; it is the key piece of data in the block. The team is currently working on questions such as 1) how to confirm data and data-related transactions, and 2) how to generate these shares.
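
To make the construction concrete, here is a minimal Python sketch of the dataRoot computation described above. It is illustrative only: the `extend` function is a stand-in for real Reed-Solomon parity, and plain SHA-256 Merkle trees replace the namespaced Merkle trees Celestia actually uses.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    # Plain binary Merkle tree (Celestia actually uses namespaced Merkle trees).
    nodes = [h(leaf) for leaf in leaves]
    while len(nodes) > 1:
        if len(nodes) % 2:
            nodes.append(nodes[-1])
        nodes = [h(nodes[i] + nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

def extend(half: list[bytes]) -> list[bytes]:
    # Stand-in for Reed-Solomon extension: real erasure coding produces K parity
    # shares such that any K of the resulting 2K shares recover all of them.
    return half + [h(b"parity:" + s) for s in half]

def data_root(shares: list[bytes], k: int) -> bytes:
    assert len(shares) == k * k, "a block holds exactly K*K fixed-size shares"
    original = [shares[i * k:(i + 1) * k] for i in range(k)]               # K x K
    rows = [extend(r) for r in original]                                   # K x 2K
    cols = [extend([rows[r][c] for r in range(k)]) for c in range(2 * k)]  # 2K x 2K
    matrix = [[cols[c][r] for c in range(2 * k)] for r in range(2 * k)]
    row_roots = [merkle_root(r) for r in matrix]      # 2K row roots
    col_roots = [merkle_root(col) for col in cols]    # 2K column roots
    return merkle_root(row_roots + col_roots)         # Merkle root over the 4K roots

root = data_root([b"share-%d" % i for i in range(16)], k=4)
```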

  2. Now that we have the data root, let's see how DA works.

We will not cover the details of the interaction between consensus nodes (i.e. the P2P layer) here. DA involves communication between consensus nodes and light nodes; nodes that do not participate in consensus are called light nodes, and propagation across the network mainly depends on the DA process and how light nodes work. Let's review the details. A new block is produced once validators reach consensus on the data, and the block header (dataRoot included) is then distributed to light nodes. After receiving the header, a light node performs random sampling as part of the DA process: it selects a set of coordinates in the 2K*2K matrix, packages them into a sampling request, and sends it to the consensus nodes it is connected to. In this way the light node asks those consensus nodes to send back the shares corresponding to the requested coordinates.

There are two types of responses. One is "I have the data", in which case the consensus node responds with 1) the shares requested and 2) Merkle proofs that those shares are included under the data root. Once the light node receives this, it uses the Merkle proofs to check that the shares really are included, and accepts them only if the proofs verify. When the light node has received and verified all its responses, it recognizes the block as available, since the responses show the data has been confirmed by consensus nodes. If, however, a consensus node does not respond or responds slowly due to network problems, the light node forwards the shares it already has to other consensus nodes, helping the data spread through the network.
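
As a rough illustration of the light node's check, here is how a Merkle inclusion proof could be verified against a row (or column) root, matching the plain SHA-256 tree used in the sketch above; Celestia's actual proofs run over namespaced Merkle trees.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_merkle_proof(share: bytes, proof: list[bytes], index: int, root: bytes) -> bool:
    """Check that `share` sits at position `index` under `root`, given the sibling
    hashes bottom-up in `proof`. The light node accepts a sampled share only if
    this check passes for a root that is itself committed to in the dataRoot."""
    node = h(share)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root
```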

One question remains: a light node samples a set of coordinates, but nothing specifies how many samples it should take. Because the 2K*2K matrix is two-dimensional Reed-Solomon encoded, the original data can be completely recovered as long as K*K shares are obtained. Why is there no explicit requirement on how many samples each light node makes? If there were only one light node, it would need to sample K*K shares to recover the original data; if there were K*K light nodes, one sample each would suffice, assuming no two light nodes sample the same coordinate. The official documentation provides the calculation: given how many times each node samples and how many nodes are in the network, it yields the probability that the data is available. Light nodes can also tune their sampling according to the security level they require.
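
The exact formula is in the whitepaper; as a back-of-the-envelope sketch (assuming every light node samples coordinates independently and uniformly), the expected number of distinct shares collectively touched grows quickly with the number of light nodes:

```python
def expected_unique_shares(k: int, light_nodes: int, samples_per_node: int) -> float:
    """Expected count of distinct coordinates hit in the 2K x 2K extended matrix
    when `light_nodes` nodes each draw `samples_per_node` uniform samples."""
    total = (2 * k) ** 2
    draws = light_nodes * samples_per_node
    return total * (1 - (1 - 1 / total) ** draws)

# Recovery needs at least K*K shares. With k = 64 (K*K = 4096 shares) and 4096
# light nodes sampling 15 coordinates each, expected coverage is roughly 16,000
# of the 16,384 extended shares -- far more than enough to reconstruct the block.
print(expected_unique_shares(64, 4096, 15))
```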

In Celestia, the block size can grow linearly with the number of light clients, which also helps explain why adding more nodes to the network increases Celestia's overall efficiency.

The third part is fraud proofs. Why do we need them when we already have erasure coding? Suppose I collect data from ten validators: erasure coding lets me reconstruct whatever those ten nodes gave me, but how do I know the data was encoded correctly in the first place? Erasure codes only guarantee that the data is what the nodes chose to give us; to verify that the data the validators hand us is correctly encoded, fraud proofs are needed.

A fraud proof consists of three parts:

  1. Which block's data the fraud proof challenges. Since fraud proofs are optimistic, there can be some lag, which means a fraud proof can target earlier blocks;

  2. Which shares are challenged as possibly wrong. You need to point out those shares, along with the corresponding row/column root and Merkle proofs;

  3. At least K additional shares from the rows/columns containing the wrong shares, with their Merkle proofs, so that the whole row/column can be recovered and verified.

Let's look at how the parties interact during a fraud proof.

Once validators respond with shares, light nodes forward those shares to other consensus nodes (asking them to help verify). Those consensus nodes check whether the shares are consistent with their local data, and if they are inconsistent, they initiate a fraud proof. How does a node judge whether a fraud proof is valid? It verifies as follows:

  1. I have the specified block hash (data root) locally.

  2. I verify the shares against the row/column root with the Merkle proofs, i.e. whether the shares are included under the data root;

  3. I recover the whole row/column from the shares sent to me using Reed-Solomon decoding and compare it with my local data.

If these checks pass, the fraud proof is confirmed valid, and the validators who sent the wrong shares can be blacklisted. In summary, DA relies on two-dimensional Reed-Solomon coding plus fraud proofs: RS encoding turns shares into erasure-coded data that is distributed among validators, making the data recoverable by the whole network (a rollup, for instance, can recover its data and compute over it), while fraud proofs ensure the data was encoded correctly. Only by combining the two can light nodes obtain trustworthy data efficiently.
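
A minimal sketch of those three checks, reusing `verify_merkle_proof` and `merkle_root` from the sketches above; `decode_row` is a hypothetical stand-in for a real Reed-Solomon decoder, and the row root is derived from the block header the node already holds locally.

```python
from typing import Callable

def fraud_proof_valid(local_row_root: bytes,
                      claimed: dict[int, bytes],       # column index -> share
                      proofs: dict[int, list[bytes]],  # column index -> Merkle proof
                      k: int,
                      decode_row: Callable[[dict[int, bytes], int], list[bytes]]) -> bool:
    if len(claimed) < k:
        return False                                   # need >= K shares to rebuild the row
    for idx, share in claimed.items():                 # check inclusion under the row root
        if not verify_merkle_proof(share, proofs[idx], idx, local_row_root):
            return False
    recovered = decode_row(claimed, k)                 # rebuild the full 2K-share row
    # If re-committing to the recovered row does not reproduce the root committed
    # in the header, the row was badly encoded and the fraud proof is accepted.
    return merkle_root(recovered) != local_row_root
```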

Comparison with Danksharding

Both Danksharding and Celestia use the same abstract two-dimensional RS technique, but Danksharding uses KZG polynomial commitments. A KZG commitment commits to a polynomial and lets a prover show, for a claimed evaluation f(x) = y, that the point (x, y) really lies on the committed polynomial. The verifier does not need to know the polynomial itself, nor evaluate it in full; a succinct proof convinces it that (x, y) is a valid solution of the polynomial. KZG commitments fit RS coding naturally, since RS erasure coding is itself built on polynomial evaluation.

For RS, we need to expand K pieces of data into 2K, but how? Treat the K pieces of data as indexed values, so we have K points (x, f(x) = y) for x = 0 to K-1. Using a Fourier transform we can obtain the unique polynomial of degree K-1 through those points, then evaluate it at K further positions to produce the extended data. Picking any K of the 2K values recovers the polynomial, and hence the full 2K values including the original K. The advantages of KZG here are: 1) it composes well with this encoding; 2) a commitment is a fixed 48 bytes; 3) because it is a validity proof rather than an after-the-fact challenge, a light node can obtain the proof, verify it and confirm immediately. This is one advantage of KZG commitments.
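
As a sketch of that extension (using straightforward Lagrange interpolation over a prime field rather than the FFT used in practice), the following shows K values being extended to 2K, and the original data being recovered from the parity half alone:

```python
P = 2**61 - 1  # an arbitrary prime modulus for this sketch

def lagrange_eval(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate, at x, the unique degree-(K-1) polynomial through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def rs_extend(data: list[int]) -> list[int]:
    """Extend K symbols to 2K: any K of the 2K values determine all the others."""
    k = len(data)
    pts = list(enumerate(data))                   # original values sit at x = 0..K-1
    return data + [lagrange_eval(pts, x) for x in range(k, 2 * k)]

data = [5, 7, 11, 13]
ext = rs_extend(data)                             # 8 values; any 4 recover the rest
recovered = [lagrange_eval(list(enumerate(ext))[4:], x) for x in range(4)]
assert recovered == data
```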

The biggest advantage of Celestia's fraud proofs is that they are optimistic: as long as no one behaves maliciously, the whole network runs efficiently. Light nodes are only responsible for receiving the data and recovering it according to predetermined rules.

Besides data availability, PBS (proposer-builder separation) is also worth mentioning. PBS addresses the MEV problem by separating the proposer and builder roles and restricting the ability to censor; crList is another interesting proposal. Celestia is currently not working on settlement, but according to Mustafa, co-founder of Celestia, validators could capture MEV in a "sequencer-less" rollup where anyone can propose a block. In his view this does not necessarily break the model of decoupling data and execution, because full nodes still do not need to verify the correctness of transactions, and validators that want to capture MEV could do execution on different machines/infrastructure than the ones they use for processing data.

To sum up, Celestia's modular blockchain for data availability has no execution or settlement layer, so the capacity of the entire network depends on data availability. Ethereum with Danksharding is not just a data-availability layer but also a settlement layer.

Recent topics of discussion

The data root is not built over the 2K*2K shares directly, but over the 2K+2K row and column roots. On fraud proofs, Celestia has a minimum-honest-node assumption: as long as a light node connects to one honest validator, security is guaranteed, so safety does not rest on the usual Byzantine 2/3-honest assumption. The latest update from Celestia: previously multiple transactions could be included in one share; now a transaction can also span multiple shares.

The recent third community call also discussed the differences from Danksharding. Having covered the technical side, we now look at it from the user's point of view:

  1. Block size. Ethereum's blob space targets roughly 16 MB per block, while Celestia promises larger blocks of around 100 MB;

  2. Celestia focuses on DA, so blocks carry less metadata (supplementary data) and no execution data, since Celestia does not work on the execution layer. Theoretically, the minimum fee threshold is lower than Ethereum's;

  3. In terms of sovereignty, Celestia promotes a free, open, Cosmos-friendly culture. Unlike smart-contract rollups secured by Ethereum, where smart contracts verify validity, sovereign rollups must secure themselves autonomously;

  4. Celestia uses namespaces so that you do not need to fetch all the data on the main chain, only the data relevant to you.

Mamaki Testnet

There is currently no incentivized testnet; we expect that to launch at the end of this year or the beginning of next. The next testnet upgrade is expected in October and targets serving more developers in the space. The current testnet mainly lets everyone experience how Celestia works.

Nodes have been running stably during the testnet, and bugs such as crashes are now rarer. All the validators can operate. Light nodes work as expected, with a reduced amount of downloaded data and high efficiency, though network (bandwidth) requirements may be slightly higher.

The main chain itself is not yet stable; it sometimes takes five or ten minutes to produce a block. In Tendermint, the probability of proposing a block is determined by each validator's share of total stake, so with similar stakes validators should take turns, yet right now some validators often produce 3-4 blocks in a row.

  We are seeing the following problems:
  1. Validators entering and exiting cause network instability, no matter how much they have staked; even very small validators trigger it.

  2. Nodes cannot connect to many peers; only a small number of relatively stable peers can be connected.

  3. Even when transaction volume is not large, block time is already around 50 seconds per block. The good news is that we are still at an early stage, so there should be a lot of room for throughput improvement.

  4. There are serious problems with bridge nodes. We restarted our validators recently and found that both memory and network usage soared significantly. We also contacted other validator service providers: for some, the bridge node's memory has reached 20 GB, which is very abnormal. Celestia's chain data is about 14 GB, while bridge nodes theoretically should not store the data at all, so 20 GB of memory for a bridge node is a serious problem to be solved.

  1. Tendermint is for consensus, while Optimint is used by Celestia rollups. Currently a rollup has only one sequencer, so no consensus is required and it is relatively simple for the sequencer to upload data to the main network. If rollups also need their own consensus in the future, Tendermint would be required.

  2. Optimint and Cosmos app chains both connect through ABCI, but an app chain has its own consensus mechanism, and consensus is much harder than merely uploading data. So Optimint should not be treated as a competitor to Tendermint; rather, they are used in different scenarios.

  3. On the contract side, only CosmWasm is available for now, but MoveVM, EVM, Solana VM and others will be possible in the future. So far two demo cases have been built, and they suffer from network instability: transaction submission sometimes takes 10 minutes, which needs further improvement for a better user experience.

  4. The direct (TCP) connection between nodes currently uses gRPC, which is quite new; REST is more widely used and has better compatibility.

In this part we will talk about verification in the system. Light nodes can verify the availability of the data without downloading all of it, which is done through the quadrupling two-dimensional Reed-Solomon extension.

  1. How do we ensure the sampling is reliable? Take 100 rows * 100 columns, i.e. 10,000 shares: a single sample does not merely give a 1-in-10,000 guarantee. Quadrupling means that at least 1/4 of the shares must be unavailable before the block becomes unrecoverable (details in section 4.5.8 of the whitepaper); only when 1/4 is unavailable can it not be recovered, and only then is there genuinely an error to find. Hence each random draw hits an unavailable share with probability at least 1/4, and after some rounds of sampling, say 10 to 15 times, the detection probability approaches 99%. A reasonable number of samples is around 15-20 (see the worked numbers after this list).

  2. Does Celestia itself verify the data? For example, rollups send their data to the network; Ethereum has a contract that verifies transaction validity, whereas Celestia hands the data back to the rollup to verify for itself. It is also worth distinguishing the roles a node plays on a rollup versus on Celestia: a light node or even a full node of a rollup can run as a light node on Celestia and obtain the data through DA sampling, which is quite different from Ethereum, where running a full node is necessary.
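
The worked numbers behind item 1 (assuming each sample is drawn uniformly and the block is only just unrecoverable, so each draw finds a withheld share with probability 1/4):

```python
def detection_probability(samples: int) -> float:
    # If at least a quarter of the 2K x 2K square is withheld (the minimum needed
    # to make the block unrecoverable), each uniform sample misses a withheld
    # share with probability at most 3/4, so after `samples` draws:
    return 1 - (3 / 4) ** samples

print(detection_probability(10))  # ~0.944
print(detection_probability(15))  # ~0.987
print(detection_probability(20))  # ~0.997
```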

Under the minimum honest assumption, a light node only needs to connect to one honest node to ensure security. What if you don't want to rely on this? Then you can run a full node, obtain all the data, and check it against the root published by the sequencer. 1) Rollups verify their own state, such as account balances; Celestia has no execution layer. 2) If your sequencer behaves maliciously and sends two blocks with the same block height to Celestia, Celestia only guarantees/provides the service that the data sent by the sequencer will be returned to the rollup's nodes in its original form. "SCRs are still able to fork, but the decision over what is the canonical chain is delegated to an L1 smart contract. This relies on multisig/centralized teams, majority governance of the rollup (as they decentralize), or they become immutable and forfeit this right (or the L1 could fork, unlikely). Note this upgrade process is subject to majority rule via on-chain governance. Off-chain coordination could deploy a new instance of a SCR, but then you're starting from scratch drawing users with no history to the chain. SRs are able to fork permissionless, even as a minority, via off-chain governance." — Delphi Digital.

Fees consist of two parts:

  1. The rollup's own byte fees and execution gas. If Ethereum is used as the settlement layer, these fees are paid in its native currency, ETH;

  2. The storage fee paid by the sequencer to Celestia. The storage fee currently uses the native currency, but the architecture also supports other currencies; the logic for uploading data is separated from the payment logic;

  3. Celestia's goal is to make the total cost of 1) + 2) lower than the cost of merely uploading the data to Ethereum. Calldata on Ethereum is expensive today, though Danksharding is expected to reduce that (a toy cost sketch follows this list);

  4. This also allows more flexible data solutions, such as working with other chains (e.g. Ethereum) or with off-chain solutions such as zkPorter or StarkEx's DAC.
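
A toy cost model for the two parts above (all numbers and parameter names are purely illustrative, not real fee schedules):

```python
def rollup_total_cost(settlement_fee: float, data_bytes: int, da_fee_per_byte: float) -> float:
    # part 1: the rollup's own execution/settlement fees
    # part 2: the per-byte storage fee the sequencer pays to the DA layer
    return settlement_fee + data_bytes * da_fee_per_byte

# The goal stated in item 3 is that this total comes in below the cost of simply
# posting the same bytes as calldata on Ethereum.
print(rollup_total_cost(0.002, 100_000, 1e-8))
```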

Applications of Celestia

  1. Use the Celestia mainnet directly, or run as execution rollups, with Celestia responsible for transaction ordering;

  2. The sequencer of a sovereign rollup uploads data to Celestia, and the rollup's other validators obtain the data by namespace; the sovereign rollup (SR) is responsible for its own execution and security;

  3. An enshrined rollup (ER) is built directly into the L1 specification rather than deployed as a smart contract; Ethereum can also be used as the settlement layer for Celestia rollups;

  4. The Quantum Gravity Bridge redefines Celestia as a plug-in for Ethereum: Ethereum acts as the settlement layer of Celestia, while Celestia acts as the DA layer of Ethereum.

Community Q&A

1) What are the types of validators on Celestia?

Full nodes on Celestia mainly store data, unlike the full-functionality full nodes on Ethereum. Consensus nodes and light (non-consensus) nodes are the common types. Bridge nodes provide 1) RPC services for developers, 2) connection services for light nodes, and 3) special support for light-node data sampling, so you can think of a bridge node as the gateway between full nodes, consensus nodes and light nodes.

2) What is a namespace?

Rollup data is uploaded to and stored on Celestia, and each rollup only wants to fetch its own data. A namespace is the identifier that distinguishes one rollup from another: when a rollup retrieves data from Celestia, it fetches its own data according to its namespace. Unlike Ethereum, Celestia only verifies signatures and acknowledges the data; it just has to make sure the data is handed back to nodes in its original form, organized by namespace. On Ethereum, smart-contract rollups are instead identified by different contract addresses / sets of contracts.
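
A minimal sketch of the idea (the real protocol serves this through namespaced Merkle trees, which also let the responder prove the returned range is complete; here we simply filter):

```python
from dataclasses import dataclass

@dataclass
class Share:
    namespace: bytes   # identifies which rollup the payload belongs to
    data: bytes        # fixed-length payload, opaque bytes as far as Celestia is concerned

def shares_for_rollup(block_shares: list[Share], namespace: bytes) -> list[Share]:
    """Return only the shares belonging to one rollup's namespace."""
    return [s for s in block_shares if s.namespace == namespace]
```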

3) How is scalability addressed?

Since blocks in Celestia consist of pure binary data (unlike calldata on Ethereum), Celestia should have relatively high processing capacity. A block of 100 megabytes seems to be a reasonable size; as long as the sequencer signs it, the content does not matter much to Celestia.

4) How efficient is the DA process? For example, does the cost change as the amount of stored data grows? An Ethereum full node already holds hundreds of gigabytes, nearly 1 TB.

W3 Hitchhiker: Like other chains, Celestia blocks have limited capacity (the K*K share square mentioned above), and the more transactions there are, the greater the cost; Ethereum handles this with Layer 2. So how does Celestia handle it? First of all, it has no execution layer: all rollup transactions are just binary data to Celestia, with no semantic meaning. As for expanding block capacity, light nodes help the entire network converge. Celestia prefers large blocks (100 MB versus roughly 16 MB on Ethereum). How do such large blocks spread across the network? In two ways: 1) as more light nodes participate, the block space can grow linearly; 2) during DA sampling, shares are re-forwarded by light nodes, which accelerates the network's convergence and improves efficiency, and light nodes' bandwidth also helps spread blocks between consensus nodes. So the approach is to increase the number of light nodes rather than the workload of any single node.

CFG Labs core team: DA block size can be understood like bandwidth in Web2; it is the key metric. The more bytes per second the DA layer can process, the more transactions the execution layer can handle, and hence the faster the blockchain. Bytes per second is the bandwidth of Web3.

5) Again, what is the effect on light nodes' cost and time as the data stored on the network increases?

We have mentioned several times that Celestia has no execution layer, so its consensus is not about verifying transactions but about gathering, confirming and ordering data; it is unrelated to execution. The cost therefore consists mainly of storage and bandwidth, with no computation cost. The larger the data volume, the higher the overall cost. The official explanation is that the more nodes there are, the more the network can process. How should we understand that? With a fixed number of nodes, the network handles as much as it can; as nodes are added, total cost grows, but so does the processing power of the whole network. Celestia mitigates this capacity conflict in another way: block size has a linear relationship with the number of light nodes, and "light nodes" here is not limited to those of a single rollup but means the light nodes of all rollups participating in the network. For the network as a whole, the overall cost increases, but it is shared among more light nodes. If the data grows from 100 MB to 200 MB, the number of light nodes required roughly doubles, while the cost borne by a single light node does not change much. Hence we believe the impact on light nodes is limited, and 100 MB blocks are nowhere near the limit.

CFG Labs core team: Celestia's block size increases with the number of light nodes, and the block header a light node handles is proportional to the square root of the block size; this bandwidth advantage mitigates the problem in the long term. As user demand increases by X, the block header for light nodes grows by sqrt(X), block capacity increases, light-node bandwidth increases by X, and Celestia's throughput increases by X^2. The overall effect on light nodes should be neutral.
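
A small illustration of the square-root claim, under the whitepaper layout in which the header effectively commits to the 4K row and column roots of a K*K-share block:

```python
def block_shares(k: int) -> int:
    return k * k          # original data square: K*K fixed-size shares

def header_roots(k: int) -> int:
    return 4 * k          # 2K row roots + 2K column roots committed in the header

for k in (32, 64, 128):
    print(k, block_shares(k), header_roots(k))
# Quadrupling the number of shares (K -> 2K) only doubles the header,
# i.e. what a light node downloads grows with the square root of block size.
```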

6) Question: How are incentives distributed among storage nodes, light nodes, consensus nodes, etc.?

W3: The current economic design mainly pays full nodes, which store Celestia's data; allocations for other roles have not been announced yet. We ourselves are currently running a set of nodes and have not worked out how incentives should be split among the different node types. But following PoS practice, the rewards for consensus nodes should depend on the proportion of stake delegated to them on the network; when it comes to security and attacks, there has to be a balance.

7) Question: We have seen the execution layer, the DA layer, Celestia, and existing solutions such as Ethereum, ZK rollups, optimistic rollups, and other ecosystem projects in Cosmos. If I want to build a rollup myself, how should I deploy it?

Answer: There are three ways to deploy on Celestia, as discussed above. If you go directly to the Celestia mainnet or run a simple rollup, you can simply set up a Celestia full node. If you build a sovereign rollup, you need to design your sequencer and your own network-security mechanism: the sequencer is mainly responsible for sending data to Celestia and paying for its storage, while the other nodes of your rollup can act as light nodes on Celestia to fetch the data back, and you keep your own chain and business logic. The Quantum Gravity Bridge is a contract deployed on Ethereum, so if you are an Ethereum rollup you do not deal with Celestia directly.

8) Cevmos: Ethereum smart contracts can be deployed on Cevmos, in a Celestia + Cevmos + rollups stack. Cevmos can be understood as the execution and/or settlement layer. The rollup's other nodes obtain data from Celestia and pass it to Cevmos to execute and update their own state. Depending on your design, your sequencer can choose whether or not to execute; if you need to verify transactions and package them, you can leverage Cevmos to check their validity before submitting the batch.

The entire process works as follows: rollup users send transactions to the sequencer, the sequencer submits the transactions to Celestia, and the rollup's other nodes obtain the data from Celestia by namespace and then execute it with their VM. The settlement layer mainly solves asset-exchange problems and provides security guarantees (Ethereum, for example). Asset exchange between different chains requires both bridges (trusted or trustless) and settlement layers. The most common rollups today have only one sequencer and may introduce consensus mechanisms such as Tendermint, Avalanche and others in the future; in Mustafa's terms, a rollup where anyone can propose blocks is a "sequencer-less" rollup.