CatcherVC

Posted on Apr 26, 2022. Read on Mirror.xyz

Analyzing Layer2 Metis: The Road to Decentralization

Author: Web3er Liu, CatcherVC

Abstract

  • The fundamental problem with OP Rollups like Optimism and Arbitrum is the centralization of Sequencer nodes, which requires a solid solution.
  • Metis is trying to take the lead in decentralizing the Sequencer. The project team has opened up the Peer Node network and handed the right to run Sequencer nodes to community members and other institutions.
  • Metis changed the structure of its storage layer and the form in which it publishes data to Ethereum. By integrating Memolabs, it significantly reduces storage costs, giving Metis some of the lowest gas fees among mainstream Layer 2 networks.
  • Thanks to newly introduced mechanisms, the latest version of Metis retains reliable security and data availability after integrating Memolabs. The team has analyzed the possible failure scenarios and designed countermeasures for each.
  • Metis lets node operators register as a DAC (institutional DAO) and earn ongoing token income. It also provides a simple one-stop DAO hosting service, lowers the difficulty of running a DAO, opens up Community Ecosystem Governance (CEG), and transfers the authority to maintain the Layer 2 network to community members.

Problems with traditional OP Rollup

As concepts such as Web3, the Metaverse, and NFTs entered the public eye over the past year, the crypto industry entered a period of rapid growth. Courted by capital and by a massive user base, Ethereum has used its first-mover advantage to become the core of the entire Web3 narrative. After a long evolution, its architecture delivers strong decentralization and security, making it a true star among public chains. At the same time, inefficiency severely limits its development. Compared with VISA, which processes thousands of transactions per second, Ethereum, with a TPS below 20, looks like a relic of an earlier era and is a far cry from Vitalik's grand vision of a "world-class decentralized application platform."

To meet the massive demand of the Web3 market, different solutions such as sidechains, new public chains, and Rollups have entered the stage one after another. While star projects such as BSC, Polygon, Solana, Arbitrum, and Optimism divide up the traffic, their inherent defects are becoming more and more apparent. Because block production speed constrains TPS, almost all major Layer 2 networks and new public chains have either cut the number of nodes or decoupled consensus from block production. This directly shortens block time but seriously weakens decentralization and system security.

Take Optimism as an example. It uses a single block-producing node called the Sequencer to generate Layer 2 blocks, and blocks are produced within seconds. New blocks do not need to be handed to other nodes for immediate verification; they can be finalized locally, which saves a lot of time. Since there is only one block-producing node, the allocation of "bookkeeping rights" is deterministic, so the PoW process (the step that randomly assigns bookkeeping rights) can be skipped entirely.

By streamlining block production, a Layer 2 block can go from generation to local finalization in 1 second or less. After a user submits a transaction, the result arrives within two or three seconds, on par with WeChat Pay.

At this point, however, the new Layer 2 block has not been audited by any verification node and may be invalid. To address this, the Sequencer regularly publishes a copy of its local blocks on Layer 1, including the transaction data and the StateRoot (which commits to the account state on Layer 2). Verifiers (verification nodes) on Layer 2 automatically read what the Sequencer has published and audit it to determine whether the Sequencer has committed fraud.

Essentially, Optimism uses Ethereum as a "court" for disclosing data and resolving disputes, so the key question is how often the Sequencer publishes data on Layer 1. If the Sequencer goes a long time without submitting its local data, Verifiers' audits are delayed and nodes take a long time to reach consensus, which seriously weakens the reliability of Layer 2.

According to Optimism's official block explorer, the interval between the Sequencer's state submissions to Ethereum can be as long as 37 minutes, which means that after the Sequencer generates a block, Verifiers may have to wait up to 37 minutes before they can audit it. In contrast, a new Ethereum block is audited by nodes across the entire network within about 13 seconds. The information asymmetry between the Sequencer and the Verifiers on Optimism is severe, and the reliability of its consensus mechanism is much lower than Ethereum's.

Arbitrum, also in the OP Rollup camp, shortens the submission interval to once every 2 to 5 minutes, which lets Verifier nodes audit state sooner and significantly narrows the information gap.

However, Arbitrum shares a flaw with Optimism: the Sequencer node responsible for producing blocks is run by the official team, and the "bookkeeping right" has not been handed over to outside parties. While the mechanism design is still immature, "procedural justice" cannot be guaranteed. As a safeguard, the block producers of Arbitrum and Optimism are backed by the credibility of the official teams, which compensates for the imperfections of the current mechanism.

The consequence is clear: Arbitrum and Optimism are, in essence, centralized operators. Although both allow users to run Verifiers (verification nodes) and to challenge the Sequencer freely, the official team retains absolute say over appointing and removing the Sequencer. Even if a Verifier shows that the current Sequencer has acted maliciously and forces it to step down, the new Sequencer will still be designated by the official team.

Layer 2 block-producing power is therefore concentrated in the hands of the Arbitrum and Optimism teams, and it rests on "credit" rather than "procedural justice." Having the official teams run the Sequencer nodes creates another big problem: block-producing nodes are few and physically concentrated, which leaves them prone to DDoS attacks and other single points of failure.

Take Arbitrum as an example. Its Sequencer node has gone down twice, drawing widespread attention. On September 14, 2021, both Arbitrum and Solana went down due to DDoS attacks: the block-producing node received too many transaction requests in a brief period, which eventually led to a crash. On January 10, 2022, Arbitrum's Sequencer node went down again. The official team said that the node had suffered a hardware failure and that the standby node did not complete the handover in time. In the end, this single point of failure shut down the entire Arbitrum network.

The disadvantage of centralized systems such as Arbitrum and Optimism lies in this excessive concentration of resources. When only a single node, or a tiny number of nodes, is responsible for generating blocks, it has to bear a large amount of traffic and easily becomes a single point of failure. Concentrated block-producing power also weakens "fraud proofs" and the "challenge mechanism," which then cannot curb malicious node behavior at its root.

The Arbitrum and Optimism teams have acknowledged these inherent defects and stated that they will gradually implement decentralization in the future. However, neither has yet offered a reliable solution, and concrete decentralization is still far away.

To comply with the fundamental principles of decentralization, Metis, which is also an OP Rollup, has recently started to reform its system architecture, trying to take the lead in decentralizing Layer 2 in terms of both architecture and economics.

  • By opening up the peer node network, Metis will hand the right to run the block-producing Sequencer node to community members and other institutions, and will promote rapid synchronization of information between the Sequencer and the other peer nodes to keep the Sequencer from acting maliciously.
  • Metis lets node operators register as a DAC (institutional DAO) and earn ongoing token income.
  • Metis has officially opened Community Ecosystem Governance (CEG), further transferring the authority to maintain the Layer 2 network ecosystem to community members.

Through the above methods, Metis plans to take the lead in realizing the decentralization of Layer 2.

In addition, Metis changed the format in which it backs up data on Ethereum. Given that the peer node network can verify the Sequencer's local blocks immediately and thus keep it from acting maliciously within the Layer 2 network, Metis backs up transaction instructions to Memolabs, an off-chain decentralized storage platform, and publishes on Layer 1 only the location of that transaction data in Memolabs. The StateRoot corresponding to each transaction is still published on Layer 1.

For possible "challenge" and "fraud proof" scenarios, Metis adds further functions so that, when such a scenario occurs, the challenger can restore the original data of each transaction instruction on Layer 1 and complete the fraud proof without hindrance. This makes the new mechanism equivalent to that of the old version.

By introducing peer nodes and integrating the Memolabs storage layer, Metis spreads the storage task, previously borne by Ethereum alone, across peer nodes, Ethereum, and Memolabs, and introduces new mechanisms to ensure reliability. Since the other two share the storage task, Metis can reduce the amount of data it publishes on Ethereum, which cuts gas consumption and significantly lowers Layer 2 fees.

The following sections examine these key measures: Metis's peer node network and its integration of Memolabs storage.

Peer Node: Implementing Block Producer (Sequencer) Rotation

In conventional OP Rollups such as Optimism and Arbitrum, the block producer is uniquely determined: a single Sequencer executes transactions and packs blocks. This eliminates randomness in the choice of block producer, so the system no longer spends time selecting one at the start of each block cycle. In contrast, before each new Ethereum block is generated, the PoW process (or PoS, after the Merge) must randomly select the block-producing node, which adds significant delay.

However, randomness in the choice of block producer significantly reduces the probability that a single node can act maliciously. Because bookkeeping nodes rotate frequently, the chance that a malicious node controls the bookkeeping right is very low. Even if a malicious node obtains the bookkeeping right for a new block, the block will be rejected by the other, honest nodes if it is not valid. The honest nodes will then elect a new block producer and publish a valid block, and the malicious node is simply sidelined.

In this case, as long as more than 2/3 of the nodes in the network are honest, malicious nodes can be effectively restrained. This is the well-known PBFT (Practical Byzantine Fault Tolerance) mechanism. However, the effectiveness of PBFT depends on having enough nodes: only when the node count is large is it hard for a malicious party to win over many nodes and form a colluding group. When few nodes participate in block production, PBFT offers little protection, and the risk of a single node acting maliciously is exceptionally high.
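The honesty threshold mentioned here corresponds to the standard Byzantine fault tolerance bound n ≥ 3f + 1, where n is the number of nodes and f the number of malicious ones. The Python sketch below computes how many faulty nodes a committee of a given size can tolerate. The function names are illustrative, and the quorum formula is the generic BFT quorum size, not a description of how Metis implements consensus.

```python
def max_faulty(n: int) -> int:
    """Largest f such that n >= 3f + 1 (standard BFT bound)."""
    if n < 1:
        raise ValueError("need at least one node")
    return (n - 1) // 3


def quorum(n: int) -> int:
    """Smallest vote count for which any two quorums overlap in at least
    f + 1 nodes, so the overlap always contains an honest node."""
    return (n + max_faulty(n)) // 2 + 1


for size in (4, 13, 100):
    print(size, max_faulty(size), quorum(size))
```

A committee of 13 nodes, for example, tolerates at most 4 faulty nodes and needs 9 matching votes, so a handful of colluding operators is already enough to threaten a small network.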

Existing OP Rollups, including Optimism and Arbitrum, almost all assume by default that the Sequencer will not act maliciously. If the Sequencer does behave maliciously, Verifier nodes are allowed to "impeach" it, a process called a "challenge." The problem is that data is not synchronized between the Verifier nodes and the Sequencer immediately; there is always a delay.

As mentioned earlier, the data synchronization delay of Optimism nodes can exceed 30 minutes: after the Sequencer generates a new block, it can take half an hour before a verification node is able to audit it, which creates potential security risks. Although Arbitrum reduces the delay to a few minutes, it does not open the right to run the Sequencer to institutions other than the official team, which is not conducive to economic decentralization. Relying on the project team's "credit" in this way seriously violates the blockchain principle of "procedural justice."

In addition, since Optimism and Arbitrum have not issued tokens, they cannot offer strong incentives to validator node operators, which hinders growth in the number of nodes and makes these Layer 2 networks look more like consortium chains than public chains.

To avoid these problems, Metis has made many improvements to Optimism's original architecture, the most important of which is opening up the Peer Node network.

  • Conventional public chains such as Bitcoin and Ethereum are P2P networks made up of peer nodes. These nodes synchronize information frequently to keep their state consistent, and any peer node can volunteer to become a miner and take part in block production. After a new block is generated, it is propagated to the other peer nodes for auditing.
  • Metis has built a peer node network called the Sequencer Pool and allows community members to run peer nodes. These nodes take turns acting as the Sequencer, which solves the Sequencer centralization problem of OP Rollups.
  • After the current Sequencer produces a block, it synchronizes the new block to the other peer nodes for auditing, so that no single node can act maliciously unchecked. The Sequencer is changed at regular intervals to decentralize the bookkeeping right.
  • On ordinary public chains, every block cycle includes a step that randomly selects the block producer, which costs a lot of time. The Sequencer Pool has fewer peer nodes than a large public chain, and its block producer rotation period is relatively long. Within each period there is still only one Sequencer. In the future, Metis will gradually shorten the rotation period and introduce a new timestamp generation mechanism.
  • Metis supports community members in running peer nodes and provides token incentives for doing so. Peer node operators typically register as a DAC (institutional DAO). Operators need hardware with at least an 8-core CPU and 32 GB of memory, and must stake a certain amount of Metis tokens.
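The rotation described in the list above can be pictured with a small Python sketch. The article does not say how the next Sequencer is chosen, so the round-robin order, the `RotationSchedule` class, and the `blocks_per_term` parameter are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RotationSchedule:
    """Hypothetical round-robin schedule over the peers of a Sequencer Pool."""
    peers: tuple[str, ...]
    blocks_per_term: int  # assumed length of one rotation period, in blocks

    def sequencer_for(self, height: int) -> str:
        # Within one term a single peer produces every block;
        # the role moves to the next peer when the term ends.
        term = height // self.blocks_per_term
        return self.peers[term % len(self.peers)]


schedule = RotationSchedule(peers=("node-a", "node-b", "node-c"), blocks_per_term=100)
print(schedule.sequencer_for(0))    # node-a
print(schedule.sequencer_for(99))   # node-a
print(schedule.sequencer_for(100))  # node-b
print(schedule.sequencer_for(300))  # node-a
```

Shortening `blocks_per_term` in this toy model corresponds to the shorter rotation period that Metis says it plans to introduce.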

In essence, the Sequencer Pool, originally a subnet of the Metis network, has become a "committee" composed of peer nodes. Its members either act as the Sequencer or supervise it, in a form similar to a PoS public chain.

Under the scheme Metis is implementing, the Sequencer Pool has gone into operation with more than a dozen peer nodes. At this network scale, communication overhead between peer nodes is low, and consensus on new blocks can be reached almost immediately. At the same time, the peer nodes share the load of serving external requests, so users do not have to accept data provided unilaterally by a single node.

Metis now has two security layers: the peer node network and the Verifier nodes. The peer nodes verify the Sequencer's local data on Layer 2 in real time, while the Verifiers are mainly responsible for verifying the data the Sequencer submits to Layer 1.

In the future, Metis plans to greatly expand the number of peer nodes in the Sequencer Pool to make it more secure, and to bring Verifier nodes into the Sequencer Pool so that every peer node can serve as both Sequencer and Verifier. Metis also plans to introduce a new algorithm and timestamp generation mechanism that, while maintaining high efficiency, changes the Sequencer every few blocks to ensure decentralization.

New storage structure - "Don't add entities if you don't have to"

In most public chains and Layer 2 networks, the database that records account information uses a tree structure called the state tree, and the hash of the tree's root is called the StateRoot. Executing a transaction instruction inevitably changes the state of some accounts, so the root hash of the state tree changes as well. In other words, each executed transaction produces a new StateRoot, and in chronological order transactions and StateRoots correspond one to one.

If you list each transaction instruction alongside its corresponding StateRoot in chronological order, you get an accurate ledger. In traditional OP Rollups such as Optimism, this is what the Sequencer stores on Ethereum.

Verifiers read this ledger and check its accuracy. Generally speaking, a Verifier node executes the transaction instructions in chronological order and computes its own sequence of StateRoots. It then only needs to compare the StateRoots it computed with those submitted by the Sequencer. This is like a teacher who has no answer key and works through each problem mentally to grade students' math homework.

If a Verifier finds a problem with a transaction instruction or with the corresponding StateRoot submitted by the Sequencer, it initiates a "challenge" and provides a "fraud proof."
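The re-execution check can be sketched in a few lines of Python. This is a toy model under stated assumptions: the state is a plain balance table, a transaction is a `(sender, recipient, amount)` tuple, and the StateRoot is a SHA-256 hash of the serialized balances rather than the root of a real state tree. All function names are hypothetical.

```python
import hashlib
import json

Tx = tuple[str, str, int]  # (sender, recipient, amount): assumed toy format


def state_root(balances: dict[str, int]) -> str:
    """Toy StateRoot: hash of the sorted balance table (not a real state tree)."""
    encoded = json.dumps(balances, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def apply_tx(balances: dict[str, int], tx: Tx) -> dict[str, int]:
    sender, recipient, amount = tx
    if amount < 0 or balances.get(sender, 0) < amount:
        raise ValueError("invalid transaction")
    new = dict(balances)
    new[sender] = new.get(sender, 0) - amount
    new[recipient] = new.get(recipient, 0) + amount
    return new


def execute_all(genesis: dict[str, int], txs: list[Tx]) -> list[str]:
    """What an honest Sequencer would publish: one StateRoot per transaction."""
    roots, state = [], dict(genesis)
    for tx in txs:
        state = apply_tx(state, tx)
        roots.append(state_root(state))
    return roots


def first_mismatch(genesis: dict[str, int], txs: list[Tx], claimed: list[str]):
    """Verifier side: re-execute and return the index to challenge, or None."""
    state = dict(genesis)
    for i, (tx, root) in enumerate(zip(txs, claimed)):
        try:
            state = apply_tx(state, tx)
        except ValueError:
            return i
        if state_root(state) != root:
            return i
    return None


genesis = {"alice": 10, "bob": 0}
txs = [("alice", "bob", 3), ("bob", "alice", 1)]
honest = execute_all(genesis, txs)
forged = [honest[0], "00" * 32]
print(first_mismatch(genesis, txs, honest))  # None
print(first_mismatch(genesis, txs, forged))  # 1
```

The index returned for the forged ledger is the point at which a Verifier would open its challenge.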

In Optimism and in older versions of Metis, the Sequencer publishes transaction instructions and the corresponding StateRoots to Ethereum, essentially using Ethereum as a storage layer and relying on the Ethereum network to handle the "challenge" process. This ensures data availability, but the gas consumption is very high.

Take one batch of transactions that Optimism published to Ethereum as an example. The batch contains 204 transaction instructions and cost more than US$211 in gas, so storing a single transaction instruction costs more than US$1. Adding the gas required to store the corresponding StateRoots for this batch, Optimism's storage fee for a single transaction can reach US$1.50, which is still too high for most users.
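The per-transaction figure follows from simple division, reproduced here as a short Python check. The StateRoot share is not stated in the article; it is only a back-of-the-envelope difference between the two quoted figures, and the variable names are illustrative.

```python
batch_gas_fee_usd = 211.0  # lower bound quoted for the batch
tx_count = 204             # transactions in the batch

tx_data_cost = batch_gas_fee_usd / tx_count
total_per_tx_usd = 1.50    # quoted per-transaction figure including StateRoots

# Implied StateRoot share: derived here, not a number given in the article.
implied_state_root_cost = total_per_tx_usd - tx_data_cost

print(f"{tx_data_cost:.2f}")             # 1.03
print(f"{implied_state_root_cost:.2f}")  # 0.47
```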

In response, Metis has recently made significant adjustments. It no longer stores transaction instructions directly on Ethereum; instead it moves transaction batches to Memolabs, a platform similar to Filecoin but with lower storage costs and faster data retrieval. With the Memolabs storage layer integrated, the Sequencer first stores a batch of transaction instructions in Memolabs and then publishes the storage index for that batch on Ethereum. A Verifier node can use the index to read the original transaction data from Memolabs.

At the same time, because StateRoots are more critical than transaction data, they are still stored on Ethereum.
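The split between off-chain transaction data and on-chain index plus StateRoots can be sketched as follows. This is a minimal Python model, not the Memolabs API: the `OffChainStore` class is an in-memory stand-in, and using a SHA-256 content hash as the storage index is an assumption, since the article does not describe the index format.

```python
import hashlib
import json
from typing import Optional


class OffChainStore:
    """In-memory stand-in for an off-chain storage layer (hypothetical)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, blob: bytes) -> str:
        # Assumption: the storage index is the content hash of the blob.
        index = hashlib.sha256(blob).hexdigest()
        self._blobs[index] = blob
        return index

    def get(self, index: str) -> Optional[bytes]:
        return self._blobs.get(index)


def publish_batch(store: OffChainStore, txs: list, state_roots: list) -> dict:
    """Sequencer side: batch goes off-chain, only index and StateRoots go to Layer 1."""
    index = store.put(json.dumps(txs).encode())
    return {"index": index, "state_roots": list(state_roots)}


def fetch_batch(store: OffChainStore, l1_record: dict) -> Optional[list]:
    """Verifier side: resolve the index and check the data matches it."""
    blob = store.get(l1_record["index"])
    if blob is None or hashlib.sha256(blob).hexdigest() != l1_record["index"]:
        return None  # index does not resolve: data must be obtained another way
    return json.loads(blob)


store = OffChainStore()
batch = [["alice", "bob", 3], ["bob", "alice", 1]]
record = publish_batch(store, batch, ["root-1", "root-2"])
print(fetch_batch(store, record) == batch)                    # True
print(fetch_batch(store, {"index": "f" * 64, "state_roots": []}))  # None
```

In this model the Layer 1 record stays the same size no matter how large the batch is, which is where the gas saving comes from.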

To sum up, the philosophy of Metis is that content need not be stored on Ethereum if an equivalent guarantee can be obtained some other way. This saves storage costs and eases the cost pressure on users, in line with Occam's razor: "Do not multiply entities if you do not have to."

With this storage structure, Metis can significantly reduce storage fees, bringing the fee for a single Layer 2 transaction down to a few cents and giving Metis the lowest gas fees among mainstream Layer 2 networks.

However, Metis's approach raises another question: does changing the storage structure affect security or data availability? We analyze the possible outcomes below.

The security and data availability issues of Metis and other OP Rollups have two aspects. The first is:

When the Sequencer executes a transaction on Layer 2, it immediately finalizes it locally, giving it temporary "finality." Concretely, after a user initiates a transaction on the Metis network, the result arrives within seconds. The question is whether this temporary "finality," granted unilaterally by the Sequencer, is reliable.

Since Metis's Sequencer synchronizes each block to the peer nodes of the Sequencer Pool immediately after generating it, those nodes can audit the block content right away, and a Sequencer found to have submitted an invalid block can be removed from the Sequencer Pool. The security here is therefore comparable to that of ordinary public chains. At the same time, outside parties can choose among multiple peer nodes as information sources instead of trusting a single node, so data availability is not a problem.

The second is:

Will the verification process and challenge mechanism be affected after Metis transfers the transaction data to Memolabs? Will the nodes that newly join the Metis network encounter inconvenience when synchronizing historical data?

Several situations are possible here and can be discussed case by case. Since Metis still publishes StateRoots to Ethereum, the availability of StateRoots is not affected. The availability of transaction data matters to two groups: existing Verifier nodes and nodes newly joining the Metis network.

New nodes only need to synchronize historical data from other Verifiers or peer nodes, and they can also read transaction data from Memolabs and StateRoot records from Ethereum. At present, Metis has more than 80 privately run Verifier nodes, which already provide strong data availability. Since the number of Verifiers is still growing, new nodes should not face many problems when synchronizing historical data.

For existing Verifier nodes, the question is whether they can obtain the transaction data and check the corresponding StateRoots. If a Verifier finds that the content submitted by the Sequencer is incorrect, can it successfully carry out a "challenge" on Ethereum?

For this problem, the following scenarios can be analyzed separately:

  1. The Sequencer provides a valid Memolabs index on Ethereum, so the Verifier can read the transaction data without difficulty. Once the Verifier has checked that the transaction instructions themselves are valid (digital signatures and so on), the remaining thing to check is the StateRoots stored on Layer 1.
  • If, after auditing, every transaction matches its corresponding StateRoot, the Verifier has completed data synchronization and there is no need to initiate a "challenge." Nothing is wrong in this case.
  • If the Verifier finds that a particular transaction instruction and its StateRoot do not match, the StateRoot must be wrong. The Verifier can ask the Sequencer to disclose on Layer 1 the transaction data corresponding to the erroneous StateRoot.
  • If the Sequencer agrees, the "challenge" proceeds smoothly and the Sequencer is penalized.
  • If the Sequencer does not agree, the Verifier can write the transaction data it read from Memolabs to Ethereum to complete the "challenge," and the Sequencer is likewise penalized.

In this scenario, data availability and the "challenge" mechanism are clearly not affected.

2. The Sequencer stores forged transaction instructions in Memolabs (for example, with invalid digital signatures). The Verifier will initiate a "challenge"; in addition, it must obtain the correct, original Layer 2 transaction instructions in order to verify the StateRoots.

At this point, the Verifier can ask the Sequencer to publish the relevant transaction batch on Ethereum, which costs the Sequencer a lot in gas and amounts to a penalty in disguise. If the Sequencer refuses, the Verifier can disclose the incorrect data it read from Memolabs on Layer 1 and start a "challenge," in which case the Sequencer is punished more severely.

Under normal circumstances, the loss a Sequencer suffers from a successful challenge is much higher than the gas fee for publishing the transaction batch on Layer 1. Therefore, when a Verifier requires the Sequencer to publish transaction data on Layer 1, the Sequencer's rational choice is to disclose the correct transaction data.

The Sequencer must then release the entire transaction batch requested by the Verifier, which contains hundreds or thousands of transactions. The gas consumed to publish it on Layer 1 is very high, possibly hundreds of dollars, which again amounts to a penalty in disguise.

As seen from the above discussion, data availability and the "challenge" process are not affected.

3. The Sequencer publishes a fake Memolabs storage index on Layer 1, so the Verifier cannot read the data in the transaction batch. In this case, the Verifier can ask the Sequencer to disclose the transaction batch on Layer 1 as described above. If the Sequencer refuses, the Verifier can obtain the corresponding data from a peer node and then continue its verification work or initiate a challenge.
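The three scenarios above amount to a short decision procedure on the Verifier side, sketched here in Python. The `Action` names and the three boolean inputs are a simplification introduced for illustration; they are not part of any Metis interface.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept batch, no challenge"
    CHALLENGE_STATE_ROOT = "request disclosure on Layer 1, then challenge the StateRoot"
    CHALLENGE_FORGED_TX = "request the batch on Layer 1, or publish the forged data and challenge"
    REQUEST_BATCH = "request the batch on Layer 1, falling back to peer nodes"


def verifier_action(index_resolves: bool, signatures_valid: bool, roots_match: bool) -> Action:
    """Map what the Verifier observes to the response described in the text."""
    if not index_resolves:        # scenario 3: fake storage index
        return Action.REQUEST_BATCH
    if not signatures_valid:      # scenario 2: forged transaction instructions
        return Action.CHALLENGE_FORGED_TX
    if not roots_match:           # scenario 1, mismatch branch: wrong StateRoot
        return Action.CHALLENGE_STATE_ROOT
    return Action.ACCEPT          # scenario 1, everything matches


print(verifier_action(True, True, True).name)    # ACCEPT
print(verifier_action(True, True, False).name)   # CHALLENGE_STATE_ROOT
print(verifier_action(True, False, True).name)   # CHALLENGE_FORGED_TX
print(verifier_action(False, True, True).name)   # REQUEST_BATCH
```

The checks are ordered so that the Verifier only compares StateRoots once it holds transaction data it can trust.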

Through this mechanism design, Metis protects the rights and interests of Verifier nodes. However, to prevent Verifiers from abusing their power by maliciously requiring the Sequencer to write transaction data to Layer 1, thereby attacking honest Sequencer operators through gas consumption, Metis imposes the following requirements:

  • Before a Verifier can require the Sequencer to write transaction data to Layer 1, it must stake a certain amount of funds to obtain whitelist status, and each such request to the Sequencer costs a handling fee. The fee is calibrated to deter Verifiers from frequently sending unreasonable requests to the Sequencer.
  • Any node can initiate "challenges" and "fraud proofs." In theory, these nodes can cooperate to ensure data availability and security.
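The stake-and-fee requirement can be modeled as a simple gate in front of the Sequencer, sketched below in Python. The `RequestGate` class, the numeric values, and the choice to deduct the fee from the staked deposit are all assumptions; the article only states that a stake is needed for whitelisting and that each request costs a fee.

```python
class RequestGate:
    """Hypothetical gate that rate-limits Layer 1 publication requests by cost."""

    def __init__(self, min_stake: int, fee: int) -> None:
        self.min_stake = min_stake
        self.fee = fee
        self.deposits: dict[str, int] = {}  # whitelisted Verifier -> remaining deposit

    def register(self, verifier: str, stake: int) -> bool:
        """Whitelist a Verifier only if it stakes at least the minimum."""
        if stake < self.min_stake:
            return False
        self.deposits[verifier] = stake
        return True

    def request_l1_publication(self, verifier: str) -> bool:
        """Charge the handling fee; refuse Verifiers that are not whitelisted
        or can no longer pay."""
        balance = self.deposits.get(verifier)
        if balance is None or balance < self.fee:
            return False
        self.deposits[verifier] = balance - self.fee
        return True


gate = RequestGate(min_stake=100, fee=40)
print(gate.register("verifier-1", 100))             # True
print(gate.request_l1_publication("verifier-1"))    # True
print(gate.request_l1_publication("verifier-1"))    # True
print(gate.request_l1_publication("verifier-1"))    # False
print(gate.request_l1_publication("verifier-2"))    # False
```

Because every request has a price, a Verifier that floods an honest Sequencer with requests runs out of deposit after a bounded number of attempts.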

In conclusion

Based on the core arguments above, combined with recent official developments at Metis, we conclude the following:

The fundamental problem of OP Rollups such as Optimism and Arbitrum is the centralization of Sequencer nodes, which requires a reliable solution; Metis tries to be the first to realize the decentralization of Sequencer.

  • Metis opens the Peer Node network, hands the right to run block-producing Sequencer nodes to community members and other institutions, implements a rotation system, and promotes rapid synchronization of information between the Sequencer and the other peer nodes to keep the Sequencer from acting maliciously.
  • Metis changed its storage layer structure and the form in which it backs up data on Ethereum. By integrating Memolabs, Metis has significantly reduced storage costs and now has the lowest gas fees among mainstream Layer 2 networks.
  • Through careful mechanism design, the new version of Metis retains robust security and data availability after integrating Memolabs. The Metis team has analyzed the possible scenarios and formulated corresponding measures.
  • To further reduce the power of the Sequencer, Metis plans to add a Proposer role in the future. After the Sequencer publishes transaction data, Proposers will be responsible for submitting the StateRoot corresponding to each transaction to Ethereum, which creates stronger decentralized checks and balances.
  • Metis lets node operators register as a DAC (institutional DAO) and provides them with ongoing token income. Here Metis, which has issued a token, has an advantage over Optimism and Arbitrum, which have not.
  • Metis provides a simple one-stop DAO hosting service, lowers the operational difficulty for DAO and DAC organizations, opens up Community Ecosystem Governance (CEG), and transfers the authority to maintain the Layer 2 network ecosystem to community members. The Metis ecosystem has 500 DAC organizations with nearly 5,000 members.
  • With the Memolabs decentralized storage layer integrated, DAO organizations within the ecosystem can move data that does not need to be public to Memolabs. The corresponding storage index can only be accessed by permitted users, which preserves the DAO's privacy.
  • Metis will support a "Fragment" structure of multiple subnets in the future, allowing different DAO organizations to run MVM virtual machines with independent state and achieving a multi-chain sharded design similar to ETH2.0.
  • Metis has recently opened its NFT bridging function; combined with ultra-low gas fees, Metis is committed to building the best platform for NFT users.
  • In the future, once the system's fault tolerance is strong enough, Metis will shorten the challenge period as appropriate and aim to offer the most convenient cross-chain experience among Layer 2 networks.
