How to create end-to-end tests for web3 applications. - Jose Aguinaga

Abstract

Unlike standard web applications, which can easily mimic a user workflow for end-to-end testing, web3 applications struggle to recreate this process. This is mostly due to two factors unique to their nature. First, web3 applications operate on an ever-changing backend, usually generalized as a blockchain, powered by one or more smart contracts. Second, to interact with any web3 app, users have to sign transactions using some wallet-like third-party software and submit them via a provider.

The following guide showcases the need for end-to-end testing, how DApp developers can circumvent these limitations by leveraging a series of strategies, and examples of how to achieve them. Although the focus and examples will be within the context of the Ethereum blockchain, these tips can be replicated across any blockchain that provides a consumable RPC-endpoint against a local blockchain node.

If you have worked with smart contracts before, you can skip the first two sections. For details on the actual how-to, skip the first three sections.

The architecture of a web3 application

Within the blockchain industry, smart contracts are widely regarded as one of the biggest game-changers to date. Smart contracts are the main endpoints for most blockchain business logic nowadays. Given a deployed smart contract, anyone can submit a computable request to it in order to transform the stateful information it contains. Depending on the rules of the smart contract and the nature of the request, its state, which usually involves digital assets, changes for multiple stakeholders on the blockchain where the smart contract lives.

Although the first smart contracts were simple in nature (e.g. crowdsale smart contracts, storage), the industry grew and the protocols built from these basic lego blocks quickly became more complex, making interaction with these single computational units harder. Soon enough, developers had to come up with libraries and tools to build User Interfaces (UIs) that would facilitate this exchange.

Nowadays, modern Decentralized Applications (or simply “DApps”), independent of the blockchain they use, are made up of two components: the smart contracts, which provide the business logic of the product or protocol itself, and the user interface used to interact with them. Within the usual understanding of modern software applications, it would not be inaccurate to say that smart contracts are the backend of a DApp and the user interface is its front-end.

Although DApps can take many shapes and forms, it would be fair to say that most live as a website, and as a result, their main interaction runtime is a browser. However, savvy blockchain users can usually interact with the smart contracts directly, without having to rely on a graphical user interface.

Challenges around a web3 application test-suite

As with almost any software, unit tests can be created against both the smart contracts of a web3 application and its user interface. Multiple frameworks exist to achieve this, depending on the nature of your project. For instance, within the Ethereum blockchain ecosystem, you can test smart contracts with Foundry, Dapptools, or Hardhat, which usually also provide an Ethereum Virtual Machine (EVM) runtime for you to interact with the smart contracts locally. In a similar fashion, Jest and Mocha are commonly used to test web applications. These tests verify the expected behavior of each component’s basic logic blocks.

Integration tests, which assert against multiple flows of an application via stateful changes, are also usually not a problem. Smart contract libraries can mimic their own specific changes by mocking the underlying EVM operations that require it. This would be the equivalent of faking a database migration for a backend, for instance. In a similar fashion, modern UI frameworks are able to mimic the connectivity and response of a successful or failed smart contract interaction, ensuring the user sees what is expected from the website given specific actions. Again, this would be the equivalent of faking the response of an API the front-end is expected to talk to.

The problem arises at the end-to-end level. In short, because the actual smart contract and UI interactions are usually done via a third-party provider that submits the transactions (e.g. MetaMask or TallyCash), faking these UI interactions is extremely cumbersome. Without going to the extensive lengths of putting together the same mechanisms used during a normal web3 transaction exchange, end-to-end testing is hard to do. As a result, one could argue such tests are often unnecessary, mostly due to their brittleness, complexity, and maintenance costs.

The test pyramid, a concept introduced in the early 2000s and popularized by Mike Cohn in his book “Succeeding with Agile”, highlights the rationale around unit vs. UI tests. Faster and “cheaper” tests sit at the bottom of the pyramid, whereas slower and more “expensive” tests sit at the top.

The case for end-to-end testing in DApps

Given the maintenance costs and the required complexity, are end-to-end tests worth the hassle? In some cases, the answer is yes, and web3 applications might be one of those cases. Some bugs would otherwise be missed and could involve the loss of financial assets, a risk most web2 apps usually don’t face. Because DApps’ smart contracts usually cannot roll back to a previous state, DApps are prime candidates for tests that catch bugs all the way up to the interface level.

Let’s see a few examples. Last year, Tally was hit by the following bug: given the option to vote on a particular referendum, their UI prompted users to sign a transaction that would signal their choice as “against” on the topic in question. However, when the actual transaction was offered by the UI, an “in favor” request was shown instead, which allowed at least 5 votes to be cast wrongly before the team quickly patched the bug. Had this not been the case, the outcome could have resulted in unexpected management of the underlying referendum assets. You can read more about the topic in their own post-mortem.

https://twitter.com/voteWithTally/status/1431387008765763585?s=20&t=qXbPMqeR-7sm-BRVTrQ3Fw

In a similar fashion, later the same year a problem was found in another DApp, this time caused by a supply chain attack. It allowed the malicious UI developer involved to redirect to themselves digital assets that were meant to be used within the platform. Although it was not a bug in the software itself, it had the same consequence: the expected state in the smart contract was not achieved by using its “official” UI. Despite the code being “correct”, the actions the user took with this UI had unexpected results.

Both bugs could have been caught by having a set of tests that would mimic user actions via its UI and the expected third-party provider*. The test would have created a new user within the system, executed the expected action against the UI via its RPC-connected provider, and queried the smart contract state via the same UI after the transaction against its local blockchain was completed.
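
The core assertion behind such a test can be sketched in a few lines. The snippet below is a simplified, hypothetical model: the `VoteChoice` type, the `TxRequest` shape, the `castVote` method name, and the numeric vote encoding are assumptions made for illustration, not Tally's actual code. It compares what the user selected in the UI with what the provider is asked to sign.

```typescript
// Hypothetical model of a governance vote; names and encoding are illustrative only.
type VoteChoice = "against" | "for";

interface TxRequest {
  method: string;  // contract method the UI asks the wallet to sign
  support: number; // assumed encoding: 0 = against, 1 = for
}

// Assumed encoding used by this sketch.
const SUPPORT_CODE: Record<VoteChoice, number> = { against: 0, for: 1 };

// Returns true only when the transaction the wallet is asked to sign
// encodes the same choice the user made in the UI.
function requestMatchesIntent(choice: VoteChoice, request: TxRequest): boolean {
  return request.method === "castVote" && request.support === SUPPORT_CODE[choice];
}

// A correct UI: the user picks "against" and the request encodes "against".
const goodRequest: TxRequest = { method: "castVote", support: 0 };
// A buggy UI: the user picks "against" but the request encodes "for".
const buggyRequest: TxRequest = { method: "castVote", support: 1 };

const goodOk = requestMatchesIntent("against", goodRequest);
const buggyOk = requestMatchesIntent("against", buggyRequest);
```

In a real end-to-end test, the request would be captured from the stubbed provider while driving the UI rather than built by hand, and the resulting contract state would be checked through the UI as well.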

These bugs are challenging to catch because typical integration tests don’t migrate the application state against a running local instance; they mock it instead. As if this weren’t enough, production systems undergo constant state changes that such mocks don’t account for. Since anyone can call a deployed smart contract, its state is always evolving.

Creating end-to-end tests for blockchain apps

To create full-stack tests for DApps we should follow these important strategies:

Mimic the user.
Mimic the stack.
Mimic the workflow.

Mimic the user

This should be a no-brainer for most web2 applications. Back in the day, you could simply spin up a Selenium runner to force a client to sign up as a new user and start the entire workflow from scratch. However, for web3 applications, users are abstractions of public keys, derived from a private key usually managed by third-party software. “Creating” a new user usually means “faking” a specific provider holding this private key, which, in turn, adds extra complexity to the test.

What we suggest is going a bit deeper and faking the account itself. Instead of forcing a web application to interact with a mocked provider, we feed the system a fake user that can execute the same transactions as a “live” provider. Since for most blockchain applications these providers are isolated key managers exposing an API layer against an RPC-endpoint that connects them to the blockchain, we can simply fake the key manager, and stub the API layer.

Within the Ethereum world, we can do this by using a Burner Provider. Originally introduced by Austin Griffith, a burner provider is an Ethereum wallet powered by a private key stored in the browser storage. With the provider stored locally, one can quickly sign transactions as a “live” user, which, although far from ideal for holding real digital assets, is great for local development and, yes, end-to-end testing.
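
To make the idea concrete, here is a minimal sketch of the “load or create” logic at the heart of a burner wallet. It is not Austin Griffith's implementation: the `KeyStore` interface, the `BURNER_KEY` storage name, and the injected `createKey` function are assumptions made for this example. In a browser, `localStorage` would satisfy `KeyStore`, and with ethers v5 the generator could be `() => Wallet.createRandom().privateKey`.

```typescript
// Minimal storage contract; browser localStorage satisfies it.
interface KeyStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical storage key name, chosen for this sketch.
const BURNER_KEY = "burner-private-key";

// Returns the stored private key, or creates and persists a new one.
// `createKey` is injected so the sketch does not depend on a wallet library.
function loadOrCreateBurnerKey(store: KeyStore, createKey: () => string): string {
  const existing = store.getItem(BURNER_KEY);
  if (existing !== null) return existing;
  const fresh = createKey();
  store.setItem(BURNER_KEY, fresh);
  return fresh;
}

// In-memory stand-in for localStorage so the sketch runs outside a browser.
function memoryStore(): KeyStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => { data.set(key, value); },
  };
}

// Demo with a fake, fixed "key" (never use a fixed key outside of a sketch).
let createCalls = 0;
const fakeCreateKey = (): string => {
  createCalls += 1;
  return "0x" + "11".repeat(32);
};

const burnerStore = memoryStore();
const firstKey = loadOrCreateBurnerKey(burnerStore, fakeCreateKey);
const secondKey = loadOrCreateBurnerKey(burnerStore, fakeCreateKey); // reuses the stored key
```

Because the key survives between calls, the same “user” is available on every page load, which is what lets a test runner act as a consistent account without a wallet extension.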

Here’s an example of its implementation in Poster, an on-chain social media engine:

const { account, fallback } = useEthersWithFallback()
const current = useFallbackAccount ? fallback.account : account

Here, we are telling the application to pick between two “users”: the first is the account, usually obtained from a normal web3 provider, whereas the second (fallback) is our burner provider, which we set up as follows (code adapted from eth-hooks):

const useEthersWithFallback = (): Web3Ethers & { fallback: TBurnerSigner } => {
  const result = useEthers()
  const provider = result.library || new JsonRpcProvider(POSTER_DEFAULT_NETWORK)
  const fallback = useBurnerSigner(provider);
  return { ...result, fallback }
}

Given useEthers (called etherContext by other libraries), we can obtain the underlying connected web3 provider and supply a fallback if it is missing. This also works as a way to offer your application in a “read-only” or “private” mode, which can be seen in production in Poster.

Most Ethereum DApps rely on either web3.js (1.7.x at the time of writing) or ethers.js (5.x at the time of writing) to handle connectivity to a third-party provider (MetaMask currently being the most common). Both can take a burner provider like the one from Austin or Hedgehog, the one used by Audius.org - https://hedgehog.audius.org/

Mimic the stack

With our application able to behave as a web3 user on demand, the next step is to bootstrap all the components needed for the application to run. This is, undoubtedly, the main reason why end-to-end tests are expensive: to fully replicate a user’s behavior, the entire toolchain needs to be in place. In production, these components are available to any user, ready to be consumed. To properly test this during our continuous integration, we need to recreate them in their entirety.

Depending on the blockchain in question, this can be a daunting task. As mentioned before, within Ethereum there are extensive tools to make this happen, thanks to the vibrant community around the project. For our example, we’ll be using Hardhat, which is able to spin up an EVM-compatible node, exposing a series of RPC calls** that can help us fake even an existing user. Additionally, we’ll spin up a subgraph from The Graph, used by our demo application to expose a GraphQL server for us to query.
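
As an illustration of those RPC calls, the sketch below builds the JSON-RPC requests that ask a local Hardhat node to impersonate an account (`hardhat_impersonateAccount`) and fund it (`hardhat_setBalance`). The helper names, the placeholder address, and the injected `Transport` function are assumptions made for this example; they are not part of the Poster code base.

```typescript
// JSON-RPC 2.0 request envelope.
interface RpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: unknown[];
}

let nextRpcId = 1;
function rpc(method: string, params: unknown[]): RpcRequest {
  return { jsonrpc: "2.0", id: nextRpcId++, method, params };
}

// Requests that let a local Hardhat node act on behalf of `address`
// and give it funds to pay for gas. `balanceHex` is a hex quantity in wei.
function impersonationRequests(address: string, balanceHex: string): RpcRequest[] {
  return [
    rpc("hardhat_impersonateAccount", [address]),
    rpc("hardhat_setBalance", [address, balanceHex]),
  ];
}

// The transport is injected so the sketch stays runtime-agnostic; in practice it
// could wrap an HTTP POST (for example with fetch) to the node URL.
type Transport = (url: string, body: string) => Promise<string>;

async function impersonate(
  send: Transport,
  url: string,
  address: string,
  balanceHex: string
): Promise<void> {
  for (const request of impersonationRequests(address, balanceHex)) {
    const response = JSON.parse(await send(url, JSON.stringify(request)));
    if (response.error) {
      throw new Error(`${request.method} failed: ${response.error.message}`);
    }
  }
}

// Placeholder address and 1 ETH (10^18 wei) expressed as a hex quantity.
const PLACEHOLDER_ADDRESS = "0x000000000000000000000000000000000000dead";
const ONE_ETH_HEX = "0xde0b6b3a7640000";
const impersonationPayloads = impersonationRequests(PLACEHOLDER_ADDRESS, ONE_ETH_HEX);
```

Against the node started in the workflow, `impersonate` would be called with `http://localhost:8545` as the URL. These methods only exist on development nodes such as Hardhat's, so they are useful for tests and never for production code.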

Here’s what starting the full stack of a modern web3 application looks like, given that all dependencies are installed and the code is cloned. Even though it is formatted as a GitHub Actions workflow (see the full code here), it can be easily ported.

##### Hardhat node #####
- name: Run hardhat node
  run: yarn run node --hostname 0.0.0.0 &
  working-directory: contract

- uses: ifaxity/wait-on-action@v1
  with:
    resource: http://localhost:8545

- name: Deploy smart contracts
  run: yarn deploy --network localhost
  working-directory: contract

##### The Graph Indexer #####
- name: Run indexer (graph + ipfs)
  run: docker-compose up -d
  working-directory: subgraph

- uses: ifaxity/wait-on-action@v1
  with:
    resource: 'tcp:localhost:8020'
    verbose: true

- name: Configure subgraph for local node
  run: yarn define
  env:
    NETWORK: localhost
  working-directory: subgraph

- name: Build subgraph for local node
  run: yarn codegen && yarn build
  working-directory: subgraph

- name: Create & deploy subgraph for local node
  run: yarn create-local && yarn deploy-local -l v0.0.1
  working-directory: subgraph

##### Poster App #####
- name: Run Poster app
  run: yarn dev:local &

Mimic the workflow

The last step is to mimic the workflow of your application. Here you are expected to use the UI you have put together, and not interact with your smart contracts in any other way. A wrong reflection of your smart contracts’ state in your UI should be caught by your integration tests. If you bypass your own UI, you render the end-to-end test useless, as the smart contract-UI connection is the main aspect we are testing.

There are several end-to-end test runners, such as Cypress, but I’ve enjoyed using Playwright. To mimic your user’s workflow, a Playwright test would look something like this (the entire code is available here).

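import { test, expect } from '@playwright/test';

test('app can create a post and display it', async ({ page }) => {
  // Allow up to one minute for the whole test.
  test.setTimeout(60000);
  // A timestamp makes the post content unique for each run.
  const post = `This is a very unique post: ${Date.now()}`;
  await page.goto('http://localhost:3000/');
  // Connect through the burner ("Private") wallet instead of a real provider.
  await page.locator('[aria-label="Connect Wallet"]').click();
  await page.locator('[aria-label="Private (Demo)"]').click();
  await page.fill('[aria-label="Post content"]', post);
  await page.locator('[aria-label="Submit Post"]').click();
  // The newest post shown in the UI should be the one just submitted.
  await expect(page.locator('[aria-label="Post"]').first())
    .toContainText(post);
});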

Here, we can see our basic workflow: a user connects to a Private wallet (our burner wallet), fills in the content of the post, then submits it. The test will succeed if the post is shown as the first post in the application, and fail if it doesn’t show up before the test times out (one minute at most).
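
One detail worth spelling out is where the timeouts come from. `test.setTimeout` bounds the whole test, while Playwright's `expect` assertions retry for their own, separate timeout (5 seconds by default). Since a post presumably only shows up once the transaction is mined and indexed, a project may want to raise that value. The following `playwright.config.ts` is a minimal sketch, not the configuration Poster uses: the timeout values and the choice to let Playwright manage the dev server are assumptions.

```typescript
// playwright.config.ts (illustrative values, not taken from the Poster repository)
import type { PlaywrightTestConfig } from '@playwright/test';

const config: PlaywrightTestConfig = {
  // Upper bound for each test, matching the one-minute budget of the test.
  timeout: 60000,
  expect: {
    // How long assertions such as toContainText keep retrying (assumed value).
    timeout: 30000,
  },
  use: {
    // Lets tests call page.goto('/') instead of repeating the full URL.
    baseURL: 'http://localhost:3000',
  },
  webServer: {
    // Start the app unless something is already serving this URL.
    command: 'yarn dev:local',
    url: 'http://localhost:3000',
    reuseExistingServer: true,
  },
};

export default config;
```

With `reuseExistingServer` enabled, Playwright skips the `command` when the app is already running, so the same configuration should work both locally and in a pipeline that starts the app in an earlier step.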

And that’s it. You can see all three strategies implemented together in a DApp in the following PR:

https://github.com/onPoster/app/pull/97/files

Not a silver bullet

Although end-to-end tests are great and particularly important for web3 applications, they are not meant to be a silver bullet that protects your DApp from all bugs. If your supply chain gets entirely compromised, your tests can be modified to ignore the bug. There are still state and time-specific conditions that could break your application at either the UI or the smart contract level, and you won’t be able to catch them unless you prepare for each particular case. These tests are also, as shown, hard to create and maintain.
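
Preparing for a time-specific case usually means moving the local node's clock forward before driving the UI. As a rough sketch, a Hardhat node accepts the `evm_increaseTime` and `evm_mine` RPC methods for this. The helper name and the one-week example below are assumptions made for illustration.

```typescript
interface RpcCall {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: unknown[];
}

// Builds the two calls that move a local development node forward in time:
// first shift the clock, then mine a block so the new timestamp takes effect.
function fastForwardCalls(seconds: number): RpcCall[] {
  if (!Number.isInteger(seconds) || seconds <= 0) {
    throw new Error("seconds must be a positive integer");
  }
  return [
    { jsonrpc: "2.0", id: 1, method: "evm_increaseTime", params: [seconds] },
    { jsonrpc: "2.0", id: 2, method: "evm_mine", params: [] },
  ];
}

// Example: skip one week, e.g. to reach the end of a hypothetical voting period.
const ONE_WEEK_IN_SECONDS = 7 * 24 * 60 * 60;
const oneWeekForward = fastForwardCalls(ONE_WEEK_IN_SECONDS);
```

Each call would be sent as an HTTP POST to the local node's RPC endpoint before the test continues through the UI.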

That being said, the reality is that, as of today, users who engage with smart contracts via UIs have no guarantee that those UIs will do what is expected, so end-to-end tests are the closest thing we have to work around those limitations. I personally asked a few skilled technical individuals online, and had this sentiment confirmed by one of Tally’s co-founders, Dennison Bertram.

https://twitter.com/DennisonBertram/status/1444631918138568704?s=20&t=hc-zjeU1C0CBQnULwU0ZFQ

You’ll find multiple opinions on the topic, and it’s up to you, within your organization and team, to decide how much time and how many resources you can allocate to end-to-end testing. Depending on the amount and value of the digital assets your application handles, it is probably worth investing some additional cycles to verify its entire stack. Guillermo Rauch, Vercel’s founder, said it best.

https://twitter.com/rauchg/status/807626710350839808?s=20&t=hc-zjeU1C0CBQnULwU0ZFQ

Hopefully, the tooling needed for creating these sorts of tests will only get better, reducing the costs, learning curve, and complexity. Given enough time, end-to-end testing across web3 applications will no longer be a matter of if, but a matter of when.

If you're interested in getting deeper into web3 development, some popular free resources include Ethereum's NFT tutorial, Alchemy University, or Road to Web3.

* Tally’s team described in their post-mortem that they have introduced exactly this set of tests as part of their workflow via a Cypress extension.

** For this demo, we used mostly the burner provider to interact with our application, but a more thorough test would use a local mainnet fork and impersonate actual users.