Will McTighe

Posted on Jan 29, 2022 on Mirror.xyz

Tutela — Tornado Cash Pool Anonymity Set Auditor

In late October, in response to the Tornado Cash (TC) Anonymity Research Tools Grant, we began building Tutela, an anonymity detection tool, to help Ethereum users to check:

1.) if their Ethereum addresses can be linked; and

2.) if their Tornado Cash transactions are compromised.

Since our last update 8 weeks ago, we’ve received incredibly helpful user feedback and have added functionality like ENS searching and showing Tornado Cash interactions for all input addresses! If you have feedback, we’d love to hear about it in the Tutela Discord Channel.

Our latest tool is the Tornado Cash Pool Anonymity Set Auditor. It shows you a more accurate anonymity set for each Tornado Cash pool than the headline numbers. Before diving in, here is a quick recap of blockchain privacy, Tornado Cash and TC pool anonymity sets.

Blockchain Privacy

Currently, there is very little transaction privacy for Ethereum users. On Etherscan, we can clearly see the value of transactions, their contents and the sender/receiver. For greater adoption this has to change — it is not acceptable in Web2 for others to be able to see our salaries, online purchase history or charitable donations! Similarly, businesses don’t want you to know who their suppliers are and how much they are being paid.

Tornado Cash

Tornado Cash is a mixer protocol that helps Ethereum users have some transaction privacy by breaking the connection between two addresses. It is called a mixer because you mix your funds with those of others.

Federico, Founder of LambdaClass and a collaborator on Tutela, provides an excellent summary of how Tornado Cash works here. I will provide a brief recap:

1. In all Tornado Cash pools apart from Nova, you deposit a fixed amount of ETH (e.g., 1 ETH, 10 ETH, 100 ETH). With that deposit, you receive a note, which you can later use to withdraw your deposit to any address.

2. Your anonymity is defined by the number of equal user deposits in a given pool. This is the Anonymity Set. In the example above, D’s withdrawal could have come from A, B or C, so the anonymity set is 3 and the probability of correctly guessing the deposit / withdrawal connection is 1/3*.

3. The more people that deposit in the pool, the greater the number of people that a withdrawal could have come from. If you add a 4th deposit above, the probability of being correctly detected decreases to 1/4.

4. However, there are lots of ways users can compromise their privacy. If you can link A’s deposit to E’s withdrawal, then the pool’s anonymity set decreases from 3 to 2. This means the probability of correctly guessing your deposit / withdrawal connection increases to 1/2, because any other withdrawal could only have come from B’s or C’s deposit.

5. Federico’s team at LambdaClass and István have codified 5 ways that Tornado Cash users can misuse the protocol to link their deposits and withdrawals and reduce the anonymity sets of Tornado Cash pools. More on them below.

* For the nerdy details, this is simple combinatorics. The number of possible combinations in this example is 3C1 = 3. It assumes only 1 deposit and withdrawal per entity. Combinatorics is also why you should withdraw to multiple addresses — it meaningfully increases the set of possible deposit / withdrawal combinations and improves your privacy!
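The arithmetic in the recap above can be sketched in a few lines of Python. This is only an illustration under the same simplifying assumption as the footnote (one deposit and one withdrawal per entity); the function names are hypothetical and not part of Tutela.

```python
from fractions import Fraction
from math import comb


def guess_probability(anonymity_set: int) -> Fraction:
    """Chance of guessing the right deposit for one withdrawal,
    assuming one deposit per entity and a uniform guess."""
    if anonymity_set < 1:
        raise ValueError("anonymity set must contain at least one deposit")
    return Fraction(1, anonymity_set)


def remaining_anonymity_set(total_deposits: int, linked_deposits: int) -> int:
    """Deposits that are still indistinguishable once linked ones are removed."""
    return total_deposits - linked_deposits


print(comb(3, 1))                                        # 3 possible deposits for one withdrawal
print(guess_probability(3))                              # 1/3
print(guess_probability(4))                              # 1/4 after a 4th deposit
print(guess_probability(remaining_anonymity_set(3, 1)))  # 1/2 once A's deposit is linked
```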

Five Tornado Cash Reveals

Remaining anonymous using Tornado Cash requires you to not make obvious mistakes that could reveal yourself. Here are 5 that could reduce your anonymity:

  1. Address Match Reveal — Reuse of Deposit Address for Withdrawal

If a user deposits from an address and that address withdraws from the same pool, this deposit and withdrawal are assumed to be linked. There are cases when this may not be true but in general, these users are likely TORN yield farmers who do not care about privacy. We believe this heuristic is relatively deterministic.
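A minimal sketch of this heuristic, assuming each deposit and withdrawal is reduced to a hypothetical `Tx` record (transaction hash, address, pool). It is an illustration of the idea, not Tutela's actual implementation, and it ignores the ordering of deposits and withdrawals.

```python
from typing import NamedTuple


class Tx(NamedTuple):
    tx_hash: str
    address: str  # depositor, or withdrawal recipient
    pool: str     # e.g. "10 ETH"


def address_match_reveals(deposits, withdrawals):
    """Pair every withdrawal with deposits made by the same address in the same pool."""
    deposits_by_key = {}
    for d in deposits:
        deposits_by_key.setdefault((d.address, d.pool), []).append(d)

    links = []
    for w in withdrawals:
        for d in deposits_by_key.get((w.address, w.pool), []):
            links.append((d.tx_hash, w.tx_hash))
    return links


deposits = [Tx("0xd1", "0xA", "10 ETH"), Tx("0xd2", "0xB", "10 ETH")]
withdrawals = [Tx("0xw1", "0xA", "10 ETH"), Tx("0xw2", "0xC", "10 ETH")]
print(address_match_reveals(deposits, withdrawals))  # [('0xd1', '0xw1')]
```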

2. Unique Gas Price Reveal

This involves using the same unique gas price for a deposit and a withdrawal to different addresses. Many wallets like MetaMask have gas price recommendation systems. However, if the user manually sets the gas price, that price will remain the default that the wallet uses for other transactions, irrespective of the address used. Given this heuristic maps a unique deposit to a unique withdrawal, we believe it is deterministic.
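One way to sketch this check, assuming transactions are given as hypothetical `(tx_hash, address, gas_price_wei)` tuples: a link is only reported when a gas price appears exactly once among the deposits and exactly once among the withdrawals. This is an illustration, not the code Tutela runs.

```python
from collections import Counter


def unique_gas_price_reveals(deposits, withdrawals):
    """Link a deposit and a withdrawal that share a gas price used by no other
    deposit or withdrawal, when they involve different addresses."""
    deposit_counts = Counter(price for _, _, price in deposits)
    withdrawal_counts = Counter(price for _, _, price in withdrawals)

    # Only deposits whose gas price is unique among all deposits are candidates.
    deposit_by_price = {
        price: (tx_hash, address)
        for tx_hash, address, price in deposits
        if deposit_counts[price] == 1
    }

    links = []
    for tx_hash, address, price in withdrawals:
        if withdrawal_counts[price] == 1 and price in deposit_by_price:
            deposit_hash, deposit_address = deposit_by_price[price]
            if deposit_address != address:
                links.append((deposit_hash, tx_hash))
    return links


deposits = [("d1", "A", 111_000_000_007), ("d2", "B", 100_000_000_000),
            ("d3", "C", 100_000_000_000)]
withdrawals = [("w1", "X", 111_000_000_007), ("w2", "Y", 100_000_000_000)]
print(unique_gas_price_reveals(deposits, withdrawals))  # [('d1', 'w1')]
```

The common 100 gwei price is ignored because two deposits used it, so it cannot map one deposit to one withdrawal.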

3. Linked Address Reveal — Transactions outside Tornado Cash

This heuristic looks at all the interactions between deposit and withdrawal addresses outside of Tornado Cash. If two addresses have interacted more than 3 times, they are assumed to be owned by the same entity. This heuristic is probabilistic and can produce false positives, so we added the >3 interactions constraint to limit these. More interactions increase the likelihood that addresses are linked.
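The threshold rule can be sketched as follows, assuming the transactions outside Tornado Cash are available as hypothetical `(sender, receiver)` pairs. The function name and the `min_interactions` parameter are illustrative only.

```python
from collections import Counter


def linked_address_reveals(deposit_addrs, withdrawal_addrs, outside_txs,
                           min_interactions=3):
    """Return (deposit address, withdrawal address) pairs that transacted with
    each other, in either direction, more than `min_interactions` times."""
    counts = Counter()
    for sender, receiver in outside_txs:
        if sender in deposit_addrs and receiver in withdrawal_addrs:
            counts[(sender, receiver)] += 1
        elif sender in withdrawal_addrs and receiver in deposit_addrs:
            counts[(receiver, sender)] += 1
    return {pair for pair, n in counts.items() if n > min_interactions}


outside_txs = [("A", "X")] * 3 + [("X", "A")] + [("B", "Y")] * 2
print(linked_address_reveals({"A", "B"}, {"X", "Y"}, outside_txs))  # {('A', 'X')}
```

A and X interacted 4 times and are flagged; B and Y interacted only twice and are not.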

4. Multi-Denomination Reveal

If your deposit address mixes a specific set of denominations and your withdrawal address withdraws them all (e.g. if you mix 1x 10 ETH, 1x 1 ETH, 1x 0.1 ETH in order to get 11.1 ETH), then you could reveal yourself if no other wallet has mixed this exact denomination set. This is a probabilistic heuristic given multiple addresses have deposited and withdrawn the same combinations.
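A rough sketch of the idea, assuming each event is a hypothetical `(address, denomination)` pair. An address's "portfolio" is the multiset of denominations it used; a link is reported only when exactly one deposit address and one withdrawal address share a portfolio spanning several denominations. The real heuristic is probabilistic, so treat this as a simplification.

```python
from collections import Counter, defaultdict


def _portfolios(events):
    """Group addresses by the multiset of denominations they used."""
    per_address = defaultdict(Counter)
    for address, denomination in events:
        per_address[address][denomination] += 1
    by_portfolio = defaultdict(list)
    for address, counts in per_address.items():
        by_portfolio[frozenset(counts.items())].append(address)
    return by_portfolio


def multi_denomination_reveals(deposits, withdrawals):
    deposit_portfolios = _portfolios(deposits)
    withdrawal_portfolios = _portfolios(withdrawals)
    links = []
    for portfolio, deposit_addrs in deposit_portfolios.items():
        withdrawal_addrs = withdrawal_portfolios.get(portfolio, [])
        multiple_denominations = len(portfolio) > 1
        if (multiple_denominations and len(deposit_addrs) == 1
                and len(withdrawal_addrs) == 1
                and deposit_addrs[0] != withdrawal_addrs[0]):
            links.append((deposit_addrs[0], withdrawal_addrs[0]))
    return links


deposits = [("A", "10 ETH"), ("A", "1 ETH"), ("A", "0.1 ETH"), ("B", "1 ETH")]
withdrawals = [("X", "10 ETH"), ("X", "1 ETH"), ("X", "0.1 ETH"), ("Y", "1 ETH")]
print(multi_denomination_reveals(deposits, withdrawals))  # [('A', 'X')]
```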

5. TORN Mining Reveal — Careless Usage of Tornado Cash Anonymity Mining

Anonymity mining was an incentive scheme to increase the anonymity set in TC Pools (number of deposits). TC rewarded participants a fixed amount of anonymity points (AP) based on how long they left their assets in a pool.

After withdrawing assets, users can claim Anonymity Points. The amount withdrawn is recorded in the transaction. If a user uses an address to claim all of their anonymity points, you can calculate the exact amount of time their assets were in the pool and then potentially link their deposit and withdrawal addresses.
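To illustrate why a full claim can leak timing, here is a sketch that assumes points accrue at a constant rate per block. The rate of 20 points per block, the tuple layout and the function names are all hypothetical values chosen for the example, not Tornado Cash's actual parameters.

```python
from collections import defaultdict


def blocks_in_pool(claimed_points: int, points_per_block: int) -> int:
    """Recover the time in the pool (in blocks) from a full claim."""
    if points_per_block <= 0:
        raise ValueError("points_per_block must be positive")
    if claimed_points % points_per_block != 0:
        raise ValueError("claim is not a whole number of blocks")
    return claimed_points // points_per_block


def mining_reveal_candidates(claimed_points, points_per_block, deposits, withdrawals):
    """Return (deposit, withdrawal) pairs whose block distance matches the claim.
    deposits / withdrawals are lists of (tx_hash, block_number)."""
    duration = blocks_in_pool(claimed_points, points_per_block)
    withdrawals_by_block = defaultdict(list)
    for tx_hash, block in withdrawals:
        withdrawals_by_block[block].append(tx_hash)
    return [
        (deposit_hash, withdrawal_hash)
        for deposit_hash, block in deposits
        for withdrawal_hash in withdrawals_by_block.get(block + duration, [])
    ]


deposits = [("d1", 100), ("d2", 150)]
withdrawals = [("w1", 600), ("w2", 900)]
print(mining_reveal_candidates(10_000, 20, deposits, withdrawals))  # [('d1', 'w1')]
```

A single matching pair, as here, is a potential link; several matches would leave the claim ambiguous.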

Tornado Cash Anonymity Set Auditor

Overview

The Tornado Cash Anonymity Set Auditor computes the above heuristics for each Tornado Cash pool to determine how many potentially compromised deposits there are in each pool.
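Conceptually, the audit is a union over the deposits flagged by each heuristic. The sketch below assumes each heuristic yields a set of deposit identifiers; the `audit_pool` name and the input shape are hypothetical, and a deposit flagged by several heuristics is counted once.

```python
def audit_pool(all_deposits: set, reveals: dict) -> dict:
    """Summarise a pool: total deposits, potentially compromised deposits,
    and the deposits left over. `reveals` maps heuristic name -> flagged deposits."""
    compromised = set().union(*reveals.values()) & all_deposits
    return {
        "total": len(all_deposits),
        "compromised": len(compromised),
        "remaining": len(all_deposits - compromised),
    }


all_deposits = {"d1", "d2", "d3", "d4", "d5"}
reveals = {
    "address_match": {"d1"},
    "unique_gas_price": {"d1", "d2"},
    "linked_address": set(),
}
print(audit_pool(all_deposits, reveals))
# {'total': 5, 'compromised': 2, 'remaining': 3}
```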

Try this yourself by selecting the Tornado Cash Pool you are interested in from the drop-down list on Tutela’s search page.

Results

Using the 10 ETH pool as an example, you’ll see that the Tornado Cash app shows 28,188 equal user deposits as of late December.

Searching the 10 ETH pool address in Tutela returns that 8,060 deposits are potentially compromised by the five reveals above. The discrepancy in the number of equal user deposits is because our current dataset is from October. We will be introducing live updates shortly.

Worried you’ve compromised yourself on-chain?

There are two ways to check this:

1. Ethereum addresses: input your address / ENS on Tutela and it will show the Ethereum addresses clustered using the Ethereum deposit address reuse reveal, along with your Tornado Cash deposits, withdrawals and reveals, as shown below.

2. Tornado Cash Pools: input your deposit/withdrawal address at the bottom of the Tornado Cash Pool Address results page (shown two images above) to see if it has been linked in that pool. To protect the identities of Tornado Cash users, we only show these compromised transactions when an address is searched.

Worried about your privacy? We don’t store your IP addresses (use a VPN anyway please!) or have any permanent search storage. Check for yourself: our code base is publicly available here.

What Next? Machine Learning and Tx Reveal Data

Our next article will provide details of our application of Diff2Vec, an ML algorithm, to the entire set of Ethereum transactions. This also clusters Ethereum addresses and can help everyday users to understand what the likes of Chainalysis can find out about them. Excitingly, this may be the first public application of Diff2Vec at scale!

We recently built functionality to display Ethereum transaction data showing when you revealed yourself, as well as a live data feed, so we will post on this next time!

Project Contributors:

- Will McTighe, a Stanford MBA, is managing this team effort.

- Mike Wu, a Stanford PhD in AI, is leading the clustering and ML analysis.

- Kaili Wang, a 4th year computer science major at Stanford, is leading front-end development.

- Dr. Nick Bax, a Stanford PhD graduate, has traced funds related to several hacks and recently published on tracing the Monero transactions of the WannaCry 2.0 malware. Nick leads the identification of heuristics.

- István A. Seres, an applied mathematician, leads defining heuristics and the research part of the project.

- Federico Carrone, Founder of LambdaClass, is in charge of a team of computer scientists, computer engineers and data scientists (mathematicians, physicists, engineers) who work on Zero Knowledge proof cryptography.

- Tomas De Mattey, a UNTreF Grad, project manages the Lambda team.

- Manuel Puebla, a UBA Mathematics grad, supports the Tornado Cash heuristics research.

- Herman Obst Demaestri, a UBA engineer, leads Tornado Cash heuristics development.

- Mariano Nicolini, a UBA physics grad, supports Tornado Cash heuristics development.

- Pedro Fontana, a UBA Mathematics grad, supports Tornado Cash heuristics development.