Kain.eth

Posted on Nov 29, 2023

Frontrunning Synthetix: a history (Updated)

Intro
This post was originally published on April 15th, 2021, on the Synthetix blog. I am updating it with minor edits for clarity, as well as numerous parenthetical comments making fun of myself. I will use curly braces {} for these comments to distinguish them from the parentheses in the original post. You can find the original post here to compare the two if you are so inclined.

April 15th, 2021
As we embark on two new approaches {you can just taste the naive optimism dripping off this sentence} to limit frontrunning, I thought it would be valuable to document the history of frontrunning Synthetix. Frontrunning is a thread that has managed to weave its way through our entire journey. Hopefully, understanding how this long-standing challenge has been approached by the Synthetix community will provide insights for anyone building in low-information environments like DeFi {it didn’t}.

First, a quick recap of what frontrunning is with respect to Synthetix. Synthetix relies on Chainlink oracles {less so in 2023, but more on this at the end of the post} to push the prices that determine the exchange rate for each Synth. Because of constraints on Ethereum, these on-chain prices lag the market, so it has been possible to trade ahead of an oracle update for a risk-free profit {boy, howdy, hasn’t it just}. Frontrunning protection attempts to reduce the likelihood of these kinds of trades {and fails}.

The first of the two recently proposed solutions to frontrunning came from Andre Cronje; you may have heard of him {No, no, not the Yearn guy, he’s the leader of Fantom now, plot twist}. His SIP creates an additional exchange function designed for high-value cross-asset swaps, adding TWAP oracles alongside the Chainlink oracles in order to detect high volatility. The standard exchange function using Chainlink oracles is retained for the majority of trades {in 2023, approximately 100% of exchanges happen inside the Perps module of Synthetix, RIP the atomic swamp we hardly knew ye}. The SIP aims to bypass fee reclamation, the current mechanism {mechanism is a generous term here, cudgel is more apt} for preventing frontrunning on Synthetix. The second pathway to solving frontrunning is the implementation of exchanges on OΞ {I tried to meme this symbol into existence, now we just say Optimism} (an L2 scaling solution) {Can you believe there was a time when I had to explain what Optimism was lol}, where the higher throughput will allow Chainlink to significantly lower oracle latency {not enough, sadly}. Both of these strategies are close to being implemented, but it remains to be seen whether either of them will be successful {they weren’t}.

You must, to an extent, suspend disbelief when you embark upon a new project {Man, I wish I had read this before starting Infinex}. If you looked at the problem with a lucid perspective, you would likely never start anything at all {again, big ooooof not having reread this post, I could have saved myself a lot of stress}. Almost anything worth doing is going to be more challenging than you could have imagined at the outset. The impetus to start at all necessitates a level of naivete. That said, once you have started and begin to run into the hard problems, you must be prepared to interrogate them deeply to identify the best path forward. Because new things are highly path-dependent, an early decision can send you down a path that takes years to wend its way back to the optimal route. This is what happened with frontrunning and Synthetix. A single bad decision has reverberated down through the years, even to today. Needless to say, had we realised the extent to which frontrunning would plague us, we may have discarded the original mechanism entirely. Sadly, this Su Zhu article was about three years too late! {Plot twist, it turns out the source of toxic flow for every CeFi lender was actually Su Zhu and Three Arrows; they should have read his article before giving him billions in uncollateralised loans!}

To begin, let’s go back to late 2018, a time when many people felt like the crypto world was coming to an end {feels familiar}. In spite of this, the remaining members of the Synthetix community were preparing a last stand. This comprised a number of different strategies, but for the sake of this post, let’s focus on the pivot from a USD-denominated stablecoin payment network to a derivatives trading platform. While this change was hinted at {we yolo mentioned Nomins [don’t ask] denominated in AUD} in the original whitepaper published in 2017, the actual mechanism was not described. This was primarily due to the fact that we had no fucking clue how to implement it back then. In the latter part of 2018, I was fixated on solving a number of implementation challenges around how to measure the fluctuating on-chain pool of synthetic assets we proposed to build. It was actually the solution to assigning a portion of the global debt to a specific user, and calculating it on-chain, that led us off the optimal path into a swamp {sadly, this swamp was not atomic} from which we have still not emerged {spoiler alert, we have now, shows what you know, Charlie Noyes}. There was some pretty simple math involved, but a key input to this calculation was the global debt value. In order for this calculation to function, we needed to be able to read the amount and price of each Synth on-chain to get the global debt value. So one day in late 2018, I found myself getting pitched a scheme involving off-chain signatures from an oracle service that would provide real-time prices as needed for anyone trading Synths. There were actually a number of issues with it, but the most critical was that unless a price was requested regularly, the view of the global debt pool could, and likely would, become stale. This was a deal breaker {it wasn’t}, so we decided to go down the path of push oracles.
{funny story, it actually turns out stale prices are not even that big of an issue; we later developed a mechanism that updates these values only a few times a day, and it was totally fine} This decision would come back to haunt us many, many times {I am coining this DeFirony}. This is the story of that haunting. Cut to February 2019: a young lad named Jackson Chan had just joined the project. He had a few questions. The first was: “Are we not worried about people frontrunning our extremely centralised and proprietary oracle?” My response, after I finished laughing, was that it would not be an issue as long as we could get the update latency low enough. Low latency would introduce some uncertainty for the frontrunner and shift their EV (expected value) towards zero, and when combined with the cost of capital of sitting in Synths waiting for a large price spike, we figured frontrunning was not likely to be a problem. LOL. I know I tell everyone all the time that I'm a complete idiot; I feel like sometimes they think I'm joking. I'm not. {I am still not joking, this is why I work so hard on decentralised governance; I should never be given unilateral power over anything} Please reread that paragraph until it sinks in. There is necessary naiveté and optimism, and then there is wilful ignorance. Embrace the former and reject the latter.
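
To make the global-debt dependency concrete, here is a minimal Python sketch of share-based debt accounting. It is a simplified model, not the production contract logic, and `DebtPool`, its methods and every number in the usage example are hypothetical. It shows why every staker's debt needs a fresh price for every Synth: the global debt is the sum of supply times oracle price, and each staker owes a fixed fraction of it.

```python
from dataclasses import dataclass, field


@dataclass
class DebtPool:
    """Hypothetical share-based model of a pooled debt (not the production contract logic)."""

    prices: dict[str, float]                                 # synth -> oracle price in USD
    supplies: dict[str, float] = field(default_factory=dict)  # synth -> units outstanding
    shares: dict[str, float] = field(default_factory=dict)    # staker -> debt shares
    total_shares: float = 0.0

    def global_debt(self) -> float:
        # Every synth needs a current price: one stale feed skews every staker's debt.
        return sum(units * self.prices[s] for s, units in self.supplies.items())

    def issue(self, staker: str, usd: float) -> None:
        # Mint `usd` of sUSD; the staker's new shares are proportional to the debt they add.
        debt = self.global_debt()
        new_shares = usd if self.total_shares == 0 else usd * self.total_shares / debt
        self.shares[staker] = self.shares.get(staker, 0.0) + new_shares
        self.total_shares += new_shares
        self.prices.setdefault("sUSD", 1.0)
        self.supplies["sUSD"] = self.supplies.get("sUSD", 0.0) + usd

    def exchange(self, src: str, dst: str, units: float) -> float:
        # Burn `units` of src and mint the same USD value of dst at oracle prices (fees ignored).
        amount_out = units * self.prices[src] / self.prices[dst]
        self.supplies[src] -= units
        self.supplies[dst] = self.supplies.get(dst, 0.0) + amount_out
        return amount_out

    def debt_of(self, staker: str) -> float:
        # A staker owes their fraction of the pool, whatever the pool is now worth.
        if self.total_shares == 0:
            return 0.0
        return self.shares.get(staker, 0.0) / self.total_shares * self.global_debt()


pool = DebtPool(prices={"sETH": 2000.0})
pool.issue("alice", 100.0)
pool.issue("bob", 100.0)
pool.exchange("sUSD", "sETH", 100.0)  # someone moves half the pool into sETH
pool.prices["sETH"] = 4000.0          # the oracle pushes a doubled sETH price
print(pool.debt_of("alice"), pool.debt_of("bob"))
```

Both stakers end up owing 150 in USD terms even though neither of them traded, because the pool as a whole got longer sETH. The same arithmetic is why a single wrong or stale price contaminates everyone's debt at once.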

The fundamental issue was that L1 latency was already not ideal, and the cost of pushing prices often enough, even in the days when people used to input gas prices to two decimal places, was just not sustainable. The reason this had yet to be exploited was that it was fundamentally not worth the effort: Synth liquidity was too low for someone to take a 50 basis point edge and consistently extract it, and the losses to slippage would have pushed them into negative profit. It was around mid-2019 that the community finally solved this issue by creating the incentivised sETH:ETH pool on Uniswap; you’re welcome, UNI whales. This significantly increased Synth liquidity and, of course, around this time the first frontrunners emerged. They were not hugely profitable because they were still somewhat limited by overall Synth liquidity, trading with tens of thousands rather than the millions that later frontrunners would utilise. {As I was rereading this and thinking about the recent and somewhat hilarious dYdX price manipulation attack, I started thinking it would probably not be that hard with today’s tools to track down all of these frontrunners. I’m not sure I would approve of this, but it would be kind of funny; these people were, after all, the original DeFi implementors of “highly profitable trading strategies”. They probably have a ton of money as well, given how OG they were. We sort of viewed this as a live bug bounty program, but they probably stole tens of millions of dollars from SNX stakers over the years. There should probably be some kind of DeFi statute of limitations…}

They actually had a big edge, though: they quickly realised that reading the oracle updates directly from the mempool would allow them to cut off a few blocks and increase their EV. This was an interesting time because almost all of the frontrunners were part of the SNX community to some extent; you basically couldn’t know about this trade unless you were deep in our Discord. Some of these people went on to launch other projects, YAM, for example {We definitely should leave these guys alone, they have been punished enough}, some hung around and became guardians, and some went on to exploit other projects as frontrunning {Synthetix} became less profitable {I bet some of them started MEV bots as well, the sneaky bastards}. One very special person lives on in infamy: Onyxrogue. In May 2019, Onyx wrote a bot that read from the mempool and executed fairly large transactions based on pretty simple but elegant logic: look for the largest upcoming price deviation and trade into it. This logic was what allowed Onyx to exploit a failure of the {centralised} sKRW price feed to the tune of 11 billion sETH. Today that would be worth around 22 trillion dollars... {2 years later, it is still 22 trillion, lol god damn you, ETH price, do something}
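
The “largest upcoming deviation” logic is simple enough to sketch. The Python below is a hypothetical reconstruction for illustration, not Onyx's actual bot: decoding pending oracle transactions from the mempool is assumed to have already happened (`pending` maps each Synth to the price about to be pushed), and `pick_trade`, the fee and all prices are made up. It also shows why such a bot piles straight into a broken feed: a wildly wrong price is, by construction, the largest deviation on offer.

```python
from typing import NamedTuple, Optional


class Signal(NamedTuple):
    synth: str
    deviation: float  # fractional change the pending update will apply
    action: str       # "buy" if the price is about to rise, "sell" if it is about to fall


def pick_trade(onchain: dict[str, float], pending: dict[str, float], fee: float) -> Optional[Signal]:
    """Return the synth whose pending oracle update moves it the most, if that move beats the fee."""
    best: Optional[Signal] = None
    for synth, new_price in pending.items():
        old_price = onchain.get(synth)
        if old_price is None or old_price <= 0:
            continue
        deviation = new_price / old_price - 1.0
        if abs(deviation) <= fee:
            continue  # the edge would be eaten by the exchange fee
        if best is None or abs(deviation) > abs(best.deviation):
            best = Signal(synth, deviation, "buy" if deviation > 0 else "sell")
    return best


onchain = {"sETH": 200.0, "sBTC": 8000.0, "sKRW": 0.00085}
pending = {"sETH": 202.0, "sBTC": 7900.0, "sKRW": 0.85}  # a faulty feed is 1000x off
print(pick_trade(onchain, pending, fee=0.003))
```

Nothing in this logic asks whether a deviation is plausible, so the sKRW glitch wins over the ordinary 1% moves in sETH and sBTC.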

I was woken one morning by JJ; ironically, this was one of the few nights in months I had managed to get more than 4-5 hours of sleep. “We have a problem,” he told me as I groggily realised it was 7 am. Oh, the many possible problems we could be having, I thought to myself... It was the equivalent of the Samczsun “U up?” message.

Shockingly enough, the situation was probably only in the 80th percentile of terribleness, especially given there were any number of scenarios from which no recovery would have been possible. Thankfully, we were able to freeze the contracts before the sETH:ETH pool could be drained. But then the hostage negotiation began. Onyx reached out and requested a payment of, IIRC, 500 ETH, which at the time was around $100k USD; we ended up paying him 100 ETH for his cooperation. This involved reversing the trade that had netted him 11 billion sETH. Part of the reason he had not been able to cash out any of his gains was that he had not actually been aware of the issue, as the bot was automated; maybe he was sleeping, who knows. By the time he realised, the network had already been halted. This definitely limited the extent of the damage. There were a number of discussions over the course of a few days, but the end result was that he made it very clear he was planning to continue frontrunning the protocol even after we paid him the bounty. I wished him luck and told him that we had a number of measures we planned to implement to reduce the expected value of such activity {there is a lesson here: never turn up to a wheat thresher fight without a wheat thresher…}.

We then looked into a number of solutions, but at this time, the focus was mainly on attempting to punish frontrunning. My personal view was that if the expected value of frontrunning was never negative, then eventually we would have hundreds of people attempting to attack the system. We were basically offering a free option to attack our oracle. In the end, we implemented a system to slash any address that met specific criteria, and Onyx was slashed. Even after this slashing, Onyx was still up significantly and, if he had strong hands, is now sitting on almost $200k worth of ETH. He posted a thread on Reddit attacking us after the slashing, but unfortunately he has since deleted it; the original thread and my response can be found here and here. The key takeaways from that post are below.

“Specifically what happened was the oracle detected a tx in the mempool trying to front run a price update. It then implemented a sandwich attack to raise the exchange fee to 99% for that transaction by sending one tx with higher gwei to raise it and another with lower gwei to drop it back down to the normal rate. Here is the transaction that slashed his funds by 99% and sent them to the feepool to be distributed back to SNX stakers.”
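
The sandwich described in that quote relies on transactions being ordered by gas price, which was typical before EIP-1559. Here is a minimal Python sketch under that simplifying assumption (it ignores nonces, block gas limits and miner discretion). The 99% fee mirrors the quote; the `Tx` fields, gas prices, trade sizes and the 0.3% normal fee are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    sender: str
    gas_price_gwei: float
    kind: str           # "set_fee" (oracle admin) or "exchange" (trader)
    value: float = 0.0  # new fee rate for "set_fee", trade size in USD for "exchange"


def execute_block(mempool: list[Tx], base_fee: float) -> dict[str, float]:
    """Apply txs highest-gas-price first and return the exchange fee each sender paid."""
    fee_rate = base_fee
    fees_paid: dict[str, float] = {}
    for tx in sorted(mempool, key=lambda t: t.gas_price_gwei, reverse=True):
        if tx.kind == "set_fee":
            fee_rate = tx.value
        elif tx.kind == "exchange":
            fees_paid[tx.sender] = fees_paid.get(tx.sender, 0.0) + tx.value * fee_rate
    return fees_paid


mempool = [
    Tx("honest_trader", 5.0, "exchange", 1_000.0),
    Tx("frontrunner", 50.0, "exchange", 1_000.0),  # outbids the pending oracle update
    Tx("oracle", 51.0, "set_fee", 0.99),            # front of the sandwich
    Tx("oracle", 49.0, "set_fee", 0.003),           # back of the sandwich
]
print(execute_block(mempool, base_fee=0.003))
```

The frontrunner's own high gas price lands their trade between the two fee changes, so they pay 99% while the lower-gas honest trader pays the normal rate.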

Needless to say, this was one of the most stressful times for the Synthetix community and was a huge distraction from actually building the protocol.

Some good came out of it, however: it forced us to decide to migrate away from our own proprietary, centralised oracle to Chainlink. In hindsight, this was one of the best decisions we made at the time, and even though the transition process took almost nine months, by the time it was done, the project was in a much better place {fun fact: we actually convinced Chainlink to deploy a push oracle rather than the pull oracle they had been designing. RIP everyone, basically. That said, push oracles are basically better for all of DeFi except leveraged perps, so I guess it all worked out in the end}.

However, we were still playing whack-a-mole with frontrunners. We implemented a scheme to limit the maximum gas price that a transaction could use, and while this was somewhat effective, the UX was a nightmare {Think about just how impressive you must be at mechanism design to build something that qualifies as a “nightmare” in DeFi!}. And even after this change, frontrunning was still possible. But in late 2019, JJ and I devised a mechanism that would later be called fee reclamation, which forced a trader to wait a set amount of time before the trade price would be confirmed. While the UX was not ideal {This is an understatement}, this gave us a lever to reduce frontrunning to essentially zero. Even so, that process took almost a year from when fee reclamation was released in early 2020, at which point Kaleb showed up and forced us all to get serious about understanding the frontrunning data and simulating frontrunning (FR) attacks. As an aside, to see just how much time and effort FR defence took up, you need only look at sips.synthetix.io: of the first 50 SIPs, more than 25% deal with some aspect of frontrunning {I am not going to bother updating this stat because there are now like 400 SIPs and I am lazy}.
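
Fee reclamation is easiest to see with numbers. The Python sketch below assumes, as a simplification, that after the waiting period an exchange is re-priced at the oracle rates in force when the window ends, with the difference reclaimed from (or rebated to) the trader. `PendingExchange`, `settle`, the 180-second window, the 0.3% fee and all prices are hypothetical, not the parameters Synthetix actually used.

```python
from dataclasses import dataclass


@dataclass
class PendingExchange:
    src_amount: float     # units of the source synth sold
    dest_received: float  # units of the destination synth credited at trade time
    timestamp: float      # when the exchange executed, in seconds


def settle(ex: PendingExchange, now: float, waiting_period: float,
           src_rate_at_end: float, dest_rate_at_end: float, fee_rate: float) -> float:
    """Return the reclaim (> 0) or rebate (< 0), in destination-synth units."""
    if now < ex.timestamp + waiting_period:
        raise RuntimeError("waiting period not over: destination synth is still locked")
    # What the trader would have received at the end-of-window rates (simplifying assumption).
    owed = ex.src_amount * src_rate_at_end / dest_rate_at_end * (1 - fee_rate)
    return ex.dest_received - owed


FEE = 0.003
# A frontrunner swaps 10,000 sUSD into sETH at a stale 2,000 USD rate, knowing 2,100 is coming.
trade = PendingExchange(src_amount=10_000.0, dest_received=10_000.0 / 2_000.0 * (1 - FEE), timestamp=0.0)
reclaim = settle(trade, now=180.0, waiting_period=180.0,
                 src_rate_at_end=1.0, dest_rate_at_end=2_100.0, fee_rate=FEE)
print(f"reclaimed {reclaim:.4f} sETH; trader keeps {trade.dest_received - reclaim:.4f}")
```

After settlement the frontrunner holds exactly what a trade at the updated price would have given them, so the edge disappears; a trader whose price moved against them gets a negative reclaim, i.e. a rebate. The cost is the locked waiting period that every honest trader also has to sit through.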

It is also probably worth distinguishing between the different types of frontrunning. Onyx was employing mempool frontrunning: looking for an oracle update in the mempool and then submitting a transaction with higher gas to exploit it {today we would call this MEV, which for a long time, and I swear this is true, a lot of people in the Ethereum community, including myself, said was not a real thing because it hadn’t happened in the wild. You probably see a parallel to my argument with Jackson above. Let’s just say the absence of evidence is not evidence of absence and leave it at that.}. Fee reclamation eliminated this, but frontrunners later developed a method called “soft frontrunning”, where they attempt to infer future oracle updates from off-chain data. While fee reclamation handles this fairly well, it is not perfect, and soft frontrunners have been able to exploit oracle updates at various times depending on the volatility of the underlying asset and the fees charged for each trade. Due to the pervasiveness of soft frontrunning through late 2020, fees were often raised and the fee reclamation window lengthened to counteract it. Obviously, this is a poor solution from a UX perspective, as the majority of traders end up with much worse pricing. This is why we have continued to iterate and develop new approaches. {three years later we finally nailed it…}

I have said many times that Synthetix has a culture of iterative experimentation; this is based on the belief among many people in the early community that empirical information is far more valuable than theorisation. That said, over time, we have become more rigorous in our approach and combined this empirical data with robust modelling, as new people joined and patiently explained to us that constant iteration in the dark was somewhat suboptimal 🤷‍♂️. But it is important not to forget that while frontrunning has been ever-present background noise for the project, there have been many, many other challenges that needed to be surmounted for the project to be where it is today. So while, taken in isolation, our approach to frontrunning may appear less than rigorous, it was actually optimising for highly constrained bandwidth. Would I do things differently today? Maybe. But had we spent the time and effort to exhaustively explore the solution space for frontrunning in 2019, we may well not have implemented the inflationary incentives which kicked off the multiyear SNX bull run that has given us the resources to now develop the very rigour we couldn’t afford a few years ago. {this feels like max cope, honestly, we totally should have figured this out sooner}

We are closer than ever to resolving frontrunning, yet we cannot be certain that new attack vectors will not be discovered and exploited in these solutions. So let's return to the present and the potential solutions mentioned at the start of this post. Adding TWAP oracles to the existing Chainlink oracles creates a fairly harsh trade-off, but a very clever one. {almost too clever, you might say} They give a trader the worst price across multiple oracles, which essentially amounts to volatility protection, since high volatility is when the oracles are most at risk of frontrunning. In practice, you get very close to the normal Synthetix price from Chainlink when volatility is low, but as volatility increases the fill becomes worse, and if volatility is too high the trade reverts. This is a powerful disincentive for toxic flow like frontrunning. The downside is that a trader who wants to make a large trade while prices are moving rapidly will most likely need to use another venue, but this is well understood by people who make large block trades. If I turn up to the Kraken OTC desk in the middle of a huge red hourly candle and ask for a quote on $10m of BTC, the spread is going to be much wider than if I were trading on a day when the markets haven’t moved at all. This is just a very elegant way of replicating this kind of volatility-based spread increase.
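
The worst-of-two-oracles rule fits in a few lines. This is a simplified Python illustration, not the contract logic from the SIP: it assumes one Chainlink price and one TWAP price are already available, and the `worst_of_price` name and the 5% revert threshold are hypothetical.

```python
def worst_of_price(chainlink: float, twap: float, side: str, max_deviation: float = 0.05) -> float:
    """Return the price less favourable to the trader, or revert if the oracles disagree too much."""
    if chainlink <= 0 or twap <= 0:
        raise ValueError("prices must be positive")
    deviation = abs(chainlink - twap) / chainlink
    if deviation > max_deviation:
        # Large disagreement signals high volatility: refuse the trade instead of pricing it.
        raise RuntimeError(f"oracle deviation {deviation:.2%} exceeds {max_deviation:.2%}")
    if side == "buy":
        return max(chainlink, twap)  # a buyer pays the higher price
    if side == "sell":
        return min(chainlink, twap)  # a seller receives the lower price
    raise ValueError("side must be 'buy' or 'sell'")


print(worst_of_price(2000.0, 2001.0, "buy"))  # calm market: fill barely differs from Chainlink
print(worst_of_price(2000.0, 2060.0, "buy"))  # volatile: the buyer pays the 3% higher TWAP
try:
    worst_of_price(2000.0, 2200.0, "buy")     # 10% apart: the trade reverts
except RuntimeError as err:
    print("reverted:", err)
```

The gap between the two oracles acts as the volatility signal, so the effective spread widens exactly when frontrunning is most attractive, much like the OTC desk quote in the paragraph above.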

The second path to thwarting frontrunners is the migration to Optimism, which will allow us to optimise for a critical ratio: the ratio between the oracle deviation threshold and fees. If the deviation threshold for an oracle update is, say, 2% and fees are 0.5%, then a frontrunner can simply watch the market, wait for the price to diverge by more than the fees, and trade on the assumption that the next price update will be profitable for them. This is not perfect, of course, but if the frontrunner targets their trades accurately, they can amass profits very rapidly. Conversely, if the price oracle updates every 0.1% (10 bps) and the fees are 1% (100 bps), then it will be very, very unlikely that a frontrunner will be able to exploit a price change without incurring more fees than profit. {those pesky front runners still managed to do it somehow, though! Then we added leverage…} So the tension here is between higher fees and lower latency. The opportunity is to work with Chainlink to significantly reduce oracle latency on Optimism without incurring L1 gas fees or the latency of 13-second block times {This was not quite the opportunity I had hoped, in hindsight}. We have been coordinating between the three projects to reduce this latency and deviation threshold, and we are confident we can eventually get it low enough to allow for both low fees and a high enough ratio between fees and deviations to significantly reduce the chance of frontrunning. {We couldn’t}
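
The ratio argument reduces to simple arithmetic. The sketch below uses integer basis points and makes one simplifying assumption: the on-chain price never lags the market by more than the deviation threshold. In practice, latency can let a fast move overshoot the threshold before an update lands, which is one way an edge can survive even when fees exceed the threshold. The function name is hypothetical.

```python
def max_edge_bps(deviation_threshold_bps: int, fee_bps: int) -> int:
    """Best-case gain (bps) from trading just before a push-oracle update, net of the fee.

    Assumes the market can move at most `deviation_threshold_bps` away from the on-chain
    price before an update is pushed (no latency overshoot).
    """
    return deviation_threshold_bps - fee_bps


for threshold, fee in [(200, 50), (10, 100)]:
    edge = max_edge_bps(threshold, fee)
    verdict = "exploitable" if edge > 0 else "not exploitable"
    print(f"threshold {threshold} bps, fee {fee} bps -> max edge {edge} bps ({verdict})")
```

This prints a 150 bps edge for the 2% / 0.5% case and -90 bps for the 0.1% / 1% case, matching the two scenarios in the paragraph above.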

It may well be that the concept of infinite liquidity that Synthetix has stuck with for so long has created numerous issues, and may have led to a situation where we were optimising for the wrong variable. {Obviously, lol} Ultimately, traders want the best possible execution; if you offer execution beyond what is needed to win their flow, you are theoretically leaving money on the table. However, network effects are powerful here: if you do not have significantly better execution than alternative venues, you will not overcome the switching costs traders face in moving from their current trading platform. So infinite liquidity offered a powerful meme to intrigue traders, but it may have reached the point where easing back on this restriction is actually better in the longer term {it had actually happened years before, but oh well}, offering something more like best execution, where Synthetix on average provides far better execution without the requirement to provide infinite liquidity. In a way, the TWAP exchanges are the first step in this direction. I hope it is possible to ensure that on L2, infinite liquidity is still achievable. {It wasn’t} But if not, then finding a new approach that increases the cost of a fill based on the directionality of trades over a certain time frame could be a path forward. {It was} This would significantly reduce the impact of toxic flow from whatever source: frontrunning, oracle manipulation or asymmetric information.
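
One way to price fills by trade directionality is a fee that rises with the net same-direction flow in a rolling window. The Python below is a hypothetical illustration of that idea, not the formula Synthetix eventually adopted; `DirectionalFee`, the linear slope, the 60-second window and the units (trade sizes in thousands of USD, positive for buys, negative for sells) are all assumptions.

```python
from collections import deque


class DirectionalFee:
    """Hypothetical fee that grows with recent one-sided flow; balancing trades pay the base fee."""

    def __init__(self, base_fee: float, slope: float, window_seconds: float) -> None:
        self.base_fee = base_fee      # fee rate when recent flow is balanced
        self.slope = slope            # extra fee rate per unit of same-direction net flow
        self.window = window_seconds  # how long a trade counts towards the imbalance
        self.trades: deque = deque()  # (timestamp, signed size), oldest first

    def _net_flow(self, now: float) -> float:
        # Drop trades older than the window (assumes `now` never decreases), then sum the rest.
        while self.trades and self.trades[0][0] <= now - self.window:
            self.trades.popleft()
        return sum(size for _, size in self.trades)

    def quote(self, now: float, size: float) -> float:
        net = self._net_flow(now)
        piling_on = net * size > 0  # same sign: the trade deepens the existing imbalance
        return self.base_fee + (self.slope * abs(net) if piling_on else 0.0)

    def record(self, now: float, size: float) -> None:
        self.trades.append((now, size))


fees = DirectionalFee(base_fee=0.001, slope=0.0001, window_seconds=60.0)
fees.record(0.0, 10.0)
fees.record(10.0, 10.0)
print(fees.quote(20.0, 10.0))   # another buy after 20 of net buying: base + 20 * slope
print(fees.quote(20.0, -10.0))  # a sell reduces the imbalance: base fee only
print(fees.quote(100.0, 10.0))  # both earlier buys have aged out: base fee again
```

Toxic flow such as frontrunning tends to arrive in bursts on one side, so it pays the rising fee, while flow that rebalances the book keeps the base rate.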

“Ok, great,” you are saying, “I just slogged through 2500+ words, where is the payoff?” Looking forward, we can examine what the project will look like if these two solutions work as expected. {Arguably, Optimism worked better than expected; it just didn’t solve this particular problem, unfortunately} Firstly, it is hard to appreciate just how much of an improvement Optimism will be for Synthetix. {This was true} Much of the Synthetix volume these days is coming from L1 activities like cross-asset swaps and other composability-derived volume. {RIP L1 fees now…} Optimism will open up a new era for the project where gas costs are negligible, and we can finally launch leverage trading with perpetual futures. {Wow, I predicted something correctly!} This is likely to drive volumes parabolic as more people migrate to Optimism and have ready access to liquidity on Kwenta {I guess I used to be a Kwenta maxi, who knew?}. Meanwhile, the volume flowing through cross-asset swaps {I am literally tempted to just delete this entire nonsense section that follows, but that would rob you of the chance to laugh at me}, while promising, is insignificant compared to what it will look like if aggregators like 1inch integrate this new TWAP mechanism to route large orders through Curve + Synthetix. It is not hard to imagine volume increasing by 10-100x. This is because, to date, the aggregators have refused to integrate transaction routes that are not atomic, and while I can appreciate them taking a stand on this, it has significantly reduced the utilisation of the Curve cross-asset swaps to date. SIP-120 removes this restriction, and we should see most of almost any order over $500k on 1inch routed through Synthetix. {Damn you, Uniswap 1bps pools!} This is a significant threat to OTC desks and other block trading venues.
While improving L1 UX is important, the potential combination of Optimism scalability {crazily enough, it was even more scalable than we realised, and it turns out that it was more than just a simple scaling network; it is actually a network of networks with all kinds of amazing properties. Needless to say, the investment in Optimism has been an incredible decision} and faster Chainlink oracles is by far the most exciting advancement in the protocol in the last few years. The lower latency from Chainlink price feeds will allow for even tighter spreads, and therefore an even larger percentage of volume will be routed through Synthetix by aggregators. This will take time {I guess we can all keep waiting, maybe one day it will happen}, as it requires a lot more liquidity to migrate to Optimism over the next 6-12 months. But it is undoubtedly the direction things are heading.

Taking a single aspect of the Synthetix protocol in isolation can potentially lead to a skewed conclusion, but the intention was to demonstrate how fine the balance is between optimising for rapid iteration and deep exploration of problems. It is certainly possible that deep exploration of the problem would have yielded one or more of the solutions we see today much earlier, but it is likely impossible to know. Building in DeFi is challenging because so much uncertainty exists at every layer, from the base infrastructure up to the interfaces. This uncertainty compounds, and taking the epistemologically naive approach of just building shit until something works is very compelling, but it is likely that a hybrid approach combining empirical information and research is optimal. But isn’t a random walk through the solution space so much more fun? {Honestly, I kind of still believe this, and at Infinex we’ve taken a similar approach, though hopefully with more rigour} You never know what crazy mechanism you will stumble upon.

UPDATE: November 2023.

Cut to today. Over the intervening 2.5 years, it has become progressively more obvious that the pull oracle solution was the correct approach the whole time. Apologies to the many people who told us (me) that infinite liquidity push oracles were madness. Today we barely see any spot trading, let alone cross-asset swap trading. I do feel validated that the decision to go all in on Optimism has paid off, because while pull oracles can work on mainnet, the user experience is not ideal for a typical trading use case. We spent a lot of time trying to continually optimise the push oracles on Optimism. I am thankful to Chainlink for spending so much time and effort on this Sisyphean task, but ultimately we all had to capitulate and accept that the underlying mechanism was the problem, rather than our inability to further optimise latency.

What is a little wild is that it actually took Pyth co-developing a novel pull oracle solution with a few rogue Synthetix engineers to finally shift us away from push oracles. Yet even today, the legacy implementation still relies on push oracles. Thankfully, we are about to detach from this legacy tech once and for all; Synthetix Andromeda, which will be launched on Base in a few weeks, has no reliance on legacy oracles and is completely redesigned to only require pull oracles. Andromeda, at launch, will be solely powered by Pyth. That said, the pull oracle interface is generic and can accommodate any new pull oracle solution that eventually comes to market. This will ensure that, as block times get lower and latency is reduced, oracle responsiveness continues to improve and trade execution latency approaches that of centralised exchanges.
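
One way a pull oracle removes the frontrunning window is a commit-then-settle flow: the order is committed first, and it can only be filled with a price published after the commit, so the trader cannot act on a price they already know. A minimal Python sketch of that check follows. `Order`, `PriceUpdate`, `settle_with_pulled_price` and the 2 to 60 second window are hypothetical and are not Andromeda's or Pyth's actual interfaces; verifying the oracle's signature on the update is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Order:
    size: float        # units of the asset to buy
    commit_time: int   # unix seconds when the order was committed on-chain


@dataclass
class PriceUpdate:
    price: float       # price carried by the signed update (signature checks omitted)
    publish_time: int  # unix seconds when the oracle network published it


def settle_with_pulled_price(order: Order, update: PriceUpdate,
                             min_delay: int = 2, max_delay: int = 60) -> float:
    """Return the notional filled, accepting only prices published inside the settlement window."""
    earliest = order.commit_time + min_delay  # earlier prices could have been known at commit
    latest = order.commit_time + max_delay    # later than this, the order has expired
    if not earliest <= update.publish_time <= latest:
        raise ValueError("price update is outside the settlement window")
    return order.size * update.price


order = Order(size=1.5, commit_time=1_000)
print(settle_with_pulled_price(order, PriceUpdate(price=2_000.0, publish_time=1_005)))
```

Because the settlement price is pulled on demand rather than pushed on a deviation threshold, there is no pending on-chain update for anyone to race, and the cost of fresh prices is paid only when a trade actually needs one.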

Synthetix Andromeda will include the best-designed perpetual mechanism ever built (we know this definitively because no one has had time to copy it yet 🤣). It took us a long time and a lot of iteration to get here, but in the next few weeks, we will see a new era of Synthetix ushered in. The fact that all of this is happening during the early days of the Optimism Superchain and the launch of Infinex, which will bring DeFi UX up to the standard of CeFi, is even more exciting.