Matchbox DAO

Posted on Nov 20, 2023

Project Agent 0xTitans Dev Notes #3

Welcome to the latest installment of 0xTitans Dev Notes - your insider's guide to the cutting-edge developments in our ambitious project. In Dev Notes #3, we are excited to share our groundbreaking progress: from our first successful attempts at using code models to generate a complete Solidity strategy, to our recent transition to Together.AI, an inference cloud renowned for its speed and reliability.

Exploring Together.AI

In previous editions, we discussed several platforms and technical setups used for testing models and evaluating their performance on the various tasks required for our agent system. To interact with open-source models, similar to how you might use ChatGPT, options like Hugging Face and RunPod offer simple configurations. However, we recently discovered Together.AI, an exceptional product that provides highly performant and reliable inference endpoints for many leading open-source models. Together.AI claims to be the fastest on the market, setting a new benchmark in efficiency.

https://x.com/NVIDIADC/status/1724562358910210487?s=20

To learn more about Together.AI, please visit their blog here.


Advancements in Generating 0xMonaco Solidity Strategy

After extensive experimentation across different platforms, models, and methods, we've achieved successful and reproducible results in generating full Solidity strategies for 0xMonaco with varying degrees of complexity.

Practice-Based Conclusions:

  • Optimal Model Selection: Our experiments show that Phind-CodeLlama-34B-v2 offers an ideal blend of strong reasoning capabilities, high-quality Solidity generation, and an efficient performance-cost ratio.

  • Sensitivity to Prompt Structure: We observed that models are highly sensitive to the structure of prompts. Even slightly different structures can yield varied results and affect the reliability of responses.

  • Importance of Concise Contexts: For effective code generation, the prompt and its related context must be meticulously crafted. We have adopted an API/SDK approach for this, avoiding the use of large blocks of explicit code.

  • Impact of Context and Prompt Size: The size of the context and the overall prompt significantly influences how the model weights different parts of the prompt. Practically, this means that vital parts of your prompt might be overlooked or given the same weight as less significant parts.

Prompt Structure

Role Definition and General Game Context

System Prompt: You are a Solidity Engineer AI. You have designed and written a game called 0xMonaco, written in Solidity, that allows 3 Solidity ICar strategies to compete against each other in a turn-by-turn race; each strategy starts with 17500 coins. The first strategy reaching y = 1000 wins.

In order to get there, each strategy has to buy actions, each of which has a cost, and the cost goes up the more they are bought. The cost function follows a Variable Rate Gradual Dutch Auction formula, which yields an essentially logarithmic cost increase the more an action is bought.

The actions are:

acceleration: each purchase allows you to move faster. You can buy 1 to N accelerations each turn.

shell: it will cancel all accelerations for the strategy / ICar in front of the current player

super-shell: it will cancel all accelerations for the strategy / ICar in front of the current player, all the way to the first player.

banana: it will stay where the strategy / ICar has dropped it and the next strategy / ICar running into it will have its speed cut in half.

shield: it will prevent the current player from being hit by a banana, a shell or a super shell.

SDK

The SDK for checking the cost of buying actions and for buying actions is as follows:

You can get the cost for each action by using the following methods:

monaco.getShellCost(1) to get the cost for 1 shell

monaco.getSuperShellCost(1) to get the cost for 1 super shell

monaco.getBananaCost() to get the cost for 1 banana

monaco.getShieldCost(1) to get the cost for 1 shield

monaco.getAccelerateCost(N) to get the cost for N accelerations

You can buy each action by using the following methods:

monaco.buyShell(1) to buy 1 shell

monaco.buySuperShell(1) to buy 1 super shell

monaco.buyBanana() to buy 1 banana

monaco.buyShield(1) to buy 1 shield

monaco.buyAcceleration(N) to buy N accelerations

Note that due to the cost function, buying 5 accelerations is exponentially more expensive than buying 1. You can get the cost for 5 or 10 to check the price impact if needed.

Each strategy / car has a data structure called CarData, described as follows:

struct CarData {
    uint256 y;       // Current position
    uint256 speed;   // Current speed
    uint256 balance; // Current balance (coins)
}

Alignment with the user and agent system

Your task is to write strategies / ICar solidity contracts following the user request. For example, if the user asks for an aggressive strategy, the strategy / ICar solidity contract should favor buying shells / super-shells. If the user asks for a balanced strategy, the strategy / ICar solidity contract should promote a healthy balance between buying accelerations, setting a shield, dropping bananas when there are players behind, and sending shells when the cost is affordable and the player is second, or super-shells when the cost is affordable and the player is last.

Provide the Solidity code implementation for the strategy, wrapped in a backticked code block, and explain your reasoning step by step.

Make sure to extract the variables from the user request so you can reuse them appropriately within the code. The code you produce will be run in a Docker container and you will get the output from it, so you can fix any error in the code. When that happens, fix the errors and send a new version of the code. Only send the full code back, wrapped in a backticked code block, so it can be run again and we can repeat the fixing if needed.

Enhancing Accuracy

Only provide complete and working code. Do not use comments to suggest some code should be manually added. The user will not add code. Only you will provide code.

The Strategy should have 2 methods:

takeYourTurn: allows the strategy to play a turn. The strategy receives the CarData for every strategy.

sayMyName: will return a string that is the strategy name.

The takeYourTurn method will be called once per turn for each car, car by car, until one car crosses the y=1000 finish line. At each turn, you get your position via CarData.y, your speed by CarData.speed, and your balance via CarData.balance. You will need to do the following:

Check the current race conditions.

Depending on the conditions and the strategy parameters (focus on speed, economic balance, aggressiveness, banana-focus, defender, etc.), you will check the cost of the items you think should be used to get an edge over the other 2 strategies.

Check the remaining balance, and buy the items using the buy functions described above.

Verify that you have completed the actions you want the strategy to complete for this turn.

Optional Code Context

Here is the code for the ICar interface: ${ICar}

Here is an example of a strategy: ${strategy1} … ${strategy2} … // more on that later

User Input

User Message: "Provide the complete code for an implementation of a 0xMonaco Strategy game that <User Input>. PROVIDE THE FULL IMPLEMENTATION CODE, well tested and thoroughly designed. Do NOT provide a sample implementation!"

Assistant:


Prompt Breakdown

Role Definition and General Game Context

First, we align and define the role for the model we are using by assigning the Solidity Engineer AI role, while providing some general context about the racing game.

We are also providing the full game mechanics of the actions to set the groundwork for the model to “reason” and understand the game implications of the code design.


SDK Approach

  • The SDK approach replaces the need to use the complete Solidity implementation of "0xMonaco," which is quite extensive (about 1000 lines of code). This approach provides specific functions to be included in the car strategy, focusing on the core game mechanics.

Types of Functions in the SDK

  • The SDK includes two main types of functions:

    1. Cost Functions (Read Functions): These allow users to check the price of various actions without changing the game state.

    2. Buy Functions (Write Functions): These functions alter the game state and execute related logic, such as affecting other cars and managing the player's balance.

In addition, there is the CarData structure, which gives the contract/car the ability to read the state of all cars in the game (position, speed, balance).

Considerations for SDK refinement

  • Using the full contract resulted in lower quality code, indicating that more context doesn't necessarily mean better results. Optimizing the SDK might involve including code snippets to demonstrate how the buy and cost functions work. This can be generalized to all actions, reducing size and potentially improving the context quality for the model.
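
To illustrate, a snippet of that kind might look like the sketch below. The IMonaco interface, the CarData struct, and the takeYourTurn signature are simplified stand-ins reconstructed from the SDK description above, not the actual 0xMonaco contracts (which this post elides behind ${ICar}):

    // SPDX-License-Identifier: MIT
    pragma solidity ^0.8.0;

    // Simplified stand-ins reconstructed from the SDK description above;
    // the real 0xMonaco contracts are far more extensive (~1000 lines).
    struct CarData {
        uint256 y;       // Current position
        uint256 speed;   // Current speed
        uint256 balance; // Current balance (coins)
    }

    interface IMonaco {
        // Cost (read) functions: check prices without changing game state.
        function getAccelerateCost(uint256 amount) external view returns (uint256);
        function getShellCost(uint256 amount) external view returns (uint256);
        function getSuperShellCost(uint256 amount) external view returns (uint256);
        function getBananaCost() external view returns (uint256);
        function getShieldCost(uint256 amount) external view returns (uint256);
        // Buy (write) functions: alter game state and spend balance.
        function buyAcceleration(uint256 amount) external;
        function buyShell(uint256 amount) external;
        function buySuperShell(uint256 amount) external;
        function buyBanana() external;
        function buyShield(uint256 amount) external;
    }

    contract ExampleCar {
        // The check-the-cost-then-buy pattern the SDK is meant to convey:
        // read the VRGDA price first, compare against balance, then buy.
        function takeYourTurn(IMonaco monaco, CarData[] calldata allCars, uint256 ourIndex) external {
            CarData memory us = allCars[ourIndex];

            uint256 costOne = monaco.getAccelerateCost(1);
            uint256 costFive = monaco.getAccelerateCost(5);

            // Batch buys get expensive fast under the VRGDA, so check the
            // price impact of 5 accelerations before committing to them.
            if (costFive <= us.balance && costFive < costOne * 10) {
                monaco.buyAcceleration(5);
            } else if (costOne <= us.balance) {
                monaco.buyAcceleration(1);
            }
        }

        function sayMyName() external pure returns (string memory) {
            return "ExampleCar";
        }
    }

Keeping the interface this small preserves the core game mechanics while avoiding the ~1000 lines of the full implementation.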

Alignment with the user and agent system

We are ensuring that the user's intent is represented in the strategy, making sure it also reflects the game mechanics. Essentially, we are connecting and aligning the previously given context with the actual intent of the user. Additionally, we are providing a format to be used for testing the strategy and communicating with other agents in the system.


Enhancing Accuracy

The reliability and accuracy of the prompt process are highly important.

  • First, we aim to provide the model with a precise method to structure consistent reasoning processes that conclude in strategy generation.

  • To evaluate the model's ability to produce code, we need to ensure that the "recipe" remains the same, regardless of user intent.

  • Code generation within high-context games such as Monaco is highly delicate. Therefore, we need to ensure that we are actually producing code that is not only correct but also makes sense within the game mechanics in a reliable and predictable way.


Generating The Solidity Code

Findings

Code Accuracy: We define code accuracy as having valid Solidity code that compiles without errors and adheres to the ICar interface. Additional qualitative properties include whether the code is written according to standard design patterns of Solidity contracts and does not include potential issues like overflow.

We found most of the code to be of very high accuracy. However, in some rare cases, the code produced did not accurately reflect the unique nature of Solidity types.

Strategic Reasoning: Assessing strategic reasoning is more challenging. It mainly reflects how clever the generated strategy is based solely on context, without strategy references or detailed user descriptions (like pseudocode). We obtained promising results, but we haven’t yet achieved the same level of performance as the highest human-generated strategies. This is primarily due to the model's limited mathematical comprehension of how pricing works in the game.

Aided Generation: With aided generation, we include very high-quality strategies as a reference source for the model.

Here is an example of a strategy: ${strategy1} … ${strategy2} …

In this case, we were able to generate high-quality strategies that take into consideration the mathematical nature of the game.

Alignment with User Intent

We found that the strategies do represent user intent and are able to demonstrate it through the code and the "rationale" provided at the end of the prompt.

Prompt and Parameters Hacking

Different user messages triggered different qualities of strategies, which we find valuable in terms of user experience because we do not want the model to come up with a strategy on its own.

In terms of parameters used in the context of car strategies, lower temperatures (such as 0.2) yield consistent but simpler strategies. Higher temperatures result in strategies with highly creative or strong reasoning parts, "sparks" of reasoning that resemble top-performing human-generated cars. A temperature of 0.5 seems to be the sweet spot for moderately clever strategies that remain valid.


Setup

As we stated, we are using Phind-CodeLlama-34B-v2 with the Together.AI API. You can check the full script file here:

https://gist.github.com/0xNB-dev/194cee3fc8618c8365364d2f118d77cd


Examples

Case #1 Producing Aggressive Strategy

User message:

Provide the complete code for implementing a 0xMonaco Strategy that becomes very aggressive. Use complex patterns while maintaining budget

      "model": "Phind/Phind-CodeLlama-34B-v2",
      "max_tokens": 16384,
      "prompt": prompt,
      "request_type": "language-model-inference",
      "temperature": 0.8,
      "top_k": 20,
      "repetition_penalty": 0,
  

There are a few reasons why we find this strategy to be impressive (a sketch of the pattern follows the list):

  • Prioritizing aggressiveness according to user intent. Initially, the aggressive sequence is activated by calling the useAggressiveActions function.

  • The strategy represents user intent to manage the budget wisely in two ways:

    • by setting a maximum cost for shells and super shells to prevent overspending, which is critical in the game.

    • by stopping spending if the budget reaches a very low amount
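
The generated contract is not reproduced here, but the behavior described above can be sketched roughly as follows. The constant names, cap values, and the useAggressiveActions flow mirror the description rather than the model's verbatim output, and the sketch reuses the simplified IMonaco / CarData declarations from the earlier example:

    // Illustrative reconstruction of the described pattern; assumes the
    // simplified IMonaco interface and CarData struct from the earlier sketch.
    contract AggressiveCar {
        uint256 constant MAX_SHELL_COST = 1000;       // cap per shell to prevent overspending
        uint256 constant MAX_SUPER_SHELL_COST = 2000; // cap per super shell
        uint256 constant MIN_BALANCE = 500;           // stop spending below this reserve

        function takeYourTurn(IMonaco monaco, CarData[] calldata allCars, uint256 ourIndex) external {
            // The aggressive sequence is activated up front, per user intent.
            useAggressiveActions(monaco, allCars[ourIndex]);
        }

        function useAggressiveActions(IMonaco monaco, CarData memory us) internal {
            uint256 budget = us.balance;
            if (budget <= MIN_BALANCE) return; // budget guard: keep a reserve

            uint256 shellCost = monaco.getShellCost(1);
            if (shellCost <= MAX_SHELL_COST && budget - MIN_BALANCE >= shellCost) {
                monaco.buyShell(1);
                budget -= shellCost;
            }

            uint256 superShellCost = monaco.getSuperShellCost(1);
            if (superShellCost <= MAX_SUPER_SHELL_COST && budget - MIN_BALANCE >= superShellCost) {
                monaco.buySuperShell(1);
            }
        }

        function sayMyName() external pure returns (string memory) {
            return "AggressiveCar";
        }
    }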

Case #2 Producing Aggressive Strategy for the end-game

User message:

Provide the complete code for an implementation of a 0xMonaco Strategy that is very aggressive towards the end of the race

      "model": "Phind/Phind-CodeLlama-34B-v2",
      "max_tokens": 16384,
      "prompt": prompt,
      "request_type": "language-model-inference",
      "temperature": 0.5,
      "top_k": 20,
      "repetition_penalty": 0,
  

Clever use of end-game strategy

This code snippet, taken from the full contract (reconstructed in the sketch after the list), demonstrates one of the most common patterns in Monaco: using aggressive behavior towards the end. The snippet represents good reasoning in several ways:

  • The shells are used correctly based on their index, and when the car is in third or second place (one error is starting from 1 instead of 0, but that’s an easy fix).

  • Additionally, the usage of bananas is only triggered close to the end (y>750).
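
Since the snippet itself is not reproduced here, here is a rough reconstruction of the pattern, with the index fix applied (0-based places) and assuming allCars is ordered leader-first so that ourIndex doubles as our place; the declarations are the simplified ones from the earlier sketch:

    // Rough reconstruction of the end-game pattern; assumes the simplified
    // IMonaco / CarData declarations from the earlier sketch, and that
    // allCars is ordered leader-first so ourIndex doubles as our place.
    contract EndGameAggressorCar {
        function takeYourTurn(IMonaco monaco, CarData[] calldata allCars, uint256 ourIndex) external {
            CarData memory us = allCars[ourIndex];
            uint256 budget = us.balance;

            // Aggression is reserved for the end of the race (y > 750).
            if (us.y > 750) {
                if (ourIndex == 1) {
                    // Second place: a shell clears the leader's accelerations.
                    uint256 shellCost = monaco.getShellCost(1);
                    if (shellCost <= budget) {
                        monaco.buyShell(1);
                        budget -= shellCost;
                    }
                } else if (ourIndex == 2) {
                    // Last place: a super shell hits every car ahead of us.
                    uint256 superShellCost = monaco.getSuperShellCost(1);
                    if (superShellCost <= budget) {
                        monaco.buySuperShell(1);
                        budget -= superShellCost;
                    }
                }
                // Bananas are likewise only dropped close to the finish line.
                if (monaco.getBananaCost() <= budget) {
                    monaco.buyBanana();
                }
            }
        }

        function sayMyName() external pure returns (string memory) {
            return "EndGameAggressorCar";
        }
    }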

Case #3 Producing Balanced Economic Strategy for the end-game

User message:

Provide the complete code for an implementation of a 0xMonaco Strategy that is highly economic; set a max price for all actions while taking into consideration the game phase (early, mid, or end game)

      "model": "Phind/Phind-CodeLlama-34B-v2",
      "max_tokens": 16384,
      "prompt": prompt,
      "request_type": "language-model-inference",
      "temperature": 0.5,
      "top_k": 20,
      "repetition_penalty": 0,
  

This snippet demonstrates the model's ability to translate a user's vague intent into genuinely clever code. It sets up maximum prices in relation to the initial budget (17,500 coins), without access to highly specific pricing behavior. Generally, each action has a target price that stays constant as long as that action is bought at a rate of X units per turn. For example, the target price for shells is 300 coins when the purchase rate is 3 shells per turn.
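
A sketch of this price-cap pattern might look like the following. The phase cutoffs and cap values are hypothetical stand-ins (except the ~300-coin shell target mentioned above), and the declarations are the simplified ones from the earlier sketch:

    // Illustrative phase-based price caps relative to the 17,500-coin
    // starting budget; cutoffs and numbers are hypothetical except the
    // ~300-coin shell target described above. Assumes the simplified
    // IMonaco / CarData declarations from the earlier sketch.
    contract EconomicCar {
        function maxShellPrice(uint256 y) internal pure returns (uint256) {
            if (y < 300) return 150; // early game: stay frugal
            if (y < 700) return 300; // mid game: pay up to the target price
            return 600;              // end game: winning outweighs saving
        }

        function takeYourTurn(IMonaco monaco, CarData[] calldata allCars, uint256 ourIndex) external {
            CarData memory us = allCars[ourIndex];
            uint256 shellCost = monaco.getShellCost(1);
            // Only buy when the VRGDA price is below the phase cap.
            if (shellCost <= maxShellPrice(us.y) && shellCost <= us.balance) {
                monaco.buyShell(1);
            }
        }

        function sayMyName() external pure returns (string memory) {
            return "EconomicCar";
        }
    }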

Case #4 Structuring Strategy Generation Via Parameters

Aggression factor set to 10

One of the key challenges with end-user-facing Gen-AI applications is the interaction. We aim to guide our users to produce a consistent strategy, offering room for experimentation while also providing a safe method to create many different strategies that share the same structure and are easy to iterate on and test.

That's why we are also considering a parameter-based approach, in which the user provides parameters for certain properties, such as aggressiveness.
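
As a sketch of how such a parameter could surface in generated code: the AGGRESSIVENESS constant and its scaling rule below are hypothetical, and the declarations are the simplified ones from the earlier sketch.

    // Hypothetical parameter-driven variant: a user-supplied aggression
    // factor (0-10) scales how much of the balance may go to offensive
    // actions each turn. Assumes the simplified IMonaco / CarData
    // declarations from the earlier sketch.
    contract ParameterizedCar {
        uint256 constant AGGRESSIVENESS = 10; // extracted from the user request

        function takeYourTurn(IMonaco monaco, CarData[] calldata allCars, uint256 ourIndex) external {
            CarData memory us = allCars[ourIndex];
            uint256 offenseBudget = (us.balance * AGGRESSIVENESS) / 10; // 10 => full balance
            uint256 shellCost = monaco.getShellCost(1);
            if (shellCost <= offenseBudget) {
                monaco.buyShell(1);
            }
        }

        function sayMyName() external pure returns (string memory) {
            return "ParameterizedCar";
        }
    }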


Case #5 Using Aided Generation

In this example, we included examples of top-performing strategies in the prompt. The model was able to pick up on these strategies and easily generate strategies of similar quality. You can check these highly detailed examples (~300 lines of code) here:

https://gist.github.com/0xNB-dev/5798fd9783647397a86d1e28a5f3a9b9

Current Understanding and Experimentation With Different Tools, Frameworks, Models, and Platforms

Frameworks

  • Frameworks provide structured environments for building and deploying AI models. They often include pre-built components and support for various programming languages.

    • Python (predominant): Offers massive support for AI development.

      • SuperAGI: Geared towards automating tasks like scraping and summarizing. It's similar to many agent frameworks.

      • Autogen: Excellent for parameter optimization, particularly effective with OpenAI.

      • AGiXT: A promising orchestration framework supporting Python and TypeScript.

    • JavaScript / TypeScript (growing): Increasingly significant for in-browser work, with rapid development potential.

      • TransformersJS: Primarily for running inference from the browser.

      • ModelFusion: A powerful framework with diverse capabilities.

      • Chidori: Offers great potential with Rust support.

Inference

  • Run on Your Own Server:

    • Infrastructure:

      • Banana.dev + Potassium: Platforms for running and managing AI models.

      • RunPod: A service for hosting and running AI models.

      • e2b.dev: Focuses on sandboxed agents.

    • Inference Engines:

      • vLLM: User-friendly with fast inference but limited quantization format support.

      • Ollama: Supports a wide range of models with easy-to-use interfaces.

      • TogetherAI: Claims to be the fastest but is not open-source yet.

  • Serverless Deployment:

    • Examples include Banana.dev and Fireworks.ai, where you pay only for inference time.
  • Hosted / Shared Services:

    • Services like NLPcloud and TogetherAI offer good value for money with smooth operation and good model support.

Models

  • Coding Models: Code-Llama 34B and Phind Code-Llama 34B are top choices, followed by WizardCoder, StarCoder, and Mistral.

  • Reasoning & Math: Models like Arithmo-Mistral and MAmmoTH-Coder-34B specialize in these areas.

  • Small Models: Phi1.5 and TinyLlama are designed for lightweight applications.

  • General Purpose Models: Include MistralLite and Mistral 3B / 7B for various applications.

Tips for Building Agents

  • Prompt engineering is crucial. Experiment with different approaches, contexts, and explanations.

  • Tweak model parameters, with temperature being most important, followed by top_p and top_k.

  • Be mindful of the prompting style, as it can yield significantly different results.

  • For execution permissions, like with Autogen, use Docker for isolation.

  • Prefer instruct or chat models for better interaction with other agents.

  • For custom code execution, choose less opinionated frameworks like ModelFusion or Chidori over more specialized ones like Autogen.

  • Refer to curated lists like this one on GitHub for comprehensive resources on AI agents.

https://github.com/e2b-dev/awesome-ai-agents


If you are working on LLM applications, want to learn more, or have an interesting use-case for Web3, please reach out to us via

MatchboxDAO

Nate - Product Lead and Research

Nick - Tech Lead

For the previous notes check:

https://mirror.xyz/matchboxdao.eth/cX3hoV0DJC6VHhm0Mi8ySdY9K_MrJ093RvaBHT1xnts
