Matchbox DAO

Posted on Nov 08, 2023

Project Agent 0xTitans Dev Notes #2

Last week, we covered some general questions and the framework integrations we are working on. Today we will dive deeper into concrete examples of model performance, our hardware setups, and a few successful attempts we had.

First Steps With Retrieval Techniques

Among the skills our agent system requires are context agents or data agents, which are performing exceptionally well with the different resources we have.

We are currently considering two main use cases within our system:

In previous notes, we mentioned that strategy generation is a multi-step problem, with some of the steps requiring agents that specialize in recipe creation. For this purpose, we believe having rich context such as code snippets, game analysis, and code guides might be highly important for producing high-quality strategies.

The other use case is part of the UX; we want to guide our users through the process and onboard them easily, even if it’s their first interaction with our game. Providing context is critical for the agent to answer user questions and to ask for clarifications if needed, in order to ensure that the final strategy is aligned with the user's intent.

We have created a game wiki that includes rich context about the game, and we are connecting it to the model using LangChain’s RetrievalQA with a simple UI.
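To make the retrieval step concrete, here is a minimal sketch of the idea in plain Python. It is not the LangChain API and not our production setup: it swaps in a simple bag-of-words cosine similarity instead of embeddings, and the wiki passages, `retrieve`, and `build_prompt` are placeholder names and wording invented for illustration.

```python
import math
import re
from collections import Counter

# Placeholder wiki passages (illustrative wording, not the real game wiki).
WIKI = [
    "Shells are an offensive action a car can buy on its turn.",
    "Shields are a defensive action that protects a car.",
    "Action prices rise when demand is high and fall when demand is low.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, passages, k=1):
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True)
    return ranked[:k]

def build_prompt(question, passages):
    """Prepend the retrieved context to the user question."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How do prices change with demand?", WIKI))
```

A real pipeline would replace the word-overlap scoring with embedding search over the wiki, but the shape stays the same: pick the most relevant passages, then pass them to the model together with the question.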

Demo

Diving Deeper Into GPT Performance and Reasoning

The heavily anticipated OpenAI keynote event did not disappoint. We are excited to test some of the new features and experiment with the OpenAI API. In the meantime, we have already experimented a lot with GPT-3.5 and GPT-4. Here are some examples of the output we got:

Setup: We loaded some context that included the Python version of 0xTitans and a basic list of the game rules. After an initial attempt to create a simple car, we iterated with more complexity.

First, we asked the model to produce a sophisticated car strategy:

To craft a more sophisticated car strategy, let's consider various possible game scenarios and how to optimize decisions based on the current state. We'll build on the previous strategy, incorporating advanced logic to make the car act more intelligently.

Here are the strategic considerations:

  1. Adaptive Acceleration: Rather than accelerating 3 times by default, assess the situation. If there's a car close behind, we may want to accelerate to maintain our lead. Conversely, if there's a car right ahead, we might want to accelerate to overtake it.

  2. Judicious Use of Shells: Don't use a shell impulsively. Only use it if it can ensure an overtaking move, or prevent someone from overtaking us.

  3. Prioritize Defense When Leading: If we're leading, the primary concern should be defense against attackers.

  4. Saving and Investing: If we're comfortably in the lead, save up for more expensive power-ups to solidify our position or for future needs.

  5. Adaptive Shielding: If a shell is imminent or if a car behind us has a history of using shells frequently, prioritize shielding.

Here's the revised strategy:

class SophisticatedCar:
    def takeYourTurn(self, game, cars, bananas, idx):
        ourCar = cars[idx]
        close_behind = idx > 0 and (ourCar.speed - cars[idx-1].speed) < 2
        close_ahead = idx < len(cars) - 1 and (cars[idx+1].speed - ourCar.speed) < 2

        # Adaptive Acceleration
        if ourCar.balance > game.getAccelerateCost(1):
            if close_behind or close_ahead:
                ourCar.balance -= game.buyAcceleration(1)

        # Judicious Use of Shells
        if idx != 0 and close_ahead and ourCar.balance > game.getShellCost(1):
            game.buyShell(1)
        elif idx + 1 == len(cars) and ourCar.balance > game.getSuperShellCost(1):
            game.buySuperShell(1)

        # Defense When Leading
        if idx == 0:
            if ourCar.shield == 0 and ourCar.balance > game.getShieldCost(2):
                game.buyShield(2)
            elif not close_behind and ourCar.balance > game.getAccelerateCost(2):
                ourCar.balance -= game.buyAcceleration(2)

        # Adaptive Shielding
        potential_threats = [car for car in cars if car.speed > ourCar.speed and car.balance >= game.getShellCost(1)]
        if potential_threats and ourCar.shield == 0 and ourCar.balance > game.getShieldCost(1):
            game.buyShield(1)

This strategy aims to make more judicious decisions based on the current game state, rather than using a one-size-fits-all approach. The car will analyze its immediate surroundings and the potential actions of its competitors to make the best move. This will likely lead to better performance in various game scenarios.

Advanced game theory strategy

After a pretty successful attempt, we took it a bit further, getting creative with our prompts and asking the model to create a strategy based on game theory principles:

Output

Crafting the best strategy using game theory principles requires understanding the nature of the game, making assumptions about opponents, and considering different outcomes. To make our strategy unexploitable, we must assume that our opponents are also using optimal or near-optimal strategies. With that said, let's consider a game theory-inspired strategy:

  1. Nash Equilibrium: This principle asserts that in a game, players will adjust their strategies until no player can benefit from changing their own strategy while the other players keep theirs unchanged. For the car game, we can apply this by diversifying our actions, so that opponents cannot exploit a predictable behavior.

  2. Minimax Strategy: Aim to minimize our maximum possible loss. In terms of the car game, this means avoiding situations where a single action by an opponent could cause a significant drop in our position or speed.

  3. Value of Information: Prioritize actions that give more information about opponents' strategies. If we can predict their next moves, we can better counteract them.

Based on these principles, here's a game theory-inspired strategy:

class GameTheoryCar:
    def takeYourTurn(self, game, cars, bananas, idx):
        ourCar = cars[idx]
        competitors = [car for i, car in enumerate(cars) if i != idx]

        # Minimax Acceleration Strategy
        gaps = [abs(car.speed - ourCar.speed) for car in competitors]
        min_gap = min(gaps)
        
        # If the closest competitor (in terms of speed) is very close, prioritize acceleration.
        if min_gap <= 2 and ourCar.balance > game.getAccelerateCost(1):
            ourCar.balance -= game.buyAcceleration(1)

        # Judicious Use of Shells based on Nash Equilibrium
        potential_targets = [car for car in competitors if car.speed > ourCar.speed]
        
        # If there's a clear advantage in using a shell, do so.
        if potential_targets and ourCar.balance > game.getShellCost(1):
            closest_target = min(potential_targets, key=lambda car: abs(car.speed - ourCar.speed))
            if closest_target.speed - ourCar.speed <= 3:
                game.buyShell(1)

        # Defensive strategy using Nash Equilibrium
        if idx == 0:
            if ourCar.shield == 0 and ourCar.balance > game.getShieldCost(2):
                game.buyShield(2)
        
        # Value of Information Strategy: Monitor and learn from opponents' previous moves
        # Assuming we have a history of past moves (not shown in the code provided), analyze patterns in competitors' decisions.
        # If, for example, a certain competitor frequently uses shells when they are directly behind us, prioritize defense when they're close.
        
        # To implement this, you'd need a system to track the past moves of each competitor. Based on this history, 
        # make decisions to counteract their most likely next move.

        # Diversify actions for unpredictability.
        random_factor = random.random()
        if random_factor < 0.1 and ourCar.balance > game.getAccelerateCost(1):
            ourCar.balance -= game.buyAcceleration(1)
        elif random_factor < 0.2 and ourCar.balance > game.getShellCost(1):
            game.buyShell(1)

This strategy takes into consideration a mix of proactive and reactive decisions based on game theory principles. By considering the Nash Equilibrium and Minimax strategy, we aim to become unpredictable to our opponents while also countering their potential actions. The addition of randomness also ensures a degree of unpredictability, making our moves harder to anticipate.

However, it's important to note that the effectiveness of any strategy would need to be tested and refined through simulations or real gameplay.

We believe the GPT models demonstrate highly advanced "pure" reasoning skills. Given a highly limited context (code + simple list of rules), the models can generate coherent strategies with varying degrees of complexity and sophistication, while also responding meaningfully to user intent.
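Before a generated strategy goes anywhere near a real match, a cheap first check is to run one turn of it against a stub game object. The sketch below is an assumption-heavy illustration: `StubGame`, `StubCar`, and `smoke_test` are hypothetical names, the stub charges a flat price, and only the method names (`getAccelerateCost`, `buyAcceleration`, and so on) are taken from the generated code above. The real game object may behave differently.

```python
class StubCar:
    """Hypothetical car record with the fields the strategies read."""
    def __init__(self, balance, speed=0, shield=0):
        self.balance = balance
        self.speed = speed
        self.shield = shield

class StubGame:
    """Hypothetical stand-in for the game: flat prices, records purchases."""
    def __init__(self, price=100):
        self.price = price
        self.calls = []

    def _buy(self, name, amount):
        self.calls.append((name, amount))
        return self.price * amount

    def getAccelerateCost(self, amount): return self.price * amount
    def getShellCost(self, amount): return self.price * amount
    def getShieldCost(self, amount): return self.price * amount
    def buyAcceleration(self, amount): return self._buy("accelerate", amount)
    def buyShell(self, amount): return self._buy("shell", amount)
    def buyShield(self, amount): return self._buy("shield", amount)

def smoke_test(strategy_cls, n_cars=3, balance=1000):
    """Run one turn from every position; return (ok, detail)."""
    for idx in range(n_cars):
        game = StubGame()
        cars = [StubCar(balance, speed=5 * i) for i in range(n_cars)]
        try:
            strategy_cls().takeYourTurn(game, cars, [], idx)
        except Exception as exc:  # e.g. a NameError from a missing import
            return False, f"idx={idx}: {type(exc).__name__}: {exc}"
        if cars[idx].balance < 0:
            return False, f"idx={idx}: balance went negative"
    return True, "ok"

class AlwaysAccelerate:
    """Tiny well-formed strategy used to exercise the harness."""
    def takeYourTurn(self, game, cars, bananas, idx):
        ourCar = cars[idx]
        if ourCar.balance > game.getAccelerateCost(1):
            ourCar.balance -= game.buyAcceleration(1)

class BrokenCar:
    """Deliberately broken strategy: calls a name that does not exist."""
    def takeYourTurn(self, game, cars, bananas, idx):
        cars[idx].balance -= undefined_helper(game)

print(smoke_test(AlwaysAccelerate))
print(smoke_test(BrokenCar))
```

A check like this only catches crashes and obvious budget errors. It says nothing about whether a strategy is good, which still needs simulations or real gameplay.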

We are currently exploring optimal ways to leverage and integrate these powerful models, while remaining conscious of scope and utilization given OpenAI API constraints:

  • With the upcoming release of GPT-4 Turbo (a cheaper, faster version), we may be able to integrate GPT-4 more substantially into our system. For example, it could act as an agent that creates a highly detailed "recipe" for the code agent.

  • GPT-4 could be even stronger with additional context from a RAG approach, using datasets with Solidity tools and other resources.

  • GPT-4's strong reasoning skills could be integrated into the evaluation process (testing accuracy of weaker models) or as a higher-level coordinator agent triggered after iterations for final refinements.

The Missing Piece: Different Approaches to Accuracy Problems and Lack of Mathematical Intuition in LLMs

One of the most significant challenges associated with LLMs is their lack of accuracy in certain predictable tasks, particularly evident in mathematical problems. In the context of 0xTitans, the underlying mechanism is a VRGDA (Variable Rate Gradual Dutch Auction). Here, the price of every action (such as buying speed, shells, shields, and more) is determined by an exponential function and certain parameters. Essentially, the higher the demand for a certain action (high buy rate per turn), the higher the price; conversely, the lower the demand, the lower the price. Some changes in demand can lead to very strong price movements (a sharp increase or a drop to near zero).

Therefore, one crucial aspect of the game is budget management and price prediction.
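To illustrate how strongly demand moves the price, the sketch below reuses the Python pricing function that appears later in this note and varies only the number of units already sold. The parameter values (target price 1000, 10% decrease per turn, 2 sales per turn, turn 6) and the `price_by_demand` helper are assumptions chosen for illustration, not the game's real settings.

```python
import math

def computeActionPrice(targetPrice, perTurnPriceDecrease, turnsSinceStart, sold, sellPerTurnWad):
    perTurn = (sold + 1) / sellPerTurnWad
    turnStuff = (turnsSinceStart - 1) - perTurn
    decrease = math.log(1 - perTurnPriceDecrease)
    multiplier = math.exp(decrease * turnStuff)
    return int(targetPrice * multiplier)

# Assumed parameters, for illustration only.
TARGET_PRICE = 1000
DECREASE = 0.1
SELL_PER_TURN = 2
TURN = 6

def price_by_demand(sold_counts):
    """Price of the next unit at a fixed turn, for different amounts already sold."""
    return {
        sold: computeActionPrice(TARGET_PRICE, DECREASE, TURN, sold, SELL_PER_TURN)
        for sold in sold_counts
    }

prices = price_by_demand([1, 5, 9, 13, 17])
for sold, price in prices.items():
    print(f"sold={sold:>2} -> price={price}")
```

With these inputs, sold=9 is exactly on schedule, so the price equals the target of 1000. Fewer sales push the price below the target, and more sales push it above, with the gap growing exponentially.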

Currently, we are considering different approaches to endow agents with mathematical reasoning:

  • Using mathematical or financial models such as FinGPT or MathWizard.

  • Utilizing external tools, such as a calculator that executes during generation.

  • Employing code-based predictions.

For the last approach, here is one example of what we have done.

Instead of working directly with the mathematical notation for VRGDA:

(Image: VRGDA price formula)

We will use the actual code implementation (Python-based) and provide the model with some context to predict and then evaluate.

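import math

def computeActionPrice(targetPrice, perTurnPriceDecrease, turnsSinceStart, sold, sellPerTurnWad):
    # Turns the target schedule allots for selling (sold + 1) units.
    perTurn = (sold + 1) / sellPerTurnWad
    # Positive when sales lag the schedule, negative when they run ahead of it.
    turnStuff = (turnsSinceStart - 1) - perTurn
    # log(1 - decrease) is negative, so the multiplier shrinks as turnStuff grows.
    decrease = math.log(1 - perTurnPriceDecrease)
    multiplier = math.exp(decrease * turnStuff)
    return int(targetPrice * multiplier)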
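Read back into notation, the function above computes the following. This is derived directly from the code, using $\exp(x \ln(1-k)) = (1-k)^{x}$. The symbol names are our own shorthand and may not match the notation in the original formula.

```latex
p(t, n) = \left\lfloor\, p_0 \,(1 - k)^{\,(t - 1) - \frac{n + 1}{r}} \right\rfloor
\qquad
\begin{aligned}
p_0 &= \texttt{targetPrice}, & k &= \texttt{perTurnPriceDecrease}, \\
t   &= \texttt{turnsSinceStart}, & n &= \texttt{sold}, \quad r = \texttt{sellPerTurnWad}.
\end{aligned}
```

When sales are exactly on schedule, $(t - 1) = (n + 1)/r$, the exponent is zero and the price equals $p_0$. Each turn behind schedule multiplies the price by $(1 - k)$, and each turn ahead of schedule divides it by $(1 - k)$.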

After a bit of prompt engineering, we were able to achieve this impressive result:

I understand that you want a specific example. Let's do the calculation to identify the specific turn when the price will be "near zero".

Given:

  1. targetPrice = 1000

  2. We'll start with a perTurnPriceDecrease of 0.1 (10% decrease).

We'll now tweak sold and sellPerTurnWad and calculate:

import math

def computeActionPrice(targetPrice, perTurnPriceDecrease, turnsSinceStart, sold, sellPerTurnWad):
    perTurn = (sold + 1) / sellPerTurnWad
    turnStuff = (turnsSinceStart - 1) - perTurn
    decrease = math.log(1 - perTurnPriceDecrease)
    multiplier = math.exp(decrease * turnStuff)
    return int(targetPrice * multiplier)

targetPrice = 1000
perTurnPriceDecrease = 0.1

# Find the specific turn
for turnsSinceStart in range(1, 1000):
    price = computeActionPrice(targetPrice, perTurnPriceDecrease, turnsSinceStart, sold=10, sellPerTurnWad=2)
    if price < 1:
        break

turnsSinceStart, price

If we run the above code, it would give us a tuple where the first element is the turn count turnsSinceStart and the second element is the price which is near zero.

The output would be something like:

(74, 0)

Which means, after 74 turns, with a perTurnPriceDecrease of 10%, and given sold = 10 and sellPerTurnWad = 2, the price will become 0 (i.e., near zero). Note: The actual numbers might vary slightly, but this provides a direct example scenario for your request.
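The quoted answer describes what the code would return rather than an executed result, so it is worth running the search directly. The sketch below reuses the same function and inputs. `first_turn_below` is a helper name introduced here for illustration. With these inputs the price is `int(1000 * 0.9 ** (t - 6.5))`, which drops below 1 once `t - 6.5` exceeds `ln(0.001) / ln(0.9) ≈ 65.56`, so the loop should stop at turn 73, one turn earlier than the model's estimate and in line with its caveat that the numbers might vary slightly.

```python
import math

def computeActionPrice(targetPrice, perTurnPriceDecrease, turnsSinceStart, sold, sellPerTurnWad):
    perTurn = (sold + 1) / sellPerTurnWad
    turnStuff = (turnsSinceStart - 1) - perTurn
    decrease = math.log(1 - perTurnPriceDecrease)
    multiplier = math.exp(decrease * turnStuff)
    return int(targetPrice * multiplier)

def first_turn_below(threshold, targetPrice, perTurnPriceDecrease, sold, sellPerTurnWad, max_turns=1000):
    """Return (turn, price) for the first turn whose price is below threshold, else None."""
    for turn in range(1, max_turns):
        price = computeActionPrice(targetPrice, perTurnPriceDecrease, turn, sold, sellPerTurnWad)
        if price < threshold:
            return turn, price
    return None

result = first_turn_below(1, targetPrice=1000, perTurnPriceDecrease=0.1, sold=10, sellPerTurnWad=2)
print(result)  # (73, 0)
```

This is the pattern we have in mind for code-based predictions: let the model write the calculation, then execute it rather than trusting the number it states.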

Tech Stack And Evaluation Updates

We are currently conducting most of our model testing on RunPod, which we find quite cost-effective and which offers strong UI plugins.

We are primarily utilizing models from TheBloke, specifically AWQ models, which have proven to be both efficient and reliable.

Our focus remains on evaluating performance in Solidity code generation. So far, one of the best-performing and most dependable models has been Phind-CodeLlama-34B-v2.

We are also exploring the fine-tuning of models with Solidity datasets and wish to begin with phi-1, a highly compact yet efficient code model.

Regarding our agent framework, we are currently working with AutoGen. We have recently spent some time refactoring it and making changes to better suit our use case.

If you are working on LLM applications, want to learn more, or have an interesting use case for Web3, please reach out to us via:
MatchboxDAO
Nate, Product Lead and Research
Nick, Tech Lead



For the recent article check:

https://mirror.xyz/matchboxdao.eth/u1ivlqsagVAfnDr7SAIDZF4WXXl37J_KrB4HQyZTQQg