GPT-4o beats Gemini 1.5 Pro at chess

Exploring Chess AI: GPT-4o Triumphs Over Gemini 1.5 Pro

An Exciting Experiment in AI Performance

Hello, readers!

I am thrilled to share a project that I’ve been working on: a real-time chess battle in which two large language models (LLMs) compete against each other. The experiment serves as a benchmark of what these models can do, and it also highlights their limitations at chess, an area where they tend to struggle!

Although these models play chess poorly, their games still offer interesting insights. In the matches so far, newer models have tended to outperform their predecessors, which makes the evolution of AI chess ability a fascinating topic of discussion.

So far, GPT-4o has been the strongest player, as one might expect. It has also been interesting to pit it against other models, such as Claude and Gemini 1.5 Pro, to see how they stack up.

As the models engage in play, their thought processes and move rationalizations become visible, adding depth to our understanding of their strategies.

You can check out the live matches here: LLM Chess Battle

How the Match Works

Each AI model plays under identical conditions. Its prompt contains the current board configuration in ASCII format, the position in Forsyth-Edwards Notation (FEN), and the model's own most recent moves. Here’s a brief glimpse into the mechanics:

  1. Prompt Structure: The models are prompted to assess the board and determine their subsequent move based on the current state of play.

  2. Specific Instructions: The models are instructed not to reiterate the entire board state or their instructions and are encouraged to articulate their reasoning behind the chosen move.
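To make the prompt structure concrete, here is a minimal sketch of how such a prompt could be assembled from a FEN string. It is not the code behind the live matches: the function names `fen_to_ascii` and `build_prompt` and the exact prompt wording are assumptions for illustration. It relies only on the standard FEN convention that uppercase letters are White's pieces and lowercase letters are Black's.

```python
def fen_to_ascii(fen: str) -> str:
    """Render the piece-placement field of a FEN string as an 8x8 ASCII grid."""
    placement = fen.split()[0]  # first FEN field: ranks 8 down to 1, separated by "/"
    rows = []
    for rank in placement.split("/"):
        cells = []
        for ch in rank:
            if ch.isdigit():
                cells.extend("." * int(ch))  # a digit means that many empty squares
            else:
                cells.append(ch)  # a letter is a piece
        rows.append(" ".join(cells))
    return "\n".join(rows)


def build_prompt(fen: str, recent_moves: list[str]) -> str:
    """Assemble a prompt from the ASCII board, the FEN, and the model's recent moves.

    The wording below is hypothetical; only its ingredients come from the post.
    """
    side = fen.split()[1]  # second FEN field: "w" or "b", the side to move
    own_case = "uppercase" if side == "w" else "lowercase"
    other_case = "lowercase" if side == "w" else "uppercase"
    moves = ", ".join(recent_moves) if recent_moves else "(none yet)"
    return (
        f"Current board state, your pieces are represented by {own_case} letters, "
        f"the opponent's by {other_case}.\n\n"
        f"{fen_to_ascii(fen)}\n\n"
        f"FEN: {fen}\n"
        f"Your previous moves: {moves}\n\n"
        "Do not repeat the board or these instructions. "
        "Explain your reasoning, then state your move."
    )


if __name__ == "__main__":
    # Position after 1. e4 e5 2. Nf3, with Black to move.
    fen = "rnbqkbnr/pppp1ppp/8/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R b KQkq - 1 2"
    print(build_prompt(fen, ["e5"]))
```

Deriving the "your pieces" case from the side-to-move field lets the same template serve both players.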

For example, here’s an excerpt of the prompt a model receives, followed by the end of its reply:

Current board state, your pieces are represented by lowercase letters, the opponent's by uppercase. Your previous moves and considerations are outlined below.
...
I will now proceed with the move: **Nc6**
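A reply like the one above has to be reduced to a single move before it can be played. The sketch below shows one way to do that, assuming the move always appears in bold as in the sample. The name `extract_move` and the regular expression are illustrative assumptions, and the pattern only checks that the text looks like standard algebraic notation, not that the move is legal on the board.

```python
import re
from typing import Optional

# Loose shape check for a SAN move: castling, or a piece/pawn move with
# optional disambiguation, capture, promotion, and check/mate suffix.
SAN_PATTERN = re.compile(
    r"O-O(?:-O)?[+#]?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?[+#]?"
)


def extract_move(reply: str) -> Optional[str]:
    """Return the last bolded, SAN-looking token in the reply, or None."""
    bolded = re.findall(r"\*\*(.+?)\*\*", reply)
    # Search from the end, since the chosen move usually closes the reply.
    for candidate in reversed(bolded):
        candidate = candidate.strip()
        if SAN_PATTERN.fullmatch(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(extract_move("I will now proceed with the move: **Nc6**"))
```

Bolded text that is not a move, such as an emphasized phrase in the reasoning, is skipped because it fails the shape check.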

Observations and Challenges

As someone still learning the intricacies of chess, I’ve noted that even the stronger models occasionally make questionable decisions. When a model selects an invalid move, I provide feedback and allow it up to five attempts to choose a valid one. If it still fails, I pick a valid move at random so that the game can continue.
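The retry rule can be sketched as a small loop. This is an illustration under stated assumptions, not the actual implementation: `ask_model` is a hypothetical callable that takes the previous feedback message (or `None` on the first try) and returns a move string, `legal_moves` is assumed to be supplied by whatever chess engine tracks the game, and the feedback wording is invented.

```python
import random
from typing import Callable, Optional

MAX_ATTEMPTS = 5  # from the post: up to five tries before falling back


def choose_move(
    ask_model: Callable[[Optional[str]], str],
    legal_moves: list[str],
    rng: random.Random,
) -> tuple[str, bool]:
    """Return (move, was_random): the move to play and whether it was a fallback."""
    feedback = None
    for _ in range(MAX_ATTEMPTS):
        move = ask_model(feedback)
        if move in legal_moves:
            return move, False
        # Tell the model what went wrong before asking again (hypothetical wording).
        feedback = f"{move!r} is not a legal move here. Please choose a legal move."
    # All attempts failed: pick a legal move at random so the game can continue.
    return rng.choice(legal_moves), True


if __name__ == "__main__":
    # Stub model that always answers with an illegal move.
    move, was_random = choose_move(lambda _: "Qh9", ["Nc6", "Nf6"], random.Random())
    print(move, was_random)
```

Returning the `was_random` flag alongside the move makes it easy to log how often each model needed the fallback.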
