My coworker made 14 LLMs fight each other in 314 Street Fighter III matches. Claude 3 Haiku is the current leader.

An Unconventional Benchmarking Challenge: LLMs Take on Street Fighter III

One of my colleagues took an unconventional approach to benchmarking large language models (LLMs). Using Amazon Bedrock, they ran a tournament in which 14 different LLMs played a total of 314 matches of the classic game Street Fighter III.

To rank the models, my coworker devised a chess-inspired Elo rating system, in which each model's rating moves up or down after every match depending on the result and the opponent's rating. Claude 3 Haiku currently leads the ranking.
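For readers unfamiliar with Elo, here is a minimal sketch of the standard chess-style update. It is not my coworker's actual code: the K-factor of 32, the starting rating of 1500, and the model names in the usage example are all assumptions chosen for illustration.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a draw.
    k=32 is an assumed K-factor, not a value taken from the experiment.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating stays constant.
    return rating_a + delta, rating_b - delta


def rank(matches, start: float = 1500.0, k: float = 32.0):
    """Replay (winner, loser) pairs in order and return models sorted by rating.

    start=1500 is an assumed initial rating for every model.
    """
    ratings = defaultdict(lambda: start)
    for winner, loser in matches:
        ratings[winner], ratings[loser] = update_elo(
            ratings[winner], ratings[loser], 1.0, k
        )
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical match log; the names are placeholders, not real results.
    log = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
    for name, rating in rank(log):
        print(f"{name}: {rating:.1f}")
```

Because a win against a higher-rated opponent is worth more than a win against a lower-rated one, the final order depends on who beat whom, not only on the raw win count.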

For the details of the experiment and the match results, check out the full discussion here. The analysis shows how each of the LLMs fared against the others.
