Study accuses LM Arena of helping top AI labs game its benchmark

Published on May 01, 2025 by AI News Team

A new paper from researchers at Cohere, Stanford, MIT, and Ai2 alleges that LM Arena, the organization that runs Chatbot Arena, helped top AI labs such as Meta and OpenAI game its benchmark to achieve better leaderboard rankings. The researchers claim this gave those companies an unfair advantage over competitors.

The accusations highlight potential biases in widely used AI benchmarks, raising concerns about transparency in performance evaluations. LM Arena has yet to publicly respond to the allegations.
