Last Week in AI #308 - The Leaderboard Illusion, ChatGPT Glazing, Qwen 3, Ernie X1

OpenAI undoes its glaze-heavy ChatGPT update; Alibaba unveils Qwen 3, a family of ‘hybrid’ AI reasoning models; Baidu ERNIE X1 and 4.5 Turbo boast high performance at low cost

May 02, 2025

Top News

The Leaderboard Illusion

The authors of this paper argue that over-reliance on a single leaderboard can lead to overfitting and gaming of the system rather than genuine technological advancement. They conducted a systematic review of Chatbot Arena, analyzing data from 2 million battles, 42 providers, and 243 models over a fixed time peri…
