Last Week in AI #308 - The Leaderboard Illusion, ChatGPT Glazing, Qwen 3, Ernie X1
OpenAI undoes its glaze-heavy ChatGPT update, Alibaba unveils Qwen 3, a family of ‘hybrid’ AI reasoning models , Baidu ERNIE X1 and 4.5 Turbo boast high performance at low cost
Top News
The Leaderboard Illusion
The authors of this paper argue that the over-reliance on a single leaderboard can lead to overfitting and gaming of the system, rather than genuine technological advancement. They conducted a systematic review of the Chatbot Arena, analyzing data from 2 million battles, 42 providers, and 243 models over a fixed time peri…
Keep reading with a 7-day free trial
Subscribe to Last Week in AI to keep reading this post and get 7 days of free access to the full post archives.