Our 216th episode with a summary and discussion of last week's big AI news!
Recorded on 07/11/2025
Hosted by Andrey Kurenkov and Jeremie Harris.
Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai
In this episode:
xAI launches Grok 4 with breakthrough performance across benchmarks, becoming the first true frontier model outside established labs, alongside a $300/month subscription tier
Grok's alignment challenges emerge with antisemitic responses, highlighting the difficulty of steering models toward "truth-seeking" without harmful biases
Perplexity and OpenAI launch AI-powered browsers to compete with Google Chrome, signaling a major shift in how users interact with AI systems
METR study finds AI tools actually slowed experienced open-source developers by about 20% on real tasks in their own repositories, contradicting expectations and anecdotal reports of productivity gains
Timestamps + Links:
(00:00:10) Intro / Banter
(00:01:02) News Preview
Tools & Apps
(00:01:59) Elon Musk's xAI launches Grok 4 alongside a $300 monthly subscription | TechCrunch
(00:15:28) Elon Musk’s AI chatbot is suddenly posting antisemitic tropes
(00:29:52) Perplexity launches Comet, an AI-powered web browser | TechCrunch
(00:32:54) OpenAI is reportedly releasing an AI browser in the coming weeks | TechCrunch
(00:33:27) Replit Launches New Feature for its Agent, CEO Calls it ‘Deep Research for Coding’
(00:34:40) Cursor launches a web app to manage AI coding agents
(00:36:07) Cursor apologizes for unclear pricing changes that upset users | TechCrunch
Applications & Business
(00:49:54) Ilya Sutskever becomes CEO of Safe Superintelligence after Meta poached Daniel Gross
(00:52:46) OpenAI’s Stock Compensation Reflect Steep Costs of Talent Wars
Projects & Open Source
(00:58:04) Hugging Face Releases SmolLM3: A 3B Long-Context, Multilingual Reasoning Model | MarkTechPost
(00:58:33) Kimi K2: Open Agentic Intelligence
Research & Advancements
(01:02:14) Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
(01:07:58) Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
(01:13:03) Mitigating Goal Misgeneralization with Minimax Regret
(01:17:01) Correlated Errors in Large Language Models
Policy & Safety
(01:22:53) Evaluating Frontier Models for Stealth and Situational Awareness
(01:25:49) When Chain of Thought is Necessary, Language Models Struggle to Evade Monitors
(01:30:09) Why Do Some Language Models Fake Alignment While Others Don't?
(01:34:35) 'Positive review only': Researchers hide AI prompts in papers
(01:35:40) Google faces EU antitrust complaint over AI Overviews
(01:37:30) Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark