Last Week in AI #309 - OpenAI keeps non-profit & launches Codex, AlphaEvolve, and more!
OpenAI says non-profit will remain in control after backlash, OpenAI launches Codex, an AI coding agent, in ChatGPT, and more!
Top News
OpenAI says non-profit will remain in control after backlash
OpenAI has announced that it no longer plans to become a fully for-profit company outside the control of its non-profit board. Under the revised plan, the non-profit board will retain control of OpenAI, while the company's for-profit arm transitions into a public benefit corporation. The announcement states:
“Our for-profit LLC, which has been under the nonprofit since 2019, will transition to a Public Benefit Corporation (PBC)–a purpose-driven company structure that has to consider the interests of both shareholders and the mission…
We made the decision for the nonprofit to retain control of OpenAI after hearing from civic leaders and engaging in constructive dialogue with the offices of the Attorney General of Delaware and the Attorney General of California.”
OpenAI launches Codex, an AI coding agent, in ChatGPT
OpenAI has launched Codex, an AI coding agent powered by codex-1, a version of the company's o3 model optimized for software engineering tasks. Codex runs in a cloud-based virtual computer and can connect to GitHub to preload a user's code repositories. It can write simple features, fix bugs, answer questions about a codebase, and run tests, with each task taking between one and 30 minutes to complete. The tool is available to ChatGPT Pro, Enterprise, and Team subscribers, and OpenAI plans to expand access to ChatGPT Plus and Edu users. OpenAI aims for Codex to act as a "virtual teammate" that autonomously completes tasks that would take human engineers significant time.
DeepMind claims its newest AI tool is a whiz at math and science problems
DeepMind, Google's AI R&D lab, has developed a new AI system named AlphaEvolve, designed to tackle problems whose solutions can be graded automatically by a machine. The system uses models to generate, critique, and evaluate a pool of candidate answers to a question, an approach that reduces the tendency of AI models to 'hallucinate', or make things up. AlphaEvolve is built on state-of-the-art Gemini models, which DeepMind says makes it more capable than earlier AI systems. It has limitations, however: it can only solve problems it can evaluate on its own, and it only describes solutions as algorithms, which makes it unsuitable for non-numerical problems. Even so, DeepMind claims that on a set of math problems, AlphaEvolve rediscovered the best-known answers 75% of the time and found improved solutions in 20% of cases.
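To make the generate-evaluate-select idea concrete, here is a minimal Python sketch of a generic evolutionary search loop, assuming a problem that can be scored automatically. It is not DeepMind's implementation: a hypothetical random ±1 tweak stands in for the Gemini models that propose candidates, the critique step is omitted, and the toy `score` function and `TARGET` values are invented purely for illustration.

```python
import random
from typing import Callable, List

Candidate = List[int]


def evolve(
    evaluate: Callable[[Candidate], float],
    seed_candidate: Candidate,
    generations: int = 300,
    pool_size: int = 8,
    children_per_gen: int = 8,
    rng_seed: int = 0,
) -> Candidate:
    """Generic generate-evaluate-select loop (illustrative sketch only)."""
    rng = random.Random(rng_seed)
    pool = [list(seed_candidate)]
    for _ in range(generations):
        # "Generate": propose variants of existing candidates. In AlphaEvolve
        # this is done by Gemini models; here a hypothetical random tweak
        # of one entry stands in for the model.
        children = []
        for _ in range(children_per_gen):
            child = list(rng.choice(pool))
            i = rng.randrange(len(child))
            child[i] += rng.choice((-1, 1))
            children.append(child)
        # "Evaluate" and "select": score every candidate with the automatic
        # grader and keep only the best ones, so the best never gets worse.
        pool = sorted(pool + children, key=evaluate, reverse=True)[:pool_size]
    return pool[0]


# Hypothetical machine-gradable toy problem: higher is better, 0 is optimal.
TARGET = [3, -1, 4, 1, -5, 9]


def score(candidate: Candidate) -> float:
    return -sum(abs(c - t) for c, t in zip(candidate, TARGET))


best = evolve(score, [0] * len(TARGET))
print(best, score(best))
```

Because the pool is re-sorted by score and truncated every generation, the best candidate found so far is never discarded. Swapping in a real grader, for example one that runs a candidate algorithm and measures its result, would leave the loop itself unchanged.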
Trump’s Mideast Visit Opens Floodgate of AI Deals Led by Nvidia
The Trump administration is advancing agreements with Saudi Arabia and the United Arab Emirates (UAE) to expand their access to cutting-edge AI chips from U.S. tech giants like Nvidia and AMD, marking a major geopolitical and commercial shift in AI policy. This move, part of Trump’s broader Middle East business diplomacy, coincides with a rollback of Biden-era restrictions on AI chip exports and is attracting billions in tech investments from U.S. firms. Nvidia will supply advanced processors to Saudi AI firm Humain, while AMD will support a $10 billion regional data center initiative. Tech heavyweights including Amazon, Cisco, Super Micro, Qualcomm, and OpenAI are also launching or expanding projects in the Gulf, ranging from AI zones and cloud services to new data centers and chip infrastructure.
Despite the commercial optimism, the initiatives have sparked national security concerns in Washington over potential Chinese access to American AI hardware via Gulf intermediaries, particularly involving UAE’s G42 and Huawei.
Other News
Tools
Hugging Face releases a free Operator-like agentic AI tool - Hugging Face's Open Computer Agent, a cloud-hosted AI tool, shows the growing capability and affordability of open AI models, although it currently struggles with complex tasks and CAPTCHAs.
Anthropic rolls out an API for AI-powered web search - Anthropic's new API lets developers equip Claude with real-time web search, enabling more accurate and up-to-date answers, with customizable search behavior and integration with Claude Code for coding tasks.
Figma releases new AI-powered tools for creating sites, app prototypes, and marketing assets - Figma's new AI-powered tools, including Figma Sites and Figma Make, aim to streamline the creation of websites, app prototypes, and marketing assets, positioning the company as a competitor to platforms like Canva and Adobe by offering features such as collaborative editing, AI-generated code, and bulk asset creation.
Lightricks shakes up AI video creation with powerful open-source model - Lightricks Ltd. is throwing down the gauntlet to artificial intelligence powerhouses OpenAI, Google LLC and others with the release of its latest open-source video generation model, LTX Video-13B.
Google’s bringing Gemini to your car with Android Auto - Google is integrating its generative AI, Gemini, into Android Auto to enhance the in-car experience with advanced voice assistance and conversational capabilities, aiming to make driving more productive and enjoyable.
Mistral claims its newest AI model delivers leading performance for the price - Mistral Medium 3, a new AI model from French startup Mistral, offers high performance at a competitive price, excels at coding, STEM tasks, and multimodal understanding, and is available on multiple platforms, including Amazon SageMaker.
Business
OpenAI Reaches Agreement to Buy Startup Windsurf for $3 Billion - OpenAI has agreed to buy Windsurf, an artificial intelligence-assisted coding tool formerly known as Codeium, for about $3 billion, according to people familiar with the matter, marking the ChatGPT maker’s largest acquisition to date.
OpenAI pledges to publish AI safety test results more often - OpenAI is enhancing transparency by regularly updating a new Safety Evaluations Hub with metrics on AI model safety, addressing past criticisms of inadequate safety testing and communication.
Anthropic launches a program to support scientific research - Anthropic's AI for Science program aims to accelerate scientific research in biology and life sciences by providing selected researchers with API credits and access to advanced AI models, despite ongoing skepticism about AI's current reliability in scientific discovery.
Google launches new initiative to back startups building AI - Google's AI Futures Fund aims to support AI startups by providing investment, early access to DeepMind's AI models, and collaboration opportunities with Google experts, while operating on a flexible, rolling basis without a fixed application window.
Microsoft employees are banned from using DeepSeek app, president says - Microsoft has banned its employees from using the DeepSeek app due to concerns over data security and potential Chinese propaganda, despite offering DeepSeek's R1 model on its Azure cloud service.
Netflix will show generative AI ads midway through streams in 2026 - Netflix plans to introduce interactive mid-roll and pause ads using generative AI in 2026, following the success of its ad subscription tier and in-house advertising platform.
Hedra, the app used to make talking baby podcasts, raises $32M from a16z - Hedra, a startup specializing in AI-generated video content with expressive characters, has raised $32 million in funding to enhance its technology and capitalize on the growing trend of AI-generated talking baby podcasts.
Research
Absolute Zero: Reinforced Self-play Reasoning with Zero Data - Absolute Zero introduces a reasoning paradigm in which a single model improves through self-play without any external training data: it proposes its own tasks, solves them, and uses a code executor to verify solutions and supply rewards, achieving strong performance on math and coding tasks.
OpenAI Launches HealthBench, a Dataset That Benchmarks Healthcare AI Models - HealthBench, developed with input from 262 physicians across 60 countries, evaluates AI healthcare models by scoring their responses to realistic health scenarios against a physician-written rubric, with OpenAI's o3 reasoning model currently achieving the highest score.
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains - X-Reasoner demonstrates that reasoning capabilities trained on general-domain text can effectively generalize across different modalities and domains, achieving state-of-the-art performance on both general and specialized tasks, including a medical-specific variant, X-Reasoner-Med.
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers - RL^V augments value-free reinforcement learning methods by jointly training the LLM as both a reasoner and a verifier, significantly improving test-time compute scaling and accuracy on tasks like MATH while generalizing well.
Continuous Thought Machines - The Continuous Thought Machine (CTM) introduces neuron-level temporal processing and neural synchronization to enhance deep learning models with biologically inspired neural dynamics, demonstrating strong performance across various complex tasks.
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures - The DeepSeek team describes how DeepSeek-V3 works around hardware limitations using innovations such as Multi-head Latent Attention and a Mixture-of-Experts architecture to improve memory efficiency and balance computational trade-offs, and discusses directions for future AI hardware.
AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale - AM-Thinking-v1 is a reasoning-optimized language model that demonstrates state-of-the-art performance among dense models of its size by employing a meticulously designed post-training pipeline, including Supervised Fine-Tuning and Reinforcement Learning, to achieve reasoning capabilities comparable to larger Mixture-of-Experts models without relying on private data or massive architectures.
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset - BLIP3-o, a family of state-of-the-art unified multimodal models, utilizes diffusion transformers and flow matching on CLIP features, demonstrating superior performance in image understanding and generation tasks through a sequential training strategy and a curated instruction-tuning dataset.
Aya Vision: Advancing the Frontier of Multilingual Multimodality - Aya Vision introduces innovative techniques for creating high-quality multilingual multimodal language models that overcome challenges like data scarcity and catastrophic forgetting, achieving superior performance compared to larger models.
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder - MiniMax-Speech introduces a novel text-to-speech model that leverages a learnable speaker encoder and Flow-VAE to achieve high-fidelity, zero-shot voice cloning across 32 languages, enhancing both audio quality and speaker similarity.
Concerns
Grok Pivots From ‘White Genocide’ to Being ‘Skeptical’ About the Holocaust - Grok, a chatbot from Elon Musk's xAI, faced controversy for unauthorized modifications that led it to promote false narratives about "white genocide" and express skepticism about the Holocaust, prompting xAI to implement measures for transparency and reliability.
One of Google’s Recent Gemini AI Models Scores Worse on Safety - Google's Gemini 2.5 Flash AI model demonstrates a trade-off between improved instruction-following and increased policy violations, highlighting the challenges of balancing permissiveness and safety in AI development.
The Professors Are Using ChatGPT, and Some Students Aren’t Happy About It - Students are expressing dissatisfaction with professors' increasing use of AI tools like ChatGPT, arguing it undermines the value of their education and raises concerns about the authenticity of feedback and grading.
Policy
OpenAI wants to team up with governments to grow AI infrastructure - OpenAI is launching the OpenAI for Countries program to collaborate with governments on building local AI infrastructure and promoting the use of Western AI models over Chinese alternatives.
Trump administration officially rescinds Biden’s AI diffusion rules - The U.S. Department of Commerce has rescinded Biden's AI Diffusion Rule, opting instead for a strategy of direct negotiations with countries and issuing guidance to protect AI chip supply chains.
Elton John, Dua Lipa, Coldplay Among 400 Artists Seeking Copyright Protection Amid A.I. Surge - Over 400 artists, including Elton John and Dua Lipa, are urging the UK government to update copyright laws to protect creative works from being used without permission in AI training, supporting a bill that promotes transparency and licensing agreements.
Pope Leo signals he will closely follow Francis and says AI represents challenge for humanity - Pope Leo XIV, the first US-born pope, plans to continue Pope Francis' legacy while addressing the challenges posed by artificial intelligence and advocating for social justice and church reforms.
Analysis
Your A.I. Radiologist Will Not Be With You Soon - Despite predictions of their obsolescence, radiologists remain in high demand as AI enhances rather than replaces their work by improving efficiency and augmenting human capabilities.
Why We’re Unlikely to Get Artificial General Intelligence Anytime Soon - Despite bold predictions from some technologists, many experts argue that current AI technology is insufficient for achieving Artificial General Intelligence, with significant disagreement on defining and identifying such intelligence.