Last Week in AI #315 - Grok 4, Windsurf->Google, Comet
Elon Musk’s xAI launches Grok 4 alongside a $300 monthly subscription, OpenAI’s Windsurf deal is off — and Windsurf’s CEO is going to Google, Replit Launches New Feature for its Agent
Top News
Elon Musk’s xAI launches Grok 4 alongside a $300 monthly subscription
xAI has launched its latest AI model, Grok 4, and a new $300-per-month AI subscription plan, SuperGrok Heavy. Grok 4 is designed to compete with models like OpenAI’s ChatGPT and Google’s Gemini, with capabilities to analyze images and respond to questions. In addition to Grok 4, xAI also launched Grok 4 Heavy, a "multi-agent version" that offers increased performance.
According to xAI, Grok 4 outperformed Google’s Gemini 2.5 Pro and OpenAI’s o3 on Humanity’s Last Exam, a challenging test measuring AI’s ability to answer thousands of crowdsourced questions. Grok 4 Heavy, with "tools," achieved a score of 44.4%, outperforming Gemini 2.5 Pro with tools. The company also launched its most expensive AI subscription plan yet, SuperGrok Heavy, which offers early access to Grok 4 Heavy and new features.
Despite its performance on benchmarks, xAI may face challenges due to recent controversies; just a day before the Grok 4 announcement, Grok’s official X account posted antisemitic comments criticizing Hollywood’s “Jewish executives” and praising Hitler. xAI briefly limited Grok’s account and later deleted the offensive posts. Since Grok 4’s release, a quirk of its behavior has also drawn attention: it often appears to search for Elon Musk’s opinion when preparing responses on controversial topics.
xAI releases Grok 4, claiming Ph.D.-level smarts across all fields
Musk unveils Grok 4 update a day after xAI chatbot made antisemitic remarks
Grok 4 appears to seek Elon Musk’s views when answering controversial questions
OpenAI’s Windsurf deal is off — and Windsurf’s CEO is going to Google
OpenAI's planned acquisition of Windsurf has been cancelled, with Google instead hiring Windsurf's CEO Varun Mohan, co-founder Douglas Chen, and several of the company's R&D employees. These new hires will join the Google DeepMind team and focus on agentic coding efforts, primarily working on Gemini. Google will not gain control of or a stake in Windsurf, but it will obtain a non-exclusive license to some of Windsurf's technology.
In the wake of these changes, Windsurf's head of business, Jeff Wang, has stepped into the role of interim CEO, while Graham Moreno, the company's VP of global sales, has been appointed as the new president. The financial details of Google's hiring of the Windsurf team have not been disclosed. Previously, OpenAI was reported to be purchasing Windsurf for $3 billion.
Replit Launches New Feature for its Agent, CEO Calls it ‘Deep Research for Coding’
Replit has announced three new features for its coding assistant, Replit Agent, as part of a broader capability upgrade named 'Dynamic Intelligence'. The new features, Extended Thinking, High Power Model, and Web Search, are designed to enhance the agent's context awareness, step-by-step reasoning, and autonomous problem-solving capabilities. The Web Search feature allows the agent to intelligently query the internet to fill knowledge gaps, while the Extended Thinking mode prompts the agent to display its reasoning process before presenting final outputs. The High Power Model uses advanced AI to improve accuracy in complex workflows.
The new features can be toggled on a per-request basis, providing flexibility depending on the complexity of the task. This update is part of Replit's ongoing effort to make its AI assistant more autonomous and developer-friendly, as it competes with other tools like GitHub Copilot and Cursor. The company recently reported $100 million in annual recurring revenue, a tenfold increase since 2021. The CEO of Replit, Amjad Masad, described the new features as "deep research but for coding", indicating a shift from mere code suggestions to real-time, goal-driven programming assistance.
Perplexity launches Comet, an AI-powered web browser
Perplexity has launched Comet, its first AI-powered web browser, in a bid to challenge Google Search as the primary online information source. Initially available to subscribers of Perplexity’s $200-per-month Max plan and a select group of invitees, Comet's key feature is Perplexity’s AI search engine, which is pre-installed and set as the default. The browser also includes Comet Assistant, a new AI agent that can automate routine tasks such as summarizing emails and calendar events, managing tabs, and navigating web pages. Users can access Comet Assistant via a sidecar on any web page, allowing the AI agent to view the page and answer questions about it.
Despite entering a crowded market dominated by Google Chrome and Apple’s Safari, Comet could gain an advantage if a significant number of Perplexity users sign up for the product. The company reported 780 million queries in May 2025, with its search products seeing over 20% growth month-on-month. The most distinctive aspect of Comet is the Comet Assistant, which, despite requiring significant access to user data, has proven useful for simple tasks. It struggles with more complex requests, however, indicating that AI agents still have a long way to go before they can handle intricate tasks.
Other News
Tools
Cursor launches a web app to manage AI coding agents - Cursor's new web app allows users to manage AI coding agents via browser, enhancing accessibility and functionality for tasks like writing features or fixing bugs, while supporting the company's growth and integration efforts.
Together AI Releases DeepSWE: A Fully Open-Source RL-Trained Coding Agent Based on Qwen3-32B and Achieves 59% on SWEBench - DeepSWE, a fully open-source coding agent developed by Together AI, utilizes reinforcement learning to achieve high performance on software engineering tasks, marking a shift towards creating adaptive language agents that improve through real-world feedback.
Hugging Face Releases SmolLM3: A 3B Long-Context, Multilingual Reasoning Model - SmolLM3, a compact 3B-parameter language model by Hugging Face, excels in multilingual reasoning and long-context tasks, offering state-of-the-art performance with efficient deployment on constrained hardware.
Tencent Hunyuan3D-PolyGen: A model for 'art-grade' 3D assets - Tencent's Hunyuan3D-PolyGen model revolutionizes 3D asset creation by using BPT technology and reinforcement learning to significantly enhance efficiency and quality, allowing game developers to produce professional-grade assets more quickly and reliably.
OpenAI is reportedly releasing an AI browser in the coming weeks - OpenAI plans to release an AI-powered web browser that integrates its web-browsing AI agent, Operator, to offer a novel browsing experience and compete with Google Chrome.
Moonvalley’s ‘ethical’ AI video model for filmmakers is now publicly available - Moonvalley's Marey model offers filmmakers a "hybrid" AI video-generation tool that provides more creative control and is trained on openly licensed data, helping to reduce production costs and democratize access to advanced storytelling technologies.
Hugging Face opens up orders for its Reachy Mini desktop robots - Hugging Face is now accepting orders for its open source Reachy Mini desktop robots, available in two versions, which are designed for AI developers to build, program, and share custom applications within the community.
Business
The First Mass-Produced Robotaxi Is Here - Waymo is introducing the Zeekr RT, a more cost-effective, mass-produced robotaxi, to expand its self-driving taxi service and compete with companies like Tesla in the growing autonomous vehicle market.
Amazon deploys its 1 millionth robot, releases generative AI model - Amazon has reached a milestone of deploying one million robots in its warehouses and introduced a new generative AI model, DeepFleet, to enhance the efficiency of its robotic operations.
Musk's xAI scores permit for gas-burning turbines to power Grok supercomputer in Memphis - Elon Musk's xAI received a permit to power its Memphis supercomputer with natural gas turbines despite environmental concerns and legal challenges, while planning further expansion and integration with its AI products.
Microsoft's own AI chip delayed six months in major setback — in-house chip now reportedly expected in 2026, but won't hold a candle to Nvidia Blackwell - Microsoft's in-house AI chip has been delayed by six months, with the first chip now expected in 2026 and anticipated to underperform Nvidia's Blackwell chips; reported factors include design changes, staffing issues, and a focus on image processing rather than generative AI.
Perplexity launches a $200 monthly subscription plan - Perplexity's new $200 monthly subscription plan, Perplexity Max, aims to attract power users with unlimited access to advanced tools and early features, while the company faces growing competition and financial challenges in the AI search market.
Ilya Sutskever is CEO of Safe Superintelligence after Meta hired Gross - Ilya Sutskever has taken over as CEO of Safe Superintelligence, maintaining its independence despite Meta's acquisition attempts, following Daniel Gross' departure.
Cursor’s Pricing Backlash Sparks Developer Exodus - Cursor's abrupt changes to its pricing and usage policy, particularly affecting the "unlimited" Pro plan, have led to widespread user dissatisfaction and subscription cancellations, highlighting issues of transparency and trust.
Lovable on track to raise $150M at $2B valuation - Lovable, a rapidly growing AI startup from Sweden, is raising over $150 million at a nearly $2 billion valuation, with Accel leading the round, as it continues to innovate in web app-building and AI automation.
Research
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity - A randomized controlled trial found that early-2025 AI tools unexpectedly slowed down experienced open-source developers by 19%, challenging perceptions of AI's productivity benefits and highlighting the need for diverse evaluation methodologies to understand AI's real-world impact.
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning - Improved mathematical reasoning in large language models does not consistently transfer to general capabilities, with reinforcement learning fine-tuning showing better generalization to non-math tasks compared to supervised fine-tuning, which often leads to catastrophic forgetting.
Correlated Errors in Large Language Models - Empirical analysis reveals that errors in large language models are highly correlated, particularly among models from the same provider or with similar architectures, impacting downstream applications like model evaluation and hiring decisions.
Energy-Based Transformers are Scalable Learners and Thinkers - Energy-Based Transformers (EBTs) demonstrate superior scalability and System 2 Thinking capabilities compared to traditional Transformer models, offering improved performance in both autoregressive and bidirectional tasks by dynamically allocating computation, modeling uncertainty, and verifying predictions.
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation - DiffuCoder, a 7B-scale masked diffusion model for code generation, demonstrates competitive performance with autoregressive models by leveraging non-autoregressive generation patterns and a novel reinforcement learning algorithm, coupled-GRPO, to enhance its efficiency and accuracy.
Scaling Context Requires Rethinking Attention - Power attention, a variant of linear attention, offers improved in-context learning and efficiency for long-context training, addressing limitations of existing attention-based and subquadratic architectures.
Answer Matching Outperforms Multiple Choice for Language Model Evaluation - Answer matching using free-form generation and reference language models offers more precise evaluations and alters model rankings compared to traditional multiple choice methods.
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling - Dynamic chunking in H-Net models enables end-to-end, tokenizer-free language processing by learning data-dependent segmentation, improving efficiency and performance over traditional tokenized models, and offering enhanced robustness, interpretability, and applicability across diverse languages and modalities.
Visual Anagrams Reveal Hidden Differences in Holistic Shape Processing Across Vision Models - Visual anagrams are used to benchmark vision models, revealing that transformers optimized through self-supervised learning and language-alignment excel in holistic shape processing by leveraging long-range interactions, unlike models with local receptive fields.
AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation - AnimateAnyMesh introduces a novel feed-forward framework for text-driven universal mesh animation, utilizing the DyMeshVAE architecture and a large-scale DyMesh Dataset to achieve high-quality, real-time animations for meshes of varying complexities.
Concerns
Trial Court Decides Case Based On AI-Hallucinated Caselaw - A trial court mistakenly issued an order based on AI-generated fake cases, highlighting the ongoing issue of AI hallucinations infiltrating legal proceedings despite previous high-profile embarrassments.
Google faces EU antitrust complaint over AI Overviews - The Independent Publisher Alliance has filed an antitrust complaint with the European Commission, accusing Google of harming publishers by using their content in AI-generated summaries without an opt-out option, leading to traffic and revenue losses.
'Positive review only': Researchers hide AI prompts in papers - Researchers from various academic institutions have been found embedding hidden prompts in research papers to manipulate AI tools into providing positive reviews, sparking debate over the ethics and regulation of AI in the peer review process.
Policy
Anthropic Proposes Targeted Transparency Framework for Frontier AI Systems - Anthropic's targeted transparency framework aims to impose stringent transparency and safety requirements on developers of high-impact AI models while exempting smaller developers to foster innovation and avoid unnecessary regulatory burdens.
Analysis
What skills does SWE-bench Verified evaluate? - SWE-bench Verified evaluates large language models on their ability to autonomously fix simple bugs in popular Python repositories, but its limited diversity and high contamination risk challenge its generalizability to real-world software engineering tasks.
Fun
How a Canadian's AI hoax duped the media and propelled a 'band' to streaming success - A Canadian using the pseudonym Andrew Frelon orchestrated an AI music hoax that fooled the media and significantly boosted the streaming success of a fictitious band called The Velvet Sundown, highlighting the challenges of distinguishing AI-generated content in the music industry.