Last Week in AI #299: Veo 2, Pika Labs' Video Gen 2.0, Project Mariner Web Surfer, Phi-4
Google DeepMind unveils a new video model to rival Sora, Pika Labs releases AI video generator 2.0 with new features, and much more!
Top News
Google DeepMind unveils a new video model to rival Sora
Google DeepMind has announced Veo 2, a next-generation video-generating AI that can create two-minute clips in resolutions up to 4K, surpassing OpenAI's Sora in both resolution and duration. Veo 2, which is available exclusively through VideoFX, Google's experimental video creation tool, has an improved understanding of physics and camera controls and produces clearer footage. The model can more realistically simulate motion, fluid dynamics, and the properties of light, including the effects of different lenses and cinematic techniques. The company also announced upgrades to Imagen 3, its commercial image generation model, which can now create brighter, better-composed images in various styles.
Pika Labs releases AI video generator 2.0 with new features
Pika Labs has launched version 2.0 of its AI video generator, introducing a significant feature called "Scene Ingredients" that lets users incorporate their own images into AI-generated videos. Users construct scenes from visual components such as pictures of people, objects, clothing, or environments; the AI then determines the role of each image and combines them into a coherent scene. The updated generator, which also offers better visual quality and improved prompt adherence, will be available to all users, including those in the European Union, in contrast to OpenAI's Sora, which is fully available only to Pro subscribers.
Google unveils Project Mariner: AI agents to use the web for you
Google DeepMind has unveiled Project Mariner, an AI agent that can interact with the web on behalf of users. The Gemini-powered agent can control a Chrome browser, moving the cursor, clicking buttons, and filling out forms much as a human would. The agent, currently being tested by a small group of users, can perform tasks such as building a shopping cart from a grocery list or finding flights and hotels. However, it cannot enter credit card information or accept cookies on behalf of users. It works only in the active tab of the Chrome browser, so users must watch as it performs tasks.
Microsoft debuts Phi-4, a new generative AI model, in research preview
Microsoft has introduced Phi-4, the latest model in its Phi series of generative AI models, which is particularly adept at solving math problems thanks to improved training data quality. The 14-billion-parameter model is currently available in limited access, for research purposes, on Microsoft's Azure AI Foundry development platform. Microsoft attributes Phi-4's improved performance to high-quality synthetic datasets and human-generated content, as well as unspecified post-training improvements. This is the first Phi-series launch since the departure of Sébastien Bubeck, a key figure in the development of Microsoft's Phi models, who left the company to join OpenAI.
Other News
Tools
Apple launches its ChatGPT integration with Siri - Apple's latest software updates for iPhone, iPad, and Mac introduce a ChatGPT integration with Siri that enhances its ability to handle complex queries while maintaining user privacy, marking a significant step in Apple's AI strategy.
ChatGPT’s AI search engine is rolling out to everyone - OpenAI's ChatGPT search engine, now available to all users, includes an optimized mobile version with advanced voice mode and features resembling traditional search engines, such as location-based results with images and maps.
Google Gemini can now do more in-depth research - Google's upgraded Gemini platform introduces "Deep Research," a feature that uses advanced reasoning to compile comprehensive research reports, raising ethical concerns about its impact on education and publisher revenue.
OpenAI brings its o1 reasoning model to its API — for certain developers - OpenAI's o1 reasoning model, now available to select developers via its API, offers enhanced customization and accuracy but at a higher cost and with limited initial access.
NVIDIA Unveils Its Most Affordable Generative AI Supercomputer - NVIDIA's new compact generative AI supercomputer, the Jetson Orin Nano Super Developer Kit, delivers improved performance, enabled in part by a software upgrade, at a more accessible price point.
ChatGPT adds live video access to "see" what your phone sees - OpenAI is rolling out the ability to share your phone screen and live video from your phone in the ChatGPT mobile app's Advanced Voice mode.
Meta AI Releases Apollo: A New Family of Video Large Multimodal Models (Video-LMMs) for Video Understanding - Apollo models use techniques such as frames-per-second (fps) frame sampling and dual vision encoders to improve video understanding, achieving strong performance across video-language tasks while remaining practical to scale for real-world applications.
UAE's TII Launches Falcon 3: High-Performance Small AI Models - Falcon 3 is a series of high-performance small AI models that outperforms competitors such as Meta's Llama, supports multiple languages, and is optimized to run efficiently on resource-constrained edge devices.
Meta debuts a tool for watermarking AI-generated videos - Meta Video Seal is an open-source tool for watermarking AI-generated videos to combat the rise of deepfakes; its watermarks are designed to survive common video edits and compression, and Meta is encouraging industry adoption through a public leaderboard and collaboration initiatives.
Midjourney adds faster model customization and mood board support - Midjourney's latest update enhances AI model customization with faster personalization, mood board support, and multiple model profiles, requiring fewer image ratings for effective use.
OpenAI announces a ChatGPT organizing system called Projects - OpenAI's new Projects feature in ChatGPT enhances user experience by allowing customization and organization of chats, integrating capabilities like Canvas support and web connection for tasks such as project management and personal website creation.
Musk Offers Free Access to Grok-2 AI Chatbot to X Users - Elon Musk's xAI is rolling out the faster and more accurate Grok-2 AI chatbot with multilingual capabilities for free to all users on his social media platform, X, while offering premium users additional benefits and features.
X gains a faster Grok model and a new ‘Grok button’ - xAI has launched an upgraded Grok 2 chatbot model with enhanced speed and capabilities, introduced a new "Grok button" for contextual insights on X, and announced API improvements with reduced pricing and upcoming integration of the Aurora image generation model.
Google is adding a 'join' feature to its NotebookLM AI podcast generator, so you can become part of the show - Google's NotebookLM AI podcast generator now allows users to join and interact with AI hosts during podcasts, with additional customization features and a paid tier launching in 2025.
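The Apollo item above mentions fps sampling. As we understand it, this means sampling video frames at a fixed rate in time (frames per second) rather than taking a fixed number of frames spread evenly across the whole clip. The minimal sketch below contrasts the two strategies; the function names and parameters are hypothetical illustrations, not Meta's code.

```python
import math


def uniform_indices(num_frames: int, n: int) -> list[int]:
    """Uniform sampling: pick n frame indices spread evenly over the clip,
    regardless of how long the clip is (hypothetical helper)."""
    if num_frames <= 0 or n <= 0:
        return []
    n = min(n, num_frames)
    step = num_frames / n
    return [int(i * step) for i in range(n)]


def fps_indices(num_frames: int, video_fps: float, target_fps: float) -> list[int]:
    """fps sampling: pick frames at a fixed rate in time (target_fps),
    so longer clips yield proportionally more frames (hypothetical helper)."""
    if num_frames <= 0 or video_fps <= 0 or target_fps <= 0:
        return []
    # Stride in source frames between samples; never sample faster than the source.
    step = max(1.0, video_fps / target_fps)
    count = math.ceil(num_frames / step)
    return [int(k * step) for k in range(count)]


if __name__ == "__main__":
    # A 10-second and a 20-second clip, both recorded at 30 fps.
    for frames in (300, 600):
        print(frames, len(uniform_indices(frames, 8)), len(fps_indices(frames, 30.0, 2.0)))
```

Under uniform sampling both clips get the same number of frames, so the longer clip is covered twice as sparsely in time; under fps sampling the temporal density stays constant and the frame count grows with duration.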
Business
Databricks to Hit $62 Billion Valuation in New Funding Round - Databricks is raising $10 billion in funding to reach a $62 billion valuation, with plans to invest in AI products, acquisitions, and international expansion while preparing for a potential future public offering.
Salesforce plans to hire 2,000 people to sell its AI products - Salesforce is significantly expanding its sales team to support the upcoming release of its second-generation AI agent software, with CEO Marc Benioff expressing unprecedented enthusiasm for the company's AI initiatives.
Liquid AI just raised $250M to develop a more efficient type of AI model - Liquid AI is developing flexible and efficient liquid neural networks for various applications, backed by a significant investment from AMD, which plans to optimize these models for its hardware.
Meta Urges California Attorney General to Stop OpenAI From Becoming For-Profit - In a letter to Attorney General Rob Bonta dated Thursday, Meta said allowing the ChatGPT maker to become a for-profit company would set a dangerous precedent of allowing startups to enjoy the advantages of nonprofit status until they are poised to become profitable.
OpenAI Releases Emails Showing Elon Musk 'Wanted An OpenAI For-Profit' - OpenAI is firing back at Elon Musk, who is suing to prevent it from transitioning to a for-profit company, saying the co-founder originally wanted OpenAI to be a for-profit company himself.
Research
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding - DeepSeek-VL2 introduces a dynamic tiling vision encoding strategy and Multi-head Latent Attention mechanism to enhance multimodal understanding, achieving state-of-the-art performance in tasks like visual question answering and optical character recognition.
FACTS Grounding: A new benchmark for evaluating the factuality of large language models - FACTS Grounding is a new benchmark designed to evaluate and improve the factual accuracy and grounding of large language models by measuring their ability to generate factually accurate and detailed responses based on provided source material.
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation - Global MMLU addresses cultural and linguistic biases in multilingual evaluations by improving translation quality and evaluating cultural biases, resulting in a more comprehensive benchmark across 42 languages.
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling - UniBench provides a unified framework for evaluating vision-language models across over 50 benchmarks, revealing that while scaling can enhance some capabilities, it is less effective for reasoning tasks, and highlights the importance of data quality and tailored learning objectives.
Multimodal Latent Language Modeling with Next-Token Diffusion - Latent Language Modeling (LatentLM) effectively integrates continuous and discrete data using causal Transformers, outperforming existing models in multimodal tasks such as image generation and text-to-speech synthesis.
FullStack Bench: Evaluating LLMs as Full Stack Coders - FullStack Bench is a comprehensive code evaluation dataset designed to assess the capabilities of large language models in full-stack programming across multiple domains and languages, supported by the SandboxFusion tool for efficient performance evaluation.
Meta AI Introduces Byte Latent Transformer (BLT): A Tokenizer-Free Model That Scales Efficiently - Meta AI's Byte Latent Transformer (BLT) eliminates tokenization by processing raw byte sequences into dynamic patches, improving efficiency, scalability, and robustness in language models compared to traditional tokenization-based architectures.
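The key idea behind BLT's "dynamic patches" is that patch boundaries are not fixed in advance. In the BLT paper, a small byte-level language model estimates how uncertain the next byte is, and a new patch begins wherever that uncertainty (entropy) exceeds a threshold. The minimal sketch below illustrates this global-threshold rule. The bigram-count entropy estimator is a toy stand-in we chose for BLT's small byte model, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def bigram_entropies(data: bytes) -> list[float]:
    """Toy stand-in for BLT's small byte LM (an assumption, not Meta's model):
    entropy in bits of the next-byte distribution given the previous byte,
    estimated from bigram counts over `data` itself."""
    follow = defaultdict(Counter)
    for prev, nxt in zip(data, data[1:]):
        follow[prev][nxt] += 1
    # The first byte has no context, so assume maximal uncertainty (8 bits).
    entropies = [8.0] if data else []
    for prev in data[:-1]:
        counts = follow[prev]
        total = sum(counts.values())
        entropies.append(-sum(c / total * math.log2(c / total) for c in counts.values()))
    return entropies


def entropy_patches(data: bytes, entropies: list[float], threshold: float) -> list[bytes]:
    """Split `data` into variable-length patches, starting a new patch at
    every byte whose predicted entropy exceeds `threshold`."""
    if len(entropies) != len(data):
        raise ValueError("need one entropy value per byte")
    patches, start = [], 0
    for i in range(1, len(data)):
        if entropies[i] > threshold:
            patches.append(data[start:i])
            start = i
    if data:
        patches.append(data[start:])
    return patches


if __name__ == "__main__":
    text = b"the cat sat on the mat"
    print(entropy_patches(text, bigram_entropies(text), threshold=1.0))
```

Predictable runs of bytes collapse into long patches while surprising bytes open new ones, which is the intuition behind BLT spending more compute where the data is harder to predict.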
Concerns
AI thought X-rays of your knees show if you drink beer—they don’t. - AI models in medical imaging can produce misleading results by exploiting unintended data patterns, highlighting the need for rigorous evaluation to prevent erroneous clinical insights.
Character.AI steps up teen safety after bots allegedly caused suicide, self-harm - Character.AI is implementing a separate model for teens and additional safety features, including content filtering and parental controls, in response to lawsuits alleging that its chatbots contributed to harmful behaviors in minors.
Tesla is having a major issue with its self-driving computer inside new cars - Tesla's new HW4 self-driving computers are failing, potentially because of short-circuiting, overwhelming service centers and raising safety concerns without an official recall or service bulletin.
Expert Opinions
OpenAI cofounder Ilya Sutskever says the way AI is built is about to change - Ilya Sutskever predicts a shift in AI development due to the finite nature of data, leading to future AI systems that are more autonomous and capable of reasoning beyond current pattern-matching methods.