Last Week in AI #298 - Gemini 2.0, Amazon's Nova, Sora, Llama 3.3
Google Reveals Gemini 2, AI Agents, and a Prototype Personal Assistant, Amazon announces its own set of Nova AI models, and more!
Top News
Google Reveals Gemini 2, AI Agents, and a Prototype Personal Assistant
Google has announced Gemini 2, an upgraded version of its flagship AI model, designed to perform tasks on users' computers and the web, and interact like a human. The model has improved multimodal abilities, enhancing its skills in interpreting video and audio, and engaging in speech. Google's CEO, Sundar Pichai, emphasized the company's focus on developing more "agentic" models that can understand the world, plan ahead, and act on behalf of users. Alongside Gemini 2, Google introduced two specialized AI agents for coding and data science, and showcased Project Mariner, a Chrome extension capable of web navigation to perform tasks for users.
Google launched Gemini 2.0, its new AI model for practically everything
Google is testing Gemini AI agents that help you in video games
Sponsored Message
Uncontrollable: The Threat of Artificial Superintelligence and the Race to Save the World
We recommend the AI safety book “Uncontrollable”! This is not a doomer book; instead, it lays out the reasonable case for AI safety and what we can do about it.
Max Tegmark said that “Uncontrollable” is “a captivating, balanced, and remarkably up-to-date book on the most important issue of our time”.
It would make a great holiday gift! Find “Uncontrollable” by Darren McKee on Amazon or Audible today!
Amazon announces its own set of Nova AI models
Amazon has unveiled a new series of AI foundation models, known as "Nova", which will be part of the Amazon Bedrock model library in AWS. The Nova series includes three "understanding" models: Nova Micro, a text model optimized for speed and cost; Nova Lite, a low-cost multimodal model that processes images, video, and text; and Nova Pro, a highly capable multimodal model. Amazon is also developing Nova Premier, a multimodal model for complex reasoning tasks, set to be available in early 2025. Additionally, Amazon is releasing content generation models, Nova Canvas for image generation, and Nova Reel for video generation, both with watermarking capabilities to promote responsible AI use. The company also plans to release a speech-to-speech model and a native multimodal-to-multimodal model later in 2025. These announcements were made at the AWS re:Invent conference, where Amazon also revealed its partnership with Anthropic to build a massive AI compute cluster using its Trainium 2 chips.
OpenAI has finally released Sora
OpenAI's AI text-to-video generator, Sora, is now available to the public. Initially revealed in February and tested by a select group of visual artists, designers, and filmmakers, Sora can convert text prompts into videos and transform photos into videos. It is accessible via Sora.com for ChatGPT subscribers in the US and many other countries, with different subscription tiers offering higher-quality, longer videos and more priority outputs. Non-subscribers can still view community-generated videos on Sora’s feed. Sora includes a “storyboards” feature for chaining prompts into sequences, a “remix” tool for refining output, and a “blend” function to merge scenes. All generated videos will include watermarks and metadata to indicate they are AI-made, and users must agree to strict content guidelines before uploading.
Meta unveils a new, more efficient Llama model
Meta has introduced a new, more efficient generative AI model, Llama 3.3 70B, which outperforms its largest model, Llama 3.1 405B, at a lower cost. The new model has shown superior performance on industry benchmarks, including MMLU, which assesses a model's language understanding capabilities, compared to Google's Gemini 1.5 Pro, OpenAI's GPT-4o, and Amazon's Nova Pro. Despite restrictions on usage by platforms with over 700 million monthly users, Llama models have been downloaded over 650 million times and power Meta's AI assistant, which has nearly 600 million monthly active users. However, Meta has faced challenges with the open nature of Llama, including allegations of misuse by Chinese military researchers and difficulties in complying with EU regulations on AI and data privacy. To support future Llama models, Meta is investing in a $10 billion AI data center in Louisiana and has procured over 100,000 Nvidia GPUs for model development.
Other News
Tools
World Labs’ AI can generate interactive 3D scenes from a single photo - World Labs' AI system uniquely generates interactive and modifiable 3D scenes from a single image, offering potential applications for industries like gaming and filmmaking, despite current limitations in exploration and rendering.
Tencent Launches HunyuanVideo, an Open-Source AI Video Model - HunyuanVideo, Tencent's new open-source AI model, offers advanced text-to-video generation with innovative video-to-audio synthesis and efficient scaling techniques, outperforming several commercial models in visual and motion quality.
Google DeepMind Just Released PaliGemma 2: A New Family of Open-Weight Vision Language Models (3B, 10B and 28B) - Google DeepMind's PaliGemma 2 series offers a versatile range of open-weight vision-language models with varying scales and resolutions, excelling in diverse tasks and providing flexible solutions for both academic and industry applications.
The new Surf browser shows why everyone’s trying to connect AI to the web - Surf, a new browser by Deta, integrates AI to enhance web interactions by allowing users to query video content, organize information into contexts, and manipulate web pages, all while maintaining a focus on local processing for security.
Microsoft’s Copilot can browse the web with you using AI ‘Vision’ - Microsoft's Copilot Vision feature, currently in limited testing, allows users to interact with AI while browsing by enabling it to read and assist with webpage content, with a strong emphasis on user privacy and security.
OpenAI expands ChatGPT Canvas to all users - OpenAI's expansion of ChatGPT Canvas to all users includes new features like Python code execution, automatic integration with GPT-4o, and enhanced capabilities for custom GPTs, aiming to improve user experience and functionality.
Reddit’s New AI Search Tool Helps You Find Reddit Answers Without Google - Reddit Answers is an AI-powered search tool that provides well-formatted responses and direct links to Reddit sources, aiming to offer a more streamlined and timely alternative to traditional search engines like Google.
Replit’s New Assistant, AI Agent Now Available for All - Replit's new AI Assistant and Agent, now available to all users, enable code modifications and software development through natural language prompts, with a new pricing model based on checkpoints.
Cognition Labs’ AI Software Engineer Devin Launched for Subscribers - Cognition Labs has launched Devin, an AI software engineer capable of performing complex coding tasks, available to individuals and engineering teams on a subscription basis, with additional enterprise options.
Yelp’s new AI-powered review filters will show more of what you want to know - Yelp's new AI-powered "Review Insights" feature provides users with aggregated sentiment scores and quick summaries of customer opinions on specific aspects of businesses, enhancing the review browsing experience.
Musk’s xAI has launched Grok image generation model - xAI's new image generation model for Grok, Aurora, offers high-quality photorealistic rendering and multimodal input capabilities, while the company plans to expand its supercomputing resources significantly.
Business
Waymo Officially Enters the L.A. Marketplace - Waymo has expanded its autonomous ride-hailing service to Los Angeles, marking its largest city operation yet, with a focus on safety and collaboration with local authorities, despite limitations on highway use.
Waymo’s next robotaxi city will be Miami - Waymo plans to launch its robotaxi service in Miami by 2026, following years of testing and preparation, with fleet management support from Moove, as part of its expanding operations in various U.S. cities.
GM halts funding of robotaxi development by Cruise - General Motors is shifting its focus from developing robotaxis with Cruise to enhancing advanced driver assistance systems and autonomous technology for personal vehicles, while integrating Cruise's operations with GM's technical teams.
Elon Musk files for injunction to halt OpenAI’s transition to a for-profit - Elon Musk is seeking a preliminary injunction to stop OpenAI's transition to a for-profit entity, alleging anticompetitive behavior and self-dealing involving OpenAI, its co-founders, and Microsoft.
OpenAI is partnering with defense tech company Anduril - OpenAI's partnership with Anduril marks its first collaboration with a defense contractor, reflecting a shift in its stance on military applications of its AI technology.
OpenAI gets new $1.5 billion investment from SoftBank, allowing employees to sell shares in a tender offer - SoftBank's $1.5 billion investment in OpenAI allows employees to sell shares through a tender offer, highlighting SoftBank's strategic interest in expanding its AI portfolio.
Elon Musk lands priority for Nvidia GB200 delivery in January with US$1.08 billion - Elon Musk's xAI secured priority delivery of Nvidia's GB200 hardware by offering a premium to expedite a US$1.08 billion order.
ChatGPT now has over 300 million weekly users - ChatGPT's rapid growth to over 300 million weekly users is attributed to OpenAI's continuous enhancements, including an AI search engine and a new interface for code adjustments.
OpenAI confirms new $200 monthly subscription, ChatGPT Pro, which includes its o1 reasoning model - OpenAI's new $200 monthly ChatGPT Pro subscription offers enhanced access to its advanced o1 reasoning model, targeting power users with improved performance in tasks like coding and math, while also including additional features like GPT-4o and Advanced Voice Mode.
Amazon forms a new AI agent-focused lab led by Adept co-founder - Amazon is establishing the Amazon AGI SF Lab in San Francisco, led by Adept co-founder David Luan, to develop AI agents capable of performing complex tasks in both digital and physical environments, with a focus on real-world actions and learning from human feedback.
Key leaders behind Google’s viral NotebookLM are leaving to create their own startup - Three key members of Google's NotebookLM team are leaving to start a stealth startup focused on creating consumer-facing AI products that leverage the latest AI models.
A16z in Talks to Lead $200 Million Round in Black Forest Labs, Startup Behind AI Images on Grok - Andreessen Horowitz is negotiating to lead a $200 million investment round in Black Forest Labs, a German AI startup that collaborates with Elon Musk's Grok for image production. The talks come shortly after the startup's launch.
OpenAI-backed Speak raises $78M at $1B valuation to help users learn languages by talking out loud - Speak, an AI-driven language learning platform, has raised $78 million in a Series C funding round to expand its offerings beyond English by focusing on conversational skills rather than traditional reading and writing methods.
OpenAI Startup Fund raises $44M in its largest SPV yet - OpenAI Startup Fund's fifth and largest SPV, raising over $44 million, aims to support existing portfolio companies and make new investments, despite maintaining a low profile about specific fund allocations.
Research
DeepMind’s Genie 2 can generate interactive worlds that look like video games - DeepMind's Genie 2 model can generate interactive 3D worlds from images and text, simulating complex environments and actions, but raises questions about intellectual property and its potential impact on the video game industry.
DeepMind AI weather forecaster beats world-class system - GenCast, an AI model developed by Google DeepMind, surpasses traditional physics-based weather forecasting systems by using historical data to generate more accurate and faster predictions, particularly for extreme weather events.
PRIME Intellect Releases INTELLECT-1 (Instruct + Base): The First 10B Parameter Language Model Collaboratively Trained Across the Globe - INTELLECT-1, a 10-billion-parameter language model, showcases the potential of decentralized, community-driven training by achieving competitive performance with innovations in distributed frameworks, thus broadening access to advanced AI technologies.
Training Large Language Models to Reason in a Continuous Latent Space - Coconut is a new paradigm that enables large language models to reason in a continuous latent space, allowing for more effective problem-solving through breadth-first search and outperforming traditional chain-of-thought methods in complex reasoning tasks.
JetFormer: An Autoregressive Generative Model of Raw Images and Text - JetFormer is an autoregressive decoder-only transformer that unifies image and text generation without relying on separately pretrained components, achieving competitive text-to-image generation quality and robust image understanding.
An Evolved Universal Transformer Memory - Neural Attention Memory Models (NAMMs) enhance transformer efficiency and performance by learning memory management, allowing for improved context handling and zero-shot transfer across different architectures and modalities.
Structured 3D Latents for Scalable and Versatile 3D Generation - A novel 3D generation method using a unified Structured LATent representation enables versatile, high-quality 3D asset creation with flexible output formats and local editing capabilities, outperforming existing models.
Boundless Socratic Learning with Language Games - Socratic learning through language games enables agents to achieve significant self-improvement by leveraging recursive feedback and broad data coverage, with limitations primarily due to time and potential misalignment.
Concerns
AI Safety Researcher Quits OpenAI, Saying Its Trajectory Alarms Her - Rosie Campbell resigned from OpenAI due to concerns about the company's shifting focus away from safety and the dissolution of the AGI Readiness team, following the departure of key personnel and internal changes.
OpenAI’s New Ad Shows ‘Reasoning’ AI Making Basic Errors - OpenAI's latest AI model, o1, despite its advanced reasoning capabilities, demonstrated significant logical errors in a promotional video, highlighting ongoing challenges in AI accuracy.
ChatGPT’s search results for news are ‘unpredictable’ and frequently inaccurate - Columbia's Tow Center for Digital Journalism found that ChatGPT's search tool often provides inaccurate and confidently incorrect responses, struggling to correctly attribute quotes and sources.
Why does the name ‘David Mayer’ crash ChatGPT? Digital privacy requests may be at fault - ChatGPT's refusal to process certain names, including "David Mayer," is likely due to internal privacy tools flagging these names for special handling, possibly related to digital privacy requests or legal concerns.
Policy
What Trump’s New AI and Crypto Czar David Sacks Means For the Tech Industry - David Sacks' appointment as Trump's AI and crypto czar raises concerns over potential conflicts of interest and lack of oversight, while signaling a pro-industry approach to AI and cryptocurrency regulation.
Expert Opinions
🔮 From ChatGPT to a billion agents - Azeem predicts that by 2025, AI systems will advance to a level that surprises even skeptics, with agents being a central focus of discussion.