Last Week in AI #287 - OpenAI's Strawberry is here!, Runway's Video-to-Video, Adobe Firefly Video
OpenAI releases o1, its first model with ‘reasoning’ abilities, Runway launches new video-to-video AI tool, Adobe’s Firefly AI Hits 12 Billion Generations, Previews Video Creator
Top News
OpenAI releases o1, its first model with ‘reasoning’ abilities
The long-rumored OpenAI Strawberry is here, and it is called o1. OpenAI has introduced this new model as part of a planned series of "reasoning" models aimed at tackling complex problems. Alongside the standard version, they’ve also launched o1-mini, a smaller, more cost-effective variant. OpenAI describes this release as a “preview,” highlighting its early-stage nature, and positions o1 as a significant advance in reasoning, one that solves multistep coding and math problems better than its predecessors, although at a higher cost and slower speed than GPT-4o.
What sets o1 apart is its training approach. Unlike previous GPT models, which were trained to mimic patterns in their training data, o1 was trained with reinforcement learning to think through problems step by step using a “chain of thought.” This training approach, along with more “inference-time compute” that essentially lets the model perform multiple rounds of reasoning before responding, allows o1 to outperform GPT-4o on reasoning-heavy tasks such as programming competitions, AP math tests, and even PhD-level science problems. OpenAI has not released a detailed explanation of its approach, but it appears to be closely tied to several prior research papers, such as "Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters," "Let's Verify Step by Step," "Large Language Models Can Self-Improve," and "Reinforced Self-Training (ReST) for Language Modeling."
For developers, however, it’s worth noting that the model takes much longer to produce outputs and that API costs for o1 are significantly higher than those for GPT-4o, making it a pricey tool, though it promises more accuracy on complex tasks. Furthermore, the models are still being fine-tuned, still produce hallucinations, and can’t process images or use function calls. Nevertheless, these models represent a major leap forward in AI’s problem-solving potential, paving the way for new advancements in fields like medicine, engineering, and advanced coding tasks. Beyond that, the release marks a shift from investing only in scaling pretraining to also scaling inference-time compute as a means of improving the capabilities of AI models.
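To make the idea of scaling inference-time compute concrete, here is a toy sketch of one publicly known technique: sampling several reasoning chains and taking a majority vote over their final answers (often called self-consistency). This is not OpenAI's method, which has not been disclosed. The "model" below is a hypothetical random stand-in, and the names sample_answer, majority_vote, and accuracy, along with the 40% per-sample success rate, are assumptions chosen purely for illustration.

```python
import random
from collections import Counter


def sample_answer(rng, true_answer, p_correct=0.4):
    """Hypothetical stand-in for one sampled reasoning chain.

    Returns the right answer with probability p_correct, otherwise a
    nearby wrong answer. A real system would call a language model here.
    """
    if rng.random() < p_correct:
        return true_answer
    return true_answer + rng.choice([-3, -2, -1, 1, 2, 3])


def majority_vote(rng, true_answer, n_samples):
    """Spend more compute: sample n_samples chains, return the most common answer."""
    answers = [sample_answer(rng, true_answer) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def accuracy(n_samples, trials=2000, seed=0):
    """Fraction of simulated questions answered correctly with n_samples chains each."""
    rng = random.Random(seed)
    hits = sum(majority_vote(rng, 42, n_samples) == 42 for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 25):
        print(f"samples per question = {n:2d}  accuracy = {accuracy(n):.2f}")
```

In this simulation, accuracy rises as more samples are drawn per question, while cost grows linearly with the number of samples. That is the same trade-off described above: better answers on hard problems in exchange for higher latency and price.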
OpenAI Unveils New ChatGPT That Can Reason Through Math and Science
OpenAI Releases Its Most Powerful Model o1: 92.8% in PhD Physics, IOI Gold-Level Performance
I used o1-mini for coding every day since launch. Here's how it compares to Claude Sonnet 3.5.
First impressions of OpenAI o1: An AI designed to overthink it
Sam Altman told OpenAI staff the company’s non-profit corporate structure will change next year
OpenAI's CEO, Sam Altman, has announced that the company's non-profit corporate structure will undergo changes in the coming year. The current structure, which Altman admits is "unusual," involves a non-profit arm controlling a for-profit arm, which in turn controls a holding company that oversees another for-profit entity. This last entity is where outside investors, such as Microsoft, have invested billions of dollars. While Altman did not provide specific details about the new structure, he indicated that the company will move away from being controlled by a non-profit, making OpenAI a more traditional for-profit company. This change, which has been rumored for months, is expected to provide a more certain potential return for investors and align OpenAI with the structure of most other large tech businesses.
Runway launches new video-to-video AI tool — here's what it can do
AI video platform RunwayML has introduced a new video-to-video tool in its latest model, Gen-3 Alpha. This new feature allows users to customize real-world videos using artificial intelligence, a significant upgrade from the previous Gen-2 model which lacked this capability. The video-to-video tool, available on the web interface for paid plan users, allows for precise control over movement, expressiveness, and intent within video generations. Users can upload their input video and steer the generation with a text prompt for any aesthetic direction. This advancement in generative AI video technology enables users to apply new aesthetics or specific effects to real footage, enhancing the platform's utility.
Adobe’s Firefly AI Hits 12 Billion Generations, Previews Video Creator
Adobe's Firefly Services, the company's AI-driven innovation, has reached a milestone of 12 billion generations, demonstrating Adobe's commitment to enhancing its Creative Cloud and Document Cloud platforms. The company's CEO, Shantanu Narayen, emphasized the customer-centric approach to AI, with Firefly models trained on data that ensures commercial safety. Adobe has released Firefly models for imaging, vector and design, and previewed a new Firefly Video model. The company's third-quarter revenue rose 11% to $5.41 billion, with digital experience subscription revenue growing 12% to $1.23 billion. Adobe's AI solutions, including Adobe GenStudio and Firefly Services, are designed to address personalized content creation at scale, and have been integrated into Adobe Photoshop, Illustrator, Lightroom, and Premiere Pro to enhance user creativity and productivity.
Other News
Tools
DataGemma: Google’s open AI models mitigate hallucination on statistical queries - Google introduces DataGemma, a pair of open-source AI models that address the issue of inaccurate answers in statistical queries by leveraging real-world data and two distinct approaches to enhance factual accuracy.
Hume AI Introduces Empathic Voice Interface 2 (EVI 2): New Foundational Voice-to-Voice Model Transforming Human-Like... - Hume AI has announced the release of Empathic Voice Interface 2 (EVI 2), a major upgrade to its groundbreaking voice-language foundation model, offering enhanced capabilities for developers looking to create more human-like interactions in voice-driven applications.
Salesforce unveils Agentforce to help create autonomous AI bots - Salesforce introduces Agentforce, a suite of low-code tools for building autonomous AI agents capable of taking actions on their own.
Hume Unveils EVI 2, its New Voice-to-Voice Foundation Model - Hume introduces EVI 2, a new voice-language foundation model that offers enhanced naturalness, emotional responsiveness, and rich customization options, representing a significant advancement in AI-driven conversational technology.
Fish Audio Introduces Fish Speech 1.4: A Powerful, Open-Source Text-to-Speech Model with... - Fish Audio has launched Fish Speech 1.4, an advanced open-source text-to-speech model with expanded language support, faster performance, and a commitment to accessibility.
Runway announces an API for its video-generating models - Runway is opening its video-generating models to developers through an API as it faces growing competition in the video generation space and legal questions around its training data.
Business
Waymo and Uber expand partnership to bring autonomous ride-hailing to Austin and Atlanta - Waymo and Uber are expanding their partnership to bring fully autonomous ride-hailing to Austin and Atlanta through the Uber app, with Uber providing fleet management services and Waymo remaining responsible for its autonomous driving system.
AI startup Poolside nears $3 billion valuation before ever releasing product - AI startup Poolside is set to raise nearly $500 million in new financing, aiming for a $3 billion valuation before releasing a product, showing continued investor appetite for AI startups.
Cloud-Computing Firm CoreWeave In Talks for Share Sale at $23 Billion Valuation - Cloud-computing firm CoreWeave is in talks for a share sale at a $23 billion valuation, allowing existing shareholders to tender between $400 million and $500 million worth of their holdings.
Face to face with Figure’s new humanoid robot - A robotics company in Silicon Valley has made significant progress in developing humanoid robots for real-world work scenarios, with a focus on automotive assembly and plans to target the home market in the future.
‘We have the next few years in the bag’ Sam Altman touts U.S. AI supremacy, ChatGPT release and St. Louis - Sam Altman touts the release of ChatGPT and U.S. AI supremacy, emphasizing the potential for AI to revolutionize computer programming and scientific progress.
Copilot Pages is Microsoft’s new collaborative AI playground for businesses - Microsoft introduces Copilot Pages, a collaborative AI feature that allows users to work with the Copilot chatbot to create and edit pages together in real time, aiming to revolutionize the way humans and AI collaborate in the workplace.
AI coding assistant Supermaven raises cash from OpenAI and Perplexity co-founders - AI coding assistant Supermaven, co-founded by Jacob Jackson, has raised $12 million in funding and is gaining traction with over 35,000 developers using the platform, while addressing ethical and legal challenges in the AI coding tools market.
Nvidia faces billion-dollar patent challenge over its new AI Blackwell chips - Xockets claims that the technology behind Nvidia’s AI acceleration was "stolen" from it by Mellanox, which Nvidia later acquired, and is seeking billions in damages while threatening to block the chips' rollout.
OpenAI Messed With the Wrong Mega-Popular Parenting Forum - AI company OpenAI initially expressed interest in partnering with the popular parenting forum Mumsnet to access its 6 billion word dataset, but later backed out, citing the dataset as too small and publicly available, sparking legal action from Mumsnet.
Research
Let loose in Minecraft, 1,000 autonomous AI agents collaborate to build their own society - 1,000 autonomous AI agents collaborate to build their own society in a Minecraft server, forming a merchant hub, establishing a constitution, and encountering challenges that shed light on the development of more human-like AI.
Building open world games with AI – GameGen O by Tencent - GameGen O, developed by Tencent, is an AI model designed to generate open-world video games, aiming to transform game development by reducing time and costs through automated creation of characters, environments, actions, and events.
Self-Harmonized Chain of Thought - Large language models can perform complex reasoning via intermediate steps, and a new method called ECHO consolidates diverse solution paths into a uniform and effective solution pattern, demonstrating the best overall performance across three reasoning domains.
SongCreator: Lyrics-based Universal Song Generation - SongCreator is a song-generation system that uses a dual-sequence language model to generate songs with vocals and accompaniment from given lyrics, achieving state-of-the-art performance and surpassing previous works in lyrics-to-song and lyrics-to-vocals tasks.
LLaMA-Omni: Seamless Speech Interaction with Large Language Models - LLaMA-Omni is a novel model architecture designed for low-latency and high-quality speech interaction with large language models, eliminating the need for speech transcription and providing better responses in both content and style with extremely low latency.
SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning - SciAgents combines large-scale ontological knowledge graphs with multi-agent systems that have in-situ learning capabilities in order to reveal hidden interdisciplinary relationships and generate research hypotheses.
ChatGPT's Potential in Cryptography Misuse Detection: A Comparative Analysis with Static Analysis Tools - This paper explores how well ChatGPT can detect cryptography misuse by comparing it with static analysis tools.
EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis - EyeCLIP is a visual-language foundation model designed for multi-modal ophthalmic image analysis, aiming to provide a comprehensive solution for analyzing ophthalmic images using AI.
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources - Source2Synth is a new method for teaching large language models new skills without human annotations. It generates synthetic data points with intermediate reasoning steps grounded in real-world sources, improving dataset quality and performance in challenging domains.
AI chatbot gets conspiracy theorists to question their convictions - AI chatbot debunks conspiracy theories and shifts participants' thinking, showing potential to combat harmful misinformation.
Human Perception of LLM-generated Text Content in Social Media Environments - This paper studies how people perceive LLM-generated text content in social media environments.
Concerns
Stalker Allegedly Created AI Chatbot on NSFW Platform to Dox and Harass Woman - Man arrested for stalking, doxing, and harassing a woman for seven years, allegedly using AI to create fake nudes and a chatbot in her likeness to distribute personal information on a NSFW platform.
Facebook admits to scraping every Australian adult user's public photos and posts to train AI, with no opt out option - Facebook has admitted that it scrapes the public photos, posts and other data of Australian adult users to train its AI models and provides no opt out option, even though it allows people in the European Union to refuse consent.
Analysis
OpenAI o1: A New Paradigm For AI - This piece argues that OpenAI o1 marks the birth of a new paradigm in AI centered on reasoning, and discusses how that paradigm might be explored, scaled, and matured, along with its potential implications for the future of AI.
The fable of Reflection 70B - A small team develops a groundbreaking AI model, Reflection 70B, using a simple fine-tuning technique, but as the model gains attention, doubts arise about its authenticity, leading to a cautionary tale about the need for transparency and skepticism in the AI community.