Last Week in AI #296 - new Gemini model tops leaderboard, xAI gets funding, Pixtral Large
Google drops new Gemini model and it goes straight to the top of the LLM leaderboard, Elon Musk's xAI raising up to $6 billion to purchase 100,000 Nvidia chips for Memphis data center, and more!
Top News
Google drops new Gemini model and it goes straight to the top of the LLM leaderboard
Google's latest AI model, Gemini-Exp-1114, has topped the LMArena Chatbot Arena leaderboard, surpassing OpenAI's GPT-4o and o1-preview reasoning model. The leaderboard, previously known as the LMSys arena, lets AI labs pit their models against each other in blind head-to-head matchups, with users voting without knowing which model is which. The Gemini-Exp-1114 model, developed by Google DeepMind, excels particularly in math and vision tasks. Despite its success, the model is not yet available in the Gemini app or on the website, and can only be accessed through a free Google AI Studio account. It remains unclear whether this model is a version of Gemini 1.5 or an early look at the capabilities of the anticipated Gemini 2.
Elon Musk's xAI raising up to $6 billion to purchase 100,000 Nvidia chips for Memphis data center
Elon Musk's artificial intelligence company, xAI, is reportedly raising up to $6 billion at a $50 billion valuation to acquire 100,000 Nvidia chips for a new supercomputer in Memphis. The funding, expected to close next week, is a combination of $5 billion from Middle Eastern sovereign funds and $1 billion from other investors. The company, which Musk launched in 2023, released a chatbot named Grok last November, aiming to compete with other AI companies like OpenAI, Google's Bard technology, and Anthropic's Claude chatbot. Amidst these developments, Musk is also actively working with the new administration of President-elect Donald Trump on its approach to AI and technology.
Mistral unleashes Pixtral Large and upgrades Le Chat into full-on ChatGPT competitor
French startup Mistral has launched Pixtral Large, a 124-billion-parameter model, and upgraded its chatbot, Le Chat, to compete directly with OpenAI's ChatGPT. Pixtral Large, an open-source multimodal AI, excels in text and visual data processing, and can handle up to 30 high-resolution images per input or a 300-page book. It demonstrates top performance across diverse benchmarks, making it ideal for tasks like chart interpretation, document analysis, and image understanding. Le Chat, now powered by Pixtral Large, has been enhanced with features such as web search with citations, a canvas for ideation, advanced document and image analysis, image generation, and task agents for automation. Despite these advancements, Mistral's models and API usage by large enterprises remain behind those of U.S.-based companies like OpenAI, Anthropic, and Microsoft.
Other News
Tools
Codeium launches Windsurf Editor, an Agentic Integrated Development Environment - Windsurf Editor by Codeium integrates AI collaboration and autonomous task-handling to create a seamless development experience, enhancing productivity through its innovative Cascade feature that combines deep codebase understanding and real-time developer interaction.
Microsoft introduces new adapted AI models for industry - Microsoft is collaborating with industry partners to introduce adapted AI models tailored to specific industry needs, available through the Azure AI model catalog, to enhance business outcomes and innovation across various sectors.
OpenAI Nears Launch of AI Agent Tool to Automate Tasks for Users - OpenAI's upcoming AI agent, codenamed "Operator," is designed to perform tasks like coding and travel booking autonomously for users.
Cerebras Delivers Record-Breaking Performance with Meta’s Llama 3.1 405B Model - Cerebras Systems has achieved a significant milestone in AI inference by setting a new performance record with Meta's Llama 3.1 405B model, delivering 969 tokens per second and enabling real-time responses, which is up to 75 times faster than traditional GPU-based solutions.
Google’s Gemini chatbot now has memory - Google's Gemini chatbot now includes a memory feature that personalizes interactions by remembering user preferences and information. The feature is available to Google One AI Premium subscribers on the web, and the remembered information is not used for model training.
DeepL unveils next frontier for Language AI with voice translation solution: DeepL Voice - DeepL Voice introduces real-time voice translation for meetings and conversations, expanding DeepL's language AI capabilities to spoken communication and enabling multilingual interactions with high accuracy and security.
Introducing the Forge Reasoning API Beta and Nous Chat: An Evolution in LLM Inference - Nous Research is launching the Forge Reasoning API Beta and Nous Chat, which enhance language model inference and reasoning capabilities through advanced architectures like Monte Carlo Tree Search, Chain of Code, and Mixture of Agents, allowing users to leverage multiple models for diverse and sophisticated AI interactions.
Ignite 2024 introduces new AI agents and more for Microsoft 365 Copilot - Microsoft Ignite 2024 unveils a range of new AI agents and enhancements for Microsoft 365 Copilot, aimed at improving workplace efficiency through automation, collaboration, and organizational tools.
ElevenLabs now offers ability to build conversational AI agents - ElevenLabs has introduced a platform for building customizable conversational AI agents, allowing users to integrate their own knowledge bases and select from various language models, while leveraging its existing text-to-speech capabilities.
Chinese AI startup takes aim at OpenAI's Sora with image-to-video tool launch - Shengshu Technology's Vidu tool now enables the creation of visually consistent videos by integrating multiple images, positioning itself as a competitor to OpenAI's Sora.
Perplexity introduces a shopping feature for Pro users in the U.S. - Perplexity's new shopping feature for Pro users in the U.S. integrates AI-powered search with e-commerce, offering unbiased product recommendations, one-click checkout, and a merchant program to enhance user experience and compete with major players like Google and Amazon.
Suno V4 Ai Music Generator Is Out Now And It’s Very Impressive - Suno V4 introduces significant advancements in AI music generation, including improved audio quality, dynamic song structures, and innovative features like the ReMi lyrics assistant, enhancing creative possibilities for users.
Anthropic’s new AI tools promise to simplify prompt writing and boost accuracy by 30%
Business
‘Figure 02 Is Now 4x Faster on BMW’s Production Line’ - Figure AI's humanoid robot, Figure 02, has achieved a fourfold increase in speed and a sevenfold improvement in success rate on BMW's production line, with plans for further deployment and enhancements by 2025.
Fastino Emerges from Stealth With Task-Optimized LLMs — 1000x Faster Than Leading Models, No Need for GPUs - Fastino has launched task-optimized language models that operate up to 1000 times faster than traditional models using CPUs or NPUs, reducing the need for GPUs and enhancing accuracy, speed, and safety for enterprise AI applications.
Inside Microsoft's struggles with Copilot - Microsoft's AI product Copilot is facing significant challenges, including customer dissatisfaction, security concerns, and internal skepticism, while competitors capitalize on its struggles and the company grapples with justifying its substantial investment in AI.
Nvidia's Delayed Blackwell AI Chips Overheating in Servers - Nvidia's Blackwell GPUs were initially delayed by overheating in server racks. That problem may now be resolved, but managing energy and heat in AI data centers remains a significant challenge.
Musk’s amended lawsuit against OpenAI names Microsoft as defendant - Elon Musk's revived lawsuit against OpenAI now includes Microsoft and other defendants, alleging antitrust violations and monopolistic practices, while accusing OpenAI of abandoning its nonprofit mission and unfairly benefiting from Microsoft's resources.
H, the AI startup that raised $220M, launches its first product: Runner H for ‘agentic’ applications - H, the Paris-based AI startup, has launched its first product, Runner H, an "agentic" AI designed for tasks like robotic process automation and quality assurance. It is built on a proprietary compact LLM with 2 billion parameters. The company is preparing to release APIs for developers and is raising a Series A to fund further development.
Nuro expands driverless testing after pivoting to licensing its AV tech - Nuro is expanding its driverless vehicle testing to new areas and more complex environments as part of a strategic shift to license its autonomous technology to automakers and mobility operators.
Sam Altman will co-chair San Francisco mayor-elect Daniel Lurie’s transition team - Sam Altman, CEO of OpenAI, will co-chair San Francisco mayor-elect Daniel Lurie's transition team to help the city innovate and strengthen ties with the tech industry, as Lurie aims to address public safety issues and retain tech entrepreneurs in the area.
Research
MIT Researchers Propose Boltz-1: The First Open-Source AI Model Achieving AlphaFold3-Level Accuracy in Biomolecular Structure Prediction - Boltz-1, developed by MIT researchers, is an open-source AI model that matches AlphaFold3-level accuracy in predicting biomolecular structures, offering innovations like new MSA pairing algorithms and a unified cropping approach to enhance accuracy and reduce computational demands, thereby democratizing access to advanced biomolecular modeling.
Releasing the largest multilingual open pretraining dataset - Pleias has released Common Corpus, the largest open multilingual dataset for training large language models, featuring over 2 trillion tokens of permissibly licensed content across diverse languages and domains, aiming to balance openness and performance while addressing data quality and compliance challenges.
A.I. Chatbots Defeated Doctors at Diagnosing Illness - ChatGPT-4 outperformed doctors in diagnosing medical conditions, highlighting both the chatbot's superior accuracy and the potential overconfidence of doctors in their own diagnoses.
What are the data scaling laws for imitation learning in robotics? - Understanding data scaling laws in imitation learning for robotics highlights the importance of dataset diversity and generalization across various environments to improve manipulation policies and achieve better performance.
Scaling Laws for Precision - Precision-aware scaling laws are introduced to predict the effects of low precision training and inference on language models, suggesting that training larger models in lower precision may be compute optimal.
LLaVA-o1: Let Vision Language Models Reason Step-by-Step - LLaVA-o1 is a novel Vision-Language Model that improves precision in reasoning-intensive tasks by autonomously conducting multistage reasoning and utilizing a new dataset and inference-time scaling method.
AnimateAnything: Consistent and Controllable Animation for Video Generation - AnimateAnything introduces a novel video generation method that uses a multi-scale control feature fusion network and a frequency-based stabilization module to achieve precise, consistent, and flicker-free animations across various conditions.
Top-nσ: Not All Logits Are You Need - Top-nσ is a novel sampling method for large language models that improves reasoning task performance by efficiently filtering tokens using a statistical threshold on pre-softmax logits, outperforming existing methods and maintaining stability across temperature variations.
Generative World Explorer - Genex, an egocentric world exploration framework, enables agents to mentally explore large-scale 3D environments and update their beliefs with imagined observations, improving decision-making without constant physical exploration.
A statistical approach to model evaluations - The article discusses a new research paper that provides statistical recommendations for evaluating AI models, emphasizing the use of the Central Limit Theorem, clustering standard errors, reducing variance, analyzing paired differences, and conducting power analysis to ensure more accurate and reliable model comparisons.
Evaluating the role of 'Constitutions' for learning from AI feedback - Detailed constitutions enhance emotive feedback quality in AI models, but may not improve practical skills like information gathering in medical interviews.
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use - This case study explores Claude 3.5 Computer Use, a pioneering AI model offering GUI-based computer use, and demonstrates its potential and limitations in translating language instructions into desktop actions across various domains.
Generative Agent Simulations of 1,000 People
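For readers curious how the "statistical threshold on pre-softmax logits" in the Top-nσ item above might work, here is a minimal Python sketch. It assumes the rule is to keep only tokens whose logit lies within n standard deviations of the maximum logit; that reading, the function names, and the default n are illustrative assumptions, not the authors' reference implementation.

```python
import math
import random


def top_n_sigma_filter(logits, n=1.0):
    """Return indices of tokens whose logit is >= max(logits) - n * sigma.

    Assumption: sigma is the population standard deviation of all logits.
    """
    mean = sum(logits) / len(logits)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max(logits) - n * sigma
    return [i for i, x in enumerate(logits) if x >= threshold]


def sample_top_n_sigma(logits, n=1.0, temperature=1.0, rng=None):
    """Sample one token index from the filtered set.

    Dividing every logit by the temperature scales both the gap to the
    maximum and sigma by the same factor, so the kept set does not depend
    on the temperature; only the weights inside the set change.
    """
    rng = rng or random
    kept = top_n_sigma_filter(logits, n)
    scaled = [logits[i] / temperature for i in kept]
    peak = max(scaled)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(kept, weights=weights, k=1)[0]


if __name__ == "__main__":
    example_logits = [5.0, 4.8, 1.0, 0.5, 0.0]  # hypothetical values
    print(top_n_sigma_filter(example_logits, n=1.0))
    print(sample_top_n_sigma(example_logits, n=1.0, temperature=1.5))
```

With the hypothetical logits above, only the two highest-scoring tokens survive at n = 1, and raising the temperature changes how probability is split between them without readmitting the low-scoring tail.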
Concerns
So, Yeah, AI Is Already Taking Our Jobs - Generative AI tools like ChatGPT are rapidly reducing job opportunities in automation-prone fields, but those who adapt by acquiring AI skills may find new opportunities in the evolving job market.
GEMA Sues OpenAI Over Song Lyrics In a First for PROs - GEMA has filed a lawsuit against OpenAI in Germany for using song lyrics without permission, aiming to clarify copyright law and enforce licensing for generative AI systems.
AI-Generated Elon Musk Inspiration Porn Is Viral on Facebook - AI-generated disinformation portraying Elon Musk as a benevolent problem-solver is being widely spread on Facebook and other platforms, driven by spammers seeking to profit from engagement and exploiting public interest around events like the U.S. presidential election.
Policy
OpenAI’s new policy blueprint for AI imagines a role for government - OpenAI's policy blueprint envisions a significant role for the U.S. government in AI development, emphasizing infrastructure, energy systems, and economic zones to boost productivity and counter China's influence.
The Code of Practice for general-purpose AI offers a unique opportunity for the EU - On 14 November, the AI Office of the European Commission published the first draft of the Code of Practice for general-purpose AI (GPAI), which addresses how companies that build GPAI models, the engines of applications like ChatGPT, should handle the potential risks they create.
Denmark launches landmark framework for using AI under EU rules — with Microsoft backing - Denmark's new framework, supported by Microsoft, provides guidelines for EU member states to responsibly implement AI in compliance with the EU's AI Act, focusing on secure, reliable services and collaboration between public and private sectors.
The US Patent and Trademark Office Banned Staff From Using Generative AI - The US Patent and Trademark Office has restricted the use of generative AI due to security and reliability concerns, allowing its use only within a controlled testing environment while exploring AI's potential through approved programs and partnerships.
Analysis
The importance of diminishing returns - Advancements in large language models like ChatGPT may face diminishing returns from additional data exposure, potentially slowing progress towards super general intelligence, though alternative methods could overcome these limitations.
Are A.I. Clones the Future of Dating? I Tried Them for Myself. - Exploring the concept of A.I. clones in dating, the author experiments with this emerging technology to see if it can enhance their romantic life.