Doubao-PixelDance Vs Gemini-1.5-Pro-002

Welcome back to Superintelligence AI

Hey friends! Today's top AI news includes new AI models from Google and ByteDance, and new tools from OpenAI, Alibaba, and Spotify. Additionally, Sri Lanka unveiled its AI strategy. Let’s dive in and enjoy this AI ride in just 4 minutes!

The AI World Today

  • Gemini Models: Faster and Cheaper

  • OpenAI Expands Voice Mode Feature

  • ByteDance Unveils New AI Models

  • Alibaba Unveils AI Translation Model

  • Spotify Expands AI Playlist Tool

  • Sri Lanka’s AI Strategy Unveiled

  • Heads Up

Updated Gemini Models: Lower Costs, Higher Performance

Source: Google

Google has released two updated production-ready Gemini models, Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002. Key updates include a price reduction of more than 50% for 1.5 Pro on both input and output for prompts under 128K tokens (input tokens drop by 64%), effective October 1, 2024. The models also deliver 2x faster output, 3x lower latency, and higher rate limits (2000 RPM for Flash, 1000 RPM for Pro). They show significant gains in math, vision, and code performance, and their responses are more concise and helpful. Safety filters are no longer applied by default, so developers can choose their own configuration. Developers can access the models via Google AI Studio and the Gemini API. An improved experimental model, Gemini-1.5-Flash-8B-Exp-0924, is also available.

Advanced Voice Mode Rolls Out with Updates

Source: Letta

OpenAI is rolling out Advanced Voice Mode (AVM) to ChatGPT’s Plus and Team customers, with Enterprise and Edu tiers gaining access next week. AVM features a redesigned blue animated sphere and five new voices: Arbor, Maple, Sol, Spruce, and Vale, bringing the total to nine. Notably, Sky, a voice removed due to legal issues, is absent. AVM also improves accent recognition, makes conversations smoother, and adds Custom Instructions and Memory features. Missing from this update are the video and screen-sharing capabilities shown in a previous demo. The rollout excludes several regions, including the EU and the U.K. Users will be notified via a pop-up in the ChatGPT app when AVM becomes available.

ByteDance Launches Advanced Video-Generation AI Models

Source: Yahoo

ByteDance has unveiled two new AI video-generation models, Doubao-PixelDance and Doubao-Seaweed, to compete with OpenAI's Sora. Both work from text and image prompts: Doubao-PixelDance creates 10-second videos, while Doubao-Seaweed generates clips of up to 30 seconds. The models are part of the Doubao AI family and draw on ByteDance’s experience with Douyin (TikTok’s Chinese version) and CapCut. They excel at producing stable scenes, both real-life and fictional. ByteDance also introduced a music generator and a simultaneous interpreter, and upgraded the Doubao Pro model to handle super-long text prompts (up to 256,000 tokens). These moves reflect China’s push to close the gap with U.S. AI advancements. Usage of the Doubao LLM family has grown 10-fold since May.

Real-Time Speech Translation with Alibaba’s Gummy

Screenshot: YouTube

Alibaba has unveiled Gummy, a speech translation model, at Yunqi 2024. Gummy supports real-time streaming in over 10 languages, including Chinese, English, Cantonese, Japanese, Korean, French, German, Russian, Italian, and Spanish. Unlike traditional cascading systems, Gummy translates speech directly without converting it into text first, which Alibaba says gives it latency under 0.5 seconds and better speed and quality. Gummy is now available for testing in the Tongyi app, with plans to integrate it into a new speech model by November. The exact timeline and a potential open-source release are pending organizational decisions, with updates to follow.

Spotify's AI Playlist Feature Expands Internationally

Screenshot: Spotify

Spotify has expanded its AI Playlist tool to the U.S., Canada, Ireland, and New Zealand after launching in the U.K. and Australia. Currently in beta for Android and iOS, the feature allows users to create personalized playlists based on written prompts, such as “a romantic playlist for date night.” Users can refine playlists by adding more prompts, and Spotify offers suggestions for further customization. The tool supports mixing genres, moods, artists, and decades, and even allows personalization with locations, animals, and emojis. The feature lives in the “Your Library” tab, where users can create and save playlists with the “AI Playlist” option. Spotify reports strong uptake in early markets, with millions of playlists created by Premium subscribers.

Sri Lanka Releases National AI Governance Strategy

Screenshot: Sri Lanka Strategy Paper

Sri Lanka has published its National Strategy on AI, guided by core principles that include inclusivity, responsibility, trustworthiness, transparency, human-centricity, agile governance, collaboration, and sustainability. These principles are meant to ensure that AI development aligns with national goals while safeguarding citizen rights. The strategy emphasizes AI literacy, encouraging all citizens to engage with AI and understand its benefits, challenges, and risks. By promoting public understanding, Sri Lanka aims to foster trust and unlock AI’s potential for inclusive growth, aligned with the UN's Sustainable Development Goals (SDGs). The strategy also highlights the need for ethical AI development, pairing governance with responsible practices such as capacity-building and transparent deployment so that AI systems prioritize fairness, accountability, and well-being.

Heads Up

Nvidia unveiled Llama-3.1-Nemotron-51B, a fine-tuned LLM optimized for performance and efficiency and designed to fit on a single H100 GPU, reducing memory use and costs.

OpenAI CEO Sam Altman confirmed that his hardware project with ex-Apple designer Jony Ive is not a phone, dismissing the idea of improving existing phones.

Microsoft unveiled "Correction," a service to revise factually incorrect AI-generated text by fact-checking against sources, available in preview via Azure AI Content Safety API.

Snap expanded its partnership with Google Cloud, using Gemini AI to enhance Snapchat’s My AI chatbot with multimodal capabilities for understanding text, audio, images, and videos.

Ooredoo Group signed a QAR 2 billion ($549.4 million) financing deal to expand its data center business, marking Qatar’s largest tech sector transaction by value and duration.

Indian startup Qure.AI raised $65 million from Lightspeed Venture Partners and 360 ONE Asset Management to expand and enhance its AI-driven disease detection products.

Vietnam plans to establish one semiconductor fabrication plant and 10 packaging plants by 2030, alongside launching a fund to support foreign investors amid global tax changes.