In the 50th session of #MultimodalWeekly, we have two exciting presentations from startup founders building real-world products for multimodal AI applications. ✅ Jesse N. Clark, the Co-Founder and CTO of Marqo AI, will discuss generalized contrastive learning for multimodal retrieval and ranking. They generalize the popular CLIP training method to accommodate any number of texts and images when representing documents, and encode relevance (or rank) to provide better first-stage retrieval. 📄 ✅ Alexandre Berkovic, the Co-Founder and CEO of Adorno AI, will dive into how video and audio understanding technologies from Twelve Labs and Adorno AI are transforming video production. 📻 Register for the webinar here: https://lnkd.in/gJGtscSH 👈 Join our Discord community: https://lnkd.in/gRt4GdDx
Twelve Labs
Software Development
San Francisco, California 5,922 followers
Help developers build programs that can see, listen, and understand the world as we do.
About us
Helping developers build programs that can see, hear, and understand the world as we do by giving them the world's most powerful video-understanding infrastructure.
- Website: http://www.twelvelabs.io
- Industry: Software Development
- Company size: 11-50 employees
- Headquarters: San Francisco, California
- Type: Privately Held
- Founded: 2021
Locations
- Primary: 555 Mission St, San Francisco, California 94105, US
Updates
-
Twelve Labs will be attending AWS Summit NY on July 10! Connect with our team to learn how you can streamline all your video-related workflows with our multimodal AI models and discuss the latest in tech. Don’t hesitate to say hello when you spot any of our team members: Jae Lee, Soyoung Lee, Maninder Saini, and Andy Vaughan. We can’t wait to see everyone there! #AWSSummit #AWSNY
-
We're excited to announce a new collaboration with the Phyllo team to transform video insights on social media 😉 🌟 Why This Matters 🌟 With social media shifting to video, extracting insights is crucial. Video posts get up to 10 times more engagement, and 74% of users take action after viewing a brand's video. 🔍The Phyllo and Twelve Labs Advantage🔍 Phyllo: - Customizable searches across 15+ social media platforms. - Cost-effective social data access. Twelve Labs: - Foundation models that analyze videos through visual, audio, and text modalities. - Semantic video search, zero-shot classification, video-to-text generation, and multimodal video embeddings. 🌐 Innovative Use Cases 🌐 1 - Insights for Videos: Get detailed answers, summaries, and sentiment analysis. 2 - Product Development: Analyze product usage in social videos. 3 - Byte-Sized Segments: Break long videos into short clips for Instagram and TikTok. 4 - Influencer Insights: Identify influencers using specific products and their impact. Read more about our collaboration here: https://lnkd.in/gC9Zjmgp 👀
-
~ New Webinar ~ The video recording of #MultimodalWeekly 47 with Benjamin Muller, Tu Anh NGUYEN, and Bokai Yu from AI at Meta is up! 📺 Watch here: https://lnkd.in/guZ5C_mU 👀 They discussed: - Challenges of expressive speech generation - How SpiRit-LM combines TextLM and SpeechLM - SpiRit-LM's training recipe and generation samples - Evaluation: zero-shot, few-shot, and a text-speech sentiment-preservation benchmark - Can we observe the speech-text alignment? Join our Discord community: discord.gg/Sh6BRfakJa 🤝
SpiRit-LM, an Interleaved Spoken and Written Language Model | Multimodal Weekly 47
-
🏇 We are excited to announce the launch of Jockey: A Conversational Video Agent powered by Twelve Labs APIs and LangGraph from LangChain! Here's why developers should dive into Jockey: 👇 1 - Advanced Video Understanding: Jockey utilizes Twelve Labs' state-of-the-art video foundation models to extract rich insights from video content, offering capabilities like video search, classification, summarization, and more. 📽 2 - Flexible and Scalable Framework: Built on LangGraph, Jockey provides unparalleled control over the flow of code, prompts, and LLM calls, facilitating robust human-agent collaboration and ensuring reliable performance. ⛓ 3 - Efficient and Precise Architecture: Jockey's architecture includes key components such as the Supervisor, the Planner, and specialized Workers that handle tasks like video search, text generation, and editing, ensuring optimal token usage and accurate node responses. 🏛 4 - Customizable and Extensible: Jockey's modular design allows for easy customization and extension. Developers can modify prompts, extend state management, or add new workers to tailor Jockey to specific needs, making it a versatile foundation for advanced video AI applications. 🤟 Full blog post here: https://lnkd.in/gbudqhKM 😎
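The Supervisor-to-Worker flow described above can be sketched in plain Python. This is an illustrative routing pattern only, not Jockey's actual implementation: the worker names, state shape, and plan-driven routing are assumptions, and a real deployment would use LangGraph nodes backed by Twelve Labs API calls in place of these stubs.

```python
# Illustrative sketch of a supervisor routing tasks to specialized workers.
# All names and logic are hypothetical stand-ins for Jockey's components.

def video_search_worker(state):
    # Stand-in for a worker that would call a video search API.
    state["results"].append(f"search results for: {state['query']}")
    return state

def text_generation_worker(state):
    # Stand-in for a worker that would call an LLM to summarize.
    state["results"].append(f"summary of: {state['query']}")
    return state

WORKERS = {
    "search": video_search_worker,
    "summarize": text_generation_worker,
}

def supervisor(state):
    # Execute the planner's steps in order; a real agent would let an
    # LLM choose the next worker based on the evolving state.
    for step in state["plan"]:
        state = WORKERS[step](state)
    return state

state = supervisor({
    "query": "goal highlights",
    "plan": ["search", "summarize"],
    "results": [],
})
print(state["results"])
```

In a LangGraph version, each worker becomes a graph node and the supervisor becomes a conditional edge, which is what gives the fine-grained control over code, prompts, and LLM calls mentioned above.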
-
~ New Webinar ~ The video recording of #MultimodalWeekly 46 with Anoop Thomas from EMAM, Inc. is up! 📺 Watch here: https://lnkd.in/gZnWiYNS 👀 He discussed: - eMAM provides an end-to-end media workflow - eMAM's technology partners - eMAM's architecture and deployment options - Live demo of Twelve Labs models capabilities in eMAM product Join our Discord community: discord.gg/Sh6BRfakJa 🤝
Enhancing Video Production & Media Search with eMAM and Twelve Labs | Multimodal Weekly 46
-
Exciting times at Twelve Labs! Our team just returned from #CVPR2024 in Seattle last week, and what an incredible experience it was! 🌌 CVPR did not disappoint this year. We immersed ourselves in the latest advancements in video understanding and multimodal AI - areas at the core of our mission at Twelve Labs. Some highlights: 🌟 • Engaging discussions on cutting-edge research in multimodal foundation models • Insights into the latest trends in video embedding and retrieval • Connecting with brilliant minds pushing the boundaries of video-language modeling 🔬 Calling all ML Researchers! 🔬 Are you passionate about advancing the field of video understanding and multimodal AI? We're expanding our ML Research team and looking for talented individuals to join us on this exciting journey. Open Roles: • ML Research Scientist • Research Internships If you're ready to tackle challenging problems in video foundation models and multimodal LLMs, we want to hear from you! Learn more and apply: https://lnkd.in/ggc-mYa8 ◀ Aiden L. Hyojun Go Ryan Scott Kate Chen Sunny Hien Nguyen James Le Jenny Jayoung Ahn Minjoon Seo
-
In the 49th session of #MultimodalWeekly, we have two exciting presentations from researchers working on language model alignment and large multimodal models. 🎓 ✅ Jiwoo Hong, an M.S. student at KAIST AI, will discuss ORPO, a monolithic odds ratio preference optimization algorithm that eliminates the need for an additional preference alignment phase and reference model. This is a resource-efficient method for developing preference-aligned models. ✅ Associate Professor Lei Huang and Baichuan Zhou from Beihang University will introduce the TinyLLaVA framework, which offers a unified perspective for designing and analyzing small-scale large multimodal models. This work demonstrates that with better training recipes and higher-quality data, smaller LMMs can achieve performance comparable to larger LMMs. Register for the webinar here: https://lnkd.in/gJGtscSH 👈 Join our Discord community: https://lnkd.in/gRt4GdDx 🤝
-
~ New Webinar ~ The video recording of #MultimodalWeekly 45 with Seungone Kim and Nikhil Singh is up! 📺 Watch here: https://lnkd.in/g5J5x8EB 👀 They discussed: - Open-source evaluator LLM - Good evaluation performance - Fine-grained evaluation - Looking Similar, Sounding Different - Creative Text-to-Audio Generation via Synthesizer Programming - Contrastive Learning from Synthetic Audio Doppelgangers Join our Discord community: discord.gg/Sh6BRfakJa 🤝
Open-Source LLM Evaluation & Multimodal Models for Audio Processing/Creation | Multimodal Weekly 45
-
Maninder Saini, our Head of Growth, recently spoke at the Sports Loft Summit in London, where he explained how Twelve Labs uses multimodal AI to bring human-like understanding to video content. ⚽ 🇬🇧 📽 Catch his full talk here: https://lnkd.in/eJ2CEam3 He also recorded a podcast with Charlie Greenwood and Yanni Andreopoulos to dig deeper into applications in the sports and media entertainment industry, particularly helping rights holders organize and find content in their video libraries. 😎 Listen to the full episode here: https://lnkd.in/g-3Snzfp