We’ve recently contributed FP8 support to vLLM in collaboration with Neural Magic. With this feature, you can see up to a 1.8x reduction in inter-token latency, with >99% accuracy preservation!

A common concern with FP8 is whether users will experience accuracy degradation. To address this, Neural Magic has produced many checkpoints for key models with >99% accuracy preservation across a wide range of benchmarks (https://lnkd.in/gTimN5dZ), including:
- Llama3-70b
- Mixtral 8x7b
- Llama3-8b

You can easily try this out on vLLM, and read more about the feature here: https://lnkd.in/gzKJqerB
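Trying FP8 in vLLM might look roughly like the sketch below. It assumes vLLM is installed on a machine with an FP8-capable GPU, and it uses vLLM's `quantization="fp8"` engine argument. The model ID, prompt, and helper names (`build_fp8_engine_args`, `run_demo`) are illustrative assumptions, not taken from the post.

```python
def build_fp8_engine_args(model_name: str) -> dict:
    """Return keyword arguments asking vLLM to load a model with FP8 quantization."""
    return {"model": model_name, "quantization": "fp8"}


def run_demo(model_name: str = "meta-llama/Meta-Llama-3-8B-Instruct") -> None:
    """Generate a short completion with an FP8-quantized model.

    The model ID above is an assumption chosen for illustration; a
    pre-quantized FP8 checkpoint could be substituted instead.
    """
    # Imported lazily so this file can be loaded without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**build_fp8_engine_args(model_name))
    params = SamplingParams(temperature=0.0, max_tokens=32)
    for output in llm.generate(["Hello, my name is"], params):
        print(output.outputs[0].text)
```

Calling `run_demo()` needs suitable hardware and model access. The helper only builds the arguments, so it can be inspected anywhere.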
Anyscale
Software Development
San Francisco, California 24,130 followers
Scalable compute for AI and Python
About us
Anyscale enables developers of all skill levels to easily build applications that run at any scale, from a laptop to a data center.
- Website: https://anyscale.com
- Industry: Software Development
- Company size: 51-200 employees
- Headquarters: San Francisco, California
- Type: Privately Held
- Founded: 2019
Locations
- Primary: San Francisco, California 94105, US
Updates
-
🌟 It's not too late! 🌟 Join us tonight, July 9th, from 5:30 - 8:30pm in NYC for the Ray in Financial Services Meetup. Don’t miss out on discovering how Ray powers AI in finance and on networking with industry peers. Sign up here: https://lu.ma/vrlylh28 We hope to see you there!
-
🚨Last chance!🚨 Tomorrow, July 9th, 5:30 - 8:30pm in NYC, join the Ray in Financial Services Meetup. Discover how Ray powers AI in finance, explore AI-driven use cases, get insights into the Ray 3.0 roadmap, and network with industry peers🤝 Only a few spots are left, so sign up here: https://lu.ma/vrlylh28
Ray in Financial Services · Luma
lu.ma
-
Harrison Chase will be speaking at this year's #RaySummit! 🙌 Learn more about building LLM applications, including agentic applications, from the cofounder of LangChain, the open-source framework and toolkit helping developers innovate. 🚀 Register today: https://lnkd.in/gmpMH_Wi
Anyscale | Ray Summit 2024
raysummit.anyscale.com
-
Join us next Tuesday, July 9th, 5:30 - 8:30pm for the Ray in Financial Services Meetup in NYC! Hear from the Anyscale team and experts about how Ray powers AI in financial services. Dive deep into Ray, learn about AI-driven data extraction, high-powered trading use cases, and get insights into the Ray 3.0 roadmap. Network with industry peers and enjoy great discussions over free food and drinks. Space is limited, so sign up today! Details and registration: https://lu.ma/vrlylh28
-
Introducing RouteLLM: a sophisticated routing framework developed with Berkeley-LMSys in collaboration with Anyscale. RouteLLM optimizes query handling by dynamically selecting between high-performance proprietary LLMs and cost-effective open-source models, cutting costs by over 2x without sacrificing quality. Using human preference data and LLM-as-a-judge for data augmentation, our routers evaluate query complexity to choose the appropriate model. Rigorous testing on benchmarks like MMLU and GSM8K confirms our cost-efficient, high-quality performance.

Explore our open-source code, models, and preference data on GitHub: https://lnkd.in/gJyijZRx, and try our online demo: https://lnkd.in/gME-GC2a

Learn more about RouteLLM and how it can revolutionize your LLM applications:
- Our blog: https://lnkd.in/gfPD-u-y
- LMSys blog: https://lnkd.in/ga_MgERE
- Full research paper: https://lnkd.in/gqRy7Pjy
GitHub - anyscale/llm-router: Tutorial for building LLM router
github.com
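The routing idea in the RouteLLM post can be pictured as a score-and-threshold decision. The sketch below is a toy illustration only and is not RouteLLM's API. The class name, the model labels, and the word-count heuristic are all hypothetical stand-ins for a router learned from preference data.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ThresholdRouter:
    """Send a query to the strong model when its score reaches the threshold."""

    score_fn: Callable[[str], float]  # hypothetical difficulty score in [0, 1]
    threshold: float
    strong_model: str = "strong-model"  # placeholder label, not a real model ID
    weak_model: str = "weak-model"      # placeholder label, not a real model ID

    def route(self, query: str) -> str:
        if self.score_fn(query) >= self.threshold:
            return self.strong_model
        return self.weak_model


def toy_score(query: str) -> float:
    """Toy stand-in for a learned router: longer queries count as harder."""
    return min(len(query.split()) / 20.0, 1.0)


router = ThresholdRouter(score_fn=toy_score, threshold=0.5)
print(router.route("hi"))
print(router.route("please derive and explain each step of this long proof about prime numbers"))
```

Raising the threshold sends more traffic to the cheaper model, which is the cost versus quality trade-off a router of this kind has to tune.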
-
Speaking of which, check out Ray 2.31.0 😇 A big recent focus has been on:
🎇 Ray core reliability / stability at scale. We've literally burned through 350+ core usability / reliability issues in the past months. Incredibly hard but important work.
🔎 Observability tooling: not one silver bullet, just constantly chipping away across the board to make it better and better.
We recently moved to weekly Ray releases to ship features to our community faster. 🚀 Doing so required us to fix flaky tests and completely revamp our release process. 👊 Read more here: https://lnkd.in/gUejngmD
-
Join us this Wednesday, June 26th, for our Fast and Scalable Training webinar series focused on Model Training with PyTorch and Ray. You’ll learn how to migrate your code from pure PyTorch to Ray Train and Ray Data, enabling scalable and efficient AI workflows. Register here: https://lnkd.in/gmGqUjWw