Baseten

Software Development

San Francisco, CA 3,848 followers

Fast, scalable inference in our cloud or yours

About us

At Baseten, we provide all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently. Get started in minutes and avoid getting tangled in complex deployment processes. You can deploy best-in-class open-source models and take advantage of optimized serving for your own models. Our horizontally scalable services take you from prototype to production, with light-speed inference on infrastructure that autoscales with your traffic. And best-in-class doesn't mean breaking the bank: run your models on the best infrastructure without running up costs by taking advantage of our scale-to-zero feature.

Website
https://www.baseten.co/
Industry
Software Development
Company size
11-50 employees
Headquarters
San Francisco, CA
Type
Privately Held
Specialties
developer tools and software engineering

Updates

We’re thrilled to introduce Chains, a framework for building multi-component AI workflows on Baseten! ⛓️ 🎉

Chains enables users to build complex workflows as modular services in simple Python code, with optimal scaling for each component. Read our announcement blog to learn more: https://lnkd.in/eHsqG4yV

After working with AI builders at companies like Patreon, Descript, and many others, we saw the increasing need to expand the capabilities of AI infrastructure and model deployments for multi-component workflows. Our customers found that:
🫠 They were often writing messy scripts to coordinate inference across many models
🫠 They were paying too much for hardware by not separating CPU workloads from GPU workloads
🫠 They couldn’t quickly test locally, which drastically slowed down development

Other solutions either rely on DAGs or use bidirectional API calls to make multi-model inference possible. These approaches are slow, inefficient, and expensive at scale. They also fail to enable heterogeneous GPU/CPU resourcing across models and code, leading to overprovisioning and unnecessary compute costs.

We built Chains in response to these customer needs: reliable, high-performance inference for workflows that use multiple models or processing steps. Using Chains, you can:
✅ Assemble distinct computational steps (or models) into a holistic workflow
✅ Allocate and scale resources independently for each component
✅ View critical performance metrics across your entire Chain

Chains is a game-changer for anyone using or building compound AI systems. We’ve seen processing times halve and GPU utilization improve 6x. With built-in type checking, blazing-fast deployments, and simplified pipeline orchestration, Chains is our latest step in enhancing the capabilities and efficiency of AI infrastructure! 🚀

Try Chains today with $30 in free credits and tell us what you think! https://lnkd.in/ecjknaZM

Introducing Baseten Chains
baseten.co
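For a sense of what a Chain looks like in code, here is a minimal sketch of a two-Chainlet workflow. It assumes the truss_chains package and follows the Chainlet conventions from the Chains quickstart, so treat the exact names and signatures as illustrative rather than authoritative:

```python
# Illustrative sketch of a Baseten Chains workflow. Names and APIs are
# assumed from the Chains quickstart; check the docs for the current
# interface.
import truss_chains as chains


# A Chainlet is one independently deployed and scaled component.
class SayHello(chains.ChainletBase):
    def run_remote(self, name: str) -> str:
        return f"Hello, {name}!"


# The entrypoint Chainlet fans out to SayHello. Each component can get
# its own resources (e.g., keep this on CPU, put a model Chainlet on GPU).
@chains.mark_entrypoint
class GreetEveryone(chains.ChainletBase):
    def __init__(self, say_hello: SayHello = chains.depends(SayHello)) -> None:
        self._say_hello = say_hello

    def run_remote(self, names: list[str]) -> str:
        return "\n".join(self._say_hello.run_remote(name) for name in names)
```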

AI is becoming increasingly multi-model. With Not Diamond, you'll always leverage the best LLM for your use case, at lower cost and lower latency. Congrats Tomás Hernando Kofman, Tze-Yang Tung, Jeffrey Akiki, and Alejandro Companioni on the release of Not Diamond! Can't wait to see this take off 💎

Today we’re releasing Not Diamond: the world’s most powerful AI model router. New LLMs are released every day with evolving quality, cost, latency, and context windows across domains. Not Diamond maximizes LLM output quality at drastically lower cost and latency by automatically recommending the best LLM on every request. And we make it super easy to train your own router on your own data.

Not Diamond sets new SOTA standards on major benchmarks by ensembling every other model into a meta-model that learns when to call each LLM. And not only does routing achieve SOTA, it does so at much lower cost. For example, on MMLU, Not Diamond beats GPT-4o by 1.6% with 29.6% lower costs.

You can train your own custom router by uploading a dataset with your inputs and eval scores for different models. Not Diamond is completely agnostic to your choice of scoring metrics, frameworks, or tools. And if you don’t have your own eval data, you can still use Not Diamond’s base router out of the box; it takes less than 5 minutes to set up.

I’m incredibly honored not only to launch Not Diamond today but also to announce our $2.3M pre-seed round led by defy.vc with backing from some of the greatest AI scientists, engineers, and executives on this planet: Jeff Dean (Google), Julien Chaumond (Hugging Face), Zack Kass (OpenAI), Ion Stoica (Anyscale, Databricks), Tom Preston-Werner (GitHub), Scott Belsky (Adobe), Jeff Weiner (LinkedIn), Eoghan McCabe (Intercom), Alex Chung (Giphy), Carl Rivera (Shopify), John Kim (PayPal), Nadim Hossain (Databricks), Amir Haghighat (Baseten), Aman Khan (Arize AI), Grant Miller (Replicated), and many more, along with additional institutional participation from Inovia, 640 Oxford, VitalStage Ventures, and Karman VC. So many of you have been heroes of ours since we were teenagers. We couldn’t be more excited to have you on board.

Getting started with Not Diamond is super simple. Go try it now and let me know what you think: https://lnkd.in/geCZjTa7
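To make the routing idea concrete, here is a hypothetical sketch of the request/response flow: the client sends a prompt plus a list of candidate models, and the router returns the model it predicts will answer best. The endpoint URL, payload fields, and model IDs below are illustrative assumptions, not Not Diamond's actual API; see the link above for the real SDK.

```python
# Hypothetical illustration of an LLM-router call. The URL, payload
# fields, and model IDs are assumptions, not Not Diamond's real API.
import requests

payload = {
    "messages": [{"role": "user", "content": "Summarize this contract."}],
    # Candidate models the router may choose between.
    "llm_providers": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet"],
}
response = requests.post(
    "https://api.example-router.com/v1/modelSelect",  # placeholder endpoint
    json=payload,
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},
    timeout=30,
)
response.raise_for_status()
# The router responds with its recommended model, so you only pay for
# one call to the provider it predicts will perform best.
print(response.json().get("recommended_model"))
```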

Don’t miss our Chains launch party in NYC this Thursday, August 1st, or our next live webinar! 🚀 We have two awesome events coming up:
🍸 Launch party for Chains, our SDK and framework for compound AI systems ❗ RSVP: https://lu.ma/chains
🔁 Why you need async inference in production, live with Software Engineers Samiksha Pal and Helen Yang, and Developer Advocate Rachel Rapp ❗ Save your spot: https://lnkd.in/e6KaYi5G

A10 or A100: which GPU should you use? 🤔 🆚 Philip Kiely compares the two: https://lnkd.in/gJgTGEUP
NVIDIA’s A10 and A100 GPUs power all kinds of model inference, from LLMs to audio transcription to image generation. 🖼️ A100s are a clear winner for certain demanding ML inference tasks, but you can also leverage multiple A10s in a single instance to save on cost while meeting the needs of many workloads. 🧠
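As a concrete example of the multi-A10 approach: a model that needs more memory or throughput than a single A10 provides can be sharded across several A10s with tensor parallelism. The sketch below uses vLLM's tensor_parallel_size option; the model choice and GPU count are illustrative assumptions, not a specific recommendation.

```python
# Sketch: sharding one model across four A10s with tensor parallelism.
# vLLM's tensor_parallel_size option is real; the model and GPU count
# here are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # example model
    tensor_parallel_size=4,  # split weights and compute across 4 A10 GPUs
)
outputs = llm.generate(
    ["Which GPU should I pick for inference?"],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```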

📌 Pinning ML model revisions and open-source Python packages is often a best practice. It can:
✅ Make your model's performance more reliable
✅ Protect against different failure modes
✅ Help secure against malicious code injection
That said, it's not always necessary to pin model revisions; it depends on your use case and can pose some disadvantages, too. 👀 Check out Philip Kiely's post to learn when pinning model versions is recommended, and when it's not: https://lnkd.in/eEF2XCNw
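For illustration, here is what pinning looks like in practice: an exact commit for a Hugging Face model revision, plus exact package versions in requirements.txt. The commit hash and version numbers below are placeholders, not recommendations.

```python
# Sketch: pinning a Hugging Face model to an exact revision so the
# weights can't change underneath you. The revision below is a
# placeholder; copy the commit SHA from the model repo you depend on.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "mistralai/Mistral-7B-Instruct-v0.2"
REVISION = "<commit-sha-from-the-model-repo>"  # pin to an exact commit

tokenizer = AutoTokenizer.from_pretrained(MODEL, revision=REVISION)
model = AutoModelForCausalLM.from_pretrained(MODEL, revision=REVISION)

# Likewise, pin Python packages in requirements.txt, e.g.:
#   transformers==4.43.3
#   torch==2.3.1
```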

We’ve heard it from ComfyUI users time and again: our ComfyUI integration is best-in-class! 🏆 Now we’ve made it even better. With our new "build commands" feature, you can easily run custom nodes and model checkpoints with ComfyUI on powerful GPUs. 💪🏻
🚀 Check out Het Trivedi and Rachel Rapp's post to see how: https://lnkd.in/ejDJMv7Q
In case you didn't know: you can (and always could) launch ComfyUI with Truss as a callable API endpoint that you can share. Now your models spin up even faster. We’re proud to give users the full power of ComfyUI while making it shareable and blazing fast. If you try it out, let us know how it goes, or show us what you build! 🎉
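Once a ComfyUI workflow is deployed with Truss, invoking it is a plain HTTP call. This sketch assumes Baseten's standard predict endpoint pattern; the model ID and input fields are placeholders, so check the post above for the exact request format.

```python
# Sketch: calling a ComfyUI workflow deployed on Baseten as an API.
# The URL follows Baseten's standard predict endpoint pattern; the
# model ID and input fields are placeholders, not a real deployment.
import requests

MODEL_ID = "<your-model-id>"
resp = requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/production/predict",
    headers={"Authorization": "Api-Key <YOUR_BASETEN_API_KEY>"},
    json={"workflow_values": {"prompt": "a watercolor fox, studio light"}},
    timeout=300,  # image generation can take a while on cold starts
)
resp.raise_for_status()
print(resp.json())
```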

Prompt: write a stand-up comedy routine about being an LLM

Llama 405B: I don't have feelings unless you count the existential dread of knowing I'll be replaced by a newer model in six months.

In his letter on open source AI, Mark Zuckerberg listed reasons why developers need open-source models, including:
1. We need to control our own destiny and not get locked into a closed vendor.
2. We need to protect our data.
3. We need a model that is efficient and affordable to run.
At Baseten, we agree that every engineering team should be able to choose any vendor, keep their data safe, and run models affordably.

🧘 Control
Every model deployed on Baseten uses Truss, our open-source model packaging library. Truss is agnostic to inference optimizers and serving engines, so you can use open-source tools like vLLM and TensorRT-LLM to package your model as a Docker container, which can be deployed anywhere. Here’s an implementation of Llama 3.1 405B with vLLM in less than 100 lines of Python: https://lnkd.in/gVBwssde

🔐 Privacy
With a shared endpoint, your prompts and responses are processed by a third party alongside every other user’s data. Baseten offers dedicated deployments for open-source and custom models. On top of SOC 2 Type II certification and HIPAA compliance, we offer self-hosted model deployments so that you can run models like Llama from the comfort and security of your own VPC.

💰 Cost
Baseten charges per minute of GPU use. With available commit discounts, we’re a highly cost-competitive platform for large-scale deployments. With autoscaling dedicated deployments, you pay a cost that you control, that you can decrease with iterative optimization work, and that is driven by fundamental prices for compute and storage, not VC subsidies or loss-leading market-capture plays.

Build with customizable, private, affordable inference:
- Deploy Llama 3.1 8B: https://lnkd.in/gXQqhNzj
- Deploy Llama 3.1 70B: https://lnkd.in/gVbPyCuA
- Contact us for Llama 3.1 405B: DM or support@baseten.co
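As a taste of what Truss packaging looks like, here is a minimal sketch of a Truss model class wrapping a vLLM engine. It follows Truss's load/predict convention; the model choice and input schema are illustrative assumptions, so see the linked example for the real 405B implementation.

```python
# Sketch: a minimal Truss model.py wrapping a vLLM engine. Truss's
# load()/predict() convention is real; the model choice and input
# schema here are illustrative, not the linked 405B implementation.
from vllm import LLM, SamplingParams


class Model:
    def __init__(self, **kwargs):
        self._llm = None

    def load(self):
        # Truss calls load() once at startup, so weights are ready
        # before the first request arrives.
        self._llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")

    def predict(self, request: dict) -> dict:
        params = SamplingParams(max_tokens=request.get("max_tokens", 256))
        outputs = self._llm.generate([request["prompt"]], params)
        return {"output": outputs[0].outputs[0].text}
```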
