Baseten

Software Development

San Francisco, CA 3,848 followers

Fast, scalable inference in our cloud or yours

About us

At Baseten, we provide all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently. Get started in minutes and avoid getting tangled in complex deployment processes. You can deploy best-in-class open-source models and take advantage of optimized serving for your own models. Our horizontally scalable services take you from prototype to production, with light-speed inference on infrastructure that autoscales with your traffic. And best-in-class doesn't mean breaking the bank: run your models on the best infrastructure without running up costs by taking advantage of our scale-to-zero feature.

Website
https://www.baseten.co/
Industry
Software Development
Company size
11-50 employees
Headquarters
San Francisco, CA
Type
Privately Held
Specialties
developer tools and software engineering

Updates

We’re thrilled to introduce Chains, a framework for building multi-component AI workflows on Baseten! ⛓️ 🎉

Chains enables users to build complex workflows as modular services in simple Python code, with optimal scaling for each component. Read our announcement blog to learn more: https://lnkd.in/eHsqG4yV

After working with AI builders at companies like Patreon, Descript, and many others, we saw the increasing need to expand the capabilities of AI infrastructure and model deployments for multi-component workflows. Our customers found that:
🫠 They were often writing messy scripts to coordinate inference across many models
🫠 They were paying too much for hardware by not separating CPU workloads from GPU workloads
🫠 They couldn’t quickly test locally, which drastically slowed down development

Other solutions either rely on DAGs or use bidirectional API calls to make multi-model inference possible. These approaches are slow, inefficient, and expensive at scale. They also fail to enable heterogeneous GPU/CPU resourcing across models and code, leading to overprovisioning and unnecessary compute costs.

We built Chains in response to these customer needs: reliable, high-performance inference for workflows that use multiple models or processing steps. Using Chains, you can:
✅ Assemble distinct computational steps (or models) into a holistic workflow
✅ Allocate and scale resources independently for each component
✅ View critical performance metrics across your entire Chain

Chains is a game-changer for anyone using or building compound AI systems. We’ve seen processing times halve and GPU utilization improve 6x. With built-in type checking, blazing-fast deployments, and simplified pipeline orchestration, Chains is our latest step in enhancing the capabilities and efficiency of AI infrastructure! 🚀

Try Chains today with $30 in free credits and tell us what you think! https://lnkd.in/ecjknaZM

Introducing Baseten Chains
baseten.co
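For a sense of what a Chain looks like in code, here is a minimal sketch of a two-Chainlet workflow. It assumes the truss_chains package and follows the Chainlet conventions from the Chains quickstart, so treat the exact names and signatures as illustrative rather than authoritative:

```python
# Illustrative sketch of a Baseten Chains workflow. Names and APIs are
# assumed from the Chains quickstart; check the docs for the current
# interface.
import truss_chains as chains


# A Chainlet is one independently deployed and scaled component.
class SayHello(chains.ChainletBase):
    def run_remote(self, name: str) -> str:
        return f"Hello, {name}!"


# The entrypoint Chainlet fans out to SayHello. Each component can get
# its own resources (e.g., keep this on CPU, put a model Chainlet on GPU).
@chains.mark_entrypoint
class GreetEveryone(chains.ChainletBase):
    def __init__(self, say_hello: SayHello = chains.depends(SayHello)) -> None:
        self._say_hello = say_hello

    def run_remote(self, names: list[str]) -> str:
        return "\n".join(self._say_hello.run_remote(name) for name in names)
```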

AI is becoming increasingly multi-model. With Not Diamond, you'll always leverage the best LLM for your use case, at lower cost and lower latency. Congrats Tomás Hernando Kofman, Tze-Yang Tung, Jeffrey Akiki, and Alejandro Companioni on the release of Not Diamond! Can't wait to see this take off 💎

Today we’re releasing Not Diamond: the world’s most powerful AI model router. New LLMs are released every day with evolving quality, cost, latency, and context windows across domains. Not Diamond maximizes LLM output quality at drastically lower cost and latency by automatically recommending the best LLM on every request. And we make it super easy to train your own router on your own data.

Not Diamond sets new SOTA standards on major benchmarks by ensembling every other model into a meta-model that learns when to call each LLM. And not only does routing achieve SOTA, it does so at much lower cost. For example, on MMLU, Not Diamond beats GPT-4o by 1.6% with 29.6% lower costs.

You can train your own custom router by uploading a dataset with your inputs and eval scores for different models. Not Diamond is completely agnostic to your choice of scoring metrics, frameworks, or tools. And if you don’t have your own eval data, you can still use Not Diamond’s base router out of the box; it takes less than 5 minutes to set up.

I’m incredibly honored not only to launch Not Diamond today but also to announce our $2.3M pre-seed round led by defy.vc with backing from some of the greatest AI scientists, engineers, and executives on this planet: Jeff Dean (Google), Julien Chaumond (Hugging Face), Zack Kass (OpenAI), Ion Stoica (Anyscale, Databricks), Tom Preston-Werner (GitHub), Scott Belsky (Adobe), Jeff Weiner (LinkedIn), Eoghan McCabe (Intercom), Alex Chung (Giphy), Carl Rivera (Shopify), John Kim (PayPal), Nadim Hossain (Databricks), Amir Haghighat (Baseten), Aman Khan (Arize AI), Grant Miller (Replicated), and many more, along with additional institutional participation from Inovia, 640 Oxford, VitalStage Ventures, and Karman VC. So many of you have been heroes of ours since we were teenagers. We couldn’t be more excited to have you on board.

Getting started with Not Diamond is super simple. Go try it now and let me know what you think: https://lnkd.in/geCZjTa7
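To make the routing idea concrete, here is a hypothetical sketch of the request/response flow: the client sends a prompt plus a list of candidate models, and the router returns the model it predicts will answer best. The endpoint URL, payload fields, and model IDs below are illustrative assumptions, not Not Diamond's actual API; see the link above for the real SDK.

```python
# Hypothetical illustration of an LLM-router call. The URL, payload
# fields, and model IDs are assumptions, not Not Diamond's real API.
import requests

payload = {
    "messages": [{"role": "user", "content": "Summarize this contract."}],
    # Candidate models the router may choose between.
    "llm_providers": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet"],
}
response = requests.post(
    "https://api.example-router.com/v1/modelSelect",  # placeholder endpoint
    json=payload,
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},
    timeout=30,
)
response.raise_for_status()
# The router responds with its recommended model, so you only pay for
# one call to the provider it predicts will perform best.
print(response.json().get("recommended_model"))
```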

Don’t miss our Chains launch party in NYC this Thursday, August 1st, or our next live webinar! 🚀 We have two awesome events coming up:
🍸 Launch party for Chains, our SDK and framework for compound AI systems ❗ RSVP: https://lu.ma/chains
🔁 Why you need async inference in production, live with Software Engineers Samiksha Pal and Helen Yang, and Developer Advocate Rachel Rapp ❗ Save your spot: https://lnkd.in/e6KaYi5G

A10 or A100: which GPU should you use? 🤔 🆚 Philip Kiely compares the two: https://lnkd.in/gJgTGEUP
NVIDIA’s A10 and A100 GPUs power all kinds of model inference, from LLMs to audio transcription to image generation. 🖼️ A100s are a clear winner for certain demanding ML inference tasks, but you can also leverage multiple A10s in a single instance to save on cost while meeting the needs of many workloads. 🧠
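As a concrete example of the multi-A10 approach: a model that needs more memory or throughput than a single A10 provides can be sharded across several A10s with tensor parallelism. The sketch below uses vLLM's tensor_parallel_size option; the model choice and GPU count are illustrative assumptions, not a specific recommendation.

```python
# Sketch: sharding one model across four A10s with tensor parallelism.
# vLLM's tensor_parallel_size option is real; the model and GPU count
# here are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # example model
    tensor_parallel_size=4,  # split weights and compute across 4 A10 GPUs
)
outputs = llm.generate(
    ["Which GPU should I pick for inference?"],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```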

📌 Pinning ML model revisions and open-source Python packages is often a best practice. It can:
✅ Make your model's performance more reliable
✅ Protect against different failure modes
✅ Help secure against malicious code injection
That said, it's not always necessary to pin model revisions; it depends on your use case and can pose some disadvantages, too. 👀 Check out Philip Kiely's post to learn when pinning model versions is recommended, and when it's not: https://lnkd.in/eEF2XCNw
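For illustration, here is what pinning looks like in practice: an exact commit for a Hugging Face model revision, plus exact package versions in requirements.txt. The commit hash and version numbers below are placeholders, not recommendations.

```python
# Sketch: pinning a Hugging Face model to an exact revision so the
# weights can't change underneath you. The revision below is a
# placeholder; copy the commit SHA from the model repo you depend on.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "mistralai/Mistral-7B-Instruct-v0.2"
REVISION = "<commit-sha-from-the-model-repo>"  # pin to an exact commit

tokenizer = AutoTokenizer.from_pretrained(MODEL, revision=REVISION)
model = AutoModelForCausalLM.from_pretrained(MODEL, revision=REVISION)

# Likewise, pin Python packages in requirements.txt, e.g.:
#   transformers==4.43.3
#   torch==2.3.1
```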

We’ve heard it from ComfyUI users time and again: our ComfyUI integration is best-in-class! 🏆 Now we’ve made it even better. With our new "build commands" feature, you can easily run custom nodes and model checkpoints with ComfyUI on powerful GPUs. 💪🏻
🚀 Check out Het Trivedi and Rachel Rapp's post to see how: https://lnkd.in/ejDJMv7Q
In case you didn't know: you can (and always could) launch ComfyUI with Truss as a callable API endpoint that you can share. Now your models spin up even faster. We’re proud to give users the full power of ComfyUI while making it shareable and blazing fast. If you try it out, let us know how it goes, or show us what you build! 🎉
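Once a ComfyUI workflow is deployed with Truss, invoking it is a plain HTTP call. This sketch assumes Baseten's standard predict endpoint pattern; the model ID and input fields are placeholders, so check the post above for the exact request format.

```python
# Sketch: calling a ComfyUI workflow deployed on Baseten as an API.
# The URL follows Baseten's standard predict endpoint pattern; the
# model ID and input fields are placeholders, not a real deployment.
import requests

MODEL_ID = "<your-model-id>"
resp = requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/production/predict",
    headers={"Authorization": "Api-Key <YOUR_BASETEN_API_KEY>"},
    json={"workflow_values": {"prompt": "a watercolor fox, studio light"}},
    timeout=300,  # image generation can take a while on cold starts
)
resp.raise_for_status()
print(resp.json())
```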

Prompt: write a stand-up comedy routine about being an LLM

Llama 405B: I don't have feelings unless you count the existential dread of knowing I'll be replaced by a newer model in six months.

In his letter on open source AI, Mark Zuckerberg listed reasons why developers need open-source models, including:
1. We need to control our own destiny and not get locked into a closed vendor.
2. We need to protect our data.
3. We need a model that is efficient and affordable to run.
At Baseten, we agree that every engineering team should be able to choose any vendor, keep their data safe, and run models affordably.

🧘 Control
Every model deployed on Baseten uses Truss, our open-source model packaging library. Truss is agnostic to inference optimizers and serving engines, so you can use open-source tools like vLLM and TensorRT-LLM to package your model as a Docker container, which can be deployed anywhere. Here’s an implementation of Llama 3.1 405B with vLLM in less than 100 lines of Python: https://lnkd.in/gVBwssde

🔐 Privacy
With a shared endpoint, your prompts and responses are processed by a third party alongside every other user’s data. Baseten offers dedicated deployments for open-source and custom models. On top of SOC 2 Type II certification and HIPAA compliance, we offer self-hosted model deployments so that you can run models like Llama from the comfort and security of your own VPC.

💰 Cost
Baseten charges per minute of GPU use. With available commit discounts, we’re a highly cost-competitive platform for large-scale deployments. With autoscaling dedicated deployments, you pay a cost that you control, that you can decrease with iterative optimization work, and that is driven by fundamental prices for compute and storage, not VC subsidies or loss-leading market-capture plays.

Build with customizable, private, affordable inference:
- Deploy Llama 3.1 8B: https://lnkd.in/gXQqhNzj
- Deploy Llama 3.1 70B: https://lnkd.in/gVbPyCuA
- Contact us for Llama 3.1 405B: DM or support@baseten.co
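As a taste of what Truss packaging looks like, here is a minimal sketch of a Truss model class wrapping a vLLM engine. It follows Truss's load/predict convention; the model choice and input schema are illustrative assumptions, so see the linked example for the real 405B implementation.

```python
# Sketch: a minimal Truss model.py wrapping a vLLM engine. Truss's
# load()/predict() convention is real; the model choice and input
# schema here are illustrative, not the linked 405B implementation.
from vllm import LLM, SamplingParams


class Model:
    def __init__(self, **kwargs):
        self._llm = None

    def load(self):
        # Truss calls load() once at startup, so weights are ready
        # before the first request arrives.
        self._llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")

    def predict(self, request: dict) -> dict:
        params = SamplingParams(max_tokens=request.get("max_tokens", 256))
        outputs = self._llm.generate([request["prompt"]], params)
        return {"output": outputs[0].outputs[0].text}
```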
