Runhouse

Technology, Information and Internet

About us

Programmable 🐍 remote compute 🔋 and data 📂 across environments 💻 and users 👩💻

Website
https://www.run.house/
Industry
Technology, Information and Internet
Company size
1 employee
Type
Privately Held

Updates

  • Runhouse

    1,694 followers

    Join us at Ray Summit in October!

    Robert Nishihara

    Co-founder and CEO at Anyscale (We are hiring!)

    I’m proud to announce a few of our Ray Summit 2024 keynote speakers! Ray Summit is the largest gathering of open source AI infrastructure leaders, and it's my favorite opportunity to connect with the Ray community in person. I’m particularly looking forward to hearing from OpenAI CTO Mira Murati and Runway CTO Anastasis Germanidis. There will be over 60 breakout sessions led by dozens of technical speakers, and you’ll hear about use cases at:
    🌃 Tech giants like Meta, NVIDIA, Google, Apple, Amazon Web Services (AWS)
    🎆 Leading tech companies like Coinbase, Canva, Pinterest, ByteDance, Spotify, Airbnb, Uber, Databricks, Instacart
    🦋 Industry leaders like Bridgewater Associates, Ford Motor Company, Lockheed Martin, Point72, Recursion
    🌠 Startups like Hinge, Boston Dynamics, Rad AI, Genmo, LangChain, LlamaIndex, Astronomer, LanceDB, Motional, Zoox, City Storage Systems, Runhouse
    🌏 And many more
    Meet me in San Francisco from September 30th to October 2nd. See you there!

  • Runhouse reposted this

    Matt Kandler

    Engineer at Runhouse, Founder of Happyfeed

    If you plan to send any private data to an LLM for business applications, you’re likely reluctant to use ChatGPT and other proprietary models. Maybe you’ve thought about self-hosting an open model like Meta’s Llama 3. But what’s the best way to actually do that?

    With Runhouse, you can easily deploy the model to GPUs on your own private cloud (AWS, GCP, etc.), then interact with it from Python code, via curl, or even as a public endpoint. At Runhouse, we’ve written a few runnable example scripts to self-host Llama 3, and in a single blog post I’ve consolidated everything you need to know about Llama 3, why it’s worth considering, and how to use it on any cloud: https://lnkd.in/epVz4vCn

    Runhouse removes the pain of managing infrastructure so you can focus on the fun part: building interesting AI applications. You don’t have to know a thing about PyTorch, CUDA, or even navigating the AWS Console. 🫠 (All this is coming from a frontend engineer.) #Llama3 #LLMs #SelfHosted

    How to Use Llama 3 Self-Hosted

    run.house
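A minimal sketch of the pattern described in the post above: deploy Llama 3 8B Instruct to an A10G in your own AWS account with Runhouse and call it like a local object. This is not the blog's exact code; it follows the `rh.Module` / `rh.cluster` / `rh.env` names used in Runhouse examples around this time (APIs vary by release), and it assumes a SkyPilot-configured AWS account plus Hugging Face access to the gated Llama 3 weights.

```python
import runhouse as rh


class Llama3(rh.Module):
    """Lazy-loads a text-generation pipeline on the remote GPU box."""

    def __init__(self, model_id="meta-llama/Meta-Llama-3-8B-Instruct"):
        super().__init__()
        self.model_id = model_id
        self.pipeline = None

    def generate(self, prompt, **gen_kwargs):
        import transformers  # imported on the cluster, not locally
        if self.pipeline is None:
            self.pipeline = transformers.pipeline(
                "text-generation", model=self.model_id, device_map="auto"
            )
        return self.pipeline(prompt, **gen_kwargs)


if __name__ == "__main__":
    # Launch (or reuse) a single-A10G EC2 instance; SkyPilot handles provisioning.
    gpu = rh.cluster(name="rh-a10g", instance_type="A10G:1", provider="aws")
    env = rh.env(reqs=["torch", "transformers", "accelerate"], secrets=["huggingface"])

    # Send the class to the cluster and get back a remote, callable service.
    llama = Llama3().get_or_to(gpu, env=env, name="llama3-8b")
    print(llama.generate("What is Runhouse?", max_new_tokens=100))
```

The curl and public-endpoint access the post mentions sit on top of the same deployed module; see the linked blog for that side.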

  • Runhouse reposted this

    Elijah ben Izzy

    Co-creator of Hamilton; Co-founder @ DAGWorks (YC W23, StartX S23)

    Do you always need an orchestrator? The #MachineLearning and #AI world has always depended on #orchestrators. Widely used as an all-in-one solution, they manage everything from packaging code and provisioning infrastructure to running scheduled jobs.

    Recently, people have been revisiting this approach. The trend is moving away from all-in-one orchestrators toward specialized tools that excel at individual aspects of the workflow. This "unbundling" of orchestration is creating opportunities for more targeted management. You can break the stack into three layers:
    1. Scheduling layer – run tasks / handle jobs
    2. Asset layer – manage your data + associated code
    3. Infrastructure layer – provision compute (including expensive #GPUs)

    The #AssetLayer consists of a host of exciting new tools that have recently cropped up, including #Hamilton, dltHub, dbt Labs, and #sqlmesh, which all offer a better ability to link code to data. The #InfrastructureLayer has been enhanced with tools like Runhouse, Modal Labs, RunPod, and various cloud ML platforms that unlock new resources and save costs.

    At DAGWorks, we are enthusiastic about this shift toward unbundling. While traditional, comprehensive orchestrators have their place (and are great pieces of software!), #AssetLayer and #InfrastructureLayer tooling can augment your #SchedulingLayer, giving you a best-in-class data system! It turns out Donny Greenberg and I think the same way about this. While we've been hard at work on the #AssetLayer with #Hamilton, Runhouse has been enhancing the flexibility of the infrastructure layer. Surprisingly, all it takes to integrate these systems is a simple scheduler like GitHub Actions (!). In this post, we revisit everything from first principles and create a lightweight, flexible, and extremely powerful system that you can get started with today: https://lnkd.in/gzrK6_xW

    Lean Data Automation: A Principal Components Approach

    blog.dagworks.io
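The "unbundling" above is easiest to see in code: the scheduling layer is just whatever runs a script on a cadence (cron, GitHub Actions), and the infrastructure layer is reached from inside that script. A minimal, hypothetical sketch assuming Runhouse for the infrastructure layer; the function, cluster name, and data path are placeholders, not taken from the linked post.

```python
import runhouse as rh


def train(dataset_path: str) -> dict:
    # Asset-layer code (e.g. a Hamilton dataflow or plain pandas) would live here.
    return {"dataset": dataset_path, "status": "trained"}


if __name__ == "__main__":
    # Infrastructure layer: bring up (or reuse) a GPU box in your own cloud account.
    cluster = rh.cluster(name="nightly-train", instance_type="A10G:1", provider="aws")
    remote_train = rh.function(train).to(cluster, env=rh.env(reqs=["pandas"]))

    # Scheduling layer: this script is simply what the cron/GitHub Actions job invokes.
    print(remote_train("s3://my-bucket/daily-data.parquet"))
```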

  • Runhouse

    Donny Greenberg

    CEO at 🏃♀️Runhouse🏠, x-PyTorch

    💔 Ripping open the "Orchestrator"

    For a long time I cringed when people called Runhouse an orchestrator. It was built to solve development and execution across ML infra *without* an orchestrator, so you get the iteration, debugging, and flexibility of local Python back. I've repeatedly insisted that it should serve as an independent interface into the infra and be deployed to production *within* an orchestrator or serving container. Many orchestrator maintainers I spoke with agreed, and said they'd rather focus on scheduling, fault-tolerance, and asset lineage features than maintain the infra interface!

    The truth, though, is that orchestrators do many things, and many people associate orchestrators with execution as much as they do with scheduling or assets (much to the detriment of their ML devX). This is the problem: by bundling such a diverse set of activities under the term "orchestrator," we're not helping our users pick the right tool for the job and compose a lean stack of dedicated tools.

    We've teamed up with our friends at DAGWorks Inc. (Elijah ben Izzy, Stefan Krawczyk), maintainers of Hamilton and Burr, to break orchestration into its principal components and show why it's advantageous to reason about them separately. We roughly map out the ecosystem based on these pieces, and show how you can compose a lean and scalable data stack by focusing on using the right tool for the job. Read on here: https://lnkd.in/edM6anUn

    Many people find it easiest to understand Runhouse as a way to orchestrate Python programs across clusters, regions, or clouds. That's ok! Call it what you like.

    Lean Data Automation: A Principal Components Approach

    run.house

  • Runhouse

    Donny Greenberg

    CEO at 🏃♀️Runhouse🏠, x-PyTorch

    You would *think* having access to a cloud account or big cluster would make programming fun. You would think that your "hyperscaler" would make parallelizing a simple operation easy - that should be the "Hello world" of having a cloud account! You'd be wrong, until now.

    We've been working since late 2022 with the incomparable SkyPilot team at University of California, Berkeley to build a programming experience that makes cloud compute fun, flexible, and easy. With Runhouse+SkyPilot, parallelizing a function is trivial, but so is any other kind of scaling, architecture, or cost-optimization you could dream up.

    We were inspired by the fact that PyTorch made the "programming language" of your ML model just regular Python, so there was no limited linguistic subset for ML - you could do anything Python could do. We've done the same for compute infrastructure: your cloud account and existing compute are now programmable in Python like one big distributed computer. There's no prior setup needed or "border" to where your code can go - just run the Python and compute anywhere!

    We think this is a really big deal for cloud and ML development, and we'll be publishing more about it. Check out our first post on it here:

    Runhouse

    run.house
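To make "parallelizing a function from your cloud account" concrete, here is a hypothetical sketch of the fan-out pattern with Runhouse. The cluster spec, function, and worker count are illustrative, and the single cluster could just as well be several clusters across regions or clouds without changing the calling code; Runhouse launches the VM through SkyPilot under the hood.

```python
from concurrent.futures import ThreadPoolExecutor

import runhouse as rh


def embed_chunk(chunk_id: int) -> str:
    # Any ordinary Python you'd run locally.
    return f"embedded chunk {chunk_id}"


if __name__ == "__main__":
    # Bring up (or reuse) a CPU box in your own AWS account.
    cpus = rh.cluster(name="rh-cpu", instance_type="m5.2xlarge", provider="aws")
    remote_embed = rh.function(embed_chunk).to(cpus)

    # Each remote call blocks on the cluster, so plain threads are enough to
    # run many of them in parallel from the laptop side.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(remote_embed, range(32)))
    print(results[:3])
```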

  • Runhouse

    Donny Greenberg

    CEO at 🏃♀️Runhouse🏠, x-PyTorch

    Early on in Runhouse, the creator of a major ML platform told me he’s not sure it’s possible for one DevX to natively support both workflow/training pipelines (long-running, throughput-bound) and serving/inference pipelines (fast, latency-bound). We took that personally.

    We’re now rolling out the second half of that story: turning your Python functions and classes into fast, shareable services. There are two glaring gaps in most ML platforms which instant serving endpoints (inference or otherwise) fill, causing months of drag on ML teams and millions in wasted compute:
    1. No UAT staging (“It’s either a notebook or containerized for prod”) - We heard numerous horror stories of teams waiting 3-6 months to productionize new work into an endpoint, only to hear from the client team that the methods or data are off and need to go back to the drawing board.
    2. Nothing is shared - It’s common for 5 researchers and 5 pipelines to all be running the same activity (batch inference, preprocessing, training, eval, etc.) on separately allocated compute (e.g. one-off pods or Ray clusters) with varying code versions. The compute and time saving opportunities are massive. Even common utilities like PII obfuscation or content moderation are rarely shared as services.

    This “LudicrouslyFastAPI” feature has been one of our most widely requested. As promised, these aren’t new APIs or primitives; we simply took the thing we do well - sending functions and classes to remote infra - and exposed endpoints, added rich middleware, and improved our serving latency and throughput 10x. Just throw a function or class at your infra and get back a FastAPI endpoint with HTTPS, a reverse proxy (via Caddy), access control, versioning, usage tracking, and more.

    We’re not trying to replace dedicated ML inference systems or FastAPI. They’re great at what they do, and we don’t do things like ORMs, model-specific optimizations, or complex routes. Instead, we want to lower the barrier to sharing Python services to zero, so:
    1. You can generate endpoints for clients to UAT test your work in minutes, not months
    2. Sharing internal ML apps or tools within a company is easy, like a Google Doc. Right now internal apps are built through traditional DevOps - as if every document you shared internally had to be a Next.js app
    3. When you implement some often-repeated ML activity, it’s a no-brainer to share it with your team as a reusable (updatable) service

    Of course I must mention that Anyscale Ray is a crucial piece of our approach. In a funny way, Ray clusters are a midpoint between a Kubernetes cluster and a Spark cluster, allowing us to elegantly flex to support both offline/batch and online/serving workloads. Look out for more on this in a subsequent post.

    “LudicrouslyFastAPI:” Deploy Python Functions as APIs on your own Infrastructure with Runhouse

    run.house
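A hypothetical sketch of the "function or class in, service out" idea from the post, using a small shared utility of the kind it mentions (PII obfuscation). The cluster, names, and toy regex are illustrative; the HTTPS/curl and sharing features the post describes are handled by Runhouse's API server and are only hinted at in comments here, since the exact routes and sharing calls depend on the Runhouse release.

```python
import runhouse as rh


class PIIObfuscator(rh.Module):
    """A reusable internal service rather than copy-pasted pipeline code."""

    def obfuscate(self, text: str) -> str:
        import re
        # Toy rule for illustration: mask anything that looks like an email address.
        return re.sub(r"\S+@\S+", "[REDACTED EMAIL]", text)


if __name__ == "__main__":
    cluster = rh.cluster(name="shared-cpu", instance_type="m5.xlarge", provider="aws")

    # Deploy the class to the cluster and get back a named, callable service.
    svc = PIIObfuscator().get_or_to(cluster, name="pii-obfuscator")
    print(svc.obfuscate("Contact me at jane@example.com"))

    # Per the post, the same deployed module is also reachable over the cluster's
    # authenticated HTTP(S) endpoint and can be shared with teammates by name;
    # the exact endpoint and sharing APIs vary by Runhouse version.
```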

  • Runhouse

    Donny Greenberg

    CEO at 🏃♀️Runhouse🏠, x-PyTorch

    I'm absolutely riled up about this code sample. This is what a year of cooking the best AI/ML DevX gets you: distributed Bayesian hyperparameter optimization on any infra, from scratch, in readable, idiomatic Python.

    Step 1: Send your training function *as-is* to infra (existing or fresh) and get back remote functions. The pool of replicas can all be on one VM, or a cluster, or different clusters, different regions, different clouds - it doesn't matter. Wherever you have ML compute.

    Step 2: Generate your hyperparam candidates and call them on your pool of remote functions. All idiomatic Python.

    Step 3: Run, iterate, and debug. Feel the freedom of running local Python with fluent access to your powerful infra, no sandbox. Adding features like early stopping or alternate search algorithms from OSS is easy. No need to wait for your e2e ML platform provider to shim them into some config flag in their HPO point-solution.

    Step 4: Deploy. This is regular Python - it can run in Kubernetes or Prefect/Airflow/etc. as-is. No special deployment system or resource allocator or CLI magic incantations.

    Step 5: Iterate again. Take this production code and run it again locally to debug or improve.

    🏃♀️🏠🏃♀️🏠🏃♀️🏠🏃♀️🏠🏃♀️🏠🏃♀️🏠🏃♀️🏠🏃♀️🏠🏃♀️🏠🏃♀️🏠🏃♀️🏠
    Code: https://lnkd.in/e6XvyYUW
    1 star == 1 vote for an enterprise buyer to pay us to keep building the devX ML teams deserve
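The real code sample is at the link above; the sketch below only mirrors its shape. Plain random search stands in for the Bayesian search the post describes (to keep the sketch dependency-free), and the cluster, training function, and candidate counts are placeholders. The point is the dispatch pattern: send the training function to your infra, then drive candidates against it from ordinary local Python.

```python
import random
from concurrent.futures import ThreadPoolExecutor

import runhouse as rh


def train_eval(lr: float, batch_size: int) -> float:
    # Stand-in objective; in practice this is your real training + eval loop.
    return 1.0 / (1.0 + abs(lr - 3e-4) * batch_size)


if __name__ == "__main__":
    # Step 1: send the training function as-is to your compute.
    gpu = rh.cluster(name="rh-hpo", instance_type="A10G:1", provider="aws")
    remote_train = rh.function(train_eval).to(gpu)

    # Step 2: generate candidates (random search here, not Bayesian).
    candidates = [
        {"lr": 10 ** random.uniform(-5, -2), "batch_size": random.choice([16, 32, 64])}
        for _ in range(16)
    ]

    # Steps 3-5: run, iterate, and debug locally; each call blocks on the cluster,
    # so threads give simple client-side parallelism.
    with ThreadPoolExecutor(max_workers=4) as pool:
        scores = list(pool.map(lambda c: remote_train(**c), candidates))

    best_score, best_params = max(zip(scores, candidates), key=lambda t: t[0])
    print("best:", best_score, best_params)
```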

  • Runhouse reposted this

    Donny Greenberg

    CEO at 🏃♀️Runhouse🏠, x-PyTorch

    📢🦙 Try Llama 3 on AWS EC2, right now! 📢🦙

    Llama 3 is here! But everyone wants you to use their hosted compute or model endpoint to try it. I just want regular Python on a regular box. Runhouse makes it so easy to do cloud AI development on your own compute with regular, DSL-free Python that it took us 10 minutes to create a foolproof Llama 3 example on EC2. You can run it like this:
    1. First, sign the license on Hugging Face (https://lnkd.in/ePPrZgiz) and make sure your HF token is saved locally with `huggingface-cli login`.
    2. Run this script to stand up an A10G, set up the environment, download the model, and call it: https://lnkd.in/euz-sG6H
    GitHub: https://lnkd.in/eeuEYVha

    Joseph Spisak

    Product Director & Head of Generative AI Open Source @Meta | Ex: Google, Amazon

    The beginning is finally here!! Welcome Llama 3 - the first two models of the next generation of Llama are now available for broad use. This was an absolutely amazing effort by so very many (as you’ll see from the model card) and such a labor of love for all of us. These models are state of the art, period. And even outshine some of the top closed models. But amazingly they are still early releases as we have a lot more to share!

    A few details:
    Models: Pretrained and instruction-fine-tuned language models with 8B and 70B parameters
    Purple Llama: New trust and safety tools with Llama Guard 2, Code Shield, and CyberSec Eval 2
    Partners: Llama 3 models will soon be available on AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake, and with support from hardware platforms offered by AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm
    Torchtune support: We’ve co-designed these models for a pure PyTorch fine-tuning experience - https://lnkd.in/d5sbfHrS

    In the coming months, we expect to introduce new capabilities, longer context windows, additional model sizes, and enhanced performance, and we’ll share the Llama 3 research paper. Check out the blog here: https://lnkd.in/dEzEmrgy

    Pointers:
    Llama 3 webpage: https://lnkd.in/dKpKupBW
    Getting started guide: https://lnkd.in/dHbnVk7Q
    Downloads page: https://lnkd.in/dA9Gr4gS
    Llama 3 GitHub repo: https://lnkd.in/d8gkkzCy
    Purple Llama GitHub repo: https://lnkd.in/gTszcg3J
    Llama 3 GitHub recipes: https://lnkd.in/dyPGHGMk
    Model card: https://lnkd.in/daKCS3DC
    CyberSec Eval 2 paper: https://lnkd.in/dyyZEt9X

    Btw, I’ll be speaking at Fully Connected later today in SF. Come by and we can chat about Llama!! https://lnkd.in/dviaWTq5 Cheers!!

    P.S. A BIG thank you to the Hugging Face team for working with us day and night (and then another day and night) to get these models ready and available for the community. Every timezone was accounted for! Specifically, a shout out to: Omar Sanseviero, Philipp Schmid, Pedro Cuenca, Leandro von Werra, Younes Belkada, Clémentine Fourrier, Nathan Habib, Olivier Dehaene, Nicolas Patry, Arthur Zucker, Lysandre Debut, Nathan Sarrazin, Victor Mustar, Kevin Cathaly, Yuvraj Sharma, Xenova and Vaibhav Srivastav, Brigitte Tousignant, Florent Daudens, Morgan Funtowicz and Simon Brandeis - cheers to you all for the great partnership!
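A hypothetical companion sketch to the two steps in Donny's post above: after `huggingface-cli login`, sync the saved token to a fresh A10G on EC2, confirm it arrived, and tear the instance down when you're done so it stops billing. The runnable Llama 3 example itself is at the GitHub link in the post; names here are illustrative and Runhouse APIs vary by release.

```python
import runhouse as rh


def check_token() -> bool:
    # Runs on the remote box: confirm the Hugging Face token was synced with the env.
    from huggingface_hub import HfFolder
    return HfFolder.get_token() is not None


if __name__ == "__main__":
    a10g = rh.cluster(name="rh-llama3-ec2", instance_type="A10G:1", provider="aws")
    env = rh.env(reqs=["huggingface_hub"], secrets=["huggingface"])

    remote_check = rh.function(check_token).to(a10g, env=env)
    print("HF token available on the box:", remote_check())

    # ... run the Llama 3 example script from the post against this cluster ...

    a10g.teardown()  # stop the EC2 instance when finished
```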

  • Runhouse

    Weekend project: Struggling with GPU scarcity? Amazon Web Services (AWS) Neuron chips are widely available and offer a compelling price tag. Now it's easier than ever to try them out, because we're publishing a series of 0->1 tutorials that set up the hardware for you automatically! These simple scripts will launch an Inferentia2 VM on EC2, install dependencies, and deploy a pre-compiled model. The first is SDXL! Tutorial here: https://lnkd.in/eViNzFxQ Code here: https://lnkd.in/eYd6d4mD As always, please reach out on Discord, send a DM, or raise a GitHub issue if you have questions or want to talk through your use case.

    Deploy Stable Diffusion XL 1.0 on AWS Inferentia | Runhouse

    run.house
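A hypothetical sketch of the tutorial's overall shape: launch an Inferentia2 (inf2) instance on EC2 with Runhouse and run a pre-compiled SDXL checkpoint on it through optimum-neuron. The model repo id, package list, and optimum-neuron usage here are assumptions for illustration only; follow the linked tutorial for the tested setup, including the Neuron drivers/AMI the instance needs.

```python
import runhouse as rh


def generate(prompt: str,
             model_id: str = "aws-neuron/stable-diffusion-xl-base-1-0-1024x1024"):
    # Runs on the inf2 instance: load a Neuron-compiled SDXL pipeline and render.
    from optimum.neuron import NeuronStableDiffusionXLPipeline

    pipe = NeuronStableDiffusionXLPipeline.from_pretrained(model_id)
    image = pipe(prompt).images[0]
    image.save("sdxl_output.png")
    return "sdxl_output.png"


if __name__ == "__main__":
    inf2 = rh.cluster(name="rh-inf2", instance_type="inf2.8xlarge", provider="aws")
    env = rh.env(reqs=["optimum[neuronx]", "diffusers", "transformers"])

    remote_generate = rh.function(generate).to(inf2, env=env)
    print(remote_generate("A photo of a house running a marathon"))
```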
