Unify (YC W23) dynamically routes your prompts to the best LLM and provider, so you can easily balance cost, latency, and output quality. Just tune these three dials, and let Unify handle the rest.

A new LLM emerges almost every week, which means regularly testing each model against your requirements, juggling multiple accounts and API keys, and constantly updating your application to use the models best suited to your task. It can be pretty overwhelming. Many people give up and just throw the largest models at everything. But you really don't need GPT-4o or Opus to summarize simple documents, for example; Llama 8B is more than capable, ~10x faster, and ~100x cheaper. Most LLM apps are much slower and more expensive than they need to be.

Founded by Daniel Lenton, Unify solves this by automatically routing each prompt to the best model based on your own preferences for quality, speed, and cost. Your "easy" prompts go to the fastest and cheapest models, and only the "hard" prompts go to heavy lifters like GPT-4. You focus on building your LLM application; Unify focuses on providing the best models, from the fastest providers, at the lowest cost. Congrats to the team on the launch!
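The routing idea described above can be sketched in a few lines. This is a hypothetical illustration, not Unify's actual implementation: each candidate model gets a weighted score over quality, speed, and cost (the "three dials"), with harder prompts upweighting quality. All model names and numbers below are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    quality: float   # benchmark score, 0-1 (illustrative numbers)
    speed: float     # relative tokens/sec, 0-1
    cost: float      # relative cheapness, 0-1 (1 = cheapest)

# Illustrative candidate pool: one small/fast model, one large/strong one.
CANDIDATES = [
    Model("llama-3-8b", quality=0.55, speed=0.95, cost=0.99),
    Model("gpt-4o", quality=0.95, speed=0.40, cost=0.10),
]

def route(prompt_difficulty: float,
          w_quality: float, w_speed: float, w_cost: float) -> Model:
    """Pick the model maximizing a weighted score; harder prompts
    upweight quality, so they land on the stronger model."""
    def score(m: Model) -> float:
        return (w_quality * prompt_difficulty * m.quality
                + w_speed * m.speed
                + w_cost * m.cost)
    return max(CANDIDATES, key=score)

# An "easy" prompt (low difficulty) routes to the fast, cheap model:
print(route(0.1, w_quality=1.0, w_speed=1.0, w_cost=1.0).name)  # llama-3-8b
# A "hard" prompt upweights quality and routes to the heavy lifter:
print(route(1.0, w_quality=3.0, w_speed=0.2, w_cost=0.2).name)  # gpt-4o
```

In a real router the difficulty estimate would itself come from a learned classifier over the prompt, but the dial-weighted scoring captures the cost/latency/quality trade-off the post describes.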
AI for the AI
Sounds great - this will help users with efficiency and the big players with using their resources better!
The only problem I see with this, Daniel Lenton, is that in the future the best models will be optimized based on the input and will be dynamically priced, so most of the problem will be abstracted away. At that point you have to somehow screen for quality based on the prompt and route to the best LLM. In fact, I could see there being bidding among AI models in the backend for the opportunity to answer the prompt once AI is commoditized. If you can be that middleware for the AI bidding wars, I think you've got a good long-term model.
It looks really interesting. Would you mind giving some insights on my chat bot? https://www.linkedin.com/posts/sameer-m-b73376167_ai-machinelearning-langchain-activity-7198736867863724032-SWBg?utm_source=share&utm_medium=member_android
Ecosystem optimization services already. Well done on being an early mover to support adoption and choice of LLM services. It seems connected to practicality and value for the transaction, which is wonderful in these early throes of learning and iterating on use cases.
A long time ago, YC funded companies like Reddit, Airbnb, and DoorDash, which disrupted the traditional way their businesses worked (a disruption necessary in the new internet era). Now it funds companies which route prompts to LLMs. The golden age of tech is truly over!
the API economy at work
Hello YC, I have a social media idea in mind. I have applied to YC twice. Please review my application and give me a grant. My name is Babu Ruidas.
Co-Founder, Innerverse AI | McKinsey Alum | Google for Startups | VentureBeat Top Woman in AI
Innerverse uses an orchestrator with sub-agents, so multiple models from different services. A few questions:
1) How is an "easy" prompt defined? Does the user define this, or does Unify have a way of testing it? We need very nuanced interpretation of our data, and only certain models can provide this. It's possible I could get the same output with a certain data feed and system prompt, but that would require a lot of testing, because we'd be putting pressure on the native NLU.
2) Our system prompt is informed by a corpus of data. Can you use Unify after an initial prompt? Our agents are conversational, so it's multi-turn.