Alon Bochman’s Post

Have you ever spent a few hours (or days, or weeks) trying to optimize a prompt? You make an edit. Run a few examples. See if they're good. Make another edit to fix the last example. Oh no, the first example broke. It's a pain, right? What if you could get the LLM to do that for you?

A recent paper from China uses LLMs as prompt optimizers. They call it Gradient Prompt Optimization (GPO). It works like this:

1. You write the task and an initial prompt.
2. You ask an LLM to update your prompt using a *metaprompt*. You've seen metaprompts before; there's one in the Anthropic Prompt Generator (link below). But GPO's metaprompt is specific: it asks the LLM to fix performance on a few specific examples, and it can only change a few words at a time. Kind of like model training, where the weights only move a little at each gradient step.
3. Rinse and repeat. (A rough sketch of this loop in code follows below.)

GPO improves Llama-2's performance on BBH from 30 to 35, on MMLU from 36 to 39, and on GSM8K from 22 to 28. It also beats baselines such as CoT prompting and prompt optimizers such as APE. I would have loved to see a test on a newer model than Llama-2.

Paper: https://lnkd.in/dkyqYN8m
Code: https://lnkd.in/dD6qbVUy
Anthropic prompt generator: https://lnkd.in/dfnHTXEs

#PromptEngineering #LLM #AI #ArtificialIntelligence #LanguageModels #AIResearch
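The loop is simple enough to sketch. Here is a minimal, hypothetical Python version, not the paper's implementation: it assumes a generic `complete(prompt) -> str` LLM call, a toy exact-match scorer, a hand-written metaprompt, and a greedy accept-if-better update rule. The paper's actual metaprompt, edit constraint, and evaluation are more involved.

```python
# Hypothetical metaprompt: show the optimizer LLM a few failing examples
# and ask for a small, word-level edit to the current prompt.
METAPROMPT = """You are optimizing a task prompt.
Current prompt:
{prompt}

The prompt failed on these examples:
{failures}

Rewrite the prompt to fix these failures. Change only a few words.
Return only the new prompt."""

def score(prompt, examples, complete):
    """Fraction of (question, answer) pairs the prompt gets right
    under a crude exact-match check."""
    hits = 0
    for question, answer in examples:
        if answer in complete(f"{prompt}\n\n{question}"):
            hits += 1
    return hits / len(examples)

def gpo_step(prompt, examples, complete, batch_size=3):
    """One 'gradient' step: collect a few failures, ask the LLM for a
    small edit, and keep the edit only if the score does not drop."""
    failures = [(q, a) for q, a in examples
                if a not in complete(f"{prompt}\n\n{q}")][:batch_size]
    if not failures:
        return prompt  # nothing to fix
    shown = "\n".join(f"Q: {q}\nExpected: {a}" for q, a in failures)
    candidate = complete(METAPROMPT.format(prompt=prompt, failures=shown))
    # Greedy acceptance; the paper's update rule may differ.
    if score(candidate, examples, complete) >= score(prompt, examples, complete):
        return candidate
    return prompt

def optimize(prompt, examples, complete, steps=10):
    """Rinse and repeat."""
    for _ in range(steps):
        prompt = gpo_step(prompt, examples, complete)
    return prompt

# Usage (assuming some `complete` function wrapping your LLM API):
# best = optimize("Solve the math problem.", [("2+2=?", "4")], complete)
```

The accept-if-better check is the part that guards against the "fix one example, break another" cycle the post opens with: an edit that regresses the eval set is simply thrown away.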

Elliot Evertsson Norrevik

14-year-old tech startup builder. Crazy enough to change the world.

3w

I do this all the time; I thought it was standard practice.

Bogdan Grigorescu

Sr Tech Lead | Engineering | Automation

2w

Where does the feedback loop close? It appears to close on the human prompter, but it doesn't: the human prompter can't reliably verify the model's output (a systemic issue with genAI models; tracing output back to its source is unreliable and difficult).

Debbie Reynolds

The Data Diva | Data Privacy & Emerging Technologies Expert | Technologist | Keynote Speaker | Helping Companies Make Data Privacy a Business Advantage | Advisor | Futurist | #1 Data Privacy Podcast Host | Polymath

2w

Alon Bochman thank you 🙏 for sharing.
