Daniel Kornev’s Post

Daniel Kornev

CEO at Sentius | Techstars'24 | Microsoft Alumni'10 | Xoogler'11 | 2nd-time Founder

Awesome to see BABILong being helpful in deciding which model to use.

Kirill Shcherbakov

ML Engineer @ JetBrains AI

💡 I remember how helpful the BABILong evals of GPT-4-Turbo and GPT-4o were when deciding whether to replace GPT-4-Turbo with the new flagship GPT-4o after its release, which offered SOTA on almost all eval benchmarks at half the cost - https://lnkd.in/e4JTxrzs The answer wasn't straightforward, since the evals showed that performance varies with your use case - simple Q&A over short, mid-size, or long contexts - and the brand-new GPT-4o doesn't always win.

🔥 Today, they released LLM evals on BABILong, a long-context (up to 10M+ tokens) reasoning dataset - https://lnkd.in/eqBQcqxg

🗝 The findings are quite interesting:
- Popular open-source LLMs, as well as GPT-4 and RAG, rely heavily on the first 5-25% of the input => highlights the need for improved context-processing mechanisms
- Fine-tuning boosts performance for GPT-3.5-Turbo and Mistral-7B, but context lengths remain limited (16K and 32K, respectively)
- Mamba (130M) and RMT (137M) achieved the strongest results: RMT can process lengths up to 11 million tokens, while Mamba struggles beyond 128K tokens => shows that these challenges are indeed solvable

Credits for pics to Mikhail Burtsev
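If you want to run this kind of comparison yourself, here is a minimal sketch of a BABILong-style needle-in-a-haystack probe using the OpenAI Python SDK. It is a toy single-example check under stated assumptions, not the actual benchmark: the model names, the filler text, and the make_haystack helper are illustrative, and the real BABILong tasks and prompts live in the repo linked above.

# Toy BABILong-style probe: bury one fact in a long distractor context and
# ask the model to retrieve it. Model names and filler text are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def make_haystack(fact: str, filler_sentences: int) -> str:
    filler = "The sky was grey and the train was late again. "
    half = filler_sentences // 2
    # Place the relevant fact roughly in the middle of the distractor text.
    return filler * half + fact + " " + filler * (filler_sentences - half)

fact = "Mary travelled to the kitchen."
question = "Where is Mary? Answer with one word."
context = make_haystack(fact, filler_sentences=4000)  # roughly 40K tokens of filler

for model in ("gpt-4-turbo", "gpt-4o"):  # adjust to the models you have access to
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"{context}\n\nQuestion: {question}"}],
    )
    print(model, "->", resp.choices[0].message.content)

The interesting part, as the post describes, comes from sweeping the fact position and the context length rather than a single example - that is where GPT-4-Turbo and GPT-4o start to diverge.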

Tanya Sushchenko

Senior Product Manager

1mo

I don't know the official score, but Claude 3.5 has won me over :)

