Prasanna Krishnamoorthy’s Post

Managing Partner- Upekkha (AI fund and accelerator) Download EY-Upekkha Report - upekkha.io/ey-upekkha-report

A common and obvious assumption about generative models (including LLMs) is that they can only be as good as the data they're trained on; they can't "do better" than that. Can they? This interesting new paper, "Transcendence: Generative Models Can Outperform The Experts That Train Them", shows that these models can sometimes perform better than any of the experts who trained them! While this was demonstrated in the context of chess, there is no obvious reason it couldn't hold in other domains as well. Personally, I find this especially useful when I munge together two frameworks from different areas to apply to a problem I have. Doing that myself would be challenging, but having the LLM transcend my abilities to do it gives me a great starting point. I like the term "Transcendence" as well - good marketing :)
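The paper's explanation, roughly, is that low-temperature sampling lets the model act like a majority vote over the many imperfect experts it was trained on, denoising their individual mistakes. Here is a toy simulation (not the paper's code; the expert count, error rate and move labels are invented for illustration) of why that kind of pooling can beat any single expert:

import random
from collections import Counter

random.seed(0)

N_EXPERTS = 11        # hypothetical pool of imperfect experts
P_CORRECT = 0.6       # each expert picks the best move with 60% probability
WRONG_MOVES = ["b", "c", "d", "e"]  # errors are spread across many bad moves
TRIALS = 10_000

def expert_move():
    # One noisy expert: usually right, and its errors are uncorrelated with the others'.
    if random.random() < P_CORRECT:
        return "a"            # "a" stands in for the best move
    return random.choice(WRONG_MOVES)

single_correct = 0
ensemble_correct = 0
for _ in range(TRIALS):
    votes = [expert_move() for _ in range(N_EXPERTS)]
    single_correct += votes[0] == "a"    # accuracy of any one expert alone
    # Low-temperature sampling over the pooled distribution behaves roughly like
    # taking the most common choice (argmax), which cancels out scattered errors.
    ensemble_correct += Counter(votes).most_common(1)[0][0] == "a"

print(f"single expert accuracy:    {single_correct / TRIALS:.3f}")    # ~0.60
print(f"low-temp 'model' accuracy: {ensemble_correct / TRIALS:.3f}")  # noticeably higher

Because each expert's mistakes land on different wrong moves, the pooled distribution concentrates on the correct move, which is the intuition behind the "transcendence" result.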

Prasanna Krishnamoorthy

1mo

Paper link: https://arxiv.org/pdf/2406.11741 (hat tip to Ethan Mollick on Twitter for reviewing this)

Amandeep Singh Minhas

Customer-obsessed Global Delivery | 1x SaaSpreneur | PreSales Solutions | Implementation & Onboarding | Customer Success | AI Enthusiast

1mo

The fundamental fact about LLMs that most are unaware of is that LLMs are basically n-gram models: their job is to statistically predict the nth word as output, given the previous n-1 words as input. There is no real visibility into what exactly has gone into training the LLMs; no one knows what exactly the LLM knows. So what most think of as emergence is actually recall. And yes, the data used to train and fine-tune is the lifeblood of an LLM. No LLM can perform better than what is possible based on the input data.
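For readers unfamiliar with that framing, here is a minimal count-based sketch of the next-word-prediction objective described above. The corpus and names are illustrative only; real LLMs learn neural representations rather than literal count tables, but the training objective (predict the next token from the previous ones) is the same.

from collections import Counter, defaultdict

# Build a tiny bigram table: for each word, count which words follow it.
corpus = "the model predicts the next word given the previous words".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict(prev_word):
    # Return the statistically most likely next word seen after prev_word.
    return bigrams[prev_word].most_common(1)[0][0]

print(predict("the"))   # -> 'model' (first of the equally common continuations)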


The only challenge with the chess analogy is that it's easy to measure performance in chess. I continue to believe that domain-specific evaluation datasets, driven by hand-written test cases, for comparing models (or tools built on top of them) are a big open opportunity. MMLU and other typical benchmarks don't cut it. Third-party, trusted evals and benchmarks will be the new G2.
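A minimal sketch of the kind of domain-specific eval described above, assuming nothing beyond hand-written test cases and a model exposed as a prompt-to-answer function (all names and cases below are hypothetical):

from typing import Callable

TEST_CASES = [
    # (prompt, check the answer must satisfy) - written by hand by a domain expert
    ("Convert 2 hours 30 minutes to minutes.", lambda ans: "150" in ans),
    ("What is the capital of France?",         lambda ans: "paris" in ans.lower()),
]

def evaluate(model: Callable[[str], str]) -> float:
    """Return the fraction of hand-written cases the model passes."""
    passed = sum(check(model(prompt)) for prompt, check in TEST_CASES)
    return passed / len(TEST_CASES)

def dummy_model(prompt):
    # Stand-in for a real model or tool call behind the same interface.
    return "Paris has 150 minutes"

print(f"pass rate: {evaluate(dummy_model):.0%}")

The same harness can be pointed at any model or tool, which is what would make a trusted third party running such suites valuable.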
