Andy Owens’ Post

Analytics || Data

We’ve been testing LLMs on a data enrichment and classification problem. Our working assumption is that more verbose text data will make classifications more accurate.

TLDR: There’s still no clear model/framework winner across LLMs when evaluated on accuracy, latency, and cost for classification/prompting in this limited use case.

Some summary results on a small dataset:

claude-3.5-sonnet: cost 65c, with good results
gpt-4o: batched cost 10c, practically the same results, but very, very slow
gpt-4-turbo: batched cost 22c, practically the same results
gpt-3.5-turbo: batched cost 2c, faster and just as accurate as GPT-4
amazon-titan-text-premier: cost 1c, results just as acceptable as the above
meta-llama2-70b-chat: cost 5c, made real mistakes compared to the others; llama2-70b is not up to the task

The takeaway so far is that seeding prompts with semantic search results seems to level the playing field, so that less sophisticated models can make more informed classifications. Some models, like Titan and Llama, needed more tuning for our purposes than others did out of the box.

++ John Butler for the great eval work here

*Some model leaderboard info here from Hugging Face; the rankings seem to change weekly:

Introduction

huggingface.co
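For readers who want to try the idea of seeding prompts with semantic search results, here is a minimal sketch, not the setup used in the post. Everything in it is an assumption for illustration: the labeled examples, the bag-of-words cosine similarity standing in for a real embedding search, the `stub_model` function standing in for an actual LLM call, and the per-character price (real providers bill per token).

```python
import math
import time
from collections import Counter
from dataclasses import dataclass
from typing import Callable

# Hypothetical labeled examples; a real run would use your own data.
LABELED = [
    ("invoice overdue payment reminder", "billing"),
    ("refund charged twice on card", "billing"),
    ("app crashes when I log in", "technical"),
    ("error message on password reset", "technical"),
]

def bow(text: str) -> Counter:
    """Bag-of-words vector; a toy stand-in for a real embedding."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def top_k(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k labeled examples most similar to the query."""
    q = bow(query)
    return sorted(LABELED, key=lambda ex: cosine(q, bow(ex[0])), reverse=True)[:k]

def build_prompt(query: str, k: int = 2) -> str:
    """Seed the classification prompt with the retrieved examples."""
    lines = ["Classify the text into one label. Similar labeled examples:"]
    for text, label in top_k(query, k):
        lines.append(f"- {text!r} -> {label}")
    lines.append(f"Text: {query!r}\nLabel:")
    return "\n".join(lines)

@dataclass
class EvalResult:
    accuracy: float
    mean_latency_s: float
    total_cost_usd: float

def evaluate(
    classify: Callable[[str], str],
    test_set: list[tuple[str, str]],
    usd_per_1k_chars: float,  # assumed pricing unit, for illustration only
) -> EvalResult:
    """Score one model on accuracy, latency, and (approximate) cost."""
    correct, latency, cost = 0, 0.0, 0.0
    for text, expected in test_set:
        prompt = build_prompt(text)
        start = time.perf_counter()
        predicted = classify(prompt)
        latency += time.perf_counter() - start
        cost += len(prompt) / 1000 * usd_per_1k_chars
        correct += predicted == expected
    n = len(test_set)
    return EvalResult(correct / n, latency / n, cost)

def stub_model(prompt: str) -> str:
    """Stand-in for a real model call: copies the top retrieved label."""
    first_example = prompt.splitlines()[1]
    return first_example.rsplit("-> ", 1)[1]

if __name__ == "__main__":
    tests = [("payment charged twice", "billing"), ("crashes on log in", "technical")]
    print(evaluate(stub_model, tests, usd_per_1k_chars=0.001))
```

Swapping `stub_model` for a function that calls a real model, and running `evaluate` once per model, gives the same accuracy, latency, and cost comparison across models on a shared test set.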

Kyle Whitmire

Passionately Curious | Girl Dad x2 | Better Every Day

3w

This is fascinating, Andy Owens ... I don't know anyone else who has shared even a high-level summary of this. This may be a dumb question, but what outcome are you hoping for in the future state?

Tanya S.

Creator of the "TGI-AI (c) : Tanya's Global AI Index"

2w

I am curious about real-world use cases (i.e., not the classification/clustering itself, but solving a real-world problem like world hunger, energy demands, climate issues, and so on). What would the cost be to run those vs. customer segmentation and the like? Or what exactly are you segmenting for, and why?
