Crusoe’s Post

Crusoe reposted this

Ben Dickson

Software Engineer | Tech Blogger

I recently had the chance to speak to a team from Crusoe and Gradient about their effort to create an open LLM with a one-million-token context window. Some of the key takeaways:

1. Open research is vital to advances in AI: The team built on research, techniques, models, and code published by researchers across the world, including BAIR (distributed attention), Meta (Llama-3), Nvidia (RULER), as well as universities in China and Singapore.

2. You don't necessarily need the most expensive GPUs to conduct cutting-edge research: The team trained its models, based on Llama-3 8B and 70B, on a cluster of L40S GPUs at a fraction of the cost of higher-end GPUs. This was possible because Crusoe and Gradient worked closely together to tune the GPU cluster for the specific kinds of computation required.

3. Long-context LLMs unlock new applications and opportunities that previously required extensive technical effort through fine-tuning and RAG.

Thanks to Leonid Pekelis, Ethan Petersen, and Patrick McGregor for sharing their experience. https://lnkd.in/grjR5Uxe
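To make the third takeaway concrete: with a one-million-token window, many document collections can simply be placed in the prompt, and retrieval becomes a fallback rather than a requirement. Below is a minimal, hypothetical sketch (the constant, function names, and the crude 4-characters-per-token heuristic are my own assumptions, not Gradient's implementation) of how an application might choose between the two strategies:

```python
# Hypothetical sketch: choose between stuffing the whole corpus into the
# prompt (long-context) and falling back to retrieval when it won't fit.
CONTEXT_WINDOW = 1_000_000  # tokens, matching the models discussed above

def rough_token_count(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return len(text) // 4

def build_prompt(question: str, documents: list[str]) -> tuple[str, str]:
    """Return (strategy, prompt).

    If the full corpus plus question fits in the context window, use it
    directly. Otherwise fall back to a toy "RAG" step that keeps only
    documents sharing a keyword with the question.
    """
    corpus = "\n\n".join(documents)
    if rough_token_count(corpus + question) < CONTEXT_WINDOW:
        return "long-context", f"{corpus}\n\nQuestion: {question}"
    terms = set(question.lower().split())
    hits = [d for d in documents if terms & set(d.lower().split())]
    return "rag", "\n\n".join(hits) + f"\n\nQuestion: {question}"

strategy, prompt = build_prompt(
    "What GPUs were used?",
    ["Training ran on a cluster of L40S GPUs."],
)
print(strategy)  # "long-context": the tiny corpus easily fits the window
```

In a real system the retrieval fallback would be an embedding search rather than keyword matching; the point is only that the branch condition changes dramatically when the window grows to a million tokens.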

How Gradient created an open LLM with a million-token context window

https://venturebeat.com

Roumen Popov

DSP Software Engineer

1mo

Can they add large numbers (like more than 6 digits each)?


