Meta Llama 3.1 405B, 70B & 8B are here - multilingual, with 128K context, tool use + agents! Competitive with, and on some benchmarks beating, GPT-4o & Claude 3.5 Sonnet. Unequivocally the best open LLM out there! 🔥
Bonus: It comes with a more permissive license, which allows one to train other LLMs on its high-quality outputs 🐐
Some important facts:
> Multilingual - English, French, German, Hindi, Italian, Portuguese, Spanish, and Thai.
> MMLU - 405B (85.2), 70B (79.3) & 8B (66.7)
> Trained on 15 Trillion tokens + 25M synthetically generated outputs.
> Pre-training cut-off date of December 2023
> Same architecture as Llama 3 with GQA
> Used a massive 39.3 Million GPU hours (16K H100s for 405B)
> 128K context ⚡
> Excels at code-generation tasks, too!
> Releases Prompt Guard - a BERT-based classifier to detect jailbreaks, malicious code, etc.
> Llama Guard 8B w/ 128K context for securing prompts across a series of topics
How much GPU VRAM do you need to run these?
405B - 810 GB in fp16/bf16, 405 GB in fp8/int8, 203 GB in int4
70B - 140 GB in fp16/bf16, 70 GB in fp8/int8, 35 GB in int4
8B - 16 GB in fp16/bf16, 8 GB in fp8/int8, 4 GB in int4
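These figures are just parameter count × bytes per parameter, counting weights only (KV cache, activations, and framework overhead come on top). A minimal back-of-the-envelope sketch, not an official tool:

```python
# Rough VRAM needed to hold the weights alone, ignoring KV cache,
# activations, and framework overhead.
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "fp8/int8": 1.0, "int4": 0.5}

def weight_vram_gb(num_params_billions: float, dtype: str) -> float:
    """Approximate weight memory in GB (using 1 GB = 1e9 bytes,
    matching the round numbers above)."""
    return num_params_billions * BYTES_PER_PARAM[dtype]

for params in (405, 70, 8):
    for dtype in BYTES_PER_PARAM:
        print(f"{params}B @ {dtype}: ~{weight_vram_gb(params, dtype):.0f} GB")
```

For example, `weight_vram_gb(405, "fp16/bf16")` gives 810 GB, which is why full-precision 405B needs a multi-node setup.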
In addition, we provide a series of quants ready to deploy: AWQ, bitsandbytes, and GPTQ. These allow you to run 405B on as little as 4 x A100 (80 GB) through TGI or vLLM. 🔥
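As a sketch of what that deployment looks like with TGI's official Docker image (the model id and shard count below are illustrative - check the Hub for the exact quantized repo you want):

```shell
# Serve a 4-bit AWQ quant of Llama 3.1 405B across 4 GPUs with TGI.
# --num-shard splits the model across GPUs; --quantize selects the quant kernel.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v "$PWD/data:/data" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id <your-405B-AWQ-repo> \
  --num-shard 4 \
  --quantize awq
```

At int4, the 405B weights are ~203 GB, which fits in the 320 GB of combined VRAM on 4 x A100 (80 GB), leaving headroom for the KV cache.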
Wait, it gets better: we also provide HF Pro users access via our deployed Inference Endpoints!
Want to learn more? We wrote a detailed blog post on it 🦙
Kudos to AI at Meta for believing in open source and science! It has been fun collaborating! 🤗