Thrilled to share that Meta's Llama 3.1 family of models (8B, 70B, and 405B) runs seamlessly on AMD Instinct AI GPUs, empowering pioneers like Fireworks AI to offer one of the fastest and most efficient inference engines from day one.
We are grateful for the opportunity to put our advanced memory capabilities to work: with up to 192 GB of HBM3 per GPU, a single server equipped with eight AMD Instinct MI300X GPUs can hold the entire 405-billion-parameter Llama 3.1 model in the FP16 datatype.
This remarkable memory capacity lets AI builders everywhere run frontier models on fewer servers, delivering significant cost savings, simpler infrastructure management, and better performance efficiency.
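A quick back-of-the-envelope sketch of the memory math behind that claim, assuming 2 bytes per parameter for FP16 (weights only; KV cache and activations need additional headroom on top of this):

```python
# Does Llama 3.1 405B in FP16 fit on a single 8x MI300X server?
# Weights-only estimate; serving also needs room for KV cache and activations.
params_billion = 405
bytes_per_param_fp16 = 2
weights_gb = params_billion * bytes_per_param_fp16  # ~810 GB of weights

gpus_per_server = 8
hbm_per_gpu_gb = 192
server_memory_gb = gpus_per_server * hbm_per_gpu_gb  # 1536 GB total HBM3

print(f"Model weights: {weights_gb} GB")
print(f"Server HBM3:   {server_memory_gb} GB")
print(f"Fits in one server: {weights_gb <= server_memory_gb}")
```

With roughly 810 GB of weights against 1,536 GB of total HBM3, the full model fits in one server with room left for KV cache and activations.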