Artificial Analysis

Technology, Information and Internet

The leading independent benchmark for LLMs - compare quality, speed and price to pick the best model for your use case

About us

Website
https://artificialanalysis.ai/
Industry
Technology, Information and Internet
Company size
11-50 employees
Type
Privately Held

Updates

  • Artificial Analysis · 2,517 followers

    Thanks for the support, Andrew Ng! Completely agree: faster token generation will become increasingly important as a greater proportion of output tokens is consumed by models, such as in multi-step agentic workflows, rather than being read by people.

    Andrew Ng · Founder of DeepLearning.AI; Managing General Partner of AI Fund; Founder and CEO of Landing AI

    Shoutout to the team that built https://lnkd.in/g3Y-Zj3W . Really neat site that benchmarks the speed of different LLM API providers to help developers pick which models to use. This nicely complements the LMSYS Chatbot Arena, Hugging Face open LLM leaderboards and Stanford's HELM that focus more on the quality of the outputs. I hope benchmarks like this encourage more providers to work on fast token generation, which is critical for agentic workflows!

    Model & API Providers Analysis | Artificial Analysis
    artificialanalysis.ai

  • Artificial Analysis reposted this

    Groq · 69,725 followers

    Google’s Gemma 2 9B is live on GroqChat and available via GroqCloud, running at 599 T/s! Try it now and follow along as Artificial Analysis builds out the model’s public benchmark (https://hubs.la/Q02GkqnH0). Gemma 2 9B accompanies other leading open-source models running on Groq from providers like Meta, Mistral, and OpenAI.

    Gemma 2 (9B): API Provider Performance Benchmarking & Price Analysis | Artificial Analysis
    artificialanalysis.ai

  • Artificial Analysis

    SambaNova is now offering an API endpoint for Llama 3 8B on its RDU chips, which we previously benchmarked at 1,084 output tokens/s. SambaNova Systems is also differentiating itself by allowing users to bring their own fine-tuned versions of the models. The API appears to run on shared-tenant systems, so supporting user-supplied fine-tuned models sets it apart from other providers, which typically require single-tenant dedicated deployments. This likely leverages the memory advantages of its SN40L chip. Access is offered on an upon-request basis. This is a next step toward an open-access commercial API offering that lets all AI developers use its custom RDU silicon. We look forward to listing any commercial open-access API offerings powered by SambaNova chips on the main Artificial Analysis leaderboards!

    SambaNova Systems · 45,592 followers

    Are you looking to unlock the power of lightning-fast inferencing speed at 1000+ tokens/sec on your own custom Llama3? Introducing SambaNova Fast API, available today with free token-based credits to make it easier to build AI apps like chatbots and more. Bring your own custom checkpoint for both Llama 8B and Llama 70B and avoid the cost of acquiring hundreds of chips to get started.

    Relevance? The next phase of AI is Agentic AI; you'll need lots of models, big and small, working together as one system. Development teams will require ultra-fast token generation, which we know cannot be achieved with GPUs. That is not all… you'll need to host lots of models concurrently, with instantaneous switching between these models, which we know can't be achieved with other architectures due to their inefficiency. You can't get this speed, with a diversity of models, including your own custom model, behind a simple API anywhere else!

    SambaNova Fast API is available now: https://lnkd.in/g9W_Bnjv #FastAI #RDU #API

  • Artificial Analysis

    Smaller models are getting better and faster. Parameter efficiency is increasing: quality rises from Mistral 7B to Llama 3 8B to the latest Gemma 2 9B with minimal increases in size. The Llama 3 and Gemma 2 papers credit overtraining for this quality: Gemma 2 9B was trained on 8T tokens and Llama 3 on 15T tokens (including the 70B; the figure specifically for the 8B model was not released). While Google has not announced Gemini 1.5 Flash's parameter count, it is much faster than Gemini 1.5 Pro (165 output tokens/s vs. 61 tokens/s), indicating a much smaller model. It stands out as a clear leader in its quality & speed offering and is the fastest model we benchmark on Artificial Analysis when considering the median performance of providers. Smaller models are ideal for speed-, cost- and hardware-capacity-sensitive use-cases. For open-source models, they also enable local use on consumer GPUs. It's great to see the continued improvements.
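    A quick sketch of what the quoted output speeds mean in practice: the wall-clock time to stream a response scales inversely with tokens/s. The speeds below are the figures from the post; the 1,000-token response length is an illustrative assumption.

    ```python
    # Output speeds quoted in the post (tokens per second).
    speeds = {"Gemini 1.5 Flash": 165, "Gemini 1.5 Pro": 61}

    def generation_seconds(output_tokens: int, tokens_per_s: int) -> float:
        """Time spent streaming the response, excluding time to first token."""
        return output_tokens / tokens_per_s

    for model, tps in speeds.items():
        print(f"{model}: {generation_seconds(1_000, tps):.1f}s per 1,000 output tokens")
    ```

    At these rates, Flash finishes a 1,000-token response in roughly a third of the time Pro takes, which is why speed matters so much for interactive and agentic use-cases.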

  • Artificial Analysis

    Today OpenAI banned access to its API from China. China had already blocked access to ChatGPT via the 'Great Firewall'. Over the past few months, models from AI labs headquartered in China have started to catch up to the quality of models developed elsewhere. These geo-restrictions will support demand for models developed by AI labs headquartered in China, and we may also see an acceleration in AI development to serve that demand. 01.AI, Deepseek, Alibaba Cloud, SenseTime 商汤科技 and Baidu, Inc. are key companies to watch in this space. For those using AI, this adds another consideration when choosing models: APIs may not be accessible from everywhere, and we could see further restrictions (e.g. on use of LLM outputs). We will look to provide information on this on Artificial Analysis to support users choosing technologies.

    Artificial Analysis

    Models from AI labs headquartered in China 🇨🇳 are now competitive with the leading models globally 🌎 Qwen 2 72B from Alibaba Cloud has the highest MMLU score among open-source models, and Yi Large from 01.AI and Deepseek v2 from DeepseekAI are among the highest-quality models and are priced very competitively. We have initiated coverage of these on Artificial Analysis.

    Previously, models from AI labs headquartered in China were generally not competitive with those from the leading labs globally. They also had issues with multilingual output, likely due to their Chinese-focused training data, and in some cases output Chinese characters in response to English prompts. This has changed over the past couple of months, with new releases that benchmark among the leading models globally. These labs have achieved this using similar techniques to labs elsewhere, particularly training on many times more tokens than is Chinchilla-optimal, training larger models, using techniques like Mixture of Experts, and improving training-data quality (including through extensive use of synthetic & LLM-refined data). The labs are also increasing their marketing to global audiences, as shown by Yi Large being accessible on Fireworks AI.

    While Qwen 2 72B has the highest MMLU score among open-source models, it is important to note that Meta has announced it will shortly release Llama 3 405B, which is likely to far exceed the capabilities of all open-source models available today. We have commenced benchmarking of these models on Artificial Analysis. Link to analysis: https://lnkd.in/g4bbqEre


  • Artificial Analysis

    Fast to launch & very fast output speed! Groq has launched its Gemma 2 9B offering and is serving it at ~600 output tokens/s. Gemma 2 9B is a worthy alternative to Llama 3 8B and other smaller models. It is particularly attractive for generalist and communication-focused use-cases, as shown by its Chatbot Arena (1185) & MMLU (71%) scores exceeding Llama 3 8B's (1153, 68%). For more specific use-cases it is worth conducting narrower tests; for coding, Gemma 2 9B significantly underperforms Llama 3 8B (40% vs. 62% on HumanEval). Groq is offering the model at $0.2 per 1M input & output tokens, in line with Fireworks. Congratulations Groq on the fast launch and impressive performance. We look forward to benchmarking other providers as they begin to host the Gemma 2 models, including potentially Google itself. Analysis of Gemma 2 Instruct (9B): https://lnkd.in/gC6Xnj3a Analysis of providers: https://lnkd.in/gb9S5khK
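    A back-of-envelope sketch of what the quoted price and speed imply for a single request: cost scales with total tokens, generation time with output tokens. The $0.20 per 1M tokens and ~600 tokens/s figures are from the post; the request sizes are illustrative assumptions.

    ```python
    PRICE_PER_MTOK = 0.20      # USD per 1M tokens (input and output), per the post
    OUTPUT_TOKENS_PER_S = 600  # approximate output speed quoted above

    def request_cost(input_tokens: int, output_tokens: int) -> float:
        """Cost in USD for one request at a flat per-token price."""
        return (input_tokens + output_tokens) * PRICE_PER_MTOK / 1_000_000

    def generation_time(output_tokens: int) -> float:
        """Seconds spent streaming the response (excludes time to first token)."""
        return output_tokens / OUTPUT_TOKENS_PER_S

    # Hypothetical request: 1,000-token prompt, 500-token response.
    print(f"${request_cost(1_000, 500):.6f} per request, "
          f"~{generation_time(500):.2f}s to generate")
    ```

    At these figures a typical chat turn costs well under a tenth of a cent and streams in under a second, which is why small, fast models suit high-volume use-cases.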

  • Artificial Analysis

    We are launching coverage of a new AI lab, Reka AI! We break down what makes Reka AI & their models different below 👇
    ‣ Model portfolio: Reka has 3 models: Reka Core, Flash & Edge, each with a different quality, price & speed position. Reka Core exceeds Llama 3 70B's quality and, while it does not quite reach GPT-4o, it is offered at a lower price point. Flash and Edge are lower quality but faster and cheaper.
    ‣ Multi-modal: Reka's models are all multi-modal, accepting text, image and video inputs. This is quite a differentiator, with few models offering multi-modal inputs.
    ‣ Large context windows: Reka's models all have 128k context lengths, which makes them worth considering for RAG use-cases. This is a limitation of some of the more popular models, including the Llama 3 models, which only have an 8k context window.
    ‣ Independent: Reka is an independent AI lab founded by research scientists from leading labs including DeepMind, Google Brain & Meta's FAIR. Reka is also here to stay: they raised $60m just last week 👏.
    Link to our benchmarks & analysis of Reka AI: https://lnkd.in/gvfYjuUJ
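    A rough sketch of why the context-window point matters for RAG: a bigger window leaves room for more retrieved chunks after the prompt and answer budget are set aside. The 128k and 8k figures are from the post; the chunk, prompt, and answer sizes are illustrative assumptions.

    ```python
    def chunks_that_fit(context_window: int,
                        prompt_tokens: int = 500,
                        answer_budget: int = 1_000,
                        chunk_tokens: int = 512) -> int:
        """How many fixed-size retrieved chunks fit in the remaining context."""
        return max(0, (context_window - prompt_tokens - answer_budget) // chunk_tokens)

    for name, window in {"128k context (Reka models)": 128_000,
                         "8k context (Llama 3)": 8_192}.items():
        print(f"{name}: {chunks_that_fit(window)} chunks of 512 tokens")
    ```

    Under these assumptions the 128k window fits a couple of hundred retrieved chunks versus just over a dozen for 8k, which is the practical gap behind the RAG recommendation.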

