The best Large Language Models (LLMs) of 2024

The best LLM logos on a Tech Radar background
(Image credit: Future)

Large language models (LLMs) are a type of artificial intelligence designed to understand and generate natural and programming languages. LLMs can be used to help with a wide variety of tasks, and each has its own degree of suitability and cost efficiency. For this guide we tested multiple individual models from the same foundation model where appropriate to find the best LLM.

This area of technology is moving particularly fast so while we endeavour to keep this guide as up to date as possible, you may want to check whether a newer model has been released and whether the cost efficiency for that model makes it a better choice.

These are the best LLMs of 2024


Below you'll find the best LLMs of 2024, as tested by us. We've picked one foundation LLM as best overall and selected individual models from a range of foundation models for each category.

Best LLM overall

OpenAI

(Image credit: Unsplash)

OpenAI's GPT

The best LLM overall

Specifications

Parameters: 175 Billion +
Access: API

Reasons to buy

+
Often first to release the newest, most powerful models
+
High levels of investment
+
Fast response times

Reasons to avoid

-
Alignment team and some founders left OpenAI after the latest release
-
Other models come close to the same ability at lower cost

The majority of LLMs are based on a variation of the transformer architecture, a neural network design first documented in a 2017 research paper authored by 8 scientists working at Google. The Generative Pre-trained Transformer, also known as GPT, is one of several foundation models used by tech firms to power the LLMs on the market today. While there are several different types of GPT available, the first, and arguably the most well known, was introduced by OpenAI in 2018 as GPT-1.

GPT models can be adapted by developers to tackle specific tasks and workloads, or used in a more general fashion to cover a broader range of applications. For example, GitHub Copilot uses a version of OpenAI’s GPT-4 that is specifically tuned to help programmers write code, while the EinsteinGPT model built into the Salesforce cloud aims to enhance the customer experience by improving employee productivity. In November 2023, OpenAI announced it would enable ChatGPT subscribers to create custom GPTs using their own datasets and training data, and even allow those GPTs to access database systems to pull data for analysis in real time. OpenAI also plans to allow developers to publish and monetize their custom GPTs for other users and developers, so we might see some interesting releases over the next few years that build upon the core GPT models already available today.
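For developers, the simplest way to adapt a GPT model to a specific task is a system prompt sent through OpenAI's API. The minimal sketch below uses OpenAI's Python SDK to constrain the model to a single role; the model name and prompt text are just examples, not a prescribed setup.

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# A system prompt is the lightest-touch way to specialise a general
# GPT model for one task -- here, a hypothetical SQL-helper role.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are an assistant that only writes and explains SQL."},
        {"role": "user", "content": "Write a query returning the 10 most recent orders."},
    ],
)
print(response.choices[0].message.content)
```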

OpenAI is at the forefront of GPT development, having released several different versions for public use over the last few years. While each subsequent release has brought incremental improvements to intelligence and capability, these have historically come at the price of higher response latency and a higher cost to use. GPT-3.5 was very quick and cost effective, but could often make mistakes or demonstrate bias; GPT-4 improved the capabilities and intelligence of the model at an increased cost and higher response latency. The latest release, GPT-4o, bucks the trend by being the most intelligent version yet, while reducing the cost to use and improving latency by a considerable margin.

Out of the box, the GPT models from OpenAI provide a fantastic “jack of all trades” approach that is sufficient for most use cases today, while those looking for a more specialized or task specific approach can customize them to their needs. This makes GPT models a great option for those who need something that just works, without the need to train the models on their own datasets for them to become effective.

However, it’s important to note that, like with all LLMs on the market today, GPT models aren’t immune to providing false, biased, or misleading responses. While the most recent releases are becoming more accurate and are less likely to generate bad responses, users should be careful when using information provided in an output and take the time to verify that it is accurate.

Best LLM for coding

An image of Copilot's homepage

(Image credit: Copilot)

GitHub Copilot

Best LLM for coding

Specifications

Plans: Individual, Business, and Enterprise

Reasons to buy

+
Real-time code suggestions
+
Comments to code
+
Context-aware coding support and explanations

Reasons to avoid

-
Can be hit and miss with existing code bases

GitHub is one of the largest and most recognisable developer platforms in use today, used by many individuals and enterprises to store, manage and share their codebases, so it makes sense that it has also created an LLM for coding to help developers enhance the speed and efficiency of their work. GitHub Copilot is a coding assistant powered by the GPT-4 model from OpenAI that can be accessed via an extension within several commonly used IDEs (integrated development environments), including Visual Studio Code, Visual Studio, Vim, Neovim, the JetBrains suite of IDEs, and Azure Data Studio. Unlike other coding assistants, GitHub Copilot also has the advantage of being natively integrated into GitHub.

Originally released in October 2021 and powered by OpenAI Codex, a modified version of the GPT-3 model, GitHub Copilot provides developers with a range of tools that help them understand new and existing codebases or code snippets, write blocks of code quickly and efficiently, and troubleshoot issues. It can also help write test cases for automated testing and inspire you with solutions to problems you encounter. In November 2023, GitHub Copilot was updated to use the GPT-4 model to further improve its capabilities. With the recent release of OpenAI’s GPT-4o model, it’s reasonable to speculate that GitHub Copilot could be updated to use the latest version in the future, but there has been no confirmation of if or when that may happen.

One of the most eye-catching features is GitHub Copilot’s ability to use a prompt to generate code that can either be entirely new, or based on the project’s existing codebase, by suggesting entire blocks of code or auto-completing lines as you type them. GitHub states that the model has been trained using source code from publicly available code repositories, including public repositories on GitHub itself, and claims that GitHub Copilot can support any language that appears in a public repository. However, GitHub does mention that the quality of the suggestions GitHub Copilot can offer depends on the diversity and volume of the training data available for that language. This could mean that while GitHub Copilot will still try to assist developers with suggestions when working in more obscure or less used programming languages, the benefits may be reduced compared to languages that are more common and publicly visible.
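To give a flavour of the comment-to-code workflow: a developer types a descriptive comment and Copilot suggests an implementation beneath it. The function below is our own illustrative sketch of the kind of completion you might see, not verbatim Copilot output.

```python
from collections import Counter

# return the n most common words in a text file
def most_common_words(path: str, n: int) -> list[tuple[str, int]]:
    # A suggestion along these lines is what Copilot typically offers
    # after a descriptive comment like the one above.
    with open(path, encoding="utf-8") as f:
        words = f.read().lower().split()
    return Counter(words).most_common(n)
```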

Subscriptions to GitHub Copilot are available today at 3 different feature levels and price points, tailored to individual developers, small to large businesses, and enterprises. If you’d like to try before you buy, GitHub Copilot offers a 30-day free trial of the Individual subscription tier.

Best value LLM

Meta Logo

(Image credit: Meta)

Meta Llama 3

Best value LLM

Specifications

Parameters: 8 Billion, 70 Billion
Access: Open

Reasons to buy

+
Close to the same ability as other models for a fraction of the cost

Reasons to avoid

-
A little slow to respond

Given Meta is included as one of the “Big Five” global tech firms, it should come as no surprise that they’ve been working on their own LLM to support their products, large and small businesses, and other applications such as research and academia. The original version of Llama was released in February 2023, but was only made available on a case-by-case basis to select groups within academia, governmental departments, and for research purposes. Llama 2, released in July 2023, and Llama 3, released in April 2024, are both available for general and commercial usage today.

The most attractive selling point for Llama 3 is how cost efficient it is compared to others on the market. While it might not be quite as capable as the behemoth that is GPT-4o, it is still a very capable model that can match the performance of GPT-4 at a fraction of the cost.

Depending on the provider, Llama 3 costs an average of $0.90 per 1 million output tokens, which is considerably cheaper than GPT-4 and GPT-4o, which sit at $30 and $15 respectively for the same quantity of tokens. This can make Llama 3 a very cost-effective solution for those who need to process a high volume of tokens and want a high quality output, but have a limited budget. From its own internal testing, Meta has claimed that Llama 3 can also match Google Gemini and Claude 3 (though it’s currently unclear how it stacks up against the recent Claude 3.5 Sonnet release) in most benchmarks, making the value proposition from Meta increasingly attractive when combined with how little it costs to use.
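To put those numbers in context, here’s a quick back-of-the-envelope comparison in Python. The prices are the per-million-token figures quoted above; real prices vary by provider and change often, so treat them as illustrative.

```python
# Approximate cost (USD) per 1 million output tokens, as quoted above.
# Prices vary by provider and change frequently -- illustrative only.
PRICE_PER_1M_OUTPUT = {"llama-3-70b": 0.90, "gpt-4o": 15.00, "gpt-4": 30.00}

def output_cost(model: str, tokens: int) -> float:
    """Estimated cost in USD for a given number of output tokens."""
    return PRICE_PER_1M_OUTPUT[model] * tokens / 1_000_000

# For example, generating 10 million output tokens:
for model in PRICE_PER_1M_OUTPUT:
    print(f"{model}: ${output_cost(model, 10_000_000):,.2f}")
```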

In addition to the significant cost reduction, Llama 3 is also open source, allowing users to sign up to gain access to each of the different sized models, download them, and install them on their local systems or infrastructure instead of relying on cloud-based offerings from providers such as AWS or Azure. This is a considerable difference from many of the other popular models on the market, which require you to use their services exclusively. The 8B (8 Billion parameter) version of Llama 3 is small enough that you could comfortably run it on a modern high-end desktop, though you will need a large amount of RAM and GPU VRAM to make best use of it, and considerably more for the larger 70B model. This essentially allows you to use the model for “free”, aside from any initial hardware setup costs, which could be very useful to individuals, students and academics. Additionally, those with privacy concerns can avoid the potential risks commonly associated with sending data into the cloud for processing by hosting Llama 3 on local hardware or owned infrastructure. Naturally, the hardware you have available does factor into the overall performance of Llama 3 compared to a cloud solution from providers such as Microsoft Azure or Amazon AWS, but if your main goal is to keep costs as low as possible, the performance sacrifice may be worth it.
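If you want to try local hosting yourself, a minimal sketch using the Hugging Face transformers library is shown below. It assumes you’ve been granted access to the gated meta-llama repository on Hugging Face and have a GPU with enough VRAM for the 8B model; exact requirements vary with quantisation and settings.

```python
# pip install transformers torch accelerate
# Assumes access to the gated meta-llama repo has been granted on huggingface.co.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # place the model on the available GPU(s)
)

messages = [{"role": "user", "content": "Why can local hosting reduce privacy risk?"}]
result = pipe(messages, max_new_tokens=200)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```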

 

While the existing 8B and 70B Llama 3 models are highly capable, Meta is also working on a gigantic 400B version that Meta’s Chief AI scientist Yann LeCun believes will become one of the most capable LLMs in the world once released.

Best LLM for business

An image of Claude 3 artwork

(Image credit: Anthropic)

Claude 3

Best for business

Specifications

Parameters: Unknown
Access: API

Reasons to buy

+
High focus on alignment
+
Claimed monumental parameter size
+
Also great for coding

Reasons to avoid

-
More expensive than competitors
-
Slower than competitors

Released in March 2024, Claude 3 is the latest version of Anthropic’s Claude LLM, building on the Claude 2 model released in July 2023. Claude 3 comes in 3 separate versions, Haiku, Sonnet, and Opus, each with a different level of capability and cost to use. Claude 3 Opus is the highest level and most capable version, which Anthropic claims has set new industry benchmarks across a range of cognitive tasks and has a higher capacity for reasoning than other models on the market today.

One of the areas in which Claude 3 excels is the size of its context window, which helps to improve the relevance of responses based on the conversation history. While the original release of Claude was limited to a 100,000 token context window, both Claude 2 and 3 have an expanded context window of up to 200,000 tokens. In real terms, this translates to roughly 500 pages of text, or approximately 150,000 words. For comparison, the standard context limit for GPT-4 is 32K tokens, and both GPT-4o and Google’s Gemini 1.5 Pro are limited to 128K tokens. There are several business cases where this large input limit can provide significant gains, such as identifying trends within a large dataset, summarizing long form answers from customer satisfaction surveys, screening job applications against a given set of criteria, and helping to iterate on an idea or design being discussed with Claude 3.
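As a rough sketch of how that window gets used in practice, the example below sends a long document to Claude 3 in a single request via Anthropic’s Python SDK and asks for a summary. The file name is a placeholder, and the model ID reflects the Opus release at the time of writing.

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads the ANTHROPIC_API_KEY environment variable

# A long report fits comfortably inside the 200,000-token context
# window, so it can be summarised in one request (hypothetical file).
with open("survey_responses.txt", encoding="utf-8") as f:
    document = f.read()

message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"{document}\n\nSummarise the main trends in these survey responses.",
    }],
)
print(message.content[0].text)
```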

As well as its ability to process large datasets, Anthropic claims that Claude 3 Opus, the most expensive tier of Claude 3, is the most intelligent model on the market today, and it has demonstrated some level of awareness based on the tasks given to it. During testing, Alex Albert, one of Anthropic’s prompt engineers, gave Claude 3 Opus a task similar to finding a needle in a haystack by asking it to locate a specific sentence hidden inside a random collection of documents. Not only did the model find the “needle”, but Claude 3 Opus also mentioned that the sentence appeared out of place and suggested that it was likely placed there for testing purposes. This demonstrated a surprising level of awareness not usually found in LLMs, although it remains to be seen whether this was something akin to true awareness, or simply the pattern of mimicking human intelligence that most LLMs attempt to follow.

Anthropic, the creators of Claude, have a very strong focus on alignment, aiming to make Claude a better choice for businesses that are concerned not just about outputs that might damage their brand or company, but also society as a whole.

However, all of this comes at a rather large cost compared to the competition. Claude 3 Opus currently costs $75 per 1 million output tokens via the API, which is a hefty price compared to the $30 of GPT-4 Turbo, or the insanely low $0.90 of Llama 3. The Haiku and Sonnet versions of Claude 3 are cheaper and offer faster response times, at the cost of reduced intelligence.

For those not looking for API access, Anthropic provides a free subscription tier that includes limited access to a chat interface at claude.ai, powered by the newly released Claude 3.5 Sonnet. All 3 models are accessible with higher usage limits by subscribing to the Pro tier.

Best LLM for chatbots

Qwen Logo

(Image credit: Qwen)

Qwen

Best LLM for chatbots

Specifications

Parameters: 72 Billion
Access: Open

Reasons to buy

+
Trained on multiple languages
+
Cheap to run

Reasons to avoid

-
Conversation abilities not as strong as other LLMs 

Released in February 2024, Qwen-1.5 is an LLM from Alibaba that aims to match or outperform Google’s Gemini and Meta’s Llama models in both cost and capability. As well as the base models, Alibaba has also released a counterpart model tailored for chat scenarios, Qwen-1.5-Chat.

Similar to Llama, Qwen-1.5 is an open-source model that anyone can download for free and install on their own hardware and infrastructure. This makes Qwen-1.5 a very competitive choice for developers, especially those with limited budgets, as the main costs of getting the model up and running are the initial hardware investment and the cost to run and maintain that hardware. To help support developers, Qwen-1.5 comes in several different sizes to fit a wide range of devices and hardware configurations. The largest and most capable version of Qwen-1.5-Chat currently sits at 72B parameters, while the lightest version is as small as 0.5B. Qwen-1.5 has an input token limit of 32K (the 14B model is limited to 8K), which is on par with GPT-4 and significantly larger than the 4096 input token limit of Llama 2. Although it has the same input limit as GPT-4, Qwen-1.5 matches Google’s Gemini with an output token limit of 8192, one of the higher output limits for LLMs on the market today. It should be noted that, as with other models on the market, the capabilities of the model decrease as the parameter count reduces, so keep that in mind when selecting a model size for your specific use case.

In benchmarks, Qwen-1.5 consistently outperforms Llama 2 in most scenarios, whilst also achieving competitive results compared to GPT-4. This further increases the attractiveness of Qwen-1.5, as it can offer near GPT-4 levels of capability at a fraction of the cost, and you can fine-tune it with a custom dataset to tailor the model to your specific usage needs. Plus, as you train the LLM on your own machine, you get to keep hold of your own data.

In a customer support scenario, Qwen-1.5 would provide you with a bot that is far more capable of understanding a customer’s issue than the more traditional keyword or rule based chatbots commonly seen on the internet today. It would then be able to respond intelligently to customer queries based on your knowledge base, improving first contact resolution rates and escalating more difficult or advanced issues to second line support agents. To further enhance its chat capabilities, Qwen-1.5 can accept and respond in an impressive 35 languages and can offer translation services in over 150 others. As with other LLMs, the number of tokens needed for inputs and outputs depends on the language being used, as some have a higher token-to-character ratio.
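As a hedged sketch of how you might stand up such a bot locally, the example below loads the smallest Qwen-1.5 chat checkpoint with Hugging Face transformers. The model size, prompt, and generation settings are illustrative, and a production support bot would also need retrieval over your knowledge base.

```python
# pip install transformers torch accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The 0.5B chat model is small enough to test on modest hardware; swap in
# a larger checkpoint (e.g. Qwen/Qwen1.5-72B-Chat) if you have the VRAM.
model_id = "Qwen/Qwen1.5-0.5B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "My order hasn't arrived yet. What should I do?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```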

While recommending Qwen-1.5 for chatbots might seem like a bit of a curveball, it’s important to remember the use case you are applying the LLM to. In the case of a customer support bot, you probably don’t need the advanced intelligence that allows users to have the long philosophical conversations they might with something like GPT-4o, as that’s way out of scope for what you intend to use it for.

Qwen-1.5-7B-chat is available for use today via a web interface over at huggingface.co, while the larger models can be downloaded to run locally.

Best multimodal LLM

OpenAI logo with GPT-4o under it

(Image credit: OpenAI)

GPT-4o

Best multimodal LLM

Specifications

Modes: Audio, vision, and text
Latency: Real time

Reasons to buy

+
Half the price of GPT-4 Turbo
+
Multimodal capabilities open up a variety of use cases

Reasons to avoid

-
Alignment team left after release
-
More costly than other models

OpenAI is one of the most recognisable names when it comes to LLMs and is widely known for several models and products released over the last few years, including DALL-E for image generation, and ChatGPT, a chatbot based on GPT-3.5 and GPT-4. 

Released in May 2024, GPT-4o is the latest offering from OpenAI, extending the multimodal capabilities of GPT-4 Turbo by adding full integration for text, image and audio prompts, while further reducing the cost to users, making it an attractive option for those looking for a language model that can fill multiple roles. OpenAI claims GPT-4o is twice as fast, half the cost, and has five times the rate limit compared to GPT-4 Turbo.

One of the most significant enhancements in GPT-4o is Voice Mode, which allows the model to process audio in real time and output a realistic, tone-appropriate response in a human-sounding voice that might make you question whether you are speaking to a real person. The voice output is certainly impressive compared to most of the text-to-speech applications currently on the market and does a fantastic job of imitating how a person might speak in real life, adding the inflections and nuances normally heard in regular conversation. Additionally, GPT-4o can utilise a camera to analyze the environment around you to help add context to its responses. OpenAI demonstrated the voice and vision features in a video alongside the release announcement for GPT-4o, however these features are not yet fully available for general usage.

Full text integration in GPT-4o adds incremental improvements to evaluation and reasoning compared to GPT-4 and GPT-4 Turbo, and offers live translation into 50 different languages. As with audio, GPT-4o further improves the ability to recognise context and sentiment from text inputs and provide accurate summarizations, allowing responses to be more accurate and presented in the appropriate tone. As with previous versions of GPT, GPT-4o can store historic conversations and look them up in real time to lend further context to responses.

OpenAI is already rolling out the text and image features of GPT-4o to ChatGPT. In a first for OpenAI, those using the free tier of ChatGPT will have access to GPT-4o, although in a limited capacity that resets daily, which is a fantastic step given GPT-4 required a paid subscription when it launched. Plus subscribers will have access to message limits up to 5 times higher than before, and an alpha version of Voice Mode will be made available to Plus users in the coming weeks. API access to the new text and image capabilities of GPT-4o is available for developers to use today, while the new audio and video capabilities will be made available in the API to a select group of partners ahead of a full rollout to the wider audience; there’s currently no announced date for when the new voice and video capabilities will arrive.
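Since the text and image capabilities are already live in the API, a minimal sketch of a vision request looks something like this (the image URL is a placeholder):

```python
# pip install openai
from openai import OpenAI

client = OpenAI()

# GPT-4o accepts mixed text-and-image content in a single message.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```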

 

Best LLM for translation

Google Gemini logo

(Image credit: Google)

Google Gemini

Best LLM for translation

Specifications

Parameters: Unknown
Access: API

Reasons to buy

+
Extensive data from Google translate
+
Cheaper than other models

Reasons to avoid

-
Not strictly doing the translation on its own

Released in February 2024, Gemini 1.5 is an updated version of the original Gemini LLM released in December 2023 that offers improved capabilities and performance compared to the original. As of May 2024, there are 2 versions of Gemini 1.5 available to subscribers – Pro and Flash.

Gemini doesn’t appear to translate text entirely on its own; instead, the translations it provides combine output from Google Translate, the multilingual training data Gemini has access to, and the LLM capabilities of Gemini itself to produce a more fluent and natural sounding result. This results in translations that flow better, make more contextual sense, and are less awkward than the more literal translations typically offered by Google Translate alone. This combined approach means that Gemini 1.5 can be used to translate any language currently available via Google Translate, however the degree to which Gemini 1.5 can enhance the fluency of the output does depend on the amount of multilingual training data available to the model for each individual language, though this is a limitation shared by other LLMs.

While other LLMs, such as GPT-4o, also provide some translation capabilities, one of the key areas where Gemini 1.5 has an advantage is cost. Costs can quickly mount up if large quantities of text need to be translated, so being able to translate quickly and cheaply is an incredibly important factor. While GPT-4o has demonstrated some impressive translation capabilities of its own, it also costs $15 per 1 million output tokens. By comparison, Gemini 1.5 costs only $2 per 1 million, which is significantly cheaper. An important thing to note when translating large quantities of text is that while Gemini 1.5 can accept up to 1 million input tokens at a time, output is currently limited to 8192 tokens. The number of tokens required for an output will depend heavily on the target language, as some languages have higher token-to-character ratios than others. Exceeding this limit can result in error messages, or truncation that leaves your translation incomplete. To obtain translations that require an output larger than the token limit, you’ll need to break down your requests into smaller chunks, as in the sketch below. While 8192 tokens per response might seem quite low given it equates to around 6,000 words, GPT-4o is currently limited to 2048 output tokens per response.
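A simple way to stay under that output limit is to translate in pieces. The sketch below uses Google’s google-generativeai Python package and splits the source text on paragraph breaks; the chunking strategy and prompt wording are our own assumptions, not an official recipe.

```python
# pip install google-generativeai
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-pro")

def translate(text: str, target: str = "French") -> str:
    # Translate paragraph by paragraph so no single response
    # risks running into the 8192 output-token limit.
    pieces = []
    for paragraph in text.split("\n\n"):
        response = model.generate_content(
            f"Translate the following text into {target}:\n\n{paragraph}"
        )
        pieces.append(response.text)
    return "\n\n".join(pieces)
```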

Gemini 1.5 Pro is free to use with some limitations, though a subscription is required for access to the increased 1 million input token limit and higher rate limits.
 

How to choose

Essentially it comes down to bang for buck. GPT-4o is brilliant and can do pretty much everything the others can, but at a cost. Claude 3, while not trained specifically for coding like Copilot, also has a good reputation for creating code. Another thing to consider is access to your data and who owns what. You can train your own chatbot with OpenAI by creating an assistant, but at the end of the day that stays with OpenAI. If you use an open model, you can keep hold of your data and completely own your own trained model.

FAQs

What are Token limits?

Token limits are a restriction on the number of tokens an LLM is able to process in a single interaction. Without limits, or with limits that are too large, the performance of an LLM can be affected, resulting in slow response times. However, if the limit is set too low then the LLM may struggle to generate the desired output. If an output limit is exceeded, the LLM may truncate the output, leaving it incomplete, attempt to reduce the size of the output by providing less detail, or simply generate an error. Some LLMs have the ability to segment responses to overcome output limits, but this isn’t a universal feature.
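With API-based models you can usually detect truncation in code rather than eyeballing the output. For example, OpenAI’s API reports a finish_reason of “length” when a response was cut off at the token limit (a minimal sketch):

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=50,  # deliberately small output limit for demonstration
    messages=[{"role": "user", "content": "Explain transformer models in detail."}],
)

choice = response.choices[0]
if choice.finish_reason == "length":
    # The output hit the token limit rather than ending naturally --
    # request a continuation or raise max_tokens.
    print("Output truncated:", choice.message.content)
```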

What is a rate limit?

A rate limit is the number of requests that a user can make over a given period of time, often in minutes, hours or days. Rate limits are usually imposed by providers to help reduce the load on infrastructure so they can continue to provide an optimal service level. Rate limits are usually defined within the subscription tier for each product, with more expensive tiers offering increased rate limits.

Rate limits for your chosen LLM will vary depending on the provider, so consult their pricing sheets to determine which tier offers the best value for your needs.
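In client code, the usual way to cope with rate limits is to retry with exponential backoff. The sketch below uses OpenAI’s Python SDK, which raises RateLimitError when a limit is hit, but the same pattern applies to any provider.

```python
import time
from openai import OpenAI, RateLimitError

client = OpenAI()

def chat_with_retry(prompt: str, max_retries: int = 5) -> str:
    delay = 1.0
    for _ in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except RateLimitError:
            time.sleep(delay)  # back off before retrying
            delay *= 2         # double the wait each attempt
    raise RuntimeError("Still rate limited after retries")
```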

How to use a local LLM / Open source LLM

Unlike providers such as OpenAI and Google which use subscriptions to provide access to their LLMs, and are hosted on their own infrastructure, individuals and enterprises can download open source LLMs and deploy them to their own infrastructure. 

Users have the option to deploy the model to existing hardware they already own, or purchase cloud compute or VPS instances to provide increased performance if greater capability is required. 

For enterprises, this can present a cost effective way of incorporating an LLM into their business, while keeping costs lower and reducing privacy and data security concerns by keeping information in house rather than submitting it to a third party.

What are tokens?

LLMs don't break up language into individual words. Instead, they break language into chunks of text. These chunks can be individual characters or phrases of multiple words. These chunks of text are called tokens.

When you use an LLM through an API, you are paying for the number of tokens used. For multimodal LLMs it's slightly different: the effort required to process an image as an input is converted into an amount of tokens. So, while the model is not strictly measuring tokens, you are still charged in tokens.

So, tokens can be seen as a unit of currency exchanged for work done by an LLM and also a unit representing an amount of text.  The general rule to estimate how many tokens a prompt will use is 1 token = 4 characters. For example, “May the force be with you” contains 25 characters and would require ~6 tokens. However, this is only an estimate, and using inputs in other languages could require more tokens for the same input.
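For OpenAI models you don’t have to estimate: the tiktoken library counts tokens exactly. A quick sketch using the example above:

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
tokens = enc.encode("May the force be with you")
print(len(tokens))  # 6 -- matching the rough 4-characters-per-token estimate
```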

You’ll need to consult the documentation of your LLM of choice to learn more about how its specific tokenizer works.

What is an LLM?

A Large Language Model (LLM) is a form of artificial intelligence trained using massive sets of data to allow the model to recognize and generate text across a wide range of tasks. LLMs are built upon machine learning concepts using a type of neural network known as a transformer model.

More advanced LLMs are also capable of accepting inputs such as images, videos and audio for the model to recognize. These models are known as Multimodal Large Language Models (MLLMs).

Grant Hickey

Fascinated by computers from a young age, Grant is on an endless quest to leverage existing and emerging technologies to augment and enhance the productivity of individuals and enterprises, and to improve the velocity at which teams can analyze data and identify trends within their customer base or organization. Grant has previously worked as a software engineer building cloud based CRMs, before moving into the games industry to work for Krafton on PUBG:Battlegrounds and later Creative Assembly. Always looking to improve his working practices he often builds his own tools to streamline tasks and become more efficient.

With contributions from