by Emily Dunenfeld
In the race to develop the best generative AI model, models with billions of parameters, like GPT-4 and Claude 3, are the most powerful. However, sometimes you don’t need the full capabilities of such large models, which also carry a higher price tag. Small language models are more affordable options that may work better for your use case. Among these models, Llama 3 8B was recently introduced and outperforms Mistral 7B, which was previously widely chosen as the go-to small model, on popular leaderboards.
While leaderboard rankings are a useful metric, they don't tell the full story, and it's essential to consider other factors such as training data, availability, and pricing. Both Llama 3 8B and Mistral 7B can be run locally and are available through multiple platforms, including Amazon Bedrock as managed services, which is what we will focus on in this comparison.
Llama 3 8B is Meta's 8-billion-parameter language model, released in April 2024. It is an improvement on the previous generation, Llama 2, with a training data set seven times as large and a stronger emphasis on code. The model is well-suited for a variety of use cases, such as text summarization and classification, sentiment analysis, and language translation.
Mistral 7B is a dense transformer model that strikes a balance between performance and cost efficiency. Released in September 2023, Mistral 7B has been a popular choice for those seeking a smaller, more affordable language model. Use cases include text summarization and structuration, question-answering, and code completion.
Llama 3 8B outranks Mistral 7B on popular leaderboards, but there are other factors to consider, such as pricing.
Pricing through Amazon Bedrock is charged at the following On-Demand rates:

Llama 3 8B: $0.0004 per 1,000 input tokens, $0.0006 per 1,000 output tokens
Mistral 7B: $0.00015 per 1,000 input tokens, $0.0002 per 1,000 output tokens
The difference is significant. Mistral 7B is 62.5% less expensive than Llama 3 8B for input tokens and 66.7% less expensive for output tokens. To put that into a real-world example, consider the following scenario.
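Those percentages follow directly from the On-Demand rates. A quick sketch of the arithmetic (rates in USD per 1,000 tokens, as quoted above):

```python
# Per-1,000-token On-Demand rates from the comparison (USD).
llama_input, llama_output = 0.0004, 0.0006
mistral_input, mistral_output = 0.00015, 0.0002

# Relative savings of Mistral 7B versus Llama 3 8B.
input_savings = (llama_input - mistral_input) / llama_input * 100
output_savings = (llama_output - mistral_output) / llama_output * 100

print(f"Input tokens: {input_savings:.1f}% less expensive")   # 62.5%
print(f"Output tokens: {output_savings:.1f}% less expensive")  # 66.7%
```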
A company provides a text summarization service for news articles. They receive and process large volumes of short articles monthly to provide concise summaries for their users. Each article is 1,000 tokens on average and the summary of each article is 500 tokens on average. The company processes 200,000 articles per month. Calculations are as follows:
Llama 3 8B:
Input cost per article = ($0.0004 x 1,000 tokens) / 1,000 tokens = $0.0004
Output cost per article = ($0.0006 x 500 tokens) / 1,000 tokens = $0.0003
Total = ($0.0004 + $0.0003) x 200,000 articles = $140
Mistral 7B:
Input cost per article = ($0.00015 x 1,000 tokens) / 1,000 tokens = $0.00015
Output cost per article = ($0.0002 x 500 tokens) / 1,000 tokens = $0.0001
Total = ($0.00015 + $0.0001) x 200,000 articles = $50
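The scenario above can be generalized into a small cost model. This is a minimal sketch, using the On-Demand rates and the article's assumptions (200,000 articles per month, 1,000 input tokens and 500 output tokens per article); the function name and structure are illustrative, not an official API:

```python
# On-Demand Bedrock rates quoted in the article, USD per 1,000 tokens.
RATES = {
    "Llama 3 8B": {"input": 0.0004, "output": 0.0006},
    "Mistral 7B": {"input": 0.00015, "output": 0.0002},
}

def monthly_cost(model, articles=200_000, input_tokens=1_000, output_tokens=500):
    """Estimated monthly cost for the text-summarization scenario."""
    r = RATES[model]
    per_article = (r["input"] * input_tokens + r["output"] * output_tokens) / 1_000
    return per_article * articles

for model in RATES:
    print(f"{model}: ${monthly_cost(model):,.2f}")
```

Running this reproduces the $140 (Llama 3 8B) and $50 (Mistral 7B) monthly totals, and the defaults are easy to swap out for your own token volumes.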
For the same use case, Mistral 7B is 64.3% less expensive than Llama 3 8B.
As is often the case, there is a balance between performance and price. Llama 3 8B outperforms Mistral 7B on popular leaderboards and offers additional strengths, such as an extra billion parameters, while still retaining fast inference speed and broader language support. However, Mistral 7B remains a strong and lightweight model, providing excellent performance at over 60% less cost.