by Emily Dunenfeld
In the race to develop the best generative AI model, models with billions of parameters, like GPT-4 and Claude 3, are the most powerful. However, sometimes you don’t need the full capabilities of such large models, which also carry a higher price tag. Small language models are more affordable options that may work better for your use case. Among these models, Llama 3 8B was recently introduced and outperforms Mistral 7B, which was previously widely chosen as the go-to small model, on popular leaderboards.
While leaderboard rankings are a useful metric, they don't tell the full story, and it's essential to consider other factors such as training data, availability, and pricing. Both Llama 3 8B and Mistral 7B can be run locally and are available through multiple platforms, including Amazon Bedrock as managed services, which is what we will focus on in this comparison.
Llama 3 8B is Meta's 8-billion-parameter language model, released in April 2024. It is an improvement on the previous generation, Llama 2, with a training data set seven times as large and a stronger emphasis on code. The model is well-suited for a variety of use cases, such as text summarization and classification, sentiment analysis, and language translation.
Mistral 7B is a dense transformer model that strikes a balance between performance and cost efficiency. Released in September 2023, Mistral 7B has been a popular choice for those seeking a smaller, more affordable language model. Use cases include text summarization and structuration, question-answering, and code completion.
Llama 3 8B outranks Mistral 7B on popular leaderboards, but there are other factors to consider, such as pricing.
Pricing through Amazon Bedrock is charged at the following On-Demand rates:

Llama 3 8B: $0.0004 per 1,000 input tokens, $0.0006 per 1,000 output tokens
Mistral 7B: $0.00015 per 1,000 input tokens, $0.0002 per 1,000 output tokens
The difference is significant. Mistral 7B is 62.5% less expensive than Llama 3 8B for input tokens and 66.7% less expensive for output tokens. To put that into a real-world example, consider the following scenario.
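Those percentages follow directly from the On-Demand rates. A quick sketch of the arithmetic (rates in USD per 1,000 tokens, as quoted above):

```python
# Per-1,000-token On-Demand rates from the comparison (USD).
llama_input, llama_output = 0.0004, 0.0006
mistral_input, mistral_output = 0.00015, 0.0002

# Relative savings of Mistral 7B versus Llama 3 8B.
input_savings = (llama_input - mistral_input) / llama_input * 100
output_savings = (llama_output - mistral_output) / llama_output * 100

print(f"Input tokens: {input_savings:.1f}% less expensive")   # 62.5%
print(f"Output tokens: {output_savings:.1f}% less expensive")  # 66.7%
```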
A company provides a text summarization service for news articles. They receive and process large volumes of short articles monthly to provide concise summaries for their users. Each article is 1,000 tokens on average and the summary of each article is 500 tokens on average. The company processes 200,000 articles per month. Calculations are as follows:
Llama 3 8B:
Input cost per article = ($0.0004 x 1,000 tokens) / 1,000 tokens = $0.0004
Output cost per article = ($0.0006 x 500 tokens) / 1,000 tokens = $0.0003
Total = ($0.0004 + $0.0003) x 200,000 articles = $140
Mistral 7B:
Input cost per article = ($0.00015 x 1,000 tokens) / 1,000 tokens = $0.00015
Output cost per article = ($0.0002 x 500 tokens) / 1,000 tokens = $0.0001
Total = ($0.00015 + $0.0001) x 200,000 articles = $50
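The scenario above can be generalized into a small cost model. This is a minimal sketch, using the On-Demand rates and the article's assumptions (200,000 articles per month, 1,000 input tokens and 500 output tokens per article); the function name and structure are illustrative, not an official API:

```python
# On-Demand Bedrock rates quoted in the article, USD per 1,000 tokens.
RATES = {
    "Llama 3 8B": {"input": 0.0004, "output": 0.0006},
    "Mistral 7B": {"input": 0.00015, "output": 0.0002},
}

def monthly_cost(model, articles=200_000, input_tokens=1_000, output_tokens=500):
    """Estimated monthly cost for the text-summarization scenario."""
    r = RATES[model]
    per_article = (r["input"] * input_tokens + r["output"] * output_tokens) / 1_000
    return per_article * articles

for model in RATES:
    print(f"{model}: ${monthly_cost(model):,.2f}")
```

Running this reproduces the $140 (Llama 3 8B) and $50 (Mistral 7B) monthly totals, and the defaults are easy to swap out for your own token volumes.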
For the same use case, Mistral 7B is 64.3% less expensive than Llama 3 8B.
As is often the case, there is a balance between performance and price. Llama 3 8B outperforms Mistral 7B on popular leaderboards and offers additional strengths, such as an extra billion parameters, while still retaining fast inference speed and broader language support. However, Mistral 7B remains a strong and lightweight model, providing excellent performance at over 60% less cost.