by Emily Dunenfeld
OpenAI, Anthropic, and Google have all been competing to ship the most intelligent LLM, as evidenced by their multi-billion-parameter, highly performant models (GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro, respectively). However, as we mentioned in our previous small model pricing comparison blog, many use cases don't require that kind of scale and power, which comes at a premium price.
Smaller, more cost-effective models excel at a wide range of general language tasks and can be more accessible for a broader set of applications and budgets. When it comes to choosing a model, leaderboards help greatly; however, they are constantly evolving and don't always tell the full story, with each model finding its niche in specific tasks and industries. We will go over some common use cases of GPT-4o mini, Claude 3 Haiku, and Gemini 1.5 Flash, review the model specifications, and compare pricing.
The newly released (July 2024) GPT-4o mini is OpenAI's most cost-efficient small model. It is intended to replace GPT-3.5 Turbo, as it is more performant at a lower cost. OpenAI states the recommended use cases as applications that require chaining or parallel execution of multiple model calls, processing large amounts of context (e.g., entire codebases or conversation histories), and real-time customer support.
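For illustration, here's a minimal sketch of that "parallel execution of multiple model calls" pattern using the OpenAI Python SDK. The prompts and chunking are hypothetical, and it assumes an `OPENAI_API_KEY` is set in the environment.

```python
# A minimal sketch: fanning out several gpt-4o-mini calls in parallel.
# Prompts and chunks are illustrative; requires OPENAI_API_KEY.
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI()

def summarize(text: str) -> str:
    # One call per document chunk; gpt-4o-mini keeps per-call cost low.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return response.choices[0].message.content

chunks = ["first document...", "second document...", "third document..."]

# Each chunk becomes its own request, executed concurrently.
with ThreadPoolExecutor(max_workers=3) as pool:
    summaries = list(pool.map(summarize, chunks))
```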
Claude 3 Haiku is known for its speed, low cost, and text-processing capabilities. Anthropic cites its ability to process and analyze 400 Supreme Court cases or 2,500 images for one dollar as an example of its text and image processing capabilities. Amazon recommends it for real-time customer support, translations, content moderation, optimized logistics, inventory management, and extraction from unstructured data.
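Since Bedrock is the enterprise path we price below, here's a hedged sketch of invoking Claude 3 Haiku through Amazon Bedrock with boto3. The model ID and region are the ones documented at the time of writing; verify both against your account.

```python
# A minimal sketch of calling Claude 3 Haiku via Amazon Bedrock.
# Model ID, region, and prompt are illustrative.
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [
        {"role": "user", "content": "Extract the invoice total from: ..."}
    ],
}

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps(body),
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```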
Gemini 1.5 Flash has a massive one-million-token context window, which, as Google points out, corresponds to "one hour of video, 11 hours of audio, codebases with more than 30,000 lines of code, or over 700,000 words." Google lists its use cases as information seeking, object recognition, and reasoning. It is available through Google AI Studio or Google Cloud Vertex AI.
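Below is a minimal sketch of prompting Gemini 1.5 Flash through the `google-generativeai` Python SDK (the Google AI Studio path), leaning on that large context window. The API key placeholder, file name, and prompt are illustrative.

```python
# A minimal sketch of a long-context prompt to Gemini 1.5 Flash.
# API key, file, and prompt are illustrative.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-1.5-flash")

# The one-million-token window means a long input can go in one prompt.
with open("large_codebase.txt") as f:
    source = f.read()

response = model.generate_content(
    f"Identify the modules with the most duplication:\n\n{source}"
)
print(response.text)
```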
GPT-4o mini vs. Claude 3 Haiku vs. Gemini 1.5 Flash model comparison
There is no clear winner from a performance or specification perspective, as each model has its advantages and ideal use cases. However, some notable observations are that GPT-4o mini has the highest MMLU score and Gemini 1.5 Flash stands apart with its huge context window.
These models represent the most cost-effective options in their respective families. To ensure a fair comparison, we'll examine the pricing for each model through their primary enterprise-grade offerings, which provide additional security and features: Azure OpenAI Service for GPT-4o mini, Amazon Bedrock for Claude 3 Haiku, and, of course, Google (through Google AI Studio) for Gemini 1.5 Flash.
Here’s how the models stack up in terms of cost per 1,000 tokens in the US East region:
GPT-4o mini vs. Claude 3 Haiku vs. Gemini 1.5 Flash price comparison
Claude 3 Haiku is the most expensive of the bunch: 66.67% more expensive for input tokens and 108.33% more expensive for output tokens compared to Gemini 1.5 Flash (for prompts over 128K tokens) and GPT-4o mini (global deployment).
Gemini 1.5 Flash (for prompts under 128K tokens) is the least expensive option, at half the price of Gemini 1.5 Flash (for prompts over 128K tokens) and GPT-4o mini (global deployment).
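To make the arithmetic concrete, here's a small script that reproduces those percentages from per-1K-token list prices. The prices are illustrative placeholders consistent with the figures above; check each provider's current pricing page before relying on them.

```python
# Reproduces the comparison arithmetic. Per-1K-token prices are
# illustrative placeholders consistent with the percentages above.
PRICES = {  # (input, output) USD per 1K tokens
    "gpt-4o-mini (global)": (0.00015, 0.00060),
    "claude-3-haiku": (0.00025, 0.00125),
    "gemini-1.5-flash (<=128K)": (0.000075, 0.00030),
    "gemini-1.5-flash (>128K)": (0.00015, 0.00060),
}

baseline_in, baseline_out = PRICES["gemini-1.5-flash (>128K)"]
for model, (inp, out) in PRICES.items():
    print(
        f"{model}: input {100 * (inp / baseline_in - 1):+.2f}%, "
        f"output {100 * (out / baseline_out - 1):+.2f}% vs baseline"
    )
# e.g. claude-3-haiku: input +66.67%, output +108.33% vs baseline
```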
While each model has its strengths and may be more suitable for certain applications, the low pricing of Gemini 1.5 Flash, combined with support for prompts of up to a million tokens, makes it an ideal choice for use cases where cost is a main consideration.