OpenAI, Anthropic, and Google have all been competing to ship the most intelligent LLM, as evidenced by their highly performant flagship models (GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro, respectively). However, as we mentioned in our previous small model pricing comparison blog, many use cases don’t require that kind of scale and power, which comes at a premium price.
Smaller, more cost-effective models excel at a wide range of general language tasks and are accessible to a broader set of applications and budgets. Leaderboards help greatly when choosing a model, but they are constantly evolving and don’t always tell the full story, as each model finds its niche in specific tasks and industries. We will go over some common use cases for GPT-4o mini, Claude 3 Haiku, and Gemini 1.5 Flash, review their specifications, and compare their pricing.
GPT-4o Mini
The newly released (July 2024) GPT-4o mini is OpenAI’s most cost-efficient small model. It is intended to replace GPT-3.5 Turbo, as it is more performant at a lower cost. OpenAI recommends it for applications that chain or parallelize multiple model calls, process large amounts of context (e.g., entire codebases or conversation histories), or provide real-time customer support.
Claude 3 Haiku
Claude 3 Haiku is known for its speed, low cost, and text-processing capabilities. Anthropic cites its ability to process and analyze 400 Supreme Court cases or 2,500 images for one dollar as an example of its text and image processing capabilities. Amazon recommends it for real-time customer support, translations, content moderation, optimized logistics, inventory management, and extraction from unstructured data.
Gemini 1.5 Flash
Gemini 1.5 Flash has a massive one-million-token context window, which, as Google points out, corresponds to “one hour of video, 11 hours of audio, codebases with more than 30,000 lines of code, or over 700,000 words.” Google lists its use cases as information seeking, object recognition, and reasoning. It is available through Google AI Studio or Google Cloud Vertex AI.
Model Comparison: GPT-4o Mini vs Claude 3 Haiku vs Gemini 1.5 Flash
Model | GPT-4o Mini | Claude 3 Haiku | Gemini 1.5 Flash |
---|---|---|---|
Max Input Tokens | 128,000 | 200,000 | 1,000,000 |
Max Output Tokens | 16,384 | 4,096 | 8,192 |
Parameters | ~8 billion (unofficial estimate) | Unknown | Unknown |
Training Data | Up to Oct 2023 | Up to Aug 2023 | Up to Nov 2023 |
Languages | Multilingual, though supported languages are unspecified | English, Spanish, Japanese, and more | 100+ |
MMLU | 82.0 | 75.2 | 78.9 |
There is no clear winner from a performance or specification perspective, as each model has its advantages and ideal use cases. Two observations stand out, however: GPT-4o mini has the highest MMLU score, and Gemini 1.5 Flash stands apart with its huge context window.
Pricing Comparison: GPT-4o Mini vs Claude 3 Haiku vs Gemini 1.5 Flash
These models represent the most cost-effective options in their respective families. To ensure a fair comparison, we’ll examine the pricing for each model through its primary enterprise-grade offering, which provides additional security and features: Azure OpenAI Service for GPT-4o mini, Amazon Bedrock for Claude 3 Haiku, and, of course, Google (through Google AI Studio) for Gemini 1.5 Flash.
Here’s how the models stack up in terms of cost per 1,000 tokens in the US East region:
Model | Price per 1,000 Input Tokens | Price per 1,000 Output Tokens |
---|---|---|
Gemini 1.5 Flash (Prompts < 128K) | $0.000075 | $0.00030 |
Gemini 1.5 Flash (Prompts > 128K) | $0.000150 | $0.00060 |
GPT-4o Mini (Global Deployment) | $0.000150 | $0.00060 |
GPT-4o Mini (Regional API) | $0.000165 | $0.00066 |
Claude 3 Haiku | $0.000250 | $0.00125 |
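To make these rates concrete, the table above can be turned into a simple cost estimator. The sketch below is illustrative only: the model keys, workload sizes, and helper function are our own invention, and real invoices depend on region, deployment type, and any volume discounts.

```python
# Per-1,000-token prices (USD) from the table above (US East region).
PRICES = {
    "gemini-1.5-flash-under-128k": {"input": 0.000075, "output": 0.00030},
    "gemini-1.5-flash-over-128k":  {"input": 0.000150, "output": 0.00060},
    "gpt-4o-mini-global":          {"input": 0.000150, "output": 0.00060},
    "gpt-4o-mini-regional":        {"input": 0.000165, "output": 0.00066},
    "claude-3-haiku":              {"input": 0.000250, "output": 0.00125},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the per-1K rates above."""
    p = PRICES[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# Hypothetical workload: 1M requests, each with 2,000 input and 500 output tokens.
for model in PRICES:
    total = estimate_cost(model, input_tokens=2_000, output_tokens=500) * 1_000_000
    print(f"{model}: ${total:,.2f}")
```

At this hypothetical workload, the gap compounds quickly: the same million requests cost roughly $300 on Gemini 1.5 Flash (under 128K), $600 on GPT-4o mini (global), and $1,125 on Claude 3 Haiku.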
Claude 3 Haiku is the most expensive of the bunch: 66.67% more expensive for input tokens and 108.33% more expensive for output tokens compared to Gemini 1.5 Flash (for prompts over 128K tokens) and GPT-4o mini (global deployment).
Gemini 1.5 Flash (for prompts under 128K tokens) is the least expensive option, at half the price of Gemini 1.5 Flash (for prompts over 128K tokens) and GPT-4o mini (global deployment).
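These relative differences follow directly from the table; a quick sanity check (the `pct_more` helper is hypothetical, not part of any vendor SDK):

```python
def pct_more(expensive: float, cheap: float) -> float:
    """Percentage by which `expensive` exceeds `cheap`."""
    return (expensive / cheap - 1) * 100

# Claude 3 Haiku vs Gemini 1.5 Flash (>128K) / GPT-4o mini (global):
print(round(pct_more(0.000250, 0.000150), 2))  # input tokens -> 66.67
print(round(pct_more(0.00125, 0.00060), 2))    # output tokens -> 108.33
```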
Conclusion
While each model has its strengths and may be more suitable for certain applications, the low pricing of Gemini 1.5 Flash, combined with its million-token context window, makes it an ideal choice for use cases where cost is a main consideration.
Monitor your AWS, Azure, and Google costs.