OpenAI, Anthropic, and Google have all been competing to ship the most intelligent LLM, as evidenced by their highly performant flagship models (GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro, respectively). However, as we mentioned in our previous small model pricing comparison blog, many use cases don’t require that kind of scale and power, which comes at a premium price.
Smaller, more cost-effective models excel at a wide range of general language tasks and are accessible to a broader set of applications and budgets. Leaderboards help greatly when choosing a model, but they are constantly evolving and don’t always tell the full story, as each model finds its niche in specific tasks and industries. We will go over some common use cases for GPT-4o mini, Claude 3 Haiku, and Gemini 1.5 Flash, review their specifications, and compare their pricing.
GPT-4o Mini
The newly released (July 2024) GPT-4o mini is OpenAI’s most cost-efficient small model. It is intended to replace GPT-3.5 Turbo, as it is more performant at a lower cost. OpenAI recommends it for applications that chain or parallelize multiple model calls, process large amounts of context (e.g., entire codebases or conversation histories), or provide real-time customer support.
Claude 3 Haiku
Claude 3 Haiku is known for its speed, low cost, and text-processing capabilities. Anthropic cites its ability to process and analyze 400 Supreme Court cases or 2,500 images for one dollar as an example of its text and image processing capabilities. Amazon recommends it for real-time customer support, translations, content moderation, optimized logistics, inventory management, and extraction from unstructured data.
Gemini 1.5 Flash
Gemini 1.5 Flash has a massive one-million-token context window, which, as Google points out, corresponds to “one hour of video, 11 hours of audio, codebases with more than 30,000 lines of code, or over 700,000 words.” Google lists its use cases as information seeking, object recognition, and reasoning. It is available through Google AI Studio or Google Cloud Vertex AI.
Model Comparison: GPT-4o Mini vs Claude 3 Haiku vs Gemini 1.5 Flash
Model | GPT-4o Mini | Claude 3 Haiku | Gemini 1.5 Flash |
---|---|---|---|
Max Input Tokens | 128,000 | 200,000 | 1,000,000 |
Max Output Tokens | 16,384 | 4,096 | 8,192 |
Parameters | ~8 billion (unofficial estimate) | Unknown | Unknown |
Training Data | Up to Oct 2023 | Up to Aug 2023 | Up to Nov 2023 |
Languages | Multilingual, though supported languages are unspecified | English, Spanish, Japanese, and more | 100+ |
MMLU | 82.0 | 75.2 | 78.9 |
There is no clear winner from a performance or specification perspective, as each model has its advantages and ideal use cases. Two observations stand out, however: GPT-4o mini has the highest MMLU score, and Gemini 1.5 Flash stands apart with its huge context window.
Pricing Comparison: GPT-4o Mini vs Claude 3 Haiku vs Gemini 1.5 Flash
These models represent the most cost-effective options in their respective families. To ensure a fair comparison, we’ll examine the pricing for each model through its primary enterprise-grade offering, which provides additional security and features: Azure OpenAI Service for GPT-4o mini, Amazon Bedrock for Claude 3 Haiku, and, of course, Google (through Google AI Studio) for Gemini 1.5 Flash.
Here’s how the models stack up in terms of cost per 1,000 tokens in the US East region:
Model | Price per 1,000 Input Tokens | Price per 1,000 Output Tokens |
---|---|---|
Gemini 1.5 Flash (Prompts < 128K) | $0.000075 | $0.00030 |
Gemini 1.5 Flash (Prompts > 128K) | $0.000150 | $0.00060 |
GPT-4o Mini (Global Deployment) | $0.000150 | $0.00060 |
GPT-4o Mini (Regional API) | $0.000165 | $0.00066 |
Claude 3 Haiku | $0.000250 | $0.00125 |
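To make these rates concrete, the table above can be turned into a simple cost estimator. The sketch below is illustrative only: the model keys, workload sizes, and helper function are our own invention, and real invoices depend on region, deployment type, and any volume discounts.

```python
# Per-1,000-token prices (USD) from the table above (US East region).
PRICES = {
    "gemini-1.5-flash-under-128k": {"input": 0.000075, "output": 0.00030},
    "gemini-1.5-flash-over-128k":  {"input": 0.000150, "output": 0.00060},
    "gpt-4o-mini-global":          {"input": 0.000150, "output": 0.00060},
    "gpt-4o-mini-regional":        {"input": 0.000165, "output": 0.00066},
    "claude-3-haiku":              {"input": 0.000250, "output": 0.00125},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the per-1K rates above."""
    p = PRICES[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# Hypothetical workload: 1M requests, each with 2,000 input and 500 output tokens.
for model in PRICES:
    total = estimate_cost(model, input_tokens=2_000, output_tokens=500) * 1_000_000
    print(f"{model}: ${total:,.2f}")
```

At this hypothetical workload, the gap compounds quickly: the same million requests cost roughly $300 on Gemini 1.5 Flash (under 128K), $600 on GPT-4o mini (global), and $1,125 on Claude 3 Haiku.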
Claude 3 Haiku is the most expensive of the bunch: 66.67% more expensive for input tokens and 108.33% more expensive for output tokens compared to Gemini 1.5 Flash (for prompts over 128K tokens) and GPT-4o mini (global deployment).
Gemini 1.5 Flash (for prompts under 128K tokens) is the least expensive option, at half the price of Gemini 1.5 Flash (for prompts over 128K tokens) and GPT-4o mini (global deployment).
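These relative differences follow directly from the table; a quick sanity check (the `pct_more` helper is hypothetical, not part of any vendor SDK):

```python
def pct_more(expensive: float, cheap: float) -> float:
    """Percentage by which `expensive` exceeds `cheap`."""
    return (expensive / cheap - 1) * 100

# Claude 3 Haiku vs Gemini 1.5 Flash (>128K) / GPT-4o mini (global):
print(round(pct_more(0.000250, 0.000150), 2))  # input tokens -> 66.67
print(round(pct_more(0.00125, 0.00060), 2))    # output tokens -> 108.33
```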
Conclusion
While each model has its strengths and may be more suitable for certain applications, the low pricing of Gemini 1.5 Flash, combined with its million-token context window, makes it an ideal choice for use cases where cost is a main consideration.
Monitor your AWS, Azure, and Google costs.