OpenAI, Anthropic, and Google have all been competing to ship the most intelligent LLM, as evidenced by their multi-billion-parameter, highly performant models (GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro, respectively). However, as we mentioned in our previous small model pricing comparison blog, many use cases don’t require that kind of scale and power, which comes at a premium price.

Smaller and more cost-effective models excel at a wide range of general language tasks and can be more accessible for a broader set of applications and budgets. Leaderboards help greatly when choosing a model, but they are constantly evolving and don’t always tell the full story, as each model finds its niche in specific tasks and industries. We will go over some common use cases of GPT-4o mini, Claude 3 Haiku, and Gemini 1.5 Flash, review their specifications, and compare their pricing.

GPT-4o Mini

The newly released (July 2024) GPT-4o mini is OpenAI’s most cost-efficient small model. It is intended to replace GPT-3.5 Turbo, offering better performance at a lower cost. OpenAI’s recommended use cases include applications that chain or parallelize multiple model calls, process large amounts of context (e.g., entire codebases or conversation histories), and power real-time customer support.
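As a rough illustration, here is a minimal sketch of a single GPT-4o mini call through the OpenAI Python SDK. The model id and SDK usage reflect OpenAI’s published API at the time of writing; an Azure OpenAI Service deployment would use a deployment name and endpoint instead, and the prompt is just a placeholder.

```python
# Minimal sketch: one GPT-4o mini call via the OpenAI Python SDK (openai>=1.x).
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a concise support assistant."},
        {"role": "user", "content": "My order arrived damaged. What are my options?"},
    ],
)
print(response.choices[0].message.content)
```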

Claude 3 Haiku

Claude 3 Haiku is known for its speed, low cost, and text-processing capabilities. Anthropic cites its ability to process and analyze 400 Supreme Court cases or 2,500 images for one dollar as an example of its text and image processing capabilities. Amazon recommends it for real-time customer support, translation, content moderation, logistics optimization, inventory management, and extraction from unstructured data.
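For comparison, here is a minimal sketch of a Claude 3 Haiku call using the Anthropic Python SDK. The dated model id is Anthropic’s published identifier for this model; on Amazon Bedrock you would instead invoke the Bedrock runtime with the corresponding Bedrock model id, and the prompt is a placeholder.

```python
# Minimal sketch: one Claude 3 Haiku call via the Anthropic Python SDK.
# Assumes ANTHROPIC_API_KEY is set in the environment.
import anthropic

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=512,
    messages=[
        {"role": "user", "content": "Summarize the key points of this customer complaint in two sentences."},
    ],
)
print(message.content[0].text)
```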

Gemini 1.5 Flash

Gemini 1.5 Flash has a massive one-million-token context window, which, as Google points out, corresponds to “one hour of video, 11 hours of audio, codebases with more than 30,000 lines of code, or over 700,000 words.” Google lists its use cases as information seeking, object recognition, and reasoning. It is available through Google AI Studio or Google Cloud Vertex AI.
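And a minimal sketch of a Gemini 1.5 Flash call through the google-generativeai SDK (the Google AI Studio path); Vertex AI uses a separate client library, and the prompt here is just a placeholder.

```python
# Minimal sketch: one Gemini 1.5 Flash call via the google-generativeai SDK.
# Assumes GOOGLE_API_KEY is set in the environment (Google AI Studio key).
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("List three risks of relying on a single LLM vendor.")
print(response.text)
```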

Model Comparison: GPT-4o Mini vs Claude 3 Haiku vs Gemini 1.5 Flash

| Model | GPT-4o Mini | Claude 3 Haiku | Gemini 1.5 Flash |
|---|---|---|---|
| Max Input Tokens | 128,000 | 200,000 | 1,000,000 |
| Max Output Tokens | 16,384 | 4,096 | 8,192 |
| Parameters | 8 billion | Unknown | Unknown |
| Training Data | Up to Oct 2023 | Up to Aug 2023 | Up to Nov 2023 |
| Languages | Multilingual understanding, though languages are unspecified | English, Spanish, Japanese, and more | 100+ |
| MMLU | 82 | 75.2 | 78.9 |


There is no clear winner from a performance or specification perspective, as each model has its advantages and ideal use cases. Notably, however, GPT-4o mini has the highest MMLU score, and Gemini 1.5 Flash stands apart with its huge context window.

Pricing Comparison: GPT-4o Mini vs Claude 3 Haiku vs Gemini 1.5 Flash

These models represent the most cost-effective options in their respective families. To ensure a fair comparison, we’ll examine the pricing for each model through its primary enterprise-grade offering, which provides additional security and features: Azure OpenAI Service for GPT-4o mini, Amazon Bedrock for Claude 3 Haiku, and, of course, Google AI Studio for Gemini 1.5 Flash.

Here’s how the models stack up in terms of cost per 1,000 tokens in the US East region:

| Model | Price per 1,000 Input Tokens | Price per 1,000 Output Tokens |
|---|---|---|
| Gemini 1.5 Flash (Prompts < 128K) | $0.000075 | $0.00030 |
| Gemini 1.5 Flash (Prompts > 128K) | $0.000150 | $0.00060 |
| GPT-4o Mini (Global Deployment) | $0.000150 | $0.00060 |
| GPT-4o Mini (Regional API) | $0.000165 | $0.00066 |
| Claude 3 Haiku | $0.000250 | $0.00125 |


Claude 3 Haiku is the most expensive of the bunch: 66.67% more expensive for input tokens and 108.33% more expensive for output tokens compared to Gemini 1.5 Flash (for prompts over 128K tokens) and GPT-4o mini (global deployment).

Gemini 1.5 Flash (for prompts under 128K tokens) is the least expensive option, at half the price of Gemini 1.5 Flash (for prompts over 128K tokens) and GPT-4o mini (global deployment).
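To sanity-check those percentages, and to show what the per-1,000-token prices mean for a concrete workload, here is a short calculation using the figures from the pricing table above; the daily token volumes are made-up example numbers, not benchmarks.

```python
# Per-1,000-token prices from the pricing table above (US East region).
PRICES = {
    "Claude 3 Haiku":                  {"input": 0.000250, "output": 0.00125},
    "GPT-4o Mini (Global Deployment)": {"input": 0.000150, "output": 0.00060},
    "Gemini 1.5 Flash (<128K)":        {"input": 0.000075, "output": 0.00030},
}

def pct_more_expensive(model: str, baseline: str, kind: str) -> float:
    """How much more expensive `model` is than `baseline`, in percent."""
    return (PRICES[model][kind] / PRICES[baseline][kind] - 1) * 100

print(pct_more_expensive("Claude 3 Haiku", "GPT-4o Mini (Global Deployment)", "input"))   # ~66.67
print(pct_more_expensive("Claude 3 Haiku", "GPT-4o Mini (Global Deployment)", "output"))  # ~108.33

# Hypothetical workload: 1M input tokens and 200K output tokens per day.
for name, p in PRICES.items():
    daily = (1_000_000 / 1_000) * p["input"] + (200_000 / 1_000) * p["output"]
    print(f"{name}: ${daily:.2f}/day")  # $0.50, $0.27, $0.14 respectively
```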

Conclusion

While each model has its strengths and may be more suitable for certain applications, the low pricing of Gemini 1.5 Flash, combined with support for prompts of up to a million tokens, makes it an ideal choice for use cases where cost is a main consideration.