by Vantage Team
Vantage: How do we access our OpenAI usage data through the API?
ChatGPT: You can access usage via the /usage endpoint.
That was how our OpenAI integration was born. OpenAI never documented this endpoint, so we think ChatGPT just made up the answer, but it turned out to be right. We fired off a GET request to https://api.openai.com/v1/usage?date=2023-4-14, and there it was:
{
  "object": "list",
  "data": [],
  "ft_data": [],
  "dalle_api_data": [],
  "whisper_api_data": [],
  "current_usage_usd": 0.0
}
We didn’t have any usage data yet, but we knew something was here. At Vantage, we want to help our customers understand their infrastructure bills in detail, and this seemed like a great opportunity to expose detailed cost data. At the time of writing, OpenAI’s Usage dashboard only displays usage per day in dollars spent, requests per hour, and tokens used per model. You can’t see how much you’re spending on particular models or operations, which could lead to surprisingly large bills if you’re not careful.
Viewing usage costs in OpenAI's dashboard
We started off writing Ruby and cURL scripts that would simulate usage so we knew what the data looked like.
When we started out, we didn’t know what any of these fields meant. We asked ChatGPT, but we weren’t confident it knew the answer either. Ultimately we were able to learn what each field meant, at least within the data field.
The Ruby script below was our starting point. It simulates gpt-3.5-turbo usage by repeatedly calling the chat/completions operation, asking the bot to talk about itself 10 times in each request. We also set a budget so the script would not exceed a certain dollar amount.
require "faraday"

secret_key = ENV["OPEN_AI_KEY"]

client = Faraday.new(url: "https://api.openai.com/v1/") do |faraday|
  faraday.request(:authorization, "Bearer", secret_key)
  faraday.headers["Content-Type"] = "application/json"
  faraday.response(:json)
end

# GPT-3.5 pricing as of 04/2023: $0.002 per 1k tokens
token_budget = (1_000 * 2.00) / 0.002
total_tokens = 0

until total_tokens > token_budget
  response = client.post(
    "chat/completions",
    {
      model: "gpt-3.5-turbo-0301",
      # Generate 10 completions at a time, so we rack up that usage bill a little more quickly.
      n: 10,
      # The lower the temperature, the faster the API response.
      temperature: 0.05,
      messages: [
        { role: "system", content: "Imagine you are an extremely chatty person." },
        { role: "user", content: "Rant away." }
      ],
      user: "bin/open-ai"
    }.to_json
  )

  error = response.body["error"]
  if error
    puts("Error: #{error["message"]}")
    exit!
  else
    total_tokens += response.body["usage"]["total_tokens"]
    puts("Accrued $#{((total_tokens * 0.002) / 1_000).round(2)} (#{total_tokens} tokens) thus far...")
  end
end
We let this script run a few cycles and we saw the usage response update:
{
  "object": "list",
  "data": [
    {
      "aggregation_timestamp": 1681220700,
      "n_requests": 8,
      "operation": "completion",
      "snapshot_id": "gpt-3.5-turbo-0301",
      "n_context": 8,
      "n_context_tokens_total": 208,
      "n_generated": 80,
      "n_generated_tokens_total": 18585
    }
  ],
  "ft_data": [],
  "dalle_api_data": [],
  "whisper_api_data": [],
  "current_usage_usd": 0.0
}
Now that our API was populated, we needed to transform these fields into data that Vantage can ingest.
What did all of these fields mean? We asked ChatGPT, but its answer was so confident that we weren’t sure if it was making things up again. We eventually figured them out:

aggregation_timestamp — the start of the five-minute window the usage is aggregated into, as a Unix timestamp
n_requests — the number of API requests made during that window
operation — the API operation performed, such as completion, edit, or embeddings
snapshot_id — the model that was used, e.g. gpt-3.5-turbo-0301
n_context — the number of prompts (context messages) sent
n_context_tokens_total — the total number of tokens across those prompts
n_generated — the number of completions generated
n_generated_tokens_total — the total number of tokens generated
As you can see, we’re able to calculate the cost by summing n_context_tokens_total and n_generated_tokens_total. OpenAI aggregates data in 5-minute increments and conveniently sums up all the usage in each time period. The OpenAI pricing page provides the per-token cost for each model.
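The arithmetic can be sketched in a few lines of Ruby. This is an illustrative helper, not our production code, and it assumes GPT-3.5’s April 2023 price of $0.002 per 1k tokens:

```ruby
# Price assumption: $0.002 per 1,000 tokens (GPT-3.5, April 2023).
PRICE_PER_1K_TOKENS = 0.002

# Compute the cost of a single entry from the /usage response's `data` array
# by summing prompt tokens and generated tokens.
def entry_cost_usd(entry)
  total_tokens = entry["n_context_tokens_total"] + entry["n_generated_tokens_total"]
  (total_tokens / 1_000.0) * PRICE_PER_1K_TOKENS
end

# Token counts from the sample response above: 208 context + 18,585 generated.
entry = {
  "n_context_tokens_total" => 208,
  "n_generated_tokens_total" => 18585
}
puts entry_cost_usd(entry).round(4) # ~$0.0376 for that 5-minute window
```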
But how do you verify that you’ve arrived at the right answer? Sadly, that isn’t fully possible. The current_usage_usd field is not implemented as of this writing, and we weren’t able to find out from anyone at OpenAI if or when it would be usable: the API always returns 0 for the day’s current_usage_usd, where we expected the total dollar amount for the day. Still, our calculations came very close on all of our customers’ accounts and matched exactly on our own.
We relied on cURL to simulate usage data for operations like image generation and speech to text. We did this because OpenAI does not have an official Ruby gem for its API, and the unofficial gems we tried didn’t work.
# Generate images
function generate_images() {
  for i in {1..10}; do
    curl https://api.openai.com/v1/images/generations \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPEN_AI_KEY" \
      -d '{
        "prompt": "A cute baby seal",
        "n": 1,
        "size": "1024x1024"
      }'
  done
}
Once we had sample usage data, we started building out the ETL pipeline. The crux of the work was transforming the costs. We eventually settled on a regex pattern to parse out base models and calculated the cost from the operation and model. There are some other gotchas to watch out for, like how GPT-4 charges separately for context and completion tokens.
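The general idea can be sketched as follows. The regex and the pricing table here are illustrative assumptions, not our actual production pattern or a complete price list:

```ruby
# Illustrative sketch: strip a trailing date-style version stamp from a
# snapshot_id (e.g. "gpt-3.5-turbo-0301" -> "gpt-3.5-turbo"), then look up
# a per-1k-token price keyed by operation and base model.
# Prices below are example April 2023 list prices, not a complete table.
PRICING = {
  ["completion", "gpt-3.5-turbo"] => 0.002, # USD per 1k tokens
  ["completion", "gpt-4"]         => 0.03   # context tokens; completions are priced higher
}

def base_model(snapshot_id)
  snapshot_id.sub(/-\d{4}\z/, "")
end

def price_per_1k(operation, snapshot_id)
  PRICING.fetch([operation, base_model(snapshot_id)])
end

puts base_model("gpt-3.5-turbo-0301")             # "gpt-3.5-turbo"
puts price_per_1k("completion", "gpt-4-0314")     # 0.03
```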
Most operations will follow a similar pattern, but for image generation, audio translation and transcription, and fine-tuned training, we see the following sample responses:
{
  "object": "list",
  "data": [],
  "ft_data": [
    {
      "created_at": 1681412358,
      "trained_tokens": 132,
      "base_model": "ada",
      "fine_tuned_snapshot_ids": [
        "ada:ft-vantage-2023-04-13-19-16-22"
      ]
    }
  ],
  "dalle_api_data": [
    {
      "timestamp": 1681399020,
      "num_images": 16,
      "num_requests": 8,
      "image_size": "1024x1024",
      "operation": "generations"
    }
  ],
  "whisper_api_data": [
    {
      "timestamp": 1681399920,
      "model_id": "whisper-1",
      "num_seconds": 32,
      "num_requests": 1
    }
  ],
  "current_usage_usd": 0.0
}
The cost calculations are straightforward here: OpenAI charges per image for DALL·E, per minute of audio for Whisper, and per trained token for GPT-3 fine-tuning.
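For example, using April 2023 list prices as assumptions ($0.020 per 1024x1024 DALL·E image, $0.006 per minute of Whisper audio, and $0.0004 per 1k trained tokens for ada fine-tuning), the math for the sample response above looks like this:

```ruby
# Sketch of the per-service cost math. The default prices are assumed
# April 2023 list prices, not authoritative values.
def dalle_cost(num_images, price_per_image = 0.020)
  num_images * price_per_image
end

def whisper_cost(num_seconds, price_per_minute = 0.006)
  (num_seconds / 60.0) * price_per_minute
end

def fine_tune_cost(trained_tokens, price_per_1k = 0.0004)
  (trained_tokens / 1_000.0) * price_per_1k
end

# Values from the sample response above:
puts dalle_cost(16)       # 16 images at 1024x1024
puts whisper_cost(32)     # 32 seconds of audio
puts fine_tune_cost(132)  # 132 tokens trained on ada
```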
And now here’s what it looks like once we’ve generated a cost report in Vantage:
Viewing OpenAI costs on a Cost Report
We’re excited about the possibilities of OpenAI’s API and we’re looking forward to saving our customers money on their bills. If you’re interested in trying out OpenAI’s API, you can sign up for an account here.