by Danielle Vansia
An active resource is a resource, such as a virtual machine, that is currently generating costs within a cloud account. These resources come from a cloud provider: for example, an Amazon EC2 instance or a Confluent cluster. It’s important to know which resources are active and generating costs; otherwise, something like an S3 bucket can quietly start running up ridiculously high costs, and next thing you know, you’re paying thousands in unexpected charges!
In this tutorial, we walk through how to use the Vantage API to get insights from your active resources. We provide a script that lets you interact with and view pivoted cost data across resource type, provider, region, and more. You can use these insights to understand where your organization is spending the most and which resources are currently generating the most costs. Consider this tutorial an introduction to even deeper analysis you can do with this API.
In this demo, you’ll work along with the provided Jupyter Notebook to retrieve your active resource costs from the Vantage API. You’ll use a few Python data visualization libraries to explore the data and draw insights about your active resource costs.
This tutorial assumes you have a basic understanding of Python, Jupyter Notebooks, and making basic API calls. For Vantage, you’ll need at least one provider connected with active resources. You’ll also need a Vantage API token with READ and WRITE scopes enabled.
The following Python libraries are used in the demo. The os and time modules ship with Python; you’ll want to be sure the others are installed on your system:
import requests
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import time
You’ll also need Jupyter Notebook or the ability to read ipynb files on your system to review the Notebook.
Open the Jupyter Notebook and walk through each section described below. The code is also reprinted here for explanation purposes.
The resources endpoint returns a JSON array of all resources within a specific Resource Report or workspace. The resource_report_token variable represents the unique token for a Resource Report in Vantage. For this lab, use the All Active Resources report that’s automatically provided in your account.
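You can find the token at the end of the report’s URL in the Vantage console. For example, in https://console.vantage.sh/resources/prvdr_rsrc_rprt_a12f345345bad1ac, the token is prvdr_rsrc_rprt_a12f345345bad1ac. Replace <TOKEN> in the following parameters with your report’s token.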
url = "https://api.vantage.sh/v2/resources"
params = {
    "resource_report_token": "<TOKEN>",
    "include_cost": "true"
}
Export your Vantage API token as the VANTAGE_API_TOKEN environment variable within this session. When you run the below block, os.getenv reads the token into the vantage_token variable.
vantage_token = os.getenv("VANTAGE_API_TOKEN")
if vantage_token is None:
    raise ValueError("Set VANTAGE_API_TOKEN as an environment variable.")

headers = {
    "accept": "application/json",
    "authorization": f"Bearer {vantage_token}"
}
When you initially call the /resources endpoint, the response is paginated. In addition, the Vantage API has rate limits in place to limit repeated calls. For this endpoint, you are limited to 20 requests per minute. In the initial response, you should see links to the first, next, and last pages; the page number in the last link tells you how many pages contain your resource data.
{
  "links": {
    "self": "https://api.vantage.sh/v2/resources?resource_report_token=prvdr_rsrc_rprt_a12f345345bad1ac&include_cost=true",
    "first": "https://api.vantage.sh/v2/resources?resource_report_token=prvdr_rsrc_rprt_a12f345345bad1ac&include_cost=true&page=1",
    "next": "https://api.vantage.sh/v2/resources?resource_report_token=prvdr_rsrc_rprt_a12f345345bad1ac&include_cost=true&page=2",
    "last": "https://api.vantage.sh/v2/resources?resource_report_token=prvdr_rsrc_rprt_a12f345345bad1ac&include_cost=true&page=100",
    "prev": null
  },
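The total page count can be read from the last link in that response. The following is a minimal sketch using only the standard library. The helper name total_pages is hypothetical, and it assumes the last link always carries a page query parameter, as in the sample response.

```python
from urllib.parse import urlparse, parse_qs


def total_pages(links):
    """Return the page number encoded in links["last"], or None if absent."""
    last = links.get("last")
    if not last:
        return None
    query = parse_qs(urlparse(last).query)
    pages = query.get("page")
    return int(pages[0]) if pages else None


# Sample "links" object, copied from the example response
sample_links = {
    "last": "https://api.vantage.sh/v2/resources"
            "?resource_report_token=prvdr_rsrc_rprt_a12f345345bad1ac"
            "&include_cost=true&page=100",
}
print(total_pages(sample_links))  # 100
```

Knowing the page count up front gives you a sense of how many requests the pagination loop will make before you start it.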
The following loop accounts for this rate-limiting and sets a delay between requests. It uses the X-RateLimit-Reset header to determine how long to wait before resuming requests to ensure that the rate limit is respected. If the rate limit is hit, the loop pauses for the specified time in X-RateLimit-Reset, allowing the process to continue, without interruption, once the rate limit resets. This loop also extracts and appends data from each page, moving through the pagination links in the "next" field until the final page is reached.
# Create a list to collect the data across all pages
all_data = []

# Loop through pagination to retrieve all pages
while url:
    response = requests.get(url, headers=headers, params=params)
    if response.status_code != 200:
        print(f"Error: {response.status_code}")
        break

    data = response.json()
    all_data.extend(data["resources"])
    url = data["links"].get("next")
    # The "next" link already carries the query string, so don't resend params
    params = None

    # Handle rate limiting, as the API is limited to 20 requests per minute
    if response.headers.get("X-RateLimit-Remaining") == "0":
        reset_time = int(response.headers.get("X-RateLimit-Reset", 60))
        print(f"Rate limit hit. Sleeping for {reset_time} seconds...")
        time.sleep(reset_time)
    else:
        time.sleep(1)  # Add a pause between the requests
Note that retrieving this data may take a few minutes to process depending on the number of resources in your organization.
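For a rough sense of how long that is, you can divide the number of pages by the 20-requests-per-minute limit. This is a back-of-the-envelope sketch, not a precise prediction: the helper name estimate_minutes is hypothetical, and it assumes one request per page and ignores network time.

```python
import math

REQUESTS_PER_MINUTE = 20  # limit stated for this endpoint


def estimate_minutes(page_count, per_minute=REQUESTS_PER_MINUTE):
    """Rough number of one-minute rate-limit windows needed for page_count requests."""
    if page_count <= 0:
        return 0
    return math.ceil(page_count / per_minute)


print(estimate_minutes(100))  # 5
```

With the 100 pages shown in the sample response, you would expect the loop to run for roughly five minutes.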
The /resources endpoint provides a resource record for each resource (identified by the Vantage token). Each unique token can have multiple entries, as cost is determined by the resource’s category. For example, the following resource has one record for Data Transfer costs and another for API Request costs:
"resources": [
  {
    "token": "prvdr_rsrc_1ba2e3aa45678f9f",
    "uuid": "arn:aws:kms:us-east-1:12345678901:key/1234ab0d-56a7-89a3-45ab-89ab45ab1e34",
    "type": "aws_cloudfront_distribution",
    "label": "1234ab0d-56a7-89a3-45ab-89ab45ab1e34",
    "metadata": null,
    "account_id": "12345678901",
    "billing_account_id": "12345678901",
    "provider": "aws",
    "region": "us-east-1",
    "costs": [
      {
        "category": "Data Transfer",
        "amount": "0.0000899936"
      }
    ],
    "created_at": "2023-05-22T19:43:33.264Z"
  },
  {
    "token": "prvdr_rsrc_1ba2e3aa45678f9f",
    "uuid": "arn:aws:kms:us-east-1:12345678901:key/1234ab0d-56a7-89a3-45ab-89ab45ab1e34",
    "type": "aws_cloudfront_distribution",
    "label": "1234ab0d-56a7-89a3-45ab-89ab45ab1e34",
    "metadata": null,
    "account_id": "12345678901",
    "billing_account_id": "12345678901",
    "provider": "aws",
    "region": "us-east-1",
    "costs": [
      {
        "category": "API Request",
        "amount": "0.0000987564"
      }
    ],
    "created_at": "2023-05-22T19:43:33.264Z"
  },
  ...
The pandas dataframe you’ll create next pulls in the uuid, type, provider, region, token, label, and account_id fields for each resource record. The amount and category fields are nested under costs, so record_path tells pandas to expand that list into one row per cost entry. The record_prefix adds cost_ in front of each nested column name, giving you cost_amount and cost_category.
df = pd.json_normalize(
    all_data,
    record_path='costs',
    meta=['uuid', 'type', 'provider', 'region', 'token', 'label', 'account_id'],
    record_prefix='cost_'
)
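To see what record_path, meta, and record_prefix are doing, it can help to write out a rough plain-Python equivalent. This sketch is only an illustration of the flattening idea, not how pandas implements it. The function name flatten_costs is hypothetical, and the sample records are trimmed versions of the response shown earlier.

```python
def flatten_costs(resources, meta, prefix="cost_"):
    """Yield one flat dict per entry in each resource's "costs" list."""
    for resource in resources:
        for cost in resource.get("costs") or []:
            # Nested cost fields get the prefix (like record_prefix)
            row = {f"{prefix}{key}": value for key, value in cost.items()}
            # Top-level fields are copied onto every row (like meta)
            row.update({key: resource.get(key) for key in meta})
            yield row


sample = [
    {
        "token": "prvdr_rsrc_1ba2e3aa45678f9f",
        "type": "aws_cloudfront_distribution",
        "costs": [{"category": "Data Transfer", "amount": "0.0000899936"}],
    },
    {
        "token": "prvdr_rsrc_1ba2e3aa45678f9f",
        "type": "aws_cloudfront_distribution",
        "costs": [{"category": "API Request", "amount": "0.0000987564"}],
    },
]

rows = list(flatten_costs(sample, meta=["token", "type"]))
for row in rows:
    print(row)
```

Each row ends up with cost_category and cost_amount next to the resource-level fields, which is the shape the rest of the analysis relies on.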
With the initial dataframe in place, convert cost_amount from a string to a float so that the values can be summed. The total_cost_df dataframe then groups the records by token to give a total cost per resource, sorted from highest to lowest.
df['cost_amount'] = df['cost_amount'].astype(float)
total_cost_df = df.groupby('token')['cost_amount'].sum().reset_index()
total_cost_df = total_cost_df.sort_values(by='cost_amount', ascending=False)
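Floats are fine for charts, but the API returns amounts as strings with many decimal places. If you ever need exact sums, for example when reconciling against an invoice, one option is Python’s decimal module. This is an alternative sketch on the two sample records shown earlier, not what the Notebook does.

```python
from collections import defaultdict
from decimal import Decimal

# (token, amount) pairs taken from the sample response
records = [
    ("prvdr_rsrc_1ba2e3aa45678f9f", "0.0000899936"),
    ("prvdr_rsrc_1ba2e3aa45678f9f", "0.0000987564"),
]

totals = defaultdict(Decimal)  # Decimal() starts each total at 0
for token, amount in records:
    totals[token] += Decimal(amount)

print(totals["prvdr_rsrc_1ba2e3aa45678f9f"])  # 0.0001887500
```

The trade-off is speed and convenience: pandas works best with floats, so exact decimals are usually worth it only for a final reconciliation step.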
With the data grouped and cleaned, you are ready to explore it through a few visualizations built with matplotlib.
This first visualization looks at the top 5 cost-contributing resource types across all providers. A new dataframe groups by type and sums the cost_amount for each type.
type_cost_df = df.groupby('type')['cost_amount'].sum().reset_index()

# Create table for visual
top_types = type_cost_df.sort_values(by='cost_amount', ascending=False).head(5)
print(top_types)

# Plot top resource types by cost
plt.figure(figsize=(10, 6))
plt.bar(top_types['type'], top_types['cost_amount'], color='coral')
plt.xlabel('Resource Type')
plt.ylabel('Total Cost')
plt.title('Top 5 Cost-Contributing Resource Types')
plt.xticks(rotation=45)
plt.show()
From the results, you should see a graph that looks something like the one below. (Note that all presented images use sample data.) The Resource Type axis labels each resource type with its identifier from the Vantage API. You can find the equivalent name for each resource type in the Vantage Documentation. For example, aws_ecs_service represents the ECS Service.
Generated matplotlib graph of top resource costs using sample data
Create the following bar chart with matplotlib to see costs per region across all providers.
region_cost_df = df.groupby('region')['cost_amount'].sum().reset_index()

# Create table for visual
top_regions = region_cost_df.sort_values(by='cost_amount', ascending=False).head(5)
print(top_regions)

# Plot total cost by region
plt.figure(figsize=(10, 6))
plt.bar(region_cost_df['region'], region_cost_df['cost_amount'], color='skyblue')
plt.xlabel('Region')
plt.ylabel('Total Cost')
plt.title('Total Cost by Region')
plt.xticks(rotation=45)
plt.show()
The chart should look something like the below image. The region code is provided on the X-axis. This code will be specific to the related provider.
Generated matplotlib graph of top region costs using sample data
A heatmap can help show clusters of resources across providers. This heatmap uses matplotlib and seaborn to plot a pivot table of provider against resource type, limited to the 10 highest-cost resource types overall. You could also create a filtered dataframe to see this data for only a certain set of providers. This additional example is provided in the Jupyter Notebook.
heatmap_data = df.pivot_table(values='cost_amount', index='provider', columns='type', aggfunc='sum').fillna(0)

# Keep only the top 10 highest-cost types for readability
top_types = df.groupby('type')['cost_amount'].sum().nlargest(10).index
heatmap_data = heatmap_data[top_types]

plt.figure(figsize=(16, 10))
sns.heatmap(heatmap_data, cmap='YlGnBu', annot=True, fmt=".4f", cbar_kws={'label': 'Total Cost'})
plt.xlabel('Type')
plt.ylabel('Provider')
plt.xticks(rotation=45, ha='right')
plt.title('Total Cost Distribution by Provider and Top Resource Types')
plt.show()
The heatmap should look something like the below image. Darker cells represent greater costs for that provider and resource type combination, which makes concentrations of spend easy to spot.
Generated matplotlib and seaborn heatmap showing clusters of costs across provider and resource type
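The filtered variation mentioned earlier, limited to a chosen set of providers, might look something like the following. This is a sketch on made-up data and is not necessarily the version in the Notebook. The selected_providers list, the cost figures, and the example_gcp_type and example_azure_type names are placeholders.

```python
import pandas as pd

# Made-up rows shaped like the flattened cost dataframe
df = pd.DataFrame(
    {
        "provider": ["aws", "aws", "gcp", "azure"],
        "type": [
            "aws_ecs_service",
            "aws_cloudfront_distribution",
            "example_gcp_type",    # placeholder type name
            "example_azure_type",  # placeholder type name
        ],
        "cost_amount": [12.5, 3.0, 4.25, 7.0],
    }
)

selected_providers = ["aws", "gcp"]  # assumption: the providers you want to keep
filtered_df = df[df["provider"].isin(selected_providers)]

# Same pivot as the full heatmap, built from the filtered rows only
heatmap_data = filtered_df.pivot_table(
    values="cost_amount", index="provider", columns="type", aggfunc="sum"
).fillna(0)
print(heatmap_data)
```

You can pass the resulting heatmap_data to sns.heatmap exactly as in the full example.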
It’s important to track and analyze your active resources to avoid unexpected spikes in your cloud costs. The /resources API in Vantage allows you to dig deep into which resources are costing you the most across provider, region, account, and more. Consider creating more in-depth charts or tables to identify other patterns in your resource use and costs.