Datadog Cost Optimization Tips

Much as AWS leads the market for cloud infrastructure, Datadog is nearly as ubiquitous for monitoring and tracing software. While Datadog's ease of use and ability to scale make it simple to get up and running, we've heard from numerous customers that Datadog bills are among the hardest to monitor, attribute, and optimize.

As Datadog customers ourselves, and now a provider of Datadog cost reporting, we have collected several techniques for controlling Datadog costs and organized them into this post. Among them:

Committed Use Discounts
Disabling Datadog Containerized Agent Logging
Controlling ingestion volume
Carefully selecting Custom Metrics
Setting an appropriate logging level
Bonus: Proxying through PrivateLink to reduce egress charges

Before diving into the cost optimization tips, we'll briefly review Datadog's pricing model; feel free to skip this section if you're already familiar with it.

Reviewing Datadog's Pricing Model

As of this writing, Datadog offers 18 services, which together cover most DevOps use cases. The big five are Infrastructure Monitoring, Log Management, APM (Application Performance Monitoring), Database Monitoring, and Synthetic Monitoring (downtime detection). Each service is priced slightly differently, but a few common threads emerge.

Dimension	Service
Per host	Infrastructure, APM, Database Monitoring, Network Monitoring, Cloud Security Management, Application Security Management, Cloud Cost Management
Per GB of data	Log Management, Observability Pipelines, Sensitive Data Scanner, Cloud SIEM
Per test run/function/session	Synthetic Monitoring, Continuous Testing, Serverless Monitoring, Session Replay
Per user	Incident Management, CI Visibility
By spend	Sensitive Data Scanner

Categorizing Datadog's services by how they are priced

Most of the services with "per host" pricing include an allotted number of containers and metrics that can be monitored and queried. This matters because exceeding the container or metric allotment for your plan can result in overages.

Committed Use Discounts

There is an easy way to get an immediate discount before diving into Datadog agent or infrastructure configuration changes. Datadog offers monthly minimum usage commitments for at least the following services:

Infrastructure
Log Management
APM
Database Monitoring
Cloud Security Management

With these commitments you can realize 20-50% savings compared to variable usage plans. Note that variable usage plans are still billed annually, but a minimum commitment results in greater savings. Datadog does not publicly share all of its minimum-commitment rates, but we have found some examples.

For example, Datadog's pricing for container monitoring states the following:

Additional containers will be billed at $0.002 per container per hour. In addition, you can purchase prepaid containers at $1 per container per month.

In a 31-day month (744 hours), the on-demand cost of a container is 744 × $0.002 = $1.488, whereas a prepaid container costs $1.00, a 32.8% discount. Across many containers, this adds up to substantial savings.
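The arithmetic above can be checked with a few lines of Python. The two rates come from the quoted container pricing; the break-even calculation is our own addition, a minimal sketch assuming a prepaid container is only worth buying if that container would otherwise have accrued more than $1.00 of hourly charges in the month.

```python
# Rates quoted from Datadog's container monitoring pricing.
ON_DEMAND_PER_HOUR = 0.002  # USD per container per hour
PREPAID_PER_MONTH = 1.00    # USD per prepaid container per month


def on_demand_monthly_cost(hours: float) -> float:
    """Cost of one container billed hourly for `hours` hours."""
    return ON_DEMAND_PER_HOUR * hours


def prepaid_discount(hours: float) -> float:
    """Fractional saving of a prepaid container versus paying on demand."""
    on_demand = on_demand_monthly_cost(hours)
    return (on_demand - PREPAID_PER_MONTH) / on_demand


def break_even_hours() -> float:
    """Runtime (hours per month) above which prepaying beats on-demand."""
    return PREPAID_PER_MONTH / ON_DEMAND_PER_HOUR


hours = 24 * 31  # a 31-day month has 744 hours
print(f"on-demand:  ${on_demand_monthly_cost(hours):.3f}")  # $1.488
print(f"discount:   {prepaid_discount(hours):.1%}")         # 32.8%
print(f"break-even: {break_even_hours():.0f} hours")        # 500 hours
```

Under this simplified model, a prepaid container that would run for fewer than 500 of the month's 744 hours costs more than paying hourly, so commitments are best sized to the containers that run steadily.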

To make these commitments, you need to understand your level of usage of each Datadog service. Creating a Datadog Cost Report gives you a baseline to level-set the conversation with your account manager. Additionally, Vantage is adding automated cost recommendations to help advise you on these potential savings.
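One way to turn usage history into a commitment level is to price every candidate commitment against past usage and pick the cheapest. The sketch below is a hypothetical, simplified model, not Datadog's billing logic: it assumes a commitment of N prepaid containers covers up to N concurrent containers in every hour, bills anything above N at the quoted $0.002 per container-hour, and ignores any containers already included with per-host plans.

```python
from typing import Sequence

# Hypothetical simplified billing model (see lead-in for assumptions).
ON_DEMAND_PER_HOUR = 0.002  # USD per container per hour
PREPAID_PER_MONTH = 1.00    # USD per prepaid container per month


def monthly_cost(hourly_counts: Sequence[int], committed: int) -> float:
    """Modeled monthly cost given hourly container counts and a commitment."""
    overage_container_hours = sum(max(0, c - committed) for c in hourly_counts)
    return committed * PREPAID_PER_MONTH + overage_container_hours * ON_DEMAND_PER_HOUR


def best_commitment(hourly_counts: Sequence[int]) -> int:
    """Commitment level between 0 and peak usage with the lowest modeled cost."""
    return min(range(max(hourly_counts) + 1),
               key=lambda n: monthly_cost(hourly_counts, n))


# Example history: 10 containers all month, 15 during 300 of the 744 hours.
usage = [10] * 444 + [15] * 300
n = best_commitment(usage)
print(n, round(monthly_cost(usage, n), 2))  # 10 13.0
```

In this model, each additional prepaid container pays off only while more than 500 hours of the month ($1.00 / $0.002 per hour) exceed the current commitment, which is why the 5 bursty containers stay on demand.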

Disable Datadog Containerized Agent Logs

Did you know that, when collecting Kubernetes or Docker logs in a default configuration, the Datadog Agent also collects the logs it writes about its own operation, and that you are ultimately billed for ingesting them? Most people don't.

When first configuring Datadog to collect logs, make sure the Datadog Agent's own logs are excluded from ingestion. Datadog calls this out in its example Docker command-line instructions and provides a similar setting for Kubernetes deployments that don't use Docker. Excluding these logs at ingestion is good, but ideally you should turn them off at the Agent level to cut down on transit costs as well.

In your Datadog configuration, you should see one of these two settings. For Docker:

DD_CONTAINER_EXCLUDE="name:dd-agent"

For Kubernetes DaemonSets:

- name: DD_CONTAINER_EXCLUDE_LOGS
  value: "name:datadog-agent"
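If you manage many clusters, a small check in CI can catch manifests that are missing the exclusion. The helper below is hypothetical, not a Datadog tool: it assumes the container's `env` list from a DaemonSet manifest has already been parsed into Python dicts (for example with a YAML loader), and it only does a plain string comparison on the space-separated entries rather than evaluating patterns the way the Datadog Agent would.

```python
from typing import Dict, List

# Environment variables checked for an entry excluding the Agent container.
EXCLUDE_VARS = ("DD_CONTAINER_EXCLUDE_LOGS", "DD_CONTAINER_EXCLUDE")


def agent_logs_excluded(env: List[Dict[str, str]],
                        agent_container: str = "datadog-agent") -> bool:
    """Return True if an exclusion variable lists `name:<agent_container>`.

    Plain string check on space-separated entries; no regex evaluation.
    """
    wanted = f"name:{agent_container}"
    for var in env:
        if var.get("name") in EXCLUDE_VARS:
            if wanted in var.get("value", "").split():
                return True
    return False


# Example: the DaemonSet env list from above, plus an unrelated variable.
env = [
    {"name": "DD_LOGS_ENABLED", "value": "true"},
    {"name": "DD_CONTAINER_EXCLUDE_LOGS", "value": "name:datadog-agent"},
]
print(agent_logs_excluded(env))  # True
```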

We've heard from multiple customers that adding this exclusion, where it wasn't already set, saved a substantial amount on their overall Datadog costs.

Set Ingestion Volume Controls

If you're an APM customer, each APM host includes 150 GB of ingested spans and 1 million indexed spans per month, pooled across all hosts. Overage charges can add up quickly depending on your scale. Two tools can help here: (1) Ingestion Controls and (2) Retention Filters.

Ingestion controls let you configure which traces your applications send to Datadog, so that only the most relevant ones are ingested. Once traces arrive, retention filters determine which spans are indexed and retained. Together, these two controls help you avoid APM overages.
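To decide how aggressive these controls need to be, it helps to compare projected volume with the pooled allotment. The sketch below is a hypothetical planning helper, not a Datadog API: it treats 150 GB of ingested spans and 1 million indexed spans per APM host as plan parameters (check them against your contract) and assumes ingested volume scales linearly with the fraction of traces kept.

```python
# Hypothetical plan parameters: per-APM-host allotments, pooled across hosts.
GB_PER_HOST = 150
INDEXED_SPANS_PER_HOST = 1_000_000


def max_ingestion_rate(projected_gb: float, apm_hosts: int) -> float:
    """Largest fraction of traces to keep so ingested GB fits the allotment."""
    allotment_gb = apm_hosts * GB_PER_HOST
    return 1.0 if projected_gb <= allotment_gb else allotment_gb / projected_gb


def max_index_rate(ingested_spans: int, apm_hosts: int) -> float:
    """Largest fraction of ingested spans that retention filters could index."""
    allotment = apm_hosts * INDEXED_SPANS_PER_HOST
    return 1.0 if ingested_spans <= allotment else allotment / ingested_spans


print(max_ingestion_rate(projected_gb=3000, apm_hosts=10))        # 0.5
print(max_index_rate(ingested_spans=200_000_000, apm_hosts=10))   # 0.05
```

The two functions mirror the two knobs: the first bounds what ingestion controls should let through, the second bounds how selective retention filters need to be.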

Be Selective with Custom Metrics

Custom Metrics are described in the docs like this:

If a metric is not submitted from one of the more than 600 Datadog integrations it’s considered a custom metric.

On the Pro plan you get 100 custom metrics per host, an allotment that is easy to use up. For example, a metric tagged by endpoint and HTTP response code on a host with 20 API endpoints that each report 5 different response codes produces 20 × 5 = 100 custom metrics. The number of distinct values a tag can take (here, the 5 response codes) is its "cardinality." High-cardinality tags can cause the number of custom metrics in a Datadog account to balloon, which results in overages.
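The counting rule can be made concrete: each unique combination of metric name and tag values is one custom metric. The sketch below uses a hypothetical metric name (`api.requests`) and made-up endpoint paths, and it assumes every endpoint actually reports every status code; in practice only combinations that are really submitted count.

```python
from itertools import product
from typing import Dict, Iterable, Tuple

Point = Tuple[str, Dict[str, str]]  # (metric name, tags)


def count_custom_metrics(points: Iterable[Point]) -> int:
    """Count unique metric-name + tag-set combinations among submitted points."""
    return len({(name, tuple(sorted(tags.items()))) for name, tags in points})


# Hypothetical metric tagged with 20 endpoints x 5 status codes.
endpoints = [f"/endpoint/{i}" for i in range(20)]
codes = ["200", "301", "400", "404", "500"]
points = [("api.requests", {"endpoint": e, "status_code": c})
          for e, c in product(endpoints, codes)]

print(count_custom_metrics(points))           # 100
print(count_custom_metrics(points + points))  # still 100: repeats don't add
```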

Custom metric allotments are pooled across hosts, so 10 monitored hosts give you 1,000 custom metrics in total. By configuring custom metrics only on the hosts that need them, you could, for example, allocate 400 custom metrics each to 2 hosts and share the remaining 200 among the other 8 hosts (such as build or staging servers). As you streamline your use of custom metrics across your Datadog account, remove unused tags, and when you add tags, make sure they are coarser than the existing ones. For example, adding a state tag to metrics that are already tagged by city will not count against your custom metric allotment, because each city belongs to exactly one state and no new tag combinations are created.
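The city/state example can be checked by counting distinct tag combinations before and after adding a tag. Everything here is hypothetical (metric name, cities, app versions); the sketch assumes each city value maps to exactly one state, and contrasts that with an independent tag that multiplies the count.

```python
from typing import Dict, Iterable, Tuple

Point = Tuple[str, Dict[str, str]]  # (metric name, tags)


def count_custom_metrics(points: Iterable[Point]) -> int:
    """Count unique metric-name + tag-set combinations."""
    return len({(name, tuple(sorted(tags.items()))) for name, tags in points})


# Hypothetical mapping: each city belongs to exactly one state.
CITY_TO_STATE = {"Austin": "TX", "Dallas": "TX", "Denver": "CO", "Boulder": "CO"}

base = [("checkout.latency", {"city": city}) for city in CITY_TO_STATE]
# Coarser tag derived from an existing one: no new combinations.
with_state = [(n, {**t, "state": CITY_TO_STATE[t["city"]]}) for n, t in base]
# Independent tag (3 hypothetical app versions): count multiplies.
with_version = [(n, {**t, "version": v}) for n, t in base for v in ("1", "2", "3")]

print(count_custom_metrics(base))          # 4
print(count_custom_metrics(with_state))    # 4
print(count_custom_metrics(with_version))  # 12
```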

Use an INFO Logging Level or Higher

There are a variety of application-specific settings you can adjust that affect your Datadog bill.

When it comes to enabling logging, every log entry counts and Datadog will happily ingest each one. To avoid sudden spikes in logging costs, be sure that the log level for the application is set appropriately. In Ruby on Rails, adjusting the log level can be as simple as changing the following line:

config.log_level = :warn

A log level of info would also be appropriate, but for most applications a debug level would quickly exhaust your Datadog plan's capacity. Many teams use a post-deploy hook to check the log level, but you could also set up anomaly detection on log volume in Datadog as a backstop.
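The Rails setting above is Ruby; the same mechanism exists in most logging libraries. As an illustration (not Datadog-specific), the sketch below uses Python's standard `logging` module to show that records below the logger's level are discarded before any handler, and therefore any log shipper reading that handler's output, sees them. The counting handler and the log messages are hypothetical stand-ins.

```python
import logging


class CountingHandler(logging.Handler):
    """Counts records that reach the handler, standing in for a log shipper."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def emit(self, record: logging.LogRecord) -> None:
        self.count += 1


def records_emitted(level: int) -> int:
    """Emit a fixed mix of log calls and return how many reach the handler."""
    logger = logging.getLogger(f"demo.{logging.getLevelName(level)}")
    logger.propagate = False  # keep the demo from printing via the root logger
    logger.setLevel(level)
    handler = CountingHandler()
    logger.addHandler(handler)
    for _ in range(100):
        logger.debug("cache lookup")  # chatty per-request detail
    logger.info("request served")
    logger.warning("slow upstream response")
    logger.removeHandler(handler)
    return handler.count


print(records_emitted(logging.DEBUG))    # 102
print(records_emitted(logging.INFO))     # 2
print(records_emitted(logging.WARNING))  # 1
```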

Proxy Through PrivateLink to Reduce Egress Charges

Datadog's own bill is not the only cost of using it: your infrastructure provider also charges for moving data to Datadog's services. Sending large volumes of logs through NAT Gateways, or through public internet access points on other clouds, incurs data egress fees. AWS customers can instead send this traffic through PrivateLink so that log ingestion happens over a private endpoint rather than the public internet. The data transfer savings here can exceed 80%.
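Whether PrivateLink pays off depends on volume, because an endpoint typically adds a fixed hourly charge on top of its per-GB rate. The sketch below is a hypothetical cost model with made-up placeholder rates, not quoted AWS prices; substitute current pricing for your region. It also assumes the NAT Gateway's own hourly charge is unchanged by the switch (other traffic still uses it), so only the NAT per-GB rate is counted.

```python
from dataclasses import dataclass


@dataclass
class TransferPath:
    per_gb: float          # USD per GB processed on this path (placeholder)
    per_hour: float = 0.0  # USD per hour for a dedicated endpoint, if any


def monthly_cost(path: TransferPath, gb: float, hours: float = 730.0) -> float:
    """Monthly cost of sending `gb` gigabytes over `path`."""
    return path.per_gb * gb + path.per_hour * hours


def savings(current: TransferPath, proposed: TransferPath, gb: float) -> float:
    """Fraction of the current path's monthly cost saved by switching."""
    before = monthly_cost(current, gb)
    return (before - monthly_cost(proposed, gb)) / before


nat = TransferPath(per_gb=0.05)                         # placeholder rate
privatelink = TransferPath(per_gb=0.01, per_hour=0.05)  # placeholder rates
print(f"{savings(nat, privatelink, gb=10_000):.1%}")  # 72.7%
print(f"{savings(nat, privatelink, gb=500):.1%}")     # -66.0%
```

With these illustrative numbers, the switch saves most of the transfer cost at high volume but loses money at low volume, where the fixed endpoint charge dominates.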

Conclusion: Telemetry and Total Cost Control

To get maximum use out of Datadog's platform while controlling costs, we recommend following the practices above. Have more thoughts? We'd love to hear them! To share more tips on Datadog cost savings, please join us in the Vantage Slack.