by Emily Dunenfeld
Nikita Shamgunov, CEO of Neon (a serverless Postgres offering), recently shared his view on Elastic Block Store (EBS) as a storage mechanism:
“People should not use EBS when building cloud native infrastructure services. Too expensive” pic.twitter.com/RhXxcfxVck (Nikita Shamgunov, @nikitabase, September 8, 2023)
This prompted us to do a deep dive into EBS vs. Non-Volatile Memory Express (NVMe) SSDs as storage for cloud-native services hosted on Elastic Compute Cloud (EC2) instances. We found that NVMe SSDs offer large cost and performance benefits but come with a major drawback: NVMe instance storage is ephemeral, meaning it retains data only while its hosting instance is running. Taking advantage of the cost savings requires some architectural changes; namely, you need to replicate data across your instance store volumes. In this article, we compare NVMe to EBS and show how replication can bridge the gap, giving you the best of both worlds.
Note: In this post, “NVMe” refers to local storage physically attached to the instance. EBS volumes mounted on an instance also use the NVMe protocol for I/O, but they are network-attached storage rather than local SSDs.
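Because both kinds of device show up as NVMe controllers on Nitro-based instances, the controller's model string is a practical way to tell them apart. The sketch below assumes the model strings reported on current instance types ("Amazon Elastic Block Store" for EBS, "Amazon EC2 NVMe Instance Storage" for instance store) and the Linux sysfs path `/sys/class/nvme/<controller>/model`; treat both as assumptions to verify on your own instances.

```python
from pathlib import Path

# Assumed model strings as reported by Nitro-based EC2 instances; verify on your instance.
EBS_MODEL = "Amazon Elastic Block Store"
INSTANCE_STORE_MODEL = "Amazon EC2 NVMe Instance Storage"


def classify_nvme_model(model: str) -> str:
    """Map an NVMe controller model string to 'ebs', 'instance-store', or 'unknown'."""
    model = model.strip()  # sysfs values may carry trailing padding and a newline
    if model == EBS_MODEL:
        return "ebs"
    if model == INSTANCE_STORE_MODEL:
        return "instance-store"
    return "unknown"


def list_nvme_controllers(sys_class_nvme: str = "/sys/class/nvme") -> dict:
    """Return {controller_name: kind} for every NVMe controller found in sysfs."""
    root = Path(sys_class_nvme)
    if not root.is_dir():
        return {}  # not Linux, or no NVMe devices present
    return {
        ctrl.name: classify_nvme_model((ctrl / "model").read_text())
        for ctrl in sorted(root.iterdir())
        if (ctrl / "model").is_file()
    }
```

On an instance with local NVMe, `list_nvme_controllers()` would be expected to report at least one `"instance-store"` entry alongside any attached EBS volumes.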
EBS is a block storage service in the AWS cloud that can be attached to your EC2 instances to provide additional storage. EBS provides persistent storage: data is retained when the EC2 instance is stopped or restarted, and a volume can outlive the instance after termination unless it is set to be deleted on termination (the default for root volumes). This keeps your data highly durable and available when you need it.
It’s worth briefly mentioning that there are four main types of EBS volumes, each with different pricing and use cases. Two are SSD-backed and intended for transactional workloads, and two are HDD-backed and intended for large, sequential, throughput-intensive workloads.
Types of EBS volumes and their use cases.
NVMe is a protocol designed to optimize performance and minimize latency in storage devices, particularly SSDs. In the context of Amazon EC2, certain instance types provide local NVMe SSD storage as instance store volumes. Instance store volumes are temporary, locally attached storage volumes that are physically connected to the host machine.
Because it is directly connected to the host computer, NVMe instance storage offers high IOPS and high-speed, low-latency access to data. However, NVMe instance store volumes are ephemeral: stored data is not persistent and is lost if the EC2 instance is stopped, hibernated, or terminated, or if the underlying disk fails. Therefore, NVMe volumes are typically best suited for temporary storage needs, such as caching, data processing, or high-performance computing tasks.
That said, with the right strategies in place, NVMe SSDs can also be used for persistent storage within cloud-native services. This lets cloud-native services benefit from the high-speed, low-latency characteristics of NVMe SSDs, making them suitable for storing performance-critical data such as databases and cache stores.
In short, NVMe SSDs win in many ways but still lack the durability EBS provides. Later in this post, we go over how replication can add durability to NVMe.
It’s important to understand which types of software can benefit from NVMe’s advantages, since it takes extra specialized engineering to implement data replication as described below. ClickHouse Cloud “uses S3 with a write-through cache on local SSDs” to take advantage of the performance and cost savings. Percona, whose founder Peter Zaitsev also chimed in on this topic, uses an instance-store-to-instance-store replication strategy to offer a high-performance MySQL variant.
Note that these are both scale-out databases available as cloud-native services. Other services that can benefit from NVMe storage include message queues, event streaming buses, caches, and search indexes. These are all highly technical services. If you are instead building an application, EBS is likely still the right choice and does not require designing a replication strategy.
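A write-through cache, the pattern ClickHouse Cloud describes, sends every write to durable storage as well as to the local SSD, so losing the local copy costs only cache warmth, not data. The sketch below is a hypothetical in-memory illustration of that idea, not ClickHouse’s implementation: one dict stands in for S3 and another for the NVMe instance store.

```python
class WriteThroughStore:
    """Toy write-through cache: durable store (e.g. S3) plus a local cache (e.g. NVMe)."""

    def __init__(self, durable: dict):
        self.durable = durable  # stand-in for S3: survives instance loss
        self.local = {}         # stand-in for the NVMe instance store: ephemeral

    def put(self, key, value):
        # Write to durable storage first, so an acknowledged write is never only local.
        self.durable[key] = value
        self.local[key] = value

    def get(self, key):
        if key in self.local:
            return self.local[key]  # fast path: served from local NVMe
        value = self.durable[key]   # slow path; raises KeyError if the key was never written
        self.local[key] = value     # repopulate the cache for later reads
        return value

    def lose_local(self):
        # Simulate the instance being stopped or terminated: instance store contents vanish.
        self.local.clear()
```

After `lose_local()`, a `get()` still succeeds by reading from the durable copy and rewarming the local cache, which is the property that lets ephemeral NVMe sit in front of durable storage.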
NVMe can provide significant savings compared to EBS. Let’s go over an example. Say we have a large 50 TB database. To pick an instance type, consider the use case, performance requirements, and cost.
We also need to pick an EBS volume type based on several factors. In this example, our database has a random (non-sequential) access pattern and will be accessed frequently, so we can choose between General Purpose SSD and Provisioned IOPS SSD. Provisioned IOPS offers better performance at a higher price, so let’s focus on General Purpose SSD, specifically gp3, for now.
The pricing is as follows (for a 1-year Compute Savings Plan with no upfront payment in the US East (N. Virginia) region).
On top of faster performance, the NVMe configuration is also much cheaper than the EBS configuration with the same number of vCPUs, memory capacity, and network performance! Still, there is the problem of potentially lost data. That’s where replication comes into play.
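The comparison above is simple arithmetic, sketched here as a hypothetical helper rather than the calculation behind this post’s figures. All prices are inputs you supply from the AWS pricing pages; the 730-hours-per-month convention is an assumption, and gp3 charges for IOPS or throughput above the baseline are not modeled. The NVMe side accepts an instance count and an S3 copy so the cost of replication can be folded in.

```python
# Assumption: 730 hours per month, a common convention for monthly cloud estimates.
HOURS_PER_MONTH = 730


def monthly_cost_ebs(instance_hourly_usd: float, storage_gb: float,
                     gp3_usd_per_gb_month: float) -> float:
    """Instance without local storage plus gp3 capacity sized to hold the data.

    Only gp3 storage capacity is counted; extra provisioned IOPS/throughput is ignored.
    """
    return instance_hourly_usd * HOURS_PER_MONTH + storage_gb * gp3_usd_per_gb_month


def monthly_cost_nvme(instance_hourly_usd: float, instance_count: int = 1,
                      s3_gb: float = 0.0, s3_usd_per_gb_month: float = 0.0) -> float:
    """Instances whose hourly price already includes local NVMe storage,
    optionally plus an S3 copy of the data kept for durability."""
    compute = instance_hourly_usd * HOURS_PER_MONTH * instance_count
    return compute + s3_gb * s3_usd_per_gb_month


# Hypothetical inputs: 50 TB taken as 51,200 GB; instance rates are placeholders, not quotes.
data_gb = 50 * 1024
ebs_total = monthly_cost_ebs(instance_hourly_usd=2.00, storage_gb=data_gb,
                             gp3_usd_per_gb_month=0.08)
nvme_total = monthly_cost_nvme(instance_hourly_usd=3.00)
```

Keeping prices as parameters makes it easy to plug in current rates for the specific instance types you are comparing, and to see how adding replicas or an S3 copy changes the NVMe total.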
Even though NVMe provides many speed and cost advantages, its usefulness is limited because its storage is ephemeral. We can broaden its use cases and improve durability by using replication. Replication is the process of creating and maintaining copies of data to ensure data availability. With replication, even if your EC2 instance fails, a copy of your data still exists.
Two things to consider are how often to replicate your data and what replication will cost. The replication frequency depends on your specific use case and data change rate. If your data changes frequently, consider shorter replication intervals to minimize potential data loss. Critical data with a low tolerance for data loss may require near-real-time replication.
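The link between change rate, replication interval, and potential loss can be made concrete. This is a simplified model, assuming a constant write rate and replication runs that complete instantly; in practice a run takes time, which adds to the exposure window. The function names are hypothetical.

```python
import math


def worst_case_loss_mb(change_rate_mb_per_s: float, interval_s: float) -> float:
    """Data written since the last completed replication run.

    If the instance is lost just before the next run, everything written
    during the interval existed only on the ephemeral NVMe volume.
    """
    return change_rate_mb_per_s * interval_s


def max_replication_interval_s(change_rate_mb_per_s: float,
                               tolerable_loss_mb: float) -> float:
    """Longest interval that keeps the worst-case loss within tolerance."""
    if change_rate_mb_per_s <= 0:
        return math.inf  # nothing changes, so any interval meets the tolerance
    return tolerable_loss_mb / change_rate_mb_per_s
```

For example, at 5 MB/s of writes and a tolerance of 300 MB, replication would need to run at least every 60 seconds; as the tolerance approaches zero, so does the interval, which is why low-tolerance data calls for near-real-time replication.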
One method of replicating NVMe data is with S3 (Simple Storage Service). S3 is a scalable, highly durable AWS object storage service that can store huge amounts of data in buckets. To replicate NVMe data using S3, choose a storage class based on your use case, access schedule, and price. In the previous example, the S3 copy will likely be accessed infrequently and from one Availability Zone, so Amazon S3 One Zone-IA is a good fit, with the trade-off that it stores data in a single Availability Zone and does not survive the loss of that zone. For an excellent article on how ClickHouse can be set up to fall back to S3, see this post from DoubleCloud.
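A periodic copy from NVMe to S3 can be sketched with boto3’s `upload_file`, passing the storage class through `ExtraArgs`. This is a hypothetical full-directory copy, not the DoubleCloud or ClickHouse mechanism; the client is passed in so the function can be exercised without AWS credentials, and the directory, bucket, and prefix in the usage line are placeholders.

```python
from pathlib import Path


def replicate_dir_to_s3(local_dir, bucket, prefix, s3_client, storage_class="ONEZONE_IA"):
    """Upload every file under local_dir to s3://bucket/prefix/<relative path>.

    s3_client is expected to be a boto3 S3 client (or any object with a
    compatible upload_file method). Returns the object keys written, in order.
    """
    root = Path(local_dir)
    clean_prefix = prefix.strip("/")
    keys = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        relative = path.relative_to(root).as_posix()
        key = f"{clean_prefix}/{relative}" if clean_prefix else relative
        s3_client.upload_file(
            str(path), bucket, key,
            ExtraArgs={"StorageClass": storage_class},  # One Zone-IA by default
        )
        keys.append(key)
    return keys
```

Hypothetical usage: `replicate_dir_to_s3("/mnt/nvme/data", "example-replica-bucket", "db-node-1", boto3.client("s3"))`. Copying files that a database is actively writing can produce an inconsistent copy, so real systems replicate from a consistent point, such as immutable data files or a snapshot.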
For a more exotic replication strategy, see this post from Percona, in which data is copied directly from instance to instance.
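The general idea of instance-to-instance replication can be sketched in a few lines: each write is applied on the primary and forwarded to replicas on other instances, and if the primary’s instance store is lost, a surviving replica takes over. This is a hypothetical in-memory model, not Percona’s implementation; node names, `min_acks`, and the failover rule are illustrative assumptions.

```python
class InstanceStoreNode:
    """Hypothetical node whose data lives only on local NVMe (modeled as a dict)."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.alive = True

    def apply(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is unavailable")
        self.data[key] = value

    def terminate(self):
        # Instance store contents do not survive a stop or termination.
        self.alive = False
        self.data.clear()


def replicated_write(primary, replicas, key, value, min_acks=1):
    """Apply a write on the primary, forward it to replicas, and require min_acks confirmations."""
    primary.apply(key, value)
    acks = 0
    for replica in replicas:
        try:
            replica.apply(key, value)
            acks += 1
        except ConnectionError:
            continue  # unreachable replica; it would need to catch up later
    if acks < min_acks:
        # Note: the primary already holds the write; a real system must reconcile this.
        raise RuntimeError(f"only {acks} replica(s) acknowledged {key!r}")
    return acks


def fail_over(replicas):
    """Promote the first replica that is still alive."""
    for replica in replicas:
        if replica.alive:
            return replica
    raise RuntimeError("no surviving replica")
```

Requiring at least one replica acknowledgment before reporting success means a single instance loss cannot take the only copy of an acknowledged write with it.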
In conclusion, NVMe storage offers remarkable speed and cost advantages but lacks data persistence. Replication bridges this gap by ensuring data availability and durability. With replication in place, you can harness the power of NVMe while still safeguarding your data against unexpected failures, making NVMe a valuable addition to your cloud-native services hosted on EC2 instances.