EBS vs NVMe: Don’t Use EBS for Cloud Native Services

by Emily Dunenfeld


Nikita Shamgunov, CEO of Neon (a serverless Postgres offering), recently opined on the place of Elastic Block Store (EBS) as a storage mechanism.

This encouraged us to take a deep dive into EBS vs. Non-Volatile Memory Express (NVMe) SSDs as storage for cloud-native services hosted on Elastic Compute Cloud (EC2) instances. We found that NVMe SSDs have large cost and performance benefits but come with a major drawback: NVMe instance storage is ephemeral, meaning it retains data only while its host instance is running. To take advantage of the cost savings, you need to make some architectural changes. Namely, you need to replicate the data on your instance store volumes, either to instance store volumes on other instances or to durable storage. In this article, we’ll compare NVMe to EBS and explain how replication can bridge the gap, offering you the best of both worlds.

Note: In this post, NVMe refers to local SSD storage that is physically attached to the instance. EBS volumes attached to Nitro-based instances also use the NVMe protocol for I/O, but they are network-attached storage, not local SSDs.
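Because both kinds of devices speak NVMe, they can look alike from inside the instance. On a Linux host, each NVMe controller appears under /sys/class/nvme with a model string, and that string tells them apart. The minimal sketch below assumes the model strings AWS reports for these devices ("Amazon EC2 NVMe Instance Storage" for instance store, "Amazon Elastic Block Store" for EBS); the helper names are hypothetical.

```python
import os
from typing import Dict

# Assumed model strings, as reported by AWS for NVMe devices on EC2.
INSTANCE_STORE_MODEL = "Amazon EC2 NVMe Instance Storage"
EBS_MODEL = "Amazon Elastic Block Store"


def read_nvme_models(sysfs_root: str = "/sys/class/nvme") -> Dict[str, str]:
    """Map each NVMe controller (e.g. 'nvme0') to its model string."""
    models: Dict[str, str] = {}
    if not os.path.isdir(sysfs_root):
        return models  # not Linux, or no NVMe controllers present
    for ctrl in sorted(os.listdir(sysfs_root)):
        model_path = os.path.join(sysfs_root, ctrl, "model")
        try:
            with open(model_path) as f:
                models[ctrl] = f.read().strip()  # sysfs pads the value with spaces
        except OSError:
            continue  # skip entries without a readable model file
    return models


def classify(models: Dict[str, str]) -> Dict[str, str]:
    """Label each controller as local instance store, network-attached EBS, or other."""
    labels: Dict[str, str] = {}
    for ctrl, model in models.items():
        if model.startswith(INSTANCE_STORE_MODEL):
            labels[ctrl] = "instance-store"
        elif model.startswith(EBS_MODEL):
            labels[ctrl] = "ebs"
        else:
            labels[ctrl] = "other"
    return labels


if __name__ == "__main__":
    for ctrl, kind in classify(read_nvme_models()).items():
        print(ctrl, kind)
```

On a machine without /sys/class/nvme the script simply prints nothing, so it is safe to run anywhere.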

What Is EBS?

EBS is a block storage service in the AWS cloud that can be attached to your EC2 instances to provide additional storage. EBS provides persistent storage: data is retained when the EC2 instance is stopped or restarted, and a volume can outlive the instance’s termination unless its DeleteOnTermination attribute is set (which is the default for root volumes). This keeps your data highly durable and available when you need it.
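Whether an EBS volume survives termination is controlled per volume by DeleteOnTermination. As a minimal sketch, the hypothetical helper below builds a block device mapping for a gp3 data volume in the shape accepted by the BlockDeviceMappings parameter of the EC2 RunInstances API (for example, boto3’s run_instances). The device name and size used with it are illustrative assumptions.

```python
from typing import Any, Dict, List


def data_volume_mapping(device_name: str, size_gib: int,
                        volume_type: str = "gp3",
                        keep_after_termination: bool = True) -> List[Dict[str, Any]]:
    """Build a BlockDeviceMappings list with one EBS data volume.

    With DeleteOnTermination=False, the volume (and its data) is kept
    after the instance is terminated.
    """
    return [{
        "DeviceName": device_name,
        "Ebs": {
            "VolumeSize": size_gib,          # size in GiB
            "VolumeType": volume_type,       # e.g. "gp3", "io2", "st1"
            "DeleteOnTermination": not keep_after_termination,
        },
    }]
```

You would pass the result as BlockDeviceMappings=data_volume_mapping("/dev/sdf", 500) together with your other run_instances arguments; the helper itself only builds the dictionary and makes no AWS calls.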

It’s worth briefly mentioning that there are four main categories of EBS volumes, each with different pricing and use cases. Two are SSD-backed and intended for transactional workloads; two are HDD-backed and intended for large, sequential, throughput-intensive workloads.

Volume Type | Recommended Use
General Purpose SSD (gp3 and gp2) | Virtual desktops, moderately sized individual databases, latency-sensitive interactive applications, boot volumes, and development and test environments
Provisioned IOPS SSD (io2 and io1) | The largest, most demanding Input/Output Operations Per Second (IOPS) workloads, such as mission-critical NoSQL and relational databases
Throughput Optimized HDD (st1) | Large, sequential, frequently accessed, throughput-intensive workloads such as big data, data warehouses, and log processing
Cold HDD (sc1) | Throughput-oriented workloads whose data is accessed less frequently

Types of EBS volumes and their use cases.

What are NVMe SSDs?

NVMe is a protocol designed to maximize performance and minimize latency for storage devices, particularly SSDs. On Amazon EC2, certain instance types offer local NVMe SSDs as instance store volumes. Instance store volumes are temporary storage volumes that are physically attached to the host machine.

Because NVMe instance storage is directly attached to the host computer, it offers high IOPS and high-speed, low-latency data access. However, NVMe instance store volumes are ephemeral: data survives a reboot but is lost if the EC2 instance is stopped, hibernated, or terminated, or if the underlying disk fails. Therefore, NVMe volumes are typically best suited for temporary storage needs, such as caching, data processing, or high-performance computing tasks.

However, with the right strategies in place, NVMe SSDs can also serve as persistent storage for cloud-native services. This lets those services benefit from the high speed and low latency of NVMe SSDs, making them suitable for performance-critical data such as databases and cache stores.

EBS vs NVMe Feature Comparison

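Feature | EBS | NVMe instance store
Replication within the Availability Zone | Automatic (volume data is replicated across multiple servers) | None
Bandwidth | Varies by volume type; even the fastest EBS volumes typically have lower bandwidth than local NVMe SSDs | High
Encryption | Yes | Yes
IOPS | Varies by volume type | High
Latency | Higher, because I/O travels over the network | Lower, because of the direct host connection
Persistence | Independent of the instance lifecycle | Lost when the instance stops, hibernates, or terminates
Scalability | Yes; volumes can be resized | Capacity is fixed per instance type; scale by adding instances or choosing larger instance types
Snapshots | Supported, at additional cost | Not supported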

As the table shows, NVMe SSDs win in many respects but lack the durability that EBS provides. Later in this article, we’ll cover how replication can add durability to NVMe.

Cloud Native Services

It’s important to understand which types of software can benefit from NVMe’s advantages, because exploiting them requires specialized engineering to implement data replication, as described below. ClickHouse Cloud “uses S3 with a write-through cache on local SSDs” to take advantage of the performance and cost savings. Percona, whose founder Peter Zaitsev also chimed in on this topic, uses an instance-store-to-instance-store replication strategy to offer a high-performance MySQL variant.
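To make the write-through idea concrete, here is a minimal sketch of the general pattern, not ClickHouse’s actual implementation. Plain dictionaries stand in for S3 (durable) and for the local NVMe instance store (fast but ephemeral); the class name is hypothetical.

```python
from typing import Dict, Optional


class WriteThroughCache:
    """Every write goes to the durable store first, then to the local cache.
    Reads try the local cache and fall back to the durable store."""

    def __init__(self, durable_store: Dict[str, bytes], local_cache: Dict[str, bytes]):
        self.durable = durable_store  # stand-in for S3 (survives instance loss)
        self.local = local_cache      # stand-in for an NVMe instance store volume

    def put(self, key: str, value: bytes) -> None:
        self.durable[key] = value     # durable copy first, so losing the instance loses nothing
        self.local[key] = value       # then populate the fast local copy

    def get(self, key: str) -> Optional[bytes]:
        if key in self.local:
            return self.local[key]    # fast path: served from local NVMe
        value = self.durable.get(key) # slow path: fetch from S3
        if value is not None:
            self.local[key] = value   # re-warm the local cache after a miss
        return value
```

If the instance (and its local cache) is lost, a replacement node created with an empty local cache still reads every value from the durable store and gradually re-warms its NVMe copy.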

Note that these are both scale-out databases available as cloud-native services. Other services that can benefit from NVMe instance storage include message queues, event streaming buses, caches, and search indexes. These are all highly technical services. If you are instead building an application, EBS is likely still the right choice, and it does not require designing a replication strategy.

EBS vs NVMe Price Comparison

NVMe can provide significant savings compared to EBS. Let’s go over an example. Say we have a large, 50 TB database. To pick an instance type, consider the use case, performance requirements, and cost.

We also need to pick an EBS volume type based on several factors. In this example, our database has a random (non-sequential) access pattern and will be accessed frequently, so we can choose between General Purpose SSD and Provisioned IOPS SSD. Provisioned IOPS offers higher performance but also costs more, so let’s focus on General Purpose SSD, specifically gp3.

The pricing is as follows (for a 1-year Compute Savings Plan with no upfront payment in the US East (N. Virginia) region):

Metric | EBS | NVMe
Instance Type | r4.16xlarge | i3.16xlarge
vCPUs | 64 | 64
Memory | 488 GiB | 488 GiB
Network Performance | 20 Gigabit | 20 Gigabit
Total Monthly Cost | $6,352.58 | $2,876.93

On top of its faster performance, the NVMe configuration costs about 55% less per month than the EBS configuration, with the same number of vCPUs, memory capacity, and network performance. Still, there is the problem of potential data loss. That’s where replication comes into play.
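The savings figure follows directly from the two monthly totals in the pricing table; this short calculation uses only those numbers.

```python
EBS_MONTHLY = 6352.58   # EBS configuration (r4.16xlarge with gp3), from the pricing table
NVME_MONTHLY = 2876.93  # NVMe configuration (i3.16xlarge), from the pricing table

monthly_savings = EBS_MONTHLY - NVME_MONTHLY
percent_savings = 100 * monthly_savings / EBS_MONTHLY
annual_savings = 12 * monthly_savings

print(f"Monthly savings: ${monthly_savings:,.2f}")  # Monthly savings: $3,475.65
print(f"Percent savings: {percent_savings:.1f}%")   # Percent savings: 54.7%
print(f"Annual savings:  ${annual_savings:,.2f}")   # Annual savings:  $41,707.80
```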

Replicating NVMe

Even though NVMe provides many speed and cost advantages, its ephemeral storage limits where it can be used. We can broaden its use cases and improve durability with replication: the process of creating and maintaining copies of data to ensure its availability. With replication, even if one of your EC2 instances fails, a copy of your data survives elsewhere.
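As a minimal sketch of why replication restores durability, the hypothetical in-memory store below uses synchronous write-all replication: a write completes only after every node has applied it, so losing any single node loses no completed writes. Each dictionary stands in for one instance’s NVMe volume; real systems add networking, failure detection, and quorum logic that this sketch omits.

```python
from typing import Dict, List


class ReplicatedStore:
    """Key-value store replicated synchronously across several nodes."""

    def __init__(self, num_nodes: int = 3):
        # Each dict stands in for the instance store of one EC2 instance.
        self.nodes: List[Dict[str, bytes]] = [{} for _ in range(num_nodes)]

    def write(self, key: str, value: bytes) -> None:
        # Apply the write on every node before returning (write-all).
        for node in self.nodes:
            node[key] = value

    def fail_node(self, index: int) -> None:
        # Simulate a stopped or terminated instance: its local data is gone.
        self.nodes.pop(index)

    def add_node(self) -> None:
        # Replace a lost instance by seeding a new node with a full copy
        # from a surviving node.
        self.nodes.append(dict(self.nodes[0]))

    def read(self, key: str) -> bytes:
        # Every surviving node holds a full copy; read from the first one.
        return self.nodes[0][key]
```

For example, after writing a key to a three-node store and calling fail_node(0), the key is still readable from the remaining two nodes, and add_node() brings the store back to three copies.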

Two things to consider are how often to replicate your data and what replication will cost. The right replication frequency depends on your use case and on how fast your data changes. If your data changes frequently, consider shorter replication intervals to minimize potential data loss. Critical data with a low tolerance for loss may require near-real-time replication.
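The trade-off between replication frequency and data loss can be estimated with simple arithmetic, assuming a steady write rate and periodic replication: anything written since the last replication is at risk if the instance is lost. The function names and the numbers in the example are hypothetical.

```python
def worst_case_loss_mb(write_rate_mb_per_s: float, interval_s: float) -> float:
    """Data written since the last replication, lost if the instance fails."""
    return write_rate_mb_per_s * interval_s


def max_interval_s(write_rate_mb_per_s: float, loss_budget_mb: float) -> float:
    """Longest replication interval that keeps worst-case loss within budget."""
    return loss_budget_mb / write_rate_mb_per_s


# Hypothetical workload writing 5 MB/s.
print(worst_case_loss_mb(5, 15 * 60))  # 4500 (MB at risk with 15-minute replication)
print(max_interval_s(5, 300))          # 60.0 (seconds, to lose at most 300 MB)
```

As the loss budget approaches zero, the allowed interval does too, which is why data with a very low tolerance for loss pushes you toward near-real-time replication.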

One method of replicating NVMe data is to copy it to Amazon S3 (Simple Storage Service), a scalable, highly durable object storage service that can store huge amounts of data in buckets. To replicate NVMe data to S3, choose a storage class based on your use case, access pattern, and price. In the previous example, the replicated data will likely be accessed infrequently and from a single Availability Zone, so Amazon S3 One Zone-IA may be the most cost-effective option. Keep in mind that One Zone-IA stores data in only one Availability Zone, so it does not protect against the loss of that zone. For an excellent article on how ClickHouse can be configured to fall back to S3, see this post from DoubleCloud.
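A minimal sketch of a periodic copy from an instance store volume to S3 One Zone-IA might look like the following. It assumes a boto3 S3 client, whose upload_file method accepts ExtraArgs={"StorageClass": "ONEZONE_IA"}; the function names, mount path, bucket, and prefix are hypothetical. Copying live database files this way can capture them mid-write, so a real system would copy consistent snapshots or use the database’s own backup tooling.

```python
import os
from typing import Iterable, List


def list_files(local_root: str) -> List[str]:
    """Relative paths of every file under local_root (e.g. an NVMe mount)."""
    out = []
    for dirpath, _dirs, files in os.walk(local_root):
        for name in files:
            out.append(os.path.relpath(os.path.join(dirpath, name), local_root))
    return sorted(out)


def replicate_to_s3(s3_client, local_root: str, rel_paths: Iterable[str],
                    bucket: str, prefix: str) -> List[str]:
    """Upload files to S3 using the One Zone-IA storage class.

    s3_client is expected to be a boto3 S3 client (or anything with a
    compatible upload_file method). Returns the S3 keys written.
    """
    keys = []
    for rel in rel_paths:
        # S3 keys always use forward slashes, regardless of the local OS.
        key = f"{prefix.rstrip('/')}/{rel.replace(os.sep, '/')}"
        s3_client.upload_file(
            os.path.join(local_root, rel),  # local file on the NVMe volume
            bucket,
            key,
            ExtraArgs={"StorageClass": "ONEZONE_IA"},
        )
        keys.append(key)
    return keys
```

Usage, with hypothetical names: create a client with boto3.client("s3"), then call replicate_to_s3(client, "/mnt/nvme/data", list_files("/mnt/nvme/data"), "my-backup-bucket", "db-node-1") on whatever schedule your replication interval calls for.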

For a more exotic replication strategy, see this post from Percona, in which data is copied directly from instance to instance.

Conclusion

In conclusion, NVMe storage offers remarkable speed and cost advantages but lacks data persistence. Replication bridges this gap by ensuring data availability and durability. You can harness the power of NVMe while still safeguarding your data against unexpected failures, making it a valuable addition to your cloud-native services hosted on EC2 instances.