The rise of AI is driving a sharp increase in GPU demand. Companies need GPUs (graphics processing units) for machine learning (ML) tasks such as data processing and complex computations because of their processing power. This surge in demand has led to long waitlists, sometimes spanning nearly a year. The scarcity is a significant challenge, especially for smaller groups with limited purchasing power, such as startups and research organizations.
Some have resorted to creative solutions to combat the shortage. In one notable example, a startup borrowed GPUs through connections at large equipment vendors and contacts at quantitative stock trading firms. It needed only 64 GPUs for six-hour increments. The story is familiar to other companies facing similar challenges. That’s where Amazon EC2 Capacity Blocks come in, offering an alternative way to work around GPU scarcity.
Amazon EC2 Capacity Blocks for ML
The new release of EC2 Capacity Blocks for ML aims to make GPU instances more accessible. With Capacity Blocks, you can reserve P5 instances by the number of instances (up to 64) and the duration (up to 14 days). P5 instances use NVIDIA H100 Tensor Core GPUs and are colocated in Amazon EC2 UltraClusters. NVIDIA dominates the server GPU market, holding an estimated 60–70% share.
Instance | GPUs | vCPUs | Instance Memory (TiB) | GPU Memory | Network Bandwidth | GPUDirect RDMA | GPU Peer to Peer | Instance Storage (TB) | EBS Bandwidth (Gbps) |
---|---|---|---|---|---|---|---|---|---|
p5.48xlarge | 8 | 192 | 2 | 640 GB HBM3 | 3200 Gbps EFAv2 | Yes | 900 GB/s NVSwitch | 8 x 3.84 NVMe SSD | 80 |
P5 instances can be used in generative AI applications for tasks such as question answering, code generation, video and image generation, and speech recognition. P5 Capacity Blocks are well suited to training and fine-tuning ML models, prototyping, running experiments, and preparing for surges in demand for ML applications. The startup mentioned earlier needed only 64 GPUs (eight p5.48xlarge instances) for six-hour increments, which makes it a good example of a company that would benefit from Capacity Blocks.
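To make the sizing arithmetic concrete, here is a minimal Python sketch that converts a GPU requirement into a Capacity Block size. It assumes 8 GPUs per p5.48xlarge (from the table above) and the reservation sizes of 1, 2, 4, 8, 16, 32, or 64 instances described in the availability section. The function name `instances_for_gpus` is purely illustrative and not part of any AWS tooling.

```python
import math

# From the instance table: one p5.48xlarge carries 8 GPUs.
GPUS_PER_P5_INSTANCE = 8
# Reservation sizes the article lists for Capacity Blocks.
ALLOWED_INSTANCE_COUNTS = (1, 2, 4, 8, 16, 32, 64)


def instances_for_gpus(gpus_needed: int) -> int:
    """Return the smallest allowed reservation size that covers gpus_needed."""
    if gpus_needed < 1:
        raise ValueError("gpus_needed must be at least 1")
    minimum = math.ceil(gpus_needed / GPUS_PER_P5_INSTANCE)
    for count in ALLOWED_INSTANCE_COUNTS:
        if count >= minimum:
            return count
    raise ValueError("request exceeds the 64-instance limit")


# The startup's 64 GPUs map to a block of 8 instances.
print(instances_for_gpus(64))
```

Because sizes come in powers of two, a request that falls between sizes rounds up: 100 GPUs needs 13 instances, so the smallest block that fits is 16.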
Capacity Block Availability
You may need to be flexible when searching for Capacity Blocks. They are currently limited to P5 instances in the AWS US East (Ohio) Region, and they can be reserved no more than 8 weeks in advance.
To reserve Capacity Blocks, go to the “Capacity Reservations” section of the EC2 console or use the AWS CLI. When searching for available Capacity Blocks, you’ll need to specify:
- Number of Instances: 1, 2, 4, 8, 16, 32, or 64 instances.
- Duration: 1-14 days in one-day increments.
- Date Range: Earliest start and latest end dates.
Once you enter your specifications, you’ll see a list of available reservations that meet your criteria. The more flexible you are on the number of instances, duration, and date range, the more options the search returns.
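The search criteria above can also be checked in code before a request is sent. The following Python sketch validates the three inputs against the limits described in this article and builds a parameter dictionary. The helper `build_offering_search` is hypothetical, and the key names follow what I understand to be the EC2 `DescribeCapacityBlockOfferings` request shape, so verify them against the current API reference. Treating the 8-week limit as a cap on the start date is also an assumption.

```python
from datetime import date, datetime, time, timedelta, timezone

ALLOWED_INSTANCE_COUNTS = {1, 2, 4, 8, 16, 32, 64}
MAX_DURATION_DAYS = 14
MAX_LEAD_TIME = timedelta(weeks=8)  # assumed to apply to the start date


def build_offering_search(instance_count, duration_days,
                          earliest_start, latest_end, today=None):
    """Validate search inputs and return request parameters as a dict."""
    today = today or date.today()
    if instance_count not in ALLOWED_INSTANCE_COUNTS:
        raise ValueError("instance_count must be 1, 2, 4, 8, 16, 32, or 64")
    if not 1 <= duration_days <= MAX_DURATION_DAYS:
        raise ValueError("duration_days must be between 1 and 14")
    if earliest_start < today:
        raise ValueError("earliest_start is in the past")
    if earliest_start > today + MAX_LEAD_TIME:
        raise ValueError("earliest_start is more than 8 weeks away")
    if latest_end < earliest_start + timedelta(days=duration_days):
        raise ValueError("date range is shorter than the requested duration")

    def midnight_utc(day):
        return datetime.combine(day, time.min, tzinfo=timezone.utc)

    return {
        "InstanceType": "p5.48xlarge",
        "InstanceCount": instance_count,
        "CapacityDurationHours": duration_days * 24,
        "StartDateRange": midnight_utc(earliest_start),
        "EndDateRange": midnight_utc(latest_end),
    }


# Example with fixed dates so the result does not depend on the current day.
params = build_offering_search(
    instance_count=8, duration_days=2,
    earliest_start=date(2024, 1, 10), latest_end=date(2024, 1, 20),
    today=date(2024, 1, 1),
)
print(params["CapacityDurationHours"])
```

If the key names match the API, a dictionary like this could be passed to boto3 as `ec2.describe_capacity_block_offerings(**params)`. Widening the date range or trying a neighboring instance count is the programmatic version of the flexibility advice above.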
Pricing of EC2 Capacity Blocks for ML
The price of a Capacity Block is dynamic and depends on supply and demand. Capacity Blocks are a good fit if you don’t need long-term instances, because you pay only for the period you reserve. They’re also a more predictable option, since you pay up front for your reservation. In the console or CLI, the returned options are listed in ascending order of price.
The pricing page doesn’t state a specific cost range. However, Jake Siddall, a technical senior product manager at AWS, discussed pricing in depth during an episode of “Under the Hood with AWS Compute” with Lorenzo Winfrey. Siddall explained that prices vary slightly above or below P5 On-Demand rates, with controls in place to prevent significant surges.
Another cost associated with Capacity Blocks is the operating system fee, which is charged while your instances are running. Linux and Ubuntu Pro are billed per second, while Red Hat Enterprise Linux, RHEL with HA, and SUSE Linux Enterprise Server are billed at a flat hourly rate.
Instance | Linux | Red Hat Enterprise Linux (RHEL) | RHEL with HA | SLES | Ubuntu Pro |
---|---|---|---|---|---|
p5.48xlarge | $0.000 USD | $0.130 USD | $0.165 USD | $0.125 USD | $0.336 USD |
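As a rough illustration of how these charges combine, the Python sketch below estimates the upfront reservation fee and the operating system fee separately. It rests on several assumptions: the table values are read as USD per instance-hour, hourly-billed operating systems are assumed to round partial hours up, and the reservation rate of 100.00 USD per instance-hour is a placeholder, since the article gives no actual Capacity Block rate. The function names are illustrative only.

```python
from decimal import Decimal

# OS fees from the table above, read here as USD per instance-hour (assumption).
OS_HOURLY_FEE_USD = {
    "Linux": Decimal("0.000"),
    "RHEL": Decimal("0.130"),
    "RHEL with HA": Decimal("0.165"),
    "SLES": Decimal("0.125"),
    "Ubuntu Pro": Decimal("0.336"),
}
# Per the article, these are billed per second; the rest are billed hourly.
PER_SECOND_BILLED = {"Linux", "Ubuntu Pro"}


def reservation_fee_usd(hourly_rate_per_instance, instance_count, duration_days):
    """Upfront fee: the rate applies to every reserved hour, used or not."""
    return hourly_rate_per_instance * instance_count * duration_days * 24


def os_fee_usd(operating_system, instance_count, running_seconds):
    """OS fee for the time instances actually run inside the block."""
    rate = OS_HOURLY_FEE_USD[operating_system]
    if operating_system in PER_SECOND_BILLED:
        billable_hours = Decimal(running_seconds) / Decimal(3600)
    else:
        # Assumption: a partial hour is billed as a full hour.
        billable_hours = Decimal(-(-running_seconds // 3600))
    return rate * billable_hours * instance_count


# Placeholder rate for illustration only; real rates vary with supply and demand.
HYPOTHETICAL_RATE = Decimal("100.00")
print(reservation_fee_usd(HYPOTHETICAL_RATE, instance_count=8, duration_days=1))
print(os_fee_usd("RHEL", instance_count=8, running_seconds=90 * 60))
```

Keeping the two fees separate mirrors the billing model described above: the reservation is paid up front for the whole block, while the operating system fee accrues only while instances run.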
Conclusion
Greater short-term availability, at a cost in line with On-Demand P5 instances, makes GPUs more accessible. Capacity Blocks are particularly useful for short-term workflows, letting users run powerful GPUs without long-term commitments. This accessibility makes it easier for companies to integrate AI into their projects and workflows, fostering adaptability and innovation in their respective fields.