by Emily Dunenfeld
One of the most exciting 2024 re:Invent announcements was Amazon S3 Tables. Designed to simplify analytics on S3, S3 Tables provide a fully managed way to store structured data using Apache Iceberg. The announcement has even led to speculation that S3 is becoming a fully managed data lakehouse solution that will put Snowflake and Databricks out of business; however, the reality is more nuanced.
S3 Tables introduce a new type of S3 bucket that stores structured data in the Apache Parquet format and manages tables using the Apache Iceberg format. The goal is better performance for analytics workloads, since querying from S3 Standard storage can run into bottlenecks in the form of transaction limits or unexpected cost increases.
A little backstory: for analytics use cases relying on S3 Standard storage, users have traditionally set up table formatting manually using Hive. However, Hive lacks certain features and optimizations, making it a less-than-ideal choice. As a result, more S3 Standard users started adopting Iceberg, which was designed to solve some of Hive's problems and offers advantages including improved query performance, time travel, schema evolution, ACID transactions, and lower costs.
Still, using Iceberg with S3 Standard requires you to set up and manage tables yourself, which means engineering effort and its associated costs. S3 Tables handles that work for you, which is why some are calling it simply a managed Iceberg service.
S3 Tables come with benefits such as:

- Built-in Apache Iceberg support, with no tables to set up or manage yourself
- Automated maintenance, including compaction, snapshot management, and removal of unreferenced files
- Better query performance and higher transactions per second than self-managed Iceberg tables in general-purpose S3 buckets
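Getting started reflects how thin the setup layer is. Here's a minimal sketch of creating a table bucket, a namespace, and an Iceberg table; it assumes a recent boto3 release that includes the s3tables client, and the bucket, namespace, and table names (and region) are placeholders:

```python
import boto3

# S3 Tables has its own service client, separate from the regular "s3" client.
s3tables = boto3.client("s3tables", region_name="us-east-1")

# 1. Create a table bucket -- the new bucket type introduced for S3 Tables.
bucket_arn = s3tables.create_table_bucket(name="analytics-table-bucket")["arn"]

# 2. Create a namespace to group related tables.
s3tables.create_namespace(tableBucketARN=bucket_arn, namespace=["logs"])

# 3. Create an Iceberg table. AWS manages the Iceberg metadata and ongoing
#    maintenance (compaction, snapshot expiry) from here on.
table = s3tables.create_table(
    tableBucketARN=bucket_arn,
    namespace="logs",
    name="requests",
    format="ICEBERG",
)
print(table["tableARN"])
```

Compare that with self-managed Iceberg on S3 Standard, where you would also be wiring up a catalog and scheduling compaction jobs yourself.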
Pricing for S3 Tables is, all in all, not bad. Charges apply for storage, monitoring (billed per object), requests, and compaction as part of maintenance.
We'll review each charge and compare it to S3 Standard for additional perspective. Prices are for the US East (N. Virginia) region.
Storage is billed on a tiered plan, with discounted rates at higher volumes. Per TB stored, across all tiers, S3 Tables is about 15% more expensive than S3 Standard. Monitoring is billed at the same rates as S3 Intelligent-Tiering's monitoring charges.
Charges for API requests apply as well, at the same rates as other S3 buckets.
Finally, charges apply for compaction at the following rates. This is where observers have begun to worry about cost. However, as others have pointed out, although AWS does not share its compaction algorithm, we can make an educated assumption that AWS does not compact an object more than once.
An S3 Table has 10 TB of data stored. The average size of an object is 1 GB. There are 1 million PUT requests and 10 million GET requests that month. The approximate costs are as follows:
10 TB data = 10,000 GB data
10 TB data / 1 GB average object size = 10,000 objects
Storage Costs: 10,000 GB data x $0.0265 per GB = $265.00
Monitoring Costs: 10,000 objects x ($0.025 / 1,000 objects) = $0.25
API Request Costs: 1,000,000 PUT requests x ($0.005 / 1,000 requests) + 10,000,000 GET requests x ($0.0004 / 1,000 requests) = $5 + $4 = $9
Assuming objects are only compacted once, estimate that 10% of objects are compacted in a given month.
Compaction Costs (Objects): (10,000 objects x 10%) x ($0.004 / 1,000 objects) = $0.004
Compaction Costs (Data Processed): (10,000 GB x 10%) x $0.05 per GB = $50
Total: $265.00 storage + $0.25 monitoring + $9 API + $0.004 compaction objects + $50 compaction data processed = $324.25
In S3 Standard, the cost would be about $239, making S3 Tables roughly 36% more expensive in this scenario. Whether that premium is justified by not having to maintain the tables yourself is your call. See the next section for when to use, and when not to use, S3 Tables.
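To make the math easy to poke at, here's a small script that reproduces the estimate above and the S3 Standard comparison. The rates are the US East (N. Virginia) prices quoted in this post, and the 10% compaction figure is the same assumption as above; check current AWS pricing before reusing them:

```python
# Rough monthly cost model for the example above. Rates are the US East
# (N. Virginia) prices quoted in this post and may change.
STORAGE_PER_GB     = 0.0265   # S3 Tables storage, first tier
MONITORING_PER_1K  = 0.025    # per 1,000 objects monitored
PUT_PER_1K         = 0.005    # per 1,000 PUT requests
GET_PER_1K         = 0.0004   # per 1,000 GET requests
COMPACT_PER_1K_OBJ = 0.004    # compaction, per 1,000 objects processed
COMPACT_PER_GB     = 0.05     # compaction, per GB processed
S3_STD_PER_GB      = 0.023    # S3 Standard storage, first 50 TB tier

gb_stored  = 10_000           # 10 TB
objects    = 10_000           # at a 1 GB average object size
puts, gets = 1_000_000, 10_000_000
compacted  = 0.10             # assume 10% of objects compacted this month

storage    = gb_stored * STORAGE_PER_GB
monitoring = objects / 1_000 * MONITORING_PER_1K
requests   = puts / 1_000 * PUT_PER_1K + gets / 1_000 * GET_PER_1K
compaction = (objects * compacted / 1_000 * COMPACT_PER_1K_OBJ
              + gb_stored * compacted * COMPACT_PER_GB)

tables_total   = storage + monitoring + requests + compaction
standard_total = gb_stored * S3_STD_PER_GB + requests

print(f"S3 Tables:   ${tables_total:,.2f}")     # ~ $324.25
print(f"S3 Standard: ${standard_total:,.2f}")   # ~ $239.00
print(f"Premium:     {tables_total / standard_total - 1:.0%}")  # ~ 36%
```

Note that storage dominates at this scale, while the compaction data-processing charge is the largest S3 Tables-specific line item.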
AWS recommends S3 Tables for analytics workloads optimized for tabular data: daily purchase transactions, streaming sensor data, ad impressions, real-time streaming, change data capture (CDC), and log analysis, for example.
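As a sketch of the log-analysis case, here's roughly what querying an S3 Table from Spark looks like, using Amazon's Iceberg catalog implementation for S3 Tables. The catalog name, account ID, bucket ARN, and table names are placeholders, and you'd need Amazon's s3-tables-catalog-for-iceberg runtime plus the Iceberg Spark runtime on the classpath:

```python
from pyspark.sql import SparkSession

# Spark session wired to a table bucket through the S3 Tables Iceberg catalog.
# Launch with the Iceberg Spark runtime and Amazon's
# s3-tables-catalog-for-iceberg runtime JARs available (e.g. via --packages).
spark = (
    SparkSession.builder
    .appName("s3-tables-log-analysis")
    .config("spark.sql.catalog.s3tb", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.s3tb.catalog-impl",
            "software.amazon.s3tables.iceberg.S3TablesCatalog")
    .config("spark.sql.catalog.s3tb.warehouse",
            "arn:aws:s3tables:us-east-1:123456789012:bucket/analytics-table-bucket")
    .getOrCreate()
)

# From here the table behaves like any other Iceberg table: standard SQL,
# time travel, schema evolution, and ACID semantics all apply.
spark.sql("""
    SELECT status_code, COUNT(*) AS hits
    FROM s3tb.logs.requests
    GROUP BY status_code
    ORDER BY hits DESC
""").show()
```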
If you're querying from S3 Standard storage using a different table format like Hive and running into slowness or high costs, S3 Tables may be an easier solution than managing and maintaining tables yourself. However, if you're already using Iceberg with S3 Standard and have an efficient enough compaction process, the switch may not be worth the engineering effort or additional cost.
One con is the potential for some degree of vendor lock-in. S3 Tables are not fully open in the way traditional lakehouse architectures are: the underlying storage is not openly accessible, and external integration requires AWS-specific APIs or third-party connectors.
Another thing to note: as of this writing, S3 Tables are not yet available in all regions, and the AWS Glue integration is still in preview, which may limit usability.
With all that in mind, S3 Tables may be a disruptor to Snowflake and Databricks, but they are nowhere near putting those platforms out of business.
As AWS continues to expand regional availability and integration with services like AWS Glue moves beyond preview, S3 Tables is poised to become a stronger competitor to other data lake solutions. For now, it represents a thoughtful middle ground between traditional object storage and specialized analytics databases, providing an easier way to leverage Iceberg’s benefits without the operational overhead.