One of the most exciting 2024 re:Invent announcements was Amazon S3 Tables. Designed to simplify analytics on S3, S3 Tables provide a fully managed way to store structured data using Apache Iceberg. The announcement has even led to speculation that S3 is becoming a fully managed data lakehouse solution that will put Snowflake and Databricks out of business; the reality, however, is more nuanced.

What Are S3 Tables?

S3 Tables introduces a new type of S3 bucket that stores structured data in the Apache Parquet format and manages tables using the Apache Iceberg format. They were introduced to improve performance for analytics workloads, as querying from S3 Standard storage sometimes causes performance bottlenecks in the form of transaction limits or unexpected cost increases.

A little backstory—for analytics use cases relying on S3 Standard storage, users have traditionally set up table formatting manually using Hive. However, Hive lacks certain features and optimizations, making it a less-than-ideal choice. As a result, more S3 Standard users started adopting Iceberg, which was designed to solve some of Hive’s problems with advantages including improved query performance, time travel, schema evolution, ACID transactions, and lower costs.

Still, using Iceberg with S3 Standard requires you to set up and manage the tables yourself, which means engineering effort and associated costs. S3 Tables handles that work for you, which is why some are calling it simply a managed Iceberg service.

S3 Tables come with benefits such as:

  • Continual table maintenance for query and cost optimization, including compaction, snapshot management, and unreferenced file removal.
  • Easy integration with AWS analytics services such as Amazon Data Firehose, Athena, Redshift, EMR, and QuickSight. S3 Tables can also be used with external query engines that support Iceberg, like Apache Spark, and multiple clients can read from and write to the same tables.
  • Iceberg benefits like row-level transactions, queryable snapshots, and schema evolution.
  • The same benefits of S3, for instance, security, durability, availability, S3 API Support, and more.
  • Additional security in the form of table-level permissions, set via identity- or resource-based policies.
  • Higher TPS and better query throughput than self-managed tables in general purpose S3 buckets.
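As a sketch of how an external engine connects, the snippet below attaches a Spark shell to a table bucket through Iceberg's catalog mechanism. The package coordinates, versions, catalog name (`s3tablesbucket`), and bucket ARN are placeholders following the pattern AWS documents for the S3 Tables Iceberg catalog, not tested values; check the current AWS documentation before use.

```shell
# Hypothetical sketch: connect Spark to an S3 table bucket via the Iceberg catalog.
# Package versions and the bucket ARN below are illustrative placeholders.
spark-shell \
  --packages "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1,software.amazon.s3tables:s3-tables-catalog-for-iceberg-runtime:0.1.3" \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.s3tablesbucket=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.s3tablesbucket.catalog-impl=software.amazon.s3tables.iceberg.S3TablesCatalog \
  --conf spark.sql.catalog.s3tablesbucket.warehouse=arn:aws:s3tables:us-east-1:111122223333:bucket/my-table-bucket
```

Once connected, the tables are queryable with plain Spark SQL (e.g. `SELECT ... FROM s3tablesbucket.my_namespace.my_table`), the same as any other Iceberg catalog.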

S3 Tables Cost

Pricing for S3 Tables is, all in all, not bad. Charges apply for storage (including a per-object monitoring fee), requests, and optional maintenance in the form of compaction.

We’ll review the charges and compare them to S3 Standard for additional perspective. Prices below are for the US East (N. Virginia) region.

Storage is billed on a tiered plan, with discounted rates at higher volumes. Per TB stored, across all tiers, S3 Tables storage is roughly 15% more expensive than S3 Standard. Monitoring is billed at the same rate as S3 Intelligent-Tiering’s monitoring charge.

| S3 Storage Pricing Dimension    | S3 Tables                | S3 Standard    |
| ------------------------------- | ------------------------ | -------------- |
| Monitoring, All Storage / Month | $0.025 per 1,000 objects | N/A            |
| First 50 TB / Month             | $0.0265 per GB           | $0.023 per GB  |
| Next 450 TB / Month             | $0.0253 per GB           | $0.022 per GB  |
| Over 500 TB / Month             | $0.0242 per GB           | $0.021 per GB  |
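To make the tiered rates concrete, here is a small Python helper (names are my own, not an AWS API) that applies the per-GB tiers above to a monthly storage figure and compares the two bucket types:

```python
# Compare S3 Tables vs. S3 Standard storage cost using the tiered
# US East (N. Virginia) rates quoted above. Helper names are illustrative.

TIER_SIZES_GB = [50_000, 450_000]              # first 50 TB, next 450 TB; the rest is the top tier

S3_TABLES_RATES = [0.0265, 0.0253, 0.0242]     # $/GB-month
S3_STANDARD_RATES = [0.0230, 0.0220, 0.0210]   # $/GB-month

def tiered_storage_cost(gb: float, rates: list) -> float:
    """Apply tiered $/GB-month rates to a total GB stored for the month."""
    cost, remaining = 0.0, gb
    for tier_size, rate in zip(TIER_SIZES_GB, rates):
        used = min(remaining, tier_size)
        cost += used * rate
        remaining -= used
    cost += remaining * rates[-1]              # everything over 500 TB
    return cost

gb = 600_000  # 600 TB stored this month
tables = tiered_storage_cost(gb, S3_TABLES_RATES)      # 1,325 + 11,385 + 2,420 = 15,130
standard = tiered_storage_cost(gb, S3_STANDARD_RATES)  # 1,150 +  9,900 + 2,100 = 13,150
print(f"S3 Tables:   ${tables:,.2f}/month")
print(f"S3 Standard: ${standard:,.2f}/month")
print(f"Premium:     {tables / standard - 1:.1%}")     # ~15%
```

At 600 TB this works out to about a 15% premium, consistent with the per-tier rate differences above (storage only; monitoring and compaction charges come on top).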

Charges for API requests apply as well, but are charged at the same rates as other S3 buckets.

| S3 Requests Pricing Dimension            | S3 Tables | S3 Standard |
| ---------------------------------------- | --------- | ----------- |
| PUT, POST, LIST requests (per 1,000)     | $0.005    | $0.005      |
| GET and all other requests (per 1,000)   | $0.0004   | $0.0004     |

Finally, charges apply for compaction at the following rates. This is where observers have begun to worry about cost. However, as others have pointed out, although AWS does not share its compaction algorithm, we can reasonably assume AWS does not compact an object more than once.

| S3 Maintenance Pricing Dimension | S3 Tables                          | S3 Standard |
| -------------------------------- | ---------------------------------- | ----------- |
| Compaction - Objects             | $0.004 per 1,000 objects processed | N/A         |
| Compaction - Data Processed      | $0.05 per GB processed             | N/A         |

S3 Tables Pricing Scenario

An S3 Table has 10 TB of data stored. The average size of an object is 1 GB. There are 1 million PUT requests and 10 million GET requests that month. The approximate costs are as follows:

10 TB data = 10,000 GB data

10 TB data / 1 GB average object size = 10,000 objects

Storage Costs: 10,000 GB data x $0.0265 per GB = $265.00

Monitoring Costs: 10,000 objects x ($0.025 / 1,000 objects) = $0.25

API Request Costs: 1,000,000 PUT requests x ($0.005 / 1,000 requests) + 10,000,000 GET requests x ($0.0004 / 1,000 requests) = $5 + $4 = $9

Assuming each object is compacted only once, estimate that 10% of objects are compacted in a given month.

Compaction Costs Objects: (10,000 objects x 10%) x ($0.004 / 1,000 objects) = $0.004

Compaction Costs Data Processed: (10,000 GB x 10%) x $0.05 per GB = $50

Total: $265.00 storage + $0.25 monitoring + $9 API + $0.004 compaction objects + $50 compaction data processed = $324.25
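The arithmetic above can be reproduced with a short Python script (variable names are my own), which makes it easy to plug in your own workload numbers:

```python
# Reproduce the worked scenario: 10 TB stored, 1 GB average object size,
# 1M PUT requests, 10M GET requests, ~10% of objects compacted this month.

gb_stored = 10_000                    # 10 TB = 10,000 GB
avg_object_gb = 1
objects = gb_stored // avg_object_gb  # 10,000 objects
put_requests = 1_000_000
get_requests = 10_000_000
compaction_fraction = 0.10            # assumption: 10% of objects compacted, once each

storage = gb_stored * 0.0265                                   # first-50-TB tier -> $265.00
monitoring = objects * 0.025 / 1_000                           # -> $0.25
requests = (put_requests * 0.005 / 1_000                       # $5.00
            + get_requests * 0.0004 / 1_000)                   # + $4.00 -> $9.00
compaction_objects = objects * compaction_fraction * 0.004 / 1_000   # -> $0.004
compaction_data = gb_stored * compaction_fraction * 0.05             # -> $50.00

total = storage + monitoring + requests + compaction_objects + compaction_data
print(f"Total: ${total:,.2f}")        # ~ $324.25
```

Swapping in the S3 Standard storage rate ($0.023/GB) and dropping the monitoring and compaction lines gives the $239 figure used for comparison below.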

In S3 Standard, the cost would be about $239, making S3 Tables roughly 36% more expensive in this scenario. Whether that cost is justified by not having to maintain the tables yourself is up to you. See the next section for guidance on when to use or not use S3 Tables.

When to Use and Not Use S3 Tables

S3 Tables are recommended by AWS for analytics workloads that are optimized for tabular data, such as daily purchase transactions, streaming sensor data, ad impressions, real-time streaming, change data capture (CDC), and log analysis.

If you’re querying from S3 Standard storage using a different table format like Hive and running into slowness or high costs, S3 Tables may be an easier solution than managing and maintaining tables yourself. However, if you’re already using Iceberg with S3 Standard and have an efficient enough compaction method, the switch may not be worth the engineering effort or additional cost.

One con is the degree of vendor lock-in this implementation introduces: S3 Tables are not fully open in the way traditional lakehouse architectures are, since the underlying storage is not open and external integration requires AWS-specific APIs and third-party connectors.

Another thing to note: as of this writing, S3 Tables are not available in all regions, and the AWS Glue integration is still in preview, which may limit their usability.

With all that in mind, it may be a disruptor to Snowflake and Databricks, but is nowhere near putting them out of business.

Conclusion

As AWS continues to expand regional availability and integration with services like AWS Glue moves beyond preview, S3 Tables is poised to become a stronger competitor to other data lake solutions. For now, it represents a thoughtful middle ground between traditional object storage and specialized analytics databases, providing an easier way to leverage Iceberg’s benefits without the operational overhead.