Easily build complex reports
Monitoring and efficiency metrics
Custom cost allocation tags
Network cost visibility
Organizational cost hierarchies
Budgeting and budget alerts
Discover active resources
Consumption-based insights
Alerts for unexpected charges
Automated AWS cost savings
Discover cost savings
Unified view of AWS discounts
COGS and business metrics
Model savings plans
Collaborate on cost initiatives
Create and manage your teams
Automate cloud infrastructure
Cloud cost issue tracking
Detect cost spikes
by Danielle Vansia
Contents
Views are virtual tables that represent the results of a query over one or more source tables. While views do not store data themselves, they present data in a readable manner that can simplify complex queries. One common use case for views is to control data security and data access. In the medical industry, for example, views along with other access control measures can limit who sees sensitive patient data, particularly in tables that multiple departments access. Views are also useful for reporting and aggregation—easily calculate monthly sales totals or other analytics on the fly.
In Snowflake, you can create a couple of view types, including:
Materialized views offer the key to striking the elusive query performance balance. However, data engineers and analysts need to weigh the cost implications of materialized views and determine whether the potential performance gains make up for some often-forgotten costs.
We’ve heard a few voices shed light on some of these costs, including Sam Lambert— CEO of PlanetScale—who shared his insights on these increased rates.
Snowflake charges 10x for materialized views. That's $30/hr. 😐 pic.twitter.com/gCtNLT7q6e— Sam Lambert (@isamlambert) August 3, 2023
Snowflake charges 10x for materialized views. That's $30/hr. 😐 pic.twitter.com/gCtNLT7q6e
This exploration will weigh the benefits and trade-offs of using materialized views in Snowflake, particularly from a cost perspective. These considerations can help you determine how to best optimize your data warehouse infrastructure while maintaining the ever-important cost-performance balance.
A regular—or non-materialized view—offers specific perspective on your data. Views have some of the following key features, as well as some limitations.
Data storage. Non-materialized views do not store data. They are queries that represent the underlying data from a source table.
Usage costs. You incur costs only when the view is queried.
Non-materialized views have a few limitations as well, including:
Source table changes. If a source table changes, those changes may not automatically extend to the view. In other words, changing a column on the source table could cause a view to become invalid.
Altering a view. You cannot use ALTER VIEW to change the definition of a view—you can change only its name, convert to or revert from a secure view, or adjust a comment on a view.
ALTER VIEW
Performance variations. Performance is dependent on the complexity of the underlying data. Every time you query the view, the underlying query logic is re-executed on the source table.
In this example, you manage a data warehouse for a company that makes educational video content. The warehouse contains a user_video_activity table that is updated in real time. This table contains a row for each instance of user activity when accessing videos. Video creators would like to access quick statistics showing how often their videos are watched. They would like to see these statistics in near real time. The user_video_activity table also contains a lot of columns that are likely irrelevant to the creator’s needs, such as user_location and user_device_type.
user_video_activity
user_location
user_device_type
To enhance data security and streamline access, you create a view for creators that selects the name of the video and shows total watch count, grouped by course and then video. Each instance of user video activity is an instance where that particular video was watched. This view excludes any personal user information from the base table, such as user_location.
CREATE OR REPLACE VIEW video_watch_summary AS SELECT course_id, video_id, video_title, COUNT(*) AS total_watch_count FROM user_video_activity GROUP BY course_id, video_id, video_title;
The creator can then query on this view each time they want to see video statistics, providing them with a streamlined version of the data they are looking for.
SELECT * FROM video_watch_summary; +-----------+-----------+----------------------+-------------------+ | COURSE_ID | VIDEO_ID | VIDEO_TITLE | TOTAL_WATCH_COUNT | +-----------+-----------+----------------------+-------------------+ | 101 | 1001 | Introduction | 3200 | | 101 | 1002 | Installing Python | 3300 | | 101 | 1003 | Python Loops | 3506 | | 102 | 2001 | Getting Started | 1067 | | 102 | 2002 | Advanced Tips | 1102 | +-----------+-----------+----------------------+-------------------+
A materialized view in Snowflake stores the underlying query results so that when the view is queried, the results are already pre-computed. When you have to consistently run a query, these time-savings can add up. Like non-materialized views, materialized views have a number of benefits, including:
Performance. Speeds up heavy aggregation or processing complex queries, especially in the context of large datasets.
Maintenance. Snowflake performs maintenance of the views using a background service that is “less error-prone” and “more efficient” than if you were to manually maintain the data.
Complex data. Materialized views can efficiently process complex data, such as semi-structured data.
There are some limitations to materialized views, including:
Storage. Materialized views use storage space in order to store the pre-computed results and are therefore prone to additional storage costs.
Number of tables queried. The materialized view can query on only one table.
Join limitations. Joins—including self-joins—are not supported.
One of the major factors data engineers and data analysts have to consider is the cost implications of materialized views. Materialized views store pre-computed data; therefore, they incur storage costs. The Snowflake documentation even cautions users:
…[B]efore creating any materialized views, you should consider whether the costs are offset by the savings from re-using these results frequently enough.
Another important factor to note is each time rows are inserted into the source table, background maintenance occurs, and those new rows are then inserted into the materialized view. The below table outlines Snowflake credits per compute-hour charged for maintenance processes.
Source: Snowflake Serverless Feature Credit Table. Pricing as of 11/2023.
With these considerations in mind, it’s important to think about the difference between both view types and the use cases for each. The table below outlines some of the differences previously discussed.
Materialized views vs. non-materialized views comparison
The most important factor when deciding whether a materialized view is “worth it” comes down to the structure of the underlying data: do the results of the view frequently change, and do you use those results very often?
Going back to the video tutorials example, let’s consider a scenario where a materialized view would make more sense. Because the user_video_activity table is loaded in real time, the data is constantly changing. Every time a user accesses a video, a new record is created, and the total_watch_count metric is recalculated. The table adds a row for each instance of user activity, and those instances of activity are counted in the watch statistics.
total_watch_count
Another table in the warehouse, video_ratings_comments, contains video ratings as well as user feedback and comments and is updated once a day. You would like to create a materialized view that shows the top 10 videos by average rating.
video_ratings_comments
CREATE MATERIALIZED VIEW top_rated_videos AS SELECT video_id, AVG(video_rating) AS average_rating FROM video_ratings_comments GROUP BY video_id ORDER BY average_rating DESC LIMIT 10;
Querying the materialized view would yield the following results.
SELECT * FROM top_rated_videos; +------------+-------------------+ | VIDEO_ID | AVERAGE_RATING | +------------+-------------------+ | 1001 | 4.8 | | 1005 | 4.7 | | 1100 | 4.7 | | 1003 | 4.6 | | 1007 | 4.5 | | 1002 | 4.4 | | 1008 | 4.3 | | 1004 | 4.2 | | 1009 | 4.1 | | 1006 | 4.0 | +------------+-------------------+
A materialized view is useful in this instance because it contains a small query result size (10 records). In addition, you avoid the issue of frequently changing data because the data is updated once a day, so the aggregations do not need to be constantly recalculated.
In the first example, you would not use a materialized view because the cost implications of a materialized view would likely outweigh its performance benefits—the data is constantly changing. In cases where materialized views come with cost considerations that outweigh their performance benefits, you can consider alternative strategies for optimizing both data warehouse performance and costs.
There are many available optimization strategies—with only a few referenced here.
In addition, consider using various cost optimization tools to gain insights into the categories that are driving costs, such as storage and compute. These tools can help you make informed decisions on cost-effective strategies and enhance the overall efficiency of your data warehouse.
The example below from the Vantage console illustrates the breakdown of accrued weekly costs. The data is grouped by categories, like Storage and Materialized Views. These analytics can empower you to make data-driven decisions and find the right balance between performance and cost-effectiveness in your Snowflake environment.
Snowflake costs by Category in Vantage
Views are valuable tools for streamlining data analysis as well as increasing data security. While materialized views can significantly boost query performance, they come with some potentially hefty cost implications. Therefore, it’s essential to carefully consider your specific use case. Striking the right balance between cost and performance is crucial. By leveraging Snowflake’s cost analysis features as well as other cost optimization tools, you can make informed decisions to ensure your data warehouse operates both efficiently and cost-effectively.
MongoDB Atlas is the cost-effective choice for production workloads where high-availability is a requirement.
Grafana is a strong competitor to the monitoring and observability features of Datadog for a fraction of the price.
AWS is implementing a policy update that will no longer allow Reserved Instances and Savings Plans to be shared across end customers.