Snowflake Table Types Explained
It’s Monday morning. You and a coworker are discussing our favorite topic - Snowflake infrastructure strategy. Which table should you use?
She asks you to try out a “Hybrid Table” to collect user events from your web application. You ask her whether event tables would work, and the both of you become sidetracked while you try to figure out what already exists in the warehouse. You run “show tables in <database_name>.<schema_name>” and compare the “kind” output to your fallback query “Select table_type, table_name from <database_name>.information_schema.tables”.
If you actually followed along with this story in your Snowflake environment, you’ll understand the confusion - both you and your coworker obviously stayed up to date with product announcements from 2022-2023, and now you’re seeing one of the table types that you mentioned, but not the other.
While those query outputs may be changing in the future, we’re currently caught in the transition period - where select accounts will actively be using some of these new features and associated terminology, while documentation is still catching up.
Current table types in Snowflake
This information was pulled on November 14th, 2023 from the Snowflake documentation.
- Base / Permanent: You’ll find both of these terms on the internet used to describe “standard” tables in Snowflake. If you don’t know what table type you’re using, this would be a good guess.
- Temporary: These are tables that only last for a given session before purging the data. One way you might use this is to temporarily stage data for cleaning prior to merging into a dataset that you’d like to backfill. Note that this differs from transient tables, which are also meant to contain data that isn’t meant to be permanently stored, but persists for longer than the given session.
- External: These are objects that are not stored in Snowflake but are queried via Snowflake. You can imagine this as any structured or semi-structured object in a cloud storage bucket.
- Other ones that I’m ignoring because they’re not tables: VIEW, or MATERIALIZED VIEW.
Note: “EVENT TABLE” is a current table type but will be covered in a dedicated section below because it will generally contain data generated from a different origin than the other table types mentioned above.
Snowflake’s new table types
In the following sections, we’re covering 4 table types announced between 2022-2023: Dynamic, Event, Hybrid, and Iceberg tables. The upcoming addition of these tables represent the ongoing evolution of types in Snowflake from duration-based and location-based (i.e. “EXTERNAL”) to use-case based.
Read on to answer “Which table type should I use?”
What are Dynamic Tables?
This upcoming table type allows users to materialize the results of a query, using both object references that are part of any query along with an input for desired lag. Functionally, this will create a new table that references and merges data from upstream objects that updates on a specified schedule.
Use cases for Dynamic Tables
Imagine you review a task list generated from several 3rd party sources intermittently throughout the day. To get the data to that state, you need to clean and merge data from those 3rd party sources in a recurring modeling process. It’s possible to just re-run a query every single time, but manual processes introduce potential for human error, making it better to automate when possible.
Note: Dynamic tables aren’t instantly updated, but do become eventually consistent with the raw source data, meaning that it’s still not the best for real-time use cases. Click here to learn how these compare to dbt.
What are Event Tables?
In this case, the “events” that this table type is capturing provide information on your Snowpark or stored procedure job runs. The types of records that you’ll find are generally split into severity levels (e.g WARN, ERROR) for use in debugging your function(s).
Use case for Event Tables
Anyone that’s looking at developing on top of Snowflake should consider setting up event tables for debugging, provided you familiarize yourself with the cost model first.
What are Hybrid Tables?
Without getting into details of how data is stored and retrieved, let’s consider that tables are generally optimized for one of two workloads: use of data (e.g. for analytics) and storage of data (e.g. for transactions). Hybrid tables promise to support both of these workloads with optimizations for both a high volume of transactions (e.g. inserts and updates) and single-row lookups.
Use case for Hybrid Tables
In the next few years, you’ll hear of hybrid tables mentioned alongside “Unistore”. The usual paradigm for companies has been to maintain two separate systems for transactions (i.e. OLTP) and analytics (i.e. OLAP), and many companies usually start off with databases optimized for transactions, such as MySQL or PostgreSQL. Over time, as the need to draw insights to drive decisions increases, teams will consider adoption another solution with optimized infrastructure for analytics. With hybrid tables, in theory, you no longer need to make this tradeoff and maintain 2 systems. Any company with a burning desire to get rid of their transactional system and are using Snowflake for analytics should consider using Hybrid Tables.
What are Iceberg Tables?
Despite the cold water related name, “Iceberg” as a term wasn’t originally created by Snowflake. It’s actually a reference to the table format created by Netflix and now Apache open-source-licensed. Snowflake has supported Iceberg file formats in the past, but only via External Tables.
Use case for Iceberg Tables
Some of the benefits of iceberg formats are its ability to handle large datasets and track schema evolution in its metadata. The decision to leverage this new table type will likely be a broader organizational discussion about how you’re using and storing data, with an alternative consideration in the form of Parquet.