Three ways to track schema drift in Snowflake

Schema drift is inevitable, but that doesn't mean you can ignore it. Before you know it, you end up with missing or inconsistent data, which undermines the accuracy of your Snowflake queries and reports. And without knowing when or where these issues occur, you can't begin to address them effectively. Here are three ways to track schema drift in Snowflake, so you can stay on top of any changes.

February 6, 2024

Data @ Metaplane


To a data engineer, there are few things scarier than schema drift.

A horror movie marathon at midnight? Child’s play.

Finding a spider in your bed? No sweat.

Opening your fridge to find it completely empty? Okay, this might be a close second.

Still, the true chill down a data engineer's spine is reserved for the unpredictable and often unwelcome surprises lurking within the depths of their data warehouse, courtesy of schema drift.

And the scariest part? Schema drift is inevitable. 

What causes schema drift?

Changes in database schema over time—whether that’s additions, deletions, or modifications of columns, tables, or data types—lead to schema drift. These changes can be planned or unplanned, gradual or unexpected. 

These are some of the common causes of schema drift:

  • Data gets corrupted during migration
  • Data warehouse updates, such as:
      • Adding features or fixing issues
      • Establishing new relationships between tables
      • Removing existing relationships between tables when they become unnecessary or irrelevant
  • Your organization switches to a different data warehouse
  • Your organization’s business requirements change and you need to:
      • Add new fields for new types of data to be collected and stored
      • Remove fields when certain types of data are no longer needed
      • Modify a field’s data type to correctly reflect the nature of the data being stored
  • You introduce new data sources
  • Technology standards change and/or new regulations are introduced

With all those causes, it’s no wonder that schema drift leads to so many data pipeline outages. You’ve got columns being added and data types changing on the fly, and when these changes go unnoticed, they swiftly erode data quality.

This results in missing or inconsistent data that not only compromises the integrity and reliability of your queries and reports but also diminishes the overall trust in data across the organization. To mitigate this, you have to track any and all possible schema changes. Here are three ways to track schema drift in Snowflake.

Option 1: Generate and compare schema snapshots

Snowflake and other cloud warehouses’ support for native Schema Change Tracking is still in its infancy, but that doesn’t preclude users from creating their own history of changes. 

One way you can do this is through periodic, recurring snapshots of your schemas. Here’s a simple sample query you could run to snapshot the schema of the tables within a given database:

USE DATABASE <your_database_name>;

CREATE TABLE snapshot_t0 AS
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
ORDER BY 1, 2, 3, 4;

After you create that snapshot, you’ll want to compare it against a later one to find deltas. Assuming we stick with plain SQL, you’ll then write a few queries to check for common schema changes. Below are sample queries, with snapshot_t0 and snapshot_t1 as placeholder names for your snapshot tables.

-- Alias the snapshot columns so the two versions are distinguishable.
CREATE OR REPLACE VIEW schema_comparison AS
SELECT
    COALESCE(t0.table_schema, t1.table_schema) AS table_schema,
    t0.table_name  AS table_name_0,  t1.table_name  AS table_name_1,
    t0.column_name AS column_name_0, t1.column_name AS column_name_1,
    t0.data_type   AS data_type_0,   t1.data_type   AS data_type_1
FROM snapshot_t0 t0
FULL OUTER JOIN snapshot_t1 t1
  ON  t0.table_schema = t1.table_schema
  AND t0.table_name   = t1.table_name
  AND t0.column_name  = t1.column_name;

-- Finding new, dropped, or renamed columns
SELECT * FROM schema_comparison
WHERE column_name_0 IS DISTINCT FROM column_name_1;

-- Finding new, dropped, or renamed tables
SELECT * FROM schema_comparison
WHERE table_name_0 IS DISTINCT FROM table_name_1;

-- Finding changed data types
SELECT * FROM schema_comparison
WHERE column_name_0 = column_name_1
  AND data_type_0 IS DISTINCT FROM data_type_1;

There are a few gaps in this approach, the most significant being:

  • The need to specify which database(s) and schema(s) you’d like to run this for
  • Orchestration to schedule both snapshots and deltas
  • An understanding of whether a schema change was significant
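If you’d rather compute the deltas outside the warehouse, the same comparison can be sketched in plain Python. This assumes you’ve already exported each snapshot as rows keyed by (schema, table, column); the table and column names below are hypothetical:

```python
# Compare two schema snapshots and report added, dropped, and retyped columns.
# Each snapshot maps (schema, table, column) -> data type.

def diff_snapshots(t0, t1):
    added = sorted(set(t1) - set(t0))
    dropped = sorted(set(t0) - set(t1))
    retyped = sorted(
        (key, t0[key], t1[key])
        for key in set(t0) & set(t1)
        if t0[key] != t1[key]
    )
    return {"added": added, "dropped": dropped, "retyped": retyped}

# Hypothetical snapshots taken a day apart.
snapshot_t0 = {
    ("public", "orders", "id"): "NUMBER",
    ("public", "orders", "total"): "NUMBER",
    ("public", "orders", "note"): "VARCHAR",
}
snapshot_t1 = {
    ("public", "orders", "id"): "NUMBER",
    ("public", "orders", "total"): "FLOAT",          # data type changed
    ("public", "orders", "placed_at"): "TIMESTAMP",  # new column
}

print(diff_snapshots(snapshot_t0, snapshot_t1))
```

The same orchestration caveat applies: something still has to schedule the snapshots and run the comparison.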

Option 2: Snowflake Community Python script

Inspired by the Flyway database migration tool, the Snowflake Community Python script (schemachange) is a simple Python-based tool for managing all of your Snowflake objects. You can read all about the open-source script here, but here’s an overview:

  • You or someone on your team needs to be able to run Python. Note that you’ll need the Snowflake Python driver installed wherever you run this script. You’ll also want to familiarize yourself with Jinja templating if you want to simplify inserting variables as new tables appear.
  • You’ll need to create a table in Snowflake to write changes to, with the default location being: METADATA.SCHEMACHANGE.CHANGE_HISTORY
  • You’ll need to specify your Snowflake connection parameters in schemachange-config.yml
  • You’ll need to write queries that output your desired schemas to compare, following the project’s naming conventions and structure.

This script helps you manually track all of your schema changes. But you’ll also need to explicitly define which tables, schemas, and databases you’re tracking, and you’ll need to be comfortable running Python from the CLI to do so.
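schemachange follows Flyway-style versioned file naming (see the project’s README for the exact conventions). Conceptually, the core of any such tool is sorting migration files by version and applying only those newer than the last recorded change. Here’s a rough sketch of that idea; the file names and versions are hypothetical:

```python
import re

# Flyway-style versioned migration names: V<major>.<minor>.<patch>__<description>.sql
VERSION_RE = re.compile(r"^V(\d+(?:\.\d+)*)__.+\.sql$")

def version_key(filename):
    match = VERSION_RE.match(filename)
    if not match:
        raise ValueError(f"not a versioned migration: {filename}")
    return tuple(int(part) for part in match.group(1).split("."))

def pending_migrations(files, last_applied):
    """Return migrations newer than last_applied, in apply order."""
    applied_key = version_key(last_applied) if last_applied else ()
    return sorted(
        (f for f in files if version_key(f) > applied_key),
        key=version_key,
    )

# Hypothetical migration folder; the change-history table says V1.1.2 was applied last.
files = [
    "V1.2.0__add_orders.sql",
    "V1.1.1__init.sql",
    "V1.1.2__add_users.sql",
    "V1.10.0__indexes.sql",
]
print(pending_migrations(files, "V1.1.2__add_users.sql"))
```

Note that versions compare as numeric tuples, so V1.10.0 correctly sorts after V1.2.0, which a plain string sort would get wrong.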

Option 3: Leverage Snowflake’s information_schema

Similar to Option 2, you can leverage Snowflake’s information_schema to get a full view of all schema changes. This solution can be helpful for ad-hoc checks when triaging a data quality incident. But keep in mind that, by default, Snowflake only retains this information for 7 days. 

An example query for the full list of schema changes would look like this:

SELECT *
FROM table(information_schema.query_history())
-- specify your schema change query(s) here
WHERE query_text ILIKE 'ALTER TABLE%'
-- optional filters for database, schema, role, user, or warehouse
-- (alternatively, you can use the information_schema.query_history_by_* table functions)
  AND database_name  = '<your_database_name>'
  AND schema_name    = '<your_schema_name>'
  AND role_name      = '<your_role_name>'
  AND user_name      = '<your_user_name>'
  AND warehouse_name = '<your_warehouse_name>'
ORDER BY start_time DESC;

This is a great option for triaging incidents that occurred within the past week. You can optionally save the results for a full history of schema changes to reference, but it can quickly become compute-heavy for Snowflake instances with a high volume of queries. 
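Once you’ve pulled query history rows out of Snowflake (for example via the Snowflake Python connector), a small helper can bucket statement text by the kind of schema change, which makes triage easier. A minimal sketch, with hypothetical statements:

```python
# Classify DDL statements from query history by the kind of schema change.
CHANGE_PREFIXES = [
    ("ALTER TABLE", "table altered"),
    ("DROP TABLE", "table dropped"),
    ("CREATE OR REPLACE TABLE", "table replaced"),
    ("CREATE TABLE", "table created"),
]

def classify(query_text):
    text = query_text.strip().upper()
    # Check longer prefixes first so CREATE OR REPLACE wins over plain CREATE.
    for prefix, label in sorted(CHANGE_PREFIXES, key=lambda p: -len(p[0])):
        if text.startswith(prefix):
            return label
    return None  # not a schema change we track

# Hypothetical query_text values from query_history().
history = [
    "ALTER TABLE public.orders ADD COLUMN placed_at TIMESTAMP",
    "SELECT * FROM public.orders",
    "create or replace table public.users (id number)",
]
print([classify(q) for q in history])
```

Prefix matching like this is deliberately rough; it misses DDL issued through procedures or tasks, which is part of why dedicated tooling exists.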

So, if none of these options quite fit the bill and you’re looking for an automated way to track your schema changes, we have another option.

Metaplane automatically monitors your data warehouse for schema changes (e.g., column additions, data type changes, table drops). There’s no need to define which tables, schemas, and databases you’re tracking.

The best part? With Metaplane, you can filter which schema changes you want to monitor and be notified about. That way, you receive only the alerts that are critical to your data’s integrity and operational continuity, not a barrage of noise.

Whether you opt for a manual or an automated approach, the bottom line is this: start tracking your schema changes if you aren’t already. Choose a method and stick with it! That’s the only way to stay ahead of schema drift-related pain, increase the reliability of your data systems, and build trust in data across the organization.

Want to get automatically alerted on schema changes? Talk to us or start a free trial today.
