What is Data Freshness? Definition, Examples, and Best Practices

May 28, 2023

Co-founder / Data and ML


If you care about whether your business succeeds or fails, you should care about data freshness. Fresh data is important because it has a huge impact on your bottom line. Unfortunately, that impact often goes undetected—until it’s too late.

Say your business uses data for operational purposes and that data is stale. You could inadvertently send a discount code to a cohort of customers who have already purchased your solution, inviting them to demand the same deal terms and costing you thousands of dollars.

If, on the other hand, your business uses data for decision-making and that data is stale, you could underreport your return on ad spend and withdraw an investment that is actually paying dividends.

Now that you know why data freshness matters, let’s dive into exactly what it means. In this blog post, you’ll find a definition, examples, and four methods for measuring data freshness.

What is data freshness?

Data freshness, sometimes called data up-to-dateness, is one of ten dimensions of data quality. Data is considered fresh if it describes the real world right now. This data quality dimension is closely related to the timeliness of the data but is compared against the present moment, rather than the time of a task.

What are some examples of stale data?

Imagine that you’re on the data engineering team, responsible for building data pipelines for downstream stakeholders. As an example, take a common use case: pulling data from the Google Ads and Google Analytics APIs into your data warehouse, where they serve as core data sources for marketing attribution. Some use cases require constant data processing to support real-time decisions, but for marketing attribution a common minimum acceptable cadence is a daily refresh. Data refreshed less often than that would be considered "stale".

Data freshness depends on data product use case


The above is one way that the definition of “stale data” can change depending on your internal data management agreements with other stakeholders. Continuing with the example, if your team moves from ad-hoc SQL queries to dbt, chaining together schema and model dependencies in pursuit of speed, your once-acceptable daily updates can quickly come to be considered “stale data”.
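To make the idea of use-case-dependent staleness concrete, here is a minimal Python sketch. The `FRESHNESS_SLA` mapping, the data product names, and the thresholds are all hypothetical, chosen only to illustrate that the same refresh time can be fresh for one data product and stale for another.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness agreements: maximum acceptable age per data product.
FRESHNESS_SLA = {
    "marketing_attribution": timedelta(days=1),   # daily refresh is acceptable
    "realtime_decisions": timedelta(minutes=5),   # needs near-constant processing
}


def is_stale(data_product: str, last_refreshed_at: datetime, now: datetime) -> bool:
    """Return True if the data is older than the agreed maximum age."""
    return now - last_refreshed_at > FRESHNESS_SLA[data_product]


now = datetime(2023, 5, 28, 12, 0, tzinfo=timezone.utc)
last_refresh = now - timedelta(hours=6)

print(is_stale("marketing_attribution", last_refresh, now))  # False
print(is_stale("realtime_decisions", last_refresh, now))     # True
```

Keeping the thresholds in one place means that when an agreement with stakeholders changes, only the mapping needs to be updated.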

How do you measure data freshness?

To test any data quality dimension, you must measure, track, and assess a relevant data quality metric. For data freshness, there are four methods: measuring the difference between the latest timestamp and the present moment, measuring the lag between a destination and a source system, verifying against an expected rate of change, or corroborating against other pieces of data.
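The first three of these measurements can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the function names and sample timestamps are hypothetical, and in practice the timestamps would come from queries against your warehouse and source systems. Corroboration against other data is omitted because it depends heavily on which related datasets you have.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def lag_from_now(latest_timestamp: datetime, now: datetime) -> timedelta:
    """Age of the newest row relative to the present moment."""
    return now - latest_timestamp


def source_destination_lag(source_latest: datetime,
                           destination_latest: datetime) -> timedelta:
    """How far the destination trails the source system."""
    return source_latest - destination_latest


def meets_expected_rate(row_timestamps: list[datetime],
                        window_start: datetime,
                        window_end: datetime,
                        expected_min_rows: int) -> bool:
    """Did at least the expected number of rows arrive in the window?"""
    arrived = sum(window_start <= ts < window_end for ts in row_timestamps)
    return arrived >= expected_min_rows


now = datetime(2023, 5, 28, 12, 0, tzinfo=timezone.utc)
warehouse_latest = datetime(2023, 5, 28, 9, 0, tzinfo=timezone.utc)
source_latest = datetime(2023, 5, 28, 11, 30, tzinfo=timezone.utc)

print(lag_from_now(warehouse_latest, now))                      # 3:00:00
print(source_destination_lag(source_latest, warehouse_latest))  # 2:30:00

# Only two of these three rows landed in the last 24 hours.
rows = [now - timedelta(hours=h) for h in (3, 5, 30)]
print(meets_expected_rate(rows, now - timedelta(hours=24), now, 3))  # False
```

Each function returns a plain value, so the result can be tracked over time and compared against whatever threshold your stakeholders have agreed on.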


How to ensure data freshness

One way to ensure data freshness is through anomaly detection, sometimes called outlier analysis, which helps you identify unexpected values or events in a data set. Data observability tools include anomaly detection as core functionality and can detect issues not only in freshness but also in other dimensions, such as completeness or consistency, so you can scale data quality measures across your data warehouse.

For example, if a table tracking the number of products sold went stale, anomaly detection software would notify you as soon as the interval between table updates fell outside the normal range. The software knows it’s an outlier because its machine learning model learns from your historical metadata.
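As a rough illustration of the idea, the sketch below flags a table whose time since last update is far outside its historical update intervals. It uses a simple z-score as a stand-in for a learned model; it is not how any particular observability product works, and the function name, threshold, and sample timestamps are assumptions made for this example.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def update_gap_is_anomalous(update_times, now, z_threshold=3.0):
    """Flag the time since the last update if it is far above historical gaps.

    update_times: chronologically sorted datetimes of past table updates.
    """
    gaps = [(b - a).total_seconds() for a, b in zip(update_times, update_times[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three historical updates")
    mu, sigma = mean(gaps), stdev(gaps)
    current_gap = (now - update_times[-1]).total_seconds()
    if sigma == 0:
        # Perfectly regular history: any longer gap counts as unusual.
        return current_gap > mu
    return (current_gap - mu) / sigma > z_threshold


# Hypothetical history: a table updated roughly once a day, with a few minutes of jitter.
base = datetime(2023, 5, 22, 6, 0)
history = [base + timedelta(days=i, minutes=m)
           for i, m in enumerate([0, 5, -3, 2, 4, 0])]
last_update = history[-1]

print(update_gap_is_anomalous(history, last_update + timedelta(hours=24, minutes=10)))  # False
print(update_gap_is_anomalous(history, last_update + timedelta(hours=25)))              # True
```

A z-score is sensitive to outliers in the history itself and ignores seasonality, which is part of why production tools tend to rely on more robust models.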

Here’s how anomaly detection helps Andrew Mackenzie, Business Intelligence Architect at Appcues, perform his role:

“The important thing is that when things break, I know immediately—and I can usually fix them before any of my stakeholders find out.”

In other words, you can say goodbye to the dreaded WTF message from your stakeholders. In that way, automated, real-time anomaly detection is like a friend who has always got your back.

To take anomaly detection for a spin and put an end to poor data quality, sign up for Metaplane’s free-forever plan or test our most advanced features with a 14-day free trial. Implementation takes under 30 minutes.
