Prioritizing Data Observability: Why Now?

Five reasons why you should make data quality a priority now, instead of waiting until it becomes an issue.

Kevin Hu, PhD

Co-founder / Data and ML

May 15, 2023

After speaking with hundreds of data leaders, from high-leverage solo teams at startups to decentralized data teams with 50+ members at Fortune 500 companies, one thing is clear: data is becoming ubiquitous regardless of the size, industry, or makeup of a company.

Every company we’ve spoken to uses data in various ways, helping drive critical business decisions in product, marketing, sales, and finance. As data is operationalized, data leaders have shared with us that data quality is one of their top priorities, and that better observability is a sustainable solution.

One of the most common questions we receive is: when is the right time to become proactive about data quality by building a tool in-house or adopting a data observability platform?

From our perspective, there are five reasons why you should make data quality a priority now, instead of waiting until it becomes an issue.

1. Trust is easy to lose and hard to regain

As data people, our goal is to enable other teams to ask questions and make informed decisions. Two necessary ingredients are data and data literacy. The third is trust. Without trust in data, stakeholders fall back on their own splintered datasets and siloed data usage, or they second-guess your data ("hey, can I use this report?"). Skepticism is good; lack of trust is not.

The problem we've experienced is that trust is asymmetric: it takes seconds to lose and months to rebuild. Not only is trust lost with the individuals who made incorrect decisions using your team’s data, but distrust in data also trickles throughout the organization, resulting in less data-driven, more siloed decision making.

Lastly, as modern data teams increasingly adopt data mesh architectures and self-serve analytics, preserving trust is even more critical. New, autonomous teams need to be able to use and trust data to make important decisions in their own domains. Without insight into the quality of data over time, your organization could make the wrong decisions based on incorrect data across the mesh.

2. Data loss is a problem that never goes away

Data loss is the bane of our existence. Not only does losing data erode trust, but accounting for lost data down the line creates data debt and can be painful.

When modeling data, missing data complicates joins and transformations, injecting conditionals and special-case logic into what should otherwise be a simple SQL query. When data is used in dashboards, gaps should be annotated and explained to stakeholders, or they risk making incorrect decisions. But such annotations are generally not easy to make, even in modern BI tools. Missing data often needs to be discarded or imputed, especially when it is used as input to an analytical model.

Worst of all is that missing data never goes away, and backfilling what can be backfilled can be challenging. Every time you onboard a new employee, create or update models, or change dashboards, missing data needs to be considered. While s*** always happens, proactively monitoring data quality can decrease the frequency of data loss, and early detection can decrease the severity when it happens.
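Early detection can start with something as simple as checking for gaps. As a minimal sketch, assuming a table loaded in daily partitions where you can list which load dates are present, the hypothetical `find_missing_days` helper below returns the days with no data, so they can be backfilled, annotated for stakeholders, or explicitly discarded or imputed downstream.

```python
from datetime import date, timedelta


def find_missing_days(loaded_days: set[date], start: date, end: date) -> list[date]:
    """Return every calendar day in [start, end] with no loaded data.

    `loaded_days` is assumed to be the set of distinct load dates already
    present in a daily-partitioned table (a hypothetical input shape).
    """
    missing = []
    day = start
    while day <= end:
        if day not in loaded_days:
            missing.append(day)
        day += timedelta(days=1)
    return missing


# Example: May 3 and May 5 were never loaded.
loaded = {date(2023, 5, 1), date(2023, 5, 2), date(2023, 5, 4)}
gaps = find_missing_days(loaded, date(2023, 5, 1), date(2023, 5, 5))
print(gaps)
```

In practice the same comparison would usually run as a SQL query against the warehouse; Python is used here only to keep the sketch self-contained.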

3. Historical data is a compounding asset

Data is becoming a product. One of the first things that software product teams do is install analytics (à la Segment, Amplitude, Google Analytics). Michael Seibel of Y Combinator recommends analytics from Day One: "You can't be sophisticated about building your product... it's a prerequisite".

Analytics products provide a historical record of usage to answer questions like: how do users use our product? How does the present moment compare to the past? What is the impact of this feature?

Analogously, we see data observability capabilities such as historical baselines (e.g., row counts over time), anomaly detection, and incident reporting, to name a few, unlocking new powers for data teams and helping them prove the value of their work to the larger organization.
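To make "historical baselines" and "anomaly detection" concrete, here is a minimal sketch, assuming you already record a table's daily row counts. It uses a plain z-score test against the historical mean, a deliberately simple stand-in: it is not how any particular observability product (Metaplane included) detects anomalies, and production systems often need to handle trends and seasonality as well.

```python
from statistics import mean, stdev


def is_row_count_anomaly(history: list[int], today: int, threshold: float = 3.0) -> bool:
    """Flag today's row count if it lies more than `threshold` standard
    deviations from the mean of the historical baseline (simple z-score test)."""
    if len(history) < 2:
        # stdev() needs at least two points; without a baseline, don't alert.
        return False
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any deviation at all is unusual.
        return today != baseline
    z_score = abs(today - baseline) / spread
    return z_score > threshold


history = [1000, 1020, 980, 1010, 990]  # hypothetical daily row counts
print(is_row_count_anomaly(history, 1005))  # a normal day -> False
print(is_row_count_anomaly(history, 400))   # a sudden drop -> True
```

The `threshold` of 3 standard deviations is an assumed default; lowering it catches subtler changes at the cost of more false alarms.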

The good news is that, like data itself, once you begin collecting metadata, it becomes a compounding asset. Each day imparts a new data point for richer historical comparisons and increased statistical power. That leads to a better understanding of and trust in your data.
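One way to quantify the growth in statistical power, assuming the daily values of a metric are roughly independent with a stable standard deviation σ, is the standard error of the baseline mean over n days. Quadrupling the history from one week to four halves that standard error, so the baseline you compare against becomes more precise:

```latex
\mathrm{SE}(\bar{x}_n) = \frac{\sigma}{\sqrt{n}},
\qquad
\frac{\mathrm{SE}(\bar{x}_{28})}{\mathrm{SE}(\bar{x}_{7})}
  = \frac{\sigma / \sqrt{28}}{\sigma / \sqrt{7}}
  = \sqrt{\frac{7}{28}} = \frac{1}{2}
```

Real metadata is often autocorrelated and seasonal, so treat this as an idealized illustration rather than a guarantee.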

4. Move fast, without breaking things

In the pre-observability world, it was common for engineering teams to deploy infrastructure with simple heartbeat checks. This all changed in the early 2010s. Just as product teams install analytics from day one, one of the first products used by modern engineering teams is an observability product like Datadog, used in tandem with infrastructure platforms like AWS.

These observability products granted detailed visibility into all aspects of infrastructure, so teams could proactively detect system degradation and have the metadata (“traces”) needed to debug. This cut down on time along four dimensions: time to identify an issue, time to diagnose the root cause, time to fix the issue, and finally the time to verify the fix. All of these steps roll up into time-to-resolution.
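The roll-up above is simple arithmetic: time-to-resolution is the sum of the four phases. The sketch below uses hypothetical field names and illustrative hours (not measured data) to show why detection matters: when an issue goes unnoticed for days, the identify phase can dominate the total even if diagnosing and fixing are quick.

```python
from dataclasses import dataclass


@dataclass
class IncidentTimeline:
    """Hours spent in each phase of a single incident (hypothetical fields)."""
    identify: float
    diagnose: float
    fix: float
    verify: float

    def time_to_resolution(self) -> float:
        # Time-to-resolution rolls up all four phases.
        return self.identify + self.diagnose + self.fix + self.verify


# Illustrative numbers only: the same fix, found by a stakeholder vs. by a monitor.
found_by_stakeholder = IncidentTimeline(identify=72, diagnose=4, fix=2, verify=1)
found_by_monitor = IncidentTimeline(identify=0.5, diagnose=1, fix=2, verify=1)
print(found_by_stakeholder.time_to_resolution())
print(found_by_monitor.time_to_resolution())
```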

Teams don’t just want to resolve individual issues faster; they also want to reduce the frequency and severity of issues. With more observability, the root causes of issues can be identified and fixed, so engineers spend less time debugging and more time on the work they actually want to do.

Zooming out a bit, fewer and less severe issues mean not only more time but also fewer distractions. In contrast, many of us are familiar with teams that spend significant amounts of time reacting to issues, and only once they are smothered with `p0`s do they prioritize data quality.

5. Prioritize what matters

The last benefit is that you have as much information as possible for the most important decision of all: how will you spend your time?

How do you decide what to work on when you don’t know what the baselines or bottlenecks are? For example, without knowing the runtimes of your dbt DAGs, how do you know which models need to be optimized? Without knowing which tables experience freshness issues, how do you know which transformations to prioritize? Without knowing which dashboards matter most, how do you know which data quality issue to tackle first?
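As a minimal sketch of letting metadata set priorities, assume you know when each table was last loaded and how stale each one is allowed to get. The hypothetical `overdue_tables` function below returns the tables that have breached their freshness expectation, most overdue first. Table names and thresholds are illustrative, and in practice you might also weight the ranking by how many important dashboards depend on each table.

```python
from datetime import datetime, timedelta


def overdue_tables(
    last_loaded: dict[str, datetime],
    max_lag: dict[str, timedelta],
    now: datetime,
) -> list[tuple[str, timedelta]]:
    """Return (table, how_overdue) pairs for tables whose latest load is older
    than their allowed lag, sorted with the most overdue table first."""
    overdue = []
    for table, loaded_at in last_loaded.items():
        lag = now - loaded_at
        # Assumed default: tables without an explicit expectation may lag one day.
        allowed = max_lag.get(table, timedelta(days=1))
        if lag > allowed:
            overdue.append((table, lag - allowed))
    return sorted(overdue, key=lambda pair: pair[1], reverse=True)


now = datetime(2023, 5, 15, 12, 0)
result = overdue_tables(
    last_loaded={
        "orders": datetime(2023, 5, 15, 11, 0),
        "events": datetime(2023, 5, 13, 12, 0),
        "users": datetime(2023, 5, 14, 6, 0),
    },
    max_lag={
        "orders": timedelta(hours=2),
        "events": timedelta(hours=6),
        "users": timedelta(hours=12),
    },
    now=now,
)
print(result)
```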

Put another way, data teams help other teams prioritize their work. But what about our own work? Spending our time putting out the latest big fire is not only unenjoyable, but also unproductive and unsustainable.

Five Reasons to Prioritize Data Observability: reasons range from the immediate cost of lost data to the more abstract, but equally important, need to maintain trust in data.

Why wait?

Phew! Those are five reasons why you shouldn't wait to prioritize data observability. Two are focused on what you lose: both trust and data are easy to lose and hard to regain. Three are focused on what you gain: historical metadata is a compounding asset that helps you move faster and prioritize your work.

But let’s play devil’s advocate. Companies have shared a few lukewarm justifications for waiting, such as:

Waiting for higher data adoption. Rebuttal: lack of trust is a common reason for lack of adoption
Limited budget. Rebuttal: there are both open source and extremely affordable commercial options
Limited bandwidth. Rebuttal: lack of observability is often what limits bandwidth in the first place

We’re biased, but none of these reasons are strong enough to justify a modern team waiting. It’s akin to a software engineer waiting for an application to go down before installing Datadog. That would be ridiculous.

But there is one compelling reason to wait: if no piece of your data stack is in place yet. With the advent of the modern data stack, many teams are in the process of piecing together or migrating toward an ELT-warehouse-transformation chain with reverse ETL and BI at the end.

We typically recommend waiting until one piece of the migration is complete, otherwise alerts will be overwhelming and the historical metadata you collect will quickly turn stale.

What now?

We like to boil down the importance of data observability into one question: are any teams making business or product decisions based on the data your team ingests, stores, transforms, or visualizes? If the answer is “yes”, then adding observability is a requirement for any modern data team.

We’ve been there: sometimes teams don’t have enough bandwidth and wait until they are inundated with p0 issues before they prioritize data quality. We, as data engineers, need to get ahead of data debt and build reliable, resilient systems so that we can empower our teammates to make accurate, business-improving decisions every day.

If you aren’t convinced of the importance of data observability, we’d love to hear your opinions and take you out for a virtual coffee. If you are, now is a better time than ever to explore any of the many open source or commercial tools in the space.

For high-leverage teams that want a fully managed solution with end-to-end lineage that doesn’t cost more than your warehouse, we’d love to give you a demo of Metaplane.