LIVE EVENT

What exactly is data observability?

During this live event, Kevin Hu and David Jayatillake cover the hard questions people ask most, like:

  1. What exactly is meant by the term data observability?
  2. What isn't data observability?
  3. How is data observability different from...
    a. software or infrastructure observability?
    b. data quality?
    c. data monitoring?
  4. Why do we need yet another term?
  5. How can you explain data observability to different audiences?

TL;DR Summary

  1. Data observability is the degree of visibility into the state of data at any given time.
  2. Data observability is different from data quality, monitoring, and software/infrastructure observability.
  3. The four pillars of data observability are metrics, metadata, lineage, and logs.
  4. Data observability can address various use cases, including data quality, spend monitoring, usage analytics, and preventing data issues.
  5. It is important for different audiences to understand the value of data observability and how it can address specific problems within a data system.
  6. Data observability is essential due to the fickleness of data, its heavy nature, and the time it takes to remediate issues.
  7. Usage of data has been underrated, but it is becoming increasingly important to understand which dashboards are in use and to optimize views and tables.

Event Transcript

Introduction [00:00:00]

Kevin: Hi everyone! Welcome to our second live data chat and first of the year. Last time, David and I had a lot of fun. This is really an extension of our one-on-ones, right David? It literally goes from our one-on-one into a webinar.

David: Happy New Year everyone. Yeah, absolutely. This is definitely a reflection of what Kevin and I just talk about anyway.

Kevin: I swear we talk about company stuff too, but usually we talk about data stuff and the recent posts that David has written.

But this time around, we're going to turn the tables a little bit. David is going to ask me questions about what exactly data observability is.

The motivation is: this is a question that we ask ourselves a lot. And we'll not leave this call until everyone has an answer to that question. And once you do, we'll end the call. So I'll pass it over to David.

What exactly is meant by the term data observability? [00:00:58]

Summary: Data observability is the degree of visibility you have into the state of your data at any given time, and it's important to understand the state of your data relative to the goal that you wanted to accomplish. Good data observability means you have the visibility to get ahead of data issues beforehand, while bad data observability means the first time you hear about a data issue is when a stakeholder bumps into it.

David: There's very few people out there who can answer this question as well as Kevin. What exactly is meant by the term data observability?

Kevin: Data observability is the degree of visibility you have into the state of your data at any given time.

I use that definition very intentionally because there are three key pieces there: data, state, and visibility.

Companies collect data, individuals collect data in order to represent the real world, where data is some approximation of what is happening and what has happened, and in some cases what will happen.

However, data is not a full representation of the entire world. It's not even a full representation of the entire world that your business cares about. And it's only an approximation at any given point in time.

Things in the world happen continuously, whereas data sometimes might be batch, sometimes it might be streaming, but even streaming data doesn't collect it down to the quantum time.

It's important then to discuss: what is the state of your data? Because ultimately data is meant to serve a purpose within businesses. Businesses at the highest level: they make money, they don't lose money, and they try and mitigate risk.

There are different use cases you can encourage and support in order to drive those three levers, and different technologies and teams you can put in place to support those use cases, which we can get into another time.

But it's important to understand: what is the state of your data relative to the goal you wanted to accomplish? For example, is it fresh to the hour? Does it abide by the definitions that your company cares about and the mental models of the world in everyone's brains? That's what we mean by the state.

David: So this corresponds really closely to like SLAs and those kind of governance terms.

Kevin: Exactly. Because everyone has different expectations for the state of data, and as a result, it's important to get some accountability, but even before accountability, some visibility into that state.

Your data is in a certain state regardless of what you know about it. But the visibility that you have as a data leader and that the company has into the data is a little bit decoupled.

If your data's fresh to the hour, do you know? Or is it fresh to 30 minutes or is it two days delayed? Are these dashboards being used? If so, by who and when? Or if they're not being used, which ones aren't being used?

That's why we define data observability as the degree of visibility you have into the state of your data at any given time. And to push that a little bit further, there are a couple of different perspectives you can take on it.

One is: how many questions can you answer about your data? If you're trapped in a room with someone interrogating you about your data, how many questions can you answer about it? Are you modeling this concept that a business team cares about? When was the last time this data was refreshed? Does it meet the quality that you expect? How is one table connected to other tables via lineage?

Are the people who are using this data using it in the ways that you expect? This is related to the idea of information content. In a way, the information content of anything is the number of coin flips, the number of yes-or-no binary questions, you need to answer.

If you have a box full of a gas at one temperature, you don't need that many questions, but if something is highly ordered, you need a lot of questions in order to describe the state of the system; in other words, it has high information content.

A third perspective is: what does good data observability look like versus what does bad data observability look like?

Some symptoms of bad might be: the first time you hear about a data issue is when a stakeholder bumps into it, or you find out that a job is delayed because a dashboard is not refreshing. In contrast, good data observability means you have the visibility to get ahead of these questions beforehand.

David: So these are good outcomes. Is this related to what you can do with your knowledge from data observability, whether it's good or bad?

Kevin: Yes, it's definitely related to what you can do. The simplest case is that the state is not what you expect, it's not what your stakeholders expect, and you have to fix it. It's incident management in that way.

But there's plenty of other things that you can do with data observability, like: how is my data being used? Can I deprecate some tables to reduce my warehouse spend? Can I deprecate some dashboards so we don't have 10,000 dashboards to maintain, but only 9,000? There's a whole slew of questions that boil up into helping your team ensure trust and make the best use of your time.

What isn’t data observability? [00:06:37] 

David: What isn't data observability?

Kevin: It's a tough question to answer, because we vendors, at least, say that data observability is everything under the sun. It will catch data quality issues and it will wash your dishes for you.

But there are three key terms in the world that are typically conflated with data observability that I want to pull apart.

One is data quality. Data observability is different from data quality, it is different from data monitoring, and it is different from software or infrastructure/DevOps observability like Datadog and Splunk and SignalFx.

Data quality vs. data observability [00:07:22]

David: That's really interesting that you've chosen those three or four areas to contrast. Let's delve into them. How would you say data observability is different from data quality?

Kevin: Data quality is a problem that real people face. Data observability is not a problem.

Heads of data, and there are many heads of data in the world, don't wake up in the morning thinking: "I have a data observability problem." They wake up, unfortunately, thinking "I have a data quality problem and I have to fix it." My VP of Sales Slacked me at 4:00 AM yesterday because they have a board meeting this morning, and Looker isn't refreshing. It's not showing what they know is the ground truth when it comes to sales performance.

But although data quality is a problem and data observability is not, data observability is a technology that can be used to address many different use cases.

So when we return to data observability being the degree of visibility you have into the state of your data at a given time, data quality is one problem that can be addressed when you're sitting on top of this mountain of metadata and you have intelligence or some analysis on top of it.

It could be anomaly detection, the kind that we support at Metaplane. It could be showing it in a dashboard, it could be putting it in a table. Having this metadata can help you in the case of data quality on every step of your incident management journey, from identification / detection to resolution and remediation.
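
To make that concrete, here is a minimal sketch of what anomaly detection on collected metadata can look like. This is an illustrative toy in Python, not Metaplane's actual approach; the table and the thresholds are hypothetical.

```python
import statistics

def is_anomalous(history: list, latest: float, z_threshold: float = 3.0) -> bool:
    """Flag the latest value if it falls far outside the historical distribution.

    `history` is a series of previously observed metadata values,
    e.g. daily row counts for one table; `latest` is today's value.
    """
    if len(history) < 7:  # not enough history to judge
        return False
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) / stdev > z_threshold

# Hypothetical daily row counts for an `orders` table.
row_counts = [10_120, 10_340, 10_280, 10_410, 10_190, 10_370, 10_300]
print(is_anomalous(row_counts, 4_250))  # True: today's load looks wrong
```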

But there are ways to solve data quality issues without data observability: you could throw lots of people at the problem, and that's what companies have been doing for a very long time, and it does work. It's just not super efficient, because you're trawling through your data stack, typically, trying to find this metadata to begin with.

To be fair though, metadata is not the entire story. There's a lot of knowledge in people's heads about lineage and usage and logs and different metrics, which combines with the metadata collected through data observability to help with incident management.

Data monitoring vs. data observability [00:09:58] 

Summary: Data monitoring and data observability address the same problem through the same means, but monitoring is explicit about what you watch, while data observability takes a greedy approach, ingesting as much metadata as possible to catch unknown unknowns and trace issues back to their root causes.

David: The second term you mentioned was data monitoring. And I feel that's even a more difficult distinction to make, like data monitoring versus data observability.

Kevin: This is less clear cut. Data monitoring and data observability solve the same problem through the same means. They want to help you gain visibility by means of collecting, centralizing, and analyzing metadata.

In that way, they can be used interchangeably. However, you have to be aware that there are different connotations. With data monitoring, and David, you've written about this, you're typically very explicit about what you're trying to monitor. Like: I want to monitor this specific business metric.

I want to monitor the freshness of this specific table and be alerted if something goes wrong, for example. In contrast, data observability takes a more greedy approach. We want it all, right? As much metadata as you can give us, we will take it and ingest it. And this is important because you can't foresee every issue that happens.

Yes, it's important to monitor your most specific potential root causes for data quality issues, your most expensive queries, the usage of your most important tables. But there are so many unknown unknowns in the data. It's hard to tell in advance. Once it happens you're like, "ah, shoot, I should have been monitoring that."

That is what data observability helps with, and going one level deeper, even though you're monitoring a specific thing for its state, if something goes wrong, oftentimes, it's not that end table, that gold table, as you would say David, that went wrong. That data comes from somewhere.

Ultimately, some computer put it in or some human put it in, or perhaps an AI put it in. And if you're not monitoring those root causes, you're gonna have a more challenging time debugging. It's not impossible, but it's useful to have those breadcrumbs.

Software observability vs. data observability [00:12:18]

Summary: Software observability tools such as Datadog and New Relic provide visibility into the state of software and infrastructure, while data observability focuses on ensuring the accuracy and quality of data, even when all other metrics and tests appear to be passing. The two are complementary and necessary for a complete picture of system health.

David: We had an interesting question come in from Renee Cooper. She asks: "so similar to how might one set up specific alerts in Datadog or New Relic versus just overall observability via these tools?" She's asking in the context of comparing software monitoring to observability.

Kevin: Exactly. So if you have a database, let's say on AWS, you should probably set up alerts on the CPU utilization of that instance.

But that might not necessarily be the thing that goes wrong when prod goes down. It might be something much more nuanced, like a caching issue or free storage space. You should probably be monitoring that, but that is a good contrast.

Are you setting up a specific alert on a key database health metric, versus using a tool like CloudWatch or Datadog or New Relic that just scoops up everything?

That's so useful when prod goes down: I want to know more than that the CPU utilization went to a hundred percent. I want to see everything. So that is a great comparison.
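
To spell out that contrast, a single targeted alert of the kind Kevin describes might look like the boto3 sketch below, while an observability agent would ingest every metric the instance emits rather than just this one. The instance ID and SNS topic are placeholders.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# One explicit alert on one known-important metric: CPU utilization of a single instance.
cloudwatch.put_metric_alarm(
    AlarmName="prod-db-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    Statistic="Average",
    Period=300,               # 5-minute windows
    EvaluationPeriods=3,      # sustained for 15 minutes
    Threshold=90.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:oncall"],  # placeholder topic
)
```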

David: Moving on to the third term that you mentioned, which was software or infra observability, and that kind of links nicely to Renee's point about Datadog and New Relic. How is data observability different from those things?

Kevin: Let's ground ourselves first: what do the software / infrastructure / DevOps observability tools do? Renee, you mentioned Datadog and New Relic. There are many more, and many more to come as well, and I'll describe them as helping you gain visibility into the state of your software at any given time.

Companies are developing software, putting that software on boxes in the cloud, and software and engineering teams want to know the state of those boxes and the connections between them before a customer is the one who finds the issue, or before an end user becomes your alerting suite.

If we go back in time before these cloud observability tools, to 2010, I very clearly remember shipping a Ruby on Rails server, putting it on an EC2 box, putting on a heartbeat check, putting on Pingdom, and then calling it a day. People hadn't heard of observability as a concept.

Slowly, we've arrived at this point, a decade later, where it's almost blasphemous not to have a tool like Datadog or New Relic. Once prod goes down once, you put agents all over your Kubernetes cluster.

The key difference is the distinction between software and data. Your software and your infrastructure can all be green, but your data can still be incorrect. Many of us in this chat, you and I, David, have been in that situation: Snowflake is good, dbt jobs are good, Looker and Fivetran are all running, but the data is just incorrect.

So the two are orthogonal in that way, and only together do you have a complete picture of "are our systems green?" And of course the two are related, right? If your Snowflake goes down, your data quality is compromised, not that Snowflake goes down very often.

Going one level deeper, there are the differences between software and data. What does it take to describe the state of software? You have metrics, traces, and logs. Those are the three pillars that are being pushed all the time. But it's a little bit different in our world, where instead of traces, we have lineage. Data comes from somewhere, and we want to trace it back to its ultimate origin point.

The ingredients that go into describing your state are different. And of course the workflows and the teams are different, which is why our category can exist at all. If it were easy to use, for example, Datadog for observing your data, teams could and should be using that enormous company's product as opposed to products from companies in an emerging category.

David: I've totally explored the possibility of everything being green, and if you extend that to things like dbt tests and data contracts, they can all be passing, but your data can still be wrong. And that's where data observability comes in.

The pillars of data observability [00:17:11] 

Summary: The pillars of data observability are metrics, metadata, lineage, and logs. David and Kevin also mention the various use cases for data observability, such as data quality monitoring, spend monitoring, and helping prevent data issues.

David: Erik Edelman, a good friend of ours, has popped in with a question: "in my mind, monitoring for quality issues is just a part (one major pillar) of what I consider to be data observability. You may have already touched on this a bit, but what other pillars of observability do you think there are? We know lineage is helpful even when there's no issue to report on, but how do you talk about or refer to this or other parts of observability?"

Kevin: This is a great question. I would approach it from: what are the pillars that go into data observability? And also what are the use cases that are supported by data observability?

Like Erik mentioned, one of the ingredients is lineage. The way we think about it is that there are four in total: metrics, metadata, lineage, and logs. To separate those out a little bit: when we're describing any system, we want to describe its internal properties, its external properties, its internal relationships, and its external relationships.

Similar to how if you have a bottle of water, there are internal properties like its temperature that don't depend on how much water you have. Conversely, there are characteristics of this water, like the amount of water in my bottle, that don't impact its internal characteristics.

The same is true for data, where you have metrics that describe, for example, the distribution of values in this revenue column. You have metrics that describe the number of enums in this column, and that doesn't depend on how much data you have.

To summarize, we call those metrics. Then you have externally observable metadata, like how many rows there are, how fresh the data is, and what the schema of this table is. So those two together describe the properties.

Then you also have the relationships, where, for example, you have lineage. How does this data that I'm looking at come from the source, and how does it trickle all the way down? These are the internal relationships within data, as well as any constraints you might have: JOIN relationships, primary key definitions, and so on.

And you have the external relationships: who's consuming this data? Which queries are hitting this table? When? And by who?

These four taken together, metrics, metadata, lineage, and logs, are what we consider to be the four pillars of data observability. It's a little bit different from what Erik was referring to, where there are different use cases that data observability can address, which is totally true.

Data quality is one piece of it. There's spend monitoring: for example, when you're pulling in query logs, is your spend increasing, or is it anomalous for your P99 queries? There's usage analytics: how often are different tables and dashboards being used? There's helping prevent data issues to begin with, instead of just detecting and remediating them. There's helping data teams get an internal map, an internal awareness of how the data is being used... like Erik mentioned, lineage is a great way to do this.

The list goes on, and when you add these all together, the goal is the same. Help your team gain leverage with as much information as possible.
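
One way to picture the four pillars together is as the record an observability tool might assemble for a single table over time. The sketch below is a hypothetical structure for illustration only, not Metaplane's data model; every name in it is made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class TableObservation:
    """A point-in-time snapshot of one table along the four pillars."""
    table: str
    # Metrics: internal properties of the data itself (distributions, null rates, enum counts).
    metrics: dict[str, float] = field(default_factory=dict)
    # Metadata: externally observable properties (row count, freshness, schema).
    row_count: int | None = None
    last_updated: str | None = None
    table_schema: dict[str, str] = field(default_factory=dict)
    # Lineage: internal relationships (where the data comes from and flows to).
    upstream: list[str] = field(default_factory=list)
    downstream: list[str] = field(default_factory=list)
    # Logs: external relationships (who queries it, when, and with which role).
    recent_queries: list[dict] = field(default_factory=list)

obs = TableObservation(
    table="analytics.revenue",  # hypothetical table
    metrics={"revenue_null_rate": 0.002, "revenue_mean": 182.4},
    row_count=1_204_553,
    last_updated="2023-01-12T06:00:00Z",
    table_schema={"order_id": "varchar", "revenue": "number"},
    upstream=["raw.orders"],
    downstream=["looker.revenue_dashboard"],
)
```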

David: Totally agree with those pillars, but the lineage pillar is really interesting because lineage is often defined by code, like whether that's code in your orchestrator or code in dbt.

There's that almost vertical component, which is: what is it? Where is the data going? But then there's also this horizontal component of the state of that lineage, and how that lineage has changed because of code changes.

And I think observability tools will start observing the state of the lineage as it's changed by code changes.

Kevin: A hundred percent. We should have mentioned that earlier. Everything that you're monitoring, it's over time. Data is interesting because it changes. If data didn't change, it wouldn't be particularly interesting. So as a result, all the characteristics of the data change over time. I love the way you put that.

Why do we need another term? [00:21:23]

Summary: The term "data observability” emerged due to advancements in technology and the need to organize and communicate effectively about data quality issues. While the term may be fuzzy and have different definitions, it has practical applications for data leaders and can summarize various connected terms into a high-information content and high-bandwidth term.

David: Why do we need yet another term, data observability?

Kevin: There's three takes here. I'll start with the most pessimistic and get to the most optimistic.

The pessimistic one is that, with low interest rates, you had a bunch of companies raising a bunch of money during the frothy VC era, and now they're trying to create a new category when it shouldn't have existed, because there's lots of returns to creating a category. There's a component to that, I'm not gonna lie. I'm not gonna throw the first stone.

But the two other perspectives are: one, technological, where data quality issues... they're not new. They're extremely old. Even with IBM Z-series mainframes, you'd better believe there were periodic SQL checks against those.

The thing that has changed in the past few years is that it is more and more feasible to collect metadata, both from your data warehouse as well as from other tools in your data ecosystem. Through a combination of the decreasing cost of compute and storage, better API support, and better metadata natively available within these systems, it's feasible to get all this metadata without impacting your core data warehouse workloads.
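
As an illustration of how lightweight this collection can be, the sketch below assumes a Snowflake warehouse and the snowflake-connector-python client; credentials and names are placeholders. It reads freshness and volume from the information schema, so it never scans the underlying tables.

```python
import snowflake.connector

# Placeholder credentials; in practice these come from a secrets manager.
conn = snowflake.connector.connect(
    account="my_account", user="metadata_reader", password="***",
    warehouse="METADATA_WH", database="ANALYTICS",
)

cur = conn.cursor()
# ROW_COUNT and LAST_ALTERED live in the catalog, so this is metadata-only work.
cur.execute(
    """
    SELECT table_schema, table_name, row_count, last_altered
    FROM information_schema.tables
    WHERE table_type = 'BASE TABLE'
    """
)
for schema, table, row_count, last_altered in cur.fetchall():
    print(f"{schema}.{table}: {row_count} rows, last altered {last_altered}")
```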

When it was expensive to run these data quality checks and doing so made it hard to load a dashboard, you'd prefer loading the dashboard over running the check.

So the underlying tech has changed, but the third reason, which I'm most excited about, is practical. At the end of the day, language is a tool to help you think and communicate. And if a term doesn't help, it shouldn't exist. So the question is: does data observability help you think?

From my experience working and talking with hundreds of data leaders, there's a downside to the term, which is that it's fuzzy and a little bit amorphous. It's hard to get, and that's no fault of the people hearing the term, because it is fuzzy and there are plenty of different definitions out there.

But what I've learned is that when data leaders say, "oh, okay, I get it, I'm going to start communicating to my team and to the rest of the company in terms of data observability," it becomes a powerful way to organize your own thoughts and your own conversations.

It summarizes many different connected, but abstract, terms, like the state of your data and how it evolves over time. It's a mouthful to say all of that, but when you can put it into one neat box, it becomes a very high-information-content, high-bandwidth term. So almost all of our customers now use "data observability" the way they'd talk about a database or about ELT: I'm talking about data observability, and I get it.

David: That's similar to my experience. I've been speaking to a number of data leaders in the space, and when you ask them what data observability is, either they know because they're used to talking about it from that vendor space, or they come up with their own idea of what they think it is.

And often I agree with the idea, even though it's slightly adjacent to what we are trying to do. But maybe it's just that the surface area is bigger than we realize.

Kevin: Good point, which is another downside and upside of the term, right? Of course, when something means everything, it means nothing. But those recurring themes, for example monitoring spend and monitoring usage... they do fit under the umbrella of visibility. They're important questions to answer.

David: The number of times I've heard someone mention like log aggregation as well, because it's quite common for software observability tools to do that, and so they're expecting that. It's not wrong, it's just not on the menu today.

Kevin: I'm curious, in those conversations, are there some themes that come up? When people mention use cases or pieces of information that we don't have, what are those typically?

David: The big theme of where there's like a divergent thought is: they expect data observability to be this control plane into their data, where they get to see everything that's going on and all the different alerts from different systems together.

I think that's a natural thing that data observability tools will also do in the future. But that's the thing when they're thinking about data observability and what they would want from that term. So if this is the term I'm gonna buy, what do I want from it?

I think that's the most common thing I hear from them that I would say data observability doesn't quite cover today.

Kevin: Who are we to say what should be supported? Ultimately the truth lies in the problems that real people face. Both of us have worked in data for a very long time, but the moment we crossed over the fence to be a vendor, we are not the people in the trenches doing the work anymore.

The best thing we can do, I feel, is take the concepts and the problems that real people are facing, aggregate them in a useful way, and then support them in a tool at scale.

How do you explain data observability? [00:26:48]

Summary: Kevin and David discuss how to explain data observability to different audiences, such as technical or business-oriented individuals. They highlight the importance of grounding the conversation in use cases that each party can relate to and emphasize the collection of metadata to identify and address data quality issues before they become significant problems.

David: How can you explain data observability to different audiences?

Kevin: It depends on the audience. There's a couple of branches in that tree, where the first branch is: is this person technical or are they on the business side of the house? And if this person is technical, then are they engineering adjacent? For example, have they heard of a tool like Datadog?

If they've worked in an engineering org or frequently talk to engineers, I would just say data observability is what Datadog does for our software, but for data, and reinforce the point that data can go wrong while software is all green. Almost all of the time, it's a light bulb moment.

If someone is technical and they haven't heard of Datadog, then I would ground it in use cases: data quality issues, monitoring spend, understanding lineage... these are problems that you have faced, for example, as an analytics engineer.

But they all have the same ingredients in common: they collect all this metadata about your data systems over time and surface or analyze it in different ways.

It's like burritos and tacos and enchiladas. They all have beans, rice, and cheese. You're just assembling it in slightly different ways. But at the end of the day, the question is: how can you get these bulk ingredients into your kitchen?

Those two approaches are for someone technical. If someone is on the business side of the house, I would recommend communicating it a bit differently, and importantly, having the conversation in terms that every party understands, grounded in use cases that everyone has faced.

For example, if you're talking to a VP of Sales about data observability, their eyes might go in two different directions. My eyes are already going in two different directions when I hear that.

This applies to communicating with stakeholders in general... what are the problems that someone is facing? This Looker dashboard is not up to date. The number in that dashboard is not what your SDR put in. That is a data quality issue.

We can address that data quality issue by collecting information about the data, so that we as a data team know if these numbers are wacky or this table is stale, and so that we can find that out and remediate it, many times before you even hear about it. And this is made even more explicit with teams and use cases that interact directly with end users, for example, marketing teams. Do you want to send out an email campaign that has the wrong usage data, or that goes out to the wrong audience?

Of course you don't. And how do you address that? By getting data about your data, by trying to understand beforehand whether this audience is correct or this usage data is correct. One way to do that is to collect all this metadata over time, hence data observability. So I would approach it from that perspective.

Would you add anything to that?

David: I also think about it from this angle: within that technical space, there are data people and non-data people, who are probably in what you might call tech or IT or something else. For data people, I often talk about the problems that they would encounter.

It's this typical thing, and something that's resonated with people today even: like Keith mentioning that a stakeholder being the first to notice a problem is definitely an excellent definition of bad observability.

Let's go through that bad observability use case and how this helps you avoid it, or at least reduce how often it happens.

It's almost like explaining the value of it outright, and then showing how we achieve that value. That's the way I've explained it to some people as well.

Kevin: Which leads to the question: how do you show the ROI of data observability?

We should probably talk about that another time, especially in this economic condition. If teams are being challenged with an important task of showing ROI on data, it's even harder to show the ROI on data quality and data observability.

Even last week I said data observability needs to pay its way. So yeah, we could definitely make that a topic.

Writing test suites manually [00:31:39] 

Summary: While having a test suite is necessary, it's not practical to maintain a test for every column and subset of data, as data is constantly changing and can come from various sources, making detection and resolution of issues time-sensitive.

Kevin: Does anyone have any other questions?

David: Amit mentioned that if everything, like data infrastructure and contracts, is green and the data is still wrong, we need to think about what data quality checks we have and how we can best monitor our data.

That mindset usually comes from software: when you have a bug, you have an incident, you work through the bug, and then you probably implement a few more tests to prevent the bug from ever happening again, or to detect it in development should something cause it again.

There is an element of that you should do in data too. You find this bug and it's: yes, let's make a dbt test, let's make a Great Expectations test, so we're making sure this doesn't happen again. And that could be because of something you might be doing in development, in data engineering or analytics engineering, that could cause that problem.

But I also think that there are things you just can't write a test for, or that it's unreasonable to write a test for every situation, like the split of values in an enum or categorical column. Do you want to write tests for the exact split and the expected percentage range of, say, how much of the marketing channel column is paid marketing?

Usually it's between 25 and 35% of the values. Oh, today it's 39%: statistically too high. There's a limit to what you can write tests for, because otherwise you build this giant test suite that you have to maintain, which is a huge amount of work if it's a big test suite.
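
David's example translates into roughly the kind of check below, a hedged pandas sketch with made-up column names and thresholds. The pain he describes is that every one of these ranges has to be chosen by hand and then maintained as the data drifts.

```python
import pandas as pd

def check_channel_split(df: pd.DataFrame, channel: str = "paid",
                        low: float = 0.25, high: float = 0.35) -> bool:
    """Return True if the channel's share of rows falls inside the hand-picked range."""
    share = (df["marketing_channel"] == channel).mean()
    return low <= share <= high

# Hypothetical day of attribution data: paid is 39% of rows today.
today = pd.DataFrame({"marketing_channel": ["paid"] * 39 + ["organic"] * 61})
print(check_channel_split(today))  # False: 0.39 falls outside the 25-35% range
```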

Kevin: You're spot on that not only would you have to write a giant test suite, but no matter how large it is, it won't catch every issue. I'm not trying to be fatalistic here, I'm just trying to be realistic. You're not going to write a check for every single column and every single subset of a column in your data warehouse or across all your transformation jobs.

And you're not only writing it once: you're having to submit PRs every single time the data changes. And data does change; it's the change in data that makes it interesting and relevant for businesses. So when you're writing a large test suite, you're committing yourself to ongoing work as your data changes and as your data model changes.

I'm not saying that's not worth your time. You should definitely do that. We have dbt tests. But the two work hand in hand with each other, right? Explicit, critical, known issues versus unknown, dynamic issues. You have to cover both.

To make an analogy again: if you're developing software, you can't only have unit tests, you also need ongoing observability. You can't only have observability either; you should probably have unit tests too. So the two go hand in hand.

David: With software, the expected behavior is often much easier to know, because it can only take so many different inputs and produce so many different permutations of output.

Whereas with data: what's going to come in today? I don't know. What slight changes in the percentages of any given field could come in from natural inputs, like humans or other systems sending that data to you?

And I think that's one of the things that really makes data observability and software observability quite different.

Kevin: We could have a whole conversation about that. The fickleness of data: it's a double-edged sword.

The richness that makes data interesting and useful is also what makes it hard to manage, with data coming in from so many different sources. And there are other characteristics too, right? Data is not like software. It's heavy. You can't just cycle your EC2 instances on and off.

If something goes wrong, now you have to backfill your data. First you have to identify where the bad or missing data went and communicate with all of your stakeholders. Then you have to re-transform it and backfill it, which could potentially take a while. Every second that passes between the issue occurring and you finding and remediating it adds more and more time.

So detection time is highly correlated with time to resolution in our case.

Observability into dashboard usage [00:36:09]

Summary: Dashboard usage is important for incident management and optimization. David and Kevin suggest using metadata from APIs and query history logs to determine usage and reconcile with data in the warehouse.

Kevin: And we have one great question from Renee: "With regard to dashboard usage: if you have analysts and such all over the company who write their own dashboards, what kind of observability do you get into which dashboards are still in use (particularly when they are not all in one tool)?"

This is a very good question and I feel like understanding usage of data has been underrated in the past and will become increasingly important, especially in our new era where everyone is trying to prove ROI. This is something that we support in Metaplane, it's something that we've tried to support from day one for two reasons.

One is that when we detect an incident, the first question is: "does this matter?" If a table falls over in the forest and no one is using that table, you can probably punt it. In contrast, if this table is used by a Looker dashboard that the CFO is looking at on an hourly basis, that's P0. Usage is so important for determining how critical an issue is, and also who you have to communicate with in order to manage expectations and remediate.

The other use case is helping gain awareness of what's going on. When you ask the question "which dashboards are still in use?", it might be useful to refactor the views and the tables behind a dashboard to optimize, for example, Snowflake spend. But it's also useful in reverse: which dashboards are not in use?

If Erik Edelman is still on the call: Erik is a legend because, leading the data team at Vendr, he had a weekly cleanup time. Let's pare down the entropy: which dashboards and which tables aren't being used? I loved that approach, because data naturally sprawls, and that's not always a good thing.

But to actually answer your question, Renee: you can approach understanding dashboard usage by pulling in metadata from the dashboards themselves. Sigma, Looker, Tableau, they all have very powerful APIs for pulling in metadata and getting the lineage within these tools. For example, in Looker you have the LookML fields that you defined, you have dashboard elements that refer to those fields, and you have dashboards which contain those dashboard elements.

So you have that lineage, you have how often these BI elements are being used, and to complement that, on the other side of the coin, is the data in your warehouse. There you pull in logs like query history: what role and what user is issuing this query? Occasionally you'll have a query tag on it as well.

You can reconcile the two for higher accuracy of usage, and you can aggregate the queries into how often these tables are being hit in general. By looking at query history, you can also support BI destinations or really any destination: it could be reverse ETL, it could be a custom tool or workflow that does not have an API.
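
For the warehouse half of that, a rough sketch of aggregating query logs might look like the following, assuming Snowflake's ACCOUNT_USAGE.QUERY_HISTORY view and placeholder credentials; real tools parse the SQL rather than string-matching table names.

```python
import snowflake.connector

# Placeholder connection, as in the earlier sketch.
conn = snowflake.connector.connect(
    account="my_account", user="metadata_reader", password="***", warehouse="METADATA_WH",
)

cur = conn.cursor()
cur.execute(
    """
    SELECT user_name, role_name, COUNT(*) AS query_count, MAX(start_time) AS last_queried
    FROM snowflake.account_usage.query_history
    WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
      AND query_text ILIKE '%analytics.revenue%'  -- crude table-name match for illustration
    GROUP BY user_name, role_name
    ORDER BY query_count DESC
    """
)
for user, role, query_count, last_queried in cur.fetchall():
    print(f"{user} ({role}): {query_count} queries, last at {last_queried}")
```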

Those are the two ingredients, the rice and beans, so to speak, for dashboard usage. There's interesting questions about reconciliation, about changing your metadata model as dashboards evolve over time. But that's all like the salsa on top.

David: I like to think Erik's house is immaculate, it's got no duplicates, super minimal. Get rid of anything you don't need.

Kevin: Yeah, Marie Kondo your data: "does this data bring me joy?" I'm sure the answer is no for many of you.

David: How many dashboards bring you joy?

Kevin: That's a high bar right there for sure.

Fundraise and academia [00:40:19]

Summary: Kevin discusses his background in academia and how it has helped him prepare for his role as a founder in data observability. Additionally, he mentions their recent fundraise and their goal to expand their user base.

David: Obviously we've got some great news, but I'd love to ask you, Kevin: if you think about your background in academia, how does that relate to signal processing, how does that come back to when you describe lossy data, and how does that then feed into data observability?

If you could talk about that, and how that's helped prepare you to be a founder in data observability, and then obviously our good news from yesterday as well.

Kevin: What David is referring to is that yesterday we announced our fundraise. We raised from Khosla Ventures, Y Combinator, Flybridge, and a bunch of angel investors from various companies, and we're going to take Metaplane to the next stage.

Around 140 companies are using us today. How do we get that to a thousand, by both building the best product in the business and helping other people understand what the heck we do?

And it's a very fair question to ask. I think coming from academia, the biggest help is having practice with the perpetual slog with an uncertain return that is grad school, which is also what working on a startup is.

Keeping up your morale, building up your stamina to work on difficult problems is something that's helped a lot.

But your question is about the specific, technical aspects. For us at Metaplane, we have the privilege of thinking about data at a meta level, no pun intended, as the metadata plane, the Metaplane.

But data leaders, they have shit to do. They have meetings and they're interviewing and they're answering questions.

We're lucky that we get to think from a different perspective: what is the health of data, what is the state of data, and what visibility do we have into it? Thankfully, I did research once upon a time in machine learning and data visualization, and the two complement each other extremely well, especially when working on Metaplane, because, in some ways, machine learning is trying to distill the most important aspects of a signal down to some features.

It might not be explicit, but it's trying to distill the most important aspects of the data so that you can understand when that data is anomalous. And there are many aspects of enterprise and business data where you need a very finely tuned eye to look at them.

You can't throw an out-of-the-box time series analysis library, with assumptions about data updating every second, at data that updates every three hours.

But the other aspect of it is data visualization, where its essence is: how do we communicate complex information as quickly as possible? In Metaplane, the alert is only the start of the journey. The summary of usage is only the start of the journey. There's so many follow-up questions.

One common theme that we see in some of the best, most effective data visualizations in the world is that they not only answer a question, but that they get you to ask follow-up questions.

It's those follow up questions where the meat really is. And in our world, the follow up questions are the workflow. Okay, a table goes down, what do I do about it? Do you want a big block of text? Do you want a table with all your metadata, or do you want the most effective information to answer that question at any given time?

So it's kind of those two ways of communicating, and it's ultimately about communication that I think has helped a lot, in addition to knowing Python and slogging through the math. That is, on a day-to-day basis, less important than the ways of thinking that that slog, for better or worse, has helped me learn.

Closing [00:44:24] 

David: Thanks Kevin.

Kevin: Thank you everyone for joining. We hope that you left these 55 minutes with a better understanding of what exactly is data observability.

If we didn't do our jobs, we'll do it again. Stay tuned for next time. We'll probably talk about showing the ROI of data observability, how to implement it, maybe something like that.

David: Thanks for joining everyone, hope to see you next time.

Kevin: Take care. Happy 2023!
