Announcing Data Incidents

Incidents is a feature that automatically groups related failing tests for you while also presenting easy-to-digest, interactive summaries complete with potential root causes, aggregate downstream impact, usage, and more.

October 3, 2022

Co-founder / Engineering


Data quality incidents rarely occur in isolation. Because data is highly interconnected via logical operations, issues like silent source system failures often cause several downstream tables and columns to break, not just individual ones.

While monitoring for individual data quality failures is important for coverage, we want to synthesize these failures into data incidents: actionable stories for data practitioners (and business stakeholders!).

Incidents group related data quality issues based on issue type, lineage, and time correlation. This allows Metaplane to send highly actionable alerts and reminders, and provides a central place to view potential root causes and downstream impacts.

Grouping Related Data Quality Issues

Incidents are living entities: an incident is opened when a Metaplane test fails, and if related tests fail over time, each new failure is attached to the same incident. By grouping these failures together, Metaplane can automatically associate potential root causes and aggregate the downstream impact on other models, tables, columns, and business intelligence assets.

For example, if a source system does not replicate a schema to the production warehouse on time, several freshness tests for tables within that schema start failing. Metaplane can group these failing freshness tests together, making it clear that a source system has stopped replicating data. In addition to identifying this root cause, the platform can show aggregated downstream impact, such as the models, operational tools, or business intelligence dashboards that are now using out-of-date data.
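To make the grouping idea concrete, here is a minimal Python sketch. It is not Metaplane's implementation: the `Failure` and `Incident` classes, the `group_failures` function, and the six-hour window are all hypothetical, and a shared schema name stands in for real lineage. A failure joins an existing incident when the test type and schema match and the failure falls within the time window; otherwise it opens a new incident.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Failure:
    table: str          # fully qualified name, e.g. "raw.orders"
    test_type: str      # e.g. "freshness"
    failed_at: datetime


@dataclass
class Incident:
    test_type: str
    schema: str
    failures: list = field(default_factory=list)

    @property
    def last_failure_at(self) -> datetime:
        return max(f.failed_at for f in self.failures)


def group_failures(failures, window=timedelta(hours=6)):
    """Attach each failure to an incident with the same test type and schema
    whose latest failure is within `window`; otherwise open a new incident.
    The matching rule and the window are assumptions for illustration."""
    incidents = []
    for failure in sorted(failures, key=lambda f: f.failed_at):
        schema = failure.table.split(".")[0]  # schema as a stand-in for lineage
        for incident in incidents:
            if (incident.test_type == failure.test_type
                    and incident.schema == schema
                    and failure.failed_at - incident.last_failure_at <= window):
                incident.failures.append(failure)
                break
        else:
            # No related incident found, so this failure opens a new one.
            incidents.append(Incident(failure.test_type, schema, [failure]))
    return incidents


# Made-up example data: two stale tables in one schema, one unrelated failure.
t0 = datetime(2022, 10, 3, 8, 0)
failures = [
    Failure("raw.orders", "freshness", t0),
    Failure("raw.customers", "freshness", t0 + timedelta(minutes=30)),
    Failure("analytics.revenue", "row_count", t0 + timedelta(hours=1)),
]
incidents = group_failures(failures)
```

With this sample data, the two freshness failures in the `raw` schema end up in one incident, and the unrelated row count failure opens a second one.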

Instead of being the last to find out about these issues, the data team can use incidents to flip the script and proactively communicate data issues to the larger team.

Receive Helpful Alerts and Reminders

Incidents help combat alert fatigue when multiple data quality issues arise at once. Instead of alerting you for every individual failure, Metaplane alerts you once for each group of related failing tests.

One common piece of feedback we received from customers was that individual data quality alerts became noisy if the underlying issue was not immediately resolved. Metaplane incidents will update the existing Slack alert and send daily reminders via threads so that your team is not over-alerted as you address the issue, while helping you stay on top of active issues.

Metaplane incident alerts include historical visualizations, downstream impact, and a way to interact with the model when data changes.

For example, Metaplane can send Slack alerts containing visualizations of multiple failing tests at once. If the underlying tests continue to fail, Metaplane will send daily reminders about failures. If the incident is resolved, the Slack alert will be updated to reflect the resolved state. Incidents also provide bulk actions such as muting all of the tests and offer a convenient way to give instant feedback to the machine learning models when your data changes.
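The alert lifecycle described above (post once, update in place, remind daily, mark resolved) can be sketched as a small state machine. This is an illustrative Python sketch only: `IncidentAlerter` and its methods are hypothetical names, and an in-memory `log` list stands in for calls to a real chat API such as Slack.

```python
from datetime import datetime, timedelta


class IncidentAlerter:
    """Keeps one alert per incident: post once, update in place, remind daily."""

    def __init__(self, reminder_interval=timedelta(days=1)):
        self.reminder_interval = reminder_interval
        self.log = []            # stand-in for outgoing chat messages
        self.last_notified = {}  # incident_id -> time of last post or reminder

    def on_failure(self, incident_id, test_name, now):
        if incident_id not in self.last_notified:
            # First failure of the incident: send a new alert.
            self.log.append(("post", incident_id, test_name))
            self.last_notified[incident_id] = now
        else:
            # Related failure: edit the existing alert instead of re-alerting.
            self.log.append(("update", incident_id, test_name))

    def tick(self, incident_id, now):
        # Called periodically; sends a threaded reminder once per interval.
        last = self.last_notified.get(incident_id)
        if last is not None and now - last >= self.reminder_interval:
            self.log.append(("thread_reminder", incident_id, None))
            self.last_notified[incident_id] = now

    def on_resolved(self, incident_id):
        # Mark the original alert as resolved and stop reminding.
        if self.last_notified.pop(incident_id, None) is not None:
            self.log.append(("update_resolved", incident_id, None))


alerter = IncidentAlerter()
t0 = datetime(2022, 10, 3, 9, 0)
alerter.on_failure("inc-1", "orders freshness", t0)
alerter.on_failure("inc-1", "customers freshness", t0 + timedelta(minutes=20))
alerter.tick("inc-1", t0 + timedelta(hours=12))  # under a day: no reminder
alerter.tick("inc-1", t0 + timedelta(days=1))    # a day has passed: reminder
alerter.on_resolved("inc-1")
actions = [entry[0] for entry in alerter.log]
```

In this run, the second failure produces an update rather than a second alert, and only one reminder is sent before the incident is resolved.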

Identify Root Causes and View Aggregate Downstream Impact

One of the most powerful benefits of incidents is the ability to identify potential root causes and aggregate downstream impact. Since incidents can associate multiple related failing tests over time, Metaplane can use warehouse and BI lineage to understand what may be causing an incident as well as what is impacted.
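As a rough illustration of how lineage can be used to aggregate downstream impact, the Python sketch below walks a lineage graph breadth-first from the failing assets. The `downstream_impact` function, the dictionary representation of lineage, and the asset names are assumptions made for this example, not Metaplane's data model.

```python
from collections import deque


def downstream_impact(lineage, failing_assets):
    """Return every asset reachable downstream of the failing assets,
    excluding the failing assets themselves.

    `lineage` maps an asset name to the assets that read from it."""
    impacted = set()
    queue = deque(failing_assets)
    while queue:
        asset = queue.popleft()
        for child in lineage.get(asset, []):
            if child not in impacted:  # also guards against cycles
                impacted.add(child)
                queue.append(child)
    return impacted - set(failing_assets)


# Made-up lineage: two raw tables feed models, which feed a dashboard.
lineage = {
    "raw.orders": ["analytics.revenue"],
    "raw.customers": ["analytics.revenue", "analytics.churn"],
    "analytics.revenue": ["dashboard.exec_kpis"],
}
impact = downstream_impact(lineage, ["raw.orders", "raw.customers"])
```

Here `impact` contains both analytics models and the dashboard. Running the same traversal over a reversed graph would surface upstream assets as candidate root causes.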

Incidents offer a quick view of the total impacted schemas, tables, columns, and dashboards, as well as an easy-to-digest timeline of failed and resolved tests.

This has allowed our customers to fix data quality issues faster and communicate incidents to downstream consumers, retaining trust across teams and keeping them aware of how incidents are being addressed over time.

With the incident timeline, the data team is able to view when the issues started occurring, when additional failures were identified, and when the incident was resolved.

How To Get Started With Incidents

Incidents is now available to all customers, including those on our free forever plan. After you connect your warehouse, incidents work without any additional setup. As you connect Metaplane to more data systems like ETL tools, dbt, warehouses, and BI tools, incidents will continue to provide richer context to help your team identify and fix data issues faster.

If you’re curious about how Metaplane aggregates related data quality issues, check out our docs. If you want to try it, get started on our free forever plan.
