Announcing Warehouse To BI Tool Data Lineage
Most data engineering teams are flying blind: they have no easy way to understand how data in the warehouse flows to downstream BI tools. Our warehouse-to-BI data lineage tool helps data engineers search for and discover these dependencies in seconds.
When users first get started with Metaplane, they often connect a warehouse and a business intelligence tool so that when a data issue arises, Metaplane can provide downstream dashboards and reports that would be impacted by that issue. This helps data engineers understand how serious the issue is, and whether it should be prioritized now.
Data teams that use Metaplane are always asking for new ways to save time and become better data engineers. One of the most common requests is a simple way to visualize and explore lineage that makes the lineage actionable and fits within their engineering workflow. We went to the drawing board and are excited to release our data lineage visualization and exploration tool that was inspired by three common use cases from our users:
- Build and spread data awareness across your organization
- Proactively prevent downstream BI issues
- Decrease data debt by fixing low-hanging fruit
Build and spread data awareness across your organization
As data teams onboard teammates and start new projects, the first thing they need to do is get a holistic picture of how data flows through their systems and the complex dependencies that exist. The best way to do this is with a bird’s-eye view of data from ingestion, to warehouse, to transformation, and all the way to business intelligence tools.
With warehouse-to-BI tool data lineage, teammates can use Metaplane to get up to speed by seeing which parts of the warehouse are most used by downstream BI tools. For example, in the screenshot below, at first glance you can see that `fact_deals` and `dim_sales_reps` are used across both Mode and Metabase, but Mode is the center of gravity for all things business intelligence.
Building data awareness empowers data engineers and other teams to ask and answer their own questions, decreasing the time it takes to get up to speed and reducing the number of times they need to loop in other teammates. Rather than asking the data team “how is this table or column used?” and “what are our most resource-demanding BI assets?”, they can simply navigate through the lineage visualization and answer these questions on their own in seconds.
Here are some of the data awareness questions Metaplane can now answer:
“What schemas and tables are most used by downstream BI tools?”
“How many downstream BI dependencies does this column have?”
“What tables and columns feed into this BI dashboard?”
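To make these questions concrete, here is a minimal sketch of how they reduce to simple counts over lineage edges. It is not Metaplane's API or implementation: the edge list, the BI asset names, and the function names are all hypothetical, and only `fact_deals`, `dim_sales_reps`, Mode, and Metabase come from the example above.

```python
from collections import Counter

# Hypothetical lineage edges: (warehouse table, BI tool, BI asset).
# The asset names are invented for illustration only.
EDGES = [
    ("fact_deals", "Mode", "Pipeline Report"),
    ("fact_deals", "Mode", "Quota Attainment"),
    ("fact_deals", "Metabase", "Deals Overview"),
    ("dim_sales_reps", "Mode", "Quota Attainment"),
    ("dim_sales_reps", "Metabase", "Rep Directory"),
]


def usage_by_table(edges):
    """Count the distinct downstream BI assets that read from each table."""
    assets = {}
    for table, tool, asset in edges:
        assets.setdefault(table, set()).add((tool, asset))
    return Counter({table: len(found) for table, found in assets.items()})


def dependencies_by_tool(edges):
    """Count table-to-asset dependencies per BI tool."""
    return Counter(tool for _table, tool, _asset in edges)


def tables_feeding(edges, tool, asset):
    """List the tables that feed a given BI asset, sorted by name."""
    return sorted({t for t, tl, a in edges if tl == tool and a == asset})


if __name__ == "__main__":
    print(usage_by_table(EDGES).most_common())
    print(dependencies_by_tool(EDGES).most_common())
    print(tables_feeding(EDGES, "Mode", "Quota Attainment"))
```

In this toy data, Mode carries more dependencies than Metabase, which is the kind of "center of gravity" signal the visualization surfaces at a glance.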
While our lineage visualization is helpful in building more awareness of data, there are also ways to use this metadata in everyday data engineering workflows.
Proactively prevent downstream BI issues
Data engineers constantly face the same issue when making changes to their data: understanding which downstream data assets would be changed or broken either takes too much time or is impossible.
As a result, data engineers are less likely to make changes to their models and if they do, they often break downstream data. It’s never a great feeling when executives complain about broken dashboards or the marketing team says the wrong email automation was triggered because of data quality issues.
Metaplane’s lineage visualization solves this problem. Before data engineers refactor or delete existing models, they can now understand the scope of the changes and what would break before it happens.
Our customers use our lineage visualization to answer these questions in seconds:
“If I change this dbt model, what downstream Looker dashboards will change or break?”
“If I delete this column, what downstream Metabase queries will break?”
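Conceptually, an impact question like these is a reachability query over a lineage graph. The sketch below illustrates that idea with a breadth-first traversal. It is an assumption-laden toy, not how Metaplane works internally: the graph, the node naming scheme (`warehouse:`, `dbt:`, `looker:`, `metabase:` prefixes), and the function names are all hypothetical.

```python
from collections import deque

# Hypothetical lineage graph: each node maps to its direct downstream nodes.
# Node names and prefixes are invented for illustration only.
LINEAGE = {
    "warehouse:fact_deals.amount": ["dbt:deal_summary", "metabase:Deals Overview"],
    "warehouse:dim_sales_reps.region": ["dbt:deal_summary"],
    "dbt:deal_summary": ["looker:Revenue", "looker:Forecast"],
}


def downstream(graph, start):
    """Return every node reachable from `start`, excluding `start` itself."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen and child != start:
                seen.add(child)
                queue.append(child)
    return seen


def impacted(graph, start, prefix):
    """Return the downstream nodes whose name starts with `prefix`, sorted."""
    return sorted(n for n in downstream(graph, start) if n.startswith(prefix))


if __name__ == "__main__":
    # Which Looker dashboards are affected if this dbt model changes?
    print(impacted(LINEAGE, "dbt:deal_summary", "looker:"))
    # Which Metabase assets break if this column is deleted?
    print(impacted(LINEAGE, "warehouse:fact_deals.amount", "metabase:"))
```

The traversal follows indirect dependencies too, so a column that only feeds a dbt model still shows up as affecting the dashboards built on that model.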
Decrease data debt by fixing low-hanging fruit
It’s hard being a data engineer when your dream is to build scalable models that your teammates can use to make decisions, but your reality is servicing other teams with one-off requests and fixing silent data bugs.
Juggling these priorities can be difficult and often leads to data debt caused by an explosion of assets like one-off models and dashboards. In the long term, that only leads to more data-related issues, causing data engineers to waste time on firefighting rather than building.
It’s difficult for data engineers to get ahead of these issues simply because the tooling does not exist to help them. With lineage exploration, our customers can find places to decrease data debt by answering questions like:
“How much work would be involved to remove this column?”
“How many downstream BI assets depend on this table?”
We're excited to see how more data practitioners use this tool to work faster and make fewer mistakes. If you’re interested in trying the visual explorer across your data stack, get started for free.