Announcing our Airflow Integration: Observability into DAGs

Monitor DAGs and Tasks for long runtimes, identify root causes of data incidents using end-to-end lineage, and view the health of your data pipelines in a single pane of glass.

February 12, 2024

Founding Engineer

Hooked on Data


Airflow is the tool of choice for building data pipelines for many data teams. Its ability to orchestrate tasks and dependencies helps teams ingest data from source systems, run dbt jobs, and run transformations such as cleaning data, aggregating data, and modeling business logic.

But if you’ve worked with Airflow at scale, you've run into common challenges including:

  • Latency increases caused by inefficient queries or lack of compute resources
  • Difficulty in identifying root causes of data incidents that occur downstream of Airflow jobs
  • No single place to see your entire data platform, including lineage from Airflow all the way to your BI tools

Ultimately, Airflow becomes both a source of downstream data quality issues and a hindrance to resolving them.

That's why we're excited to announce Metaplane’s Airflow integration, which gives our customers an additional layer of observability into DAGs, Tasks, and lineage. By integrating with Airflow, Metaplane can now monitor DAG and Task duration for unexpectedly long runtimes, and extract the lineage of queries run through Airflow.

Monitor DAG and Task duration for longer than usual runtimes

Find bottlenecks caused by long running Airflow jobs

Metaplane uses machine learning to automatically monitor and predict how long your Airflow jobs should take based on previous behavior. When jobs take longer than expected to complete, Metaplane will open a data incident and send alerts where your team already lives, like Slack or MS Teams. Once Airflow metadata is sent to Metaplane, setup takes minutes as the platform auto-applies duration monitors to your most important DAGs.

This addresses:

  • Long-running DAGs for complex once-a-day or once-a-week transformations such as regularly batched cleaning scripts for large vendor-imported flat files
  • Specific tasks that you already know to be problematic, such as a SQL operator that regularly fails due to query timeouts

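
The model behind Metaplane's duration monitors is not described here, so the sketch below only illustrates the general idea of flagging a run that takes longer than its history suggests. It swaps in a simple mean-plus-k-standard-deviations threshold. The function name, the multiplier k, the minimum run count, and the sample durations are all assumptions made for illustration.

```python
from statistics import mean, stdev


def is_unusually_long(history_seconds, latest_seconds, k=3.0, min_runs=5):
    """Return True if latest_seconds exceeds mean + k * stdev of past runs.

    A stand-in heuristic for illustration, not Metaplane's actual model.
    """
    if len(history_seconds) < min_runs:
        return False  # not enough history to form a baseline
    baseline = mean(history_seconds)
    spread = stdev(history_seconds)
    return latest_seconds > baseline + k * spread


# Hypothetical durations (in seconds) of a daily DAG's previous runs.
past_runs = [610, 595, 620, 605, 600, 615]

print(is_unusually_long(past_runs, 612))   # within the usual range
print(is_unusually_long(past_runs, 1800))  # far beyond the usual range
```

A fixed threshold like this ignores trends and seasonality (for example, a weekly job that is always slower on month-end), which is one reason a learned baseline can be preferable in practice.
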
Understand the lineage of queries run in Airflow

Identify root causes of data incidents using lineage

Alongside the object relationships that Metaplane's lineage already derives from APIs, metadata, and query parsing, the Airflow integration shows you which queries run as part of your DAGs.

As an Airflow user, this means that you’ll be able to immediately understand which Airflow DAGs or tasks were the root cause behind a missed update to a table or a stale dashboard.
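
To make the root-cause idea concrete, here is a minimal sketch of walking a lineage graph upstream from a stale dashboard to the Airflow tasks that feed it. The graph, the node naming scheme, and the function are hypothetical and do not reflect how Metaplane stores or queries lineage.

```python
from collections import deque

# Hypothetical lineage: each node maps to the nodes directly upstream of it.
UPSTREAM = {
    "dashboard:weekly_revenue": ["table:analytics.revenue"],
    "table:analytics.revenue": ["airflow_task:revenue_dag.build_revenue"],
    "airflow_task:revenue_dag.build_revenue": ["table:raw.orders"],
    "table:raw.orders": ["airflow_task:ingest_dag.load_orders"],
    "airflow_task:ingest_dag.load_orders": [],
}


def upstream_airflow_tasks(node, upstream=UPSTREAM):
    """Breadth-first walk upstream, returning Airflow tasks nearest first."""
    seen, queue, tasks = {node}, deque([node]), []
    while queue:
        current = queue.popleft()
        for parent in upstream.get(current, []):
            if parent in seen:
                continue
            seen.add(parent)
            queue.append(parent)
            if parent.startswith("airflow_task:"):
                tasks.append(parent)
    return tasks


print(upstream_airflow_tasks("dashboard:weekly_revenue"))
```

Returning the nearest tasks first gives a natural order in which to check candidates when a dashboard goes stale.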

Connecting Airflow to Metaplane

After installing the Metaplane Airflow provider, you can establish the connection either through the UI or by creating an environment variable. Once you’ve set the connection properties and configured your callbacks, your duration monitors will begin to receive your task and DAG durations. From then on, the monitors automatically alert you to any spikes in runtime.
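
The Metaplane provider's own callbacks are not shown in this post, so the sketch below only illustrates the general Airflow mechanism they rely on: a callback receives a context dictionary when a task finishes, and registering it through default_args applies it to every task in a DAG. The report_duration function and its payload are hypothetical, and the stand-in context exists only so the example runs without Airflow installed.

```python
from types import SimpleNamespace


def report_duration(context):
    """Airflow task callback: collect the finished task's runtime.

    Airflow passes a context dict to callbacks; context["task_instance"]
    exposes dag_id, task_id, state, and duration (in seconds).
    """
    ti = context["task_instance"]
    payload = {
        "dag_id": ti.dag_id,
        "task_id": ti.task_id,
        "state": ti.state,
        "duration_seconds": ti.duration,
    }
    # Hypothetical: a real integration would send this to its backend here.
    print(payload)
    return payload


# Callbacks set in default_args apply to every task in the DAG that uses them,
# e.g. DAG("my_dag", default_args=default_args, ...).
default_args = {
    "on_success_callback": report_duration,
    "on_failure_callback": report_duration,
}

# Stand-in for the context Airflow would supply at runtime.
fake_context = {
    "task_instance": SimpleNamespace(
        dag_id="ingest_dag", task_id="load_orders", state="success", duration=42.5
    )
}
report_duration(fake_context)
```

Registering the same function for both success and failure means slow runs are recorded whether or not the task ultimately succeeds.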

If you don’t already have a Metaplane account, get started today! Implementation, including monitor configuration and data stack integration, takes no more than 30 minutes. If you have any questions, please don’t hesitate to reach out to our team.
