Get the essential data observability guide
Download this guide to learn:
What is data observability?
4 pillars of data observability
How to evaluate platforms
Common mistakes to avoid
The ROI of data observability
Unlock now
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Sign up for a free data observability workshop today.
Assess your company's data health and learn how to start monitoring your entire data stack.
Book free workshop
Sign up for news, updates, and events
Subscribe for free
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Getting started with Data Observability Guide

Make a plan to implement data observability across your company’s entire data stack

Download for free
Book a data observability workshop with an expert.

Assess your company's data health and learn how to start monitoring your entire data stack.

Book free workshop

Import Historical Data Backfill

Metaplane's latest feature eliminates training time needed by machine learning models.

and
September 19, 2023

Product, Design

September 19, 2023
Import Historical Data Backfill

Metaplane was built to find data quality issues across your data stack. Traditionally this was done with unit testing, where unit tests profiled and re-profiled your data through sampling queries and found any data that was outside of the parameters you define. But Metaplane’s anomaly detection differs quite a bit from the traditional approach by virtue of our machine learning-based data quality monitors. These monitors work out of the box, and require 3 days of training on your real data. This allows us to create appropriate ranges for your data quality metrics based on factors like trends and seasonality, and the models continue to update as your data changes.

At least—that’s how it used to be.

On August 8th, we announced our new monitoring platform, with the promise of more to come. Our most recent product release, the ability to backfill models with data from a .csv file, means that you no longer need to wait to reap the benefits of machine learning for your data quality monitors.

Use cases

While there are immediate benefits for anyone who’s simply eager to get started with Data Observability (link) to find data quality issues, there are a few specific scenarios in which you might want to leverage this new feature:

  • Warehouse migrations - Imagine you’ve recently made the decision to switch vendors. While you’ll still need to extract and import any historical data that you’ll want to use in future analytics, you can now use those same extracts to pre-train Metaplane monitors. These data quality safeguards help foster a smooth migration.
  • Dev vs Prod Environments - For teams with separate environments for cloned production data, usually for testing purposes, you can leverage this feature for a new dataset that you’d like to monitor for data quality anomalies in your development environment.
  • Snippets” from another table - In some scenarios, you might be creating new views and tables derived from a parent table—for example, a breakout of revenue per day from a parent transactions table. You’d be able to use data from the parent table to train monitors on the corresponding child tables.

Where to get started

You’ll need to log in or create a free Metaplane account to get started. From there, navigate to a monitor that you’d like to import training data for. You can find the import button in the actions menu of any monitor page:

Note that your CSV will need to follow a particular format, as outlined in our documentation. Please get in touch with our team if you’d like help with extracting or formatting imports.

Read more on recent improvements to the Metaplane platform here!

Table of contents
    Tags

    We’re hard at work helping you improve trust in your data in less time than ever. We promise to send a maximum of 1 update email per week.

    Your email
    No items found.
    Ensure trust in data

    Start monitoring your data in minutes.

    Connect your warehouse and start generating a baseline in less than 10 minutes. Start for free, no credit-card required.