Dribbble designs a data quality solution with Metaplane
“Metaplane has made it so much faster not just to find issues but also who else to alert. We’re much quicker now to communicate to leadership and others, and that’s created more trust in the data.”
The entire Dribbble organization is dedicated to continuing to grow the space to make it a better place for design talent, and key to that growth is the use of data to drive decision making. We were lucky enough to speak with Ashley Melanson, Analytics Engineering Lead, about how data’s played a part in Dribbble’s growth to date, and where Metaplane fits into their data strategy.
If you’ve ever worked on large marketing or product teams, then you know just how hard it can be to find talented designers, and there always seems to be a need for more capacity. At Dribbble, they’re tackling this problem by creating a space that connects brands looking to expand their design resources with designers looking for their next projects.
The data team at Dribbble handles end-to-end pipeline creation and management. This includes:
- Ingesting data from different sources such as their paid ad platforms, payment processors, and production data from their applications
- Transformations of data including modeling and cleaning to curate objects for internal analytics
- Placing safeguards on their pipelines to prevent bad data from making its way into production systems
All of this is done with the goal of helping the other business units, including Product, Marketing, and Finance, answer questions like:
- Where should we focus upcoming marketing campaigns?
- How can we make sure that our educational content is useful for designers?
- How can we make sure that our platform creates an intuitive user experience?
To answer all of these questions and more, Ashley and her team knew that they needed to begin with clean, accurate data.
Before Metaplane, Ashley and her team took a proactive approach to safeguards by writing unit tests. They put a few tests in place to ensure that schemas and nullness rates were consistent, and that tables were updated at specific cadences. As they extended those safeguards across all of their new data pipelines in Snowflake, the scale of the project became clear: they needed to cover every object, track several different data quality metrics, and, most importantly, maintain the thresholds set in each test so that alerts stayed accurate.
❝We don’t want to get all these false positive alerts because we experience alert fatigue, and end up missing alerts for legitimate issues. That was one of the number one evaluation criteria for Metaplane.”
Data Quality Monitoring
With the criteria of calculating and maintaining thresholds in mind, Ashley and the team set up freshness monitors, and were able to immediately see how the monitors used metadata from Snowflake to determine freshness on an object level.
❝Unit testing is great, except that it only gets you so far. One of our biggest issues was with static freshness thresholds. Realistically, every table could have its own cycles when it comes to freshness. Metaplane is great at managing those thresholds because it learns how often each table should be updated, which has been essential in standing up our production grade pipeline with latency dependencies.”
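The difference between a static freshness threshold and a learned one can be illustrated with a minimal sketch. This is not Metaplane's algorithm, which the source does not describe; it assumes a simple rule (mean gap between past updates plus a few standard deviations), and the function names, the `sensitivity` parameter, and the sample tables are all hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def learned_freshness_threshold(update_times, sensitivity=3.0):
    """Longest acceptable gap between updates, estimated from history.

    Assumed rule for illustration: mean gap + sensitivity * stdev of gaps.
    """
    ordered = sorted(update_times)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three update timestamps")
    return timedelta(seconds=mean(gaps) + sensitivity * stdev(gaps))


def is_stale(update_times, now, sensitivity=3.0):
    """True when the time since the last update exceeds the learned threshold."""
    threshold = learned_freshness_threshold(update_times, sensitivity)
    return now - max(update_times) > threshold


# Two hypothetical tables with very different update cycles.
start = datetime(2024, 1, 1)
hourly_table = [start + timedelta(hours=h) for h in range(24)]
daily_table = [start + timedelta(days=d) for d in range(14)]

# Six hours of silence is a problem for the hourly table but not the daily one.
hourly_stale = is_stale(hourly_table, max(hourly_table) + timedelta(hours=6))
daily_stale = is_stale(daily_table, max(daily_table) + timedelta(hours=6))
print(hourly_stale, daily_stale)
```

A single static threshold would have to be either too tight for the daily table or too loose for the hourly one, which is the trade-off the quote describes.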
Beyond tracking data freshness, they expanded their schema change and nullness rate monitoring to full object coverage. They also added Row Count monitors as a further check on data completeness, which put one more safeguard in place.
Dribbble’s data team also tracks a few unique business metrics, such as ad impressions over a specified time window, which they were able to implement through the use of Custom SQL monitors. They had previously been tracking these metrics in a Metabase dashboard as a reference point, but wanted to automate the process so that they would receive alerts instead of checking manually, and could resolve issues in a timely manner.
❝For example, we have custom SQL tests for ad impressions available and ads served. It’s especially important that the volume of this data remains accurate because Dribbble offers advertising deals for bigger brands on our site. If that volume ever looks off, then we know we need to immediately start working to identify the issue.”
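As a rough illustration of what a custom SQL check on ad volume might look like, the sketch below runs a query that returns one number and compares it with recent history. It is an assumption-laden example, not Dribbble's or Metaplane's implementation: the `ad_impressions` table, its columns, the `tolerance` rule, and the sample volumes are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical custom SQL: returns a single number per run.
IMPRESSIONS_SQL = "SELECT COUNT(*) FROM ad_impressions WHERE served_on = ?"


def daily_impressions(conn, day):
    """Run the custom SQL for one day and return the impression count."""
    return conn.execute(IMPRESSIONS_SQL, (day,)).fetchone()[0]


def check_impressions(conn, day, history_days, tolerance=0.5):
    """Compare one day's volume against the mean of earlier days.

    Returns (value, ok). ok is False when the value deviates from the
    historical mean by more than `tolerance` (a fraction, assumed here).
    """
    baseline = sum(daily_impressions(conn, d) for d in history_days) / len(history_days)
    value = daily_impressions(conn, day)
    ok = abs(value - baseline) <= tolerance * baseline
    return value, ok


# Build a small in-memory table with made-up volumes; the last day drops sharply.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_impressions (ad_id INTEGER, served_on TEXT)")
volumes = {"2024-01-01": 100, "2024-01-02": 110, "2024-01-03": 90, "2024-01-04": 20}
for day, n in volumes.items():
    conn.executemany(
        "INSERT INTO ad_impressions VALUES (?, ?)", [(i, day) for i in range(n)]
    )

history = ["2024-01-01", "2024-01-02", "2024-01-03"]
value, ok = check_impressions(conn, "2024-01-04", history)
if not ok:
    print(f"ALERT: ad impressions for 2024-01-04 look off ({value})")
```

Replacing a dashboard that someone has to look at with a check like this is what turns the metric into an alert.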
After discovering an incident, Ashley and the Dribbble data team benefited from Metaplane’s view of downstream impact, which showed the affected Metabase dashboards and cards, along with column-level lineage for a more detailed view of specific downstream fields. Together, those two features let them notify stakeholders proactively about incidents, which built trust that the team had both the visibility and the means to fix them.
❝Metaplane has made it so much faster not just to find issues but also who else to alert. We can see all of the models or data tests that are failing, how they’re linked to each other, and see the reporting that’s linked to all of that. We’re a lot quicker now to communicate them to leadership and other business functions, and that’s created more trust in the data and our team by extension.”
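Working out who else to alert amounts to walking a lineage graph downstream from the failing object. The sketch below shows that idea with a breadth-first traversal; the graph contents, the `metabase:` naming prefix, and the `downstream` helper are hypothetical and are not taken from Metaplane's product.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets that read from it.
LINEAGE = {
    "raw.ad_events": ["analytics.ad_impressions"],
    "analytics.ad_impressions": ["analytics.ad_revenue", "metabase:Ads dashboard"],
    "analytics.ad_revenue": ["metabase:Finance dashboard"],
    "raw.payments": ["analytics.ad_revenue"],
}


def downstream(lineage, start):
    """Return every asset reachable from `start`, in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# An incident on the raw events table: list everything it feeds,
# then pick out the dashboards whose owners should hear about it.
impacted = downstream(LINEAGE, "raw.ad_events")
dashboards = [asset for asset in impacted if asset.startswith("metabase:")]
print(dashboards)
```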
With the combination of automated, scalable data quality monitors and resolution features such as column-level lineage, the team has been able to recover time previously spent detecting, debugging, and resolving incidents.
❝Without Metaplane, we wouldn't have enough test coverage to safeguard against incidents. That would also mean spending a lot more time debugging and resolving incidents, all the way from determining impact to stakeholder communication. If we did all of that, it leaves a lot less time to think of a more long term solution, which we’re ideally doing for every single incident. Having more time to think about proactive incident prevention is a luxury that we didn’t have before Metaplane.”
In addition to gaining time for more valuable planning and strategic model creation, the Dribbble team has seen benefits from implementing Metaplane such as:
- Having a better sense of security, more confidence in their data, alerting, and overall infrastructure
- Reducing incident resolution time
- Increasing trust in the data team through proactive incident notification to impacted stakeholders
❝While it’s still a work in progress, we’re definitely seeing less issues being flagged by others, before we’ve been alerted. When other business functions are consistently coming back to you with issues, there’s going to be a lot less trust in the data. If we’re the ones being proactive about it and saying ‘Hey, we found this issue in the data, and here’s what we’re doing about it’, that trust can slowly be restored.”