Data Quality is a Team Sport: Applying the RACI Framework
We've created this resource for you and your teams to start thinking about how to distribute roles when it comes to managing data quality.
If we treat improving data quality as a project to plan, then a common first step is to identify the team members responsible for carrying out the cleanup. To do so, we’re borrowing from the RACI model.
There’s a good chance that you’ve already heard of the RACI model for project management, but here’s a quick recap. RACI is an acronym that stands for:
- Responsible: who owns execution step(s) within a project
- Accountable: who owns the results of the project
- Consulted: who will be pulled in as needed for the project
- Informed: who needs to be aware of what’s happening for a project
As we plan our Data Quality Project, this framework gives us an easy way to think about who owns what. Members of the data team may be responsible for data quality, but with different data teams owning different parts of a pipeline, and end consumers often sitting entirely outside the data team, it raises the question: who are the RACI stakeholders when it comes to data quality?
The data team we’ll reference in the examples below manages internal analytics dashboards and reports used across every department.
Responsible - Data Engineers
Data engineers can have a wide range of responsibilities, but for the purposes of this article, we’ll consider them to own data ingestion and modeling, extending to cleaning up raw data and normalizing it into a standardized format for analytics to use.
- Validate issue triage: Using notes from a business stakeholder and/or analyst, data engineers can confirm whether there is an actual issue and may also be able to locate where within the pipeline it originated.
- Fix issues outside of or upstream in the data stack: After confirming an issue and its origin, a data engineer can adjust the pipeline logic to resolve it (and, ideally, prevent it from recurring through an incident retrospective).
In Metaplane, data engineers should also be responsible for setting up monitors, using their knowledge of brittle processes and potential entry points for issues to identify which objects need monitors and what types of monitors they need. This requires collaboration with analysts, who can also assist with monitor setup. From there, data engineers can use the column-level lineage graphs to trace an issue upstream and understand where and how to focus resolution efforts.
Responsible/Accountable - Analytics Engineers
Analytics engineers are a relatively new breed within data teams, often closely associated with dbt. They’re responsible for modeling data within a warehouse and often help stakeholders develop reports, dashboards, and visualizations.
- Triage issues: Analytics engineers should be equipped to understand the scope of an issue, and will typically test at different points along the pipeline to find where it originated, as well as identify additional downstream dependencies (impact analysis).
- Fix issues within the data stack: Because they’re responsible for building the data pipeline, they likely also have the access needed to deploy fixes to the warehouse, business intelligence tools, modeling logic, or orchestration tools.
In Metaplane, we recommend making analytics engineers full-fledged users, so that they can place their desired monitor type(s) not just on problematic objects, to be alerted to every issue, but also on the dbt jobs themselves, so they’re aware of abnormal job durations that may contribute to additional data quality issues. Using Metaplane’s GitHub application, they’ll also be able to validate the impact of changes before updating any models, to avoid accidentally introducing new issues.
Accountable - Data Analysts
Here, we define data analysts as the people who own building visualizations for business stakeholders.
- Example titles: Revenue analyst, Marketing analyst, Product analyst, Data analyst
- Triage issues: Drawing on their relationships with the business stakeholders they’ve built visualizations for, analysts can confirm whether an issue is a genuine data issue or simply a misplaced filter. Once that’s confirmed, their awareness of the other data products they’ve built lets them identify whether an incident is isolated to a single dashboard or affects many.
- Preliminary issue resolution validation: While other team(s) work on a fix, the data analyst acts as a bridge between the business and the technical teams, and can judge whether a proposed fix will satisfy the ticket.
- Fix visualization issues: In some cases, the issue might be caused by an inaccurate model defined in the business intelligence tool, which the analyst who built or maintains it can triage and fix directly.
With Metaplane, we recommend adding analysts to the tool with at least a Viewer role, so that they can see which table(s) an issue originated from (if any) and whether there are additional downstream dependencies that will affect other analysts and business stakeholders. You can optionally grant them permissions that allow monitor creation, so that analysts can customize their alerts and monitor configurations for problematic objects. If your analysts are technical, they can have Editor roles in Metaplane, but our customers have found it more helpful to have their analytics engineering counterparts own the actual application of monitors.
Consulted - Business Stakeholders
These are your end consumers: the people who interact with data at its source (often applications) and rely on insights from combining various data sources to decide how to move forward.
- Example titles: Product Manager, Director of Product Marketing, Customer Success Lead, Sales Operations Manager
- Define requirements: The analytics data products were likely built for them in the first place, so they should understand the purpose of the data products they use and define other aspects of scope, such as how fresh the data needs to be.
- Domain knowledge: Because they interact the most with the data sources that they’ve requested, they should be able to explain whether your modeling output is as expected, or whether there are certain values that “look off”.
- Issue triage: They’ll often be the first to notice when a data product you built has an error, and can flag whether they see similar error(s) in their other dashboards.
- Issue resolution validation: After a fix is deployed, they should confirm that the issue no longer persists. Bonus: confirm that there aren’t issues in other related data products.
With Metaplane sending alerts to the notification channel of your choice, business stakeholders can be added to those channels, not only to be alerted that an issue is occurring, but also to collaborate on both the issue triage and resolution validation steps.
Informed - Business system administrators, Software engineers
This group typically works with business stakeholders and has the ability to make changes in source systems that might affect your data pipeline. For example, a software engineer may add a few tracked fields, or a Salesforce administrator may rename a custom field, both of which lead to schema changes.
- Example titles: Salesforce or Jira Administrators (Business System Administrators), Software Developers (Software Engineers)
This group only needs to stay informed about how changes they make might affect the business stakeholders they share with the data team. As a result, you can add them to any channel where Metaplane delivers schema change notifications. They’ll be able to explain why a schema was changed and how a model might be updated to reflect it, and warn the data team of upcoming changes.
We’d love to share more about how you can use Metaplane across different departments to keep everyone on the same page about data quality. You can get in touch with us here if you’d like to talk strategy, or set up a free account to try some of the practices above.
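To recap the role assignments above in one place, here’s a minimal Python sketch that encodes them as a RACI matrix and lets you look up who holds each designation. It only mirrors the assignments described in this article; it isn’t a Metaplane feature, and the names `Raci`, `DATA_QUALITY_RACI`, and `roles_with` are hypothetical, chosen for illustration.

```python
from enum import Enum


class Raci(Enum):
    """The four RACI designations."""
    RESPONSIBLE = "R"
    ACCOUNTABLE = "A"
    CONSULTED = "C"
    INFORMED = "I"


# Hypothetical mapping that mirrors the role assignments in this article.
# Sets let a role hold more than one designation, as analytics engineers do.
DATA_QUALITY_RACI: dict[str, set[Raci]] = {
    "Data Engineers": {Raci.RESPONSIBLE},
    "Analytics Engineers": {Raci.RESPONSIBLE, Raci.ACCOUNTABLE},
    "Data Analysts": {Raci.ACCOUNTABLE},
    "Business Stakeholders": {Raci.CONSULTED},
    "Business System Administrators": {Raci.INFORMED},
    "Software Engineers": {Raci.INFORMED},
}


def roles_with(designation: Raci) -> list[str]:
    """Return every role holding the given designation, in mapping order."""
    return [role for role, tags in DATA_QUALITY_RACI.items() if designation in tags]


if __name__ == "__main__":
    # Print one line per designation, e.g. "C: Business Stakeholders".
    for designation in Raci:
        print(f"{designation.value}: {', '.join(roles_with(designation))}")
```

Running the module prints one line per designation, starting with `R: Data Engineers, Analytics Engineers`. Keeping the matrix as plain data makes it easy to adapt to your own team’s roles before sharing it.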