How Imperfect Foods uses Metaplane, Snowflake, and dbt to break down data silos
"Without Metaplane, we wouldn’t be as proactive with data quality. There would be a lot of unknown data issues out there lurking that we’d discover when someone stumbles upon them."
Challenge: Breaking down data silos as the company and data use cases grew
Imperfect Foods is an e-commerce startup that exists to help eliminate food waste and build a better food system for everyone. Data is used by every team at Imperfect Foods and is at the center of most decisions made by the company. Whether teams are calculating customer acquisition costs for different marketing channels, running experiments for signup conversion, or trying to understand inventory and fulfillment bottlenecks, data has become critical in the company’s operations.
As the company grew, Adam Smith, an Analytics Manager at Imperfect Foods, noticed that it was becoming harder to understand how the business was doing holistically because most of the data had become siloed across teams.
With a team of only four analysts supporting over 200 data consumers, there weren’t always enough resources to centralize and operationalize data to answer important questions. When data was being used, it wasn’t uncommon for teammates to ask in the analytics Slack channel whether the data could be trusted.
❝The way we found data problems was when someone posted in our Slack channel that a report wasn’t working or was returning weird results. We’d normally catch data quality problems with end users days or weeks after something happened.
Adam and his team needed a way to centralize data in one place, create a scalable process to model data for everyone to use, and monitor data quality so the entire organization could trust the data.
Solution: Remove the friction of using trusted data and leverage everyone’s SQL skills
Snowflake as the centralized data cloud
Adam and his team were able to centralize data in Snowflake so any teammate could answer questions about customers, finance, and marketing. Because Snowflake’s model makes it simple to scale resources up and down, the data team didn’t have to worry about maintaining databases or spinning up resources.
In addition to easily scaling resources, Snowflake’s integrations made it easy for Adam’s team to ingest and report on data. The Snowflake data share functionality dramatically reduced the friction of using data shared by a half dozen partners. Data shares removed the need for complex ETL systems and simplified data governance and processing.
❝Because [Snowflake] is a market leader, everyone has a connection to Snowflake whether you're ingesting new data like a Fivetran or reporting on it using Mode
Not only did Snowflake reduce the complexity of maintaining a data stack, but its usage-based pricing model also helped Imperfect Foods manage costs more predictably and save 20% compared to the beginning of the year.
❝The usage based pricing allows us to scale with the tool and have predictable costs.
With Snowflake’s separation of compute and storage, integrations, and data sharing, Adam and his team were able to finally centralize data that the entire team could use without needing to hire additional headcount.
dbt Cloud for modeling data and creating leverage for the data team
Once the data was stored in Snowflake, the Imperfect Foods data team used dbt Cloud to model data. Like many startups, the team was short on people and needed an easy-to-use, versatile tool. dbt was the perfect solution - it empowered the data analysts to model data without needing advanced SQL knowledge.
❝It’s really easy for an analyst, without a lot of SQL knowledge about inserts, updates, stored procedures to use dbt. You can use select statements to do the hard work for you and leverage the skills they had to build out a data warehouse to answer the questions the organization needed.
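The idea in the quote above, that an analyst writes only a select statement and the tool handles the table-building statements, can be illustrated with a small sketch. This is not dbt itself: it uses Python's built-in `sqlite3` module in place of Snowflake, and the `materialize` helper, the `raw_orders` table, and the `customer_revenue` model are hypothetical names chosen for illustration.

```python
import sqlite3


def materialize(conn, model_name, select_sql):
    """Build a table from a SELECT statement.

    The analyst supplies only the SELECT; this helper wraps it in the
    DROP/CREATE statements. Names are interpolated directly, so this
    sketch assumes trusted, hard-coded model names.
    """
    conn.execute(f"DROP TABLE IF EXISTS {model_name}")
    conn.execute(f"CREATE TABLE {model_name} AS {select_sql}")


# Hypothetical raw data standing in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [(1, 20.0), (1, 15.0), (2, 30.0)],
)

# The "model" is just a SELECT statement.
materialize(
    conn,
    "customer_revenue",
    """
    SELECT customer_id, SUM(amount) AS total_revenue
    FROM raw_orders
    GROUP BY customer_id
    """,
)

rows = conn.execute(
    "SELECT customer_id, total_revenue FROM customer_revenue ORDER BY customer_id"
).fetchall()
print(rows)
```

In an actual dbt project the select statement would live in its own model file and dbt would generate the surrounding statements for the warehouse; the sketch only shows why select-level SQL skills are enough to build out modeled tables.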
Because dbt has software development practices and documentation built in, it helped the team move quickly and make fewer mistakes. Features like lineage gave analysts an easy way to understand the downstream impact of their changes, especially when paired with Metaplane.
With dbt Cloud, Adam’s team was able to easily operationalize the data in Snowflake. Analysts could model data using simple SQL statements, and the rest of the organization could rely on this data to automate business processes around product, marketing, and sales. Because of dbt’s flexibility and scalability, the tool has grown with them over time, which has allowed them to remain lean.
❝dbt has been an incredibly flexible tool that has grown with us. We still don’t have the need for a data engineer or database developer.
Metaplane as their automated data observability platform
After the data team centralized data models for more of the organization to use, they still needed a way to monitor the quality of data. Because data was being used in critical paths of the business, it wasn’t enough that the data was available to everyone - it needed to be trusted, too.
Adam and his team were able to set up Metaplane in minutes because of its integrations with Snowflake, dbt, and Mode. Their most important tables and columns were automatically monitored, and they started receiving schema change alerts within hours.
❝ [Metaplane] is really easy to use and you don’t need any training. Automatic monitoring and the level of detail you can get right in Slack is really helpful
As a small team, time is hard to come by, so being able to understand the impact of data incidents and prioritize work is invaluable. Because Metaplane monitors all of the data tools, the team always has context when issues occur, which helps them decide what to work on first. For example, if a data incident affects a report that runs on Monday mornings, but it’s Tuesday afternoon, the team knows they don’t need to drop everything to fix the issue.
❝Because of Metaplane’s integrations, we receive context with every data incident. For example, it can tell us a table is broken in Snowflake, it may have something to do with this dbt job, and it’s impacting these 5 reports. It allows you in one place to get a good handle on how big of a problem this is.
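The kind of impact analysis described in the quote above, going from a broken table to the reports it feeds, amounts to walking a lineage graph. The following is a minimal sketch of that idea and not Metaplane's implementation; the `LINEAGE` mapping, the asset names, and the `mode.` prefix convention are all hypothetical.

```python
from collections import deque

# Hypothetical lineage: each key feeds the assets listed as its value.
LINEAGE = {
    "snowflake.raw_orders": ["dbt.stg_orders"],
    "dbt.stg_orders": ["dbt.fct_orders"],
    "dbt.fct_orders": ["mode.weekly_revenue", "mode.fulfillment_dashboard"],
    "snowflake.raw_customers": ["dbt.dim_customers"],
    "dbt.dim_customers": ["mode.weekly_revenue"],
}


def downstream_assets(lineage, start):
    """Return every asset reachable from `start`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def impacted_reports(lineage, start, prefix="mode."):
    """Keep only the downstream assets that look like reports."""
    return [a for a in downstream_assets(lineage, start) if a.startswith(prefix)]


print(impacted_reports(LINEAGE, "snowflake.raw_orders"))
```

With the list of affected reports in hand, a team could check when each one next runs and decide whether the incident needs immediate attention.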
Adam and his team also use Metaplane’s dbt job duration monitoring, which uses machine learning to automatically alert data teams when their dbt models start taking longer to run.
❝I love the dbt run time monitoring which helps us understand performance and spend. To have that history there and go back, we can see the improvements we made as a team over time.
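To give a feel for what duration monitoring involves, here is a deliberately simple sketch that flags a run when it is far slower than the historical average. It uses a plain z-score rule rather than the machine learning models Metaplane uses, and the `is_slow_run` function, its `threshold` parameter, and the sample durations are assumptions made for illustration.

```python
from statistics import mean, stdev


def is_slow_run(history_seconds, latest_seconds, threshold=3.0):
    """Flag the latest run if it is more than `threshold` standard
    deviations slower than the historical mean."""
    if len(history_seconds) < 2:
        # Not enough history to estimate normal variation.
        return False
    avg = mean(history_seconds)
    spread = stdev(history_seconds)
    if spread == 0:
        # All past runs took the same time; any slower run stands out.
        return latest_seconds > avg
    return (latest_seconds - avg) / spread > threshold


# Hypothetical durations (in seconds) of recent runs of one dbt job.
history = [300, 310, 295, 305, 290]

print(is_slow_run(history, 420))  # far above the usual range
print(is_slow_run(history, 312))  # within normal variation
```

A fixed z-score threshold ignores trends and seasonality, which is one reason a production monitor would rely on a learned model instead; keeping the run history also makes it possible to look back at performance improvements over time.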
With Metaplane, the data team at Imperfect Foods could rest assured that the data they worked so hard to model could be trusted by the entire organization. If anything in their data stack failed, they were confident they would be the first to find out and could fix issues proactively.
❝Without Metaplane, we wouldn’t be as proactive with data quality. There would be a lot of unknown data issues out there lurking and we’d discover when someone stumbles upon them.
- Using the Snowflake data cloud, the data team could centralize all data and no longer needed to worry about scaling databases.
- Snowflake helped centralize all partner data using data shares, saving weeks of ETL development time.
- dbt helped the data team consistently model data using simple SQL statements, unlocking superpowers for the analysts.
- dbt brought software engineering best practices and automation that helped the team remain lean; they didn’t need to hire an additional database administrator or data engineer.
- Metaplane ensured trust in the data and the team. The time to discover data quality issues went from days or weeks to minutes or hours.