Inside Data with Ben Cohen @ SpotOn
Ben Cohen is a seasoned data leader who has worked in several data roles throughout his career at companies like Cars.com, Braintree, and Cameo, and is now leading a data team at SpotOn.
Talking with Ben was fun because he has a great sense of humor, but is also a realist who encourages self-awareness and is focused on results. We were excited to speak with Ben about his experiences creating high leverage data teams, operationalizing data, and how he sees the future of the data ecosystem evolving.
Hey Ben, thanks for taking the time to chat with us! Before we dig in, can you tell us a little about SpotOn and your role there?
SpotOn offers a suite of software for small to medium-sized businesses. Think reservations, appointments, online ordering, ecommerce, and point-of-sale systems for restaurants, automotive, salons, and professional services. We have about 1,200 employees at the moment and are growing quickly. In May of this year we raised our Series D funding.
I'm currently the data engineering lead for the analytics team at SpotOn. That means my time is spent planning and executing our end-to-end analytics pipeline—from ingestion and transformation through to management of our BI layer. We are a small but mighty data team of six analysts and two engineers, myself included. We try to leverage Fivetran for most of our data ingestion but also use Airflow + Singer + Meltano for custom integrations. Our data lands in Snowflake, we use dbt for modeling, and Metabase and Tableau make up our BI layer.
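For readers unfamiliar with Singer, the custom integrations Ben mentions revolve around a simple convention: a "tap" writes SCHEMA, RECORD, and STATE messages as JSON lines, which a target then loads into the warehouse. A minimal sketch of that message sequence (the `orders` stream and its columns are hypothetical, not SpotOn's actual schema):

```python
import json

def singer_messages(rows, stream="orders"):
    """Build a minimal Singer message sequence for one stream:
    a SCHEMA message, one RECORD per row, then a STATE bookmark."""
    messages = [{
        "type": "SCHEMA",
        "stream": stream,
        "schema": {"properties": {"id": {"type": "integer"},
                                  "total": {"type": "number"}}},
        "key_properties": ["id"],
    }]
    for row in rows:
        messages.append({"type": "RECORD", "stream": stream, "record": row})
    # STATE lets the next run resume incrementally from a bookmark.
    last_id = rows[-1]["id"] if rows else None
    messages.append({"type": "STATE",
                     "value": {"bookmarks": {stream: {"last_id": last_id}}}})
    return messages

def run_tap(rows, stream="orders"):
    """Emit each message as a JSON line, the way a real tap writes to stdout."""
    for message in singer_messages(rows, stream):
        print(json.dumps(message))
```

Because the interface is just JSON over stdout, tools like Meltano can pair any tap with any target, which is what makes this approach attractive for one-off sources Fivetran doesn't cover.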
In your experience, what is the most effective way to operationalize data so other teams can take advantage of the data your team has ingested, stored, transformed, and visualized?
It’s tough! Honestly it’s not one of my strengths, and I usually look for assistance in managing these relationships. It’s hard to do because you need soft skills that can be tough to refine.
The way I think about it is—you need to spend time with the stakeholders that want operationalized data and figure out their pain points, because they may not come out and tell you directly. Talk to everyone. Everyone at your company should be using data to make decisions on some level. Connect with them and show how you can provide value, like reducing the amount of time it takes to make informed decisions or how you can unlock insights that weren’t available before. It’s often about providing data that shows the value of our products to small and mid-size businesses.
For example, it wasn’t until we spoke with some of our product team that we learned that end-of-day accounting is a big deal for a restaurant and can take a lot of time. We are leveraging our analytics data stack to reduce how much time that takes from 2+ hours down to a few minutes, which speaks directly to their bottom line and also helps our business succeed.
At a young, small organization it could be different. You don’t have a lot of resources, and there are critical answers the company needs, like “how much money did we make?” If those are the problems you’re trying to solve, solve them first before moving on to the meatier, more product-focused requests.
Where do you see tooling and concepts like data quality, lineage, and cataloging fitting into operationalizing data?
They’re super important. As you scale and grow, you’ll have less direct contact with some of your stakeholders. You’re going to need them to be self-sufficient.
Part of operationalizing data is empowering your stakeholders. If they don’t trust your team’s output and documentation is lacking, that’s hard to do. The outcome is that you will be spending a lot of time fixing logic bugs in your ETL or BI layer. It’ll slow your team down if you don’t have that in order. It’s also going to delay the adoption of anything you do put out. People are going to be slow to pick it up on their own, and they may not trust it.
There’s a multiplicative effect: without these tools, there’s the potential that your team will slow down, and therefore slow down your stakeholders at the same time.
If you were starting a data stack from scratch today, which modern data stack tools would you adopt and why?
Stepping back, I think that adopting the modern data stack helps teams cognitively offload concepts and work that used to require larger teams like ingestion infrastructure or database management.
For early stage startups this may not apply, but for anyone ready to invest in a dedicated data team and central data warehouse, I’m a fan of tools like Fivetran, Snowflake, and dbt. However, I try to be vendor neutral and would recommend tools that offer a similar value proposition: a tool to help you ingest many sources, a database that separates storage and compute, and a transformation tool that leverages SQL. So that could also be Stitch, Matillion, Dataform, Firebolt, etc. Choose tools that just work and that let your ETL be written in code, open to anyone who knows SQL, version controlled, and covered by tests.
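To make the "ETL as versioned, tested SQL" idea concrete, here is a hedged sketch using Python's built-in sqlite3 as a stand-in warehouse. The `stg_orders` model and its columns are illustrative inventions, but the shape mirrors what dbt does: a plain SQL SELECT materialized as a table, plus a data check like dbt's built-in `not_null` test:

```python
import sqlite3

# A hypothetical staging model expressed in plain SQL — roughly the shape
# a dbt model takes (minus Jinja and ref()). Names are illustrative.
STG_ORDERS_SQL = """
CREATE TABLE stg_orders AS
SELECT
    id,
    lower(status)        AS status,
    amount_cents / 100.0 AS amount_usd
FROM raw_orders
WHERE status IS NOT NULL
"""

def build_staging(conn):
    """Materialize the staging model in the given database."""
    conn.execute(STG_ORDERS_SQL)

def check_not_null_status(conn):
    """The kind of assertion dbt's `not_null` test generates."""
    nulls = conn.execute(
        "SELECT count(*) FROM stg_orders WHERE status IS NULL"
    ).fetchone()[0]
    assert nulls == 0, "stg_orders.status contains NULLs"
```

Because the model is just a SQL string in a file, it can live in git, be reviewed in pull requests, and be read by any analyst who knows SQL, which is the point Ben is making about openness and version control.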
How do you convince a team and get buy-in for a POC of new tools like a warehouse and ETL tool?
Good question. My honest answer is that the modern data stack has so much to offer that selling this idea to a small- or mid-sized organization has been relatively easy in my experience. You can do so much more in less time, with fewer resources (people and money).
Whether you are speaking to a data engineer, ETL developer, analyst, or a stakeholder in marketing/finance/etc., my advice would be to speak to those folks about the new capabilities and how it will make their lives better. I think a good deal depends on having an understanding of your company’s culture and values.
One of the things I like about SpotOn is that we are super focused on the client, so when selling the idea of the modern data stack I kept going back to how this will make our clients more successful. For example, items we’re already working on that gained traction with stakeholders include 1) a recommendation engine for our online ordering product and 2) leveraging the consolidated data warehouse to give restaurants and retailers the ability to send hyper-customized campaigns and marketing promotions to their customers.
Through this lens it was easy to get folks excited about the possibilities of a new data platform. That's actually been one of the most rewarding things about my role at SpotOn, seeing that excitement and engagement carry through to other areas of the business. What we’ve been able to do so far as a team has been really impactful, and a large part of that is because of the tools we have adopted. Something that isn’t always discussed is the cultural impact of data. I think, in general, access to data empowers people and the modern data stack is a big part of making that access happen.
You’ve worked in data at a wide variety of companies, from Cars.com to Braintree to Cameo and now SpotOn. How did your experience vary working in data at these companies?
The biggest differences for me are the learning opportunities, work-life balance, and the gravitational pull of large data, or what I believe others have started to call data gravity.
At smaller companies, you can get your hands dirty in a bunch of different areas. You may not learn the best way to do something, but you’ll spend a lot of time being scrappy, just getting things done and learning a lot along the way. As a company grows, it needs more guardrails.
At larger organizations, you’ll start using best practices and implement more checks and balances. The concept of data gravity just means that as data accumulates at larger companies, so does the number of applications, services, and third parties interested in that data. I’m not quite sure what the exact inflection point is in terms of database size, but this gravity can go from being a good thing (funneling additional resources and responsibilities) to a burden (increasing latency and making it difficult to implement the smallest of changes).
I’ve been fortunate in that I’ve worked at companies of various sizes. There have been times to specialize, times to get stuff done, and times to step back and learn. For job seekers, my advice is to figure out what you’re trying to get out of your relationship with work and use that to help find the right fit.
What have been the biggest shifts in the data world, both culturally and technologically?
Two of the biggest related shifts have been the move to the cloud and the change in the economics of running a data warehouse. Compared to what we used to pay for on-prem solutions, storage and compute have gone from being prohibitively expensive to essentially free, which has changed how data teams are built and operate. The compute and storage we have access to now is insane, and that access has been pretty transformative in terms of what you can store, analyze, and visualize.
Another shift I’d call out is Tableau, because of what it’s done for the analyst community. I think it’s given analytically minded people a tool to explore and create a story on their own without needing deep technical skills. I’m not saying Tableau is right for everyone—the market today has so many great offerings for BI tools. But Tableau came along at the right time and was totally differentiated from what we saw before. It provided a ton of value that wasn’t matched by the BI tools that predated it.
Lastly, I would add that everything is just easier to do end-to-end. It’s easier to ingest data into a database, visualize it, tell a story, and do advanced analysis. You used to need a team of specialized people to do this kind of work 10-15 years ago.
What are some of the trends you think will happen over the next one to five years?
I don’t think we’ll see another database come along and do what Snowflake did for everyone in the next 3-5 years. I think we’re at the tip of the iceberg of adoption of the cloud data warehouse. After that, we’ll see the next evolution of storage and compute... who knows what that will look like!
A trend I’ve been seeing is open source tools spinning out of larger companies and turning into fully managed SaaS products. Tools like Astronomer, Superset, Metabase, Acryl, and dbt cloud come to mind. I think this will be a pretty big space over the next 3-5 years and I think the business model makes sense.
One struggle is that cloud warehouses have grown a lot, but there are still underlying issues that haven’t been resolved: what the source of truth is, how a metric is defined, where it came from, and whether or not I can trust it. In a sense, the cloud warehouses have made these problems more complicated. You can get your hands on a lot of data, but without documentation and governance things can get messy. I haven’t seen a lot of organizations solve this problem very well, but I think it’s a common struggle, and hopefully we’ll see better solutions over the coming years.
If you could send a Slack notification to every data person in the world, what would it be?
The data space is growing and changing so quickly that it’s tough to keep up. There will always be someone who knows some other new tool, and I see a lot of people chase the hot new thing not because they love it and enjoy it, but out of a compulsion to know as much as the next person. Focus on the most important company and team goals, and forget about imposter syndrome.
Also, Kimball 4lyfe.