Should You Buy, Borrow, or Build a Data Observability Tool?
Developers often debate whether they should buy or build the software they need to do their jobs, and data engineers are no exception. There is no one-size-fits-all answer to the debate. Instead, you must consider all relevant factors when making a decision.
In this guide, we analyze the pros and cons of building your own tool, leveraging an open-source tool, or buying an out-of-the-box tool. Along the way, we touch on everything from time and money to expertise and support to customization and compliance.
Option #1: Build a custom, in-house tool
Building a custom, in-house data observability tool is exciting to some but daunting to others. Like any initiative, it comes with clear benefits, costs, and risks.
When you build something from scratch, you have an intimate understanding of what it is and how it works. From the infrastructure to the application itself, you’d know what’s under the hood. You’d also understand how the raw data differs from what’s displayed in the user interface, and how the tool gathers its signals in the first place. For example, some data observability tools perform table scans, while others collect metadata to detect changes in your data.
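To make the difference between those two approaches concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a warehouse. The table name, column names, and helper functions are hypothetical, and a real warehouse would expose its metadata through its own catalog rather than SQLite's PRAGMA.

```python
import sqlite3


def scan_null_rate(conn, table, column):
    """Table scan: reads every row to compute the share of NULLs in a column."""
    total, nulls = conn.execute(
        f"SELECT COUNT(*), SUM({column} IS NULL) FROM {table}"
    ).fetchone()
    return (nulls or 0) / total if total else 0.0


def read_schema(conn, table):
    """Metadata read: asks the catalog for column names and types, not the rows."""
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}


def diff_schema(before, after):
    """Compare two schema snapshots and report what changed."""
    shared = set(before) & set(after)
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "retyped": sorted(c for c in shared if before[c] != after[c]),
    }


# Hypothetical example data: an in-memory "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 9.5), (2, None), (3, 4.0), (4, None)],
)

schema_before = read_schema(conn, "orders")
conn.execute("ALTER TABLE orders ADD COLUMN currency TEXT")
schema_after = read_schema(conn, "orders")

null_rate = scan_null_rate(conn, "orders", "amount")
schema_diff = diff_schema(schema_before, schema_after)
print(null_rate, schema_diff)
```

The scan answers questions about the values themselves but costs more as tables grow, while the metadata check is cheap but only sees what the catalog records.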
Building in-house allows you to design a tool that meets your organization’s unique needs. In fact, it may be necessary if you want to integrate with bespoke data sources that commercial tools don’t support. It’s also a strong option if you want to embed the technology into a custom workflow or build features that aren’t widely available.
If security and compliance are important to your company, you may prefer to build in-house. After all, that’s the easiest way to retain custody of your data. If you want to be HIPAA compliant, for example, building in-house would help you avoid exposing your data to a third-party vendor or needing the vendor to become a business associate.
Developing a custom-built tool requires lots of engineering hours, both upfront and on an ongoing basis. This is exacerbated by the fact that data engineers are not typically skilled software developers. The first dashboards you create are never the dashboards you use forever. You inevitably make changes that improve on the foundation. The same can be said for software. Over time, you’d need to add features, make improvements, and fix bugs. Unless you have a dedicated data infrastructure engineer, this work will always be a distraction from your team’s core responsibilities.
While not as expensive as commercial solutions, custom-built tools do cost money. In terms of financial costs, you need to pay for hosting. But you should also account for the wages of the team members who build the tool. Upfront, they’re likely to work on this project full-time for multiple weeks. On an ongoing basis, they’re likely to spend at least a couple of hours per week maintaining the tool.
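A rough way to put numbers on this is to add up build time, maintenance time, and hosting for the first year. The sketch below does that arithmetic; every figure in it (team size, weeks, hourly rate, hosting bill) is an illustrative assumption, not data from this guide, so substitute your own.

```python
from dataclasses import dataclass


@dataclass
class BuildEstimate:
    """Hypothetical inputs for a first-year cost estimate of an in-house tool."""
    engineers: int
    build_weeks: int
    hours_per_week: int
    hourly_rate: float
    maintenance_hours_per_week: float
    hosting_per_month: float

    def first_year_cost(self) -> float:
        # Upfront: the team works on the build full-time for several weeks.
        build = (
            self.engineers * self.build_weeks * self.hours_per_week * self.hourly_rate
        )
        # Ongoing: a few hours per week for the rest of the year.
        remaining_weeks = 52 - self.build_weeks
        maintenance = (
            self.maintenance_hours_per_week * remaining_weeks * self.hourly_rate
        )
        hosting = self.hosting_per_month * 12
        return build + maintenance + hosting


# Illustrative numbers only.
estimate = BuildEstimate(
    engineers=2,
    build_weeks=6,
    hours_per_week=40,
    hourly_rate=75.0,
    maintenance_hours_per_week=3,
    hosting_per_month=200.0,
)
total = estimate.first_year_cost()
print(f"Estimated first-year cost: ${total:,.0f}")
```

Even with modest assumptions, wages tend to dominate the total, which is why comparing hosting bills alone understates the cost of building.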
When you build in-house, you lose out on the product and engineering expertise that commercial and open-source vendors have to offer. Over time, these tools are pressure-tested by countless organizations. They’ve already faced the edge cases you’d inevitably run into. Remember: Building to 80% is easy; it’s the last 20% that is tough.
You will probably underestimate the number of engineering hours it will take to build and maintain the tool. It’s also likely that your team doesn’t have the skills necessary to optimize the uptime and stability of the application. After all, they aren’t DevOps experts.
Option #2: Leverage an open-source tool
Using an open-source tool to build a custom application can offer the best of both worlds, but it can also be a risky venture.
Benefits and Costs
By leveraging an open-source tool, you can spend much less time building the application. However, there’s the added time spent integrating your custom solution with that other system, plus the hours it takes to continuously update to the latest version. Ultimately, it’s up to you to look at the code and decide if the benefits outweigh the costs.
As with custom-built tools, you must pay for the hosting yourself. You may also need to pay the open-source tool’s provider, and those costs rarely remain unchanged. The time it takes to build and maintain the application should also be taken into account.
You don’t get commercial-level support when you leverage an open-source tool. No one will tell you exactly how to fix your problem. However, you do get a passionate community of developers who regularly report known bugs and openly share their knowledge with each other. It’s also important to mention that, because data observability is a new category of software, few open-source tools exist to begin with, and those that do are often early in their development.
You may be able to customize your tool to meet your needs, or you may not. It really depends on the architecture used by the open-source tool. If it offers abstractions, you can build custom integrations much faster than you would from scratch. If it doesn’t, it could be a nightmare trying to integrate your custom code into the system.
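As a rough illustration of what "offers abstractions" can mean, the sketch below imagines a connector interface that a tool might expose. The Source class, its method, and the check function are all hypothetical and not taken from any particular open-source project.

```python
from abc import ABC, abstractmethod


class Source(ABC):
    """Hypothetical connector interface an open-source tool might define."""

    @abstractmethod
    def row_count(self, table: str) -> int:
        """Return the number of rows in the given table."""


class InMemorySource(Source):
    """Custom integration: only the interface methods need to be written."""

    def __init__(self, tables: dict[str, list]):
        self._tables = tables

    def row_count(self, table: str) -> int:
        return len(self._tables[table])


def check_not_empty(source: Source, table: str) -> bool:
    """A check written once against the interface works for every connector."""
    return source.row_count(table) > 0


bespoke = InMemorySource({"orders": [1, 2, 3], "refunds": []})
print(check_not_empty(bespoke, "orders"))
print(check_not_empty(bespoke, "refunds"))
```

When an interface like this exists, supporting a bespoke data source means writing one small class. Without it, the same integration may require changes scattered throughout the tool's code.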
It’s not unusual for open-source tools to go dormant. If supported by a for-profit company, the company’s strategy may change, or they may simply go out of business. If supported by an individual or group of individuals, their passions may change, or they may direct their attention to more profitable ventures.
Option #3: Buy an out-of-the-box tool
Buying a commercial tool is the easiest way to adopt data observability. You pay money, and you gain access to ready-made software.
Buying an out-of-the-box tool requires the fewest engineering hours and offers the fastest time to value. In fact, you don’t have to build anything. All you need to do is purchase, implement, and configure the tool to suit your organization’s needs. For example, you’d need to plug in your warehouse, transformation, business intelligence, and communication tools. You may also want to configure your test frequency or alert sensitivity.
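To give a feel for what test frequency and alert sensitivity control, here is a minimal sketch that treats sensitivity as a z-score threshold on a table's daily row count. This is an assumption made for illustration, not a description of how any specific vendor's detection works, and the MonitorConfig fields are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class MonitorConfig:
    """Hypothetical settings a tool might expose for one monitor."""
    check_every_minutes: int  # test frequency
    sensitivity: float  # z-score threshold; lower values alert more often


def is_anomalous(history: list[float], latest: float, config: MonitorConfig) -> bool:
    """Flag the latest value if it sits too far from the historical mean."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]
    z_score = abs(latest - mean(history)) / spread
    return z_score > config.sensitivity


# Illustrative daily row counts for one table.
row_counts = [1000, 1020, 980, 1010, 990]
strict = MonitorConfig(check_every_minutes=60, sensitivity=2.0)
lenient = MonitorConfig(check_every_minutes=60, sensitivity=3.0)

print(is_anomalous(row_counts, 1040, strict))
print(is_anomalous(row_counts, 1040, lenient))
```

The same observation triggers an alert under the stricter setting and passes under the more lenient one, which is the trade-off between catching more issues and generating more noise.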
When you buy a tool, you automatically get the product and engineering expertise that comes with it. With professional software engineers continuously monitoring and improving the uptime of the application, for example, most days you’d have nothing to worry about. Of course, every tool has downtime; it would just be minimal compared to an in-house tool.
Commercial tools typically have larger feature sets than in-house tools, since they must cater to a wide range of customers. Data observability tools typically offer testing and anomaly detection, schema change and job monitoring, as well as lineage and usage analytics features, among others. Of course, it’s up to you to determine which features are important.
No tool is perfect, and commercial tools are no exception. But when something goes wrong, technology companies typically offer some type of support. Here at Metaplane, for example, we create shared Slack channels to ensure your concerns are addressed as quickly as possible. It’s also common for vendors to offer email, phone, and live chat support options. Some companies, like Metaplane, even offer service-level agreements (SLAs) to ensure they meet your expectations.
Commercial tools are more likely than open-source or custom-built tools to still be around years from now. Their founders and leaders are motivated to make their companies successful. By contrast, a new employee may not care enough to maintain your in-house application, and the company that sponsors an open-source tool may stop supporting it.
Purchasing an out-of-the-box tool is the most expensive option. Commercial vendors charge you for hosting, plus a generous markup. If you have the cash to spend, it’s a great solution. But many data teams struggle with tight budgets.
Commercial tools offer the least amount of customization. You ultimately have no control over whether they build a desired integration or that new feature you’ve been waiting for. That said, you can always provide product feedback and make requests. You just can’t guarantee that they’ll be answered in a timely manner.
While it’s more likely that a commercial vendor will still be around 10 years from now, it’s possible that they could go out of business or be acquired by another company with different priorities.
A recommended order of operations
At the end of the day, you know what option is best suited to your needs. If you’re unsure, consider this order of operations: Look into commercial options first, open-source options second, and in-house options third. You know better than we do how busy data teams are. They usually lack either the size or capacity to build a custom, in-house tool.
If that’s true for you, going with a commercial option is probably your best bet. Although it carries the highest price tag, it has the most benefits, the fewest other costs, and the least risk. On the other hand, if customization and compliance are critical to your team's success, or you have a limited budget but a larger team with the time it takes to build from scratch, a custom, in-house solution may be a better option.
Ultimately, the decision is in your hands.
If you're ready to try a commercial solution, sign up for Metaplane’s free-forever plan, or test our most advanced features with a 14-day free trial. Implementation takes under 30 minutes.