Entity Resolution: Build vs. Buy

Entity resolution is an essential data management technique that helps keep enterprise data clean, complete, and trustworthy. Many organizations rely on entity resolution tools, but many teams are also exploring do-it-yourself (DIY) approaches. DIY entity resolution can offer flexibility and control without the overhead of traditional entity resolution software, but it comes with pitfalls. To help you determine if DIY entity resolution is right for you, let’s explore what entity resolution is, how to decide whether to build or buy, and what to watch out for when pursuing a DIY approach.
What is Entity Resolution?
Entity resolution addresses the challenge of reconciling records across and within datasets by linking multiple disparate records to a single logical entity such as a person, business, or product. It detects and matches records that refer to the same thing despite differences in spelling, formatting, associated attributes, and other discrepancies, and it assigns a unique ID to the matched records so that they are recognized as one entity moving forward.
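To make the matching and ID assignment described above concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not how any particular product works: the record IDs, the sample names, the `normalize` and `resolve` helpers, and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(value):
    """Lowercase, strip punctuation, and collapse whitespace."""
    cleaned = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Return a 0..1 similarity ratio between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve(records, threshold=0.85):
    """Give the same entity ID to records whose names look alike.

    `records` maps a record ID to a name. The threshold is an arbitrary
    illustrative value, not a recommended setting.
    """
    parent = {rid: rid for rid in records}

    def find(rid):
        # Follow parent links up to the cluster representative.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    # Compare every pair; link the two clusters when the names are similar.
    for left, right in combinations(records, 2):
        if similarity(records[left], records[right]) >= threshold:
            parent[find(right)] = find(left)

    # The representative's record ID doubles as the entity ID in this sketch.
    return {rid: find(rid) for rid in records}


records = {
    "crm-1": "Acme Corporation",
    "erp-7": "ACME Corporation.",   # differs in case and punctuation
    "web-3": "Acme Corporaton",     # differs in spelling
    "crm-2": "Globex Inc",
}
entity_ids = resolve(records)
```

In this sketch the three Acme variants end up sharing one entity ID, while the Globex record keeps its own. Real data needs more than one attribute and far more careful scoring.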
When DIY Entity Resolution Could Be the Right Fit
Effective entity resolution requires an ongoing, iterative, consumption-oriented approach that produces trustworthy golden records users can rely on for both analytical and operational outcomes. It’s hard work, which is why most organizations opt to use pre-built entity resolution tools.
Nevertheless, there are instances when a do-it-yourself approach is a viable option. For example, Python libraries like dedupe and Record Linkage Toolkit can match, deduplicate, and link records across sources, which is often good enough when datasets are relatively small (fewer than a few million records) and the data doesn’t change very often. Scripts and custom data pipelines also work well for resolving data that consumers use in periodic analysis, such as quarterly reporting.
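As a rough illustration of the kind of periodic script described above, the sketch below collapses already-matched rows into one golden record per entity before a report is run. It uses only the Python standard library rather than dedupe or Record Linkage Toolkit. The field names, the `entity_id` column, and the survivorship rule (the most recently updated non-empty value wins) are assumptions made for this example.

```python
from collections import defaultdict


def build_golden_records(rows):
    """Merge rows that share an entity_id into one record per entity.

    Survivorship rule (an assumption for this sketch): for each field,
    keep the non-empty value from the most recently updated row.
    """
    clusters = defaultdict(list)
    for row in rows:
        clusters[row["entity_id"]].append(row)

    golden = {}
    for entity_id, members in clusters.items():
        # ISO-formatted dates sort correctly as strings, oldest first.
        members.sort(key=lambda r: r["updated"])
        merged = {}
        for row in members:
            for field, value in row.items():
                if value:  # later non-empty values overwrite earlier ones
                    merged[field] = value
        golden[entity_id] = merged
    return golden


# Hypothetical input: rows that a matching step has already tagged.
rows = [
    {"entity_id": "E1", "name": "Acme Corp", "phone": "", "updated": "2024-01-10"},
    {"entity_id": "E1", "name": "Acme Corporation", "phone": "555-0100", "updated": "2024-03-02"},
    {"entity_id": "E1", "name": "", "phone": "", "updated": "2024-04-15"},
    {"entity_id": "E2", "name": "Globex Inc", "phone": "555-0199", "updated": "2024-02-20"},
]
golden = build_golden_records(rows)
```

A batch job like this is easy to rerun each quarter, but every rerun starts from scratch, which is one reason it suits data that rarely changes.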
5 Pitfalls of DIY Entity Resolution
While a DIY approach can be a viable option in certain circumstances, it’s not without its pitfalls. If you are thinking about taking a DIY approach to entity resolution, first consider the following:
- Scalability: DIY methods can deliver immediate benefits for organizations looking to resolve entities across a single source or relatively small datasets. Yet, as data volume and complexity grow, scalability quickly becomes an issue. While a script running on your laptop or a single server may work initially, these approaches will quickly reach their limit as you add more sources, datasets, or connections.
- Error tolerance: Tolerance for errors is a significant consideration with DIY approaches to entity resolution. In industries such as financial services or healthcare, where accuracy is critical, a misstep is costly. DIY methods often require organizations to accept a higher error rate, because they may return more false positives (distinct entities wrongly merged) or false negatives (duplicates left unlinked).
- Security: Security is another key consideration for companies that deploy DIY entity resolution. The DIY solution must protect sensitive data for compliance purposes, and it also needs appropriate governance and access controls. The more users you have, the greater the need for robust security and governance protocols, which quickly adds complexity to your DIY solution.
- Hidden costs: Although DIY solutions may initially appear more cost-effective, it’s important to consider the cost to update and maintain these solutions over time. From expanded infrastructure to additional resources and support, what initially begins as an affordable approach can quickly become quite expensive to maintain as your DIY solution gains traction. Further, when thinking about cost, it’s important to think beyond dollars and cents. Consider the time your team needs to invest in building, maintaining, and governing the solution.
- Skills gap: Building and maintaining a DIY entity resolution tool requires specific skills and expertise. And as the solution becomes more complex over time, so, too, do the skills required to evolve and maintain it. As you evaluate whether a DIY approach is right for your organization, consider the skills you have in-house as well as the capabilities you’ll need to build the solution and then maintain it over the long term.
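The scalability pitfall in the list above can be made concrete with some arithmetic. Comparing every record with every other record grows quadratically, which is why a laptop script stalls as data grows. The sketch below counts comparisons with and without blocking, a common technique (not named in this article) that only compares records sharing a key. The postal-code key and the evenly spread synthetic records are assumptions for illustration.

```python
from collections import defaultdict
from math import comb


def all_pairs_count(n):
    """Comparisons needed when every record is compared with every other."""
    return comb(n, 2)


def blocked_pairs_count(records, block_key):
    """Comparisons needed when only records sharing a block key are compared."""
    block_sizes = defaultdict(int)
    for record in records:
        block_sizes[block_key(record)] += 1
    return sum(comb(size, 2) for size in block_sizes.values())


# Hypothetical data: 10,000 records spread evenly over 100 postal codes.
records = [{"id": i, "postal_code": f"{i % 100:05d}"} for i in range(10_000)]

naive = all_pairs_count(len(records))                                 # 49,995,000
blocked = blocked_pairs_count(records, lambda r: r["postal_code"])    # 495,000
```

Blocking cuts the work by roughly a factor of 100 here, but it ties back to the error-tolerance pitfall: two records for the same entity with different postal codes are never compared, so they become false negatives. At one million records, the unblocked count is close to 500 billion pairs.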
In addition, be wary of vendors who tout software development kits (SDKs) for entity resolution. These “solutions,” marketed to empower DIYers, come with the same risks as a fully DIY approach.
An AI-native Approach to Entity Resolution
We’ve all heard the saying “just because you can, doesn’t mean you should.” And in the context of entity resolution, this adage definitely rings true. DIY entity resolution is a viable solution in some situations, but the bigger question remains: Is it worth the effort?
The answer to that question is “it depends.” But builders beware! Understand the true costs of creating a DIY entity resolution solution before you choose this path.
While DIY entity resolution can be a viable option for teams who are motivated by the challenge of building their own solution, it’s important to hold the line on costs. DIY solutions can become very expensive to maintain, scale, and operationalize, in terms of both time and money. If you decide to go down this path, explore the use of free tools to help keep costs in check.
Alternatively, companies that want to implement entity resolution without building it themselves, especially those that need to keep costs in check, should explore a pre-built AI-native master data management (MDM) solution. By employing AI to tackle the hard problem of entity resolution, AI-native MDM is built to help businesses automatically identify and resolve inconsistencies across multiple data sources and ever-expanding datasets.
Further, AI-native MDM provides additional capabilities that are difficult to incorporate within DIY solutions, including:
- Persistent IDs
- Data lineage
- Match verification with external data
- Data enrichment
- Real-time, search-before-create workflows
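To show why persistent IDs and search-before-create are harder than a one-off batch match, here is a toy in-memory sketch of the idea: look for an existing entity before creating a new one, and reuse its ID when a match is found. This is not any vendor’s API. The `EntityRegistry` class, its method names, and the 0.9 threshold are hypothetical, and a real system would also need durable storage, concurrency control, and much better matching.

```python
import uuid
from difflib import SequenceMatcher


class EntityRegistry:
    """Toy in-memory registry illustrating search-before-create."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # illustrative cut-off, not a tuned value
        self.entities = {}          # persistent ID -> canonical name

    def search(self, name):
        """Return the ID of the best match at or above the threshold, else None."""
        best_id, best_score = None, 0.0
        for entity_id, known in self.entities.items():
            score = SequenceMatcher(None, name.lower(), known.lower()).ratio()
            if score > best_score:
                best_id, best_score = entity_id, score
        return best_id if best_score >= self.threshold else None

    def get_or_create(self, name):
        """Reuse an existing ID when a match is found; otherwise mint a new one."""
        existing = self.search(name)
        if existing is not None:
            return existing
        new_id = str(uuid.uuid4())
        self.entities[new_id] = name
        return new_id


registry = EntityRegistry()
first = registry.get_or_create("Acme Corporation")
second = registry.get_or_create("ACME Corporation")  # matches, so the ID is reused
third = registry.get_or_create("Globex Inc")         # no match, so a new ID is minted
```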
With these capabilities in place, your organization can realize benefits such as:
- Faster time-to-value: Onboard new internal and external data sources quickly and easily to realize value faster.
- Greater scalability: Handle large and growing volumes of data using cloud-based technology optimized for scalability.
- Machine-learning-first approach: Free up technical resources by up to 90% and increase automation using machine learning capabilities that improve as data grows and evolves.
- Improved accuracy: Employ AI-powered technology, combined with human feedback, to boost the accuracy of your entity resolution process.
- Increased flexibility: Manage different use cases in different ways, and easily adjust as your needs evolve.
- Predictable costs: Avoid unexpected costs associated with scaling, maintaining, and operationalizing the solution.
As modern data continues to evolve and expand, AI and AI-native MDM will secure their place as critical tools to support entity resolution. As data gets bigger and data sources proliferate, the only feasible way to keep up is to embrace AI and use it to resolve entities and create golden records that yield valuable insights, enable greater flexibility and scalability, and drive data-driven decision-making.
Need help justifying the investment in AI-native MDM? Our latest ebook, "How-to Guide: Building a Business Case for AI-native MDM," can help. Download it now.