What is a Tamr ID?
Editor’s Note: This post was originally published in July 2022. We’ve updated the content to reflect the latest information and best practices so you can stay up to date with the most relevant insights on the topic.
A Tamr ID is a persistent identifier (PI or PID) that Tamr assigns to an entity, serving as a long-lasting reference to that entity across multiple data sources. A Tamr ID gives organizations a primary key they can use across multiple databases when there is no common identifier for the same entity, such as a person, a supplier, or an organization. It can provide a holistic view of the same entity across multiple operational systems, for example by acting as an Enterprise Master Patient Index (EMPI) for patient health records. It can also provide the data lineage for an entity over time, for example when studying a company that grows through mergers and acquisitions.

What is a “Persistent Identifier”?
An identifier is a unique identification code that is applied to “something” so that the “something” can be unambiguously referenced. For example, an ISBN is an identifier for a particular book. In the United States, a Social Security Number serves as an identifier for a particular person.
This concept of having an identifier is important in database deduplication and entity resolution. In most databases, the identifier is called the primary key: the column or columns that contain values that uniquely identify each row in a table. A persistent identifier is an identifier that is permanently assigned to an entity, regardless of the database where this entity is represented. Using the same analogy, once you assign an ISBN to a particular book, that number becomes forever associated with that book, regardless of the library or bookshop where the book appears. No other book will ever receive that same ISBN.
Why are Persistent Identifiers Useful?
Persistent identifiers are useful for two main reasons.
- Persistent identifiers are unambiguous: In database systems, we deal with records about many different things: a person, a supplier, an organization, and the list goes on. We also have many different ways of referring to those records, such as names, and those references leave room for ambiguity. For example, are Tamr and Tamr, Inc. the same company? A human can often discern the correct record based on context, but it is more difficult for a machine to correctly interpret the context without appropriate training and feedback.
- Persistent identifiers are… well… persistent: Companies and people can and will change names. The same product might also have different product codes in different systems within one company, such as SAP and Oracle. With a persistent identifier, organizations can connect entities across sources, which makes it easy to track data lineage and provenance and provides insight into the data’s history as it changes.
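
To make the second point concrete, a persistent identifier can be pictured as a crosswalk from each system's local key to one shared ID. The sketch below is purely illustrative: the system names, product codes, `PID-0001` format, and helper functions are all hypothetical, and none of this reflects Tamr's actual ID format or API.

```python
from typing import Optional, Tuple

# Hypothetical crosswalk: (source system, local key) -> persistent ID.
# Every name and value here is invented for illustration.
crosswalk = {
    ("sap", "MAT-10042"): "PID-0001",
    ("oracle", "ITEM-7731"): "PID-0001",
    ("sap", "MAT-20015"): "PID-0002",
}

def persistent_id(system: str, local_key: str) -> Optional[str]:
    """Return the persistent ID for a local record, or None if it is unmapped."""
    return crosswalk.get((system.lower(), local_key))

def same_entity(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """Two local records refer to the same entity if they share a persistent ID."""
    pid_a = persistent_id(*a)
    pid_b = persistent_id(*b)
    return pid_a is not None and pid_a == pid_b
```

With this mapping, `same_entity(("sap", "MAT-10042"), ("oracle", "ITEM-7731"))` is true even though the two systems use unrelated product codes, because both local keys resolve to the same persistent ID.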
Why is it Hard to Establish and Maintain Persistent Identifiers?
In the context of enterprise master data management, keeping the identifiers unambiguous and persistent is a difficult task. On the one hand, the organization is creating more data records and adding more data sources over time. On the other hand, business rules for mastering records may change. To maintain persistent identifiers, the system must be able to compare two clusterings, the previously published one and the current one, so that each entity keeps its identifier even when the records in its cluster change.
The system must also be able to manage cluster member survivorship. This is most visible when mastering corporate accounts and assets during mergers, acquisitions, divestitures, and spin-offs. In these cases, records that used to represent distinct entities are merged, and records that used to represent the same entity are separated. This is a non-trivial problem both technically and visually.
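One simple way to picture the comparison of a previously published clustering with the current one is a greedy overlap match: each new cluster inherits the ID of the old cluster it shares the most records with, and any cluster left without a predecessor gets a fresh ID. The sketch below is an assumption-laden illustration of that heuristic, not a description of how Tamr does it; the function name `carry_forward_ids`, the `T1`-style IDs, and the record keys are all hypothetical.

```python
from itertools import count

def carry_forward_ids(previous, current, new_ids):
    """Assign a persistent ID to each cluster in `current`.

    previous: dict mapping persistent ID -> set of record keys (last published)
    current:  list of sets of record keys (the new clustering)
    new_ids:  iterator yielding fresh, never-used IDs
    Returns a list of IDs aligned with `current`.
    """
    # Score every (previous ID, current cluster) pair by the records they share.
    overlaps = []
    for pid, old_members in previous.items():
        for idx, new_members in enumerate(current):
            shared = len(old_members & new_members)
            if shared:
                overlaps.append((shared, pid, idx))

    # Largest overlap first; ties broken by ID, then index, for determinism.
    overlaps.sort(key=lambda t: (-t[0], t[1], t[2]))

    assigned = [None] * len(current)
    used = set()
    for _shared, pid, idx in overlaps:
        # Each old ID survives in at most one new cluster, and vice versa.
        if assigned[idx] is None and pid not in used:
            assigned[idx] = pid
            used.add(pid)

    # Clusters with no surviving predecessor get brand-new IDs.
    return [pid if pid is not None else next(new_ids) for pid in assigned]

# Hypothetical example: T1 splits, T2 and T3 merge, and record "g" is new.
previous = {"T1": {"a", "b", "c"}, "T2": {"d", "e"}, "T3": {"f"}}
current = [{"a", "b"}, {"c"}, {"d", "e", "f"}, {"g"}]
ids = carry_forward_ids(previous, current, (f"T{n}" for n in count(4)))
print(ids)  # ['T1', 'T4', 'T2', 'T5']
```

In this example the larger piece of the split keeps `T1` while the smaller piece gets a new ID, the merged cluster keeps `T2`, and `T3` is retired rather than reused. A real system would also need survivorship rules for ties and a record of retired IDs, which this sketch leaves out.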
Further, data governance plays a crucial role in managing persistent IDs by ensuring that data remains accurate, traceable, and interoperable across systems. Without persistent IDs, it’s difficult to manage data assets consistently across systems and platforms, which, in turn, leads to confusion, duplicates, or loss of critical lineage information. But when an organization establishes strong governance policies, it can ensure that these persistent identifiers are assigned consistently, maintained properly, and linked to authoritative metadata. That way, the organization can enable seamless integration and long-term usability while avoiding the risk of fragmented records, inconsistent data tracking, and compromised data integrity that weakens decision-making and analytical insights.
How Does Tamr Manage Persistent Identifiers?
Within Tamr’s data mastering workflow, the system assigns a Tamr ID to each golden record produced by grouping (clustering) records together. Each golden record can represent a cluster of one to thousands of records. Curators can easily make changes to auto-populated golden records, including merging them, reviewing similar source records between them, and creating entirely new golden records from source records. The persistent ID is guaranteed to be unique and stable for any downstream tracking in other apps or business intelligence tools.

Further, Tamr allows users to revert any change made to a cluster. Reverting a change returns the source record to the cluster where it was assigned previously. For example, if you move a source record from cluster A to cluster B, and then move it to cluster C, reverting it will return the source record to cluster A. Users can access the history of these changes through the activity log, allowing for data lineage of their mastered entities over time.
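
The revert behavior in the example above can be sketched as a small data structure that remembers a record's cluster from before any manual moves and logs every change. This is only a toy model of the A, B, C example in the text; the class name `RecordAssignment`, its fields, and the log format are hypothetical and are not Tamr's implementation or API.

```python
from dataclasses import dataclass, field

@dataclass
class RecordAssignment:
    """Tracks one source record's cluster and the manual changes made to it."""
    record_key: str
    original_cluster: str              # cluster before any manual moves
    current_cluster: str = ""
    activity_log: list = field(default_factory=list)

    def __post_init__(self):
        # A record starts out in its original cluster.
        if not self.current_cluster:
            self.current_cluster = self.original_cluster

    def move(self, target: str) -> None:
        """Manually move the record and log (action, from, to)."""
        self.activity_log.append(("move", self.current_cluster, target))
        self.current_cluster = target

    def revert(self) -> None:
        """Undo all manual moves by returning to the original cluster."""
        self.activity_log.append(
            ("revert", self.current_cluster, self.original_cluster)
        )
        self.current_cluster = self.original_cluster

# Mirror the example in the text: A -> B -> C, then revert.
rec = RecordAssignment(record_key="r1", original_cluster="A")
rec.move("B")
rec.move("C")
rec.revert()
print(rec.current_cluster)  # A
```

Because the revert is itself appended to the log rather than erasing earlier entries, the full sequence of moves stays available for lineage questions later.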

In summary, persistent IDs are essential for maintaining accuracy, consistency, and continuity across complex data ecosystems. Using Tamr IDs, companies can reliably integrate their data and support real-time entity resolution, ensuring confident, long-term usability of their data to drive strategic decision-making.