Know-Your-Customer Programs and How AI Helps
You’ve likely invested in Know Your Customer (KYC) programs, whether for a traditional reason (risk assessment and regulatory compliance) or for a strategic growth reason (sell more, serve customers better). And they’re humming along, presumably.
Or are they?
If you don’t have clean data feeding your KYC programs, you can’t possibly have a real-world picture of your customers, one that’s trustworthy enough for making critical decisions.
Customer-related data, perhaps even more so than other data, faces an uphill battle in the “clean data wars.” It resides in many silos (e.g., ERP applications, Salesforce.com, CRM systems). There’s the natural “drift” and disconnect that happens when so many different people are creating, adding to, and updating customer data as part of their daily jobs. And because there’s value in comparing and enriching your customer data with external data (for example, a licensed, “gold-tier” file of vetted data on global companies), even more data enters the mix.
Traditional data unification methods are too slow, particularly given the variety of customer-related data. Processes like data deduplication, record clustering, schema mapping, and entity resolution and mastering take too much time to execute properly at any kind of scale. Some processes require considerable skill and intimate knowledge of the data. With traditional Master Data Management (MDM) and Extract-Transform-Load (ETL) software, processes operate according to rules for data flow and logic that are developed top-down. Business people define these rules, which then must be interpreted, coded and deployed by programmers. Often this process has to be repeated whenever the data changes or new datasets are added.
The result: unacceptable latency for real-time-critical applications like Know Your Customer, particularly for businesses with a large, global customer base.
Breaking the Data Unification Logjam
As an example of this process, let’s take a look at a popular KYC application: risk assessment.
A global financial institution needs to perform ongoing risk assessment on its customer list to ensure that they are all legitimate customers (that is, to rule out shell companies, money launderers and terrorists, or even something as benign as a company whose name merely sounds like an actual customer’s). The customers are spread across the globe, large and small, and include commercial businesses as well as various governments.
To get that real-world view of its customers, the institution needs to develop a deduplicated and up-to-date list of customers. Here are the hoops the institution has to jump through to get there:
Data Ingestion: Ingest the data from the various siloed data sources.
Schema Mapping: Align the ingested data to a canonical schema. Wade through the various datasets looking for differently named fields that mean the same thing, such as “Org Name,” “Name” or “Organization.” One dataset may have multiple related columns such as “Org Name,” “Alt Org Name” and “Alias” (e.g., the company’s stock symbol), and others may have duplicate columns (this happens frequently). Resolve and map the data into a unified attribute called, for example, Organization Name. Integrate this into your schema. Repeat for other attributes as necessary.
Transformation: Clean up the resulting data, for example a unified attribute or a pair of input columns that prove duplicative upon further examination. Put it into the shape and form the institution prefers (this is the “T” in ETL).
Mastering: At this point, the data has been mapped to a unified schema and put into the desired standardized or normalized format. It’s pretty clean, but it still has duplicate rows. In the mastering phase, knowledgeable people must define which records may be duplicates and label data according to business rules to create clusters of like records.
Validation: This involves people going through similar clusters of records and making sure they are being clustered appropriately. They add more labels and other metadata to make the records searchable and (hopefully) findable down the road.
Golden Records: Finally, IT/data experts create a single canonical, validated record that describes the entity, based on the institution’s business needs and desired view of the data.
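To make the steps above concrete, here is a minimal, rule-based sketch of the same pipeline in Python. Everything in it is an assumption made for illustration: the alias table, the list of legal suffixes, the exact-match clustering key and the “most common value wins” survivorship rule are all hypothetical, and a real program would use far richer matching. It is not how any particular product works; it simply shows what each hoop does to the data.

```python
import re
from collections import Counter, defaultdict

# Hypothetical alias table: source column name -> unified attribute.
COLUMN_ALIASES = {
    "org name": "organization_name",
    "alt org name": "organization_name",
    "name": "organization_name",
    "organization": "organization_name",
    "country": "country",
    "ctry": "country",
}

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation",
                  "ltd", "limited", "llc", "co", "company"}


def map_schema(record):
    """Schema mapping: rename known source columns, drop the rest."""
    unified = {}
    for column, value in record.items():
        target = COLUMN_ALIASES.get(column.strip().lower())
        value = (value or "").strip()
        # The first non-empty source column wins for each unified attribute.
        if target and value and target not in unified:
            unified[target] = value
    return unified


def transform(record):
    """Transformation: put values into one standard shape."""
    cleaned = dict(record)
    if "country" in cleaned:
        cleaned["country"] = cleaned["country"].upper()
    return cleaned


def match_key(record):
    """Build a crude comparison key: normalized name plus country."""
    name = re.sub(r"[^a-z0-9 ]", "", record.get("organization_name", "").lower())
    tokens = [t for t in name.split() if t not in LEGAL_SUFFIXES]
    return (" ".join(tokens), record.get("country", ""))


def cluster(records):
    """Mastering: group records that share the same comparison key."""
    clusters = defaultdict(list)
    for record in records:
        clusters[match_key(record)].append(record)
    return list(clusters.values())


def golden_record(cluster_records):
    """Golden record: most common value per attribute, ties go to the longest."""
    golden = {}
    attributes = {attr for record in cluster_records for attr in record}
    for attr in sorted(attributes):
        counts = Counter(r[attr] for r in cluster_records if r.get(attr))
        if counts:
            golden[attr] = max(counts, key=lambda v: (counts[v], len(v)))
    return golden


if __name__ == "__main__":
    sources = [
        {"Org Name": "Acme Corp.", "Country": "us"},
        {"Name": "ACME Corporation", "Ctry": "US"},
        {"Organization": "Globex Ltd", "Country": "GB"},
    ]
    unified = [transform(map_schema(s)) for s in sources]
    for group in cluster(unified):
        print(golden_record(group))
```

Note what is missing: the validation step, where people check the clusters, has no code at all, and every new naming variant or source system means another hand-written rule.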
Sounds relatively simple, right?
Now imagine that you have one million records, or more. (We’ve seen it.) And that you’re going through them in Excel spreadsheets. (We’ve seen this, too.)
AI-powered Golden Records to the Rescue
By automating data unification with an AI-first approach, the institution can break the logjam in getting to (and maintaining) that up-to-date real-world picture of its customers.
Tamr software uses an artificial intelligence-driven process with a human-in-the-loop component. Tamr's proven B2B customer mastering models can automate ~80% of the unification work, invoking knowledgeable humans only to resolve disagreements between data records or resolve outliers by answering simple “yes or no” questions. Here’s how the Tamr system simplifies the time-intensive activities above:
Data Ingestion: Tamr works seamlessly with leading cloud data lakes and warehouses, as well as flat files containing your most important, cross-source customer data.
Schema Mapping: Tamr can import your defined customer data schema and create links between the columns from your input datasets and unified-attribute target columns. The system looks not just at the column name but also at any associated metadata or descriptions as well as the actual data within those columns. Tamr can thus deduce that numbers separated by dashes are probably phone or fax numbers, words with an @ sign are probably email addresses and so on. If you provide examples from your first two datasets, Tamr models can use them to automate the mapping for the rest of the datasets in your KYC project.
Mastering: Here, the Tamr system really shines with its AI-powered, human-refined approach. With customer mastering AI models that have been proven over the course of dozens of engagements with globally recognized organizations, Tamr delivers master data results in days or weeks that would normally take months or years with traditional methods.
SMEs need only provide a handful of responses; the machine does the rest. A process that could have been excruciating and prolonged is now very easy, automated and (most importantly) scalable across the KYC project.
Validation: With Tamr, as much as the AI accelerates the process, you are still in control. Tamr software provides your data team and data owners with simple curation interfaces for viewing record clusters, filtering them according to Tamr-generated confidence metrics and reviewing and refining them to meet your accuracy requirements.
Golden Records: Tamr speeds the creation of golden records to any level of granularity, from required fields to desired metrics.
Thanks to its AI-powered, human-refined approach, you can let your data speak for itself, without the additional, time-consuming and potentially confusing layer of business rules and extensive, error-prone human involvement.
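Two of the ideas above lend themselves to a small illustration: inferring what a column holds from its contents rather than its name, and using confidence scores to decide which matches need a human “yes or no.” The sketch below is a toy stand-in, not Tamr’s models or API. The regular expressions, the function names and the thresholds are all hypothetical, and the confidence scores are assumed to come from some upstream matching model.

```python
import re

# Hypothetical patterns: an "@" between non-space runs, and digit groups
# separated by dashes (optionally led by a "+" country code).
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE = re.compile(r"^\+?\d{1,4}(-\d{2,4}){1,3}$")


def infer_column_type(values, threshold=0.8):
    """Guess a column's type from the share of values matching each pattern."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return "unknown"
    for label, pattern in (("email", EMAIL), ("phone", PHONE)):
        share = sum(bool(pattern.match(v)) for v in non_empty) / len(non_empty)
        if share >= threshold:
            return label
    return "unknown"


def triage(scored_pairs, auto_accept=0.9, auto_reject=0.1):
    """Route candidate matches by confidence; only the middle band goes to people."""
    decisions = {"merge": [], "separate": [], "review": []}
    for pair_id, score in scored_pairs:
        if score >= auto_accept:
            decisions["merge"].append(pair_id)
        elif score <= auto_reject:
            decisions["separate"].append(pair_id)
        else:
            decisions["review"].append(pair_id)
    return decisions


if __name__ == "__main__":
    print(infer_column_type(["212-555-0100", "415-555-0199"]))
    print(infer_column_type(["jane@example.com", "raj@example.org", ""]))
    print(triage([("pair-1", 0.97), ("pair-2", 0.55), ("pair-3", 0.03)]))
```

The design point is the middle band in triage: the wider the thresholds are set, the more questions reach subject-matter experts, so accuracy requirements and reviewer time trade off directly against each other.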
It’s About Time
KYC truly is in the eye of the beholder. It’s different for every company. Our financial institution above had very strict process requirements for risk assessment, along with rules that were working for it. A retail company with a customer360 program might only need a deduplicated “golden record” with some profiling of all its customers, a particular customer or its top 1,000 customers.
But the basic approach is generally the same. And a common thread is time and effort.
An AI-driven, human-in-the-loop approach to mastering customer data can save enormous time by reducing the amount of manual effort up front and by speeding availability of clean, current and correctly classified data to business analysts or applications, improving analytic outcomes.
Here are some of the results we’ve seen from applying this approach to the spectrum of data unification activities in various KYC programs:
- a 75% reduction in the manual effort involved in customer-data integration and delivery of clean data to the company’s next-generation analytics platform (health care)
- ingestion and profiling of 35 large data sources with 3.7 million rows of data to produce 325,000 clusters of customer records, all in less than six months (financial services)
- ability to onboard a new system from landing data to mastery in just 5-7 days and to create a new golden record in a maximum of two days (financial services)
- a 40% reduction of duplicative customer records to feed a customer360 program (manufacturing)
Whether you’re an established KYC user or a newbie, Tamr can help you understand what data you have, unify it across data silos to get “ground truth,” and then keep it continually updated using an AI-first approach.
To learn more about how Tamr can accelerate your KYC program’s success, contact us for a discussion or demo.