One of the most critical aspects of enterprise data management and data quality is Entity Resolution. Entity Resolution identifies data instances that refer to the same real-world entity and links them together.
Entity Resolution is the foundation of predictive and prescriptive analytics. We cannot analyse business performance unless our base data is clean and consistent. We cannot calculate customer lifetime value on data silos. We cannot stay compliant on inconsistent and fragmented data. We cannot gather operational intelligence when our vendor data itself is spread across multiple systems without a unique identifier.
Now a question arises: what is an entity? An entity can be defined as a unique thing, such as a person, a business or a product, with a set of describing attributes like name, address, shape, title, price etc. A single entity can have multiple values for the same attribute, like a person with two different email addresses or a company with two different phone numbers. The same entity can also be structured differently across systems: a person can be represented as first name, last name, address in one application and as full name, address1, address2, city, state, country in another.
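To make the schema difference concrete, here is a minimal Python sketch that maps the two layouts described above onto one common shape before any comparison happens. The `Person` class, the `from_app_a` and `from_app_b` helpers, the field names and the sample values are all hypothetical and chosen only for illustration; a real system would need far more careful parsing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical canonical shape shared by both applications."""
    full_name: str
    address: str


def _squash(text: str) -> str:
    """Collapse repeated whitespace and trim the ends."""
    return " ".join(text.split())


def from_app_a(row: dict) -> Person:
    """App A (assumed): first_name, last_name and one address field."""
    name = f"{row['first_name']} {row['last_name']}"
    return Person(full_name=_squash(name), address=_squash(row["address"]))


def from_app_b(row: dict) -> Person:
    """App B (assumed): full_name plus a split address."""
    keys = ("address1", "address2", "city", "state", "country")
    parts = [_squash(row.get(k, "")) for k in keys]
    # Skip empty components such as a blank address2.
    address = ", ".join(p for p in parts if p)
    return Person(full_name=_squash(row["full_name"]), address=address)


# Made-up sample rows, not taken from any real system.
a = from_app_a({
    "first_name": "Jane",
    "last_name": "Doe",
    "address": "12 Elm Street, Springfield, IL, USA",
})
b = from_app_b({
    "full_name": "Jane Doe",
    "address1": "12 Elm Street",
    "address2": "",
    "city": "Springfield",
    "state": "IL",
    "country": "USA",
})

print(a == b)
```

Once both rows are in the same shape, they can be compared field by field. In practice the values rarely line up this neatly, which is where fuzzy matching comes in.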
Ironically, Entity Resolution itself is referred to by various names – record linkage, deduplication, merge purge, identity resolution, fuzzy entity matching, fuzzy deduplication, data matching etc. Let us look at a few cases of entity resolution in enterprise systems.
Entity Resolution for Insurance:
Insurance companies often struggle with fragmented data silos and the lack of a single customer view. Each policy system maintains its own record of the customer, and the insurance company as a whole lacks a single unique customer identifier with which it can understand the coverage, risk, householding and other analytics of a customer. Opportunities for personalised marketing and cross-selling are missed too.
Let us look at two records in different application databases of an insured customer.
We can see that the first name is spelled differently and carries a salutation, the middle name is also spelled differently, and the telephone number is missing its country code. The Address 1 and Address 2 fields also vary. This is a simplistic case. In most cases, even the database schema and columns will be different in the two databases.
As long as we are dealing with a few records, it is possible for us to spot the differences and relate the records to each other. But in common enterprise systems with hundreds of thousands to millions of records, reconciling the records by hand is not feasible, and analytics as well as compliance suffer.
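The kinds of variation listed above (a salutation, spelling differences, a missing country code) can be handled in a small way with normalisation followed by approximate string comparison. The sketch below uses only the Python standard library; the salutation list, the default country code "1", the 0.8 threshold and the sample records are assumptions made for illustration, not values from any real dataset or a recommended configuration.

```python
import re
from difflib import SequenceMatcher

# Assumed, deliberately short list of salutations to strip.
SALUTATIONS = {"mr", "mrs", "ms", "miss", "dr"}


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation and remove salutation tokens."""
    tokens = re.sub(r"[^a-z\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SALUTATIONS)


def normalize_phone(phone: str, default_country_code: str = "1") -> str:
    """Keep digits only; prepend a country code to 10-digit numbers."""
    digits = re.sub(r"\D", "", phone)
    # Assumption: exactly 10 digits means the country code is missing.
    if len(digits) == 10:
        digits = default_country_code + digits
    return digits


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 from difflib."""
    return SequenceMatcher(None, a, b).ratio()


def likely_same(rec_a: dict, rec_b: dict, name_threshold: float = 0.8) -> bool:
    """Naive rule: phones equal after cleanup and names similar enough."""
    same_phone = normalize_phone(rec_a["phone"]) == normalize_phone(rec_b["phone"])
    name_score = similarity(normalize_name(rec_a["name"]),
                            normalize_name(rec_b["name"]))
    return same_phone and name_score >= name_threshold


# Made-up records for illustration only.
rec_a = {"name": "Mr. Jon Alan Smith", "phone": "+1 (555) 010-2030"}
rec_b = {"name": "John Allen Smith", "phone": "555 010 2030"}
rec_c = {"name": "Mary Jones", "phone": "555 999 0000"}

print(likely_same(rec_a, rec_b))
print(likely_same(rec_a, rec_c))
```

A fixed rule like this is easy to read but brittle: it would miss the same person after a phone number change, and a single threshold rarely suits every field. It is meant only to show the shape of the problem.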
Entity Resolution for Healthcare: Master Patient Index
While registering a patient, the typical information collected would be first name, middle name and last name, address and telephone number. The patient undergoes treatment and builds a case history. After a few years, the patient needs treatment for a new ailment and reappears at the hospital. The telephone number may have changed, and this time the details may get captured differently.
For effective treatment, it is important for the hospital to know that both records belong to the same individual. The slight variations in the patient's details need to be reconciled so that a precise case history can be built. This reconciliation is known as entity resolution.