One of the most critical aspects of enterprise data management and data quality is Entity Resolution. Entity Resolution identifies data instances that refer to the same entity and links them together.
Entity Resolution is the foundation of predictive and prescriptive analytics. We cannot analyse business performance unless our base data is clean and consistent. We cannot calculate customer lifetime value on data silos. We cannot stay compliant on inconsistent and fragmented data. We cannot gather operational intelligence when our vendor data itself is spread across multiple systems without a unique identifier.
Now a question arises: what is an entity? An entity can be defined as a unique thing, such as a person, a business or a product, with a set of describing attributes like name, address, shape, title, price etc. A single entity can have multiple values for the same attribute, like a person with two different email addresses or a company with two different phone numbers. The same entity can also be represented differently across systems: a person may be stored as first name, last name, address in one application and as full name, address1, address2, city, state, country in another.
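To make the two representations concrete, here is a minimal sketch of mapping both layouts onto one common shape before any comparison happens. The field names, the function `to_common_schema` and the sample values are hypothetical, chosen only to mirror the two layouts described above.

```python
def to_common_schema(record: dict) -> dict:
    """Map either hypothetical layout onto one shared set of fields."""
    if "full_name" in record:
        # Layout B: full name plus an address split over several fields.
        name = record["full_name"].strip()
        address_parts = [
            record.get(key, "")
            for key in ("address1", "address2", "city", "state", "country")
        ]
    else:
        # Layout A: first name, last name and a single address field.
        name = f'{record.get("first_name", "")} {record.get("last_name", "")}'.strip()
        address_parts = [record.get("address", "")]
    # Drop empty parts so both layouts produce the same joined address.
    address = ", ".join(part.strip() for part in address_parts if part.strip())
    return {"name": name, "address": address}


# Made-up sample records, one per application.
app_a = {"first_name": "Anne", "last_name": "Smith",
         "address": "123 Main Street, Springfield"}
app_b = {"full_name": "Anne Smith", "address1": "123 Main Street",
         "address2": "", "city": "Springfield", "state": "", "country": ""}

print(to_common_schema(app_a))
print(to_common_schema(app_b))
```

Once both records share a shape, they can be compared field by field, which is where the actual matching starts.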
Ironically, Entity Resolution itself is referred to by various names – record linkage, deduplication, merge purge, identity resolution, fuzzy entity matching, fuzzy deduplication, data matching etc. Let us look at a few cases of entity resolution in enterprise systems.
Entity Resolution for Insurance:
Insurance companies often struggle with fragmented data silos and lack of a single customer view. Different policies maintain their own record of the customer, and the insurance company as a whole lacks a single unique customer identifier with which it can understand coverage, risk, householding and other analytics of a customer. Personalised marketing and opportunities for cross selling are missed too.
Let us look at two records in different application databases of an insured customer.
We can see that the first name is spelled differently and carries a salutation, the middle name is also spelled differently, and the telephone number lacks a country code. The Address 1 and Address 2 fields also vary. This is a simplistic case. In most cases, even the database schema and columns will be different in the two databases.
As long as we are dealing with a few records, it is possible for us to make out the differences and relate the records with each other. But when we talk about common enterprise systems with hundreds of thousands to millions of records, reconciling the records by hand is not feasible, and both analytics and compliance suffer.
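Some of these variations, such as the salutation and the missing country code, can be reduced with simple normalisation before records are compared. The sketch below is only an illustration: the salutation list, the helper names and the assumption that the last ten digits identify a phone number are ours, not part of any particular product.

```python
import re

# Illustrative list only; a real system would need a much longer one.
SALUTATIONS = {"mr", "mrs", "ms", "miss", "dr"}


def normalize_name(name: str) -> str:
    """Lowercase a name and drop salutations such as 'Mr.'."""
    tokens = [token.strip(".").lower() for token in name.split()]
    return " ".join(t for t in tokens if t and t not in SALUTATIONS)


def normalize_phone(phone: str, digits_to_keep: int = 10) -> str:
    """Keep only digits, then keep the trailing national number.

    Assumes a 10 digit national number, which is not true everywhere.
    """
    digits = re.sub(r"\D", "", phone)
    return digits[-digits_to_keep:]


print(normalize_name("Mr. John Doe"))
print(normalize_phone("+1 (123) 456-7890"))
print(normalize_phone("123 456 7890"))
```

Normalisation does not handle spelling differences in names or addresses; those need fuzzy comparison.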
Entity Resolution for Healthcare Master Patient Index:
While registering a patient, typical information collected would be first name, middle name and last name, address, telephone number. The patient undergoes treatment and builds a case history. After a few years, the person needs treatment for a new ailment and reappears at the hospital. Her telephone number may have changed, and this time the details may get captured differently.
For effective treatment, it is important for the hospital to know that both records belong to the same individual. The slight variations in her details need to be reconciled so that a precise case history can be built. This reconciliation is known as entity resolution.
The NASSCOM Deep Tech Club is a curated group of promising Indian startups working on cutting edge technologies. The club fosters innovation and entrepreneurship in the Indian ecosystem, promoting novel startups to the global stage. Our vision is to make India the hub for deep technologies, ushering in a wave of product startups respected in India and the world for their technical prowess.
When Nube got selected in the cohort of Deep Tech Club startups, we felt elated. It was a validation of our relentless pursuit of a tough problem. The selection also brought us in touch with mentors like Atul Batra who share our dream of building world class software products from India, playing not on manpower or price but on technical strength.
One nice aspect of the program is the Deep Tech Confluence, a full day event of startups, NASSCOM office bearers, bespoke investors and enterprises. Networking with unknown people is not really my core strength – I generally feel pretty lost in large crowds. Though I do a lot of tech talks, both nationally and internationally, small talk does not come naturally to me. My first instinct is to put my head down into my laptop and appear busy. However, the DTC event is different. The day begins with an ice breaking meetup of all the attendees, with guided discussions around what is working and how things can be improved. We then move on to Investor Connect, where startups looking to raise funds meet venture capitalists and angel investors. The meetings are curated, with the prior interest of both parties established beforehand by the NASSCOM team. My first-hand experience here is limited as we continue to stay revenue funded, but my founder friends have given good feedback on this session. While the Investor Connect is proceeding, I catch up with other founders who are at different stages of their journeys. At the recently concluded event in Delhi, I overheard proven entrepreneurs and startup legends like Bhanu Chopra from RateGain discussing the state of the Indian startup scene in the hallways. The Indian startup scene is scintillating!
Similar to the Investor Connect, there is the Enterprise Connect, where executive stakeholders from top Indian and MNC retail, manufacturing, BFSI and other industries understand startup solutions and discover fitment to their needs. These executives have specific pain points and areas charted by their internal teams for which they are seeking outside help. In our case, this is the most critical part of the day where we get qualified leads for our product. The NASSCOM DTC team painstakingly circulates startup offerings to enterprises and curates the meetings based on their needs. Many of the executives we interact with at the Enterprise Connect session already have use cases with direct application of our product. They also have budget approvals from digital transformation and other programs within their companies to go ahead with an engagement. From our perspective, this clearly shortens the sales cycles. Sometimes, the meetings throw up new ways in which our product can be used, or help us connect with a vertical which we had not yet discovered. Customer acquisition is the lifeblood of a startup, and the event provides a guided platform to achieve this.
If you are a startup founder with dreams to do your best work and build a deep tech product out of India, do consider applying for the program. If you are an enterprise with a challenging set of problems, look no further: there are a whole lot of startups working day and night on some of your direct challenges. We are passionate, and we have been vetted and mentored by leading technologists and business leaders of the country through NASSCOM. And we have a dream – to put India at the forefront of product innovation.
Come, join the journey!
-Sonal
Posted on February 21st, 2020
A named entity is a real world object which can be denoted by a proper name. Named entities can be persons, organisations, countries, currencies etc. When we look at text in the form of sentences or paragraphs, different entities may be mentioned in them. For example:
Sachin played a spectacular match at the Eden Gardens today.
Here, Sachin and Eden Gardens are named entities standing for person and place respectively.
Here, Nube Technologies and Reifier are named entities representing company and name of software.
Named Entity Recognition is typically done through Natural Language Processing. One earlier technique involved tagging parts of speech to identify nouns and then identifying entity types through pattern matching. A more comprehensive survey of techniques for NER can be found here.
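As a rough illustration of the pattern matching idea, the sketch below picks out runs of capitalised words as candidate entities. This is a deliberately naive stand-in that we are assuming for illustration: it does no part of speech tagging, it does not assign entity types, and it will wrongly pick up any ordinary word that starts a sentence. The function name `candidate_entities` is hypothetical.

```python
import re
from typing import List

# One or more consecutive words that each start with a capital letter.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")


def candidate_entities(sentence: str) -> List[str]:
    """Return runs of capitalised words as candidate named entities."""
    return CAPITALIZED_RUN.findall(sentence)


sentence = "Sachin played a spectacular match at the Eden Gardens today."
print(candidate_entities(sentence))  # ['Sachin', 'Eden Gardens']
```

A real NER system would go further and label each candidate as a person, place, organisation and so on.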
NER helps in understanding text, question answering, grouping together relevant information about entities for news, analysis etc.
Entity Resolution, on the other hand, links records that refer to the same entity when a common identifier is missing. Entity Resolution works on structured text in most cases, like customer or company records, though it may also be applied to long texts like product names and descriptions.
Suppose there is a hospital registering a patient. Typical information collected would be first and last names, address, telephone number and date of birth.
First Name: Anne
Last Name: Smith
Address: 123, Milwauke Dr, Connecticut
Phone: (123) 456 7890
The patient may undergo treatment and build a case history. After a few years, the person needs treatment for a new ailment and reappears at the hospital. Her telephone number may have changed, and this time the details may get captured as
First Name: Ann
Last Name: Smith
Address: 123 Milwauke Drive, Connecticut
Phone: (213) 645 7908
However, it is important for the hospital to know that both records belong to the same individual. The slight variations in her details need to be reconciled so that an effective case history can be built. This linkage is known as entity resolution. Check a few more samples and challenges here.
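The two patient records above can be compared with a simple fuzzy similarity score. The sketch below uses `difflib.SequenceMatcher` from the Python standard library. The choice of fields, the equal weighting and the 0.85 threshold are assumptions made for illustration, not a recommended configuration.

```python
from difflib import SequenceMatcher

first_visit = {"first_name": "Anne", "last_name": "Smith",
               "address": "123, Milwauke Dr, Connecticut",
               "phone": "(123) 456 7890"}
second_visit = {"first_name": "Ann", "last_name": "Smith",
                "address": "123 Milwauke Drive, Connecticut",
                "phone": "(213) 645 7908"}

# Phone is left out of the score because it may legitimately change.
FIELDS = ("first_name", "last_name", "address")


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_score(r1: dict, r2: dict) -> float:
    """Average the per-field similarities with equal weights."""
    return sum(similarity(r1[f], r2[f]) for f in FIELDS) / len(FIELDS)


def is_probable_match(r1: dict, r2: dict, threshold: float = 0.85) -> bool:
    """Treat the records as the same person above an assumed threshold."""
    return record_score(r1, r2) >= threshold


print(record_score(first_visit, second_visit))
print(is_probable_match(first_visit, second_visit))
```

A fixed threshold like this is the simplest possible rule. In practice the fields, weights and cut-off would have to be tuned on real data.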
Entity Resolution is typically done through rule based systems, though a lot of recent work has gone into ML/AI based approaches, including deep learning.
Get in touch if you have any challenges around entity resolution for your business.