Technology intelligently woven to deliver the most elegant solution, not just because it is trendy.
We believe in using the best tools to solve business problems, and our architecture choices are carefully optimized for
- highest accuracy
- runtime performance
- ease of use
- applicability to multiple domains
- fast deployment
Reifier utilizes Spark for distributed entity resolution, deduplication and record linkage. Keeping data in memory lets Reifier iterate quickly over different permutations of possible record matches to generate the best model for data matching. Our proprietary machine learning algorithms sit on top of Spark to provide the best entity resolution and fuzzy data matching with a scale-out distributed architecture.
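Reifier's matching algorithms are proprietary, but the core idea of scalable deduplication can be sketched in a few lines: group records into candidate blocks, then score only within-block pairs instead of all n² pairs. This is a simplified single-machine illustration using toy records and `difflib`'s character similarity as a stand-in for a learned similarity; in a Spark job, the blocks would be partitioned across the cluster.

```python
from difflib import SequenceMatcher
from itertools import combinations

def block_key(record):
    # Simple blocking: group records sharing the first character of the name.
    # Real systems use smarter, often learned, blocking functions.
    return record["name"][:1].lower()

def similarity(a, b):
    # Character-level similarity in [0, 1]; a stand-in for a learned measure.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_matches(records, threshold=0.7):
    # Compare only pairs inside the same block to avoid the O(n^2) blow-up.
    blocks = {}
    for r in records:
        blocks.setdefault(block_key(r), []).append(r)
    matches = []
    for group in blocks.values():
        for a, b in combinations(group, 2):
            if similarity(a["name"], b["name"]) >= threshold:
                matches.append((a["id"], b["id"]))
    return matches

records = [
    {"id": 1, "name": "Acme Corporation"},
    {"id": 2, "name": "Acme Corp"},
    {"id": 3, "name": "Zenith Ltd"},
]
print(find_matches(records))  # → [(1, 2)]
```

Blocking is what makes in-memory iteration over candidate pairs tractable: only records that could plausibly match are ever compared.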
Reifier has been covered at major international conferences on data analytics and AI.
Reifier’s proprietary AI engine learns string similarity from data and generalizes it to deduce the optimal fuzzy matching rules for any domain and language. If there is no training data, the Reifier Interactive Learner samples the data and lets the user mark some pairs as matching and non-matching. Typically, about 20-50 labeled pairs are all that is needed, and they can easily be marked even by support staff. Check how it works.
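To see why a handful of labeled pairs goes a long way, consider the simplest possible learner: pick the similarity cutoff that best separates the marked matching pairs from the non-matching ones. The labeled pairs below are hypothetical, and the threshold search is a toy stand-in for Reifier's actual learning, but it shows the mechanism.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Character-level similarity in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical pairs, marked by a user in an interactive session:
# (record A, record B, is_match)
labeled = [
    ("Jon Smith", "John Smith", True),
    ("Acme Corp", "Acme Corpn", True),
    ("Jon Smith", "Rita Patel", False),
    ("Acme Corp", "Bolt Media", False),
]

def learn_threshold(pairs):
    # Try each observed similarity score as a cutoff and keep the one
    # that classifies the labeled pairs most accurately.
    scored = [(similarity(a, b), is_match) for a, b, is_match in pairs]
    best_t, best_acc = 0.5, -1.0
    for t in [s for s, _ in scored]:
        acc = sum((s >= t) == m for s, m in scored) / len(scored)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

t = learn_threshold(labeled)
```

A real learner generalizes far beyond a single cutoff, learning which fields matter and what kinds of variation count as the "same" entity, but the input is the same: a small set of user-marked pairs.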
Some of the features of Reifier’s fuzzy record matching and deduplication technology are
- Reifier can generalize from training samples to provide very high data matching accuracy
- As no hand-coding of rules is needed, deployment of fuzzy matching is blazing fast
- Developers can concentrate on business logic instead of figuring out record matching algorithms
- Reifier can be used in multiple domains with different fields
- Reifier is language agnostic and can match data in Chinese, English, Japanese, Thai and other languages
- Scale-out architecture – run on a single machine or a full-blown Spark cluster for deduplication and record linkage
Depending on data volumes, Reifier can simply be run on a standalone machine with Spark libraries if your organization has not adopted big data yet. A standalone machine is good for up to a few million records. If you already have a Spark cluster, plan to deploy one, or need our help to do so, let us know.
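The standalone-versus-cluster choice is a configuration detail in Spark itself. A minimal sketch, assuming PySpark is installed ("reifier-job" is just a placeholder application name):

```python
from pyspark.sql import SparkSession

# Standalone machine: Spark runs in-process using all local cores.
# This mode is adequate for up to a few million records.
spark = (
    SparkSession.builder
    .appName("reifier-job")
    .master("local[*]")
    .getOrCreate()
)

# On a cluster, drop .master(...) from the builder and submit the same job
# via spark-submit, letting the cluster manager assign resources, e.g.:
#   spark-submit --master yarn --deploy-mode cluster your_job.py
```

The application code stays the same in both modes; only the master URL and resource settings change, which is what makes the scale-out path incremental.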
Posted on August 19th, 2016