Here are the slides from Strata London 2019, where we presented Reifier 2.0 and discussed the architecture choices we made to support master data management for any entity, at any scale, and in any format and language. Multidomain MDM is made possible through AI: Reifier learns patterns from the data itself, without any handcrafted rules or algorithms. This allows Reifier to handle address matching, customer data unification, company master data and supplier data analytics with ease.
Because Reifier does not force customers to adhere to a strict data model, but rather adapts itself to the needs of each domain, we needed flexible, on-the-fly schema adaptability. This is made possible through Apache Cassandra, which can handle the scale and the diversity of the data we work with. Reifier Interactive Learner sifts through the data and picks out edge cases, so that users can focus on diverse data instead of worrying about thresholds and algorithms.
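As a rough illustration of what picking out edge cases can look like, the sketch below uses uncertainty sampling, a common active-learning technique that surfaces the record pairs a model is least sure about. This is an assumption made for illustration: the slides, not this post, describe how Reifier Interactive Learner actually works, and the function name, scores and sample records here are hypothetical.

```python
def pick_edge_cases(scored_pairs, k=2):
    """Return the k pairs whose match score is closest to 0.5 (most uncertain)."""
    return sorted(scored_pairs, key=lambda pair: abs(pair[2] - 0.5))[:k]


# Hypothetical (record_a, record_b, match_probability) triples from some model.
scored_pairs = [
    ("Acme Corp", "ACME Corporation", 0.93),  # confident match
    ("Acme Corp", "Apex Corp", 0.52),         # borderline
    ("J. Smith", "John Smith", 0.61),         # borderline
    ("J. Smith", "Jane Doe", 0.04),           # confident non-match
]

# These are the pairs a user would be asked to label first.
edge_cases = pick_edge_cases(scored_pairs)
```

Labelling the borderline pairs first gives the model the most information per answer, which is why the user never has to set a threshold by hand.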
Apache Spark's Dataset API gives clean hooks into different source systems and formats, so Reifier can integrate data silos such as NoSQL databases, RDBMS, S3 and Azure Data Lake, in formats such as Avro, Parquet, CSV, JSON and XML, with ease. Besides learning patterns for fuzzy data matching, our algorithms also learn how best to index and partition the data and distribute it across the Spark cluster, so that fuzzy matching scales effectively.
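To see why indexing and partitioning matter for fuzzy matching at scale, consider blocking: records are grouped by a key, and only records within the same group are compared. The sketch below is a single-machine illustration in plain Python, not Reifier's implementation. The blocking key (first letter of the name) is hand-picked here, whereas the post says Reifier learns how to index and partition. The similarity measure (`difflib.SequenceMatcher`), the threshold and the sample names are likewise assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(name):
    # Hand-picked blocking key (an assumption): the first character, lowercased.
    return name.strip().lower()[:1]


def candidate_matches(names, threshold=0.8):
    """Compare names only within the same block and keep the similar pairs."""
    blocks = defaultdict(list)
    for name in names:
        blocks[block_key(name)].append(name)

    matches = []
    for members in blocks.values():
        for a, b in combinations(members, 2):
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                matches.append((a, b))
    return matches


# Hypothetical sample records.
names = ["Acme Corp", "Acme Corp.", "Apex Ltd", "Zenith Inc", "Zenith Inc."]
matches = candidate_matches(names)
```

With five names, comparing every pair takes ten comparisons; blocking on the first letter cuts that to four. On a cluster, each block can be sent to a different partition, so the saving grows with the data.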
More details are in the slides. Reach out today to discuss how Reifier can help you on your master data journey.