As the year draws to a close and the holiday season begins, it's that time again to reflect on the past and look forward to the future. They say “Time is not measured by clocks but by moments”. Wishing you lots of spectacular moments in the coming year, cheers!
Posted on December 20th, 2018
Happy and excited to announce that I will be presenting at Strata Data Conference London. Strata Data is the leading big data, data science and machine learning conference, with phenomenal sessions by leading experts on data storage, preparation, wrangling and analysis practices. The conference also covers data strategy and architecture, and lessons learnt implementing big data and machine learning projects, with detailed case studies.
My talk is titled “Mastering Data with Spark and ML”. Watch this space as the full agenda for the conference is announced.
See you all,
The festive season is round the corner and we wish you all a very happy, clean and joyful Diwali. Now that we have mentioned clean, we can't help but think of clean data, but that’s for another day!
Cheers!
Posted on November 4th, 2018
We have been working on version 2 of Reifier for some time. We are now at a stage where the core functionality is beginning to shape up, and we have a solid foundation to deliver the next generation fuzzy matching, reference and master data management system using AI. When we started Reifier, we designed it as a big data, file based application working on data residing in local, S3 or HDFS files: cleanly formatted delimited text files, with the same structure for all the records one would like to match. At that stage, it felt right to keep this simplification, as we wanted to make sure that the most critical pieces of the puzzle were highly accurate and performant. The big data preprocessing and matching was sure to be the focal point of our system, and we focused relentlessly on keeping it at least 20x more accurate, performant and scalable than traditional rule based or threshold based systems.
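For contrast, the kind of threshold based matching mentioned above can be sketched in a few lines. This is only an illustration of the traditional approach using Python's standard difflib, not how Reifier works; the function names, the sample records and the 0.85 cutoff are all assumptions made up for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio after trimming and lowercasing both strings."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_pairs(records, threshold=0.85):
    """Return index pairs whose similarity meets a fixed, hand-picked threshold.

    Every record is compared with every other record, so the work grows
    quadratically with the number of records.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if similarity(a, b) >= threshold
    ]


# Hypothetical sample records, invented for illustration.
records = ["Acme Corporation", "ACME Corporation ", "Globex Ltd"]
pairs = match_pairs(records)
print(pairs)
```

A single cutoff like this has to be tuned by hand for each data set, and comparing all pairs becomes expensive on large files, which is why such an approach serves here only as a baseline.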
We also wanted to make sure there was a market for our work, and we reasoned that once we had sufficient traction, we could pull in additional data sources later. After multiple deployments, we felt restrained by our first version. There is a big vision and roadmap for Reifier as the data management application for the future, using AI on large data sets to uniquely link records and build golden copies. In the near future, Reifier will be able to understand entities, prescribe and suggest business actions, drive enterprise revenue and pro-actively manage risk. Our customers have been asking us for more data sources, more formats, more data massaging, a peek inside the models, online training and better data stewardship, and we have been aching to deliver them. Which is good – it solidifies their relationship with Reifier and their need to integrate it more and more into their work.
Reifier v2 is our answer to many of these dreams. We have migrated from the RDD based API to the Dataset API, and this has led to a much smaller code base, even as we gained the ability to support diverse formats like JSON, XML, Parquet, Avro, CSV etc. as well as data sources like Salesforce, Redshift, Cassandra, HBase etc. Earlier, we had our own internal data structures to represent records, with data marshalling and unmarshalling based on field types. Now we just use the Row class. We have also moved from our home-grown feature engineering, evaluation and transform pipeline to the ML Pipeline in Spark, just plugging in our custom models, transformers and evaluators. All this has dramatically reduced our code base while letting us focus on our core algorithms and leave the rest to Spark.
We are excited by the recent changes, and though we have spent a good part of last year and this year moving the code, our slimmer, leaner code base will allow us to move rapidly in the direction we want to take Reifier. Watch this space as we unfold more details about Reifier v2 – the absolutely groundbreaking master data management engine for your data!
Posted on August 7th, 2018
It seems even wild animals can benefit from fuzzy matching! In a new proposal, the Thane Wildlife Warden has requested the linkage of forest and wildlife criminal data with police records to ensure that character certificates appropriately reflect the crime history of an individual. Here is the news article.
Posted on May 9th, 2018
After the festivities of the new year, the team has been working on deploying Reifier on the Databricks cloud. Databricks provides a unified analytics platform with a fully managed Spark service. Some of our newest customers run massive analytic and predictive workloads on the Databricks cloud. They want the ability to run Reifier from within their data processing pipeline – building 360 views, removing duplicates, understanding relationships and consolidating accounts. Reifier was architected from the ground up to be a Spark application, and we had put in the effort to be Apache Spark Certified. As the Databricks platform provides an optimized Spark environment, only minimal changes around the creation of the SparkContext were needed at our end to make the transition.
We are happy to announce that we have tested Reifier on the Databricks Platform. Reifier has always supported multiple deployment models, enabling enterprises to master their customer, vendor, location, employee and other data seamlessly on an infrastructure of their choice – AWS/Azure/Google Cloud/Data Centre. With Reifier on Databricks, we give our customers yet another option and more flexibility.
Posted on February 5th, 2018
Wishing you all a very happy, healthy and prosperous new year! Hope the new year brings in new joy and successes. May unified data lead you to greater customer acquisition, enhanced compliance and operational efficiency and smoother mergers and acquisitions!
Posted on December 28th, 2017
Effective selling means knowing your customer, understanding their pain points, learning their journey and helping them achieve their goals.
Posted on May 1st, 2017
As a company, we are committed to building tools and techniques that enable our customers to make informed choices by utilizing and maximizing the value of their data assets. Business is tough, conditions are hard, but we want to make sure that we make a difference to your bottom line. Here is the first tip of the series; stay tuned for simple techniques, analysis and tips for better business.
Posted on April 30th, 2017
Kudos! Reifier got covered in Analytics India Magazine – the number one platform for analytics, data science and big data, dedicated to passionately championing and promoting the ecosystem in India.
Read the full story here
Posted on April 30th, 2017