In this post, we explain a simple Python based Map Reduce program to deduplicate user records. The data files are in text format, many users are repeated and we want to have a single copy of each record. Hadoop’s streaming can be used to get Python programs to talk to the Hadoop framework. We create [...]
read more
(C) 2013 Nube Technologies LLP
