HIHO at HUG

We presented HIHO at the Hadoop User Group India meetup hosted by Impetus Technologies in July 2010. Here are the videos from the meetup: HIHO at HUG India. The presentation for the talk can be downloaded here.

Working with Cascading

Map Reduce applications tend to get very complex because of the sheer volume of data and the number of machines involved. Designing and debugging Map Reduce jobs spread over many machines is an art in itself. A framework like Cascading saves time and effort, because it lets us abstract Map Reduce away and think more in terms [...]

Amazon Elastic Map Reduce Lessons Learnt

Elastic Map Reduce is a great web service for getting up and running with Hadoop without setting up your own cluster. We recently worked on a vertical search engine using EMR. As part of our processing, our initial data was on S3, and we also wanted to store the fetched data on S3. We were [...]

Why HIHO?

Currently, Hadoop offers little support for querying a database and retrieving the results. The application developer has to spend significant time and effort to extract the data from the database. The existing DBInputFormat and DataDrivenDBInputFormat are table based, so if one wants to get data from multiple tables, one has to [...]

Hello HIHO!

We are glad to announce the beta release of HIHO, an open source framework for integrating data stores with Apache Hadoop. This post introduces HIHO and talks briefly about its capabilities. Data that needs to be analysed in Hadoop is often stored in conventional data stores. Typical data analysis tasks include: 1. Match profile information [...]
