Crux – Visualizing and querying big data saved in HBase
In this post, we will walk through an example of creating different visualizations from data stored in an HBase store. We will use Crux and explore its filtering and charting capabilities.
First, some background. HBase is a column-oriented datastore for big data. Built on Hadoop HDFS and offering direct support for MapReduce, HBase has become a popular NoSQL database for big data. It provides low-latency random access to large datasets and works well with unstructured and semi-structured data. HBase tables are sorted by rowkey, and each cell is stored as an uninterpreted array of bytes.
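To illustrate why this byte-level ordering matters, here is a minimal Python sketch. It is not HBase code; it only sorts keys the way HBase compares them, as raw bytes. The sample numbers are made up, and the 8-byte big-endian encoding is meant to mirror what HBase's Java helper `Bytes.toBytes(long)` produces.

```python
import struct

# Numbers written as text sort byte by byte, not numerically.
text_keys = [b"9", b"10", b"100"]
sorted_text = sorted(text_keys)
print(sorted_text)  # [b'10', b'100', b'9']

# The same numbers as fixed-width 8-byte big-endian longs sort numerically
# (this holds for non-negative values).
long_keys = [struct.pack(">q", n) for n in (9, 10, 100)]
decoded = [struct.unpack(">q", k)[0] for k in sorted(long_keys)]
print(decoded)  # [9, 10, 100]
```

This is why the way a rowkey is encoded decides which range queries are cheap: only values that sort correctly as bytes can be fetched with a contiguous scan.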
HBase provides a rich Java API for querying the stored data, and also supports other interfaces such as Thrift and REST. However, one has to write custom code to fetch and analyze data from HBase.
With Crux, we can define a connection and a mapping for the data. In this example, we use stock data from the Bombay Stock Exchange. This is the demo dataset that comes with Crux; instructions for loading it can be found here. We use the composite key case, where our table is stockData and the column families are price, stats and spread. Our rowkey is a composite: the first part is the stock id, a string of length 6, and the second part is the date, a long of 8 bytes. The composite key is the concatenated byte array of stock id and date, 14 bytes in all. All the columns (price:high, price:low, etc.) are floats.
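The rowkey layout described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Crux loader: the helper names, the sample stock id `500209`, and the reading of the date as epoch milliseconds are all hypothetical. The big-endian encodings are chosen to match what HBase's Java `Bytes.toBytes` produces for longs and floats.

```python
import struct
from typing import Tuple

STOCK_ID_LEN = 6  # the stock id occupies the first 6 bytes of the rowkey


def make_rowkey(stock_id: str, date: int) -> bytes:
    """Concatenate the 6-byte stock id with the date as an 8-byte big-endian long."""
    sid = stock_id.encode("ascii")
    if len(sid) != STOCK_ID_LEN:
        raise ValueError("stock id must be exactly 6 characters")
    return sid + struct.pack(">q", date)


def parse_rowkey(rowkey: bytes) -> Tuple[str, int]:
    """Split a 14-byte composite rowkey back into (stock id, date)."""
    sid = rowkey[:STOCK_ID_LEN].decode("ascii")
    (date,) = struct.unpack(">q", rowkey[STOCK_ID_LEN:])
    return sid, date


def encode_float(value: float) -> bytes:
    """Encode a column value such as price:high as a 4-byte big-endian float."""
    return struct.pack(">f", value)


def decode_float(cell: bytes) -> float:
    """Interpret a 4-byte cell as a float."""
    return struct.unpack(">f", cell)[0]


# Hypothetical sample values; the date is assumed to be epoch milliseconds.
key = make_rowkey("500209", 1357000000000)
print(len(key), parse_rowkey(key))
print(decode_float(encode_float(101.5)))
```

The aliases defined in Crux play the role of `parse_rowkey` and `decode_float` here: they tell the tool how to interpret each slice of the otherwise opaque bytes.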
Let us give Crux the ZooKeeper host and port for our HBase connection.
Let us now define our table schema.
And create aliases for our composite row key.
We now define our column aliases.
This is how our column aliases are specified.
Let us now design the report. Let's drag and drop the aliases of interest.
We now define some row filters – conditions which help us to select the data of interest.
We define an equals filter on the stock id part of the rowkey, and a greater-than-or-equal filter on the date. Internally, this is converted to an HBase range scan. We are providing the values right here, but if we wanted, we could save the filters without any values and provide them dynamically at report generation time.
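To see why these two filters map onto a single range scan, consider the following Python sketch. It does not show Crux internals; it is one plausible way to derive start and stop rows from the filters, simulated over an in-memory sorted list instead of a real table. The function names and the sample stock ids and dates are hypothetical.

```python
import struct
from bisect import bisect_left


def next_prefix(prefix: bytes) -> bytes:
    """Smallest byte string greater than every key starting with prefix."""
    stripped = prefix.rstrip(b"\xff")
    if not stripped:
        return b""  # no upper bound: scan to the end of the table
    return stripped[:-1] + bytes([stripped[-1] + 1])


def scan_bounds(stock_id: str, min_date: int):
    """Translate 'stock id == X and date >= D' into (start row, stop row)."""
    prefix = stock_id.encode("ascii")
    start = prefix + struct.pack(">q", min_date)  # inclusive start row
    stop = next_prefix(prefix)                    # exclusive stop row
    return start, stop


def range_scan(sorted_keys, start, stop):
    """Return the contiguous slice of keys in [start, stop)."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop) if stop else len(sorted_keys)
    return sorted_keys[lo:hi]


# A tiny stand-in for the table: three stocks, dates 1..3, sorted by rowkey.
table = sorted(
    sid.encode("ascii") + struct.pack(">q", d)
    for sid in ("500208", "500209", "500210")
    for d in (1, 2, 3)
)

start, stop = scan_bounds("500209", 2)
hits = range_scan(table, start, stop)
dates = [struct.unpack(">q", k[6:])[0] for k in hits]
print(dates)  # [2, 3]
```

Because the stock id comes first in the key and the date is a fixed-width big-endian long, all matching rows sit next to each other, so the scan never has to touch rows for other stocks or earlier dates.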
We can now preview our tabular report.
Let us define a new report with date on the x axis and price high and low on the y axis. We want to view this as a line chart.
Here is our line chart.
And here is a bar chart.
We can also create scatter charts, area charts and tables.