This sample Scala program loads HBase/M7 tables into a Spark RDD and shows how in-memory processing in Spark can be used to boost the performance of real-time applications served by HBase/M7.
One advantage of Apache Drill is that it can process raw text files in their native format, without first creating schemas or metadata to describe the data. Here is a quick example of how Drill can query a simple CSV file.
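Drill exposes each line of a headerless CSV file as an array named columns, so a query along the lines of SELECT columns[0], columns[2] FROM dfs.`/tmp/people.csv` runs with no schema declared (that path is hypothetical). The Scala sketch below does not call Drill. It only mimics the schema-on-read idea in plain code, picking fields by position from raw lines, and it assumes simple comma-separated lines with no quoted fields.

```scala
object SchemaOnRead {
  // Splits one raw CSV line into fields, analogous to the `columns` array Drill exposes per row.
  // Simplifying assumption: no quoted fields or embedded commas. The -1 limit keeps trailing empty fields.
  def columns(line: String): Vector[String] = line.split(",", -1).toVector

  // Rough analogue of SELECT columns[i], columns[j], ... over a headerless CSV file:
  // fields are picked by position, with no schema declared up front.
  // A position past the end of a line yields None in this sketch.
  def select(lines: Seq[String], positions: Int*): Seq[Seq[Option[String]]] =
    lines.map { line =>
      val cols = columns(line)
      positions.map(cols.lift)
    }
}
```

For example, SchemaOnRead.select(Seq("1,Alice,NY", "3,Carol"), 1, 2) yields the name and city for each line, with None where the second line has no city.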
This post shows Pig and Hive in action. Because the data set it uses is publicly available, it also works well as a self-guided tutorial to experiment with.
The following program implements a table load tool, a handy utility for batching puts into an HBase/M7 table. It creates a simple HBase table with a single column in one column family and inserts 100,000 rows in batches.
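Batching replaces many small writes with fewer large ones. The Scala sketch below shows only the batching logic, under stated assumptions: the Row case class is a hypothetical stand-in for an HBase Put, and the column family "cf", qualifier "c1", and batch size of 1,000 are illustrative choices, not values from the original program. The flush function is where a real loader would call the HBase client (for example Table.put with a java.util.List of Put objects); passing it in keeps the sketch runnable without a cluster.

```scala
object BatchLoader {
  // Hypothetical stand-in for an HBase Put: one cell (row key, family, qualifier, value).
  final case class Row(key: String, family: String, qualifier: String, value: String)

  // Generates n rows, each with a single column in a single column family.
  // Row keys are zero-padded so they sort in numeric order, since HBase orders row keys bytewise.
  def rows(n: Int, family: String = "cf", qualifier: String = "c1"): Iterator[Row] =
    Iterator.range(0, n).map(i => Row(f"row$i%06d", family, qualifier, s"value$i"))

  // Sends rows to `flush` in batches of at most batchSize and returns the number of batches sent.
  // In a real loader, `flush` would wrap an HBase client call such as Table.put(java.util.List[Put]).
  def load(rows: Iterator[Row], batchSize: Int)(flush: Seq[Row] => Unit): Int = {
    require(batchSize > 0, "batchSize must be positive")
    rows.grouped(batchSize).foldLeft(0) { (sent, batch) =>
      flush(batch)
      sent + 1
    }
  }
}
```

Loading 100,000 rows with BatchLoader.load(BatchLoader.rows(100000), 1000)(flushFn) sends 100 batches. With the HBase 1.x and later client, BufferedMutator is an alternative that buffers puts on the client side and flushes them for you.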
Clush (ClusterShell) is a powerful utility for auditing your environment when deploying Hadoop, because it runs the same command on many nodes in parallel. This post introduces it.
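With ClusterShell, a command such as clush -a -b "uname -r" runs on all nodes (-a) and folds nodes that returned identical output into one group (-b), so a node with a different kernel or package version stands out. The Scala sketch below reproduces only that folding step on output you have already collected; it is not part of Clush, the node names in any input are hypothetical, and it trims whitespace before comparing, which Clush itself may not do.

```scala
object ClushAudit {
  // Groups node names by identical (trimmed) command output, similar in spirit to `clush -b`.
  def foldByOutput(outputs: Map[String, String]): Map[String, Seq[String]] =
    outputs
      .groupBy { case (_, out) => out.trim }
      .map { case (out, nodes) => out -> nodes.keys.toSeq.sorted }

  // True when every node reported the same output, i.e. no drift for this command.
  def consistent(outputs: Map[String, String]): Boolean =
    foldByOutput(outputs).size <= 1
}
```

If consistent returns false, the folded map shows exactly which nodes disagree with the rest.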
Here are some quick pointers for tuning your Hadoop JVM settings to avoid heap space errors (java.lang.OutOfMemoryError: Java heap space).
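A heap space error means the JVM's maximum heap (-Xmx) is too small for the work, but raising it only helps while the heap still fits inside the memory the task is allowed; otherwise YARN kills the container instead. The Scala sketch below derives a heap flag from a container size. It assumes a YARN (Hadoop 2) cluster, where map tasks are sized with mapreduce.map.memory.mb and mapreduce.map.java.opts, and it uses an 80% heap-to-container ratio, a commonly cited starting point rather than a value from this post.

```scala
object HeapSizing {
  // Assumed rule of thumb: heap (-Xmx) at about 80% of the YARN container size,
  // leaving headroom for non-heap JVM memory such as thread stacks and native buffers.
  val HeapFraction = 0.8

  // Returns an -Xmx flag, in megabytes, that fits inside a container of containerMb megabytes.
  def xmxFor(containerMb: Int): String = {
    require(containerMb > 0, "container size must be positive")
    s"-Xmx${(containerMb * HeapFraction).toInt}m"
  }

  // Pairs the container size with a matching heap flag for map tasks (Hadoop 2 / YARN property names).
  def mapTaskSettings(containerMb: Int): Map[String, String] = Map(
    "mapreduce.map.memory.mb" -> containerMb.toString,
    "mapreduce.map.java.opts" -> xmxFor(containerMb)
  )
}
```

For a 4096 MB container this yields -Xmx3276m; the same pattern applies to reduce tasks via mapreduce.reduce.memory.mb and mapreduce.reduce.java.opts.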