Impala

Impala is an open-source, interactive SQL engine for Hadoop. With Impala, you can use business intelligence (BI) tools to run ad hoc queries directly on the data in a cluster, whether that data is stored in unstructured flat files in the file system or in structured HBase tables. Compared to Hive, which is optimized for long-running batch queries at scale, Impala is optimized for interactive queries on smaller data sets, where users expect responses in seconds.

Like Hive, Impala uses the Apache Hive query language (HiveQL) and Hive metadata, so Impala can query the same tables as Hive, provided that Impala supports the underlying data types, file formats, and compression codecs. Impala supports the most common SQL-92 features of HiveQL, including SELECT, joins, and aggregate functions. Unlike Hive, Impala releases before 1.2 do not support customization through user-defined functions (UDFs).

Users can issue queries from the impala-shell command-line tool or from a business application through an ODBC or JDBC driver.
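
As a rough illustration of the command-line route, the following Python sketch assembles an impala-shell invocation for a single query that uses a join and aggregate functions. It is only a sketch under stated assumptions: the tables (orders, customers), their columns, and the impalad address localhost:21000 are hypothetical, and the helper build_impala_shell_command is introduced here for illustration. The -i, -B, and -q options are standard impala-shell flags.

```python
import shlex


def build_impala_shell_command(query, impalad="localhost:21000"):
    """Return the argument list for running one query through impala-shell."""
    return [
        "impala-shell",
        "-i", impalad,  # impalad host:port to connect to (assumed address)
        "-B",           # delimited output instead of a pretty-printed table
        "-q", query,    # run this single query, then exit
    ]


# Hypothetical tables and columns, used only to show a join with aggregates.
QUERY = (
    "SELECT c.region, COUNT(*) AS order_count, SUM(o.amount) AS revenue "
    "FROM orders o JOIN customers c ON o.customer_id = c.id "
    "GROUP BY c.region"
)

command = build_impala_shell_command(QUERY)

if __name__ == "__main__":
    # Print the command rather than running it, so the sketch works
    # without access to a cluster.
    print(shlex.join(command))
    # To run it against a live cluster, pass `command` to
    # subprocess.run(command, capture_output=True, text=True, check=True).
```

Building the command as an argument list rather than a single string avoids shell-quoting problems when the query itself contains quotes. A business application would more typically connect through an ODBC or JDBC driver instead of spawning the shell.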