Apache Sqoop™

Hadoop users often want to analyze data across multiple sources and formats, and a common source is a relational database or data warehouse. Sqoop allows users to efficiently move structured data from these sources into Hadoop for analysis and correlation with other data types, such as semi-structured and unstructured data stored in the distributed file system. Once the analysis is complete, Sqoop can push any resulting structured data back into a database or data warehouse so that it is available for operational use.

Sqoop achieves its efficiency through parallel processing, using multiple cluster nodes simultaneously. It also provides an API for building custom connectors that integrate with new data sources. Out of the box, Sqoop integrates with popular relational databases and data warehouses, such as MySQL, Oracle, PostgreSQL, Teradata, and Netezza.
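
To make the idea of a parallel transfer concrete, the following is a minimal Python sketch of how a table might be divided into key ranges so that several workers can each read one slice at the same time. It is an illustration under stated assumptions, not Sqoop's own code: the function names split_ranges and split_queries, the table name orders, and the column name id are hypothetical, and the sketch assumes an integer key column with known minimum and maximum values.

```python
def split_ranges(lo, hi, num_workers):
    """Divide the inclusive integer key range [lo, hi] into contiguous
    half-open ranges [start, end), one per worker.

    Hypothetical helper for illustration only; not part of Sqoop.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if hi < lo:
        raise ValueError("hi must be greater than or equal to lo")

    total = hi - lo + 1                # number of distinct key values
    parts = min(num_workers, total)    # never create an empty range
    base, extra = divmod(total, parts)

    ranges = []
    start = lo
    for i in range(parts):
        # The first `extra` ranges absorb one leftover key each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def split_queries(table, column, lo, hi, num_workers):
    """Build one bounded SELECT statement per worker (illustrative SQL)."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {a} AND {column} < {b}"
        for a, b in split_ranges(lo, hi, num_workers)
    ]


if __name__ == "__main__":
    # Assumed example: keys 0..999 divided among 4 workers.
    for query in split_queries("orders", "id", 0, 999, 4):
        print(query)
```

Each generated statement covers a disjoint slice of the key range, so the slices can be read independently on different cluster nodes. The sketch assumes keys are spread fairly evenly; with heavily skewed keys, equal-width ranges would give some workers far more rows than others.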