Introduction to Sqoop
Table of Contents
• Sqoop - Introduction
• Integration of RDBMS and Sqoop
• Sqoop use case
• Sample Sqoop commands
• Key features of Sqoop
What is Sqoop?
Sqoop is a suite of tools that connects Hadoop and relational database systems.
Major functions of Sqoop
• Import tables from databases into HDFS for deep analysis
• Replicate database schemas in Hive's metastore
• Export MapReduce results back to a database for presentation to end users
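As a hedged illustration of the second function, the script below uses Sqoop's create-hive-table tool, which writes a table definition into the Hive metastore based on the database table's columns, without copying any rows. The connection string and source table are taken from the sample commands; the Hive table name user_profiles and the run() dry-run helper are hypothetical. By default the script only prints the command; set DRY_RUN=0 to execute it on a machine where Sqoop and Hive are configured.

```shell
#!/bin/sh
# Sketch: copy a database table's schema into Hive's metastore (no rows move).
# Hypothetical: the Hive table name "user_profiles" and the run() helper.
set -eu

CONNECT="jdbc:mysql://db.foo.com/corp"

# Print the command unless DRY_RUN=0, so the script can be read and
# tested on a machine without Sqoop installed.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Define a Hive table whose columns mirror the MySQL table user-profiles.
replicate_schema() {
  run sqoop create-hive-table \
    --connect "$CONNECT" \
    --table user-profiles \
    --hive-table user_profiles
}

replicate_schema
```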
RDBMS: important but vulnerable?
Importance of RDBMS
• Holds a lot of valuable data in structured tables, often several hundred GB in size
• Provides fast access for OLTP applications: updating and deleting records, adding individual records, and running complex transactions
Vulnerability
• Does not scale well to very large datasets (1 TB+)
• Poor support for complex data types and large objects
• Schema evolution is hard
• Analytic queries are better suited to a batch-oriented system
RDBMS and Hadoop
[Figure: historical data (before processing) moves from the RDBMS into HDFS; results of data analysis (after processing) move from HDFS back to the RDBMS.]
Sample Sqoop commands
Both commands connect to the database through the JDBC MySQL driver.
Import using Sqoop (input: MySQL table; output: files in HDFS)
  sqoop import --connect jdbc:mysql://db.foo.com/corp --table user-profiles
Export using Sqoop (input: HDFS location with analysis results; output: MySQL table)
  sqoop export --connect jdbc:mysql://db.foo.com/corp --table ads_results --export-dir results
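The two sample commands rely on Sqoop's defaults. A sketch with a few commonly used options written out might look like the following; the user name analyst, the HDFS paths under /user/analyst, and the choice of 4 map tasks are assumptions, not part of the original example. -P makes Sqoop prompt for the database password, --target-dir sets where the imported files land, --num-mappers sets how many parallel map tasks read the table, and --export-dir names the HDFS directory whose files are exported. The hypothetical run() helper prints each command unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: the sample import and export with common options spelled out.
# Hypothetical: user "analyst", the /user/analyst paths, 4 map tasks, run().
set -eu

CONNECT="jdbc:mysql://db.foo.com/corp"
DB_USER="analyst"

# Print the command unless DRY_RUN=0 (lets the script run without Sqoop).
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Import: MySQL table -> delimited files in an HDFS directory.
# -P prompts for the password instead of putting it on the command line.
import_profiles() {
  run sqoop import \
    --connect "$CONNECT" --username "$DB_USER" -P \
    --table user-profiles \
    --target-dir /user/analyst/user-profiles \
    --num-mappers 4
}

# Export: files under an HDFS directory -> rows in an existing MySQL table.
export_results() {
  run sqoop export \
    --connect "$CONNECT" --username "$DB_USER" -P \
    --table ads_results \
    --export-dir /user/analyst/results
}

import_profiles
export_results
```

Two caveats: export writes into a table that must already exist, since Sqoop does not create ads_results for you; and for a source table without a primary key, an import with more than one mapper also needs --split-by to name the column used to divide the work.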
Key features of Sqoop
• JDBC-based implementation: works with many popular database vendors
• Auto-generation of tedious user-side code: makes writing MapReduce applications that work with the imported data faster
• Integration with Hive: lets users stay in a SQL-based environment
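The code-generation feature can be tried on its own with Sqoop's codegen tool, which produces the Java record class Sqoop would use for a table without importing any data. A minimal sketch, assuming the ads_results table from the export example; the class name AdsResult, the two output directories, and the run() dry-run helper are hypothetical.

```shell
#!/bin/sh
# Sketch: generate Sqoop's Java record class for one table, no data moved.
# Hypothetical: class name AdsResult, the output dirs, and run().
set -eu

CONNECT="jdbc:mysql://db.foo.com/corp"

# Print the command unless DRY_RUN=0 (lets the script run without Sqoop).
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# --outdir receives the generated .java source; --bindir the compiled output.
generate_record_class() {
  run sqoop codegen \
    --connect "$CONNECT" \
    --table ads_results \
    --class-name AdsResult \
    --outdir generated-src \
    --bindir generated-bin
}

generate_record_class
```

The generated class is the "tedious user-side code" the slide refers to: it maps each table row to typed Java fields, so a MapReduce job can use it instead of hand-written parsing code.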