410 likes | 665 Views
Using Data from a Hadoop Cluster in Oracle Database. Melli Annamalai. If you need to reset your VM. source /home/oracle/movie/ moviework /reset/reset_conn.sh.
E N D
Using Data from a Hadoop Cluster in Oracle Database MelliAnnamalai
If you need to reset your VM • source/home/oracle/movie/moviework/reset/reset_conn.sh
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Oracle Big Data Connectors Licensed Together • Oracle SQL Connector for HDFS • Oracle Loader for Hadoop • Oracle R Connector for Hadoop • Oracle Data Integrator Application Adapters for Hadoop • Announced at OOW 2013: Oracle Xquery for Hadoop
Program Agenda • Oracle SQL Connector for HDFS • Brief Overview • Hands-on Exercises • Oracle Loader for Hadoop • Brief Overview • Hands-on Exercises • (Optional exercise): Use both connectors together
Oracle Big Data Connectors Connecting Hadoop to Oracle Database Oracle Database Oracle Advanced Analytics Oracle Big Data Connectors Oracle Spatial & Graph Hadoop Oracle Database Acquire – Organize – Analyze Oracle DataIntegrator
Loading and Accessing Data from Hadoop Oracle SQL Connector for HDFSOracle Loader for Hadoop Input1 MAP Shuffle/Sort MAP Shuffle/Sort MAP Reduce Reduce MAP Reduce MAP ReducE MAP Reduce MAP MAP LOG FILES MAP MAP Reduce Shuffle/Sort MAP Reduce MAP MAP Reduce MAP Input2 Oracle Database
Oracle SQL Connector for HDFS Access or load into the database in parallel using external table mechanism Hadoop Oracle Database Use Oracle SQL to Load or Access Data on HDFS Load into the database using SQL Option to access and analyze data in place on HDFS Access Hive (internal and external) tables and HDFS filesAutomatic load balancing to maximize performance SQL Query OSCH External Table OSCH OSCH OSCH HDFS Client
Installing Oracle SQL Connector for HDFS Hive Client OSCH Hadoop Client OSCH External Table Oracle Database System Hadoop Cluster
Part 1a: Reading Hive Tables with Oracle SQL Connector for HDFS Hadoop Oracle Database • OSCH Command Line Utility automatically generates external table and locator files • Reads hive metadata to generate columns • Uses XML configuration files to drive process Hive table movieapp_log_stage movie_fact_ext_tab_hive d2.loc <uri_list> <uri_list_item size="2527870“ compressionCodec=""> hdfs://localhost.localdomain:8020/user/oracle/data1</uri_list_item> <uri_list_item size=“101000“ compressionCodec=""> hdfs://localhost.localdomain:8020/user/oracle/data2</uri_list_item> </uri_list> File System data2.loc data1.loc data3.loc Locator Files d1.loc d2.loc
Data files on Hadoop • prompt> hadoopfs –ls /user/oracle/moviework/data • prompt> hadoopfs –cat /user/oracle/moviework/data/part-00002
Part 1a: Reading Hive Tables withOracle SQL Connector for HDFS • cd /home/oracle/movie/moviework/osch • This directory contains the scripts genloc_moviefact_hive.sh, moviefact_hive.xml • Run the OSCH command-line utility to create external table • sh genloc_moviefact_hive.sh • (the password is: welcome1)
Part 1a • Contents of genloc_moviefact_hive.sh hadoop jar $OSCH_HOME/jlib/orahdfs.jar \ oracle.hadoop.exttab.ExternalTable \ -conf /home/oracle/movie/moviework/osch/moviefact_hive.xml \ -createTable
Part 1 moviefact_hive.xml • Examine the Hadoop configuration properties • more moviefact_hive.xml
Part 1a • Query the table • sqlplusmoviework/oracle SQL> select count(*) from movie_fact_ext_tab_hive; SQL> select custid from movie_fact_ext_tab_hive where rownum < 10; SQL> select custid, title from movie_fact_ext_tab_hive p, movie q where p.movieid = q.movieid and rownum < 10;
Part 1b: Reading HDFS Files with Oracle SQL Connector for HDFS Hadoop Oracle Database • OSCH Command Line Utility automatically generates external table and locator files • All external table columns have type VARCHAR2 • Uses XML configurationfiles to drive process movie_fact_ext_tab_hive data2.loc d2.loc <uri_list> <uri_list_item size="2527870“ compressionCodec=""> hdfs://localhost.localdomain:8020/user/oracle/data1</uri_list_item> <uri_list_item size=“101000“ compressionCodec=""> hdfs://localhost.localdomain:8020/user/oracle/data2</uri_list_item> </uri_list> data1.loc File System data3.loc Locator Files d1.loc d2.loc
Part 1b: Reading HDFS Files withOracle SQL Connector for HDFS • cd /home/oracle/movie/moviework/osch • This directory contains the scripts genloc_moviefact_text.sh, moviefact_text.xml • Run the OSCH command line utility to create the external table • sh genloc_moviefact_text.sh • (the password is: welcome1)
Part 1b • The script genloc_moviefact_text.sh hadoop jar $OSCH_HOME/jlib/orahdfs.jar \ oracle.hadoop.exttab.ExternalTable \ -conf /home/oracle/movie/moviework/osch/moviefact_text.xml \ -createTable
Part 1b moviefact_text.xml • Examine the Hadoop configuration properties • more moviefact_file.xml
Oracle Loader for Hadoop Connect to the database from reducer nodes, load into database partitions in parallel Partition, sort, and convert into Oracle data types on Hadoop Oracle Loader for Hadoop Shuffle/Sort Features Offloads data pre-processing from the database server to Hadoop Works with a range of input data formats Automatic balancing in case of skew in input data Online and offline modes MAP Reduce MAP Reduce MAP MAP Shuffle/Sort Reduce MAP Reduce MAP Reduce
Installing Oracle Loader for Hadoop Hive Client OLH Target Table Oracle Database System Hadoop Cluster
Part 2: Loading Data with Oracle Loader for Hadoop Hadoop Oracle Database • Load in parallel into database table using OCI direct path • Loader map file maps input data to columns in database table • Uses XML configurationfiles to drive process data2.loc data1.loc movie_sessions_tab data3.loc
Part 2: Loading Data with Oracle Loader for Hadoop • Examine the data files on HDFS • hadoopfs -ls /user/oracle/moviedemo/session
Part 2: Oracle Loader for Hadoop • cd /home/oracle/movie/moviework/olh • This directory contains all the necessary scripts moviesession.sql, moviesession.xml, loaderMap_moviesession.xml, runolh_session.sh
Part 2 • Create the table data will be loaded into • sqlplusmoviedemo/welcome1 SQL> @moviesession.sql
Part 2 • Submit the Oracle Loader for HadoopMapReduce job • sh runolh_session.sh hadoop jar ${OLH_HOME}/jlib/oraloader.jar oracle.hadoop.loader.OraLoader-conf home/oracle/movie/moviework/olh/moviesession.xml
moviesession.xml Part 2 • Examine the Hadoop configuration properties • more moviesession.xml
loaderMap_moviesession.xml Part 2 • Examine the loaderMap file • more loaderMap_moviesession.xml
Versions • Certified Versions • Oracle Database 10.2.0.5, 11.2.0.2, 12c and higher • Hadoop distributions • CDH3, CDH4 (versions of Cloudera’s Distribution including Apache Hadoop) • Apache Hadoop 1.0.x, 1.1.1 • Should work with Hadoopdistros based on certified Apache Hadoop versions
Oracle Loader for Hadoop and Oracle SQL Connector for HDFS Oracle Data Pump files in HDFS queried (and loaded if necessary) with Oracle SQL Connector of HDFS. Offline load: Data pre-processed and written as Oracle Data Pump format in HDFS. Oracle SQL connector for hdfs Oracle Loader for Hadoop Shuffle/Sort MAP Reduce MAP SQL Query Reduce MAP External Table HDFS Client OSCH MAP Shuffle/Sort Reduce ODCH ODCH MAP Reduce Oracle Database MAP Reduce