Oracle Big Data Connectors: High-Performance Integration for Hadoop and Oracle Database • Marty Gubar, Oracle Big Data Product Management
Session Goals • Introduce the Oracle Big Data Connectors • Understand how they provide high-performance connectivity between Oracle Database & Oracle Big Data Appliance • See the Connectors in action!
Oracle’s Big Data Platform • Visualize & Decide • Organize & Discover • Stream • Acquire • Analyze
Oracle’s Big Data Platform • Hadoop • Oracle Database • Oracle Big Data Connectors
Oracle Big Data Connectors Components • Oracle SQL Connector for HDFS • Oracle Loader for Hadoop • Oracle R Connector for Hadoop • Oracle Data Integrator Application Adapters for Hadoop
What is HDFS? • Primary storage system underlying Hadoop • Fault tolerant, scalable, highly available • Well suited to distributed processing • Superficially structured like a UNIX file system • (Diagram labels: Big Data Appliance, HDFS)
What is Hive? • Provides structure over files • Metadata describes tables/columns • HiveQL offers basic SQL access to data • Hive converts HiveQL queries into MapReduce jobs • Example table definition: CREATE EXTERNAL TABLE myTable ( movieId STRING, hits INT ) ROW FORMAT DELIMITED… • Example query: SELECT movieId, sum(hits) FROM myTable GROUP BY movieId • (Diagram labels: Big Data Appliance, HDFS)
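To make the "HiveQL into MapReduce" point concrete, here is a minimal Python sketch of the map, shuffle, and reduce steps that a query like the GROUP BY above boils down to. It is not Hive's actual execution plan; the function names and the sample rows are hypothetical and only stand in for the contents of myTable.

```python
from collections import defaultdict


def map_phase(rows):
    """Emit one (movieId, hits) pair per input row."""
    for movie_id, hits in rows:
        yield movie_id, hits


def shuffle(pairs):
    """Group values by key and order the keys, as shuffle/sort does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Sum the hits collected for each movieId."""
    return {key: sum(values) for key, values in groups.items()}


def total_hits(rows):
    """Equivalent of: SELECT movieId, sum(hits) ... GROUP BY movieId."""
    return reduce_phase(shuffle(map_phase(rows)))


# Hypothetical sample data, not taken from the presentation.
sample_rows = [("m1", 3), ("m2", 5), ("m1", 4), ("m3", 1), ("m2", 2)]
result = total_hits(sample_rows)
print(result)
```

In a real cluster the map and reduce steps run as many parallel tasks over HDFS blocks; the sketch runs them in one process purely to show the data flow.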
Oracle SQL Connector for HDFS • Access Hive tables and HDFS files using Oracle external tables • Set up access automatically • Combine data from two appliances • Access or load data in parallel • (Diagram labels: Hadoop, Oracle Database, SQL Query, External Table, OSCH, ODCH)
Performance Comparison • Compared with Fuse DFS • Charts: load speed comparison, CPU usage comparison
Key Benefits • Uniquely enables access to HDFS data files from Oracle Database • Performance: 12 TB/hour from Oracle Big Data Appliance to Oracle Exadata; 5x to 20x faster than comparable third-party products • Easy to use for Oracle DBAs and Hadoop developers • Developed and supported by Oracle
Oracle Loader for Hadoop • Steps: read target table metadata from the database; partition, sort, and convert into Oracle data types on Hadoop; connect to the database from reducer nodes and load into database partitions in parallel (JDBC or direct path) • Offloads data pre-processing from the database server to Hadoop • Works with a range of input data formats • Handles skew in input data to maximize performance • Online and offline modes (offline: create Oracle Data Pump files on HDFS) • (Diagram: map tasks, shuffle/sort, reduce tasks)
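The "partition, sort, and convert" step can be sketched as follows. This is an illustration under stated assumptions, not Oracle Loader for Hadoop's implementation: the comma-delimited record layout, the range-partition bounds, and every function name are hypothetical, and the final load over JDBC or direct path is not shown.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date, datetime

# Hypothetical range-partition boundaries of the target table: a row belongs
# to partition i when its date is below bound i and not below bound i - 1.
PARTITION_BOUNDS = [date(2013, 1, 1), date(2013, 7, 1)]


def convert(raw):
    """Convert one delimited text record into typed values."""
    movie_id, hits, day = raw.split(",")
    return movie_id, int(hits), datetime.strptime(day, "%Y-%m-%d").date()


def partition_of(row):
    """Index of the target partition whose date range contains the row."""
    return bisect_right(PARTITION_BOUNDS, row[2])


def prepare(raw_records):
    """Group converted rows by target partition and sort each group."""
    by_partition = defaultdict(list)
    for raw in raw_records:
        row = convert(raw)
        by_partition[partition_of(row)].append(row)
    return {
        part: sorted(rows, key=lambda r: (r[2], r[0]))
        for part, rows in sorted(by_partition.items())
    }


# Hypothetical input records: movieId, hits, date.
records = [
    "m1,3,2013-02-10",
    "m2,5,2012-12-31",
    "m3,1,2013-08-01",
    "m4,2,2013-01-01",
]
prepared = prepare(records)
for part, rows in prepared.items():
    print(part, rows)
```

Because each group already matches one target partition and is sorted, a reducer handling that group could load it independently of the others, which is what makes the parallel load described on the slide possible.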
Automatically Handle Input Data Skew • Distribute load evenly across reduce tasks • All reducers do approximately the same amount of work • Avoids slowdown because of unbalanced reducer loads • Maximizes performance • Data is sampled to determine optimal partitioning of map output keys • (Chart: Load Balancing across Reducers)
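The idea of sampling keys to pick partition boundaries can be illustrated with a small Python sketch. The slides do not describe the actual algorithm, so this uses a generic quantile-based approach as a stand-in; the skewed key distribution, sample size, reducer count, and function names are all assumptions made for the example.

```python
import bisect
import random
from collections import Counter


def choose_split_points(sample_keys, num_reducers):
    """Pick num_reducers - 1 boundaries at evenly spaced sample quantiles."""
    ordered = sorted(sample_keys)
    return [ordered[(i * len(ordered)) // num_reducers]
            for i in range(1, num_reducers)]


def assign_reducer(key, split_points):
    """Route a key to the reducer whose key range contains it."""
    return bisect.bisect_right(split_points, key)


random.seed(0)
num_reducers = 4

# Hypothetical skewed map output keys: most values are small, a few are large.
keys = [int(random.expovariate(1 / 50)) for _ in range(20000)]

# Boundaries derived from a small random sample of the keys.
sample = random.sample(keys, 1000)
sampled_splits = choose_split_points(sample, num_reducers)
sampled_loads = Counter(assign_reducer(k, sampled_splits) for k in keys)

# Naive alternative: cut the key range into equal-width intervals.
top = max(keys)
naive_splits = [top * i // num_reducers for i in range(1, num_reducers)]
naive_loads = Counter(assign_reducer(k, naive_splits) for k in keys)

print("sampled:", sorted(sampled_loads.items()))
print("naive:  ", sorted(naive_loads.items()))
```

With equal-width ranges most keys land on the first reducer, while boundaries taken from the sample give each reducer a similar share. A single very frequent key would still go to one reducer in this sketch.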
Performance Comparison • Compared with third-party products • Charts: load speed comparison, CPU usage comparison
Key Benefits • Load directly from HDFS, Hive tables, … into Oracle Database without intermediate staging files • Performance: 10x faster than comparable third-party products • Offload database server processing to Hadoop • Minimizes impact on performance SLAs of production applications • Easy to use for Oracle DBAs and Hadoop developers • Developed and supported by Oracle
Leverage Both Connectors • Offline load with Oracle Loader for Hadoop: data is pre-processed and written in Oracle Data Pump format to HDFS • Oracle Data Pump files in HDFS are queried (and loaded if necessary) with Oracle SQL Connector for HDFS • (Diagram: Oracle Loader for Hadoop map, shuffle/sort, and reduce tasks; HDFS client; OSCH; external table and SQL query in Oracle Database)
For more information Search OTN for… • Big Data • Data Warehousing Blog • Oracle Big Data Interactive e-Book • Oracle Big Data YouTube videos