Oracle Big Data Connectors: High-Performance Integration for Hadoop and Oracle Database

Marty Gubar, Oracle Big Data Product Management

Session Goals
• Introduce the Oracle Big Data Connectors
• Understand how they provide high-performance connectivity between Oracle Database & Oracle Big Data Appliance
• See the Connectors in action!

Oracle’s Big Data Platform
[Diagram: platform stages Acquire, Organize & Discover, Analyze, and Visualize & Decide, plus Stream]

Oracle’s Big Data Platform
[Diagram: Oracle Big Data Connectors linking Hadoop and Oracle Database]

Oracle Big Data Connectors Components
• Oracle SQL Connector for HDFS
• Oracle Loader for Hadoop
• Oracle R Connector for Hadoop
• Oracle Data Integrator Application Adapters for Hadoop

What is HDFS?
• Primary storage system underlying Hadoop
• Fault tolerant, scalable, highly available
• Designed to be well-suited to distributed processing
• Superficially structured like a UNIX file system
[Diagram: HDFS on Big Data Appliance]

What is Hive?
• Provides structure over files
• Metadata describes tables/columns
• HiveQL offers basic SQL access to data
• Hive converts HiveQL queries into MapReduce jobs
[Diagram: Hive over HDFS on Big Data Appliance]
Example:
    CREATE EXTERNAL TABLE myTable (movieId STRING, hits INT)
    ROW FORMAT DELIMITED…

    SELECT movieId, sum(hits) FROM myTable GROUP BY movieId
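To make the MapReduce translation concrete, here is a minimal single-process Python sketch of how the `GROUP BY` query above decomposes into map, shuffle/sort and reduce phases. It assumes comma-delimited `movieId,hits` rows (the slide elides the actual `ROW FORMAT DELIMITED` clause), and the function names are hypothetical; Hive's real plan runs these phases as distributed tasks on the cluster.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map: parse each delimited row and emit a (movieId, hits) key/value pair."""
    for line in lines:
        # Assumption: comma-delimited "movieId,hits" rows; the real delimiter is elided on the slide.
        movie_id, hits = line.strip().split(",")
        yield movie_id, int(hits)


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: collect every value that shares a key, ordered by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: sum the values for each key, mirroring sum(hits) ... GROUP BY movieId."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_sum(lines: Iterable[str]) -> dict[str, int]:
    """Hypothetical driver chaining the three phases in one process."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    rows = ["m1,3", "m2,5", "m1,4"]
    print(group_by_sum(rows))  # {'m1': 7, 'm2': 5}
```

In a real job the map and reduce functions run on many nodes at once, and the shuffle moves each key's values over the network to the single reducer responsible for that key.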

Oracle SQL Connector for HDFS
• Access Hive tables and HDFS files using Oracle external tables
• Set up access automatically
• Combine data from two appliances
• Access or load data in parallel
[Diagram: a SQL query in Oracle Database reads an external table backed by Hadoop data through OSCH (also labeled ODCH)]
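Parallel access depends on spreading the HDFS files behind an external table across several readers so that each does a similar amount of work. The sketch below is a hypothetical illustration of one way to balance that: a greedy largest-first assignment of files to a fixed number of groups, one per parallel reader. The function name, file paths and sizes are invented, and this is not presented as the connector's documented algorithm.

```python
from __future__ import annotations

import heapq


def assign_files(file_sizes: dict[str, int], n_groups: int) -> list[list[str]]:
    """Greedy largest-first balancing: each file goes to the group with the fewest bytes so far."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    groups: list[list[str]] = [[] for _ in range(n_groups)]
    heap = [(0, i) for i in range(n_groups)]  # (bytes assigned so far, group index)
    heapq.heapify(heap)
    # Visit files from largest to smallest (ties broken by path for a stable result).
    for path, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        assigned, idx = heapq.heappop(heap)
        groups[idx].append(path)
        heapq.heappush(heap, (assigned + size, idx))
    return groups


if __name__ == "__main__":
    # Hypothetical HDFS files and sizes (in MB).
    sizes = {"/data/part-0": 400, "/data/part-1": 300, "/data/part-2": 200, "/data/part-3": 100}
    for i, group in enumerate(assign_files(sizes, 2)):
        print(i, group, sum(sizes[p] for p in group))
    # 0 ['/data/part-0', '/data/part-3'] 500
    # 1 ['/data/part-1', '/data/part-2'] 500
```

Placing the largest files first keeps the final group totals close together, so no single reader becomes the bottleneck of the query.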

Performance Comparison (vs. Fuse DFS)
[Charts: load speed comparison; CPU usage comparison]

Key Benefits
• Uniquely enables access to HDFS data files from Oracle Database
• Performance
  – 12 TB/hour from Oracle Big Data Appliance to Oracle Exadata
  – 5x–20x faster than comparable third-party products
• Easy to use for Oracle DBAs and Hadoop developers
• Developed and supported by Oracle

Demonstration: Using Oracle SQL Connector for HDFS

Oracle Loader for Hadoop
• Read target table metadata from the database
• Partition, sort, and convert into Oracle data types on Hadoop
• Connect to the database from reducer nodes, load into database partitions in parallel (JDBC or direct path)
• Offloads data pre-processing from the database server to Hadoop
• Works with a range of input data formats
• Handles skew in input data to maximize performance
• Online and offline modes (offline: create Oracle Data Pump files on HDFS)
[Diagram: map tasks → shuffle/sort → reduce tasks loading Oracle Database in parallel]
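The "partition, sort, and convert" step can be pictured with a small single-process sketch. It assumes a hypothetical target table range-partitioned on a sale date (the partition names, bounds and record layout are invented), parses each text record into typed values, routes it to the partition whose `VALUES LESS THAN` bound it falls below, and sorts each partition's rows. This is only an illustration of the idea, not Oracle Loader for Hadoop's API.

```python
from __future__ import annotations

import bisect
from collections import defaultdict
from datetime import date

# Hypothetical target table, range-partitioned on SALE_DATE:
# P2012 VALUES LESS THAN 2013-01-01, P2013 < 2014-01-01, P2014 < 2015-01-01.
PARTITION_BOUNDS = [date(2013, 1, 1), date(2014, 1, 1), date(2015, 1, 1)]
PARTITION_NAMES = ["P2012", "P2013", "P2014"]


def convert(line: str) -> tuple[date, str, float]:
    """Convert a text record "YYYY-MM-DD,customer,amount" into typed values."""
    sale_date, customer, amount = line.strip().split(",")
    return date.fromisoformat(sale_date), customer, float(amount)


def target_partition(sale_date: date) -> str:
    """Pick the first partition whose upper bound is strictly greater than the key."""
    idx = bisect.bisect_right(PARTITION_BOUNDS, sale_date)
    if idx == len(PARTITION_BOUNDS):
        raise ValueError(f"{sale_date} is beyond the highest partition bound")
    return PARTITION_NAMES[idx]


def partition_and_sort(lines: list[str]) -> dict[str, list[tuple[date, str, float]]]:
    """Group converted rows by target partition and sort each group by its key."""
    groups: dict[str, list[tuple[date, str, float]]] = defaultdict(list)
    for line in lines:
        row = convert(line)
        groups[target_partition(row[0])].append(row)
    return {name: sorted(rows) for name, rows in sorted(groups.items())}


if __name__ == "__main__":
    records = ["2013-05-02,acme,10.5", "2012-11-30,globex,7.0", "2013-01-01,initech,3.25"]
    for name, rows in partition_and_sort(records).items():
        print(name, [customer for _, customer, _ in rows])
    # P2012 ['globex']
    # P2013 ['initech', 'acme']
```

Doing this grouping on Hadoop means each reducer receives rows that are already typed, sorted and aligned with a database partition, which is the pre-processing the slide describes as offloaded from the database server.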

Automatically Handle Input Data Skew
• Distribute load evenly across reduce tasks
• All reducers do approximately the same amount of work
• Avoids slowdown because of unbalanced reducer loads
• Maximizes performance
• Data is sampled to determine optimal partitioning of map output keys
[Diagram: load balancing across reducers]
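Sampling-based partitioning can be sketched in a few lines. The example samples the map output keys, takes evenly spaced quantiles of the sorted sample as split points, and range-partitions keys across reducers; on a deliberately skewed, made-up key distribution it compares those split points with a naive equal-width split of the key range. It follows the same general idea as Hadoop's `TotalOrderPartitioner` with `InputSampler` and is offered as an illustration, not as Oracle Loader for Hadoop's actual sampler; all function names and data are hypothetical.

```python
from __future__ import annotations

import bisect
import random
from collections import Counter


def choose_split_points(keys: list[int], n_reducers: int, sample_size: int, seed: int = 0) -> list[int]:
    """Sample the keys and return n_reducers - 1 split points at evenly spaced quantiles of the sample."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    return [sample[(i * len(sample)) // n_reducers] for i in range(1, n_reducers)]


def reducer_for(key: int, split_points: list[int]) -> int:
    """Range partitioning: reducer i receives keys in [split_points[i-1], split_points[i])."""
    return bisect.bisect_right(split_points, key)


def reducer_loads(keys: list[int], split_points: list[int]) -> list[int]:
    """Count how many records each reducer would receive."""
    counts = Counter(reducer_for(k, split_points) for k in keys)
    return [counts[i] for i in range(len(split_points) + 1)]


if __name__ == "__main__":
    # Skewed input: 900 records with keys 0-99, only 100 records spread over 100-999.
    keys = [i % 100 for i in range(900)] + list(range(100, 1000, 9))
    equal_width = [250, 500, 750]  # naive split of the key range 0-999 into 4 equal widths
    sampled = choose_split_points(keys, n_reducers=4, sample_size=len(keys))
    print(reducer_loads(keys, equal_width))  # [917, 28, 28, 27]
    print(sampled, reducer_loads(keys, sampled))  # [27, 55, 83] [243, 252, 252, 253]
```

With the equal-width split one reducer receives 917 of the 1,000 records; the sampled split points bring every reducer to roughly 250. Pure range partitioning cannot split a single very hot key across reducers, so this sketch only balances ranges of distinct keys.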

Performance Comparison (vs. third-party products)
[Charts: load speed comparison; CPU usage comparison]

Key Benefits
• Load directly from HDFS, Hive tables, … into Oracle Database without intermediate staging files
• Performance
  – 10x faster than comparable third-party products
• Offload database server processing to Hadoop
  – Minimizes impact on performance SLAs of production applications
• Easy to use for Oracle DBAs and Hadoop developers
• Developed and supported by Oracle

Leverage Both Connectors
• Offline load: data is pre-processed by Oracle Loader for Hadoop and written to HDFS in Oracle Data Pump format
• The Oracle Data Pump files in HDFS are queried (and loaded if necessary) with Oracle SQL Connector for HDFS
[Diagram: Oracle Loader for Hadoop map → shuffle/sort → reduce tasks write Data Pump files to HDFS; a SQL query on an external table in Oracle Database reads them through OSCH (also labeled ODCH) via the HDFS client]

Demonstration: Using Oracle Loader for Hadoop

For more information
Search OTN for…
• Big Data
• Data Warehousing Blog
• Oracle Big Data Interactive e-Book
• Oracle Big Data YouTube videos