Oracle In-Database MapReduce: When Hadoop Meets Exadata. Kuassi Mensah, Director, Product Management.
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda
• Big Data & In-Database MapReduce
• SQL MapReduce
• In-Database Container for Hadoop
• Oracle’s Big Data Solution
Big Data Concept
[Diagram labels: MapReduce Infrastructure; RDBMS; Any Data; MapReduce (phase I); Data Mining (phase II)]
MapReduce Convention: Process Data Locally
Big Data in Real Life, Today
[Diagram labels: MapReduce Infrastructure; RDBMS; Unstructured Data (HDFS, NoSQL, etc.); MapReduce (phase I); Data Mining (phase II); RDBMS Structured Data]
Problems with Big Data Today
• Shipping Data from RDBMS to MapReduce Infrastructure
• Too Big to Move
• Operational Issues
• Data Correctness/Loss
• Lack of Enterprise-Class Security on MapReduce Infrastructure
• Breaking MapReduce Convention
• Cost of MapReduce Infrastructure or Storage
• Lack of MapReduce Development Skills
• Lack of MapReduce Deployment Skills
Big Data with In-Database MapReduce
[Diagram labels: Hadoop Cluster; RDBMS; MapReduce; Data Mining; Unstructured Data (HDFS, NoSQL, etc.); In-Database MapReduce; Structured Data (RDBMS)]
In-Database MapReduce Trends
• Hybrid Platforms: DBMS + MapReduce
• Projects/Products/Initiatives
  • DataStax: Cassandra + Hadoop
  • Hadapt / HadoopDB: PostgreSQL + Hadoop
  • Greenplum HD
  • MongoDB MapReduce: JavaScript
  • Aster Data / Teradata
• Limitations
  • Dependency on a Hadoop infrastructure in addition to the DBMS
  • Source compatibility: Hadoop jobs must be rewritten in a different language
Oracle’s Big Data Strategy
MapReduce APIs Across Data Infrastructure: Hadoop, R, SQL
[Diagram labels: RDBMS (In-Database MapReduce); Big Data Appliance; Weblogs; Sales Records]
Oracle In-Database MapReduce: Integration with Oracle Big Data Solution
Oracle In-Database MapReduce
Feature of Oracle Database 12c releases
• In-Database Container for Hadoop (currently Beta)
• SQL MapReduce (12.1.0.1)
SQL MapReduce: Declarative MR Analytics
Collection of Existing and New Features
• SQL analytic functions
• User-defined aggregate functions
• Parallel pipelined table functions
• SQL pattern matching: MATCH_RECOGNIZE (new)
SQL Pattern Matching
[Chart: stock price (y-axis) over days (x-axis)]
Example: Find 10-day periods where a stock price has “double-bottomed”
Example: Find event A (“privilege revoked”) followed by 3 or more occurrences of event B (“attempted login”) within 1 minute
• SQL Pattern Matching provides expressive syntax and fast execution for pattern matching
• New SQL construct: MATCH_RECOGNIZE
• Define patterns using regular expression syntax
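The second example (event A followed by 3 or more occurrences of event B within 1 minute) can be sketched procedurally in Java to show what such a pattern has to detect. This is only an illustration, not how MATCH_RECOGNIZE is implemented. It assumes events arrive ordered by time, that the B events must directly follow A in that order, and that "within 1 minute" means within 60 seconds of the A event. The class and method names are hypothetical.

```java
class RevokeThenLoginSketch {
    // Assumed window: every counted B event must fall within 60 seconds of the A event.
    static final long WINDOW_SECONDS = 60;

    // Returns the index of the first A event that is directly followed by
    // at least three B events inside the window, or -1 if there is none.
    // types[i] is 'A' or 'B' (or anything else); timesSec[i] is its timestamp in seconds.
    static int firstMatch(char[] types, long[] timesSec) {
        for (int i = 0; i < types.length; i++) {
            if (types[i] != 'A') {
                continue;
            }
            int bCount = 0;
            // Count consecutive B events after position i that stay inside the window.
            for (int j = i + 1;
                 j < types.length
                     && types[j] == 'B'
                     && timesSec[j] - timesSec[i] <= WINDOW_SECONDS;
                 j++) {
                bCount++;
            }
            if (bCount >= 3) {
                return i;
            }
        }
        return -1;
    }
}
```

For the events A, B, B, A, B, B, B at seconds 0, 10, 20, 100, 110, 120, 130, `firstMatch` returns 3: the first A is followed by only two B events, while the second A is followed by three within 30 seconds.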
SQL Pattern Matching: Sessionization
SELECT user_id, session_id, start_time, no_of_events, duration
FROM Events MATCH_RECOGNIZE (
  PARTITION BY User_ID
  ORDER BY Time_Stamp
  MEASURES match_number() session_id,
           count(*) as no_of_events,
           first(time_stamp) start_time,
           last(time_stamp) - first(time_stamp) duration
  PATTERN (b s*)
  DEFINE s as (s.Time_Stamp - prev(Time_Stamp) <= 10)
)
ORDER BY user_id, session_id;
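For readers less familiar with MATCH_RECOGNIZE, the sessionization rule in the query (a session starts with any event and continues while each event is at most 10 time units after the previous one) can be sketched procedurally in Java for a single user's partition. This is a hedged illustration of the logic only. It assumes numeric timestamps that are already sorted, in the same unit as the threshold 10, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SessionizationSketch {
    // Assumed to match the "<= 10" condition in the DEFINE clause.
    static final long MAX_GAP = 10;

    // Input: one user's event timestamps, sorted ascending.
    // Output: one row per session as {session_id, no_of_events, start_time, duration},
    // mirroring the MEASURES of the query.
    static List<long[]> sessionize(long[] sortedTimestamps) {
        List<long[]> sessions = new ArrayList<>();
        if (sortedTimestamps.length == 0) {
            return sessions;
        }
        long start = sortedTimestamps[0];
        long prev = start;
        long count = 1;
        for (int i = 1; i < sortedTimestamps.length; i++) {
            long ts = sortedTimestamps[i];
            if (ts - prev <= MAX_GAP) {
                // Same session: corresponds to another "s" row in PATTERN (b s*).
                count++;
            } else {
                // Gap too large: close the current session and start a new one ("b").
                sessions.add(new long[] {sessions.size() + 1, count, start, prev - start});
                start = ts;
                count = 1;
            }
            prev = ts;
        }
        // Close the last open session.
        sessions.add(new long[] {sessions.size() + 1, count, start, prev - start});
        return sessions;
    }
}
```

With timestamps 1, 5, 12, 40, 45, 70 the sketch yields three sessions: (1, 3 events, start 1, duration 11), (2, 2 events, start 40, duration 5), and (3, 1 event, start 70, duration 0).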
Vanilla Hadoop
[Diagram, Hadoop cluster: physical partitions (DataNodes); Mappers; materialization of intermediate data; Reducers]
In-Database Container for Hadoop: Components
• Apache Hadoop
• Task execution: In-Database JVM
• Data partitioning & task scheduling: PQ engine
• Data storage: table, external table, object view
• Data type mapping: TableReader, TableWriter
In-Database Container for Hadoop
[Diagram, RDBMS server: table partitions; Mapper processes; pipelining of intermediate data; Reducer processes; parallel DML]
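In-Database Container for Hadoop vs. Vanilla Hadoop
• Platform: RDBMS server vs. Hadoop cluster
• Data partitions: logical (table partitions) vs. physical (DataNodes)
• Intermediate data between Mappers and Reducers: pipelining vs. materialization
• Parallel DML: in the RDBMS server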
In-Database Container for Hadoop: Summary
• A “Hadoop container” in the RDBMS engine: no Hadoop cluster required
• Data processing in situ: no need to ship data to a separate infrastructure
• API and source compatibility: accepts Hadoop Mappers and Reducers as-is
• Java interface: invoke Hadoop jobs à la vanilla Hadoop
• SQL interface: Map & Reduce steps in SQL statements
In-Database Container for Hadoop: SQL and Java interfaces
Java interface:
public class WordCount {
  public static void main() throws Exception {
    /* Set up the parameters and run the job */
    ……
    job.init();
    job.run();
  }
}
SQL interface:
SELECT * FROM TABLE
  (HREDUCE_JP_WORDCOUNT(:ConfKey,
    CURSOR(SELECT * FROM TABLE
      (HMAP_JP_WORDCOUNT(:ConfKey,
        CURSOR(SELECT * FROM InTable))))))
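The slide elides the job setup and the Mapper and Reducer classes themselves. As a rough illustration of what the two nested steps compute, the following plain-Java sketch runs a word-count map step and reduce step in memory. It deliberately does not use the Hadoop API or the in-database container, and the names `WordCountSketch`, `map`, `reduce`, and `run` are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    // Map step: emit a (word, 1) pair for each whitespace-separated token of a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<String, Integer>(token, 1));
            }
        }
        return out;
    }

    // Reduce step: sum the counts emitted for each word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Same nesting as the SQL statement: reduce(map(input rows)).
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        return reduce(intermediate);
    }
}
```

Calling `WordCountSketch.run(Arrays.asList("a b a", "b c"))` returns a map with a=2, b=2, c=1. Unlike the container described in the slides, this sketch collects all intermediate pairs in a list before reducing.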
Pipelining Hadoop Jobs Through the SQL Interface
Pipelining Hadoop steps without intermediate materialization
select * from table
  (HREDUCE_JP_JOB2 (:ConfKey2, ....
    (HMAP_JP_JOB2 (:ConfKey2, ....
      (HREDUCE_JP_JOB1 (:ConfKey1, ....
        (HMAP_JP_JOB1 (:ConfKey1, ...), ))));
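The nested calls hand rows from one step to the next without storing intermediate results. A minimal Java sketch of that row-at-a-time hand-off, using lazy iterators, is shown below. It is only an analogy: it does not use Oracle table functions or the Hadoop API, it ignores the grouping a real reduce step performs, and the names `PipelineSketch`, `source`, `stage`, and `rowsRead` are hypothetical.

```java
import java.util.Iterator;
import java.util.function.Function;

class PipelineSketch {
    // Counts rows pulled from the source, only to make the laziness observable.
    static int rowsRead = 0;

    // A source that hands out one row per call to next().
    static Iterator<String> source(final String[] rows) {
        return new Iterator<String>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < rows.length;
            }

            @Override
            public String next() {
                rowsRead++;
                return rows[i++];
            }
        };
    }

    // A processing step that pulls one upstream row only when a downstream
    // consumer asks for one, so no intermediate collection is ever built.
    static <I, O> Iterator<O> stage(final Iterator<I> upstream, final Function<I, O> step) {
        return new Iterator<O>() {
            @Override
            public boolean hasNext() {
                return upstream.hasNext();
            }

            @Override
            public O next() {
                return step.apply(upstream.next());
            }
        };
    }
}
```

Chaining two stages over a three-row source and pulling a single result reads exactly one source row, so `rowsRead` is 1 after the first `next()` call.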
In-Database Container for Hadoop: Projected Features
• Reuse Mappers & Reducers (including R-generated)
• Dynamic Data Partitioning
• Apache Hadoop API 2.00
• Custom Writables (Hadoop types)
• Serialized Data Formats
• InputFormats: HDFS, HBase, Others
• Java interface (similar to the vanilla Hadoop driver)
• SQL interface: Hadoop job steps in SQL queries
• Mahout
Develop/Deploy with In-Database Container for Hadoop
• Reuse existing Mappers & Reducers
• Develop Hadoop Mappers & Reducers from scratch
• Create or update the Hadoop job configuration file
• Load all Java code in the RDBMS and create Call Specs
• Invoke the Hadoop job via the Java or SQL interface
• Populate the output table with parallel INSERT
Oracle’s Big Data Solution
[Diagram. Stages: Acquire; Organize; Analyze; Decide. Components: Oracle Big Data Appliance; Oracle Exadata; Oracle Exalytics; Oracle Endeca Information Discovery; Oracle Real-Time Decisions. Links labeled InfiniBand.]
Oracle In-Database MapReduce: Summary
• Declarative Analytics (SQL MapReduce)
• Programmatic Analytics (Complex Algorithms, Hadoop)
• MapReduce job steps in SQL queries
• Custom extensions (InputFormats)
• RDBMS QoS (e.g., Enterprise-Class Security)
• Developer- and DBA-friendly
• Seamless integration with Oracle’s Big Data solution