Hadoop Your ETL: Using Big Data Technologies to Enhance Today’s Data Warehouses.
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Thoughts, Things, Processes
Today’s Challenges
• Produce data
• More sources of data
• Use data
Big Data Usage Pattern: ETL and Batch Processing Workloads on Hadoop
Data Factory
• Scalable
• Flexible
• Cost Effective
Diagram labels: SQL, DW & BI, Analytics, Web, SQL, NoSQL
Data Factory: Basic Use Cases
• Offload mainframe batch processing to Hadoop, with lower cost and higher levels of performance
• Offload ETL staging processing to Hadoop to decrease ETL costs and enable more DW bandwidth
• Create a centralized repository of data to serve multiple applications and data warehouses
Oracle Big Data Solution
Stages: Stream – Acquire – Organize – Analyze – Decide
Components shown:
• Oracle Database
• Cloudera Hadoop
• Oracle Advanced Analytics
• Oracle NoSQL Database
• Oracle BI Foundation Suite
• Endeca Information Discovery
• Oracle Big Data Connectors
• Oracle Spatial & Graph
• Oracle R Distribution
• Oracle Event Processing
• Oracle GoldenGate
• Apache Flume
• Oracle Data Integrator
• Oracle Real-Time Decisions
Big Data Appliance X3-2
Sun Oracle X3-2L Servers with per server:
• 2 × 8-core Intel Xeon E5 processors
• 64 GB memory
• 36 TB disk space
Integrated Software:
• Oracle Linux
• Oracle Java JDK
• Cloudera Distribution of Apache Hadoop (CDH)
• Cloudera Manager
• Oracle R Distribution
• Oracle NoSQL Database
All integrated software (except NoSQL DB CE) is supported as part of Premier Support for Systems and Premier Support for Operating Systems.
Platform Strengths
Big Data Appliance + Hadoop:
• Low-cost Scalability
• Flexible Schema on Read
• Abstract Storage Model
• Open
• Rapid Evolution
Exadata + Oracle Database:
• Extreme Performance
• Highly Secure
• Analytic SQL
• Rich Tool Set
• Vast Expertise
ETL That Eats at the Bottom Line
Long-running ETL jobs:
• Lots of resources
• Less value
• Less horsepower for innovative analysis
Data Factory ETL Increases Savings
• One factory to be accessed at any time
• More resources for more insights
Save the Bottom Line, Serve Innovation
Diagram labels: Data Factory, Big Data Appliance + Hadoop, Exadata + Oracle Database
Customer Example: Mobile Telecom Provider
• Exponential growth in data, generated by new consumer devices
• ETL and storage constraints limited analytics to a 1% sample
• The combination of Oracle Exadata and Cloudera Hadoop now delivers analytics on 100% of the data
• Query times reduced dramatically (e.g. from 4 days to 53 minutes)
• 90% reduction of the ETL code base
• From 1% sampling to 100% analysis
Before (diagram labels): Telecom Services, Filter & Split, Alerting, Event Monitoring, Complex Correlation, Streaming ETL, Data Warehouse, Archive Storage
After (diagram labels): Telecom Services, Filter & Split, Alerting, Event Monitoring, Hadoop, Archive Storage, ETL, Correlation, Stage 1 DWH, Data Warehouse
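To put the quoted figures in proportion, the arithmetic can be worked through as a short sketch. It uses only the numbers stated on the slide (4 days, 53 minutes, 1% and 100%); the variable names are illustrative.

```python
# Figures quoted on the slide.
before_minutes = 4 * 24 * 60   # 4 days expressed in minutes
after_minutes = 53

# Ratio of the old query time to the new one.
speedup = before_minutes / after_minutes

# Going from a 1% sample to 100% of the data.
coverage_gain = 100 / 1

print(f"Query speedup: about {speedup:.0f}x")
print(f"Data analyzed: {coverage_gain:.0f}x more")
```

Under these figures the query time drops by roughly two orders of magnitude while the volume of data analyzed grows by the same order.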
Customer Example: Full Service Bank
Challenges:
• Reduce IT costs
• Comply with regulations requiring more data to support stress testing
• Consolidate and streamline data processing
Benefits:
• Faster access to 6x more data
• Lower cost, simplified architecture
• Implemented in a matter of months
Before / After (diagram labels): Mainframe, Exadata, Mainframe, Big Data Appliance
Big Data Connectors and Data Integrator
• 15 TB/hour
• 10x faster
Diagram labels: Exadata + Oracle Database, Big Data Appliance + Hadoop
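As a rough sense of scale for the 15 TB/hour figure, the conversion below is a minimal sketch assuming decimal units (1 TB = 1000 GB). The function names and the 60 TB example dataset are hypothetical; the slide does not state the baseline for "10x faster", so that figure is not modeled here.

```python
TB_PER_HOUR = 15  # load rate quoted on the slide


def load_rate_gb_per_second(tb_per_hour: float) -> float:
    """Convert TB/hour to GB/second, assuming 1 TB = 1000 GB."""
    return tb_per_hour * 1000 / 3600


def hours_to_load(dataset_tb: float, tb_per_hour: float = TB_PER_HOUR) -> float:
    """Time to move a dataset at a sustained rate (ignores startup overhead)."""
    return dataset_tb / tb_per_hour


rate = load_rate_gb_per_second(TB_PER_HOUR)
print(f"{TB_PER_HOUR} TB/hour is about {rate:.2f} GB/s")
print(f"A hypothetical 60 TB dataset would take {hours_to_load(60):.1f} hours")
```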
Big Data Connectors
Optimized integration of Hadoop with Oracle Database and Oracle Exadata
• Oracle Loader for Hadoop
• Oracle SQL Connector for Hadoop Distributed File System (HDFS)
• Oracle Data Integrator Application Adapter for Hadoop
• Oracle R Connector for Hadoop
• Oracle XQuery Connector for Hadoop
Does not require Big Data Appliance – can be licensed for Hadoop running on non-Oracle hardware
Oracle Loader for Hadoop
• Last stage in MapReduce workflow
• Offloads data pre-processing from the database server to Hadoop
• Works with a range of input data formats
Diagram: MAP, Shuffle/Sort, Reduce, Oracle Loader for Hadoop, Oracle Database
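The idea of doing load preparation as the final stage of a MapReduce workflow can be illustrated with a small in-memory simulation. This is a conceptual sketch only, not the Oracle Loader for Hadoop API: the function names, the CSV layout (region, customer id, amount), and the choice of region as the grouping key are all assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Parse raw CSV-like lines into (key, record) pairs (hypothetical layout)."""
    for line in lines:
        region, customer_id, amount = line.strip().split(",")
        # Type conversion happens here, on the Hadoop side, not in the database.
        yield region, (int(customer_id), float(amount))


def shuffle_sort(pairs):
    """Group records by key and sort them, as the shuffle/sort step would."""
    groups = defaultdict(list)
    for key, record in pairs:
        groups[key].append(record)
    return {key: sorted(records) for key, records in sorted(groups.items())}


def reduce_phase(groups):
    """Emit one load-ready batch per key; a real loader would write to the database here."""
    return {key: list(records) for key, records in groups.items()}


raw = ["EU,2,10.50", "US,1,3.00", "EU,1,7.25"]
batches = reduce_phase(shuffle_sort(map_phase(raw)))
print(batches)
```

The point of the pattern is that parsing, grouping, and sorting are finished before the database sees any data, so the database server only has to accept already-prepared batches.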
Oracle Loader for Hadoop: Connectivity to Hadoop Technologies
• JSON SerDe
• JSON files
• Hive’s HBase Storage Handler
• Hive external tables
Diagram: MAP, Shuffle/Sort, Reduce, Oracle Loader for Hadoop, Oracle Data Warehouse
Oracle SQL Connector for HDFS
Use Oracle SQL to Load or Access Data on HDFS
Features:
• Load into the database using SQL
• Option to access and analyze data in place on HDFS
• Access Hive (internal and external) tables and HDFS files
• Automatic load balancing to maximize performance
Diagram labels: SQL Query, Oracle Database, OSCH External Table, OSCH, HDFS Client, HDFS
XQuery Connector for Hadoop
• XQuery language executed on the Map/Reduce framework
Example query:
    for $ln in text:collection()
    let $f := tokenize($ln)
    where $f[1] = 'x'
    return text:put($f[2])
Diagram labels: XQuery, OXH, Execution Plan, M/R Engine, Map/Reduce Worker Nodes, HDFS
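For readers less familiar with XQuery, the logic of the example query can be restated in Python: iterate over lines, split each into fields, keep lines whose first field equals 'x', and emit the second field. This is only a sketch of the query's semantics, not of how the connector executes it. The function name is hypothetical, and whitespace is assumed as the delimiter because the slide does not show one.

```python
def select_second_field(lines, wanted="x"):
    """Yield the second field of every line whose first field equals `wanted`."""
    for line in lines:
        fields = line.split()  # assumption: whitespace-delimited fields
        # XQuery sequences are 1-indexed, so $f[1] is fields[0] and $f[2] is fields[1].
        if len(fields) >= 2 and fields[0] == wanted:
            yield fields[1]


sample = ["x alpha", "y beta", "x gamma", "x"]
print(list(select_second_field(sample)))
```

Because each line is processed independently, this kind of filter maps naturally onto the map side of a Map/Reduce job.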
Oracle Data Integrator for Big Data
Heterogeneous Integration with Hadoop Environments
• Supports Hadoop standards
• Reverse Engineer Hadoop metadata
• Check, Validate and Ensure Data Integrity with Hadoop
• Load Data into HDFS/Hive
• Generate HiveQL and execute in Hadoop
• Leverage existing Hadoop transformations
Diagram labels: Access, Transform, Loads, Oracle Data Integrator
Oracle Data Integrator – Hive Control Knowledge Module
• Big Data Transformation Services
• Metadata Focused Approach
• Selectable Hive Transformations
• Easy to use & Guided
• Hive Function Support
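A metadata-focused approach to generating HiveQL can be sketched as a function that turns a column mapping into a statement. This is not Oracle Data Integrator's knowledge module code or API; `build_hive_insert`, the table names, and the column expressions are all hypothetical and chosen only to illustrate the idea.

```python
def build_hive_insert(target, source, mappings, where=None):
    """Build a HiveQL INSERT ... SELECT from a {target_column: expression} mapping."""
    select_list = ",\n  ".join(f"{expr} AS {col}" for col, expr in mappings.items())
    sql = f"INSERT OVERWRITE TABLE {target}\nSELECT\n  {select_list}\nFROM {source}"
    if where:
        sql += f"\nWHERE {where}"
    return sql


# Hypothetical mapping metadata: target column -> source expression.
hiveql = build_hive_insert(
    target="stage_calls",
    source="raw_calls",
    mappings={"caller_id": "caller", "duration_min": "duration_sec / 60"},
    where="duration_sec > 0",
)
print(hiveql)
```

Keeping the transformation as data (the mapping) rather than hand-written SQL is what lets a tool regenerate or retarget the statement when the metadata changes.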