Experiences with Real-Time Data Warehousing Using Oracle Database 10g
Mike Schmitz, High Performance Data Warehousing, mike.schmitz@databaseperformance.com
Michael Brey, Principal Member of Technical Staff, ST/NEDC Oracle Engineering, Oracle Corporation
Agenda • The meaning of Real-Time in Data Warehousing • Customer Business Scenario • Customer Environment • “Real-Time” Requirement • Our Real-Time Solution • Real-Time data architecture • Incremental Operational Source Change Capture • Transformation and Population into DW Target • Simplified Functional Demonstration • Asynchronous Change Data Capture (Oracle) • Performance Characteristics and Considerations
My Background • An independent data warehousing consultant specializing in the dimensional approach to data warehouse and data mart design and implementation, with in-depth experience applying efficient, scalable techniques to both large-scale data warehouses and small, platform-constrained data marts. I also deliver dimensional design and implementation as well as ETL workshops in the U.S. and Europe. • I have helped implement data warehouses using Red Brick, Oracle, Teradata, DB2, Informix, and SQL Server on mainframe, UNIX, and NT platforms, working with small and large businesses across a variety of industries, including such customers as Hewlett-Packard, American Express, General Mills, AT&T, BellSouth, MCI, Oracle Slovakia, J.D. Power and Associates, Mobil Oil, the Health Alliance of Greater Cincinnati, and the French railroad SNCF.
Real-Time in Data Warehousing • Data Warehousing Systems are complex environments • Business rules • Various data process flows and dependencies • Almost never pure Real-Time • Some latency is a given • What do you need? • Real Time • Near Real-Time • Just in Time for the business
Customer Business Scenario • Client provides software solutions for utility companies • Utility companies have plants generating energy supply • Recommended maximum output capacity • Reserve capacity • Buy supplemental energy as needed • Peak demand periods are somewhat predictable • Each day is pre-planned based on historical behavior • Cheaper to buy energy ahead of time • Expensive to have unused capacity • Existing data warehouse supports the planning function • Reduced option expenses • Reduced supplemental energy costs
Customer “Real-Time” Requirement • More up-to-the-minute accuracy enhances the operational business • Compare today's plant output volumes to yesterday's or last week's average • Know when to purchase additional options or supplies • Customer Target • Actual data available within a 5-minute lag • Use a single query • Use a single tool
Sample Analysis Graph [chart not reproduced]
Our Real-Time Solution: Overview • Three-step approach: • Implement a real-time DW data architecture • Near real-time incremental change capture from the operational system • Transformation and propagation (population) of the change data to the DW
Our Real-Time Solution: Real-Time DW Data Architecture • Add a Real-Time “Partition” to our Plant Output fact table for current-day activity • Separate physical table • No indexes or referential-integrity (RI) constraints during daily activity (incoming data already has RI enforced) • Combined with the Plant Output fact table through a UNION ALL view (see the sketch below)
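A minimal SQL sketch of this layout, assuming the trimmed column list used later in the extraction example (the real fact table may carry more columns):

CREATE TABLE ao_cdc_dw.f_current_day_plant_output (
  generating_plant_key     NUMBER,
  output_day_key           NUMBER,
  output_minute_key        NUMBER,
  output_actual_qty_in_kwh NUMBER
);  -- no indexes or RI constraints, so trickle inserts stay cheap

CREATE OR REPLACE VIEW ao_cdc_dw.v_plant_output AS
  SELECT generating_plant_key, output_day_key, output_minute_key, output_actual_qty_in_kwh
    FROM ao_cdc_dw.f_plant_output                 -- historical, partitioned fact table
  UNION ALL
  SELECT generating_plant_key, output_day_key, output_minute_key, output_actual_qty_in_kwh
    FROM ao_cdc_dw.f_current_day_plant_output;    -- today's "real-time partition"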
Our Real-Time Solution: Change Capture and Population • Incremental change capture from the operational site • Synchronous or asynchronous • Transformation and propagation (population) of the change data to the DW • Continuous trickle feed or periodic batch [Diagram labels: Operations, Synch CDC, Asynch CDC, Staging, Trigger, Batch, DW]
Our Real-Time Solution: Incremental Change Capture • Done with Oracle's Change Data Capture (CDC) functionality • Synchronous CDC available with Oracle9i • Asynchronous CDC with Oracle10g • Asynchronous CDC is the preferred mechanism • Decoupling of change capture from the operational transaction
Asynchronous CDC in Oracle10g • SQL interface to change data • Publish/subscribe paradigm • Parallel access to log files, leveraging Oracle Streams • Parallel transformation of data [Diagram: OLTP DB redo log files are mined (based on LogMiner) into logical change data, transformed with SQL, PL/SQL, or Java, and loaded into DW tables]
Our Real-Time Solution: Population of Change Data into DW • Continuous • Change table owner creates a trigger to populate the warehouse real-time partition • Periodic Batch • Utilize the subscribe interface • Subscribe to specific table and column changes through a view • Set a window and extract the changes at the required interval • Purge the view and move the window
Our Real-Time Solution: The Daily Process • Integrate daily changes into the historical fact table • At the end of the day • Index the current-day table and apply constraints (NOVALIDATE) • Create a new fact table partition • Exchange the current-day table with the new partition • Create the next day's “Real-Time Partition” table
Simplified Functional Demo: Schema Owners • AO_CDC_OP • Owns the operational schema • AO_CDC • Owns the CDC change sets and change tables (needs special CDC privileges): the CDC publish role • AO_CDC_DW • Owns the data warehouse schema (also needs special CDC privileges): the CDC subscribe role
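The exact privilege set is release-dependent; the following is only a sketch of the kinds of grants involved (the roles and packages named are standard Oracle ones, but the full list should be taken from the Oracle Database 10g CDC documentation):

-- Publisher (AO_CDC): create change sets/tables and read the source schema.
GRANT CREATE SESSION, CREATE TABLE TO ao_cdc;
GRANT EXECUTE_CATALOG_ROLE, SELECT_CATALOG_ROLE TO ao_cdc;
GRANT EXECUTE ON DBMS_CDC_PUBLISH TO ao_cdc;
-- Asynchronous CDC publishers also need Streams administrator privileges.
BEGIN
  DBMS_STREAMS_AUTH.GRANT_ADMIN_PRIVILEGE(grantee => 'AO_CDC');
END;
/
-- Subscriber (AO_CDC_DW): the subscribe interface.
GRANT CREATE SESSION TO ao_cdc_dw;
GRANT EXECUTE ON DBMS_CDC_SUBSCRIBE TO ao_cdc_dw;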
What do we have? • Operational transaction table • AO_CDC_OP.PLANT_OUTPUT • DW historical partitioned fact table • AO_CDC_DW.F_PLANT_OUTPUT • DW current-day table (“Real-Time Partition”) • AO_CDC_DW.F_CURRENT_DAY_PLANT_OUTPUT • Data warehouse UNION ALL view • AO_CDC_DW.V_PLANT_OUTPUT
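Because history and the current day sit behind V_PLANT_OUTPUT, the customer's "single query, single tool" target is met by ordinary SQL against that one view. A hypothetical example (the day-key bind variables are illustrative, not part of the original deck):

-- Compare today's output so far with the same day last week, per plant.
SELECT generating_plant_key,
       output_day_key,
       SUM(output_actual_qty_in_kwh) AS total_kwh
  FROM ao_cdc_dw.v_plant_output
 WHERE output_day_key IN (:today_key, :same_day_last_week_key)
 GROUP BY generating_plant_key, output_day_key;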
First • The CDC user publishes (see the sketch below) • Create a change set (CDC_DW) • Add supplemental logging for the operational table • Create a change table for the operational table (CT_PLANT_OUTPUT) • Force database logging on the tablespace to catch any bulk insert /*+ APPEND */ (non-logged) activity
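A sketch of the publish step using the DBMS_CDC_PUBLISH package for the asynchronous HotLog case. Several values are assumptions: the source column list and data types, the tablespace name op_data, and the capture options would all need to match the real operational schema.

BEGIN
  -- Change set fed by the predefined HotLog change source.
  DBMS_CDC_PUBLISH.CREATE_CHANGE_SET(
    change_set_name    => 'CDC_DW',
    description        => 'Plant output changes',
    change_source_name => 'HOTLOG_SOURCE',
    stop_on_ddl        => 'y');

  -- Change table over the operational PLANT_OUTPUT table.
  DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE(
    owner             => 'AO_CDC',
    change_table_name => 'CT_PLANT_OUTPUT',
    change_set_name   => 'CDC_DW',
    source_schema     => 'AO_CDC_OP',
    source_table      => 'PLANT_OUTPUT',
    column_type_list  => 'PLANT_ID NUMBER, OUTPUT_TS DATE, OUTPUT_IN_KWH NUMBER',
    capture_values    => 'both',
    rs_id             => 'y',
    row_id            => 'n',
    user_id           => 'n',
    timestamp         => 'n',
    object_id         => 'n',
    source_colmap     => 'n',
    target_colmap     => 'n',
    options_string    => NULL);
END;
/

-- Supplemental logging on the source table, and forced logging on its
-- tablespace (name assumed) so direct-path /*+ APPEND */ inserts still
-- reach the redo stream.
ALTER TABLE ao_cdc_op.plant_output ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
ALTER TABLESPACE op_data FORCE LOGGING;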
Next – Transform and Populate • One of two ways • Continuous Feed • Logged insert activity • Permits nearer real-time • Constant system load • Periodic Batch Feed • Permits non-logged bulk operations • You set the lag time – how often do you run the batch process? • Hourly • Every five minutes • Less system load overall
The Continuous Feed • Put an insert trigger on the change table that joins to the dimension tables to pick up the dimension keys and performs any necessary transformations (a sketch follows)
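A sketch of such a trigger, reusing the dimension-join logic from the extraction SQL shown later. The trigger name is hypothetical, and the exact control-column handling (OPERATION$) depends on the options chosen when the change table was created.

CREATE OR REPLACE TRIGGER ao_cdc.trg_ct_plant_output_feed
AFTER INSERT ON ao_cdc.ct_plant_output
FOR EACH ROW
WHEN (NEW.operation$ LIKE 'I%')   -- propagate only source-row inserts
BEGIN
  INSERT INTO ao_cdc_dw.f_current_day_plant_output
    (generating_plant_key, output_day_key, output_minute_key, output_actual_qty_in_kwh)
  SELECT p.generating_plant_key,
         d.output_day_key,
         m.output_minute_key,
         :NEW.output_in_kwh
    FROM ao_cdc_dw.d_generating_plant p,
         ao_cdc_dw.d_output_day d,
         ao_cdc_dw.d_output_minute m
   WHERE p.plant_id             = :NEW.plant_id
     AND d.output_day           = TRUNC(:NEW.output_ts)
     AND m.output_time_24hr_nbr = TO_NUMBER(TO_CHAR(:NEW.output_ts, 'HH24MI'));
END;
/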
The Batch Feed • The CDC schema owner • Authorizes AO_CDC_DW to select from the change table (the select is done via a generated view) • The DW schema owner • Subscribes to the change table and the columns needed (with a centralized EDW approach this would usually be the whole change table), supplying a subscription name and view name • Activates the subscription • Extract • Extend the window • Extract the changed data via the view (same code as the trigger) • Purge the window (a logical delete; physical deletion is handled by the CDC schema owner) • A sketch of these calls follows
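A sketch of those steps with the DBMS_CDC_SUBSCRIBE interface. The subscription name PO_SUB is hypothetical; PO_ACTIVITY_VIEW is the subscriber view used in the extraction SQL on the next slide. Exact parameter names should be checked against the 10g documentation.

-- Run as AO_CDC: allow the DW owner to read the change table.
GRANT SELECT ON ao_cdc.ct_plant_output TO ao_cdc_dw;

-- Run as AO_CDC_DW: one-time subscription setup.
BEGIN
  DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION(
    change_set_name   => 'CDC_DW',
    description       => 'Plant output batch feed',
    subscription_name => 'PO_SUB');

  DBMS_CDC_SUBSCRIBE.SUBSCRIBE(
    subscription_name => 'PO_SUB',
    source_schema     => 'AO_CDC_OP',
    source_table      => 'PLANT_OUTPUT',
    column_list       => 'PLANT_ID, OUTPUT_TS, OUTPUT_IN_KWH',
    subscriber_view   => 'PO_ACTIVITY_VIEW');

  DBMS_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION(subscription_name => 'PO_SUB');
END;
/

-- Each batch cycle: extend the window, run the extraction INSERT
-- (next slide), then purge the consumed changes.
BEGIN
  DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(subscription_name => 'PO_SUB');
END;
/
-- ... extraction INSERT here ...
BEGIN
  DBMS_CDC_SUBSCRIBE.PURGE_WINDOW(subscription_name => 'PO_SUB');
END;
/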
Extraction from Change Table View

insert /*+ APPEND */ into ao_cdc_dw.f_current_day_plant_output
  (generating_plant_key, output_day_key, output_minute_key, output_actual_qty_in_kwh)
select p.generating_plant_key,
       d.output_day_key,
       m.output_minute_key,
       new.output_in_kwh
  from ao_cdc_dw.po_activity_view new
 inner join ao_cdc_dw.d_generating_plant p
    on new.plant_id = p.plant_id
 inner join ao_cdc_dw.d_output_day d
    on trunc(new.output_ts) = d.output_day
 inner join ao_cdc_dw.d_output_minute m
    on to_number(to_char(new.output_ts, 'HH24MI')) = m.output_time_24hr_nbr;
Next Step • Add the current day's activity (the contents of the current-day fact table) to the historical fact table as a new partition • Index and apply constraints to the current-day fact table • Add a new empty partition to the fact table • Exchange the current-day fact table with the partition • Create the new current-day fact table (see the DDL sketch below)
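A hedged DDL sketch of those steps. The index, constraint, and partition names, the range-partitioning scheme on the day key, and the day-key literals are all assumptions for illustration.

-- 1. Index the current-day table and add RI constraints without validating.
CREATE INDEX ao_cdc_dw.i_cd_po_plant
  ON ao_cdc_dw.f_current_day_plant_output (generating_plant_key);
ALTER TABLE ao_cdc_dw.f_current_day_plant_output
  ADD CONSTRAINT fk_cd_po_plant
      FOREIGN KEY (generating_plant_key)
      REFERENCES ao_cdc_dw.d_generating_plant (generating_plant_key)
      ENABLE NOVALIDATE;

-- 2. Add an empty partition for the day just ended.
ALTER TABLE ao_cdc_dw.f_plant_output
  ADD PARTITION p_20031013 VALUES LESS THAN (20031014);

-- 3. Swap the loaded table into that partition (a metadata-only operation;
--    INCLUDING INDEXES assumes the fact table has an equivalent local index).
ALTER TABLE ao_cdc_dw.f_plant_output
  EXCHANGE PARTITION p_20031013
  WITH TABLE ao_cdc_dw.f_current_day_plant_output
  INCLUDING INDEXES WITHOUT VALIDATION;

-- 4. The exchanged table is now empty; drop its index and constraint
--    (or drop and re-create the table) so the next day's trickle inserts
--    again hit a bare, unindexed table.
ALTER TABLE ao_cdc_dw.f_current_day_plant_output DROP CONSTRAINT fk_cd_po_plant;
DROP INDEX ao_cdc_dw.i_cd_po_plant;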
Let's step through this live
Summary • We created a real-time partition for current-day activity • We put CDC on the operational table and created a change table populated by an asynchronous process (which reads the redo log) • We demonstrated a continuous feed to the DW using a trigger-based approach • We demonstrated a batch DW feed using the CDC subscribe process • We showed how to add the current-day table to the fact table as a partition and set up the next day's table • An electronic copy of the SQL used to build this prototype is available by emailing mike.schmitz@databaseperformance.com
Michael Brey, Principal Member of Technical Staff, ST/NEDC Oracle Engineering, Oracle Corporation
Overview • Benchmark Description • System Description • Database Parameters • Performance Data
The Benchmark • Customer OLTP benchmark run internally at Oracle • Insurance application handling customer inquiries and quotes over the phone • N users perform M quotes • Quote = actual work performed during a call with a customer • Mixture of inserts, updates, deletes, singleton selects, cursor fetches, rollbacks/commits, and savepoints • Compute the average time for all quotes across users
System Info • SunFire 4800 • A standard shared-memory multiprocessor (SMP) • 8 x 900-MHz CPUs • 16 GB physical memory • Solaris 5.8 • Database storage: striped across 8 Sun StorEdge T3 arrays (9 x 36.4 GB each)
Database Parameters • Parallel_max_servers: 20 • Streams_pool_size: 400M (default is 10% of the shared pool) • Shared_pool_size: 600M • Buffer cache: 128M • Redo buffers: 4M • Processes: 600
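In initialization-parameter form this corresponds roughly to the sketch below; mapping "buffer cache" to db_cache_size and "redo buffers" to log_buffer is an assumption about which parameters were meant.

parallel_max_servers = 20
streams_pool_size    = 400M
shared_pool_size     = 600M
db_cache_size        = 128M
log_buffer           = 4194304   # bytes (4M)
processes            = 600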
Tests • Conducted tests with asynchronous HotLog CDC enabled and disabled, and with synchronous CDC • Asynchronous HotLog CDC tests were conducted at different log usage levels • Approximately 10%, 50%, and 100% of the OLTP tables receiving DML operations were included in CDC • Tests run with: • 250 concurrent users • Continuous peak workload after ramp-up • 175 transactions per second
Latency of Change Tracking • Latency is defined as the time between the actual change and its reflection in the change capture table • Latency = time[change record insert] – time[redo log insert] • Latency measurements were made for the 100% asynchronous HotLog CDC run • 99.7% of records arrived in less than 2 seconds • 53.5% of records arrived in less than 1 second • The remaining records arrived in less than 3 seconds • Asynchronous CDC kept up with the constant high OLTP workload at all times
Summary • Change Data Capture enables enterprise-ready, near real-time capture of change data • No falling behind, even in constant high-load OLTP environments • Minimal impact on the originating OLTP transactions • Predictable additional resource requirements, driven solely by the amount of change tracking • Oracle provides the flexibility to meet your “on-time” business needs
Q & A
Next Steps… Data Warehousing DB Sessions (Monday and Tuesday)
• 11:00 AM, #40153, Room 304: Oracle Warehouse Builder: New Oracle Database 10g Release
• 3:30 PM, #40176, Room 303: Security and the Data Warehouse
• 4:00 PM, #40166, Room 130: Oracle Database 10g SQL Model Clause
• 8:30 AM, #40125, Room 130: Oracle Database 10g: A Spatial VLDB Case Study
• 3:30 PM, #40177, Room 303: Building a Terabyte Data Warehouse, Using Linux and RAC
• 5:00 PM, #40043, Room 104: Data Pump in Oracle Database 10g: Foundation for Ultrahigh-Speed Data Movement
For more info on Oracle BI/DW go to http://otn.oracle.com/products/bi/db/dbbi.html
Next Steps… Data Warehousing DB Sessions (Thursday)
• 8:30 AM, #40179, Room 304: Oracle Database 10g Data Warehouse Backup and Recovery
• 11:00 AM, #36782, Room 304: Experiences with Real-Time Data Warehousing Using Oracle 10g
• 1:00 PM, #40150, Room 102: Turbocharge Your Database, Using the Oracle Database 10g SQLAccess Advisor
Business Intelligence and Data Warehousing demos, all four days in the Oracle Demo Campground: Oracle Database 10g, Oracle OLAP, Oracle Data Mining, Oracle Warehouse Builder, Oracle Application Server 10g
For more info on Oracle BI/DW go to http://otn.oracle.com/products/bi/db/dbbi.html
Reminder: please complete the OracleWorld online session survey. Thank you.