Right In Time • Presented By: Maria Baron • Written By: Rajesh Gadodia • Intelligent Enterprise, Feb 7, 2004, Vol. 7, Iss. 2; pg 26
Traditional Data Warehouse • Central repository of transactional data spread across heterogeneous platforms and applications • Focused on strategic reporting and analysis • Loaded periodically (nightly, weekly, monthly) • Information latency
Evolution of The Data Warehouse • First-generation • Reporting • Second-generation • Analytic processing and data mining • Multidimensional tools for drill down • New generation • Speed information cycle time • Minimize latency • Information on demand
Why Real Time Data Warehousing? • Active decision support • Business activity monitoring (BAM) • Alerting • Efficiently execute business strategy • Monitoring is completed in the background • Positions information for use by downstream applications • Can be built on top of existing data warehouse
Traditional Vs. Real-Time Data Warehouse • Traditional Data Warehouse (EDW) • Strategic • Passive • Historical trends • Batch • Offline analysis • Isolated • Not interactive • Best effort • Guarantees neither availability nor performance
Traditional Vs. Real-Time Data Warehouse • Real-Time Data Warehouse (RTDW) • Tactical • Focuses on execution of strategy • Real-Time • Information on Demand • Most up-to-date view of the business • Integrated • Integrates data warehousing with business processes • Guaranteed • Guarantees both availability and performance
Real-Time Integration • Goal of real-time data extraction, transformation and loading (ETL) • Keep the warehouse refreshed • Minimal delay • Issues • How does the system identify what data has been added or changed since the last extract? • Performance impact of extracts on the source system
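One common way to identify what has changed since the last extract is a high-water mark on a last-modified column. The sketch below is only an illustration under stated assumptions: the `orders` table, its `updated_at` column, and the `extract_changes` helper are hypothetical and do not come from the article.

```python
import sqlite3

def extract_changes(conn, last_watermark):
    """Return rows modified after last_watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # The highest timestamp seen becomes the starting point of the next extract.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

# Hypothetical source table; updated_at is a plain integer timestamp here.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(1, 10.0, 100), (2, 20.0, 105)]
)

first_batch, watermark = extract_changes(source, 0)

# Changes made on the source after the first extract.
source.execute("UPDATE orders SET amount = 25.0, updated_at = 110 WHERE id = 2")
source.execute("INSERT INTO orders VALUES (3, 5.0, 112)")

second_batch, watermark = extract_changes(source, watermark)
```

The second extract picks up only the updated row and the new row. An index on the timestamp column is one way to limit the load each extract places on the source system, though whether that is sufficient depends on the source.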
Techniques for real-time ETL • Simulated real-time feed • Increase the frequency of batch runs • Most useful when information is not required to be ‘up to the minute’ • Requires minimal changes to existing ETL infrastructure • Easy to implement
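A simulated real-time feed amounts to calling the existing batch load on a much shorter schedule. The following is a minimal sketch of that idea; `run_micro_batches`, `load_batch`, and the five-minute interval are hypothetical names and values chosen for illustration, and the sleep function is injectable so the example runs instantly.

```python
import time

def run_micro_batches(load_batch, interval_seconds, cycles, sleep=time.sleep):
    """Run the existing batch load repeatedly at a short interval."""
    loaded = 0
    for cycle in range(cycles):
        loaded += load_batch()
        if cycle < cycles - 1:
            sleep(interval_seconds)  # wait before the next batch run
    return loaded

# Hypothetical source rows that accumulate between batch runs.
pending = [[1, 2, 3], [4], []]
warehouse = []

def load_batch():
    """Stand-in for the existing batch ETL job; returns the number of rows loaded."""
    batch = pending.pop(0) if pending else []
    warehouse.extend(batch)
    return len(batch)

# Every 5 minutes instead of nightly; sleeping is skipped for the demonstration.
total = run_micro_batches(load_batch, interval_seconds=300, cycles=3,
                          sleep=lambda seconds: None)
```

The batch job itself is unchanged, which is why this approach needs few changes to the existing ETL infrastructure. Latency is bounded by the interval rather than eliminated.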
Techniques for real-time ETL • Trickle Feed • Allows continuous update of the RTDW as the data in the source system changes • Messaging infrastructure • Perpetually open data pipe • Also called streaming • Basic elements – Capture, Stage and Apply
Techniques for real-time ETL • Trickle feed (cont.) • Target and source databases must be configured • May require special gateways • Source – capture process: automatically capture changes to data or table structure • RTDW records changes as logical change records (LCRs) that are kept in a staging partition called the message queue • The message queue can be explicitly updated by user applications
Techniques for real-time ETL • Trickle feed (cont.): role of the target database • A process takes the logical change records out of the message queue and applies the changes to selected database objects • Rules are set in message queues to handle data transformation • Requires upfront development and can be complex to configure and manage
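The capture, stage and apply steps of a trickle feed can be sketched with an in-memory queue. This is a simplified model under stated assumptions, not any vendor's streaming API: the `LogicalChangeRecord` class, the `capture` and `apply_changes` functions, and the dictionary standing in for the target database are all hypothetical.

```python
from dataclasses import dataclass
from queue import Queue

@dataclass
class LogicalChangeRecord:
    operation: str  # "insert", "update" or "delete"
    table: str
    key: int
    values: dict

def capture(queue, operation, table, key, values=None):
    """Capture and stage: record a source change as an LCR on the message queue."""
    queue.put(LogicalChangeRecord(operation, table, key, values or {}))

def apply_changes(queue, target, transform=lambda values: values):
    """Apply: drain the queue and apply each LCR to the target tables."""
    applied = 0
    while not queue.empty():
        lcr = queue.get()
        rows = target.setdefault(lcr.table, {})
        if lcr.operation == "delete":
            rows.pop(lcr.key, None)
        elif lcr.operation == "update":
            # Merge the changed columns into the existing row.
            rows[lcr.key] = {**rows.get(lcr.key, {}), **transform(lcr.values)}
        else:
            rows[lcr.key] = transform(lcr.values)
        applied += 1
    return applied

message_queue = Queue()  # the staging area between source and target
warehouse = {}           # stand-in for the target database

capture(message_queue, "insert", "sales", 1, {"amount": "19.90"})
capture(message_queue, "update", "sales", 1, {"amount": "24.90"})
capture(message_queue, "insert", "sales", 2, {"amount": "5.00"})
capture(message_queue, "delete", "sales", 2)

# A transformation rule: convert text amounts to numbers while applying.
to_float = lambda values: {name: float(value) for name, value in values.items()}
applied = apply_changes(message_queue, warehouse, transform=to_float)
```

Changes are applied in the order they were captured, so the target ends up with the latest state of row 1 and no row 2. The `transform` argument plays the part of the transformation rules attached to the queue.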
Information Delivery • Changes to the traditional data warehouse • Need to accommodate continuous data trickle feeds intermixed with live user queries • Schema design • Active partition management • Data aggregation
Designing an RTDW - Options • Trickle And Flip • A copy of the fact table is made and given a name that queries cannot access • As new data trickles in, it is appended to the copy of the fact table • At set intervals the trickle is halted, the copy is duplicated, the active fact table is deleted, the duplicate is renamed to the active fact table name, and the process starts over • Poses scalability problems – the flip may not keep up with the trickle, depending on the size of the table
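The trickle-and-flip cycle can be sketched with SQLite from the Python standard library. The table names `fact_sales`, `fact_sales_staging` and `fact_sales_next` are hypothetical, and wrapping the flip in a single transaction is a design choice of this sketch rather than something the article prescribes.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the flip can manage its own transaction explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE fact_sales (sale_id INTEGER, amount REAL)")
conn.execute("INSERT INTO fact_sales VALUES (1, 10.0)")

# Copy of the fact table under a name that user queries do not reference.
conn.execute("CREATE TABLE fact_sales_staging AS SELECT * FROM fact_sales")

def trickle(conn, sale_id, amount):
    """Append newly arrived data to the hidden copy only."""
    conn.execute("INSERT INTO fact_sales_staging VALUES (?, ?)", (sale_id, amount))

def flip(conn):
    """Duplicate the hidden copy and swap the duplicate in as the active table."""
    conn.execute("BEGIN")
    conn.execute("CREATE TABLE fact_sales_next AS SELECT * FROM fact_sales_staging")
    conn.execute("DROP TABLE fact_sales")
    conn.execute("ALTER TABLE fact_sales_next RENAME TO fact_sales")
    conn.execute("COMMIT")

def active_count(conn):
    return conn.execute("SELECT COUNT(*) FROM fact_sales").fetchall()[0][0]

trickle(conn, 2, 20.0)
trickle(conn, 3, 30.0)
before_flip = active_count(conn)  # queries still see the old data
flip(conn)                        # the trickle is paused while this runs
after_flip = active_count(conn)   # queries now see the trickled rows
```

The `CREATE TABLE ... AS SELECT` step copies the whole table on every flip, which is where the scalability concern comes from: the larger the fact table, the longer the trickle has to stay halted.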
Designing an RTDW - Options • Table Partitioning • Allows for the creation of large tables that are handled internally by the database as a series of smaller ones, each with its own indexes • Can rope off a partition so that it isn’t visible to active queries • Problem: determining the criteria for partitioning
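Databases that support partitioning do this natively, so the following is only an in-memory model of the idea, written to show how a roped-off partition stays invisible to queries while it is loaded. The `PartitionedFactTable` class and the choice of month as the partitioning criterion are assumptions made for illustration.

```python
from collections import defaultdict

class PartitionedFactTable:
    """One logical table stored as several smaller partitions."""

    def __init__(self, partition_key):
        self.partition_key = partition_key   # function: row -> partition name
        self.partitions = defaultdict(list)
        self.roped_off = set()               # partitions hidden from queries

    def insert(self, row):
        self.partitions[self.partition_key(row)].append(row)

    def rope_off(self, name):
        self.roped_off.add(name)

    def publish(self, name):
        self.roped_off.discard(name)

    def query(self):
        """Return rows from every partition that is visible to queries."""
        return [
            row
            for name in sorted(self.partitions)
            if name not in self.roped_off
            for row in self.partitions[name]
        ]

# Partition by month, taken from an ISO date string such as "2004-02-07".
fact_sales = PartitionedFactTable(partition_key=lambda row: row["date"][:7])
fact_sales.insert({"date": "2004-01-15", "amount": 10.0})

fact_sales.rope_off("2004-02")  # hide the partition that is being loaded
fact_sales.insert({"date": "2004-02-07", "amount": 20.0})
visible_during_load = fact_sales.query()

fact_sales.publish("2004-02")   # make the loaded partition visible
visible_after_load = fact_sales.query()
```

The partition key function is the part that corresponds to the problem named on the slide: a poor criterion either spreads the incoming data across many partitions or leaves one partition too large to manage.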
Designing an RTDW - Options • Real-Time partitions • Create new tables that resemble the active fact tables but are designed for quick updates • Interval tables – contain only the data that has arrived since the last update • Truly real-time • Can be accessed by analysts and other BI tools
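A real-time partition can be sketched as a small, unindexed table that sits beside the main fact table, with a view that combines the two for analysts. The names `fact_sales`, `fact_sales_rt` and `fact_sales_current`, and the periodic merge step, are assumptions of this sketch and are not taken from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Main fact table, loaded periodically.
    CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, amount REAL);
    -- Real-time partition: same columns, no indexes, so inserts stay cheap.
    CREATE TABLE fact_sales_rt (sale_id INTEGER, amount REAL);
    -- What analysts and BI tools query: history plus the latest interval.
    CREATE VIEW fact_sales_current AS
        SELECT sale_id, amount FROM fact_sales
        UNION ALL
        SELECT sale_id, amount FROM fact_sales_rt;
""")
conn.execute("INSERT INTO fact_sales VALUES (1, 10.0)")

def merge_real_time_partition(conn):
    """Fold the interval data into the main table and empty the partition."""
    with conn:  # commit both statements together, or roll back on error
        conn.execute(
            "INSERT INTO fact_sales SELECT sale_id, amount FROM fact_sales_rt"
        )
        conn.execute("DELETE FROM fact_sales_rt")

# A new sale arrives and is visible to queries immediately.
conn.execute("INSERT INTO fact_sales_rt VALUES (2, 20.0)")
total_before = conn.execute(
    "SELECT SUM(amount) FROM fact_sales_current"
).fetchall()[0][0]

merge_real_time_partition(conn)
rt_rows = conn.execute("SELECT COUNT(*) FROM fact_sales_rt").fetchall()[0][0]
total_after = conn.execute(
    "SELECT SUM(amount) FROM fact_sales_current"
).fetchall()[0][0]
```

Query results are the same before and after the merge, because the view always covers both tables. Only the place where the newest rows are stored changes.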
Conclusion • RTDWs have a distinct advantage for businesses that use time-sensitive data • Call centers • Performance indicators • Fraud detection • Yield management • Certain financial transactions