
Right In Time

Right In Time. Presented By: Maria Baron Written By: Rajesh Gadodia Intelligent Enterprise Feb 7, 2004 Vol. 7, Iss. 2; pg 26. Traditional Data Warehouse. Central repository of transactional data spread across heterogeneous platforms and applications


Presentation Transcript


  1. Right In Time Presented By: Maria Baron Written By: Rajesh Gadodia Intelligent Enterprise Feb 7, 2004 Vol. 7, Iss. 2; pg 26

  2. Traditional Data Warehouse • Central repository of transactional data spread across heterogeneous platforms and applications • Focused on strategic reporting and analysis • Loaded periodically (nightly, weekly, monthly) • Information latency

  3. Evolution of The Data Warehouse • First-generation • Reporting • Second-generation • Analytic processing and data mining • Multidimensional tools for drill down • New generation • Speed information cycle time • Minimize latency • Information on demand

  4. Why Real Time Data Warehousing? • Active decision support • Business activity monitoring (BAM) • Alerting • Efficiently execute business strategy • Monitoring is completed in the background • Positions information for use by downstream applications • Can be built on top of existing data warehouse

  5. Traditional Vs. Real-Time Data Warehouse • Traditional Data Warehouse (EDW) • Strategic • Passive • Historical trends • Batch • Offline analysis • Isolated • Not interactive • Best effort • Guarantees neither availability nor performance

  6. Traditional Vs. Real-Time Data Warehouse • Real-Time Data Warehouse (RTDW) • Tactical • Focuses on execution of strategy • Real-Time • Information on Demand • Most up-to-date view of the business • Integrated • Integrates data warehousing with business processes • Guaranteed • Guarantees both availability and performance

  7. Real-Time Integration • Goal of real-time data extraction, transformation and loading • Keep warehouse refreshed • Minimal delay • Issues • How does the system identify what data has been added or changed since the last extract • Performance impact of extracts on the source system
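One common way to address the first issue on this slide is a high-water mark on a last-modified column. The sketch below is illustrative only: the `orders` table, its `updated_at` column and the `extract_changes` helper are assumptions, not something the presentation specifies. It also does not detect deleted rows, which generally need a log-based or trigger-based capture.

```python
import sqlite3


def extract_changes(conn, last_watermark):
    """Return rows modified after last_watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # If nothing changed, keep the old watermark so the next run starts there.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


# Hypothetical source table with a last-modified column (integer ticks).
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)"
)
# An index on the watermark column keeps each extract cheap for the source.
source.execute("CREATE INDEX idx_orders_updated ON orders (updated_at)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(1, 10.0, 100), (2, 25.0, 105)]
)

# First extract picks up everything after watermark 0.
first_batch, watermark = extract_changes(source, 0)

# Later activity on the source: one update and one insert.
source.execute("UPDATE orders SET amount = 12.0, updated_at = 110 WHERE id = 1")
source.execute("INSERT INTO orders VALUES (3, 7.5, 112)")

# Second extract returns only the rows touched since the first one.
second_batch, watermark = extract_changes(source, watermark)
```

Because each run reads only rows past the stored watermark, the cost of an extract scales with the volume of changes rather than the size of the table, which bears on the second issue as well.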

  8. Real-Time Data Warehouse – Logical Architecture

  9. Techniques for real-time ETL • Simulated real-time feed • Increase the frequency of batch runs • Most useful when information is not required to be ‘up to the minute’ • Requires minimal changes to existing ETL infrastructure • Easy to implement

  10. Techniques for real-time ETL • Trickle Feed • Allows continuous update of the RTDW as the data in the source system changes • Messaging infrastructure • Perpetually open data pipe • Also called streaming • Basic elements – Capture, Stage and Apply

  11. Techniques for real-time ETL • Trickle feed (cont.) • Target and source databases must be configured • May require special gateways • Source – capture process: automatically capture changes to data or table structure • RTDW records changes as logical change records (LCRs) that are kept in a staging partition called the message queue • The message queue can be explicitly updated by user applications

  12. Techniques for real-time ETL • Trickle feed (cont.) • Role of the target database • An apply process takes the logical change records out of the message queue and applies the changes to selected database objects • Rules are set in message queues to handle data transformation • Requires upfront development and can be complex to configure and manage
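The capture, stage and apply elements described on slides 10 to 12 can be sketched in a few lines. This is a toy in-memory model, not the mechanism of any particular database product: `ChangeRecord`, `capture`, `apply_changes` and the `to_upper_region` rule are hypothetical names, and the transformation rules are run in the apply step purely for simplicity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ChangeRecord:
    """A simplified logical change record (LCR)."""
    op: str        # "insert", "update" or "delete"
    table: str
    key: int
    values: dict


message_queue = deque()  # stage: holds LCRs until they are applied


def capture(op, table, key, values=None):
    """Capture: record a source-side change as an LCR on the queue."""
    message_queue.append(ChangeRecord(op, table, key, values or {}))


def to_upper_region(record):
    """Example transformation rule: normalise region codes to upper case."""
    if "region" in record.values:
        record.values["region"] = record.values["region"].upper()
    return record


def apply_changes(warehouse, rules):
    """Apply: drain the queue in order and apply each LCR to the target."""
    applied = 0
    while message_queue:
        record = message_queue.popleft()
        for rule in rules:
            record = rule(record)
        table = warehouse.setdefault(record.table, {})
        if record.op == "delete":
            table.pop(record.key, None)
        else:  # insert and update both upsert the row
            table.setdefault(record.key, {}).update(record.values)
        applied += 1
    return applied


warehouse = {}
capture("insert", "customers", 1, {"name": "Acme", "region": "emea"})
capture("update", "customers", 1, {"region": "apac"})
capture("insert", "customers", 2, {"name": "Globex", "region": "amer"})
capture("delete", "customers", 2)
applied = apply_changes(warehouse, [to_upper_region])
```

Applying the records in queue order matters: the update and the delete only make sense after the inserts they refer to.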

  13. Trickle Feed Architecture for Real-Time load

  14. Information Delivery • Changes to traditional data warehouse • Need to accommodate continuous data trickle feeds intermixed with live user queries • Schema design • Active partition management • Data aggregation

  15. Designing an RTDW - Options • Trickle And Flip • A copy of the fact table is made and given a name that queries cannot access • As new data trickles in, it is appended to the copy of the fact table • At certain intervals, the trickle is halted, the copy is itself copied, the active fact table is deleted, the new copy is renamed to the active fact table name, and the process starts over • Poses scalability problems – the copy step may not keep up with the trickle, depending on the size of the table
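A minimal sketch of the trickle-and-flip cycle, using SQLite only because it ships with Python. The table names `sales_fact`, `sales_fact_staging` and `sales_fact_new` and the helper functions are assumptions made for illustration. The sketch does not make the swap atomic for concurrent readers, which a real warehouse would need to handle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (sale_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
# Staging copy of the fact table; user queries are assumed never to touch it.
conn.execute("CREATE TABLE sales_fact_staging AS SELECT * FROM sales_fact")
conn.commit()


def trickle(rows):
    """Append newly arrived rows to the staging copy only."""
    conn.executemany("INSERT INTO sales_fact_staging VALUES (?, ?)", rows)
    conn.commit()


def flip():
    """With the trickle paused, publish a copy of staging as the active table."""
    conn.execute("CREATE TABLE sales_fact_new AS SELECT * FROM sales_fact_staging")
    conn.execute("DROP TABLE sales_fact")
    conn.execute("ALTER TABLE sales_fact_new RENAME TO sales_fact")
    conn.commit()


def count(table):
    """Row count for one of the fixed table names used in this sketch."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchall()[0][0]


trickle([(3, 30.0)])
before_flip = count("sales_fact")      # active table has not seen row 3 yet
flip()
after_flip = count("sales_fact")       # row 3 is visible after the flip
trickle([(4, 40.0)])
after_next_trickle = count("sales_fact")   # row 4 waits for the next flip
staging_rows = count("sales_fact_staging")
```

The full-table copy inside `flip` is the step whose cost grows with the size of the fact table, which is the scalability concern the slide raises.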

  16. Designing an RTDW - Options • Table Partitioning • Allows for the creation of large tables that are handled internally by the database as a series of smaller ones, each with its own indexes • Can rope off partition so it isn’t visible to active queries • Problem: Determining criteria for partitioning
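The idea of roping off a partition can be illustrated with a toy in-memory model. Real databases implement partitioning internally; the `PartitionedFact` class, its month-based partition key and the `hidden` set below are assumptions chosen only to show the behaviour.

```python
from collections import defaultdict
from datetime import date


class PartitionedFact:
    """One logical fact table stored as a set of smaller monthly partitions."""

    def __init__(self):
        self.partitions = defaultdict(list)  # (year, month) -> list of rows
        self.hidden = set()                  # partitions roped off from queries

    def partition_key(self, row):
        # Partitioning criterion (an assumption here): calendar month of the sale.
        d = row["sale_date"]
        return (d.year, d.month)

    def insert(self, row):
        self.partitions[self.partition_key(row)].append(row)

    def query_total(self):
        # Queries see only the partitions that are not roped off.
        return sum(
            row["amount"]
            for key, rows in self.partitions.items()
            if key not in self.hidden
            for row in rows
        )


fact = PartitionedFact()
fact.hidden.add((2004, 2))  # the partition currently being loaded

fact.insert({"sale_date": date(2004, 1, 15), "amount": 100.0})
fact.insert({"sale_date": date(2004, 1, 20), "amount": 50.0})
fact.insert({"sale_date": date(2004, 2, 3), "amount": 25.0})

total_while_loading = fact.query_total()  # February rows are not visible
fact.hidden.discard((2004, 2))            # publish the partition
total_after_publish = fact.query_total()
```

Choosing the key is the hard part the slide points to: a date-based key works here only because new data arrives in roughly chronological order.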

  17. Designing an RTDW - Options • Real-Time partitions • Create new tables that resemble active fact tables that are designed for quick updates • Interval tables – contain data from only the last update • Truly real-time • Can be accessed by analysts and other BI tools
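A real-time partition can be sketched as a small, index-free table that sits beside the main fact table, with a view that presents both to analysts. The names `sales_fact_rt`, `sales_all` and `merge_interval` are hypothetical, and SQLite stands in for the warehouse only to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Main fact table: indexed for query performance, loaded in batches.
conn.execute("CREATE TABLE sales_fact (sale_id INTEGER, amount REAL)")
conn.execute("CREATE INDEX idx_sales_fact_id ON sales_fact (sale_id)")
# Real-time partition: same columns, no indexes, so inserts stay cheap.
conn.execute("CREATE TABLE sales_fact_rt (sale_id INTEGER, amount REAL)")
# Analysts and BI tools query the view and see both tables as one.
conn.execute(
    "CREATE VIEW sales_all AS "
    "SELECT sale_id, amount FROM sales_fact "
    "UNION ALL SELECT sale_id, amount FROM sales_fact_rt"
)
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
conn.commit()


def trickle(rows):
    """New rows land in the real-time partition as they arrive."""
    conn.executemany("INSERT INTO sales_fact_rt VALUES (?, ?)", rows)
    conn.commit()


def total_sales():
    return conn.execute("SELECT SUM(amount) FROM sales_all").fetchall()[0][0]


def rt_rows():
    return conn.execute("SELECT COUNT(*) FROM sales_fact_rt").fetchall()[0][0]


def merge_interval():
    """At the end of an interval, fold the real-time rows into the main table."""
    conn.execute("INSERT INTO sales_fact SELECT sale_id, amount FROM sales_fact_rt")
    conn.execute("DELETE FROM sales_fact_rt")
    conn.commit()


trickle([(3, 5.0)])
total_before_merge = total_sales()  # includes the row that just arrived
rt_before_merge = rt_rows()
merge_interval()
total_after_merge = total_sales()   # unchanged: the row only moved tables
rt_after_merge = rt_rows()
```

Because the real-time table holds only the rows since the last merge, it stays small enough that the missing indexes do not hurt queries through the view.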

  18. Real-Time Partition

  19. Conclusion • RTDWs have a distinct advantage for businesses that use time-sensitive data • Call centers • Performance indicators • Fraud detection • Yield management • Certain financial transactions
