520 likes | 536 Views
Explore the architectural model and operational goals of Data Warehousing, decision support tools, and modern examples. Learn about the six-layer DW architecture and the process of putting a DW in place.
E N D
DW: A Tactical View Developed by: Dr Eddie Ip Modified by: Dr Arif Ansari
Outline • Brief review • Evolution of DW • Architectural model of DW • Putting a DW in place
Review: DW • Snapshot: DW takes a snap-shot of operation & stores it away Jan 01 update update Dec 00 Nov 00 Current 1/18/01 snapshot Update
Review: DW • Read-only • For analytical processing • For company-wide usage (Rising Tide strategy)
Review: Operational goals of DW • Provide business users with access to data • Provide one version of the truth • Record the past accurately • Slice and dice efficiently thro’ data • Provide a unified view of customers
Evolution of DW as a decision support tool • Long history of analytic system • 1970 DSS = decision support systems • 1980 EIS = executive information systems • Reporting system for senior managers
EIS example • Frito-Lay
Pre-DW DSS • Lack DB component • Extract programs subset files (predetermined criteria) • Problems: reliable joins, synchronization, credibility, consistency • Process controlled by IT, not business users • Silos (one OLTP, one DSS)
Modern DSS Example • Stock Screen • Used by security analysts to inform & support business decision
Modern DSS Example • A simple web-based mutual fund screen • Charles Schwab http://www.schwab.com/SchwabNOW/navigation/mainFrameSet/0,4528,527,00.html
Java-based DSS • Java-based DSS (Comshare) • http://decisionweb.comshare.com/deciweb/tr/trmain2.asp?App=corporate%20review
DW as a DSS • OLTP bring data in • DSS draw information out Data Information to support decision DW
Basic Architectural model • An architecture is a set of rules or structures providing a framework for the overall design of a system. • Poe, Building a DW for Decision Support
Architectural Model • Data flow through the DW pipe of 6 layers • Transaction layer • Propagation layer • Database layer • Middleware layer • Application layer • Presentation layer
OD1 Six-layer DW Architecture AP1 OD2 AP2 OD3 AP3 OD4 AP4 OD5 Transaction Propagation Database Middleware Application Presentation
Transaction layer • Validate = is data up-to-date and accurate? ID invalid, missing, duplicate values • even name may have different format • First/Last, Last/Middle/First, …. • Define default for missing value • Map source to target = which field in operation data to which field in DW
Transaction layer • Data • Flat file (text or ASCII file) • RDBMS file • Proprietary file (.dat)
Transaction layer(Tips to handle data) • RDBMS files > SQL / extraction tools with GUI (often available in RDBMS) • One Big flat file • Large volume (e.g., click stream, call transaction) • Read into scalable RDBMS • Scripting or C/C++ program • Small volume • Statistical tools (SPSS, SAS, JMP, Splus) • Specialized tools such as spreadsheet, OLAP tools • Desktop RDBMS such as MS Access
Propagation layer • Most common option is asynchronous propagation (versus synchronous: changes occurring in transaction system synchronously transported to DW) • Transfer and load • Time for loading depends on a factors such as number of indexes
Database layer • DW is not just the database layer! • Integrated & transformed BEFORE • Analysis & presentation AFTER • Parallel processing capability often required • Symmetric Multi-processor(SMP) • Massively parallel processing (MPP) • Enhancement http://houseandhome.msn.com/pickaplace/nf_Overview.aspx
Middleware layer • Software that make access to database layer and application layer possible (= the pipes) • LAN, WAN • Client server model : logical and physical separation of data from the application • Thin client model : Internet/Intranet
Application layer • Decision support applications • Presentational systems (pre-defined reports) • Interrogative systems (SQL) • Simulation systems (What-if analysis) • Functional systems (e.g. segmentation analysis) • Automatic systems (e.g. credit approval expert system) • Constrained by organizational design • System geographical oriented? Product oriented?
Presentation layer • Interface • Command-line (e.g. unix) • Menu-driven (little interaction) • Query language (e.g. SQL) • Graphical (windows, pull-down menu, buttons, interactive) • Groupware (Lotus Notes, Window based products)
Presentation layer • Multmedia (sound ,video, animation) • Hypertext (html, xml)
Presentation layer • Security • Methods to enforce security • Views = seeing selected rows or columns • Access control = DBA Grant/Revoke user’s privilege’s access to data files • Admin control • Accesses based on roles (group of users) • Encryption • Firewall • Audit = track trail
Putting a DW in place • Plan and launch • Collect business requirements • Implementation • Maintenance, support, and evolution
Selecting technology Architecture Evolution Data staging Data Modeling Plan Collect Business Requirements Analytical Applications
Plan and launch • Sponsorship • ROI and risk • Expenses: hardware & network upgrade (25-60 %), software (10-25%), services (10-35%), internal staff (5-15%) • 3-6 m ~ 1-3 m • Figures from Humphrey, Hawkins, and Dy (1999) • Cooperation across departments (enterprise DW)
Plan>Implement • Business requirement collection • A DW aims at solving business, not technical problems. • Architecture & design • Infrastructure = technologies, platforms & components necessary to support the chosen architecture • Data • Analytical applications
Plan>Implement • Gather business requirements • Interview key end users • Executives • Managers • Analysts • Front-line workers
Implementation • Design • What to include • How are data formatted and stored • Physical and logical models • Meta-data management • Other implementation issues (e.g., data model) discussed later in course
Other Broad Issues • How do I measure results? (If you’re not measuring it, you’re not managing it) • Build or buy? • Data mart vs Enterprise DW • Ownership • What can be outsourced?
Types of DW • Plain-vanilla/ generic DW • Data mart = specialized DW built for a community of users (e.g., finance) • Enterprise DW = DW built for entire company • Operational data store (ODS) = data store to see instant snapshot, e.g, customer call center • (reference: Westerman, p.10)
Types of DW • Federated DW • Hybrid of Top down/Bottom up approach • Top down= built Enterprise DW • Bottom Up= Each unit/company built its own DW or data mart • FDW: set of common rules (captured as meta data) for each quasi independent DW http://www.dmreview.com/master.cfm?NavID=198&EdID=1953
Some Trends • An encompassing enterprise DW is a myth • Federated DW, data mart • Quick ROI (small victories) often useful to sustain DW • Big, fat, expensive apps that required long development time are replaced by smaller, more flexible, less expensive turnkey apps • Mid-size & smaller companies are beginning to build DW (Declining h/w cost + Microsoft factor)
The Modern Data Warehouse • A data warehouse is a copy of transaction data specifically structured for querying, analysis and reporting • Note that the data warehouse contains a copy of the transactions. These are not updated or changed later by the transaction system. • Also note that this data is specially structured, and may have been transformed when it was placed in the warehouse
Data Warehouse Roles and Structures The DW has the following primary functions: • It is a direct reflection of the business rules of the enterprise. • It is the collection point for strategic information. • It is the historical store of strategic information. • It is the source of information later delivered to data marts. • It is the source of stable data regardless of how the business processes may change.
Data Marts • A data mart is a smaller, more focused data warehouse. It reflects the business rules of a specific business unit. • The data mart does not need to cleanse its data because that was done when it went into the warehouse. • It is a set of tables for direct access by users. • These tables are designed for aggregation. • It typically is not a source for traditional statistical analysis.
Decision Support Information Data Mart Data Mart Data Mart Data Delivery Decision Support Information Decision Support Information Position of the Data Mart Within the Organization
What Can a Data Warehouse Do? Some of the benefits of a DW are: • Immediate information delivery • Data integration from across and even outside the organization • Future vision from historical trends • Tools for looking at data in new ways • Freedom from IS department resource limitations (you don’t need programmers to use a data warehouse)
What Does All This Mean? • On a daily basis, organizations turn to their data warehouses to answer a limitless variety of questions. • Nothing is free, however, and these benefits do come with a cost. • The value of a data warehouse is a result of the new and changed business processes it enables. • There are limitations, though. A DW cannot correct problems with the data, although it may help to clearly identify them.
The Cost of Warehousing Data • Expenditures can be categorized as one-time initial costs or as recurring, ongoing costs. • The initial costs can further be identified as for hardware or software. • Expenditures can also be categorized as capital costs (associated with acquisition of the warehouse) or as operational costs (associated with running and maintaining the warehouse)
Cost Are Highly Variable • A company that spends less money for their data warehouse is often happier with it. • The main justification for the development expense is that a DW reduces the cost of accessing the information owned by the organization. • Since information has to be retrieved just once (when it is placed in the warehouse), DW users see a lower cost on each report generated.
Source System A Source System B Source System C Source System D Typical Multidatabase Report and Screen Generation Data download and transformation contribute to retrieval costs for every report or screen generated
Source System A Organizational Data Warehouse Source System B Source System C Source System D Typical DW Report and Screen Generation Data upload and transformation costs occur just once. Retrieval costs are lower.
Summary • At a tactical level, DW can be viewed as a DSS • 6-layer architecture model of DW • Collecting business requirements is an important first step in building a DW • General DW proliferation : without the analytics, you’re at a competitive disadvantage