290 likes | 303 Views
This article provides an overview of the functional components of a data warehouse, including ETL tools, enriched databases, data marts, metadata, and information delivery systems. It also discusses the architectural layers of a data warehouse and the administration and management tools required for its operation.
E N D
Data Warehouse (II): Its Components Developed by: Dr Eddie Ip Modified By : Dr Arif Ansari
Outline • Review • Functional view of DW: 7 components (tools & content) • Architectural view of DW: 6 layers
7 Functional Components • Extract, Transform, Load (ETL) tools • Enriched DW databases & DBMS tools • Meta data • Data marts • Access & Mining tools • DW administration & management tools • Information delivery system
ETL: Extract/Source data • data from OLTP’s • Access, extract • Transform • Cleaning , e.g. validate • Integrate with external files for enrichment, then load into DW • e.g. business database
ETL tools • Commercial tools: • Informatica • Information Builders (Copy Manager) • MS Data Transformation Services • Typical functions • Define source, query (run SQL), define transformation, define target, verify transformation, schedule run, audit report
Enrichment : example • Dun & Bradstreet : 45m world business • American Business Information
Enrichment :example • Match client’s file with proprietary DB • e.g. name, zip code • Append to matched record • sales volume • yellow page ad size • credit rating (in-house) • fax phone number
Enrichment :example • A well-known life-style segmentation product: PRISM • Contains 62 clusters • E.g., the affluent executives of "Blue Blood Estates" • remote rural families of "Blue Highways“ • Zip code analysis http://cluster2.claritas.com/YAWYL/ziplookup.wjsp http://houseandhome.msn.com/pickaplace/nf_Overview.aspx
Example: enrichment • ZIP code info may be useful to insurance agents • Credit rating info may be useful to bankers • Household info may be useful to direct marketing business
Data mart • Data mart = subset of DW for community users, e.g. accounting department • Sometimes exist as Multidimensional Database • Info mart = summarized data + report for community users
Metadata • Data about data • Field description, business rules (e.g. profit=? formula), log of file updates • Help users understand content & locate data
DW Database • Almost always a relational DB • Oracle, DB2, Sybase, SQL Server • New DB design for special purpose of DW (e.g., scale up, speed up, parallel processing)
Data access & mining • Tools • Query & reporting • OLAP • Data mining, visualization, segmentation, clustering • New developments: text mining, web mining & personalization • Mining multimedia data
DW Administration • Security & priority • Keep track of updates • QC • Purging & copy to data mart
DW Administration • Security issue critical (users at many levels) • Some security measures to protect a DW • Views = limit users to see certain rows/columns • Access control = grant rights to specific users to access selected data (can be created by DBA thro’ SQL commands such as Grant/Revoke) • Admin controls such as group access, firewall, encryption • Audit = track what users are doing
Information Delivery Tools • Commercial tools • Crystal Report, Impromptu, WebFocus • Increasingly common mode of delivery: Web-enabled
Architectural Model • 6 layers: Data flow • Transaction layer • Propagation layer • Database layer • Middleware layer • Application layer • Presentation layer
Transaction layer • Programs are required to • Condition/scrub data = same naming convention, format e.g. MCI = MCI Worldcom=Worldcom
OD1 Six-layer DW Architecture AP1 OD2 AP2 OD3 AP3 OD4 AP4 OD5 Transaction Propagation Database Middleware Application Presentation
Transaction layer • Validate = is data up-to-date and accurate? ID invalid, missing, duplicate values • even name may have different format • First/Last, Last/Middle/First, …. • Define default for missing value • Map source to target = which field in operation data to which field in DW
Propagation layer • Most common option is asynchronous propagation • Transfer : transaction system to DW platform • Load : actual loading • Time for loading depends on a factors such as number of indexes
Database layer • DW is not just the database layer! • Integrated & transformed BEFORE • Analysis & presentation AFTER • Parallel processing capability often required • Symmetric Multi-processor(SMP) • Massively parallel processing (MPP)
Middleware layer • Software that make access to database layer and application layer possible • LAN, WAN • Client server model : logical and physical separation of data from the application
Application layer • Decision support applications • Presentational systems (pre-defined reports) • Interrogative systems (SQL) • Simulation systems (What-if analysis) • Functional systems (e.g. segmentation analysis) • Automatic systems (e.g. credit approval expert system) • Constrained by organizational design • System geographical oriented? Product oriented?
Presentation layer • Interface • Command-line (e.g. unix) • Menu-driven (little interaction) • Query language (e.g. SQL) • Graphical (windows, pull-down menu, buttons) • Groupware
Presentation layer • Multmedia (sound ,video, animation) • Hypertext (html)
OD1 Six-layer DW Architecture AP1 OD2 AP2 OD3 AP3 OD4 AP4 OD5 Transaction Propagation Database Middleware Application Presentation
Summary • ETL (extract, transform, load) tools • Data Mart • Metadata • Data access & Mining • DW administration • Information Delivery Tools • Architectural Model (6 Layers)