250 likes | 330 Views
Data Warehouse. Definition. Data Warehouse: A Subject-oriented (eg customers, patients) integrated (consistent names, formats..) time-variant (time dimension so may be used for historical records) non-volatile (refreshed from live system, cannot be updated by end-users)
E N D
Definition • Data Warehouse: A • Subject-oriented (eg customers, patients) • integrated (consistent names, formats..) • time-variant (time dimension so may be used for historical records) • non-volatile (refreshed from live system, cannot be updated by end-users) • collection of data used in support of management decision making processes.
Need forData Warehousing • Separation of • operational (used by business in real time) and • informational systems (support decision making) and • data. • Data warehouse created for informational system.
Examples of heterogeneous data in operational system. Inconsistent key Synonyms Free-form fields Inconsistent data Missing data
Factors AllowingData Warehousing • Relational DBMS. • Advances in hardware: speed and storage capacity. • End-user computing interfaces and tools.
Generic (two-level) data warehouse architecture • Two-level • 1. Operational data. • 2. Enterprise data warehouse (EDW)- single source of data for decision making
Three-layer architecture • 1. Operational data. • 2. Enterprise data warehouse (EDW)- single source of data for decision making. • (reconciled data) • 3. Data marts - limited scope; data selected from EDW. • (derived data)
Reasons for theThree-Level Architecture • EDW and data marts have different purposes and data architectures. • Data transformation is complex and is best performed in two steps. • Data marts customized decision support for different groups.
Data Characteristics Example of DBMS log entry Status (before and after images) vs.Event data (database action resulting from the transaction).
Data Characteristics Transient operational data • Transient (changes written to the record eg. Phone number) vs.Periodic data (never changed. eg.accounting records). • Fig. 14-6,7.
Reconciled DataCharacteristics • Detailed • Historical • Normalized • Enterprise-wide • Quality controlled
The Data Reconciliation Process • Capture • Static - initial load. • Incremental - ongoing update. • Scrub or data cleansing • Pattern recognition and other artificial intelligence techniques.
The Data Reconciliation Process • Transform • Convert the data format from the source to the target system. • Record-Level Functions • Selection. • Joining. • Aggregation (for data marts). • Field-Level Functions • Single-field transformation, Fig. 14-9. • Multi-field transformation, Fig. 14-10.
The Data Reconciliation Process • Load and Index • Refresh Mode • When the warehouse is first created. • Static data capture. • Update Mode • Ongoing update of the warehouse. • Incremental data capture.
Derived DataCharacteristics (Data marts) • Type of data • Detailed, possibly periodic. • Aggregated. • Distributed to departmental servers. • Implemented in star schema.
Star Schema • Also called the dimensional model. • Fact and dimension tables. • Fig. 14-11,12, 13. (following) • Grain of a fact table - time period for each record. • Multiple Fact Table - Fig. 14-14. • Snowflake Schema - Fig. 14-15.
Types of Data Marts • Dependent - Populated from the EDW. • Independent - Data taken directly from the operational databases.
The User Interface • The role of metadata. • Traditional query and reporting tools. • On-line analytical processing. (OLAP) • The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques.
The User Interface • Slicing a cube. • Pivot • Rotate the view for a particular data point to obtain another perspective. • E.g. take a value from the units column and obtain by-store values. Slicing a data cube • Drill-down - Fig. 14-17.
The User Interface • Data Mining • Knowledge discovery. • Search for patterns in the data. • Table 14-3, 4. • Data Visualization