1 / 25

Data Warehouse

Data Warehouse. Definition. Data Warehouse: A Subject-oriented (eg customers, patients) integrated (consistent names, formats..) time-variant (time dimension so may be used for historical records) non-volatile (refreshed from live system, cannot be updated by end-users)

Download Presentation

Data Warehouse

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehouse

  2. Definition • Data Warehouse: A • Subject-oriented (eg customers, patients) • integrated (consistent names, formats..) • time-variant (time dimension so may be used for historical records) • non-volatile (refreshed from live system, cannot be updated by end-users) • collection of data used in support of management decision making processes.

  3. Need forData Warehousing • Separation of • operational (used by business in real time) and • informational systems (support decision making) and • data. • Data warehouse created for informational system.

  4. Examples of heterogeneous data in operational system. Inconsistent key Synonyms Free-form fields Inconsistent data Missing data

  5. Factors AllowingData Warehousing • Relational DBMS. • Advances in hardware: speed and storage capacity. • End-user computing interfaces and tools.

  6. Generic (two-level) data warehouse architecture • Two-level • 1. Operational data. • 2. Enterprise data warehouse (EDW)- single source of data for decision making

  7. Three-layer architecture • 1. Operational data. • 2. Enterprise data warehouse (EDW)- single source of data for decision making. • (reconciled data) • 3. Data marts - limited scope; data selected from EDW. • (derived data)

  8. Reasons for theThree-Level Architecture • EDW and data marts have different purposes and data architectures. • Data transformation is complex and is best performed in two steps. • Data marts customized decision support for different groups.

  9. Data Characteristics Example of DBMS log entry Status (before and after images) vs.Event data (database action resulting from the transaction).

  10. Data Characteristics Transient operational data • Transient (changes written to the record eg. Phone number) vs.Periodic data (never changed. eg.accounting records). • Fig. 14-6,7.

  11. Reconciled DataCharacteristics • Detailed • Historical • Normalized • Enterprise-wide • Quality controlled

  12. The Data Reconciliation Process • Capture • Static - initial load. • Incremental - ongoing update. • Scrub or data cleansing • Pattern recognition and other artificial intelligence techniques.

  13. The Data Reconciliation Process • Transform • Convert the data format from the source to the target system. • Record-Level Functions • Selection. • Joining. • Aggregation (for data marts). • Field-Level Functions • Single-field transformation, Fig. 14-9. • Multi-field transformation, Fig. 14-10.

  14. The Data Reconciliation Process • Load and Index • Refresh Mode • When the warehouse is first created. • Static data capture. • Update Mode • Ongoing update of the warehouse. • Incremental data capture.

  15. Derived DataCharacteristics (Data marts) • Type of data • Detailed, possibly periodic. • Aggregated. • Distributed to departmental servers. • Implemented in star schema.

  16. Star Schema • Also called the dimensional model. • Fact and dimension tables. • Fig. 14-11,12, 13. (following) • Grain of a fact table - time period for each record. • Multiple Fact Table - Fig. 14-14. • Snowflake Schema - Fig. 14-15.

  17. Components of a star schema

  18. Star schema example

  19. Star schema with sample data

  20. Star schema with two fact tables

  21. Example of snowflake sample

  22. Types of Data Marts • Dependent - Populated from the EDW. • Independent - Data taken directly from the operational databases.

  23. The User Interface • The role of metadata. • Traditional query and reporting tools. • On-line analytical processing. (OLAP) • The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques.

  24. The User Interface • Slicing a cube. • Pivot • Rotate the view for a particular data point to obtain another perspective. • E.g. take a value from the units column and obtain by-store values. Slicing a data cube • Drill-down - Fig. 14-17.

  25. The User Interface • Data Mining • Knowledge discovery. • Search for patterns in the data. • Table 14-3, 4. • Data Visualization

More Related