300 likes | 549 Views
Data Warehouses. Definition. Data Warehouse: An integrated and consistent store of subject-oriented data that is obtained from a variety of sources and formatted into a meaningful context to support decision-making in an organization. Need for Data Warehousing.
E N D
Definition • Data Warehouse: An integrated and consistent store of subject-oriented data that is obtained from a variety of sources and formatted into a meaningful context to support decision-making in an organization.
Need forData Warehousing • Integrated, company-wide view of high-quality information. • Separation of operational and informational systems and data • operational system: a system that is used to run a business in real time, based on current data • informational system: systems designed to support decision making based on stable point-in-time or historical data
Factors AllowingData Warehousing • Relational DBMS. • Advances in hardware: speed and storage capacity. • End-user computing interfaces and tools.
Data Warehouse Architectures • Two-level • source system files containing operational data • transformed and integrated data warehouse • Three-level • Operational data. • Enterprise data warehouse (EDW)- single source of data for decision making. • Data marts - limited scope; data selected from EDW; customized decision-support for individual user groups
Reasons for theThree-Level Architecture • EDW and data marts have different purposes and data architectures. • Data transformation is complex and is best performed in two steps. • Data marts customized decision support for different groups • Architecture • Operational data, reconciled data, Derived data.
Data Characteristics • Status vs. Event data. • A transaction is a business activity that triggers one or more business events: event data captures them • Transient vs. Periodic data. • Transient: data in which changes to existing records are written over previous records, thus destroying previous data content • periodic data: data that are never physically altered or deleted once added
Reconciled DataCharacteristics • Detailed • Historical • Normalized • Enterprise-wide • Quality controlled
The Data Reconciliation Process • Capture: capture the relevant data from source files to fill EDW • Static - initial load. • Incremental - ongoing update. • Scrub or data cleansing • missing data, name reconciliation • Pattern recognition and other artificial intelligence techniques.
The Data Reconciliation Process • Transform • Convert the data format from the source to the target system. • Record-Level Functions • Selection. • Joining. • Aggregation (for data marts). • Field-Level Functions • Single-field transformation • Multi-field transformation
The Data Reconciliation Process • Load and Index • Refresh Mode • When the warehouse is first created. • Static data capture. • Update Mode • Ongoing update of the warehouse. • Incremental data capture.
Derived DataCharacteristics • Type of data • Detailed, possibly periodic. • Aggregated. • Distributed to departmental servers. • Implemented in star schema.
Star Schema • Also called the dimensional model. • Fact and dimension tables. • Fact table: consists of factual or quantitative data about the business • Dimension table: hold descriptive data • Grain of a fact table - time period for each record.
Size of the fact table • Total number of stores: 1,000 • Total number of products: 10,000 • Total number of periods: 24 • Total rows: 1000 * 10,000 * 24 = 240,000,000 • On average 50% items record sales, • no of rows = 120,000,000
Types of Data Marts • Dependent - Populated from the EDW. • Independent - Data taken directly from the operational databases.
The User Interface • The role of metadata. • Traditional query and reporting tools. • On-line analytical processing (OLAP) • The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques.
The User Interface • Slicing a cube. • Pivot • Rotate the view for a particular data point to obtain another perspective. • E.g. take a value from the units column and obtain by-store values. • Drill-down