350 likes | 464 Views
ITEC 3220A Using and Designing Database Systems. Instructor: Gordon Turpin Course Website: www.cse.yorku.ca/~gordon/itec3220S07 Office: CSEB3020. Chapter 13. The Data Warehouse. Transaction Processing Versus Decision Support.
E N D
ITEC 3220AUsing and Designing Database Systems Instructor: Gordon Turpin Course Website: www.cse.yorku.ca/~gordon/itec3220S07 Office: CSEB3020
Chapter 13 The Data Warehouse
Transaction Processing Versus Decision Support • Transaction processing allows organizations to conduct daily business in an efficient manner • Operational database • Decision support helps management provide medium-term and long-term direction for an organization
Operational vs. Decision Support Data • Operational data • Relational, normalized database • Optimized to support transactions • Real time updates • DSS • Snapshot of operational data • Summarized • Large amounts of data • Data analyst viewpoint • Timespan • Granularity • Dimensionality
The DSS Database Requirements • Database schema • Support complex (non-normalized) data • Extract multidimensional time slices • Data extraction and filtering • End-user analytical interface • Database size • Very large databases (VLDBs) • Contains redundant and duplicated data
Data Warehouse • Integrated • Centralized • Holds data retrieved from entire organization • Subject-Oriented • Optimized to give answers to diverse questions • Used by all functional areas • Time Variant • Flow of data through time • Projected data • Non-Volatile • Data never removed • Always growing
Data Marts • Single-subject data warehouse subset • Decision support to small group • Can be tested for exploring potential benefits of Data warehouses • Address local or departmental problems
Star Schema • Data-modeling technique • Maps multidimensional decision support into relational database • Yield model for multidimensional data analysis while preserving relational structure of operational DB • Four Components: • Facts • Dimensions • Attributes • Attribute hierarchies
Star Schema Representation • Facts and dimensions represented by physical tables in data warehouse DB • Fact table related to each dimension table (M:1) • Fact and dimension tables related by foreign keys • Subject to the primary/foreign key constraints
Example Canadian financial organization is interested in building a data warehouse to analyze customers’ credit payments over time, location where the payments were made, customers, and types of credit cards. A customer may use the credit card to make a payment in different locations across the country and abroad. If a payment is made abroad it can be based on domestic currency and then converted into Canadian dollars based on currency rate. • Time is described by Time_ID, day, month, quarter and year. • Location is presented by Location_ID, name of the organization billing the customer, city and country where the organization is located, domestic currency. • A credit card is described by credit card number, type of the credit account, and customer’s credit rate. The customer’s rate depends on the type of the credit account. • A customer is described by ID, name, address, and phone.
Performance-Improving Techniques for Star Schema • Normalization of dimensional tables • Multiple fact tables representing different aggregation levels • Denormalization of the fact tables • Table partitioning and replication
Normalization Example • Normalize the star schema that you developed for Canadian financial organization on page 16 into 3NF.
More Example A supermarket chain is interested in building a data warehouse to analyze the sales of different products in different supermarkets at different times using different payment method. • Each supermarket is presented by location_ID, city, country, and domestic currency. • Time can be measured in time_ID, day, month, quarter, and year. • Each product is described by product_ID, product_name, and vendor. • Payment method is described by payment_ID, payment_ type. Design a star schema for this problem and then normalize the star schema that you developed into 3NF.
Chapter 12 Distributed Database Management Systems
The Evolution of Distributed Database Management Systems • Distributed database management system (DDBMS) • Governs storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites
Single-Site Processing, Single-Site Data (SPSD) • All processing is done on single CPU or host computer (mainframe, midrange, or PC) • All data are stored on host computer’s local disk • Processing cannot be done on end user’s side of the system • Typical of most mainframe and midrange computer DBMSs • DBMS is located on the host computer, which is accessed by dumb terminals connected to it • Also typical of the first generation of single-user microcomputer databases
Multiple-Site Processing, Single-Site Data (MPSD) • Multiple processes run on different computers sharing a single data repository • MPSD scenario requires a network file server running conventional applications that are accessed through a LAN • Many multi-user accounting applications, running under a personal computer network, fit such a description
Multiple-Site Processing, Multiple-Site Data (MPMD) • Fully distributed database management system with support for multiple data processors and transaction processors at multiple sites • Classified as either homogeneous or heterogeneous • Homogeneous DDBMSs • Integrate only one type of centralized DBMS over a network
Multiple-Site Processing, Multiple-Site Data (MPMD) (Cont’d) • Heterogeneous DDBMSs • Integrate different types of centralized DBMSs over a network • Fully heterogeneous DDBMS • Support different DBMSs that may even support different data models (relational, hierarchical, or network) running under different computer systems, such as mainframes and microcomputers
Distributed Database Design • Data fragmentation: • How to partition the database into fragments • Data replication: • Which fragments to replicate • Data allocation: • Where to locate those fragments and replicas
Data Fragmentation • Breaks single object into two or more segments or fragments • Each fragment can be stored at any site over a computer network • Information about data fragmentation is stored in the distributed data catalog (DDC), from which it is accessed by the TP to process user requests
Data Fragmentation Strategies • Horizontal fragmentation: • Division of a relation into subsets (fragments) of tuples (rows) • Vertical fragmentation: • Division of a relation into attribute (column) subsets • Mixed fragmentation: • Combination of horizontal and vertical strategies
Data Replication • Storage of data copies at multiple sites served by a computer network • Fragment copies can be stored at several sites to serve specific information requirements • Can enhance data availability and response time • Can help to reduce communication and total query costs
Replication Scenarios • Fullyreplicated database: • Stores multiple copies of each database fragment at multiple sites • Can be impractical due to amount of overhead • Partially replicated database: • Stores multiple copies of some database fragments at multiple sites • Most DDBMSs are able to handle the partially replicated database well • Unreplicated database: • Stores each database fragment at a single site • No duplicate database fragments
Data Allocation • Deciding where to locate data • Allocation strategies: • Centralized data allocation • Entire database is stored at one site • Partitioned data allocation • Database is divided into several disjointed parts (fragments) and stored at several sites • Replicated data allocation • Copies of one or more database fragments are stored at several sites • Data distribution over a computer network is achieved through data partition, data replication, or a combination of both