1 / 19

Chap8: Trends in DBMS

Chap8: Trends in DBMS. 8.1 Database support for Field Entities 8.2 Content-based retrieval 8.3 Introduction to spatial data warehouses 8.4 Summary. Learning Objectives. Learning Objectives (LO) LO1: Learn about field data LO2 : Learn about storage and retrieval of field data

faraji
Download Presentation

Chap8: Trends in DBMS

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chap8: Trends in DBMS 8.1 Database support for Field Entities 8.2 Content-based retrieval 8.3 Introduction to spatial data warehouses 8.4 Summary

  2. Learning Objectives • Learning Objectives (LO) • LO1: Learn about field data • LO2 : Learn about storage and retrieval of field data • LO3: Learn about spatial data warehouses • What are data warehouses? Why are they interesting? • What are aggregate functions? Which ones are easy to compute? • Mapping Sections to learning objectives • LO1 - 8.1 • LO2 - 8.1.2, 8.2 • LO3 - 8.3

  3. 8.3 Why are Data Warehouses Interesting? • Data Warehouse facilitate group decision making • Consider a dataset • 1 measure (i.e. Sales) • 3 dimensions (e.g. Company, Year, Region) • Analysis questions • Q1. Rank Regions by total sales. • Q2. Rank years by total sales. • Q3. Where are sales consistently growing? • Cross tabulates summaries reports used to analyze the trends • Example:

  4. 8.3 Generating cross-tabulation summaries • TraditionalApproach • Use custom software pulling data out of a DBMS • Limitations: redundant of work, inefficient use of resources • Data Warehouse approach • Cross-tab. Can be generated using a set of simple report • Each report is generated from a SQL “Select ... group by” statement • Example: Fig. 8.19 (pp. 244) and Table 8.3 (pp. 245) • Cross-tab example in last slide is a union of • SALES-L0-A, SALES-L1-A, SALES-L1-B and SALES-L2 • Table 8.3 shows SQL queries to compute each part • Advantage • Rest of SQL is available for pre/post processing of data • Performance gains by eliminating unnecessary copying of data

  5. Example Data Warehouse (Fig. 8.19) Fig 8.19

  6. 8.3.4 Cross-tabulation vs.report hierarchy • Spreadsheet view of a report • Views a report a N-dim. Spreadsheet • N = number of dimension attributes • Each cell contains value of “measure” • Cross-tabulation view of a Report hierarchy • Example: report hierarchy for • SALES-L0-A, SALES-L2-A, SALES-L1-B, SALES-L2, Fig. 8.19 (pp. 244)

  7. 8.3 What is a Data Warehouse? • Data Warehouse is a special purpose database • Primarily used for specialized data analysis purposes • Facilitates generation and navigation of a hierarchy of reports • Special purpose data-sets and queries • Data consists of • a few measure attributes • a set of dimension attributes • The measure attribute depends on dimension attributes • Queries generate reports • Report measure for selected values of dimensions • Aggregate measure for given subset of dimensions • What is a spatial data warehouse? • Data warehouses with spatial measures or dimensions • Example: census data - census tract is a spatial dimension • Example: logistics data - route is a spatial dimension

  8. 8.3.4 Data Warehouse Operations • Operations on a data warehouse • Roll-up, Drill-down • Slice, Dice • Pivot • Roll-up • Inputs: A report R, A subset S of dimensions in R • Output: A sequence of reports summarizing R • Example 1: R = SALES-Base, S = (Year, Region) in Fig. 8.19 (pp. 244) • Output consists of reports SALES-L0-A, SALES-L1-B, SALES-L2 • Example 2: R = SALES-Base, S = (Region, Year) • Output consists of reports SALES-L0-A, SALES-L1-A, SALES-L2 • Drill-down • Inputs: A report R, A dimension D not in R • Output: A reports detailing R on D • Example: R = SALES-L1-B, D = Region in Fig. 8.19 (pp. 244) • Output : report SALES-L0-A

  9. 8.3.4 Data Warehouse Operations • Slice, Dice • Reduce dimensions in a table- (Fig. 8.7, pp 232). • Inputs: A report R, A value V for a dimension D in R • Output: A subset of R where D =V • Example: R = SALES-L0-A, D = Year, V = 1994 in Fig. 8.19 (pp. 244) • Output: Table 8.5 (pp. 246) • includes tuple (ALL, 1994, America, 35) Fig 8.7

  10. 8.3.4 Data Warehouse Operations • Pivot • For a spreadsheet view of reports • Transposes a spreadsheet • Example • Inputs: A spreadsheet view of a report R • Output: A transposed spreadsheet • Ex.: R= SALES-L0-A, Fig. 8.19 (pp. 244)

  11. Logical Data Model of a DWH • Purpose of a logical data model • Specify a framework to specify computational structure • Allow extension of SQL to model new needs • Cube operation • Input : A fact table • Output: A set of summary reports covering all subsets of dimension columns • Equivalent to union of all tables and reports in Fig. 8.19 (pp. 244) • Ex. Fig. 8.18, pp. 243 • SELECT Company, Year, Region, Sum(Sales) AS Sales • FROM SALES • GROUP BY CUBE Company, Year, Region

  12. Physical Data Model of a DWH • Purpose: Computationally efficient implementation • Ideas: • Pre-computation - • pre-compute some of reports and use those to compute other reports • New indexing methods, e.g. bit-map index • Query Processing Strategies • Strategies for aggregate functions • New strategies for multi-table joins • Let us look at strategies for aggregate functions

  13. DWH Physical Model: Aggregate function strategies • Aggregate Functions • Compute summary statistics for a given set of values • Examples: sum, average, centroid (Table 8.1, pp. 238) • Strategies for efficient computation • Characterize easy to compute aggregate functions • 3 categories • Distributive • Algebraic • Holistic • First 2 categories can be computed easily in one scan of the dataset

  14. Definitions of Aggregate Function Categories • Notation: • F, G, G1, G2, … Gn are aggregate functions where n is small • S is a set of values, e.g. S = (1, 2, 3, 4) • P = (S1, S2, …, Sp) is a partition of S, e.g. P = (S1, S2), S1 = (1, 2), S2 = (3, 4) • Distributive( F ) if there exists a G such that • F( S ) = G ( F(S1), F(S2), …, F(Sn) ) • Example: sum is distributive • Illustration: sum(1, 2, 3, 4) = sum ( sum(1, 2), sum(3, 4) • Algebraic( F ) if there exists G1, …, Gn, (where n is small) and • F( S ) = G ( G1(S1), …, Gn(S1), G2(S1), …, Gn(S2), …, G1(Sp), …, Gn(Sp) ) • Example: average is distributive • Illustration: average(1, 2, 3, 4) • = { count(1, 2) * average(1, 2) + count(3, 4) * average } / { count(1,2) + count(3,4) }

  15. Example: Distributive Aggregate Function • Examples in cross-tabulation scenario (Fig. 8.14, pp.238): • Example 1. Min is distributive • Example 2. Count is distributive Fig 8.14

  16. Examples: Algebraic Aggregate Functions • Examples in cross-tabulation scenario (Fig. 8.15, pp.239): • Average and Variance are algebraic Fig 8.15

  17. Discussion - Spatial Data Warehouse • Example • Consider the example in Fig. 8.16, pp. 241 • A map interpretation may be attached to each report • Each row has a spatial footprint, which can be aggregated by geometric-union • The collection of maps may be called a mapcube • Issues: • What is needed in OGIS standard to support map-cube operation? • Hierarchical collection of maps in mapcube • What is an appropriate cartography to convey the relationship among maps?

  18. Spatial Data Warehouses and Mapcube Fig 8.16

  19. Summary • Field data • useful in many applications due to rich content • Represented as raster or image • Operations can be categorized into local, focal, zonal, and global • Field data storage and retrieval • Tiling is a preferred way to divide raster data into disk blocks • Meta-data based query is often used for retrieval • Content based retrieval may be used for similarity searches • Data warehouses support analysis e.g. cross-tabulation reports • SQL CUBE operator support generation of DWH reports • Distributive and Algebraic aggregate functions can be computed easily

More Related