1 / 24

ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)

ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis). Physical Data Warehouse Design. Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential Chair of Business. It’s all about trading storage for speed!. Fundamentals Aggregates (Ch. 16, pp. 356 - 357)

Download Presentation

ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ACCTG 6910Building Enterprise & Business Intelligence Systems(e.bis) Physical Data Warehouse Design Olivia R. Liu Sheng, Ph.D.Emma Eccles Jones Presidential Chair of Business

  2. It’s all about trading storage for speed! • Fundamentals • Aggregates (Ch. 16, pp. 356 - 357) • Indexes (Ch. 16, p. 357)

  3. Fundamentals: the Storage Hierarchy CPU 500-1000 MIPS Small Fast Cache 10-8 second 512 KB Access Speed Storage Capacity Memory 10-7 second 512 MB 512 GB Disk 10-2 second Large Slow

  4. Fundamentals: the Storage Hierarchy Cache CPU Bus Disk Drive (I/O Channel) Memory Disk How long does it take to query sales by city? How large is the Sales Fact table? How long does it take to access the Sales Fact table?

  5. Fundamentals How large is the fact table? e.g., 1 million records/day, 0.2KB/record  0.2 GB/day

  6. Fundamentals How long does it take to access all the fact records? E.g., the small fact table is 1 Terabyte in size! • 0.01s*1012=325 years LONG!!!!!!!!!!!!!

  7. Fundamentals: the Storage Hierarchy Cache CPU Bus Disk Drive (I/O Channel) Memory Disk • The logical unit of data transferred between disk and memory is block (e.g., 4k bytes)

  8. Fundamentals How long does it take to access all the fact records? E.g., the small fact table is 1 Terabyte in size! • Number of blocks: 2.5 millions • Access time = 0.01s*2500000= < 7 hrs!!!

  9. Aggregate • In data warehouse design, we choose the gain of fact table to be the possible lowest level. Grain: orderline

  10. Aggregate • The reasons to choose the lowest level of fact: • (X) Analysts want to query on single record • (O) Analysts want to flexibly cut and group records.

  11. Aggregate • However, keeping the most detailed fact records could result in • huge-size fact table: TeraBytes?! (1 million records/day, 256 Bytes/record -> 0.2 GB/day) • slow query

  12. Aggregate • To keep s data warehouse flexible, fact tables need to store facts in their lowest levels of detail. • To improve query performance, another type of fact table which stores pre-computed summaries of detailed facts helps. • Reduced to a logical design solution

  13. Aggregate • An aggregate fact table is a fact table that summarizes base-level fact table records along one or several dimensions. • An aggregate dimension table is a dimension table that summarizes base-level dimension table records. • E.g., marketing managers check daily product sales by city --- aggregate by city in customer dimension

  14. Aggregate Aggregate fact table Aggregate dimension table

  15. Aggregate

  16. Indexes How long does it take to find out the total purchase Amt by Tom Jones?

  17. Indexes • Customer table • 1M records, each record 0.200 Kbytes long • Block is 4K size, block access time is 0.01s • Number of records/block: 4/0.2=20 • Number of blocks: 1M/20=50K • Sequential search • Time: 25K*0.01s=250s=4min.

  18. Indexes • Binary search • Time: log(50K)*0.01s=16*0.01s=0.16s • B+ tree index • Create index pn on customer(cname); • If each node (block) in B+ tree has 117 keys, then • # of access to indexes: log117(1M)=3 (i.e.height of the tree) • # of access to Customer Dimension: 1 • Total time = 4*0.01 = 0.04s

  19. B+-trees - P=12 Indexes to indexes (11 key values, 12 pointers) ... ... ………. Indexes to customer records

  20. Indexes How long does it take to find out the total sales of Desktop computers?

  21. Performance Improvement • Suppose there are only 4 product categories for 1M products • Create a B+ tree index??? • Suppose the size of product category and block ID is 10 bytes • Size of index = 1M * 10 = 10 M bytes

  22. Performance Improvement • A bitmap index for an attribute A is a collection of bit vectors, one for each possible value of A. The vector for value v has 1 in position i if the ith record has v for attribute A.

  23. Bitmaps Product record 1 record 2 record 3 A bitmap index for an attribute A is a collection of bit vectors, one for each possible value of A. The vector for value v has 1 in position i if the ith record has v for attribute A.

  24. Performance Improvement • Bitmap index is suitable for low cardinality attribute. • Cardinality(A) = # of possible values for A/#of records • Compared with B+ tree index, bitmap index has the following advantages for low cardinality attributes • Storage space saving (1M*4/8=500K bytes) • Efficient for boolean operations • CREATE BITMAP INDEX bitpc ON PRODUCT (PCNAME);

More Related