210 likes | 418 Views
Storage Considerations for the Physical Model. Database Sizing. Sizing influences capacity planning and systems environment management. Sizing is required for: The database Other storage areas Indexes Sizing is not a science. Techniques vary. DWM suggests when to perform sizing.
E N D
Database Sizing • Sizing influences capacity planningand systems environment management. • Sizing is required for: • The database • Other storage areas • Indexes • Sizing is not a science. • Techniques vary. • DWM suggests when to perform sizing.
Estimating the Database Size 1. Estimate the size of each row in the fact table. 2. Determine the grain of each dimension and estimate the number of entries in the finest level. 3. Multiply the number of rows of all dimensions and multiply the result by the fact table row size. 4. Determine whether the fact table is sparse or dense and estimate the reduction or increase in size.
Validating Database Size Assumptions • After you estimate the size of the database, you can • validate your assumptions by doing the following: • Extract sample files • Load data into the database • Compute exact expected row lengths • Add overhead for indexing, rollback and temporary tablespaces, aggregates, views, and a file system staging area for flat files
Example: Estimating the Database Size Description Estimation Estimate the size 52 bytes (assumed for this example) of one row of the fact table Estimate the Channel 3 channels entries in the Customer 63 ship_to_locations lowest level Product 36 items within each History 48 months dimension Multiply t he # of (3 x 63 x 36 x 48) x 52 = 16,982,784 bytes entries for each dimension and multiply the result by the fact table row size Sparsity is low, 16,982,784 * .10 = 1,698,278 adjust by 10% 16,982,784 - 1,698,278 = 15,284,506 bytes Estimated 15.3 MB database size
Applying the Test Load Sampling • Analyze statistically significant data samples • Use test loads for different periods • Reflect day-to-day operations • Include seasonal data and worst-case scenarios: • Calculate the number of transactions • Use the average sales price approach • Consider indexes and summaries
Test Load Sampling Using the Server • Load a sample of data. • Query to determine the number of rows per block. • Estimate based on the number of rows. Query 3000+rows
ORDER_LINE Order_PK Order_Line_Num Item_FK Units PRODUCT Item_PK Package Item_Desc Item_Source Family_PK Class_PK Indexes Unique index Primarykey Nonunique index Foreignkey Unique index Primarykey Database Table data blocks Index data blocks
Indexing Types • B*tree index • Bitmap index
Indexing Types • B*tree index: small number of distinct values in a particular column • Bitmap index: used for star query transformations
Indexing Strategies ? ? ? Columns, tables, and index types size = ‘SMALL’ 0 0 1 0 0 1 0 1 0 1 0 0 0 1 0 1 size = ‘MED’ 1 0 0 0 1 0 1 0 0 0 0 1 0 1 0 0 size = ‘LARGE’ 0 1 0 1 0 0 0 0 1 0 1 0 1 0 1 0 color = ‘BLUE’ 0 1 0 1 0 0 1 0 1 0 1 0 0 0 1 0 color = ‘RED’ 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 1 color = ‘GREEN’ 1 0 0 0 0 1 0 1 0 1 0 1 0 1 0 0 Result: color = blue and size = medium or large
B*tree Index KING KING KING MILLER TURNER BLAKE BLAKE JAMES TURNER WARD MILLER SCOTT SMITH KING MARGIN JAMES JONES BLAKE CLARK FORD ADAMS ALLEN BLAKE-ROWID CLARK-ROWIDFORD-ROWID
Bitmap Indexes • Store values as 1s and 0s • Are used instead of B*tree indexes when: • Tables are large • Columns have low cardinality • Multiple columns are constrained in the same query
101 single east male bracket_1 102 married central female bracket_4 103 married west female bracket_2 104 divorced west male bracket_4 105 single central female bracket_2 106 married central female bracket_3 REGION='east' REGION = 'central' REGION = 'west' 1 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 1 0 Bitmap Index Example CUSTOMERtable CUSTOMER_NBR MARITAL_STATUS REGION GENDER INCOME_LEVEL CREATE BITMAP INDEX REGION_IDX ON CUSTOMER(REGION); Sample bitmap index on the REGION column
0 0 0 0 1 0 1 1 col2 AND 0 OR 1 = 1 1 col3 0 1 0 1 1 0 0 1 1 0 1 1 col 5 Bitmap Index Example SELECT COUNT(*) FROM CUSTOMER WHERE MARITAL_STATUS = ‘married’ AND REGION IN (‘central’, ‘west’); REGION= 'west' Query Result REGION='central' MARITAL_STATUS ='married' 0 0 1 1 1 AND = 1 0 0 0 0 1 1
Partitioning Tables and Indexes • Large tables and indexes can be partitioned into • smaller, more manageable pieces. Tablespace Segment Extent Blocks Structure
Advantages of Partitioning • Consider using partitioning for: • Very large databases (VLDBs) • Reduction of down time for scheduled maintenance and reloading • Reduction of down time for data failure • Decision support systems (DSS) performance • I/O performance • Disk striping • Partition transparency
Partitioning • Partition table and index data by: • Time • Sales • Geography • Organization • Line of business • Partition by time