400 likes | 731 Views
Physical Database Design. Barry Floyd. BUS 498 Advanced Database Management Systems. Introduction. The Physical Database Design Process Goal is to translate our conceptual designs into physical reality Draw on requirements analysis and our conceptual data model. Agenda.
E N D
Physical Database Design Barry Floyd BUS 498 Advanced Database Management Systems
Introduction • The Physical Database Design Process • Goal is to translate our conceptual designs into physical reality • Draw on requirements analysis and our conceptual data model
Agenda • Data Volume and Usage Analysis • Data Distribution Strategy • discuss this later in the quarter • Indexes • Denormalization
Overview • Important step in the database design process (also the last step) • Decisions made here impact ... • data accessibility • response times • usability
Vocabulary • Data volume - how many records • Data usage - how often and in what manner are the records used
Data Volume Analysis • Use volume analysis to • select physical storage devices • estimate costs of storage
Data Volume Analysis GIVEN LOCATION 100 TREATMENT PATIENT PHYSICIAN 50 GIVEN ITEM 500 CHARGE GIVEN
Data Volume Analysis LOCATION 100 DERIVE (10) TREATMENT PATIENT 1000 PHYSICIAN 50 (20) * Keep patient record active for 30 days * Average length of stay for a patient is 3 days 100 X 30 / 3 => 1000 ITEM 500 CHARGE
Data Volume Analysis LOCATION 100 (10) TREATMENT 4000 PATIENT 1000 PHYSICIAN 50 (4) (20) * Each patient has 4 treatments on average. 1000 X 4 => 4000 DERIVE ITEM 500 CHARGE
Data Volume Analysis * Each patient has 10 charges on average. 1000 X 10 => 10,000 LOCATION 100 TREATMENT 4000 PATIENT 1000 PHYSICIAN 50 (4) (20) (10) ITEM 500 CHARGE 10,000 (20) DERIVE
Data Volume Analysis LOCATION 100 (10) TREATMENT 4000 PATIENT 1000 PHYSICIAN 50 (4) (20) KNOW ... Number of records and relationships (10) ITEM 500 CHARGE 10,000 (20)
Data Usage Analysis • Want to identify major transactions and processes which hit on the database • Analyze eachtransaction and process to determine access paths used and frequency of use • Create composite map from individual analyses
Transaction Analysis Form TRANSACTION NUMBER MVCH-4 TRANSACTION NAME: CREATE PATIENT BILL TRANSACTION VOLUME: AVERAGE 2/HR PEAK: 10/HR (1) PATIENT 1000 (2) (3) ITEM 500 CHARGE 10,000 NO. NAME ACCESS TRAN PERIOD TYPE REF REF (1) ENTRY-PATIENT READ 1 10
Transaction Analysis Form (1) PATIENT 1000 (2) (3) ITEM 500 CHARGE 10,000 NO. NAME ACCESS TRAN PERIOD TYPE REF REF (1) ENTRY-PATIENT READ 1 10 (2) PATIENT-CHARGE READ 10 100 (3) CHARGE-ITEM READ 10 100
Composite Usage Map • Determine how the data structures are accessed for each transaction and process • include programs • standard queries • programmed • ad hoc
Composite Usage Map (50) LOCATION 100 (25) (50) TREATMENT 4000 PATIENT 1000 PHYSICIAN 50 (50) NUMBER IS PER HOUR AT PEAK VOLUME ITEM 500 CHARGE 10,000
Composite Usage Map LOCATION 100 (75) (25) (50) (30) TREATMENT 4000 PATIENT 1000 PHYSICIAN 50 (20) (50) (100) (200) ITEM 500 CHARGE 10,000
Composite Usage Map (50) LOCATION 100 (25) (50) (75) (25) (50) (30) TREATMENT 4000 PATIENT 1000 PHYSICIAN 50 (20) (50) (50) (200) (100) ITEM 500 CHARGE 10,000
Summary • Given volume and usage knowledge we can consider different physical implementation strategies, including ... • INDEXES • DENORMALIZATION • CLUSTERING
Indexes • Purpose: To speed up access to a particular row or a group of rows in a table. • Also used to enforce uniqueness • Eliminates the necessity of re-sorting the table each time we need to create a sequenced list
Indexes Allen 3 Brian 6 Carole 7 John 2 Karen 5 Marvin 1 Sharon 8 Sue 4 1 Marvin … 2 John ... 3 Allen ... 4 Sue ... 5 Karen ... 6 Brian ... 7 Carole ... 8 Sharon ...
Example • SELECT NAME, DEPT, RATING FROM EMP WHERE RATING = 10; Indexing on RATING improves performance. Without an index, must do afull table scan.
Costs of an index? • Storage space • Maintenance • Indexed must be changed for each add/delete or change in value on indexed field. • One benchmark ... insert into table w/o indexes, 0.11 seconds, w/ 8 indexes, 0.94 seconds.
Access Indexes • Automatically created on primary key. • You must create other indexes as needed. • Note, creating a unique index on a foreign key turns the relationship into a 1 - 1 relationship rather than a 1 - m relationship. Let’s consider Oracle indexes and performance ...
Oracle Indexes SELECT COUNT(*) FROM EMP WHERE EMP_NO>0 SELECT EMP_NAME FROM EMP WHERE EMP_NO>0 INDEX + TABLE FULL TABLE SCAN INDEX ONLY % Seconds 8.5 0.66 12.03 35.70 15.5 1.04 16.21 35.70 25.2 1.54 25.45 35.70 50.7 2.80 33.89 35.70 100 5.72 87.23 35.70 % OF FILEREAD BREAK- EVEN 26,000 Rows, 7 Rows per Block
Oracle Indexes SELECT COUNT(*) FROM EMP WHERE EMP_NO>0 SELECT EMP_NAME FROM EMP WHERE EMP_NO>0 INDEX + TABLE FULL TABLE SCAN INDEX ONLY % OF FILEREAD % Seconds 8.5 0.66 2.31 4.52 15.5 1.05 4.01 4.52 25.2 1.59 6.37 4.52 50.7 2.91 12.69 4.52 100 6.01 25.37 4.52 BREAK- EVEN 26,000 Rows, 258 Rows per Block
Rules of thumb • Use indexes generously for applications which are decision support/retrieval based. • Use indexes judiciously for transaction processing applications.
Places to use indexes • PRIMARY KEY • FOREIGN KEYS • Non Key attributes that are referred to in qualification, sorting, and grouping (WHERE, ORDER BY, GROUP BY)
Denormalization • Goal is to reduce the number of physicals reads to the storage devices by reducing the number of joins.
Costs of Denormalization • Makes coding more complex • Often sacrifices flexibility • Will speed up retrieval but slow updates
Including children in the parent record • Multiple addresses in the personnel record • Absolute number of children for a parent is known (e.g., 2 addresses) • The number won’t change over time • The number is not very large
Clusters in Oracle • Clustering stores records from two tables into the same physical storage space • Only useful for EQUI-JOINS • Improves performance by 2-3 times
Storing most recent child data in the parent record • Multiple children, but children have an ordering (e.g., date of order) • For example, perhaps storing amount of last order. • Amount of last dividend paid to a particular account
Store running totals /Create extract tables • Store summary data from a child record • Year to date sales • Create a summary table which contains aggregate values over some period (say, one month)
Duplicating a key beyond an immediate child record CLASS CLASS_ID PARTS PART_ID,CLASS_ID ORDERS ORDER_ID, PART_ID, CLASS_ID ADD THIS KEY
Consider SQL statement for previous example SELECT PART_NO, ORDER_NO, CLASS, CLASS_DESC FROM CLASS C, PART P, ORDER O WHERE O.PART_NO = P.PART_NO AND P.CLASS = C.CLASS; SELECT PART_NO, ORDER_NO, CLASS, CLASS_DESC FROM CLASS C, ORDER O WHERE O.CLASS = C.CLASS;
Record Partitioning • Breaking up a record into two parts A,B,C,D,E,F,G E,F,G A,B,C,D
Summary • Logical design gives you information about the ‘how’ to build the system. • Good physical design takes into account the performance of the final design … to know how best to do this task, you must understand how the system is being used!