1 / 38

Physical Database Design

Physical Database Design. Barry Floyd. BUS 498 Advanced Database Management Systems. Introduction. The Physical Database Design Process Goal is to translate our conceptual designs into physical reality Draw on requirements analysis and our conceptual data model. Agenda.

gitano
Download Presentation

Physical Database Design

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Physical Database Design Barry Floyd BUS 498 Advanced Database Management Systems

  2. Introduction • The Physical Database Design Process • Goal is to translate our conceptual designs into physical reality • Draw on requirements analysis and our conceptual data model

  3. Agenda • Data Volume and Usage Analysis • Data Distribution Strategy • discuss this later in the quarter • Indexes • Denormalization

  4. Overview • Important step in the database design process (also the last step) • Decisions made here impact ... • data accessibility • response times • usability

  5. Vocabulary • Data volume - how many records • Data usage - how often and in what manner are the records used

  6. Data Volume Analysis • Use volume analysis to • select physical storage devices • estimate costs of storage

  7. Data Volume Analysis GIVEN LOCATION 100 TREATMENT PATIENT PHYSICIAN 50 GIVEN ITEM 500 CHARGE GIVEN

  8. Data Volume Analysis LOCATION 100 DERIVE (10) TREATMENT PATIENT 1000 PHYSICIAN 50 (20) * Keep patient record active for 30 days * Average length of stay for a patient is 3 days 100 X 30 / 3 => 1000 ITEM 500 CHARGE

  9. Data Volume Analysis LOCATION 100 (10) TREATMENT 4000 PATIENT 1000 PHYSICIAN 50 (4) (20) * Each patient has 4 treatments on average. 1000 X 4 => 4000 DERIVE ITEM 500 CHARGE

  10. Data Volume Analysis * Each patient has 10 charges on average. 1000 X 10 => 10,000 LOCATION 100 TREATMENT 4000 PATIENT 1000 PHYSICIAN 50 (4) (20) (10) ITEM 500 CHARGE 10,000 (20) DERIVE

  11. Data Volume Analysis LOCATION 100 (10) TREATMENT 4000 PATIENT 1000 PHYSICIAN 50 (4) (20) KNOW ... Number of records and relationships (10) ITEM 500 CHARGE 10,000 (20)

  12. Data Usage Analysis • Want to identify major transactions and processes which hit on the database • Analyze eachtransaction and process to determine access paths used and frequency of use • Create composite map from individual analyses

  13. Transaction Analysis Form TRANSACTION NUMBER MVCH-4 TRANSACTION NAME: CREATE PATIENT BILL TRANSACTION VOLUME: AVERAGE 2/HR PEAK: 10/HR (1) PATIENT 1000 (2) (3) ITEM 500 CHARGE 10,000 NO. NAME ACCESS TRAN PERIOD TYPE REF REF (1) ENTRY-PATIENT READ 1 10

  14. Transaction Analysis Form (1) PATIENT 1000 (2) (3) ITEM 500 CHARGE 10,000 NO. NAME ACCESS TRAN PERIOD TYPE REF REF (1) ENTRY-PATIENT READ 1 10 (2) PATIENT-CHARGE READ 10 100 (3) CHARGE-ITEM READ 10 100

  15. Composite Usage Map • Determine how the data structures are accessed for each transaction and process • include programs • standard queries • programmed • ad hoc

  16. Composite Usage Map (50) LOCATION 100 (25) (50) TREATMENT 4000 PATIENT 1000 PHYSICIAN 50 (50) NUMBER IS PER HOUR AT PEAK VOLUME ITEM 500 CHARGE 10,000

  17. Composite Usage Map LOCATION 100 (75) (25) (50) (30) TREATMENT 4000 PATIENT 1000 PHYSICIAN 50 (20) (50) (100) (200) ITEM 500 CHARGE 10,000

  18. Composite Usage Map (50) LOCATION 100 (25) (50) (75) (25) (50) (30) TREATMENT 4000 PATIENT 1000 PHYSICIAN 50 (20) (50) (50) (200) (100) ITEM 500 CHARGE 10,000

  19. Summary • Given volume and usage knowledge we can consider different physical implementation strategies, including ... • INDEXES • DENORMALIZATION • CLUSTERING

  20. Indexes • Purpose: To speed up access to a particular row or a group of rows in a table. • Also used to enforce uniqueness • Eliminates the necessity of re-sorting the table each time we need to create a sequenced list

  21. Indexes Allen 3 Brian 6 Carole 7 John 2 Karen 5 Marvin 1 Sharon 8 Sue 4 1 Marvin … 2 John ... 3 Allen ... 4 Sue ... 5 Karen ... 6 Brian ... 7 Carole ... 8 Sharon ...

  22. Example • SELECT NAME, DEPT, RATING FROM EMP WHERE RATING = 10; Indexing on RATING improves performance. Without an index, must do afull table scan.

  23. Costs of an index? • Storage space • Maintenance • Indexed must be changed for each add/delete or change in value on indexed field. • One benchmark ... insert into table w/o indexes, 0.11 seconds, w/ 8 indexes, 0.94 seconds.

  24. Access Indexes • Automatically created on primary key. • You must create other indexes as needed. • Note, creating a unique index on a foreign key turns the relationship into a 1 - 1 relationship rather than a 1 - m relationship. Let’s consider Oracle indexes and performance ...

  25. Oracle Indexes SELECT COUNT(*) FROM EMP WHERE EMP_NO>0 SELECT EMP_NAME FROM EMP WHERE EMP_NO>0 INDEX + TABLE FULL TABLE SCAN INDEX ONLY % Seconds 8.5 0.66 12.03 35.70 15.5 1.04 16.21 35.70 25.2 1.54 25.45 35.70 50.7 2.80 33.89 35.70 100 5.72 87.23 35.70 % OF FILEREAD BREAK- EVEN 26,000 Rows, 7 Rows per Block

  26. Oracle Indexes SELECT COUNT(*) FROM EMP WHERE EMP_NO>0 SELECT EMP_NAME FROM EMP WHERE EMP_NO>0 INDEX + TABLE FULL TABLE SCAN INDEX ONLY % OF FILEREAD % Seconds 8.5 0.66 2.31 4.52 15.5 1.05 4.01 4.52 25.2 1.59 6.37 4.52 50.7 2.91 12.69 4.52 100 6.01 25.37 4.52 BREAK- EVEN 26,000 Rows, 258 Rows per Block

  27. Rules of thumb • Use indexes generously for applications which are decision support/retrieval based. • Use indexes judiciously for transaction processing applications.

  28. Places to use indexes • PRIMARY KEY • FOREIGN KEYS • Non Key attributes that are referred to in qualification, sorting, and grouping (WHERE, ORDER BY, GROUP BY)

  29. Denormalization • Goal is to reduce the number of physicals reads to the storage devices by reducing the number of joins.

  30. Costs of Denormalization • Makes coding more complex • Often sacrifices flexibility • Will speed up retrieval but slow updates

  31. Including children in the parent record • Multiple addresses in the personnel record • Absolute number of children for a parent is known (e.g., 2 addresses) • The number won’t change over time • The number is not very large

  32. Clusters in Oracle • Clustering stores records from two tables into the same physical storage space • Only useful for EQUI-JOINS • Improves performance by 2-3 times

  33. Storing most recent child data in the parent record • Multiple children, but children have an ordering (e.g., date of order) • For example, perhaps storing amount of last order. • Amount of last dividend paid to a particular account

  34. Store running totals /Create extract tables • Store summary data from a child record • Year to date sales • Create a summary table which contains aggregate values over some period (say, one month)

  35. Duplicating a key beyond an immediate child record CLASS CLASS_ID PARTS PART_ID,CLASS_ID ORDERS ORDER_ID, PART_ID, CLASS_ID ADD THIS KEY

  36. Consider SQL statement for previous example SELECT PART_NO, ORDER_NO, CLASS, CLASS_DESC FROM CLASS C, PART P, ORDER O WHERE O.PART_NO = P.PART_NO AND P.CLASS = C.CLASS; SELECT PART_NO, ORDER_NO, CLASS, CLASS_DESC FROM CLASS C, ORDER O WHERE O.CLASS = C.CLASS;

  37. Record Partitioning • Breaking up a record into two parts A,B,C,D,E,F,G E,F,G A,B,C,D

  38. Summary • Logical design gives you information about the ‘how’ to build the system. • Good physical design takes into account the performance of the final design … to know how best to do this task, you must understand how the system is being used!

More Related