DATABASE PHYSICAL DESIGN

DATABASE PHYSICAL DESIGN Chandra S. Amaravadi

INTRODUCTION

PHYSICAL DATABASE DESIGN Physical database design is concerned with issues revolving around data base implementation: • Implementation design • Database storage, access & location • File organization & constraints

THE THREE FORMS OF DATA External Conceptual/ Base table Internal/ Hardware level 100 ... 200 ... 300 ... These three levels provide logical and physical data independence

THE THREE TYPES OF MODELS External Create view Drop view Views Conceptual Create table Alter table Schemas File Organizations Create index drop index Internal Models Facilities

DATABASE PHYSICAL DESIGN Inputs?

COMPONENTS OF PHYSICAL DESIGN 1. Implementation design 2. Storage, access & distribution strategies 3. File organizations 4. Specifications for integrity constraints (later)

Implementation design Concerned with taking the results of normalization and designing tables, attributes, data types for implementation. • Decide on tables (de-normalization) • Decide on primary and cross reference keys (not discussed further) • Decide on attribute data types (not discussed further) • E.g. fixed vs variable length fields • integer vs double integer • Design reports and forms (not discussed further)

DECIDING ON TABLES Denormalization is going back in the normal forms to reduce schema overhead Denormalization Example (for 1:1) Parts(Part#, PartName, ) Container (ContainerID, #fin, #needed, Part#) Parts(Part#, PartName, ContainerID, #fin, #needed)

DECIDING ON TABLES.. Denormalization Example (for M:N) Ord_dt Ord# Descr. Prod# ORDERS Are for PRODUCTS Qty What tables does normalization result in?

DENORMALIZATION Orders(ord#, ord_dt, ..) Product(prod.#, descr, ..) Orders for prod (prod.#, ord#, qty) Orders(ord#, ord_dt, ..) Product(prod.#, ord#, descr., qty..)

COMPONENTS OF PHYSICAL DESIGN.. 1. Implementation design 2. Storage and access strategies 3. Distribution strategies 4. File organizations 5. Specifications for integrity constraints (later)

STORAGE & ACCESS STRATEGIES ALSO CALLED VOLUME & USAGE ANALYSIS OBJECTIVES • Estimate storage requirements (Volume analysis) • Determine media to be used (not discussed) • Study how data is being acccessed (Usage analysis) • Use these to develop file organization (later) Volume and Usage analysis is carried out with a composite usage map.

COMPOSITE USAGE MAP A composite usage map is simply an ER chart (without attr), that shows the number of records, and the frequency/pattern with which they are accessed. • Used for volume & usage analysis  file org. • Superimposed on ER Chart • Attributes are not shown • Shows estimated number of records (volume) • Shows type of access (dotted lines )

VOLUME & USAGE ANALYSIS • Equipment, Parts and PE tables • Equipment: 100; • Parts:12,000; • PE: 10,00 • 20 inquiries per hour to Equipment • 300 inquiries per hour on Parts table • 70% of these inquiries also need to know Equipment info. Draw a composite usage map, estimate storage requirements and develop a suitable file organization

COMPOSITE USAGE MAP 20 EQUIPMENT (100) ARE FOR PE ??? (10,000) PARTS ???? (12,000)

FOR DISCUSSION How can one estimate the size of a database?

ESTIMATING STORAGE REQMTS. FOR PARTS AND EQUIPMENT 7 10 12 2 1 1 EQUIPMENT (Model#, Descr, Mfr., Price, HP, WT) 1 10 12 2 PARTS(Part#, Descr, Mfr, Price) 7 1 1 PE (Model#, Part#, Qty) Equipment table: 7+10+12+2+1+1 = 33 bytes/record Parts table: ?? PE table: ?? Total storage requirements = ??

A MORE ELABORATE EXAMPLE 70% 40% • Parts are manufactured parts and purchased parts • Parts: 1,000; Suppliers:50; Quotations: 2,500 • Total of 200 parts inquiries • 60 direct inquiries to purchased parts • Of the purchased parts inquiries, 80 are also to • quotation • Of these 80, 70 are to supplier as well. • 75 direct queries to supplier • Of these 40 are for quotation • All of these are also for parts

ANOTHER EXAMPLE.. 200 75 SUPPLIER PART 70 60 (50) (1000) 80 Is-a 140 40% 70% 40 MANU- FACTURED PURCH- ASED QUOTA- TION (700) (400) (2500) 40 80 A COMPOSITE USAGE MAP Note: # of records are in red; the # of accesses are in blue

STORAGE REQUIREMENTS PART TABLE: PART_NO (5) DESCRIPTION (15) LOCATION (10) QUANTITY (1) RECORD SIZE: 31 FILE SIZE: 31 * 1100 = 34,300 Bytes QUOTATION TABLE: Estimated record size 150 Estimated file size 150*2500 = 375,000 Bytes Note: This is done similarly for other tables.

COMPONENTS OF PHYSICAL DESIGN.. 1. Implementation design 2. Storage & access strategies 3. Distribution strategies 4. File organizations 5. Specifications for integrity constraints (later)

DISTRIBUTION STRATEGIES Distribution strategies are concerned with where the files are physically located. 1. Centralized Replicated (not discussed) 2. Distributed Partitioned

DISTRIBUTION STRATEGIES Centralized -- All the data is stored in one physical location. Distributed -- The data is stored in multiple physical locations. Replicated -- The database is duplicated in multiple locations. Partitioned -- The database is divided into “fragments” and each fragment is stored in a different location.

CENTRALIZED VS DISTRIBUTED • Which is bottleneck? • Which causes security problems? • Which method may be required for business reasons? • In which setup is data more accessible? • Which provides better performance?

CENTRALIZED STRATEGY General Principle: Maximize local access, minimize remote access S1 S2 100 600 S3 WHERE SHOULD WE LOCATE THE DATABASE? S1, S2 or S3 500

This slide is blank

DISTRIBUTED DATABASE partitioning MPLS LA SF

COMPONENTS OF PHYSICAL DESIGN.. 1. Implementation design 2. Storage & access strategies 3. Distribution strategies 4. File organizations 5. Specifications for integrity constraints (later)

FILE ORGANIZATION File 1 Rec. 1,2.. Tracks Sectors How records are arranged on secondary storage or mapping between ____ and ______?

DATA ACCESS (FYI) DBMS O/S Requests Consults FAT/NTFS Directory tables Generates instructions to IOP Hard drive Partition IOP RAM Database storage

FILE ORGANIZATION Selection Criteria • Retrieval time (disk access) • Access type (direct, sequential) • Storage space • Maintenance effort

OVERVIEW OF FILE ORGANIZATIONS • Sequential • Hashed • Indexed ISAM VSAM

OVERVIEW OF FILE ORGANIZATIONS.. Sequential -- Records are stored one after another in pkey sequence. Hashed -- Record address is determined by subjecting pkey to hashing algorithm. Indexed -- Same as sequential except that there is an index file which places keys into a separate file for ease of searching.

THE SEQUENTIAL ORGANIZATION • Records in Pkey sequence • Access only sequential • Insertions/Deletions in sequential order • Simple organization • good for batch updates

THE HASHING ORGANIZATION A type of file organization where record addresses are generated by subjecting primary keys to a hashing routine, usually by dividing by a prime# Hashing Algorithm Pkey Hash Address + Address of Starting Block = REM [(Pkey)/(Prime#)] 3432

HASHING CONCEPTS 3432 Following are important concepts in hashing: Record address = hash address + physical addr • Hashing algorithm • Hash address • Buckets & Bucket size • Slots • Collisions/overflows • Load factor • Search length 43 1 2 3 4 5 6 7 .. n File space Pkey = 43 Hash address = (43 remainder 7) = 1 Record address = 3432 + 1 = 3433

HASHING CONCEPTS.. Hashing algorithm – the formula used to calculate a record address Hash address – an address (within block) where a hashed record is stored Buckets – storage area for a group of records; bucket size refers to # of slots. Slots – storage area for an individual record Collision – when two records hash to the same address Load factor – is the ratio of # of records to the total space allocated Average search length – is the time it takes to retrieve a record on the avg. (usually expressed in terms of disk accesses) Disk access – every time a disk is accessed for getting a record (if the record is stored in its hardware address, one access otherwise it depends on record location)

HASHING ALGORITHM • Choose load factor • Identify # of buckets to be allocated • Select a prime# close to this number • Divide each pkey by prime# • Remainder = record address • Sequentially number the buckets • Place each record to its address • If there are overflows, use Open

HASHING CONCEPTS.. Collision: When two keys hash to the same address 1 2 3 4 5 6 7 .. n • Open overflow(store in unallocated slots) • Chained overflow(a separate area) OVERFLOWS

HASHING EXAMPLE Given Part#s: 100 Gears 120 Scrapers 130 Aux motors 140 Crankshafts 145 Cylinder heads 150 Pistons 100 Mod 7 = 2 120 Mod 7 = 1 130 Mod 7 = 4 140 Mod 7 = 0 145 Mod 7 = 5 150 Mod 7 = 3 • assume 8 buckets (0..7) • assume 1 slot per bucket • assume disk access time of 20 ms

HASHING EXAMPLE.. Bucket size = ? 0 140 Crankshaft 120 Scrapers 1 Insert: 135 Shovel? 135 Mod 7 = 2 2 100 Gears 150 Pistons 3 130 Aux. motor 4 Average search length? 6 records -> 1 access 1 record -> 2 accesses 145 Cylinders 5 6 7 Load factor: ? FILE LOADINGS

THE HASHING ORGANIZATION EVALUATION • H(pkey) --> record address • Records in hash sequence • Need to allocate extra space • Load factor between 60-80% • Good for low activity (FAR) files • Real-time and OO applns.

DISCUSSION A parts file with Part# as the pkey includes records with the following part# values: 23,37,46,48, 56,18, 10, 71, 16, 24, 39, 47 and 69. The file uses 8 buckets numbered 0 to 7. Each bucket holds two records. Load these records into the file in the given order using the hash function h(K) = K mod 8. Calculate the average search length in terms of # of disk accesses.

INDEXED ORGANIZATION A method of file organization where a subset of key values are stored in an index. Types are: • Primary key • Secondary key • Clustered

THE INDEXED ORGANIZATION (ISAM) • Records are in pkey sequence (master file) • But are organized into groups • Grouping information is stored in • index file • Records can be inserted at random • Records can be accessed in sequence or at random

Name 100-103 101 Jacob 104-108 103 Becky ………. 104 Scott 108 Angela THE INDEXED ORGANIZATION Master file (sequence set) Index file (index set) Emp ID

THE INDEXED ORGANIZATION CYLINDER2 CYLINDER1 TRACKS CYLINDER2 CYLINDER1

THE ISAM ORGANIZATION Track index Cylinder index 87 189 300 Index Set 43 69 87 136 150 … …. … 122 136 … …. … 24 32 43 141 150 172 Sequence Set 45 62 69 … …. … 74 77 87 175 181 189 278 281 300 250 300 CYLINDER1 CYLINDER N.. Overflow tracks Note: Assume that the corresponding HW addresses are stored along with the pkeys

INSERTIONS IN ISAM • Identify track where record needs • to be inserted • If the track is full, insert in overflow area • If the track has room insert pkey in sequence • Update track index and cylinder index if necessary

DATABASE PHYSICAL DESIGN

DATABASE PHYSICAL DESIGN

Presentation Transcript

Chapter 8 Physical Database Design

Physical Database Design and Tuning

Physical Database Design I

Physical Database Design

Physical Database Design

Physical Database Design

Physical Database Design Chapter 5

Physical Database Design

Compression Aware Physical Database Design

Physical Database Design

L04: Physical Database Design (2)

Physical Database Design

Physical Database Design

Physical Database Design

Physical Database Design and Tuning

Physical Database Design

Physical Database Design

Physical Database Design I

PHYSICAL DATABASE DESIGN

Physical Database Design

Physical Database Design

Physical Database Design