370 likes | 394 Views
CS4432: Database Systems II. Lecture #4. Professor Elke A. Rundensteiner. Quick Logistics. Project 1 : Team formation. Teams of three Enter at my.wpi.edu until Tuesday class time, 2pm. After that, we’ll assign partners to teams. BS/MS Credit. Do project1 on your own.
E N D
CS4432: Database Systems II Lecture #4 Professor Elke A. Rundensteiner lecture #4
Quick Logistics lecture #4
Project 1 :Team formation • Teams of three • Enter at my.wpi.edu untilTuesday class time, 2pm. • After that, we’ll assign partners to teams. lecture #4
BS/MS Credit • Do project1 on your own. • Go to my.wpi.edu and enter yourself as one-person team (put a note re BS/MS) • Use BS/MS Form (even if not yet enrolled). lecture #4
Lecture #4 – part 1 Review Chapter 2.3: Merge-Sort Algorithm lecture #4
Using secondary storage effectively • General Wisdom : • I/O costs dominate • Design algorithms to reduce I/O lecture #4
Two-Way External Merge Sort 6,2 2 Input file 3,4 9,4 8,7 5,6 3,1 PASS 0 1,3 2 1-page runs 3,4 2,6 4,9 7,8 5,6 PASS 1 • Costs for pass : 2 * N • # of passes : height of tree • total cost : product of above 4,7 1,3 2,3 2-page runs 8,9 5,6 2 4,6 PASS 2 2,3 4,4 1,2 4-page runs 6,7 3,5 6 8,9 PASS 3 1,2 2,3 3,4 8-page runs 4,5 6,6 7,8 9 lecture #4
External Merge Sort • What if we had more buffer pages? • How do we utilize them wisely ? The two main ideas are? lecture #4
Phase 1 INPUT 1 . . . INPUT 2 . . . INPUT B Disk Disk B Main memory buffers • Construct as large as possible starter lists. lecture #4
Phase 2 INPUT 1 • Compose as many sorted sublists into one long sorted list. . . . INPUT 2 . . . OUTPUT INPUT B-1 Disk Disk B Main memory buffers lecture #4
Example • Buffer : with 5 buffer pages • File to sort : 108 pages • Pass 0: • Size of each run? • Number of runs? • Pass 1: • Size of each run? • Number of runs? • Pass 2: ??? lecture #4
Example • Buffer : with 5 buffer pages • File to sort : 108 pages • Pass 0: = 22 sorted runs of 5 pages each (last run is only 3 pages) • Pass 1: = 6 sorted runs of 20 pages each (last run is only 8 pages) • Pass 2: 2 sorted runs, 80 pages and 28 pages • Pass 3: Sorted file of 108 pages • Total I/O costs: ? lecture #4
Example • Buffer : with 5 buffer pages • File to sort : 108 pages • Pass 0: = 22 sorted runs of 5 pages each (last run is only 3 pages) • Pass 1: = 6 sorted runs of 20 pages each (last run is only 8 pages) • Pass 2: 2 sorted runs, 80 pages and 28 pages • Pass 3: Sorted file of 108 pages • Total I/O costs: 2*N ( 4 ) lecture #4
Double Buffering (Useful here) • To reduce wait time for I/O request to complete, can prefetch into `shadow block’. • Potentially, more passes; in practice, most files still sorted in 2 or at most 3 passes. INPUT 1 INPUT 1' INPUT 2 OUTPUT INPUT 2' OUTPUT' b block size Disk INPUT k Disk INPUT k' B main memory buffers, k-way merge lecture #4
Lecture #4 – part 2 Pointer to Chapter 2.4: Improving Access Times of Secondary Storage :Five Disk Optimizations lecture #4
Up to now we assume: • One single disk • Blocks are chosen randomly from disk to be read in lecture #4
Five Optimizations(in controller or O.S.) • Group blocks accessed together on same Cylinder • One big disk several smaller disks • Mirror disks -> multiple copies of same data • Prefetch blocks into main memory -> ( e.g., double-buffering. ) • Disk Scheduling Algorithms -> to select order in which several blocks will be read or written (e.g., elevator algorithm) lecture #4
Assessment of Five Optimizations • Effect for “regular predictable tasks”, • like one long dedicated process with sequential read • e.g., a database SORT (2-phase-multi-way-sort) • Effect for many “unpredictable irregular tasks” • like many short processes in parallel • e.g., airline reservations • Or, some mixture in workload … lecture #4
Five Optimizations : Useful or Not ? • Group blocks together on same Cylinder • One big disk several smaller disks • Mirror disks -> multiple copies of same data • Prefetch blocks -> e.g., double-buffering. • Disk Scheduling -> e.g., elevator algo lecture #4
Assessment of Five Optimizations • Book has in-depth answer to this assessment. • So read the book (chapter 2.4). • Also, question 1 in homework 1 has you study this question. lecture #4
Lecture #4 – part 3 Storage Layout : How to lay out data on disk. ( chapter 3) lecture #4
Overview Data Items Records Blocks Files Memory lecture #4
What we have available: Bytes 8 bits What are the data items we want to store? • a salary • a name • a picture lecture #4
To represent: • Integer (short): 2 bytes e.g., 35 is 00000000 00100011 lecture #4
Can we use less than 1 byte/code? Yes, but only if desperate... To represent: 1111 1111 • Boolean e.g., TRUE FALSE 00000000 • Application specific • e.g., RED 1 GREEN 3 • BLUE 2 YELLOW 4 … lecture #4
To represent: • Characters various coding schemes suggested (ascii) Example: A: 1000001 a: 1100001 5: 0110101 LF: 0001010 lecture #4
c c a a t t To represent: String of characters • Null terminated e.g., • Length given e.g., • Fixed length e.g., in Oracle define the string length. 3 lecture #4
Key Points • Fixed length items • Variable length items • - usually length given at beginning • Type of an item: Tells us how to • interpret (plus size if fixed) lecture #4
Overview Data Items Records Blocks Files Memory lecture #4
Record - Collection of related data items (called FIELDS) E.g.: Employee record: name field, salary field, date-of-hire field, ... lecture #4
Types of records: • Main choices: • FIXED vs VARIABLE FORMAT • FIXED vs VARIABLE LENGTH lecture #4
Fixed format A SCHEMA contains information such as: - # fields (attributes) - type of each field (length) - order of attributes in record - meaning of each field (domain) - constraints (primary key, etc). Not associated with each record. lecture #4
Example: fixed format and length Employee record (1) E#, 2 byte integer (2) E.name, 10 char. Schema (3) Dept, 2 byte code We can simply concatenate fields. 55 s m i t h 02 Records 83 j o n e s 01 lecture #4
More on Record Layout next time lecture #4
Next: Data Items Records Blocks Files Memory lecture #4
Reminder : Homework 1 out Topic: all about chapter 2. Due: Next Friday (in class) Reminder : Read chapter 3 in textbook lecture #4