1 / 36

CS4432: Database Systems II

CS4432: Database Systems II. Lecture #4. Professor Elke A. Rundensteiner. Quick Logistics. Project 1 : Team formation. Teams of three Enter at my.wpi.edu until Tuesday class time, 2pm. After that, we’ll assign partners to teams. BS/MS Credit. Do project1 on your own.

huntd
Download Presentation

CS4432: Database Systems II

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS4432: Database Systems II Lecture #4 Professor Elke A. Rundensteiner lecture #4

  2. Quick Logistics lecture #4

  3. Project 1 :Team formation • Teams of three • Enter at my.wpi.edu untilTuesday class time, 2pm. • After that, we’ll assign partners to teams. lecture #4

  4. BS/MS Credit • Do project1 on your own. • Go to my.wpi.edu and enter yourself as one-person team (put a note re BS/MS) • Use BS/MS Form (even if not yet enrolled). lecture #4

  5. Lecture #4 – part 1 Review Chapter 2.3: Merge-Sort Algorithm lecture #4

  6. Using secondary storage effectively • General Wisdom : • I/O costs dominate • Design algorithms to reduce I/O lecture #4

  7. Two-Way External Merge Sort 6,2 2 Input file 3,4 9,4 8,7 5,6 3,1 PASS 0 1,3 2 1-page runs 3,4 2,6 4,9 7,8 5,6 PASS 1 • Costs for pass : 2 * N • # of passes : height of tree • total cost : product of above 4,7 1,3 2,3 2-page runs 8,9 5,6 2 4,6 PASS 2 2,3 4,4 1,2 4-page runs 6,7 3,5 6 8,9 PASS 3 1,2 2,3 3,4 8-page runs 4,5 6,6 7,8 9 lecture #4

  8. External Merge Sort • What if we had more buffer pages? • How do we utilize them wisely ? The two main ideas are? lecture #4

  9. Phase 1 INPUT 1 . . . INPUT 2 . . . INPUT B Disk Disk B Main memory buffers • Construct as large as possible starter lists. lecture #4

  10. Phase 2 INPUT 1 • Compose as many sorted sublists into one long sorted list. . . . INPUT 2 . . . OUTPUT INPUT B-1 Disk Disk B Main memory buffers lecture #4

  11. Example • Buffer : with 5 buffer pages • File to sort : 108 pages • Pass 0: • Size of each run? • Number of runs? • Pass 1: • Size of each run? • Number of runs? • Pass 2: ??? lecture #4

  12. Example • Buffer : with 5 buffer pages • File to sort : 108 pages • Pass 0: = 22 sorted runs of 5 pages each (last run is only 3 pages) • Pass 1: = 6 sorted runs of 20 pages each (last run is only 8 pages) • Pass 2: 2 sorted runs, 80 pages and 28 pages • Pass 3: Sorted file of 108 pages • Total I/O costs: ? lecture #4

  13. Example • Buffer : with 5 buffer pages • File to sort : 108 pages • Pass 0: = 22 sorted runs of 5 pages each (last run is only 3 pages) • Pass 1: = 6 sorted runs of 20 pages each (last run is only 8 pages) • Pass 2: 2 sorted runs, 80 pages and 28 pages • Pass 3: Sorted file of 108 pages • Total I/O costs: 2*N ( 4 ) lecture #4

  14. Double Buffering (Useful here) • To reduce wait time for I/O request to complete, can prefetch into `shadow block’. • Potentially, more passes; in practice, most files still sorted in 2 or at most 3 passes. INPUT 1 INPUT 1' INPUT 2 OUTPUT INPUT 2' OUTPUT' b block size Disk INPUT k Disk INPUT k' B main memory buffers, k-way merge lecture #4

  15. Lecture #4 – part 2 Pointer to Chapter 2.4: Improving Access Times of Secondary Storage :Five Disk Optimizations lecture #4

  16. Up to now we assume: • One single disk • Blocks are chosen randomly from disk to be read in lecture #4

  17. Five Optimizations(in controller or O.S.) • Group blocks accessed together on same Cylinder • One big disk  several smaller disks • Mirror disks -> multiple copies of same data • Prefetch blocks into main memory -> ( e.g., double-buffering. ) • Disk Scheduling Algorithms -> to select order in which several blocks will be read or written (e.g., elevator algorithm) lecture #4

  18. Assessment of Five Optimizations • Effect for “regular predictable tasks”, • like one long dedicated process with sequential read • e.g., a database SORT (2-phase-multi-way-sort) • Effect for many “unpredictable irregular tasks” • like many short processes in parallel • e.g., airline reservations • Or, some mixture in workload … lecture #4

  19. Five Optimizations : Useful or Not ? • Group blocks together on same Cylinder • One big disk  several smaller disks • Mirror disks -> multiple copies of same data • Prefetch blocks -> e.g., double-buffering. • Disk Scheduling -> e.g., elevator algo lecture #4

  20. Assessment of Five Optimizations • Book has in-depth answer to this assessment. • So read the book (chapter 2.4). • Also, question 1 in homework 1 has you study this question. lecture #4

  21. Lecture #4 – part 3 Storage Layout : How to lay out data on disk. ( chapter 3) lecture #4

  22. Overview Data Items Records Blocks Files Memory lecture #4

  23. What we have available: Bytes 8 bits What are the data items we want to store? • a salary • a name • a picture lecture #4

  24. To represent: • Integer (short): 2 bytes e.g., 35 is 00000000 00100011 lecture #4

  25. Can we use less than 1 byte/code? Yes, but only if desperate... To represent: 1111 1111 • Boolean e.g., TRUE FALSE 00000000 • Application specific • e.g., RED  1 GREEN  3 • BLUE  2 YELLOW  4 … lecture #4

  26. To represent: • Characters  various coding schemes suggested (ascii) Example: A: 1000001 a: 1100001 5: 0110101 LF: 0001010 lecture #4

  27. c c a a t t To represent: String of characters • Null terminated e.g., • Length given e.g., • Fixed length e.g., in Oracle define the string length. 3 lecture #4

  28. Key Points • Fixed length items • Variable length items • - usually length given at beginning • Type of an item: Tells us how to • interpret (plus size if fixed) lecture #4

  29. Overview Data Items Records Blocks Files Memory lecture #4

  30. Record - Collection of related data items (called FIELDS) E.g.: Employee record: name field, salary field, date-of-hire field, ... lecture #4

  31. Types of records: • Main choices: • FIXED vs VARIABLE FORMAT • FIXED vs VARIABLE LENGTH lecture #4

  32. Fixed format A SCHEMA contains information such as: - # fields (attributes) - type of each field (length) - order of attributes in record - meaning of each field (domain) - constraints (primary key, etc). Not associated with each record. lecture #4

  33. Example: fixed format and length Employee record (1) E#, 2 byte integer (2) E.name, 10 char. Schema (3) Dept, 2 byte code We can simply concatenate fields. 55 s m i t h 02 Records 83 j o n e s 01 lecture #4

  34. More on Record Layout next time lecture #4

  35. Next: Data Items Records Blocks Files Memory lecture #4

  36. Reminder : Homework 1 out Topic: all about chapter 2. Due: Next Friday (in class) Reminder : Read chapter 3 in textbook lecture #4

More Related