
CSE232A: Database System Principles Hardware

Learn about the hardware aspects of storing and retrieving data in DB systems. Explore the memory hierarchy, disk access, and storage cost, with a focus on volatile vs. non-volatile storage, the impact of Moore's Law, and disk transfer rates. Understand terms such as platter, sector, and track, and calculate the time to fetch blocks.

lvernon

Presentation Transcript


  1. CSE232A: Database System Principles Hardware

  2. Database System Architecture [Diagram] Query Processing: SQL query, Parser, relational algebra, View definitions, Query Rewriter and Optimizer, Statistics & Catalogs & System Data, query execution plan, Execution Engine • Transaction Management: Calls from Transactions (read, write), Transaction Manager, Concurrency Controller, Lock Table, Recovery Manager, Log • Shared: Buffer Manager, Data + Indexes • Topic of this lecture: Hardware aspects of storing and retrieving data

  3. Memory Hierarchy • Cache memory • On-chip and L2 • Caching outside control of DB system • RAM • Addressable space includes virtual memory but DB systems avoid it • Disk • Access speed & Transfer rate • Winchester, arrays, … • Tertiary storage • Tapes, jukeboxes, DVDs [Figure labels along the hierarchy: Access Speed, Cost per byte, Capacity]

  4. Storage Cost [Figure: typical capacity (bytes, 10^3 to 10^15) vs. access time (sec, 10^-9 to 10^3) for cache, electronic main, electronic secondary, magnetic and optical disks, online tape, nearline tape & optical disks, offline tape]

  5. Storage Cost, from Gray & Reuter [Figure: dollars/MB (10^-4 to 10^4) vs. access time (sec, 10^-9 to 10^3) for cache, electronic main, electronic secondary, magnetic and optical disks, online tape, nearline tape & optical disks, offline tape]

  6. Volatile Vs Non-Volatile Storage • Persistence important for transaction atomicity and durability • Even if database fits in main memory changes have to be written in non-volatile storage • Hard disk • RAM disks w/ battery • Flash memory

  7. Cost of Disk Access: Non-trivial part of estimating performance on secondary storage • How many blocks were accessed? • Clustered/consecutive? • Such complexities also apply to flash, even main memory • Learn to analyze them when you make the next generation of secondary storage data structures

  8. Moore’s Law: Different Rates of Improvement Lead to Reconsiderations • Processor speed • Main memory bit/$ • Disk bit/$ • RAM access speed • Disk access speed • Disk transfer rate • Ratio shown: Disk Transfer Rate / Disk Access Time • Clustered/sequential access-based algorithms become relatively better

  9. Moore’s Law: Same Phenomenon Applies to RAM • Ratio shown: RAM Transfer Rate / RAM Access Time • Algorithms that access memory sequentially have better constant factors than algorithms that access randomly

  10. Moore’s Law: Different Rates of Improvement Cost of “miss” increases Cache Capacity RAM Capacity Disk Access Time

  11. Focus on: “Typical Disk” [Figure: BUS, Disk Controller, …] Terms: Platter, Head, Actuator, Cylinder, Track, Sector (physical), Block (logical), Gap

  12. Top View [Figure labels: Sector, Track, Block (typically multiple sectors), Gap] Often different numbers of sectors per track

  13. “Typical” Numbers • Diameter: 1 inch → 15 inches • Cylinders: 100 → 20000 • Surfaces (Tracks/cyl): 1 (CDs) → 2 (floppies) → 5 (typical hd) → 30 • Sector Size: 512B → 50K • Capacity: 360 KB (old floppy) → 200 GB

  14. Key performance metric: Time to fetch block [Figure: “I want block X” → block X in memory?] Time = Seek Time (locate track) + Rotational Delay (locate sector) + Transfer Time (fetch block) + Other (disk controller, …)

  15. Seek Delay [Figure labels: Track Where Head is, Track Where Head must go]

  16. Rotational Delay [Figure labels: Head Here, Block I Want]

  17. Seek Time [Figure: Time vs. Cylinders Traveled (1 to N); time axis marked x (few ms) and 3 or 5x]

  18. Average Random Seek Time $S = \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \mathrm{SEEKTIME}(i \to j)$ “Typical” S: 10 ms → 40 ms
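The average over all ordered pairs of distinct cylinders can be checked numerically. The following is a minimal sketch, not part of the original slides: it assumes the linear seek model of the practice problem on slide 22 (1000 μs plus 1 μs per cylinder), uses |j − i| as the distance, and all function names are illustrative.

```python
def seek_time_us(i: int, j: int) -> float:
    """Assumed linear model: 1000 us startup plus 1 us per cylinder traveled."""
    return 1000.0 + abs(j - i)


def average_seek_us(n: int) -> float:
    """Brute-force average of seek_time_us over all ordered pairs i != j."""
    total = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if i != j:
                total += seek_time_us(i, j)
    return total / (n * (n - 1))


def average_seek_closed_form_us(n: int) -> float:
    """Same average, using the identity mean |i - j| = (n + 1) / 3 for i != j."""
    return 1000.0 + (n + 1) / 3


if __name__ == "__main__":
    n = 200  # small N so the O(N^2) loop stays fast
    print(average_seek_us(n), average_seek_closed_form_us(n))
```

Under this model the average seek covers roughly one third of the cylinders, which is why the closed form is preferable to the double loop for large N.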

  19. Average Rotational Delay R = 1/2 revolution “typical” R = 4.17 ms (7200 RPM; a full revolution takes 8.33 ms) Assume we have to start reading from start of first sector

  20. Transfer Rate: t • “typical” t: 1 → 3 MB/second • transfer time: block size / t

  21. Other Delays • CPU time to issue I/O • Contention for controller • Contention for bus, memory “Typical” Value: 0

  22. Homework Practice Problem • Single surface • Rotation speed 7200rpm • 16,384 tracks • 128 sectors/track • 4096 bytes/sector • 4 sectors/block (16,384 bytes/block) • SEEKTIME (i → j) = [1000 + (j-i)] μs • Neglect gaps • Calculate minimum, maximum, average time to fetch one block

  23. Practice Problem: Minimum Time • Head is at the start of the first sector of the block • Just compute transfer time • 4 sectors cover 4/128 of a track • 1 full rotation takes 60/7200=8.33ms • Transfer time is 8.33 * 4 /128 = 0.26ms
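The minimum-time arithmetic can be reproduced in a few lines. This is a small sketch using only the parameters stated in the practice problem; the variable names are illustrative.

```python
RPM = 7200
SECTORS_PER_TRACK = 128
SECTORS_PER_BLOCK = 4

# One full rotation, in milliseconds (60,000 ms per minute / revolutions per minute).
rotation_ms = 60_000 / RPM

# With gaps neglected, a block occupies SECTORS_PER_BLOCK / SECTORS_PER_TRACK of a track,
# so transferring it takes the same fraction of one rotation.
transfer_ms = rotation_ms * SECTORS_PER_BLOCK / SECTORS_PER_TRACK

print(f"rotation: {rotation_ms:.2f} ms, minimum fetch time: {transfer_ms:.2f} ms")
```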

  24. Practice Problem: Maximum Time • Assume read must start at the first sector • Head is at innermost, required track is the outermost • Seek time = … • Head just missed the beginning • Rotational delay = … • Transfer time = …

  25. Practice problem: Average time • Solve…

  26. So far: Random Block Access • What about: Reading “Next” block? Time to get block = Block Size / t + Negligible • skip gap • switch track • once in a while, next cylinder

  27. Rule of Thumb • Random I/O: Expensive • Sequential I/O: Much less • Ex: 1 KB Block • Random I/O: ~20 ms • Sequential I/O: ~1 ms

  28. Practice Problem cont’d: Sustained Bandwidth over Track • Assume required blocks are consecutive on single track • What is the approximate sustained bandwidth of fetching consecutive blocks? • 128 sectors/track * 4KB/sector in 8.33ms/track full rotation = 512KB/8.33ms = 61.46KB/ms
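The same bandwidth figure can be computed without rounding the rotation time. A short sketch with illustrative variable names, taking KB as 1024 bytes (an assumption; the slide does not say): the unrounded result is 61.44 KB/ms, and the slide's 61.46 comes from dividing by the rounded 8.33 ms.

```python
SECTORS_PER_TRACK = 128
BYTES_PER_SECTOR = 4096
RPM = 7200

track_bytes = SECTORS_PER_TRACK * BYTES_PER_SECTOR  # 524,288 bytes = 512 KB
rotation_s = 60 / RPM                               # one full rotation, in seconds

# Reading consecutive blocks on one track delivers a whole track per rotation.
bandwidth_bytes_per_s = track_bytes / rotation_s
kb_per_ms = (track_bytes / 1024) / (rotation_s * 1000)
mib_per_s = bandwidth_bytes_per_s / 2**20

print(f"{kb_per_ms:.2f} KB/ms, {mib_per_s:.1f} MiB/s")
```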

  29. Suggested optimization • Cluster data in consecutive blocks • Give an extra point to algorithms that • exploit data clustering by avoiding “random” accesses • Read/write consecutive blocks

  30. An Algorithm with Little Random Access: 2-Phase Merge Sort [Figure: input P X Z K A R Y L J I W D H F E C; Main Memory: 4 blocks. Phase 1: READ a memory-load of blocks, SORT it, WRITE it out as a sorted run. Phase 2: MERGE the sorted runs and WRITE the result.] Improve by bringing max number of blocks in memory in Phase 2
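The two phases can be sketched in memory as follows. This is an illustration under stated assumptions, not the slide's exact procedure: records are single letters, a block holds 2 records (the slide does not give a block size), "disk" is a Python list, and the function name and parameters are hypothetical.

```python
import heapq
from typing import List


def two_phase_merge_sort(records: List[str], block_size: int, memory_blocks: int) -> List[str]:
    """Sort records using sorted runs (phase 1) and a single merge pass (phase 2)."""
    run_len = block_size * memory_blocks  # records that fit in main memory at once

    # Phase 1: read one memory-load at a time, sort it, "write" it back as a sorted run.
    runs = [sorted(records[k:k + run_len]) for k in range(0, len(records), run_len)]

    # Phase 2 needs one input buffer per run plus one output buffer.
    if len(runs) > memory_blocks - 1:
        raise ValueError("too many runs to merge in a single pass")

    # Each run is consumed front to back, i.e. sequentially, while merging.
    return list(heapq.merge(*runs))


if __name__ == "__main__":
    data = list("PXZKARYLJIWDHFEC")
    print("".join(two_phase_merge_sort(data, block_size=2, memory_blocks=4)))
```

Both phases read and write each run front to back, which is what keeps random access low.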

  31. Cost for Writing similar to Reading … unless we want to verify! Then need to add (full) rotation + Block size / t

  32. To Modify a Block? (a) Read Block (b) Modify in Memory (c) Write Block [(d) Verify?]

  33. Block Address: • Physical Device • Cylinder # • Surface # • Sector Once upon a time DBs had access to such – now it is the OS’s domain

  34. Optimizations (in controller or O.S.) • Disk Scheduling Algorithms • e.g., elevator algorithm • Pre-fetch • Arrays
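As an illustration of disk scheduling, here is a minimal sketch of one common elevator variant (often called LOOK): the head keeps moving in its current direction, serving pending requests on the way, and reverses after the last request in that direction. The function names and the request list are hypothetical, and the sketch assumes no new requests arrive during the sweep.

```python
from typing import List


def elevator_order(head: int, requests: List[int], moving_up: bool = True) -> List[int]:
    """Return the order in which pending cylinder requests are served."""
    if moving_up:
        first = sorted(c for c in requests if c >= head)
        second = sorted((c for c in requests if c < head), reverse=True)
    else:
        first = sorted((c for c in requests if c <= head), reverse=True)
        second = sorted(c for c in requests if c > head)
    return first + second


def total_travel(head: int, order: List[int]) -> int:
    """Total number of cylinders the head crosses when serving requests in this order."""
    distance = 0
    for cylinder in order:
        distance += abs(cylinder - head)
        head = cylinder
    return distance


if __name__ == "__main__":
    pending = [10, 95, 52, 40, 70]
    print(total_travel(50, pending))                      # first-come, first-served
    print(total_travel(50, elevator_order(50, pending)))  # elevator sweep
```

For this request list the sweep travels 130 cylinders versus 210 for arrival order.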

  35. Double Buffering Problem: Have a File • Sequence of Blocks B1, B2, ... Have a Program • Process B1 • Process B2 • Process B3 ...

  36. Single Buffer Solution (1) Read B1 → Buffer (2) Process Data in Buffer (3) Read B2 → Buffer (4) Process Data in Buffer ...

  37. Say P = time to process/block R = time to read in 1 block n = # blocks Single buffer time = n(P+R)

  38. Double Buffering [Figure: Disk holds blocks A B C D E F G; Memory holds two buffers, so one block is processed while the next is being read]

  39. Say P ≥ R, where P = Processing time/block, R = IO time/block, n = # blocks. What is processing time? • Double buffering time = R + nP • Single buffering time = n(R+P) Improvement much more dramatic if consecutive blocks: …
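The two totals can be checked with a small timeline simulation. This is a sketch under simplifying assumptions (constant P and R, exactly two buffers, reads issued as early as a buffer is free); the function names are illustrative.

```python
def single_buffer_time(n: int, P: float, R: float) -> float:
    """Read and process strictly alternate, so each block costs R + P."""
    return n * (R + P)


def double_buffer_time(n: int, P: float, R: float) -> float:
    """Simulate two buffers: block i+1 is read while block i is being processed."""
    read_done = [0] * n
    proc_done = [0] * n
    for i in range(n):
        prev_read = read_done[i - 1] if i >= 1 else 0
        # The buffer for block i is the one last used by block i-2; wait until it is free.
        buffer_free = proc_done[i - 2] if i >= 2 else 0
        read_done[i] = max(prev_read, buffer_free) + R
        prev_proc = proc_done[i - 1] if i >= 1 else 0
        proc_done[i] = max(read_done[i], prev_proc) + P
    return proc_done[-1] if n else 0


if __name__ == "__main__":
    n, P, R = 10, 3, 2  # P >= R, as assumed above
    print(single_buffer_time(n, P, R), double_buffer_time(n, P, R))
```

With P ≥ R only the first read is exposed, so the simulated total equals R + nP; with n = 10, P = 3, R = 2 that is 32 versus 50 for a single buffer.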

  40. Block Size Selection? • Big Block → Amortize I/O Cost Unfortunately... • Big Block → Read in more useless stuff! • and takes longer to read
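The amortization effect can be made concrete with the fetch-time formula from earlier in the deck. The numbers below are assumptions chosen for illustration only (14 ms combined seek plus rotational delay, 3 MB/s transfer rate), and the function names are hypothetical.

```python
def fetch_time_ms(block_kb: float, access_ms: float = 14.0, transfer_mb_s: float = 3.0) -> float:
    """Time for one random block fetch: fixed access cost plus size-dependent transfer."""
    transfer_ms = (block_kb / 1024) / transfer_mb_s * 1000
    return access_ms + transfer_ms


def effective_rate_mb_s(block_kb: float, access_ms: float = 14.0, transfer_mb_s: float = 3.0) -> float:
    """Useful MB delivered per second when every block is fetched at random."""
    return (block_kb / 1024) / (fetch_time_ms(block_kb, access_ms, transfer_mb_s) / 1000)


if __name__ == "__main__":
    for kb in (1, 4, 16, 64, 256, 1024):
        print(f"{kb:5d} KB  {fetch_time_ms(kb):8.2f} ms  {effective_rate_mb_s(kb):5.2f} MB/s")
```

Larger blocks push the effective rate toward the raw transfer rate, but each fetch takes longer, and the gain is real only if the extra bytes are actually needed.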

  41. Trend • Memory prices drop and memory capacities increase • Transfer rates increase • Disk access times do not improve that much → blocks get bigger ...

  42. Summary • Secondary storage, mainly disks • I/O times • I/Os should be avoided, especially random ones…
