340 likes | 456 Views
Exam I Grades. Max: 96, Min: 37 Mean/Median:66, Std: 18 Distribution: >= 90 : 6 >= 80 : 12 >= 70 : 9 >= 60 : 9 >= 50 : 7 >= 40 : 11 >= 30 : 5 Letter Grades: >=85: A >= 60: B >= 40: C < 40: D. Buffer Management Record Formats, Files & Indexing. Query Optimization and Execution.
E N D
Exam I Grades • Max: 96, Min: 37 • Mean/Median:66, Std: 18 • Distribution: • >= 90 : 6 • >= 80 : 12 • >= 70 : 9 • >= 60 : 9 • >= 50 : 7 • >= 40 : 11 • >= 30 : 5 • Letter Grades: • >=85: A • >= 60: B • >= 40: C • < 40: D
Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB Disks, Memory, and Files The BIG picture…
Focus on: “Typical Disk” BUS Disk Controller … Terms: Platter, Head, Actuator Cylinder, Track Sector (physical), Block (logical), Gap
Often different numbers of sectors per track Top View Sector Track Block (typically multiple sectors) Gap
Key Performance Metric:Time to Fetch Block block x in memory I want block X ? Time = Seek Time (locate track) + Rotational Delay (locate sector)+ Transfer Time (fetch block) + Other (disk controller, …)
Arranging Pages on Disk • Blocks are multiples of sector size • Nextblock concept: • blocks on same track, followed by • blocks on same cylinder, followed by • blocks on adjacent cylinder • Blocks in a file should be arranged sequentially on disk (by ‘next’), to minimize seek and rotational delay. • For a sequential scan, pre-fetchingseveral pages at a time is a big win!
Disk Space Management • Lowest layer of DBMS software manages space on disk. • Higher levels call upon this layer to: • allocate/de-allocate a page • read/write a page • Request for a sequence of pages must be satisfied by allocating the pages sequentially on disk! Higher levels don’t need to know how this is done, or how free space is managed.
Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB Context
Storage Overview Data Items Records Blocks/Pages Files Memory Disk
Reducing Number of Page Transfers • Keep cache of recently accessed pages in main memory • Goal: request for page can be satisfied from cache/buffer pool instead of disk • Purge pages when buffer pool is full • E.g., Use LRU algorithm • Record clean/dirty state of page
Accessing Data Through Buffer DBMS Buffer pool Application page frame
DB Buffer Management in a DBMS Page Requests from Higher Levels BUFFER POOL • Data must be in RAM for DBMS to operate on it! • Table of <frame#, pageid> pairs is maintained. disk page free frame MAIN MEMORY DISK choice of frame dictated by replacement policy
When a Page is Requested ... • If requested page is not in pool: • Choose a frame for replacement • If frame is dirty, write it to disk • Read requested page into chosen frame • Pin the page and return its address. • If requests can be predicted (e.g., sequential scans) • pages can be pre-fetchedseveral pages at a time!
More on Buffer Management • Requestor of page must unpin it, and indicate whether page has been modified: • dirtybit is used for this. • Page in pool may be requested many times, • a pin count is used. A page is a candidate for replacement iff pin count = 0.
Buffer Replacement Policy • Frame is chosen for replacement by a replacement policy: • Least-recently-used (LRU), • Clock, • MRU, etc. • Policy can have big impact on # of I/O’s; depends on the access pattern.
LRU Replacement Policy • Least Recently Used (LRU) • for each page in buffer pool, keep track of time when last unpinned • replace the frame which has the oldest (earliest) time • very common policy: intuitive and simple • Works well for repeated accesses to popular pages • Problems? • Problem: Sequential flooding • LRU + repeated sequential scans. • # buffer frames < # pages in file means each page request causes an I/O. • Idea: MRU better in this scenario?
A(1) B(p) D(1) C(1) “Clock” Replacement Policy • An approximation of LRU • Arrange frames into a cycle, store one reference bit per frame • Can think of this as the 2nd chance bit • When pin count reduces to 0, turn on ref. bit • When replacement necessarydo for each page in cycle { if (pincount == 0 && ref bit is on) turn off ref bit; else if (pincount == 0 && ref bit is off) choose this page for replacement; } until a page is chosen;
Physical vs. Logical Addresses Device ID E.g., Record Cylinder # Address = Track # or ID Block # Offset in block Block ID
Logical Addresses E.g., Record ID is arbitrary bit string map rec ID r address a Rec ID Physical addr.
block 2 RecA Swizzling Memory Disk block 1 block 1 block 2 RecA
One Option: Translation DB Addr Mem Addr Table Rec-A Rec-A-inMem
Another Option: In memory pointers - need “type” bit to disk to memory M
Swizzling • Automatic • On-demand • No swizzling / program control
Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB Context
Disk & Buffer Manager • Disks provide cheap, non-volatile storage. • Random access, but cost depends on location of page on disk; important to arrange data sequentially to minimize seek and rotation delays. • Buffer manager brings pages into RAM. • Page stays in RAM until released by requestor. • Written to disk when frame chosen for replacement (which is sometime after requestor releases the page). • Choice of frame to replace based on replacement policy. • Tries to pre-fetch several pages at a time.
Buffer Management and Files • Storage of Data • Fields, either fixed or variable length... • Stored in Records... • Stored in Pages... • Stored in Files • If data won’t fit in RAM, store on Disk • Need Buffer Pool to hold pages in RAM • Different strategies decide what to keep in pool
Record Formats: Fixed Length • Information about field types same for all records in a file; stored in systemcatalogs. • Finding i’th field does not require scan of record. F1 F2 F3 F4 L1 L2 L3 L4 Header Base address (B) Address = B+L1+L2
4 $ $ $ $ Record Formats: Variable Length • Two alternative formats (# fields is fixed): F1 F2 F3 F4 Fields Delimited by Special Symbols Field Count F1 F2 F3 F4 Header: Array of Field Offsets • Second offers direct access to i’th field, efficient storage • of nulls(special don’t know value); small directory overhead.
Page Formats: Fixed Length Records Slot 1 Slot 1 Slot 2 Slot 2 • Record id = <page id, slot #>. In first alternative, moving records for free space management changes rid; may not be acceptable. Free Space . . . . . . Slot N Slot N Slot M N . . . 1 1 1 M 0 M ... 3 2 1 number of records number of slots PACKED UNPACKED, BITMAP
Page Formats: Variable Length Records Rid = (i,N) Page i • Can move records on page without changing rid; so, attractive for fixed-length records too. Rid = (i,2) Rid = (i,1) N Pointer to start of free space 20 16 24 N . . . 2 1 # slots SLOT DIRECTORY
Spanned vs. Unspanned Records • Unspanned: records must be within one block • block 1 block 2 • ... • Spanned • block 1 block 2 ... R1 R2 R3 R4 R5 R1 R2 R3 (a) R3 (b) R4 R5 R6 R7 (a)
Block header - data at beginning that describes block May contain: - File ID (or RELATION or DB ID) - This block ID - Record directory - Pointer to free space - Type of block (e.g. contains recs type 4; is overflow, …) - Pointer to other blocks “like it” - Timestamp ...