600 likes | 875 Views
CS 728 Advanced Database Systems Chapter 16. Database File Organization: Unordered, Ordered, and Hashed Files of Records. Disk I/O Model of Computation. Disk I/O is equivalent to one read or write operation of a single block
E N D
CS 728Advanced Database SystemsChapter 16 Database File Organization: Unordered, Ordered, and Hashed Files of Records
Disk I/O Model of Computation • Disk I/O is equivalent to one read or write operation of a single block • It is very expensive compared with what is likely to be done once the block gets in main memory • one random disk I/O ~ about 1,000,000 machine instructions in terms of time • Cost for computation that requires secondary storage is computed only by disk I/Os.
Disk Storage Devices • Preferred secondary storage device for high storage capacity and low cost. • Data stored as magnetized areas on magnetic disk surfaces. • A diskpack contains several magnetic disks connected to a rotating spindle. • Disks are divided into concentric circular tracks on each disk surface. • Track capacities vary typically from 4 to 50 Kbytes or more
Disk Storage Devices (cont.) • A track is divided into smaller blocks or sectors • because it usually contains a large amount of information • The division of a track into sectors is hard-coded on the disk surface and cannot be changed. • One type of sector organization calls a portion of a track that subtends a fixed angle at the center as a sector. • A track is divided into blocks. • The block size B is fixed for each system. • Typical block sizes range from B=512 bytes to B=4096 bytes. • Whole blocks are transferred between disk and main memory for processing.
Disk Storage Devices (cont.) • A read-write head moves to the track that contains the block to be transferred. • Disk rotation moves the block under the read-write head for reading or writing. • A physical disk block (hardware) address consists of: • a cylinder number (imaginary collection of tracks of same radius from all recorded surfaces) • the track number or surface number (within the cylinder) • and block number (within track). • Reading or writing a disk block is time consuming because of the seek time s and rotational delay (latency) rd. • Double buffering can be used to speed up the transfer of contiguous disk blocks.
Pages and Blocks • Data files decomposed into pages (blocks) • fixed size piece of contiguous information in the file • sizes range from 512 bytes to several kilobytes • block is the smallest unit for transferring data between the main memory and the disk. • Address of a page (block): • (cylinder#, track# (within cylinder), sector# (within track)
Pages and Blocks Track Sector Gap ... 1 3 4 2 One track 1 page/block = 4 Sectors
Page I/O • Page I/O --- one page I/O is the cost (or time needed) to transfer one page of data between the memory and the disk. • The cost of a (random) page I/O = • seek time + rotational delay + block transfer time • Seek time • time needed to position read/write head on correct track. • Rotational delay (latency) • time needed to rotate the beginning of page under read/write head. • Block transfer time • time needed to transfer data in the page/block.
Page I/O • Average rotational delay (rd) • rd = ½ * (1/p) min = (60*1000)/(2*p) msec • OR • rd = ½ * cost of 1 revolution • = ½ * (60*1000/p) msec • where • p is speed of disk rotation (how many revolutions per minute - rpm) • Example • Speed of disk rotatioon is p = 3600 rpm • 60 revolutions/sec • 1 rev. = 16.66 msec. (1 second = 1000 msec) • rd = 8.33 ms
Page I/O • Transfer rate (tr) • tr = track size / cost of one revolution • = track size / (60*1000/p) in msec • Bulk transfer rate (btr) • btr = (B/(B+G)) * tr bytes/msec • Where B is the block size in bytes • G is interblock gap size in bytes • Block transfer time (btt) • btt = B / tr not taking into acount G • btt = B / btr taking into acount G
Page I/O • Example: • Track size = 50 KB and p = 3600 rpm • Block size B = 3KB = 3000 bytes • tr = (50*1000)/(60*1000/3600) = 3000 bytes/msec • btt = B / tr = 3000/3000 = 1 msec
Page I/O • Average time for reading/writing nconsecutive pages that are in the same track or cylinder = s + rd + n* btt • Average time for reading/writing consecutivelynnoncontigues pages/blocks that are in the same cylinder = s + n* (rd + btt)
An Example • A hard disk specifications: • 4 platters, 8 Surfaces, 3.5 Inch diameter • 213 = 8192 tracks/surface • 28 = 256 sectors/track • 29 = 512 bytes/sector • Average seek time s = 25 ms • Rotation rate rd = 3600 rpm = 60 rps • 1 rev. = 16.66 msec • Transfer rate • tr = 1 KB in 0.117 ms • tr = 1 KB in 0.130 ms with gap
An Example • What is the total capacity of this disk • 8 GB (8*213*28*29=233) • How many bytes does one track hold? • 256 sectors/track*512 bytes/sector = 128KB • How many blocks per track? • one block = 4096 bytes = 8 sectors (4096/512) • 256/8 = 32 blocks/track
An Example • How long does it take to access one block? • One block = 4096 bytes • 8 sectors = 4096/512 • Rotation rate r • 1 rev. = 16.66 msec. • Time to access 1 sector (s + r/2 + tr/(secters/KB) • 25 + (16.66/2) + .117/2 = 33.3885 ms. • time to access 1 block • time to access the first sector of the block + time to access the subsequent 7 sectors.
An Example 1 ... 8 3 2 1 block = 8 Sectors • T = 25 + (16.66/2) + (0.117/2) * 1 + (0.13/2) *7 = 33.3885 + 0.455 ms = 33.8435ms • Compare to one sector access time: 33.3885 ms
Buffering • A buffer • is a contiguous reserved area in main memory available for storage of copies of disk blocks. • to speed up the processes. • For a read command • the block from disk is copied into the buffer. • For a write command • the contents of the buffer are copied into the disk.
Accessing Data Through RAM Buffer RAM Block transfer Buffer Application block Record transfer Page frames
File Organization • The database is stored as a collection of files. • Each file is a sequence of records. • A record is a sequence of fields. • Records are stored on disk blocks. • A file can have fixed-length records or variable-length records.
File Organization • Fixed length records • Each record is of fixed length. Pad with spaces. • Variable length records • different records in the file have different sizes. • Arise in database systems in several ways: • different record types in a file. • same record type with (variable-length fields, repeating field, or optional fields)
Fixed-Length Records • Insertion: • Store record i starting from byte n (i – 1), where n is the size of each record. • Deletion of record i: • Packed format: • move records i + 1, . . ., n to i, . . . , n – 1 • OR • move record n to i • Unpacked format (do not move records, but) • link all free records on a free list • OR • Use bitmap vector
Free Lists • Store the address of the first deleted record in the file header. • Use this first record to store the address of the second deleted record, and so on.
Page Formats: Fixed Length Records Slot 1 Slot 1 Slot 2 Slot 2 Free Space . . . . . . Slot N Slot N Slot M N . . . 1 1 1 M 0 M ... 3 2 1 number of records number of slots PACKED UNPACKED, BITMAP • Record id = <page id, slot #>.
Variable-Length Records Representation • Byte-String representation • Attach an end-of-record () control character to the end of each record • Difficulty with deletion and growth • Slotted-page header contains: • number of record entries • location and size of each record • end of free space in the block
Slotted Page Structure • Records can be moved around within a page to keep them contiguous with no empty space between them • entry in the header must be updated. • Pointers should not point directly to record - instead they should point to the entry for the record in header.
Fixed-Length Representation • Reserved Space • can use fixed-length records of a known maximum length • unused space in shorter records filled with a null or end-of-record symbol.
Fixed-Length Representation • List Representation by Pointers • A variable-length record is represented by a list of fixed-length records, chained together via pointers. • Can be used even if the maximum record length is not known
Fixed-Length Representation • Disadvantage: space is wasted in all records except the first in a a chain. • Solution is to allow two kinds of block in file: • Anchor block: contains the first records of chain • Overflow block: contains records other than those that are the first records of chairs.
Blocking Factor • Blocking Factor (bfr) - the number of records that can fit into a single block. • bfr = ⌊B/R⌋ • B : Block size in bytes • R: Record size in bytes • Example: • Record size R = 100 bytes • Block Size B = 2,000 bytes • Thus the blocking factor bfr = 2000/100 = 20 • The number of blocks b needed to store a file of rrecords: • b = r/bfr blocks
Spanned & Unspanned Records • A block is the unit of data transfer between disk and memory. • Unspanned records: • A record is found in one and only one block. • records do not span across block boundaries. • Used with fixed-length records having BR • Spanned records: • Records are allowed to span across block boundaries. • Used with variable-length records having RB • In variable-length records, either organization can be used.
Placing File Records on Disk • A file header or file descriptor contains information about a file (e.g., the disk address, record format descriptions, etc.)
Allocating File Blocks on Disk • The physical disk blocks that are allocated to hold the records of a file can be contiguous, linked, or indexed. • In contiguous allocation, the file blocks are allocated to consecutive disk blocks. • In linked allocation, each file block contains a pointer to the next file block. • In indexed allocation, one or more index blocks contain pointers to the actual file blocks.
Organization of Records in Files • Heap/Unordered/Pile File Organization • a record can be placed anywhere in the file where there is space, or at the end • for full file scans or frequent updates • Data unordered (unsorted) • Sorted/OrderedFile Organization • store records sorted in order, based on the value of the search key of each record • Need external sort or an index to keep sorted • HashingFile Organization • a hash function computed on some attribute of each record • the result specifies in which block of the file the record should be placed
Heap File Organization • Records are placed in the file in the order in which they are inserted. Such an organization is called a heap file. • Insertion is at the end • takes constant timeO(1) (very efficient) • Searching • requires a linear search (expensive) • Deleting • requires a search, then delete • Select, Update and Delete • take b/2 time (linear time) in average • b is the number of blocks
Heap File Organization • For a file of unorderedfixed-length records using unspanned blocks and contiguous allocation, it is straightforward to access any record by its position in the file. • If the records are numbered 0,1,2, …, r-1 and • The records in each block are numbered 0,1,2, …, f-1, where f is the blocking factor • The the i-th record of the file is located in • Blocki/f and in the • (i mod f)-th record in that block
Heap File Organization • A Heap file allows us to retrieve records: • by specifying the rid, or • by scanning all records sequentially • Accessing a record by its position does not help locate a record based on a search condition.
File Stored as a Heap File 666666 MGT123 F1994 4.0 123456 CS305 S1996 4.0 page 0 987654 CS305 F1995 2.0 717171 CS315 S1997 4.0 666666 EE101 S1998 3.0 page 1 765432 MAT123 S1996 2.0 515151 EE101 F1995 3.0 234567 CS305 S1999 4.0 page 2 878787 MGT123 S1996 3.0
Sequential File Organization • Suitable for applications that require sequential processing of the entire file • The records in the file are ordered by a search-key
Files of Ordered Records • Some blocks of an ordered (sequential) file of EMPLOYEE records with NAME as the ordering key field.
File Stored as a Sorted File 111111 MGT123 F1994 4.0 111111 CS305 S1996 4.0 page 0 123456 CS305 F1995 2.0 123456 CS315 S1997 4.0 123456 EE101 S1998 3.0 page 1 232323 MAT123 S1996 2.0 234567 EE101 F1995 3.0 234567 CS305 S1999 4.0 page 2 313131 MGT123 S1996 3.0
Sequential File Organization • Insertion is expensive • records must be inserted in the correct order • locate the position where the record is to be inserted • if there is free space insert there • if no free space insert the record in an overflow block • In either case, pointer chain must be updated • Insert takes lg(b) plus the time to re-organize records. • b is the number of blocks • Deletion • use pointer chains • Searching • very efficient (Binary search) • This requires lg(b) on the average
Average Access Times • The following table shows the average access time to access a specific record for a given type of file
Hashed Files • Hashing for disk files is called External Hashing • The file blocks are divided into M equal-sized buckets, numbered bucket0, bucket1, ..., bucketM-1. • Typically, a bucket corresponds to one (or a fixed number of) disk block. • One of the file fields is designated to be the hash key of the file. • The record with hash key value K is stored in bucket i, where i=h(K), and h is the hashing function. • Search is very efficient on the hash key. • Collisions occur when a new record hashes to a bucket that is already full. • An overflow file is kept for storing such records. • Overflow records that hash to each bucket can be linked together.
Hashed Files (cont.) • There are numerous methods for collision resolution, including the following: • Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found. • Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. In addition, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location. • Multiple hashing: The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary.