Today
• Review of Directory of Slot Block Organizations
• Heap Files
• Program 1 Hints
• Ordered Files & Hash Files
• RAID
Heap Files
• Heap files are stored as unordered records
  • the use of "heap" here is unrelated to the "free store" used for dynamic memory allocation
• The simplest file organization
Heap File Example
[Figure: a heap file laid out as block 1, block 2, ..., block N, each block holding (Name, ID) records such as (Smith, Mary, 183), (Ming, Yao, 709), (Paradise, Sal, 231), (Yost, Ned, 819), ...]
Assuming N data blocks, R records per block, an I/O cost of D per block, and a "record processing time" of C per record, the costs are (a worked numeric sketch follows the list):
• insert a record: 2D + C
• delete a record given its RID (ignoring space reclamation): 2D + C
• scan the whole file: N(D + RC)
• equality search on a "key": N(D + RC)/2 on average
• range selection: N(D + RC)
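To make the formulas above concrete, here is a small sketch (not from the original slides) that plugs example values into them; the variable names N, R, D, and C mirror the slide's parameters, and the timing values are invented.

// Hedged sketch: expected costs of heap file operations, using the slide's
// parameters (N blocks, R records/block, D = block I/O cost, C = per-record
// processing cost).  The numbers below are made-up example values.
#include <cstdio>

int main() {
    double N = 100;     // number of data blocks (example value)
    double R = 50;      // records per block (example value)
    double D = 15e-3;   // cost of one block I/O, e.g. 15 ms
    double C = 0.1e-3;  // cost of processing one record, e.g. 0.1 ms

    double insertCost = 2 * D + C;           // read last block, add record, write back
    double deleteCost = 2 * D + C;           // RID locates the block: read, delete, write
    double scanCost   = N * (D + R * C);     // touch every block and every record
    double eqSearch   = N * (D + R * C) / 2; // on average, half the file is scanned
    double rangeSel   = N * (D + R * C);     // every block must still be checked

    printf("insert=%.4f delete=%.4f scan=%.2f eq=%.2f range=%.2f (seconds)\n",
           insertCost, deleteCost, scanCost, eqSearch, rangeSel);
    return 0;
}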
Prog 1 Hints
• check slot numbers to make sure they are valid
• sizeof(), memcpy(), memmove()
• man is your friend
• test patterns – make sure you handle error cases correctly
• error reporting
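Since sizeof(), memcpy(), and memmove() come up in the assignment, here is a tiny standalone example (not HFPage code) of the one subtlety that matters: memmove() is required when the source and destination ranges overlap, as they do when you compact records inside data[]. The toy buffer and offsets below are invented.

// Standalone illustration: sliding bytes within a buffer with memmove(),
// which, unlike memcpy(), is defined for overlapping source/destination.
#include <cstring>
#include <cstdio>

int main() {
    char data[16] = "AABBCCCCCCCC";        // toy layout: records "AA", "BB", "CCCCCCCC"
    // Suppose the 2-byte record "BB" (offset 2) is deleted; slide the
    // 8-byte record left by 2 to close the hole.  The ranges overlap,
    // so memmove() must be used instead of memcpy().
    std::memmove(data + 2, data + 4, 8);
    data[10] = '\0';
    std::printf("%s\n", data);             // prints AACCCCCCCC
    return 0;
}

In HFPage itself records are packed from the high end of data[] toward usedPtr, so a real compaction slides bytes toward higher offsets; the overlapping-move concern is the same.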
class HFPage {
    struct slot_t {
        short offset;
        short length;      // equals EMPTY_SLOT if slot is not in use
    };

    static const int DPFIXED = sizeof(slot_t) + 4 * sizeof(short)
                             + 3 * sizeof(PageId);

    short  slotCnt;        // number of slots in use
    short  usedPtr;        // offset of first used byte in data[]
    short  freeSpace;      // number of bytes free in data[]
    short  type;           // an arbitrary value used by subclasses as needed

    PageId prevPage;       // backward pointer to data page
    PageId nextPage;       // forward pointer to data page
    PageId curPage;        // page number of this page

    slot_t slot[1];        // first element of slot array
    char   data[MAX_SPACE - DPFIXED];

    // methods ...
};
// **********************************************************
// page class constructor
void HFPage::init(PageId pageNo)
{
    nextPage = prevPage = INVALID_PAGE;
    slotCnt  = 0;                      // no slots in use
    curPage  = pageNo;
    usedPtr  = sizeof(data);           // offset of used space in data array
    freeSpace = sizeof(data) + sizeof(slot_t);  // amount of space available
                                                // (initially one unused slot)
}
HFPage Methods
• init()
• getNextPage(), setNextPage()
• getPrevPage(), setPrevPage()
• insertRecord(), deleteRecord()
• firstRecord(), nextRecord()
• getRecord(), returnRecord()
• available_space()
• empty()
int HFPage::available_space(void)
{
    // look for an empty slot.  if one exists, then freeSpace
    // bytes are available to hold a record.
    int i;
    for (i = 0; i < slotCnt; i++) {
        if (slot[i].length == EMPTY_SLOT)
            return freeSpace;
    }
    // no empty slot exists.  must reserve sizeof(slot_t) bytes
    // from freeSpace to hold new slot.
    return freeSpace - sizeof(slot_t);
}
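For orientation only, here is one plausible shape for insertRecord(), built on available_space() and the fields declared in the class above. This is a sketch under assumptions, not the assignment solution: the exact Minibase signature, status codes, and slot-directory bookkeeping may differ, and RID is assumed to hold a pageNo and a slotNo.

// Hedged sketch, not the reference implementation.
Status HFPage::insertRecord(char* recPtr, int recLen, RID& rid)
{
    if (recLen <= 0 || recLen > available_space())
        return DONE;                      // assumed "no space here" status; adapt to your codebase

    // Reuse an empty slot if one exists; otherwise grow the slot directory,
    // which consumes sizeof(slot_t) bytes of free space.  Additional slots
    // occupy the front of data[] (the slot[1] "struct hack").
    int slotNo = 0;
    while (slotNo < slotCnt && slot[slotNo].length != EMPTY_SLOT)
        slotNo++;
    if (slotNo == slotCnt) {
        slotCnt++;
        freeSpace -= sizeof(slot_t);
    }

    // Record data is packed from the end of data[] toward the front.
    usedPtr   -= recLen;
    freeSpace -= recLen;
    memcpy(data + usedPtr, recPtr, recLen);

    slot[slotNo].offset = usedPtr;
    slot[slotNo].length = recLen;

    rid.pageNo = curPage;                 // assumes RID = {pageNo, slotNo}
    rid.slotNo = slotNo;
    return OK;
}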
Ordered Files
• Also called a sequential file.
• File records are kept sorted on the values of an ordering field.
• Insertion is expensive: records must be inserted in the correct order.
  • It is common to keep a separate unordered overflow (or transaction) file for new records to improve insertion efficiency; this is periodically merged with the main ordered file.
• Binary search can be used to locate a record by its ordering field value (see the sketch below).
  • For a file of b blocks this requires reading and searching about log2(b) blocks on average, an improvement over the roughly b/2 blocks of a linear search.
• Reading the records in order of the ordering field is quite efficient.
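As a sketch of the binary search described above (not tied to any particular file manager), the following toy program treats an in-memory array of sorted blocks as the file; readBlock() stands in for a real one-block disk read.

// Hedged sketch: binary search on the ordering field over the blocks of a
// sorted file.  Only about log2(numBlocks) "block reads" are needed.
#include <vector>
#include <cstdio>

// Toy, in-memory stand-in for a sorted file of 4 blocks, 3 records/block.
// A real implementation would issue one disk I/O per readBlock() call.
static const std::vector<std::vector<int>> fileBlocks = {
    {2, 5, 8}, {11, 13, 17}, {19, 23, 29}, {31, 37, 41}
};

std::vector<int> readBlock(int blockNo) { return fileBlocks[blockNo]; }

bool orderedFileSearch(int key, int numBlocks) {
    int lo = 0, hi = numBlocks - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        std::vector<int> block = readBlock(mid);        // one block "I/O"
        if (key < block.front())      hi = mid - 1;     // key lies in an earlier block
        else if (key > block.back())  lo = mid + 1;     // key lies in a later block
        else {
            for (int k : block) if (k == key) return true;
            return false;                               // right block, key absent
        }
    }
    return false;
}

int main() {
    std::printf("%d %d\n", orderedFileSearch(23, 4), orderedFileSearch(24, 4));  // 1 0
    return 0;
}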
Hashed Files
• Hashing for disk files is called external hashing.
• The file blocks are divided into M equal-sized buckets, numbered bucket 0, bucket 1, ..., bucket M-1.
  • Typically, a bucket corresponds to one (or a fixed number of) disk blocks.
• One of the file fields is designated as the hash key of the file.
• The record with hash key value K is stored in bucket i, where i = h(K) and h is the hashing function (see the sketch below).
• Search on the hash key is very efficient.
• Collisions occur when a new record hashes to a bucket that is already full.
  • An overflow file is kept for storing such records.
  • Overflow records that hash to the same bucket can be linked together.
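A minimal sketch of the bucket mapping, assuming the common choice h(K) = K mod M; the keys reused below are ID values from the earlier heap file example, and M is an invented example value.

// Hedged sketch of static external hashing's bucket mapping; in a real file
// each bucket would correspond to one disk block (or a fixed group of blocks).
#include <cstdio>

const int M = 8;                          // fixed number of buckets (example value)

int h(int key) { return key % M; }        // hash function: bucket number in 0..M-1

int main() {
    int ids[] = { 183, 709, 933, 93, 281, 231 };     // IDs from the heap file example
    for (int k : ids)
        std::printf("ID %d -> bucket %d\n", k, h(k));
    return 0;
}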
Hashed Files (contd.)
• There are numerous methods for collision resolution, including the following (a chaining sketch follows this list):
  • Open addressing: proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.
  • Chaining: various overflow locations are kept, usually by extending the array with a number of overflow positions. In addition, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location.
  • Multiple hashing: the program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing, or applies a third hash function and then uses open addressing if necessary.
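To make the chaining description concrete, here is a small in-memory sketch (invented for illustration; array indices stand in for disk addresses): each home bucket and overflow location carries a key plus a pointer to the next overflow location.

// Hedged in-memory sketch of collision resolution by chaining.
#include <vector>
#include <cstdio>

const int M = 4;                               // number of home buckets (example value)
struct Loc { int key; int next; };             // next = index of overflow location, or -1

std::vector<Loc> home(M, Loc{-1, -1});         // home buckets (key -1 means empty)
std::vector<Loc> overflow;                     // overflow locations, grown as needed

void insertKey(int key) {
    int b = key % M;                           // hash address
    if (home[b].key == -1) { home[b].key = key; return; }   // home bucket is free
    // Collision: place the record in an unused overflow location and link it
    // into the chain that starts at the occupied hash address.
    overflow.push_back({key, home[b].next});
    home[b].next = (int)overflow.size() - 1;
}

bool findKey(int key) {
    int b = key % M;
    if (home[b].key == key) return true;
    for (int i = home[b].next; i != -1; i = overflow[i].next)   // walk the overflow chain
        if (overflow[i].key == key) return true;
    return false;
}

int main() {
    for (int k : {10, 14, 22, 7}) insertKey(k);       // 10, 14, 22 all hash to bucket 2
    std::printf("%d %d\n", findKey(22), findKey(3));  // prints 1 0
    return 0;
}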
Hashed Files (contd.)
• To reduce overflow records, a hash file is typically kept 70-80% full.
• The hash function h should distribute the records uniformly among the buckets.
  • Otherwise, search time increases because many overflow records will exist.
• Main disadvantages of static external hashing:
  • The fixed number of buckets M is a problem if the number of records in the file grows or shrinks.
  • Ordered access on the hash key is quite inefficient (it requires sorting the records).
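As a quick worked example (numbers invented, not from the slides): to store 10,000 records at 10 records per bucket while staying near 75% full, one would allocate roughly 10,000 / (10 × 0.75) ≈ 1,334 buckets rather than the minimal 1,000.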
Fill in This Table

           Scan        Equality search    Range search    Insert      Delete
Heap
Sorted
Hashed