CPSC 461 Final Review I
Hessam Zakerzadeh, Dina Said
9.1) What is the most important difference between a disk and a tape?
9.1) What is the most important difference between a disk and a tape? Tapes are sequential devices that do not support direct access to a desired page. We must essentially step through all pages in order. Disks support direct access to a desired page.
Exercise 11.4 Answer the following questions about Linear Hashing: 1. How does Linear Hashing provide an average-case search cost of only slightly more than one disk I/O, given that overflow buckets are part of its data structure?
Linear Hashing • No directory • More flexibility w.r.t. the timing of bucket splits • Worse performance than Extendible Hashing if the data is skewed • Uses a family of hash functions h0, h1, h2, ..., such that h_i(v) = h(v) mod (2^i · N) • N is the initial number of buckets • If N = 2^d0, then applying h_i amounts to looking at the last d_i bits of h(v), where d_i = d0 + i
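The hash function family above can be sketched in a few lines. This is a minimal illustration, assuming the base hash h is the identity on integer keys and N = 4 initial buckets (the values used in the examples that follow):

```python
N = 4  # initial number of buckets (as in the examples below)

def h(v):
    return v  # hypothetical base hash; any integer hash function works

def h_i(i, v):
    # h_i(v) = h(v) mod (2^i * N): h_0 looks at the last 2 bits,
    # h_1 at the last 3 bits, h_2 at the last 4 bits, and so on.
    return h(v) % (2**i * N)

assert h_i(0, 44) == 0   # 44 = 101100: last 2 bits are 00
assert h_i(1, 44) == 4   # last 3 bits are 100
assert h_i(1, 43) == 3   # 43 = 101011: last 3 bits are 011
```

Note how each h_{i+1} doubles the range of h_i, which is what lets a split send entries either back to bucket b or to its split image b + 2^i · N.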
Inserting a Data Entry in LH • Find the bucket by applying hLevel / hLevel+1 • If the bucket to insert into is full: • Add an overflow page and insert the data entry • (Maybe) split the Next bucket and increment Next • Else simply insert the data entry into the bucket
Bucket Split • A split can be triggered by • the addition of a new overflow page, or • conditions such as space utilization • Whenever a split is triggered: • the Next bucket is split, • and hash function hLevel+1 redistributes entries between this bucket (say bucket number b) and its split image • the split image is therefore bucket number b + N_Level (where N_Level = 2^Level · N) • Next ← Next + 1
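The insert and split rules above can be sketched as follows. This is a minimal in-memory model, not a page-level implementation: bucket capacity and the "split whenever an insert overflows" trigger are assumptions made for illustration, and each bucket's Python list stands in for its primary page plus overflow chain.

```python
N = 4            # initial number of buckets
CAPACITY = 4     # entries per primary page (assumed for illustration)

class LinearHash:
    def __init__(self):
        self.level, self.next = 0, 0
        self.buckets = [[] for _ in range(N)]   # primary + overflow entries

    def _bucket_of(self, v):
        b = v % (2 ** self.level * N)            # apply h_Level
        if b < self.next:                        # bucket already split this round
            b = v % (2 ** (self.level + 1) * N)  # apply h_Level+1
        return b

    def insert(self, v):
        b = self._bucket_of(v)
        overflow = len(self.buckets[b]) >= CAPACITY
        self.buckets[b].append(v)                # lands on an overflow page if full
        if overflow:                             # assumed trigger: split on overflow
            self._split()

    def _split(self):
        b, mod = self.next, 2 ** (self.level + 1) * N
        old = self.buckets[b]
        self.buckets[b] = [v for v in old if v % mod == b]       # stay in bucket b
        self.buckets.append([v for v in old if v % mod != b])    # split image b + N_Level
        self.next += 1
        if self.next == 2 ** self.level * N:     # end of round
            self.level, self.next = self.level + 1, 0

# Reproduce the "Insert 43" example from the slides:
lh = LinearHash()
for v in [32, 44, 36, 9, 5, 25, 30, 10, 14, 18, 31, 35, 7, 11]:
    lh.insert(v)
lh.insert(43)   # bucket 11 overflows, so bucket Next=0 is split
assert lh.buckets[0] == [32] and lh.buckets[4] == [44, 36] and lh.next == 1
```

The final assertion matches the slide: after inserting 43, bucket 000 keeps only 32, while 44 and 36 move to the new split image, bucket 100.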
Example: Insert 44 (11100), 9 (01001). Level=0, Next=0, N=4.
Primary pages after the inserts (h1 / h0 bucket numbers shown; this info is for illustration only):
000 / 00: 32, 44, 36
001 / 01: 9, 5, 25
010 / 10: 30, 10, 14, 18
011 / 11: 31, 35, 7, 11
Example: Insert 43 (101011). Level=0.
Before (Next=0), primary pages:
000 / 00: 32, 44, 36
001 / 01: 9, 5, 25
010 / 10: 30, 10, 14, 18
011 / 11: 31, 35, 7, 11
43 hashes to bucket 11, which is full, so an overflow page is added and bucket Next=0 is split using h1.
After (Next=1):
000: 32
001 / 01: 9, 5, 25
010 / 10: 30, 10, 14, 18
011 / 11: 31, 35, 7, 11 + overflow page: 43
100: 44, 36
(The h1 bucket numbers are for illustration only.)
Example: End of a Round. Insert 50 (110010).
Before (Level=0, Next=3):
000: 32
001: 9, 25
010: 66, 18, 10, 34
011: 31, 35, 7, 11 + overflow page: 43
100: 44, 36
101: 5, 37, 29
110: 14, 30, 22
50 hashes to bucket 10, which is full, so an overflow page is added and bucket Next=3 is split using h1; Next wraps around to 0 and the round ends.
After (Level=1, Next=0):
000: 32
001: 9, 25
010: 50, 10, 18, 66, 34 (primary + overflow page)
011: 35, 11, 43
100: 44, 36
101: 5, 29, 37
110: 14, 22, 30
111: 31, 7
Exercise 11.4 Answer the following questions about Linear Hashing: 1. How does Linear Hashing provide an average-case search cost of only slightly more than one disk I/O, given that overflow buckets are part of its data structure?
If we start with an index that has B buckets, during a round all B buckets are split in order, one after the other. • A hash function is expected to distribute the search key values uniformly across the buckets. • Splits triggered by conditions such as space utilization keep the overflow chains short. • Therefore the number of overflow pages per bucket is not expected to exceed 1, so the average search costs only slightly more than one disk I/O.
Exercise 11.4 Answer the following questions about Linear Hashing: Does Linear Hashing guarantee at most one disk access to retrieve a record with a given key value?
Exercise 11.4 Answer the following questions about Linear Hashing: Does Linear Hashing guarantee at most one disk access to retrieve a record with a given key value? No. Overflow chains are part of the structure, so no such guarantee is provided.
Exercise 11.4 Answer the following questions about Linear Hashing: If a Linear Hashing index using Alternative (1) for data entries contains N records, with P records per page and an average storage utilization of 80 percent, what is the worst-case cost for an equality search? Under what conditions would this cost be the actual search cost?
With an average storage utilization of 80 percent, each page holds 0.8 × P records on average. If all keys map to the same bucket, that bucket's chain contains ceil(N / (0.8 P)) pages, and an equality search must read all of them. This is the worst-case cost; it is the actual search cost exactly when the hash function sends every data entry to a single bucket.
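A quick arithmetic check of the worst-case bound, using hypothetical values N = 10,000 records and P = 50 records per page (the exercise itself leaves N and P symbolic):

```python
import math

# Hypothetical instance of the worst case: every key hashes to one bucket,
# and pages in its chain are 80% full on average, i.e. hold 0.8 * P records.
N, P = 10_000, 50
worst_case_ios = math.ceil(N / (0.8 * P))  # pages in the single bucket's chain
print(worst_case_ios)
```

With these numbers the chain is 250 pages long, so the equality search costs 250 I/Os instead of roughly one.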
Exercise 11.4 Answer the following questions about Linear Hashing: If the hash function distributes data entries over the space of bucket numbers in a very skewed (non-uniform) way, what can you say about the space utilization in data pages?
Space utilization = (number of pages holding data) / (total number of pages). If the data is skewed, suppose every record maps to bucket 0 and the file starts with m primary pages. Each additional overflow page on bucket 0 triggers a split, but the split creates a nearly empty bucket, since the records still hash to bucket 0. Suppose we added n overflow pages to bucket 0 → we also added n new buckets. Pages holding data = n + 1 (bucket 0's chain). Total number of pages = m + n + n = m + 2n. Space utilization = (n+1) / (m+2n) < 50% → very bad.
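The bound above is easy to check numerically. A small sketch, with a hypothetical m = 4 primary pages and a growing number n of overflow pages on bucket 0:

```python
# Utilization in the fully skewed case: (n + 1) data pages out of (m + 2n) total.
def utilization(m, n):
    return (n + 1) / (m + 2 * n)

samples = {n: utilization(4, n) for n in (1, 10, 100, 1000)}
# The ratio climbs toward 1/2 but never reaches it, so utilization stays < 50%.
assert all(u < 0.5 for u in samples.values())
```

As n grows the ratio approaches 1/2 from below, which is why heavy skew pins space utilization under 50 percent.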
File: 10×10^6 pages. Buffer: 320 pages. Average seek time 10 ms, average rotational delay 5 ms, transfer time 1 ms per page. Page size: 4K.
Pass 0: ceil(10×10^6 / 320) = 31250 runs of 320 pages each.
Read cost per run = 10 + 5 + 1×320 = 335 ms. Write cost per run = 10 + 5 + 1×320 = 335 ms.
Total I/O cost of Pass 0 = No. of runs × (read cost + write cost) = 31250 × 2 × 335 ms
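The Pass 0 arithmetic can be verified directly, using the parameters above (10^7-page file, 320-page runs, 10 ms seek + 5 ms rotational delay per block access, 1 ms transfer per page):

```python
import math

FILE_PAGES, RUN_PAGES = 10**7, 320
runs = math.ceil(FILE_PAGES / RUN_PAGES)
per_run_io_ms = 10 + 5 + 1 * RUN_PAGES   # read (or write) one whole run
pass0_ms = runs * 2 * per_run_io_ms      # each run is read once and written once
print(runs, pass0_ms)
```

This reproduces the 31250 runs and puts the Pass 0 total at 31250 × 2 × 335 = 20,937,500 ms.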
(Same file and disk parameters as above.)
Total cost of the subsequent merge passes = No. of passes × (read cost + write cost)
No. of passes = ceil(log_F 31250) = ceil(ln 31250 / ln F), where F is the number of ways merged.
Read/Write cost = No. of blocks × (10 + 5 + 1 × pages per block)
No. of blocks = ceil(10×10^6 / pages per block)
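The general formulas above can be packaged as one function. A sketch, assuming reads and writes happen in fixed-size blocks and taking the 31250 runs produced by Pass 0 as the starting point:

```python
import math

# Cost of all merge passes, in ms, for a given fan-in and block sizes.
# The 10 + 5 ms term is the assumed seek + rotational delay per block access.
def merge_cost_ms(ways, read_block, write_block, file_pages=10**7, runs=31250):
    passes = math.ceil(math.log(runs) / math.log(ways))
    reads = math.ceil(file_pages / read_block) * (10 + 5 + 1 * read_block)
    writes = math.ceil(file_pages / write_block) * (10 + 5 + 1 * write_block)
    return passes * (reads + writes)
```

For example, `merge_cost_ms(256, 1, 64)` evaluates the 256-way configuration of part (b), and `merge_cost_ms(4, 64, 64)` the four-way configuration of part (e).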
b) Create 256 'input' buffers of 1 page each, create an 'output' buffer of 64 pages, and do 256-way merges.
Total cost of the merge passes = No. of passes × (read cost + write cost)
No. of passes = ceil(ln 31250 / ln 256) = 2
Read cost (1-page input blocks) = 10×10^6 × (10 + 5 + 1×1) = 16×10^7 ms
Write cost (64-page output blocks) = No. of blocks × (10 + 5 + 1×64) = 156250 × 79 ms
No. of blocks = ceil(10×10^6 / 64) = 156250
Total cost of the merge passes = No. of passes × (read cost + write cost) = 2 × (16×10^7 + 156250 × 79) ms
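A numeric check of the part (b) total, plugging in the per-pass read and write costs derived above:

```python
# Part (b): 256-way merges with 1-page input blocks and 64-page output blocks.
read_ms = 10**7 * (10 + 5 + 1 * 1)       # 10^7 one-page reads per pass
write_ms = 156_250 * (10 + 5 + 1 * 64)   # 156250 sixty-four-page writes per pass
total_ms = 2 * (read_ms + write_ms)      # 2 merge passes
print(read_ms, write_ms, total_ms)
```

The reads dominate: 16×10^7 ms per pass versus about 1.23×10^7 ms for the writes, for a merge total of 344,687,500 ms.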
e) Create four 'input' buffers of 64 pages each, create an 'output' buffer of 64 pages, and do four-way merges.
Total cost of the merge passes = No. of passes × (read cost + write cost)
No. of passes = ceil(ln 31250 / ln 4) = 8
Read/Write cost = No. of blocks × (10 + 5 + 1×64) = 156250 × 79 ms each
No. of blocks = ceil(10×10^6 / 64) = 156250
Total cost = 8 × (2 × 156250 × 79) ms
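And the matching check for part (e), where both reads and writes use 64-page blocks:

```python
# Part (e): four-way merges, 64-page blocks for both input and output.
sweep_ms = 156_250 * (10 + 5 + 1 * 64)   # one full read (or write) of the file
total_ms = 8 * 2 * sweep_ms              # 8 passes, each reads and writes the file
print(sweep_ms, total_ms)
```

Despite the larger blocks, the eight passes bring the total to 197,500,000 ms: fewer, bigger block transfers per pass, but four times as many passes as part (b).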