FINAL REPORT CS257

Dr. T Y Lin • Payal Gupta • Class ID: 106 (now 225) • Student ID: 006495593 • Modifications are in red and italics

Secondary storage management Sections 13.1 – 13.3

13.1.1 Memory Hierarchy • Storage capacity varies widely across the levels of the hierarchy • Cost per byte of storage also varies • Devices with the smallest capacity offer the fastest access at the highest cost per bit

Memory Hierarchy Diagram [Figure: levels shown are Cache, Main Memory, Disk (as virtual memory and file system), and Tertiary Storage, with Programs and the DBMS as the users of these levels.]

13.1.1 Memory Hierarchy • Cache • Lowest level of the hierarchy • Data items are copies of certain locations of main memory • Sometimes values in the cache are changed and the corresponding changes to main memory are delayed • The machine looks in the cache for instructions as well as for the data those instructions use • Holds a limited amount of data • On a single-processor computer there is no need to update main memory immediately • On a multiprocessor, changes are written to main memory immediately; this is called write-through

Main Memory • Refers to physical memory that is internal to the computer. The word "main" distinguishes it from external mass storage devices such as disk drives. • Everything that happens in the computer, such as instruction execution and data manipulation, works on information that is resident in main memory • Main memories are random access: any byte can be obtained in the same amount of time

Secondary Storage • Used to store data and programs when they are not being processed • More permanent than main memory, as data and programs are retained when the power is turned off • Even a personal computer typically needs many gigabytes of secondary storage • E.g. magnetic disks, hard disks

Tertiary Storage • Consists of anywhere from one to several storage drives • A comprehensive computer storage system that is usually very slow, so it is mainly used to archive data that is not accessed frequently • Holds data volumes measured in terabytes • Used for databases much larger than what can be stored on disk

13.1.2 Transfer of Data Between Levels • Data moves between adjacent levels of the hierarchy • At the secondary or tertiary levels, accessing the desired data or finding the desired place to store the data takes a lot of time • Disk is organized into blocks • Entire blocks are moved to and from a region of main memory called a buffer • A key technique for speeding up database operations is to arrange the data so that when one piece of a block is needed, it is likely that other data on the same block will be needed at about the same time • The same idea applies to other levels of the hierarchy

13.1.3 Volatile and Nonvolatile Storage • A volatile device forgets the data stored on it when the power is turned off • A nonvolatile device retains its data even when it is turned off • Secondary and tertiary devices are nonvolatile • Main memory is volatile

13.1.4 Virtual Memory • A computer system technique that gives an application program the impression that it has contiguous working memory (an address space), while in fact that memory may be physically fragmented and may even overflow onto disk storage • The technique makes programming of large applications easier and uses real physical memory (e.g. RAM) more efficiently • Typical software executes in virtual memory • The address space is typically 32 bits, i.e. 2^32 bytes or 4 GB • Transfer between memory and disk is in terms of blocks

13.2.1 Mechanism of Disk • Use of secondary storage is one of the important characteristics of a DBMS • A disk drive consists of 2 moving pieces • 1. disk assembly • 2. head assembly • The disk assembly consists of 1 or more platters • Platters rotate around a central spindle • Bits are stored on the upper and lower surfaces of the platters

13.2.1 Mechanism of Disk • Each disk surface is organized into tracks • The tracks that are at a fixed radius from the center form one cylinder • Tracks are organized into sectors • Sectors are the segments of the circle, separated by gaps

13.2.2 Disk Controller • One or more disks are controlled by a disk controller • Disk controllers are capable of • Controlling the mechanical actuator that moves the head assembly • Selecting the sector from among all those in the cylinder at which the heads are positioned • Transferring bits between the desired sector and main memory • Possibly buffering an entire track

13.2.3 Disk Access Characteristics • Accessing (reading/writing) a block requires 3 steps • The disk controller positions the head assembly at the cylinder containing the track on which the block is located. The time this takes is the 'seek time' • The disk controller waits while the first sector of the block moves under the head. This wait is the 'rotational latency' • All the sectors and the gaps between them pass the head while the disk controller reads or writes data in these sectors. This is the 'transfer time'

13.3 Accelerating Access to Secondary Storage • Several approaches for accessing data in secondary storage more efficiently: • Place blocks that are accessed together on the same cylinder. • Divide the data among multiple disks. • Mirror disks. • Use disk-scheduling algorithms. • Prefetch blocks into main memory. • Scheduling latency: the added delay in accessing data caused by a disk-scheduling algorithm. • Throughput: the number of disk accesses per second that the system can accommodate.

13.3.1 The I/O Model of Computation • The number of block accesses (disk I/O's) is a good approximation of the running time of an algorithm. • The time taken is proportional to the number of disk I/O's, so that number should be minimized. • Ex 13.3: You want an index on R that identifies the block on which the desired tuple appears, but not where on the block it resides. • For the Megatron 747 (M747) example, it takes about 11 ms to read a 16 KB block. • By comparison, the delay in searching the block for the desired tuple is negligible.

13.3.2 Organizing Data by Cylinders • The first seek time and the first rotational latency can never be neglected. • Ex 13.4: We request 1024 blocks of the M747. • If the data is randomly distributed, the average access time per block is 10.76 ms by Ex 13.2, making the total about 11 s. • If all blocks are stored consecutively on 1 cylinder: • 6.46 ms (1 average seek) + 8.33 ms (time per rotation) × 16 (# rotations) ≈ 139.7 ms
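
The arithmetic in Ex 13.4 can be checked with a few lines of Python. This is only a sketch that plugs in the figures quoted on the slide; the constant and function names are chosen here for illustration.

```python
# Figures quoted on the slide for the Megatron 747 example.
BLOCKS = 1024
RANDOM_ACCESS_MS = 10.76   # average time per randomly placed block (Ex 13.2)
AVG_SEEK_MS = 6.46         # one average seek
ROTATION_MS = 8.33         # time for one full rotation
ROTATIONS = 16             # rotations needed to read the whole request

def random_layout_ms(blocks: int = BLOCKS) -> float:
    """Every block pays the full average access time."""
    return blocks * RANDOM_ACCESS_MS

def one_cylinder_ms(rotations: int = ROTATIONS) -> float:
    """One seek, then the blocks pass under the heads rotation by rotation."""
    return AVG_SEEK_MS + ROTATION_MS * rotations

print(f"random layout: {random_layout_ms() / 1000:.2f} s")
print(f"one cylinder:  {one_cylinder_ms():.2f} ms")
```

Under these figures the single-cylinder layout is roughly 80 times faster than the random layout.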

13.3.3 Using Multiple Disks • Performance improves by a factor roughly proportional to the number of disks • Striping: distributing a relation R across n disks following this pattern: • Data on disk 1: R1, R1+n, R1+2n, … • Data on disk 2: R2, R2+n, R2+2n, … • … • Data on disk n: Rn, Rn+n, Rn+2n, … • Ex 13.5: We request 1024 blocks with n = 4. • 6.46 ms (1 average seek) + 8.33 ms (time per rotation) × 16/4 (# rotations) ≈ 39.8 ms
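
A minimal sketch of the striping pattern and of the Ex 13.5 estimate, assuming disks and blocks are numbered from 0 (the slide numbers them from 1). The function names are illustrative only.

```python
def disk_for_block(block_no: int, n_disks: int) -> int:
    """Round-robin striping: block i is stored on disk i mod n."""
    return block_no % n_disks

def blocks_on_disk(disk: int, n_disks: int, total_blocks: int) -> list[int]:
    """Blocks held by one disk: disk, disk + n, disk + 2n, ..."""
    return list(range(disk, total_blocks, n_disks))

def striped_read_ms(rotations: float, n_disks: int,
                    seek_ms: float = 6.46, rotation_ms: float = 8.33) -> float:
    """All disks seek in parallel; each reads 1/n of the rotations."""
    return seek_ms + rotation_ms * (rotations / n_disks)

print(blocks_on_disk(1, 4, 12))
print(f"{striped_read_ms(16, 4):.2f} ms")
```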

13.3.4 Mirroring Disks • Mirroring disks: having 2 or more disks hold identical copies of the data. • Benefit 1: If n disks are mirrors of each other, the system can survive the crash of n-1 disks. • Benefit 2: If we have n disks, read performance increases by a factor of n, since any copy can serve a read. • Higher read throughput makes the system more efficient.

13.3.5 Disk Scheduling and the Elevator Problem • The disk controller runs this algorithm to select which of several requests to process first. • Pseudocode:
    requests[]    // array of all unprocessed data requests
    upon receiving a new data request:
        requests[].add(new request)
    while (requests[] is not empty):
        move head to next location
        if (head is at data in requests[]):
            retrieve data
            remove data from requests[]
        if (head reaches end):
            reverse head direction

13.3.5 Disk Scheduling and the Elevator Problem (con’t) • Events, in order: • Head at starting point • Request data at 8000 • Request data at 24000 • Request data at 56000 • Get data at 8000 • Request data at 16000 • Get data at 24000 • Request data at 64000 • Get data at 56000 • Request data at 40000 • Get data at 64000 • Get data at 40000 • Get data at 16000 [Figure: head position over time; cylinder axis from 8000 to 64000]

13.3.5 Disk Scheduling and the Elevator Problem (con’t) [Figure: comparison of the Elevator Algorithm and the FIFO Algorithm]
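
The difference between the two policies can be sketched in Python. This is a simplified illustration, not the slide's timeline: it assumes all six requests from the events list are already pending when the head starts at cylinder 8000, whereas in the slide they arrive over time, which gives a different service order. The function names are hypothetical.

```python
def elevator_order(head: int, pending: list[int], moving_up: bool = True) -> list[int]:
    """Serve everything in the current direction of travel, then reverse."""
    here = [c for c in pending if c == head]
    upper = sorted(c for c in pending if c > head)
    lower = sorted((c for c in pending if c < head), reverse=True)
    return here + (upper + lower if moving_up else lower + upper)

def head_travel(head: int, order: list[int]) -> int:
    """Total number of cylinders the head crosses when serving `order`."""
    total = 0
    for cylinder in order:
        total += abs(cylinder - head)
        head = cylinder
    return total

# Cylinders from the slide, listed in arrival order (ASSUMED all pending at once).
arrivals = [8000, 24000, 56000, 16000, 64000, 40000]

sweep = elevator_order(8000, arrivals)
print("elevator:", sweep, head_travel(8000, sweep))
print("FIFO:    ", arrivals, head_travel(8000, arrivals))
```

With a static queue the elevator policy makes a single sweep, while FIFO moves the head back and forth and covers almost three times the distance.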

13.3.6 Prefetching and Large-Scale Buffering • If, at the application level, we can predict the order in which blocks will be requested, we can load them into main memory before they are needed. • This reduces the cost of access and saves time, because the disk can be scheduled better and the application does not wait for each block.

13.4 Disk Failures • Intermittent error: a read or write is unsuccessful. On a read, the correct content of the sector is not delivered to the disk controller, so the controller must be able to tell a good sector from a bad one. To check that a write is correct, the sector is read back; the read operation tells us whether the sector is good or bad. • Checksums: each sector has some additional bits, called the checksum, that are set depending on the values of the data bits stored in that sector. With checksums, the probability of accepting a bad sector as good is small. A simple checksum is a parity bit: if the data bits contain an odd number of 1’s, the parity bit is 1; if they contain an even number of 1’s, the parity bit is 0. The total number of 1’s, including the parity bit, is therefore always even.

Example: 1. Sequence 01101000 has an odd number of 1’s, so the parity bit is 1: 011010001 2. Sequence 11101110 has an even number of 1’s, so the parity bit is 0: 111011100 • Stable-storage writing policy: protects against the disk failure known as media decay, in which data written over a sector cannot be read back correctly. Sectors are paired; each pair represents one sector-contents X and has a left copy Xl and a right copy Xr. To write X, write it to Xl and read it back to check the parity; if the check fails, retry, substituting a spare sector for Xl if necessary, until a good value is returned. Then do the same for Xr.
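
The parity rule can be illustrated with a short Python sketch. It assumes the bit sequences are given as strings of '0' and '1' and uses even parity as described above; the helper names are made up for this example.

```python
def parity_bit(bits: str) -> str:
    """Return '1' if the data bits hold an odd number of 1's, else '0'."""
    return "1" if bits.count("1") % 2 else "0"

def with_parity(bits: str) -> str:
    """Append the parity bit so the total number of 1's is even."""
    return bits + parity_bit(bits)

def looks_good(sector_bits: str) -> bool:
    """A sector (data plus parity bit) passes the check if its 1's are even in number."""
    return sector_bits.count("1") % 2 == 0

print(with_parity("01101000"))
print(with_parity("11101110"))
print(looks_good("011010000"))  # one bit flipped, so the check fails
```

A single parity bit only detects an odd number of flipped bits; an even number of flips goes unnoticed, which is why real checksums use more bits.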

Failures: if one of Xl and Xr fails, X can be read from the other. X is unreadable only if both fail, and the probability of that is very small. • Write failure: if the power fails • 1. while writing Xl, then Xr remains good and X can be read from Xr • 2. after writing Xl, then X can be read from Xl, since Xr may or may not hold the correct copy of X. Recovery from Disk Crashes: • To reduce data loss from disk crashes, we can apply schemes that involve redundancy, extending the idea of parity checks or duplicated sectors.

The term used for these strategies is RAID, or Redundant Arrays of Independent Disks. • Mirroring: the mirroring scheme is referred to as RAID level 1 protection against data loss. In this scheme we mirror each disk. One of the disks is called the data disk and the other the redundant disk. The only way data can be lost is if there is a second disk crash while the first crash is being repaired. • Parity blocks: the RAID level 4 scheme uses only one redundant disk no matter how many data disks there are. On the redundant disk, the ith block consists of the parity checks for the ith blocks of all the data disks. That is, the jth bits of the ith blocks of all the disks, data and redundant together, must contain an even number of 1’s, and the redundant disk's bit is chosen to make this condition true.
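
The RAID level 4 parity block, and its use to rebuild a crashed disk, can be sketched as a bitwise modulo-2 sum (XOR). The three one-byte "blocks" below are illustrative values, not data from the slides, and `xor_blocks` is a hypothetical helper name.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bitwise modulo-2 sum of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Illustrative contents of the same block number on three data disks.
data_disks = [bytes([0b11110000]), bytes([0b10101010]), bytes([0b00111000])]

# The redundant disk stores the parity of the data disks.
parity = xor_blocks(data_disks)
print(f"{parity[0]:08b}")

# If the second data disk crashes, its block is the modulo-2 sum of all survivors.
recovered = xor_blocks([data_disks[0], data_disks[2], parity])
print(recovered == data_disks[1])
```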

Parity Block: Writing • When we write a new block of a data disk, we must change the corresponding block of the redundant disk as well. • One approach is to read the corresponding blocks of the other data disks, compute the modulo-2 sum, and write it to the redundant disk. With n data disks this requires n-1 reads of data blocks, one write of the data block, and one write of the redundant disk block. Total = n+1 disk I/Os

RAID 5 • Whatever scheme we use for updating the disks, we need to read and write the redundant disk's block. If there are n data disks, then the number of disk writes to the redundant disk will be n times the average number of writes to any one data disk. • However, we do not have to treat one disk as the redundant disk and the others as data disks. Rather, we could treat each disk as the redundant disk for some of the blocks. This improvement is often called RAID level 5.

Coping With Multiple Disk Crashes • RAID 4 is effective in preserving data unless there are two simultaneous disk crashes. • The theory of error-correcting codes, in particular the Hamming code, leads to RAID level 6. • With this strategy two simultaneous crashes are correctable. With data disks 1 to 4 and redundant disks 5 to 7: the bits of disk 5 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 3. The bits of disk 6 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 4. The bits of disk 7 are the modulo-2 sum of the corresponding bits of disks 1, 3, and 4. • Reading/Writing • We may read data from any data disk normally. • To write a block of some data disk, we compute the modulo-2 sum of the new and old versions of that block. These bits are then added, in a modulo-2 sum, to the corresponding blocks of all those redundant disks that have 1 in a row in which the written disk also has 1 (that is, every redundant disk whose sum includes the written disk).
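
The seven-disk scheme described in these slides can be sketched as follows. This is a minimal illustration under stated assumptions: each disk is reduced to a single integer "block", the parity groups are taken from the three modulo-2 sums quoted in the slides, and the function names and sample values are hypothetical.

```python
from functools import reduce

# Each group lists the disks whose bits must sum to 0 modulo 2
# (disk 5 = 1+2+3, disk 6 = 1+2+4, disk 7 = 1+3+4).
GROUPS = [{1, 2, 3, 5}, {1, 2, 4, 6}, {1, 3, 4, 7}]

def xor_all(values) -> int:
    return reduce(lambda a, b: a ^ b, values, 0)

def build_array(data: dict[int, int]) -> dict[int, int]:
    """data maps data disks 1-4 to a block value; returns all 7 disks."""
    disks = dict(data)
    for group in GROUPS:
        redundant = max(group)  # 5, 6 or 7
        disks[redundant] = xor_all(disks[d] for d in group if d != redundant)
    return disks

def write_block(disks: dict[int, int], disk: int, new_value: int) -> None:
    """Write a data disk and patch every redundant disk whose sum includes it."""
    if disk not in (1, 2, 3, 4):
        raise ValueError("only data disks 1-4 are written directly")
    delta = disks[disk] ^ new_value  # modulo-2 sum of old and new versions
    disks[disk] = new_value
    for group in GROUPS:
        if disk in group:
            disks[max(group)] ^= delta

def recover(disks: dict[int, int], failed: set[int]) -> dict[int, int]:
    """Rebuild crashed disks using any group that is missing exactly one disk."""
    alive = {d: v for d, v in disks.items() if d not in failed}
    missing = set(failed)
    while missing:
        for group in GROUPS:
            lost = group & missing
            if len(lost) == 1:
                d = lost.pop()
                alive[d] = xor_all(alive[x] for x in group if x != d)
                missing.discard(d)
                break
        else:
            raise ValueError("too many simultaneous crashes to recover")
    return alive

array = build_array({1: 0b11110000, 2: 0b10101010, 3: 0b00111000, 4: 0b01000001})
write_block(array, 2, 0b00001111)
print(recover(array, {1, 2}) == array)
```

Recovery works for any two crashed disks because no two disks belong to exactly the same set of groups, so some group always contains only one of them.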

13.5 Arranging Data on Disk • Data elements are represented as records, which are stored in consecutive bytes of the same disk block. • Basic layout technique for storing data: fixed-length records • Allocation criterion: data should start at a word boundary. • A fixed-length record header contains: 1. A pointer to the record schema. 2. The length of the record. 3. Timestamps indicating when the record was last modified or last read.

Example
    CREATE TABLE employee(
        name CHAR(30) PRIMARY KEY,
        address VARCHAR(255),
        gender CHAR(1),
        birthdate DATE
    );
The record should start at a word boundary and contain a header followed by the four fields name, address, gender and birthdate, each also starting at a word boundary.
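
A rough sketch of how such a record could be laid out. Every size here is an assumption made for illustration and is not stated in the slides: 4-byte words, a 12-byte header, one byte per character, VARCHAR(255) stored at a fixed 256 bytes, and DATE stored as a 10-character string.

```python
WORD = 4  # ASSUMED word size in bytes

def align(offset: int, word: int = WORD) -> int:
    """Round an offset up to the next word boundary."""
    return (offset + word - 1) // word * word

def layout(fields: list[tuple[str, int]], header_bytes: int = 12):
    """Return ({field: offset}, total record length) with word-aligned fields."""
    offsets = {}
    pos = align(header_bytes)
    for name, size in fields:
        offsets[name] = pos
        pos = align(pos + size)
    return offsets, pos

# ASSUMED field sizes for the employee table above.
EMPLOYEE = [("name", 30), ("address", 256), ("gender", 1), ("birthdate", 10)]

offsets, record_len = layout(EMPLOYEE)
print(offsets)
print(record_len)
```

Under these assumptions the fields alone need 297 bytes, but alignment and the header bring the record to 316 bytes.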

Packing Fixed-Length Records into Blocks • Records are stored in blocks on the disk and are moved into main memory when we need to access or update them. • A block header is written first, and it is followed by a series of records. • The block header contains the following information: • Links to one or more blocks that are part of a network of blocks. • Information about the role played by this block in such a network. • Information about the relation that the tuples in this block belong to.

• A "directory" giving the offset of each record in the block. • Timestamp(s) indicating the time of the block's last modification and/or access. • After the header we pack as many records as will fit in one block, as shown in the figure; the remaining space is unused.
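
How many fixed-length records fit in a block is simple integer arithmetic. The numbers below (a 4096-byte block, a 12-byte block header, 316-byte records) are hypothetical values chosen for illustration.

```python
def records_per_block(block_bytes: int, header_bytes: int, record_bytes: int):
    """Return (number of whole records that fit, unused bytes left over)."""
    usable = block_bytes - header_bytes
    count = usable // record_bytes
    return count, usable - count * record_bytes

print(records_per_block(4096, 12, 316))
```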

13.6 Representing Block and Record Addresses • Address of a block and of a record • In main memory • The address of a block is the virtual memory address of its first byte • The address of a record within the block is the virtual memory address of the first byte of the record • In secondary memory: a sequence of bytes describes the location of the block in the overall system, e.g. the device ID for the disk, the cylinder number, etc.

Addresses in Client-Server Systems • The addresses in address space are represented in two ways • Physical Addresses: byte strings that determine the place within the secondary storage system where the record can be found. • Logical Addresses: arbitrary string of bytes of some fixed length • Physical Address bits are used to indicate: • Host to which the storage is attached • Identifier for the disk • Number of the cylinder • Number of the track • Offset of the beginning of the record

Map table: relates logical addresses to physical addresses. [Figure: map table with two columns, Logical Address and Physical Address]
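
A map table can be sketched as a dictionary from logical to physical addresses. The class and field names below are hypothetical; the physical address fields follow the list on the previous slide (host, disk, cylinder, track, offset).

```python
from typing import NamedTuple

class PhysicalAddress(NamedTuple):
    host: str      # host to which the storage is attached
    disk: int      # identifier for the disk
    cylinder: int
    track: int
    offset: int    # offset of the beginning of the record

class MapTable:
    """Relates fixed-length logical addresses to physical addresses."""

    def __init__(self) -> None:
        self._entries: dict[bytes, PhysicalAddress] = {}

    def insert(self, logical: bytes, physical: PhysicalAddress) -> None:
        self._entries[logical] = physical

    def lookup(self, logical: bytes) -> PhysicalAddress:
        return self._entries[logical]

    def move(self, logical: bytes, new_physical: PhysicalAddress) -> None:
        """Moving a record changes only its map-table entry, not pointers to it."""
        if logical not in self._entries:
            raise KeyError(logical)
        self._entries[logical] = new_physical

table = MapTable()
table.insert(b"\x00\x01", PhysicalAddress("hostA", 0, 120, 3, 512))
table.move(b"\x00\x01", PhysicalAddress("hostA", 1, 7, 0, 64))
print(table.lookup(b"\x00\x01"))
```

The indirection costs one extra lookup per access, which is the price paid for being able to move records freely.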

Logical and Structured Addresses • What is the purpose of a logical address? It gives more flexibility when we • Move the record around within the block • Move the record to another block • It also gives us a choice of what to do when a record is deleted. Pointer Swizzling • Pointers are common in object-relational database systems, so it is important to understand how pointers are managed • Every data item (block, record, etc.) has two addresses: • database address: its address on the disk • memory address: its address in virtual memory, if the item is currently in memory

Translation table: maps database addresses to memory addresses • All addressable items in the database have entries in the map table, while only those items currently in memory are mentioned in the translation table [Figure: translation table with two columns, Database Address and Memory Address]

A pointer consists of the following two fields • A bit indicating the type of address • The database or memory address • Example 13.17 [Figure: Memory and Disk panels; Block 1 appears in memory and on disk, Block 2 on disk only; one pointer is labeled Swizzled and one Unswizzled]

Example 13.17 • Block 1 has a record with pointers to a second record on the same block and to a record on another block • If Block 1 is copied to memory • The first pointer, which points within Block 1, can be swizzled so that it points directly to the memory address of the target record • Since Block 2 is not in memory, we cannot swizzle the second pointer • Three types of swizzling • Automatic swizzling • As soon as a block is brought into memory, swizzle all relevant pointers.

Swizzling on demand • Only swizzle a pointer if and when it is actually followed. • No swizzling • Pointers are not swizzled; items are accessed using the database address. • Unswizzling • When a block is moved from memory back to disk, all its pointers must go back to database (disk) addresses • Use the translation table again • It is important to have an efficient data structure for the translation table

Pinned Records and Blocks • A block in memory is said to be pinned if it cannot be written back to disk safely. • If block B1 has a swizzled pointer to an item in block B2, then B2 is pinned • To unpin a block, we must unswizzle any pointers to it • Keep in the translation table the places in memory holding swizzled pointers to that item • Unswizzle those pointers (use the translation table to replace the memory addresses with database (disk) addresses)
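
Swizzling on demand and unswizzling can be sketched with a two-way translation table. This is an illustration only: addresses are plain integers, and the class and function names are made up for the example rather than taken from any real DBMS.

```python
from dataclasses import dataclass

@dataclass
class Pointer:
    swizzled: bool   # the bit indicating the type of address
    address: int     # memory address if swizzled, else database address

class TranslationTable:
    """Tracks items currently in memory, in both directions."""

    def __init__(self) -> None:
        self.db_to_mem: dict[int, int] = {}
        self.mem_to_db: dict[int, int] = {}

    def load(self, db_addr: int, mem_addr: int) -> None:
        self.db_to_mem[db_addr] = mem_addr
        self.mem_to_db[mem_addr] = db_addr

def follow(ptr: Pointer, table: TranslationTable):
    """Swizzle on demand; return the memory address, or None if not in memory."""
    if ptr.swizzled:
        return ptr.address
    mem_addr = table.db_to_mem.get(ptr.address)
    if mem_addr is not None:
        ptr.swizzled, ptr.address = True, mem_addr
    return mem_addr

def unswizzle(ptr: Pointer, table: TranslationTable) -> None:
    """Restore the database address before the block goes back to disk."""
    if ptr.swizzled:
        ptr.address = table.mem_to_db[ptr.address]
        ptr.swizzled = False

table = TranslationTable()
table.load(db_addr=100, mem_addr=5000)       # target record on Block 1 is in memory
inside, outside = Pointer(False, 100), Pointer(False, 200)
print(follow(inside, table), inside)          # swizzled to the memory address
print(follow(outside, table), outside)        # Block 2 not in memory: left unswizzled
unswizzle(inside, table)
print(inside)
```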

13.7 Records With Variable-Length Fields A simple but effective scheme is to put all fixed length fields ahead of the variable-length fields. We then place in the record header: 1. The length of the record. 2. Pointers to (i.e., offsets of) the beginnings of all the variable-length fields. However, if the variable-length fields always appear in the same order then the first of them needs no pointer; we know it immediately follows the fixed-length fields.
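
A sketch of this layout using Python's `struct` module. The concrete encoding is an assumption made for illustration: header entries are unsigned 16-bit little-endian integers, and the first variable-length field gets no pointer because it starts right after the fixed-length part.

```python
import struct

def pack_record(fixed: bytes, var_fields: list[bytes]) -> bytes:
    """Header = record length, then one offset per variable field after the first."""
    n_ptrs = max(len(var_fields) - 1, 0)
    header_size = 2 * (1 + n_ptrs)
    offsets = []
    pos = header_size + len(fixed)
    for i, field in enumerate(var_fields):
        if i > 0:
            offsets.append(pos)
        pos += len(field)
    header = struct.pack(f"<{1 + n_ptrs}H", pos, *offsets)
    return header + fixed + b"".join(var_fields)

def unpack_var_fields(record: bytes, fixed_len: int, n_var: int) -> list[bytes]:
    """Recover the variable-length fields from the header information."""
    n_ptrs = max(n_var - 1, 0)
    header_size = 2 * (1 + n_ptrs)
    length, *offsets = struct.unpack_from(f"<{1 + n_ptrs}H", record)
    starts = [header_size + fixed_len] + offsets if n_var else []
    ends = starts[1:] + [length]
    return [record[s:e] for s, e in zip(starts, ends)]

fixed_part = b"F" + b"1990-01-01"                 # gender + birthdate
record = pack_record(fixed_part, [b"Ann Lee", b"12 Main St"])
print(len(record))
print(unpack_var_fields(record, len(fixed_part), 2))
```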

Records With Repeating Fields • A similar situation occurs if a record contains a variable number of occurrences of a field F, but the field itself is of fixed length. It is sufficient to group all occurrences of field F together and put in the record header a pointer to the first. • We can locate all the occurrences of the field F as follows. Let the number of bytes devoted to one instance of field F be L. We then add to the offset for the field F all integer multiples of L, starting at 0, then L, 2L, 3L, and so on. • Eventually we reach the offset of the field following F, whereupon we stop.
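
The offset computation described above amounts to stepping by L from the first occurrence until the next field begins. A one-function sketch, with illustrative numbers:

```python
def occurrence_offsets(first_offset: int, next_field_offset: int, field_len: int) -> list[int]:
    """Offsets of every occurrence of F: first, first + L, first + 2L, ..."""
    return list(range(first_offset, next_field_offset, field_len))

# F starts at byte 40, each occurrence is 8 bytes, the next field starts at byte 64.
print(occurrence_offsets(40, 64, 8))
```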

An alternative representation is to keep the record of fixed length, and put the variable length portion - be it fields of variable length or fields that repeat an indefinite number of times - on a separate block. In the record itself we keep: • 1. Pointers to the place where each repeating field begins, and • 2. Either how many repetitions there are, or where the repetitions end.

Variable-Format Records • The simplest representation of variable-format records is a sequence of tagged fields, each of which consists of: 1. Information about the role of this field, such as: (a) The attribute or field name, (b) The type of the field, if it is not apparent from the field name and some readily available schema information, and (c) The length of the field, if it is not apparent from the type. 2. The value of the field. There are at least two reasons why tagged fields would make sense.

1. Information integration applications - Sometimes a relation has been constructed from several earlier sources, and these sources have different kinds of information. For instance, our movie star information may have come from several sources, some of which record birthdates and others not, some of which give addresses and others not, and so on. If there are not too many fields, we are probably best off leaving NULL those values we do not know. 2. Records with a very flexible schema - If many fields of a record can repeat and/or not appear at all, then even if we know the schema, tagged fields may be useful. For instance, medical records may contain information about many tests, but there are thousands of possible tests, and each patient has results for relatively few of them.
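
A record built from tagged fields might be encoded as in the sketch below. The byte layout is an assumption chosen for illustration (a one-byte name length, the field name, a one-byte type code, a two-byte value length, then the value); the slides do not prescribe any particular encoding.

```python
import struct

TYPE_STRING, TYPE_INT = b"S", b"I"   # ASSUMED one-byte type codes

def encode_field(name: str, value) -> bytes:
    """Tag = field name + type + length, followed by the value itself."""
    if isinstance(value, int):
        type_code, payload = TYPE_INT, struct.pack("<q", value)
    else:
        type_code, payload = TYPE_STRING, value.encode("utf-8")
    name_bytes = name.encode("utf-8")
    return (struct.pack("<B", len(name_bytes)) + name_bytes + type_code
            + struct.pack("<H", len(payload)) + payload)

def encode_record(fields: dict) -> bytes:
    return b"".join(encode_field(k, v) for k, v in fields.items())

def decode_record(data: bytes) -> dict:
    fields, pos = {}, 0
    while pos < len(data):
        name_len = data[pos]
        pos += 1
        name = data[pos:pos + name_len].decode("utf-8")
        pos += name_len
        type_code = data[pos:pos + 1]
        pos += 1
        (length,) = struct.unpack_from("<H", data, pos)
        pos += 2
        payload = data[pos:pos + length]
        pos += length
        if type_code == TYPE_INT:
            fields[name] = struct.unpack("<q", payload)[0]
        else:
            fields[name] = payload.decode("utf-8")
    return fields

# Only the fields that are actually known are stored.
star = {"name": "A. Star", "birthdate": "1970-01-01", "tests_run": 3}
print(decode_record(encode_record(star)))
```

Absent fields cost no space at all, at the price of repeating the tag in every record that does carry the field.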

Values that are too large to fit in a block often have a variable length, but even if the length is fixed for all values of the type, we need to use some special techniques to represent these values. In this section we shall consider a technique called "spanned records" that can be used to manage records that are larger than blocks. • Spanned records are also useful in situations where records are smaller than blocks, but packing whole records into blocks wastes significant amounts of space. For both these reasons, it is sometimes desirable to allow records to be split across two or more blocks. The portion of a record that appears in one block is called a record fragment. If records can be spanned, then every record and record fragment requires some extra header information:
