Section 13.3

Secondary storage management. Section 13.3. Eilbroun Benjamin CS 257 – Dr. TY Lin. Presentation Outline. 13.3 Accelerating Access to Secondary Storage 13.3.1 The I/O Model of Computation 13.3.2 Organizing Data by Cylinders 13.3.3 Using Multiple Disks 13.3.4 Mirroring Disks

Presentation Transcript


  1. Secondary storage management Section 13.3 Eilbroun Benjamin CS 257 – Dr. TY Lin

  2. Presentation Outline • 13.3 Accelerating Access to Secondary Storage • 13.3.1 The I/O Model of Computation • 13.3.2 Organizing Data by Cylinders • 13.3.3 Using Multiple Disks • 13.3.4 Mirroring Disks • 13.3.5 Disk Scheduling and the Elevator Algorithm • 13.3.6 Prefetching and Large-Scale Buffering

  3. 13.3 Accelerating Access to Secondary Storage • Several approaches for accessing data in secondary storage more efficiently: • Place blocks that are accessed together on the same cylinder. • Divide the data among multiple disks. • Mirror disks. • Use disk-scheduling algorithms. • Prefetch blocks into main memory. • Scheduling latency – the added delay in accessing data caused by a disk-scheduling algorithm. • Throughput – the number of disk accesses per second that the system can accommodate.

  4. 13.3.1 The I/O Model of Computation • The number of block accesses (disk I/Os) is a good approximation of an algorithm's running time. • This number should be minimized. • Ex 13.3: You want to have an index on R that identifies the block on which the desired tuple appears, but not where on the block it resides. • For the Megatron 747 (M747) example, it takes about 11 ms to read a 16 KB block. • A standard microprocessor can execute millions of instructions in 11 ms, making any delay in searching the block for the desired tuple negligible.

  5. 13.3.2 Organizing Data by Cylinders • If we read all blocks on a single track or cylinder consecutively, then we can neglect all but the first seek time and the first rotational latency. • Ex 13.4: We request 1024 blocks of the M747. • If the data is randomly distributed, the average time per block is 10.76 ms by Ex 13.2, making the total about 11 s. • If all blocks are stored consecutively on one cylinder: • 6.46 ms (one average seek) + 8.33 ms (time per rotation) × 16 (rotations) ≈ 139.7 ms
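
The arithmetic in Ex 13.4 can be checked with a few lines of Python. This is only a sketch: the constants are the figures quoted on the slide, and the function names are made up for illustration.

```python
# Timing figures quoted on the slide for the Megatron 747 example.
AVG_RANDOM_BLOCK_MS = 10.76  # average time to read one randomly placed block (Ex 13.2)
AVG_SEEK_MS = 6.46           # one average seek
ROTATION_MS = 8.33           # time for one full rotation
NUM_BLOCKS = 1024
ROTATIONS_NEEDED = 16        # rotations to read all 1024 blocks from one cylinder


def random_read_ms(num_blocks: int) -> float:
    """Every block pays the full average access time."""
    return num_blocks * AVG_RANDOM_BLOCK_MS


def cylinder_read_ms(rotations: int) -> float:
    """One seek up front, then only rotation time."""
    return AVG_SEEK_MS + ROTATION_MS * rotations


random_ms = random_read_ms(NUM_BLOCKS)
cylinder_ms = cylinder_read_ms(ROTATIONS_NEEDED)
print(f"randomly placed blocks: {random_ms / 1000:.2f} s")
print(f"blocks on one cylinder: {cylinder_ms:.2f} ms")
print(f"speed-up: about {random_ms / cylinder_ms:.0f}x")
```

The point of the comparison is the ratio: storing the blocks together replaces roughly a thousand seeks with one.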

  6. RAID Overview • Stands for Redundant Array of Inexpensive Disks. • RAID is a way of using multiple disk drives and controllers to increase read/write speed or redundancy. • Hardware RAID is controlled by special hardware. Operating system is unaware of special disk configuration. • Software RAID is controlled by operating system. Less expensive than Hardware RAID, but often slower. • Appears as a single volume to the operating system.

  7. RAID 0 (Figure: three-disk RAID 0. Each stack is a separate disk; each color is a different file.) • Disk striping without parity • Does not offer redundancy, but does improve read/write speed. • Different parts of a file are written to different disks at the same time. • This significantly improves write time. The more disks in the RAID 0 array, the faster it is. • The drawback is that if one disk in a RAID 0 array fails, all array data is lost.

  8. RAID 1 (Figure: RAID 1. The second disk is an exact copy of the first.) • Disk Mirroring and Duplexing • Disk mirroring: two disks, one controller • Disk duplexing: two disks, two controllers • Duplexing is more fault tolerant than mirroring, as failure of the controller in mirroring means loss of the volume. • All data on the first disk is mirrored on the second disk. • Data written to one disk is automatically written to the second.

  9. RAID 1 (con’t) (Figure: RAID 1. The second disk is an exact copy of the first.) • Data deleted from the first disk is automatically deleted from the second. • When one disk fails, the other continues operating without loss of data. With hot-swappable drives, you could then replace the failed disk and the RAID 1 volume would automatically recreate the mirror. • Expensive because it requires twice the physical disk storage space. A 1000 GB RAID 1 volume made from 200 GB disks would require 10 disks.

  10. RAID 5 (Figure legend: parity, data.) • Disk striping with parity • Minimum of 3 disks • Parity information is distributed across all disks. If one disk fails, its data can be rebuilt on a new disk using the data and parity information stored on the other disks in the set. • Faster than RAID 1, as data is read from and written to multiple disks at the same time. • Slower than RAID 0, as parity information must be generated and written. • Requires one extra physical disk. A 1000 GB RAID 5 volume made out of 200 GB disks would require 6 disks.
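
A minimal sketch of how parity-based recovery and the disk counts on the RAID slides work out. The slides only say "parity information"; the byte-wise XOR used here is the usual choice for RAID 5 but is an assumption as far as this deck goes, and the block contents, `xor_blocks`, and `disks_needed` are hypothetical names for illustration. A real RAID 5 array also rotates the parity block across disks from stripe to stripe, which this single-stripe example does not show.

```python
import math
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def disks_needed(level: int, volume_gb: int, disk_gb: int) -> int:
    """Number of physical disks for a volume, following the slides' examples."""
    data_disks = math.ceil(volume_gb / disk_gb)
    if level == 0:
        return data_disks            # striping only, no redundancy
    if level == 1:
        return 2 * data_disks        # every disk has a mirror
    if level == 5:
        return max(data_disks + 1, 3)  # one extra disk's worth of parity
    raise ValueError(f"unsupported RAID level: {level}")


# One stripe: three hypothetical 4-byte data blocks plus their parity block.
data = [b"disk", b"data", b"demo"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails: XOR the survivors with the parity.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered == data[1])

print(disks_needed(1, 1000, 200), disks_needed(5, 1000, 200))
```

Recovery works because XOR-ing a value in twice cancels it out, so combining the surviving blocks with the parity leaves exactly the missing block.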

  11. 13.3.3 Using Multiple Disks • If we have n disks, read/write performance can increase by up to a factor of n. • Striping – distributing the blocks R_1, R_2, … of a relation R across n disks following this pattern: • Disk 1 holds R_1, R_{1+n}, R_{1+2n}, … • Disk 2 holds R_2, R_{2+n}, R_{2+2n}, … • … • Disk n holds R_n, R_{2n}, R_{3n}, … • Ex 13.5: We request 1024 blocks with n = 4. • 6.46 ms (one average seek) + 8.33 ms (time per rotation) × 16/4 (rotations) ≈ 39.8 ms
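
The striping pattern and the Ex 13.5 arithmetic can be sketched as follows. Blocks are numbered from 1 as on the slide; the function names are illustrative, and the timing model simply assumes the n disks work fully in parallel.

```python
AVG_SEEK_MS = 6.46   # one average seek (figure from the slide)
ROTATION_MS = 8.33   # time per rotation (figure from the slide)


def disk_for_block(i: int, n: int) -> int:
    """Disk (1..n) that holds block R_i under round-robin striping."""
    return (i - 1) % n + 1


def blocks_on_disk(d: int, n: int, total_blocks: int) -> list:
    """Block numbers stored on disk d: d, d+n, d+2n, ..."""
    return list(range(d, total_blocks + 1, n))


def striped_read_ms(rotations_on_one_disk: int, n: int) -> float:
    """Each disk seeks once, then needs only 1/n of the rotations."""
    return AVG_SEEK_MS + ROTATION_MS * (rotations_on_one_disk / n)


print(blocks_on_disk(1, 4, 12))   # blocks on disk 1 of 4
print(disk_for_block(8, 4))       # which disk holds R_8
print(f"{striped_read_ms(16, 4):.2f} ms")
```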

  12. 13.3.4 Mirroring Disks • Mirroring disks – having 2 or more disks hold identical copies of the data. • Benefit 1: If n disks are mirrors of each other, the system can survive the failure of up to n-1 disks. • Benefit 2: If we have n disks, read performance can increase by up to a factor of n. • Performance increases further if the controller selects, for each read, the disk whose head is closest to the desired data block.
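
The last bullet, choosing the mirror whose head is nearest, might look like this in outline. It is a sketch under simplifying assumptions: head positions and targets are cylinder numbers, every mirror is idle, and `pick_mirror` and `read_from_mirrors` are hypothetical names.

```python
def pick_mirror(head_positions, target_cylinder):
    """Index of the mirror whose head is closest to the target cylinder."""
    return min(
        range(len(head_positions)),
        key=lambda d: abs(head_positions[d] - target_cylinder),
    )


def read_from_mirrors(head_positions, target_cylinder):
    """Serve a read from the nearest mirror and record its new head position."""
    chosen = pick_mirror(head_positions, target_cylinder)
    head_positions[chosen] = target_cylinder
    return chosen


heads = [8000, 48000]                      # two mirrored disks
print(read_from_mirrors(heads, 40000))     # served by the second disk (index 1)
print(read_from_mirrors(heads, 16000))     # served by the first disk (index 0)
print(heads)
```

Writes get no such benefit, since every mirror must be updated.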

  13. 13.3.5 Disk Scheduling and the Elevator Algorithm • The disk controller runs this algorithm to select which of several requests to process first. • Pseudocode:
        requests[]   // array of all non-processed data requests
        upon receiving new data request:
            requests[].add(new request)
        while (requests[] is not empty):
            move head to next location
            if (head location is at data in requests[]):
                retrieve data
                remove data from requests[]
            if (head reaches end):
                reverse head direction
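
The pseudocode above can be turned into a small runnable sketch. Two simplifications are assumptions of this sketch rather than part of the slide: the set of pending requests is fixed (no new arrivals during the sweep), and the head reverses as soon as no request lies ahead of it instead of travelling all the way to the end of the disk. Requests are cylinder numbers, and duplicate requests for the same cylinder are served once.

```python
def elevator_order(pending, head, direction=1):
    """Order in which a fixed set of cylinder requests is served.

    direction is +1 (toward higher cylinders) or -1 (toward lower ones).
    """
    remaining = sorted(set(pending))
    order = []
    while remaining:
        if direction > 0:
            ahead = [c for c in remaining if c >= head]
        else:
            ahead = [c for c in remaining if c <= head]
        if not ahead:
            direction = -direction   # nothing left this way: reverse
            continue
        # Serve the nearest request in the current direction of travel.
        nxt = min(ahead) if direction > 0 else max(ahead)
        order.append(nxt)
        remaining.remove(nxt)
        head = nxt
    return order


print(elevator_order([8000, 24000, 56000, 16000], head=20000, direction=1))
```

With the head at 20000 and moving upward, the sweep serves 24000 and 56000 first, then reverses for 16000 and 8000.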

  14. 13.3.5 Disk Scheduling and the Elevator Algorithm (con’t) • Events, in order: • Head starting point • Request data at 8000 • Request data at 24000 • Request data at 56000 • Get data at 8000 • Request data at 16000 • Get data at 24000 • Request data at 64000 • Get data at 56000 • Request data at 40000 • Get data at 64000 • Get data at 40000 • Get data at 16000 • (Diagram: cylinder axis running from 8000 to 64000 in steps of 8000.)

  15. 13.3.5 Disk Scheduling and the Elevator Algorithm (con’t) (Figure: two panels, titled "Elevator Algorithm" and "FIFO Algorithm".)
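
To see how the elevator and FIFO policies differ on the request sequence from slide 14, here is a small timed simulation. The slides give only the order of events, so every timing number below is an illustrative assumption: the arrival times, the seek model (1 ms plus 1 ms per 4000 cylinders travelled), the fixed 4.3 ms for rotational latency plus transfer, and the head starting at cylinder 8000 moving toward higher cylinders.

```python
SERVICE_MS = 4.3  # assumed rotational latency + transfer time per request

# (arrival time in ms, cylinder), listed in arrival order; times are assumed.
ARRIVALS = [(0.0, 8000), (0.0, 24000), (0.0, 56000),
            (10.0, 16000), (20.0, 64000), (30.0, 40000)]


def seek_ms(distance):
    """Assumed seek model: no cost if the head is already in place."""
    return 0.0 if distance == 0 else 1.0 + distance / 4000.0


def simulate(policy, arrivals, head=8000):
    """Return [(cylinder, completion time in ms)] for 'elevator' or 'fifo'."""
    future = list(arrivals)
    pending, done = [], []
    now, direction = 0.0, 1
    while future or pending:
        # Admit every request that has arrived by the current time.
        while future and future[0][0] <= now:
            pending.append(future.pop(0)[1])
        if not pending:
            now = future[0][0]       # idle until the next arrival
            continue
        if policy == "fifo":
            target = pending[0]
        else:
            ahead = [c for c in pending if (c - head) * direction >= 0]
            if not ahead:            # nothing ahead: reverse the sweep
                direction = -direction
                ahead = pending
            target = min(ahead, key=lambda c: abs(c - head))
        now += seek_ms(abs(target - head)) + SERVICE_MS
        head = target
        pending.remove(target)
        done.append((target, round(now, 1)))
    return done


for policy in ("elevator", "fifo"):
    print(policy, simulate(policy, ARRIVALS))
```

Under these assumptions the elevator run serves the requests in the same order as the "Get data" events on slide 14 (8000, 24000, 56000, 64000, 40000, 16000), while FIFO serves them in arrival order and spends more time seeking back and forth.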

  16. 13.3.6 Prefetching and Large-Scale Buffering • If, at the application level, we can predict the order in which blocks will be requested, we can load them into main memory before they are needed.
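
One way to picture prefetching is a reader that keeps a few block reads in flight while the application processes earlier blocks. This is a rough sketch, not how a DBMS buffer manager is actually built: `read_block` is a hypothetical stand-in for a disk read, and `depth` (how many blocks to fetch ahead) is an assumed tuning parameter.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def read_block(block_id):
    """Hypothetical stand-in for a (slow) disk read."""
    return f"contents of block {block_id}"


def prefetching_reader(block_ids, depth=2):
    """Yield blocks in the given order, keeping up to `depth` reads ahead."""
    with ThreadPoolExecutor(max_workers=depth) as pool:
        in_flight = deque()
        for block_id in block_ids:
            # Start the read now; the caller may not need it for a while.
            in_flight.append(pool.submit(read_block, block_id))
            if len(in_flight) > depth:
                yield in_flight.popleft().result()
        while in_flight:
            yield in_flight.popleft().result()


# The application knows it will need blocks 7, 3, 9, 4 in this order.
for block in prefetching_reader([7, 3, 9, 4]):
    print(block)
```

The blocks still arrive in the predicted order; the gain is that the reads for later blocks overlap with the processing of earlier ones.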

  17. Questions
