Disk Failures Xiaqing He ID: 204 Dr. Lin
Content
1) Focus: how to recover from disk crashes. The common term is RAID, "Redundant Array of Independent Disks".
2) Several schemes to recover from disk crashes:
• Mirroring: RAID level 1;
• Parity checks: RAID level 4;
• An improvement: RAID level 5;
• Multiple disk crashes: RAID level 6;
1) Mirroring
• The simplest scheme for recovering from disk crashes
• How does mirroring work?
-- make two or more copies of the data on different disks
• Benefits:
-- the data survives if one disk fails;
-- reads can be divided among the disks, so several blocks can be accessed at once
1) Mirroring (con’t)
• With mirroring, when can data be lost?
-- only if the second (mirror/redundant) disk crashes while the first (data) disk is still being replaced.
• Probability. Suppose:
-- one disk: mean time to failure = 10 years;
-- one of the two disks fails, on average, once every 5 years;
-- replacing the failed disk takes 3 hours = 1/2,920 year;
So:
-- the probability that the mirror disk fails during the replacement = 1/10 * 1/2,920 = 1/29,200;
-- the rate of data loss with mirroring = 1/5 * 1/29,200 = 1/146,000 per year, i.e. a mean time to data loss of 146,000 years
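The arithmetic on this slide can be checked with a short sketch. The function name and parameters are illustrative only, and the model is the simplified one used on the slide (independent failures, a fixed replacement time), not a full reliability analysis.

```python
from fractions import Fraction

HOURS_PER_YEAR = 365 * 24  # 8,760 hours, so 3 hours = 1/2,920 year


def mean_years_to_data_loss(mttf_years, repair_hours):
    """Mean time to data loss for a mirrored pair, using the slide's model."""
    # One of the two disks fails, on average, twice per MTTF period.
    first_failure_rate = Fraction(2, mttf_years)
    # Fraction of a year needed to replace the failed disk.
    repair_years = Fraction(repair_hours, HOURS_PER_YEAR)
    # Probability that the surviving disk also fails during the replacement.
    p_second_failure = Fraction(1, mttf_years) * repair_years
    # Data-loss events per year, inverted to get the mean time between them.
    return 1 / (first_failure_rate * p_second_failure)


print(mean_years_to_data_loss(10, 3))  # 146000
```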
2) Parity Blocks
• Why change?
-- disadvantage of mirroring: it needs as many redundant disks as data disks
• What’s new?
-- RAID level 4 uses only one redundant disk, no matter how many data disks there are
• How does this one redundant disk work?
-- modulo-2 sum;
-- the jth bit of the redundant disk is the modulo-2 sum of the jth bits of all the data disks.
• Example
2) Parity Blocks (con’t)_example
Data disks:
• Disk 1: 11110000
• Disk 2: 10101010
• Disk 3: 00111000
Redundant disk:
• Disk 4: 01100010
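A minimal sketch of the modulo-2 sum, treating each block as a bit string as on the slide. The helper name `parity_block` is an assumption made for illustration; a modulo-2 sum of bits is the same as a bitwise XOR.

```python
from functools import reduce
from operator import xor


def parity_block(blocks):
    """Return the modulo-2 sum (bitwise XOR) of equal-length bit strings."""
    width = len(blocks[0])
    value = reduce(xor, (int(block, 2) for block in blocks))
    # Re-pad with leading zeros so the result has the same width as the inputs.
    return format(value, f"0{width}b")


data_disks = ["11110000", "10101010", "00111000"]
print(parity_block(data_disks))  # 01100010
```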
2) RAID 4 (con’t)
• Reading
-- the same as reading a block from any single disk;
• Writing
1) change the block on the data disk;
2) change the corresponding block of the redundant disk;
• Why?
-- the redundant disk must keep holding the parity checks for the corresponding blocks of all the data disks
2) RAID 4 (con’t)_writing
For a total of N data disks:
1) naïve way:
• read the corresponding blocks of all N data disks and compute their modulo-2 sum;
• rewrite the block of the redundant disk with that modulo-2 sum;
2) better way:
• take the modulo-2 sum of the old and new versions of the data block being rewritten;
• flip the bits of the redundant block at the positions where that sum has 1’s;
2) RAID 4 (con’t)_writing_example
• Data disks:
• Disk 1: 11110000
• Disk 2: 10101010, rewritten as 01100110
• Disk 3: 00111000
• To do:
• Modulo-2 sum of the old and new versions of disk 2: 11001100
• So we need to flip positions 1, 2, 5, 6 of the redundant disk.
• Redundant disk:
• Disk 4: 01100010, which becomes 10101110
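The "better way" of updating the redundant disk can be sketched as follows, using the values from this example. The function name `update_parity` is hypothetical; the point is that only the old block, the new block, and the old parity are needed, not the other data disks.

```python
def update_parity(old_parity, old_block, new_block):
    """Return (change mask, new parity) after one data block is rewritten."""
    width = len(old_parity)
    # Positions where the data block changed (modulo-2 sum of old and new).
    change = int(old_block, 2) ^ int(new_block, 2)
    # Flip exactly those positions in the redundant block.
    new_parity = int(old_parity, 2) ^ change
    return format(change, f"0{width}b"), format(new_parity, f"0{width}b")


change, parity = update_parity("01100010", "10101010", "01100110")
print(change)  # 11001100
print(parity)  # 10101110
```

The result agrees with recomputing the parity from scratch over 11110000, 01100110, and 00111000.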
2) RAID 4 (con’t)_failure recovery
• The redundant disk crashes:
-- swap in a new disk and recompute its contents from all the data disks;
• One of the data disks crashes:
-- swap in a new disk;
-- recompute its contents from all the other disks, i.e. the remaining data disks and the redundant disk;
• How to recompute? (the rule is the same in both cases, which is what makes the later improvement possible)
-- take the modulo-2 sum of the corresponding bits of all the other disks
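As an illustration of the recovery rule, the sketch below rebuilds any one missing disk from the others, using the three data disks and one redundant disk of the earlier parity example. The dictionary layout and the name `rebuild_disk` are assumptions for the sketch.

```python
def rebuild_disk(disks, failed):
    """Rebuild the block of disk `failed` as the XOR of all the other disks."""
    width = len(next(iter(disks.values())))
    value = 0
    for number, block in disks.items():
        if number != failed:  # the failed disk's own contents are not used
            value ^= int(block, 2)
    return format(value, f"0{width}b")


# Disks 1-3 hold data, disk 4 holds the parity block.
disks = {1: "11110000", 2: "10101010", 3: "00111000", 4: "01100010"}
print(rebuild_disk(disks, failed=2))  # 10101010 (a data disk)
print(rebuild_disk(disks, failed=4))  # 01100010 (the redundant disk)
```

The same function handles both cases, which is the observation RAID 5 builds on.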
3) An Improvement: RAID 5
• Why is an improvement needed?
-- shortcoming of RAID level 4: the redundant disk is a bottleneck, because every update of any data disk must also read and write the redundant disk;
• Principle of RAID level 5 (RAID 5):
-- treat each disk as the redundant disk for some of the blocks;
• Why is this feasible?
-- the failure-recovery rule is the same for the redundant disk and for a data disk: “take the modulo-2 sum of the corresponding bits of all the other disks”;
-- so there is no need to treat one disk as the redundant disk and the others as data disks
3) RAID 5 (con’t)
• How do we decide for which blocks a given disk acts as the redundant disk?
-- if there are n+1 disks, numbered 0 to n, we can treat the ith cylinder of disk j as redundant if j is the remainder when i is divided by n+1;
• Example;
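3) RAID 5 (con’t)_example
n = 3 (four disks, numbered 0 to 3). Cylinders for which each disk is the redundant disk:
• The first disk, numbered 0: 4, 8, 12, …;
• The second disk, numbered 1: 1, 5, 9, …;
• The third disk, numbered 2: 2, 6, 10, …;
• ……….
Suppose all 4 disks are equally likely to be written. The probability that a given disk is involved in a write is:
• 1/4 (it holds the block being written) + 3/4 * 1/3 (another disk is written and this disk is its redundant disk) = 1/2
• In general, with m disks: 1/m + (m-1)/m * 1/(m-1) = 2/m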
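The placement rule and the load calculation can be sketched together. Both function names are illustrative, and `write_share` simply evaluates the formula from this slide for a given total number of disks.

```python
from fractions import Fraction


def redundant_disk(cylinder, num_disks):
    """Disk (numbered from 0) that holds the parity for the given cylinder."""
    return cylinder % num_disks


def write_share(num_disks):
    """Probability that a given disk takes part in a write (slide formula)."""
    m = num_disks
    # Either the disk holds the written block, or another disk is written
    # and this disk happens to be the redundant disk for that cylinder.
    return Fraction(1, m) + Fraction(m - 1, m) * Fraction(1, m - 1)


print([redundant_disk(i, 4) for i in range(1, 9)])  # [1, 2, 3, 0, 1, 2, 3, 0]
print(write_share(4))  # 1/2
```

By comparison, the single redundant disk of RAID 4 takes part in every write.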
4) Coping with multiple disk crashes
• RAID 6: can deal with any number of disk crashes, provided enough redundant disks are used
• Example: a system of seven disks (four data disks, numbered 1-4, and three redundant disks, numbered 5-7);
• How to set up the 3*7 matrix? (why 3 rows? because there are 3 redundant disks)
1) every column is a different combination of three 0’s and 1’s, excluding the column of all 0’s;
2) the column of each redundant disk has a single 1;
3) the column of each data disk has at least two 1’s;
4) Coping with multiple disk crashes (con’t)
• Reading:
• read from the data disks and ignore the redundant disks
• Writing:
• change the data disk
• change the corresponding bits of every redundant disk that has a 1 in a row where the written data disk also has a 1
4) Coping with multiple disk crashes (con’t)
• In a system with 4 data disks and 3 redundant disks, how can up to 2 disk crashes be corrected?
• Suppose disks a and b have failed:
• find some row r of the 3*7 matrix in which the columns for a and b differ (suppose a has 0 and b has 1);
• compute the correct b by taking the modulo-2 sum of the corresponding bits of all the disks other than b that have 1 in row r;
• after recovering b, compute the correct a from the other disks, which are now all available;
• Example
4) Coping with multiple disk crashes (con’t)_example
3*7 matrix
                data disks    redundant disks
disk number     1  2  3  4    5  6  7
row 1           1  1  1  0    1  0  0
row 2           1  1  0  1    0  1  0
row 3           1  0  1  1    0  0  1
4) Coping with multiple disk crashes (con’t)_example
First block of all the disks
disk contents
1) 11110000
2) 10101010
3) 00111000
4) 01000001
5) 01100010
6) 00011011
7) 10001001
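The contents of the redundant disks 5-7 follow from the data disks and the rows of the matrix. In the sketch below the matrix values are an assumption: they are chosen to satisfy the three construction rules and to reproduce the disk contents listed above. The name `redundant_blocks` is likewise hypothetical.

```python
# Assumed 3*7 matrix: columns are disks 1-7, one row per redundant disk.
MATRIX = [
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1],
]


def redundant_blocks(matrix, data, width=8):
    """Compute each redundant disk as the XOR of the data disks in its row."""
    result = {}
    for row in matrix:
        members = [disk for disk, bit in enumerate(row, start=1) if bit]
        # Exactly one member of each row is a redundant disk (not in `data`).
        redundant = next(disk for disk in members if disk not in data)
        value = 0
        for disk in members:
            if disk in data:
                value ^= int(data[disk], 2)
        result[redundant] = format(value, f"0{width}b")
    return result


data = {1: "11110000", 2: "10101010", 3: "00111000", 4: "01000001"}
print(redundant_blocks(MATRIX, data))
# {5: '01100010', 6: '00011011', 7: '10001001'}
```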
4) Coping with multiple disk crashes (con’t)_example
Two disks crash (disks 2 and 5);
disk contents
1) 11110000
2) ????????
3) 00111000
4) 01000001
5) ????????
6) 00011011
7) 10001001
4) Coping with multiple disk crashes (con’t)_example
In the 3*7 matrix, disks 2 and 5 have different values in row 2: disk 2 has 1 and disk 5 has 0. So:
• compute the first block of disk 2 as the modulo-2 sum of the corresponding bits of disks 1, 4, 6;
• then compute the first block of disk 5 as the modulo-2 sum of the corresponding bits of disks 1, 2, 3;
disk contents
1) 11110000
2) ???????? => 10101010
3) 00111000
4) 01000001
5) ???????? => 01100010
6) 00011011
7) 10001001
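The two-step recovery can be sketched in general form. The matrix below is an assumption, chosen to be consistent with the disk contents and the row-2 relationship used in this example. The helper names `xor_row` and `recover_two` are hypothetical.

```python
# Assumed 3*7 matrix: columns are disks 1-7, rows are the three parity groups.
MATRIX = [
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1],
]


def xor_row(row, disks, target, width):
    """XOR of all disks with a 1 in `row`, except `target` itself."""
    value = 0
    for disk, bit in enumerate(row, start=1):
        if bit and disk != target:
            value ^= int(disks[disk], 2)
    return format(value, f"0{width}b")


def recover_two(matrix, surviving, a, b, width=8):
    """Recover failed disks a and b from the surviving disks."""
    disks = dict(surviving)
    for first, second in ((a, b), (b, a)):
        for row in matrix:
            # Need a row that contains `first` but not `second`.
            if row[first - 1] == 1 and row[second - 1] == 0:
                disks[first] = xor_row(row, disks, first, width)
                # With `first` restored, any row containing `second` works.
                row2 = next(r for r in matrix if r[second - 1] == 1)
                disks[second] = xor_row(row2, disks, second, width)
                return disks
    raise ValueError("no row distinguishes the two failed disks")


surviving = {1: "11110000", 3: "00111000", 4: "01000001",
             6: "00011011", 7: "10001001"}
result = recover_two(MATRIX, surviving, 2, 5)
print(result[2])  # 10101010
print(result[5])  # 01100010
```

Because all columns of the matrix are distinct, some row always separates any two failed disks, so the search in `recover_two` succeeds for every pair.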