
Data Protection: RAID


rowland

Presentation Transcript


  1. Data Protection: RAID Chapter 3

  2. Chapter Objective After completing this chapter, you will be able to: • Describe what RAID is and the needs it addresses • Describe the concepts upon which RAID is built • Define and compare RAID levels • Recommend the use of the common RAID levels based on performance and availability considerations • Explain factors impacting disk drive performance Data Protection: RAID

  3. Why RAID • Because of its mechanical components, a disk drive offers limited performance • An individual drive has a certain life expectancy, measured in Mean Time Between Failures (MTBF) • Example: if the MTBF of a drive is 750,000 hours and there are 1000 drives in the array, then the MTBF of the array becomes 750,000 / 1000, or 750 hours • RAID was introduced to mitigate this problem • RAID provides: • Increased capacity • Higher availability • Increased performance
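The MTBF arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `array_mtbf_hours` is made up here, and the division assumes identical drives that fail independently, which is the simplification the slide itself uses.

```python
def array_mtbf_hours(drive_mtbf_hours: float, drive_count: int) -> float:
    """Approximate array MTBF, assuming identical drives that fail independently."""
    if drive_count <= 0:
        raise ValueError("drive_count must be positive")
    return drive_mtbf_hours / drive_count


# The slide's example: 1000 drives rated at 750,000 hours each.
print(array_mtbf_hours(750_000, 1000))  # 750.0
```

With 1000 drives, some drive in the array is expected to fail roughly every 750 hours, which is about once a month.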

  4. RAID Array Components • RAID is a technique that combines multiple disk drives into a logical unit (RAID set) and provides protection, performance, or both. [Figure: a host connected to a RAID array made up of a RAID controller and hard disks, organized into physical arrays and logical arrays]

  5. RAID Implementations • Hardware (usually a specialized disk controller card) • Controls all drives attached to it • Array(s) appear to host operating system as a regular disk drive • Provided with administrative software • Software • Runs as part of the operating system • Performance is dependent on CPU workload • Does not support all RAID levels Data Protection: RAID

  6. RAID Techniques • Three key techniques used for RAID are: • Striping • Mirroring • Parity Module 3: Data Protection - RAID

  7. Data Organization: Striping • Within each disk, a predefined number of contiguously addressable disk blocks is defined as a strip. • The set of aligned strips that spans all the disks within a RAID set is called a stripe. • The data is broken down into blocks and each block is written to a separate disk drive. • Strip size describes the number of blocks in a strip. • Note that all strips in a stripe have the same number of blocks. • Stripe size is the strip size multiplied by the number of HDDs in the RAID set.

  8. Data Organization: Striping [Figure: Stripe 1 and Stripe 2 laid out across three disks; Strip 1 = 64 KB, Strip 2 = 64 KB, Strip 3 = 64 KB, so each stripe = 192 KB]

  9. RAID Technique – Striping [Figure: a host and RAID controller writing strips across the disks of a RAID set; the aligned strips form a stripe]
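The striping slides can be illustrated with a small address-mapping sketch. It assumes a simple round-robin layout in which consecutive strips go to consecutive disks. The names `locate_block` and `stripe_size_kb` are hypothetical, and the value of 128 blocks per strip assumes 512-byte blocks in a 64 KB strip, which the slides do not state.

```python
def stripe_size_kb(strip_size_kb: int, disk_count: int) -> int:
    """Stripe size = strip size x number of disks in the RAID set."""
    return strip_size_kb * disk_count


def locate_block(logical_block: int, blocks_per_strip: int, disk_count: int):
    """Map a logical block to (disk index, stripe index, offset within the strip).

    Assumes strips are handed out round-robin across the disks.
    """
    strip_number = logical_block // blocks_per_strip  # strip counted across the whole set
    offset = logical_block % blocks_per_strip         # position inside that strip
    disk = strip_number % disk_count                  # round-robin choice of disk
    stripe = strip_number // disk_count               # row of aligned strips
    return disk, stripe, offset


print(stripe_size_kb(64, 3))      # 192, as in the figure
print(locate_block(0, 128, 3))    # (0, 0, 0)
print(locate_block(128, 128, 3))  # (1, 0, 0)
print(locate_block(400, 128, 3))  # (0, 1, 16)
```

Block 400 falls in the fourth strip, which wraps around to the first disk and starts the second stripe.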

  10. Data Organization: Mirroring • Data is stored on two different disks, so each disk of the mirrored pair holds a complete copy. • When the failed disk is replaced with a new disk, the controller copies the data from the surviving disk of the mirrored pair. • Advantages: a) provides data protection b) improves read performance • Disadvantages: a) expensive b) write performance deteriorates

  11. RAID Technique – Mirroring [Figure: a host sends Block 0 to the RAID controller, which writes Block 0 to both disks of the mirrored pair]
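As a rough illustration of the mirroring behavior described above, the following toy model keeps two in-memory "disks" as dictionaries. The class `MirroredPair` and its method names are invented for this sketch and do not correspond to any real controller interface.

```python
class MirroredPair:
    """Toy model of a mirrored pair; each disk maps block number -> data."""

    def __init__(self):
        self.disks = [{}, {}]

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to each working disk, hence the two-write cost.
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # A read can be served by any working disk.
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise RuntimeError("both disks of the mirrored pair have failed")

    def fail(self, index: int) -> None:
        self.disks[index] = None

    def replace(self, index: int) -> None:
        # The new disk is populated by copying from the surviving disk.
        survivor = self.disks[1 - index]
        if survivor is None:
            raise RuntimeError("no surviving disk to copy from")
        self.disks[index] = dict(survivor)


pair = MirroredPair()
pair.write(0, b"Block 0")
pair.fail(0)
print(pair.read(0))  # data is still readable from the surviving disk
pair.replace(0)
print(pair.disks[0] == pair.disks[1])  # True
```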

  12. Data Organization: Parity • Method of protecting striped data from HDD failure without the cost of mirroring • An additional disk is added to the stripe width to hold parity. • Calculation of parity is a function of the RAID controller. • Advantages: reduces cost • Disadvantages: parity is recalculated every time there is a change in data.

  13. RAID Technique – Parity [Figure: host and RAID controller; data disks D1 = 4, D2 = 6, D3 = 1, D4 = 7, and parity disk P = 18] • Actual parity calculation is a bitwise XOR operation

  14. Data Recovery in Parity Technique • Regeneration of data when drive D3 fails: 4 + 6 + ? + 7 = 18, so ? = 18 – 4 – 6 – 7 = 1 [Figure: host and RAID controller; D1 = 4, D2 = 6, D3 = ?, D4 = 7, P = 18]
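The slides use addition to illustrate parity but note that the actual calculation is a bitwise XOR. A minimal sketch of the XOR version, reusing the slide's data values 4, 6, 1, and 7, might look like this. The function names `parity` and `rebuild` are illustrative only, and with XOR the parity value differs from the additive 18 shown in the figure.

```python
from functools import reduce
from operator import xor


def parity(strips):
    """XOR all data strips together to produce the parity strip."""
    return reduce(xor, strips, 0)


def rebuild(surviving_strips, parity_value):
    """Regenerate the single missing strip from the survivors and the parity."""
    return reduce(xor, surviving_strips, parity_value)


data = [4, 6, 1, 7]             # D1..D4 from the slide
p = parity(data)
print(p)                        # 4 (the XOR parity, not the additive 18)
print(rebuild([4, 6, 7], p))    # 1, the value that was on the failed D3
```

XOR is its own inverse, so the same operation both generates parity and regenerates lost data.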

  15. RAID Levels • RAID 0: Striped array with no fault tolerance • RAID 1: Disk mirroring • Nested RAID (e.g., 1+0, 0+1) • RAID 3: Parallel access array with dedicated parity disk • RAID 4: Striped array with independent disks and a dedicated parity disk • RAID 5: Striped array with independent disks and distributed parity • RAID 6: Striped array with independent disks and dual distributed parity

  16. RAID 0 [Figure: a host and RAID controller striping numbered data blocks across the disks of the RAID set, with no redundancy]

  17. RAID 1 [Figure: a host and RAID controller writing Block 0 and Block 1 to both disks of a mirrored pair]

  18. Nested RAID – 0+1 (Striping and Mirroring) [Figure: host and RAID controller; Blocks 0 to 5 are striped across one set of disks (RAID 0) and the striped set is mirrored to a second set (RAID 1)]

  19. Nested RAID – 0+1 (Striping and Mirroring) [Figure, continued: host, RAID controller, and the RAID 0 and RAID 1 layers, showing Blocks 1, 2, 4, and 5]

  20. Nested RAID – 1+0 (Mirroring and Striping) [Figure: host and RAID controller; Blocks 0 to 5 are written to mirrored pairs (RAID 1) and striped across the pairs (RAID 0)]

  21. Nested RAID – 1+0 (Mirroring and Striping) [Figure, continued: host, RAID controller, and the RAID 1 and RAID 0 layers, showing Blocks 1, 2, 4, and 5]

  22. RAID Redundancy: Parity • Parity calculation: 4 + 6 + 1 + 7 = 18 • The middle drive fails: 4 + 6 + ? + 7 = 18, so ? = 18 – 4 – 6 – 7 = 1 [Figure: host and RAID controller; data disks holding numbered blocks and a parity disk holding 18]

  23. RAID 3 [Figure: host and RAID controller; parity is generated for Blocks 0 to 3 and written as Block P (P 0 1 2 3) to a dedicated parity disk]

  24. RAID 4 [Figure: host and RAID controller; Blocks 0 to 3 and Blocks 4 to 7 form two stripes, and the parity generated for each stripe (P 0 1 2 3 and P 4 5 6 7) is held on a dedicated parity disk]

  25. RAID 5 [Figure: host and RAID controller; Blocks 0 to 3 and Blocks 4 to 7 form two stripes, and the parity generated for each stripe (P 0 1 2 3 and P 4 5 6 7) is distributed across the disks]
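The difference between dedicated parity (RAID 3 and 4) and distributed parity (RAID 5) can be sketched as a layout generator. The rotation rule used here, with parity starting on the last disk and moving one disk to the left per stripe, is an assumption made for illustration. Real controllers use various rotation schemes, and the slides do not specify one. The function name `raid5_layout` is hypothetical.

```python
def raid5_layout(stripe_count: int, disk_count: int):
    """Return one row per stripe; 'P' marks the parity strip, numbers are data blocks.

    Assumed rotation: parity starts on the last disk and shifts left each stripe.
    """
    rows = []
    next_block = 0
    for stripe in range(stripe_count):
        parity_disk = (disk_count - 1 - stripe) % disk_count
        row = []
        for disk in range(disk_count):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(str(next_block))
                next_block += 1
        rows.append(row)
    return rows


for row in raid5_layout(2, 5):
    print(" ".join(row))
# 0 1 2 3 P
# 4 5 6 P 7
```

Because the parity strip lands on a different disk in each stripe, parity writes are spread over all disks instead of queuing on a single parity disk.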

  26. RAID 6 – Dual Parity RAID • Two disk failures in a RAID set lead to data unavailability and data loss in single-parity schemes, such as RAID 3, 4, and 5 • An increasing number of drives in an array and increasing drive capacity lead to a higher probability of two disks failing in a RAID set • RAID 6 protects against two disk failures by maintaining two parities • Horizontal parity, which is the same as RAID 5 parity • Diagonal parity, which is calculated by taking diagonal sets of data blocks from the RAID set members • Even-Odd and Reed-Solomon are two commonly used algorithms for calculating parity in RAID 6

  27. RAID Comparison [Comparison table not captured in the transcript]

  28. Suitable RAID Levels for Different Applications • RAID 1+0 • Suitable for applications with a small, random, and write-intensive (writes typically greater than 30%) I/O profile • Example: OLTP, RDBMS – Temp space • RAID 3 • Large, sequential reads and writes • Example: data backup and multimedia streaming • RAID 5 and 6 • Small, random workload (writes typically less than 30%) • Example: email, RDBMS – Data entry

  29. RAID Impacts on Performance • Cp new = Cp old – C4 old + C4 new • In RAID 5, every write (update) to a disk manifests as four I/O operations (2 disk reads and 2 disk writes) • In RAID 6, every write (update) to a disk manifests as six I/O operations (3 disk reads and 3 disk writes) • In RAID 1, every write manifests as two I/O operations (2 disk writes) [Figure: RAID controller and a RAID 5 set with stripes A1 A2 A3 A4 AP, B1 B2 B3 BP B4, and C1 C2 CP C3 C4]

  30. RAID Impacts on Performance • Small (less than element size) write on RAID 5 • Ep = E1 + E2 + E3 + E4 (XOR operations) • If parity is valid, then: Ep new = Ep old – E4 old + E4 new (XOR operations) • 2 disk reads and 2 disk writes • Parity vs. Mirroring • Reading, calculating, and writing the parity segment introduces a penalty to every write operation • Parity RAID penalty manifests due to slower cache flushes • Increased write load can cause contention and slower read response times [Figure: RAID controller computing Ep new from Ep old, E4 old, and E4 new with 2 XOR operations; disks D1, D2, D3, D4, and P0]
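The small-write parity update on this slide can be checked with a short sketch. Since the "+" and "–" in the slide's formula both stand for XOR, the new parity is the old parity XORed with the old and new data. The element values below are arbitrary examples and the function name `small_write_parity` is hypothetical.

```python
from functools import reduce
from operator import xor


def small_write_parity(old_parity: int, old_data: int, new_data: int) -> int:
    """Ep new = Ep old XOR E4 old XOR E4 new (two reads, then two writes)."""
    return old_parity ^ old_data ^ new_data


elements = [0b1010, 0b0110, 0b0001, 0b0111]  # E1..E4, arbitrary example values
ep_old = reduce(xor, elements)               # Ep = E1 xor E2 xor E3 xor E4
e4_new = 0b1100

ep_new = small_write_parity(ep_old, elements[3], e4_new)
ep_full = reduce(xor, elements[:3] + [e4_new])  # recomputed from the whole stripe

print(ep_new == ep_full)  # True
```

The shortcut needs only the old data element and the old parity, which is why a small write costs two disk reads and two disk writes regardless of how many disks are in the stripe.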

  31. RAID Penalty Exercise • Total IOPS at peak workload is 1200 • Read/write ratio is 2:1 • Calculate the IOPS requirement at peak activity for • RAID 1/0 • RAID 5

  32. Solution: RAID Penalty • For RAID 1/0, the disk load (read + write) = (1200 x 2/3) + (1200 x (1/3) x 2) = 800 + 800 = 1600 IOPS • For RAID 5, the disk load (read + write) = (1200 x 2/3) + (1200 x (1/3) x 4) = 800 + 1600 = 2400 IOPS
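The penalty calculation in this solution generalizes to a small helper. This is a sketch rather than a sizing tool: the write penalties of 2, 4, and 6 are the ones given on the earlier performance slide for RAID 1, RAID 5, and RAID 6, the penalty of 2 for RAID 1/0 follows the solution above, and the names `WRITE_PENALTY` and `disk_iops` are invented for illustration.

```python
# Disk I/Os generated per host write, as stated on the slides.
WRITE_PENALTY = {"RAID 1/0": 2, "RAID 5": 4, "RAID 6": 6}


def disk_iops(host_iops: float, read_parts: int, write_parts: int, raid_level: str) -> float:
    """Disk load = reads + writes x write penalty, for a read:write ratio."""
    total_parts = read_parts + write_parts
    reads = host_iops * read_parts / total_parts
    writes = host_iops * write_parts / total_parts
    return reads + writes * WRITE_PENALTY[raid_level]


# The exercise: 1200 IOPS at peak with a 2:1 read/write ratio.
print(disk_iops(1200, 2, 1, "RAID 1/0"))  # 1600.0
print(disk_iops(1200, 2, 1, "RAID 5"))    # 2400.0
```

The same workload on RAID 6 would come to 800 + 400 x 6 = 3200 disk IOPS under these assumptions.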

  33. Hot Spares • A hot spare is a spare HDD in a RAID array that temporarily replaces a failed HDD. • If parity RAID is used, then the data is rebuilt onto the hot spare from the parity and the data on the surviving HDDs in the RAID set. • If mirroring is used, then the data is copied to the hot spare from the surviving mirror.

  34. Hot Spares • When the failed HDD is replaced with a new HDD, one of the following takes place: • The hot spare permanently takes the place of the failed HDD. • When the new HDD is added, data from the hot spare is copied to it. • A hot spare can be configured as automatic or user-initiated.

  35. Hot Spare [Figure: RAID controller with a failed disk and a hot spare; the hot spare stands in until the failed disk is replaced]

  36. Chapter Summary Key points covered in this chapter: • What RAID is and the needs it addresses • The concepts upon which RAID is built • Some commonly implemented RAID levels

  37. Exercise 1: RAID • A company is planning to reconfigure storage for its accounting application for high availability • Current configuration and challenges • The application performs 15% random writes and 85% random reads • Currently deployed on a five-disk RAID 0 configuration • Each disk has an advertised formatted capacity of 200 GB • Total size of the accounting application's data is 730 GB, which is unlikely to change over 6 months • The end of the financial year is approaching, so buying even one disk is not possible • Task • Recommend a RAID level that the company can use to restructure its environment and fulfill its needs • Justify your choice based on cost, performance, and availability

  38. Exercise 2: RAID • A company (the same as in Exercise 1) is now planning to reconfigure storage for its database application for high availability (HA) • Current configuration and challenges • The application performs 40% writes and 60% reads • Currently deployed on a six-disk RAID 0 configuration, with an advertised capacity of 200 GB per disk • Size of the database is 900 GB, and the amount of data is likely to change by 30% over the next 6 months • It is a new financial year and the company has an increased budget • Task • Recommend a suitable RAID level to fulfill the company's needs • Estimate the cost of the new solution (a 200 GB disk costs $1000) • Justify your choice based on cost, performance, and availability
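When working through the two exercises, it can help to compare usable capacity across RAID levels. The helper below is a sketch under standard assumptions that the slides do not spell out: mirroring halves the raw capacity, single parity costs one disk, and dual parity costs two. The function name `usable_capacity_gb` and the sample figures (eight 100 GB disks) are illustrative and deliberately differ from the exercise values.

```python
def usable_capacity_gb(raid_level: str, disk_count: int, disk_gb: int) -> int:
    """Usable capacity of a RAID set built from equally sized disks."""
    if raid_level == "0":
        return disk_count * disk_gb            # no redundancy
    if raid_level in ("1+0", "0+1"):
        if disk_count < 4 or disk_count % 2:
            raise ValueError("nested RAID needs an even number of disks, at least 4")
        return (disk_count // 2) * disk_gb     # half the disks hold mirror copies
    if raid_level in ("3", "4", "5"):
        if disk_count < 3:
            raise ValueError("single-parity RAID needs at least 3 disks")
        return (disk_count - 1) * disk_gb      # one disk's worth of parity
    if raid_level == "6":
        if disk_count < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (disk_count - 2) * disk_gb      # two disks' worth of parity
    raise ValueError(f"unsupported RAID level: {raid_level}")


for level in ("0", "1+0", "5", "6"):
    print(level, usable_capacity_gb(level, 8, 100))
# 0 800
# 1+0 400
# 5 700
# 6 600
```

Capacity is only one input to the exercises. The read/write mix and the write penalty of each level also bear on the recommendation.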

  39. Check Your Knowledge • What is a RAID array? • What benefits do RAID arrays provide? • What methods can be used to provide higher data availability in a RAID array? • What is the primary difference between RAID 3 and RAID 5? • What is the advantage of using RAID 6? • What is a hot spare?
