1 / 28

Data Protection: RAID

Section 1 : Storage System. Data Protection: RAID. Chapter 3. Why RAID. Performance limitation of disk drive An individual drive has a certain life expectancy Measured in MTBF (Mean Time Between Failure)

arnold
Download Presentation

Data Protection: RAID

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Section 1 : Storage System Data Protection: RAID Chapter 3

  2. Why RAID • Performance limitation of disk drive • An individual drive has a certain life expectancy • Measured in MTBF (Mean Time Between Failure) • The more the number of HDDs in a storage array, the larger the probability for disk failure. For example: • If the MTBF of a drive is 750,000 hours, and there are 100 drives in the array, then the MTBF of the array becomes 750,000 / 100, or 7,500 hours • RAID was introduced to mitigate this problem • RAID provides: • Increase capacity • Higher availability • Increased performance

  3. Chapter objectives After completing this chapter, you will be able to: • Describe what is RAID and the needs it addresses • Describe the concepts upon which RAID is built • Define and compare RAID levels • Recommend the use of the common RAID levels based on performance and availability considerations • Explain factors impacting disk drive performance

  4. Host RAID Array Components Physical Array Logical Array RAIDController Hard Disks RAID Array

  5. RAID-redundant array of inexpensive disks • It is an enclosure that contains a number of HDDs • HDD inside RAID contained in smaller sub –enclosure called physical array=hold a fixed num of HDD • A subset of disks within a RAID array can be grouped to form logical associations called logical array.

  6. RAID Implementations • Hardware (usually a specialized disk controller card) • Controls all drives attached to it • Array(s) appear to host operating system as a regular disk drive • Provided with administrative software • Software • Runs as part of the operating system • Performance is dependent on CPU workload • Does not support all RAID levels

  7. RAID levels • Are defined on the basis of striping, mirroring and parity • These techniques determine the data availability and performance characteristics of an array.

  8. RAID Levels • 0 Striped array with no fault tolerance • 1 Disk mirroring • Nested RAID (i.e., 1 + 0, 0 + 1, etc.) • 3 Parallel access array with dedicated parity disk • 4 Striped array with independent disks and a dedicated parity disk • 5 Striped array with independent disks and distributed parity • 6 Striped array with independent disks and dual distributed parity

  9. striping • The process of dividing a body of data into blocks and spreading the data blocks across several partitions on several HD • A RAID is a group of disks • Within each disk, a predefined num of contiguous addressable disk blocks are defined as strips • The set of aligned strips that spans across all the disks with the RAID set is called stripe

  10. Strip Stripe Stripe Data Organization: Striping Strip 1 Strip 2 Strip 3 Stripe 1 Stripe 2 Strips

  11. mirroring • Techniques whereby data is stored on two different HDD-have two copies of data. • Involved duplication of data • Expensive • Improved read performance • But not the write performance A A B B

  12. Parity • Method used to protect striped data from HDD failures without the cost of mirroring. • Additional HDD is added to the stripe width to hold parity. • Mathematical construct that allow re-creation of the missing data. • A redundancy check that ensure a full protection of data without maintaining a full set of duplicate data. • The even number =0 • The odd number = 1

  13. Parity-example 0 1 0 1 0 0 1 1 1 1

  14. RAID 0 • Data is distributed across the HDDs in the RAID set. • Allows multiple data to be read or written simultaneously, and therefore improves performance. • Does not provide data protection and availability in the event of disk failures.

  15. RAIDController Host RAID 0 0 1 5 9 2 6 10 3 7 11

  16. RAID 1 • Data is stored on two different HDDs, yielding two copies of the same data. • Provides availability. • In the event of HDD failure, access to data is still available from the surviving HDD. • When the failed disk is replaced with a new one, data is automatically copied from the surviving disk to the new disk. • Done automatically by RAID the controller. • Disadvantage: The amount of storage capacity is twice the amount of data stored. • Mirroring is NOT the same as doing backup!

  17. Block 1 Block 0 Block 1 Block 1 Block 0 Block 0 RAID 1 RAIDController Host

  18. Nested RAID • Combines the performance benefits of RAID 0 with the redundancy benefit of RAID 1. • RAID 0+1 – Mirrored Stripe • Data is striped across HDDs, then the entire stripe is mirrored. • If one drive fails, the entire stripe is faulted. • Rebuild operation requires data to be copied from each disk in the healthy stripe, causing increased load on the surviving disks. • RAID 1+0 – Striped Mirror • Data is first mirrored, and then both copies are striped across multiple HDDs. • When a drive fails, data is still accessible from its mirror. • Rebuild operation only requires data to be copied from the surviving disk into the replacement disk.

  19. RAID 3 and RAID 4 • Stripes data for high performance and uses parity for improved fault tolerance. • One drive is dedicated for parity information. • If a drive files, data can be reconstructed using data in the parity drive. • For RAID 3, data read / write is done across the entire stripe. • Provide good bandwidth for large sequential data access such as video streaming. • For RAID 4, data read/write can be independently on single disk.

  20. RAID 5 and RAID 6 • RAID 5 is similar to RAID 4, except that the parity is distributed across all disks instead of stored on a dedicated disk. • This overcomes the write bottleneck on the parity disk. • RAID 6 is similar to RAID 5, except that it includes a second parity element to allow survival in the event of two disk failures. • The probability for this to happen increases and the number of drives in the array increases. • Calculates both horizontal parity (as in RAID 5) and diagonal parity. • Has more write penalty than in RAID 5. • Rebuild operation may take longer than on RAID 5.

  21. RAID Comparison

  22. RAIDController Hot Spares

  23. Hot spare • Refer to a spare HDD in a RAID array that temporary replaces a failed HDD of a RAID set. • One of the following method of data recovery is performed depending on the RAID implementation: • If parity RAID is used, then the data is rebuild on the HS from the parity and data on the surviving HDDs in the RAID set. • If mirroring is used then the data from the surviving mirror is used to copy the data. • When the failed HDD is replaced with a new HDD, one of the following takes place; • The HS replace the new HDD permanently – no longer HS, need to configured new HS • When a new HDD is added to the system, data from the HS is copied to it. • The HS returns to idle state

  24. RAID penalty • I/O operations =R/W • Read operation =no penalty since we can read many times • Write operation= have penalty – change the data , every write operation translates into more I/O overhead for the disks=known as write penalty. • Example 1: in RAID 1, write has to be performed to disks-data has to be written twice –once on each disks (RAID 1 has 2 disks-mirror set} • Therefore the write penalty =2 • Example 2: RAID 5 , have write penalty =4 since data has to be performed 4 times to disks- read existing data, read parity, write new data, write new parity!

  25. Continue.. • RAID level 6 has 6 write penalty…why? • Why we need to know this? • To calculate the disk load in different types of RAID • Example 1: • we have an application that generates 5,200 input output per second (IOPS), with 60% of them being read. The disk load in RAID 5 is calculated as below: = 0.6 x 5200 + 4 X (0.4 x 5200) = 3120 + 4 X 2080 =3120+8320 = 11,440 IOPS

  26. Continue… • Example 2: the disk load in RAID 1 is calculated below: = 0.6 X 5200 + 2 x( 0.4 x 5200) = 3120 + 2 x 2080 = 3120+4160 = 7280 IOPS From these calculation, we can determine the number of disk required for an applications. Example, an HDD with a specification of a maximum 180 IOPS for the application needs to be used, the number of disks required to meet the workload for the RAID configuration will be: RAID 5: 11,440/180 =64 disks RAID 1 : 7280/180 = 42 disk ( approx to the nearest even number).

  27. Chapter Summary Key points covered in this chapter: • What RAID is and the needs it addresses • The concepts upon which RAID is built • Some commonly implemented RAID levels

  28. #1 IT company For more information visit http://education.EMC.com

More Related