Lecture 24 Disk IO and RAID

CS 15-447: Computer Architecture
Lecture 24: Disk IO and RAID
November 12, 2007
Nael Abu-Ghazaleh
naelag@cmu.edu
http://www.qatar.cmu.edu/~msakr/15447-f08

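Interfacing Processor with peripherals
[Figure: processor with L1 instruction cache, L1 data cache, L2 cache, and bus interface; the front side bus (aka system bus) connects the bus interface to the I/O bridge; the memory bus connects the I/O bridge to main memory; the I/O bridge also leads to I/O]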

Another view

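Disk Access
• Seek: position the head over the proper track (5 to 15 ms average)
• Rotate: wait for the desired sector to pass under the head (on average half a rotation, i.e. 0.5 / RPM minutes); drives currently spin at 5,400 to 15,000 RPM
• Transfer: read or write the data (30 to 100 MBytes/sec)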
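
The three components above add up to the time for one disk access. A minimal sketch of that arithmetic follows; the function name and the 4 KB request size are assumptions for illustration, and the seek time, RPM, and transfer rate are simply values picked from the ranges on the slide.

```python
def average_access_ms(seek_ms, rpm, transfer_mb_per_s, request_kb):
    """Average time for one request: seek + half a rotation + transfer."""
    # Half a rotation on average: 0.5 rev / (rpm rev per minute), in ms.
    rotational_ms = 0.5 / rpm * 60_000
    # Transfer time at the sustained rate (taking 1 MB = 1000 KB).
    transfer_ms = request_kb / (transfer_mb_per_s * 1000) * 1000
    return seek_ms + rotational_ms + transfer_ms


# Assumed example: 8 ms seek, 7200 RPM, 50 MB/s, 4 KB request.
print(f"{average_access_ms(8, 7200, 50, 4):.2f} ms")  # 12.25 ms
```

With these numbers the seek and rotation dominate: the transfer itself takes under 0.1 ms, which is why reducing head movement matters so much for small requests.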

Manufacturing Advantages of Disk Arrays
Disk Product Families
• Conventional: 4 disk designs (3.5”, 5.25”, 10”, 14”), spanning low end to high end
• Disk Array: 1 disk design (3.5”)

RAID: Redundant Array of Inexpensive Disks
• RAID 0: Striping (misnomer: non-redundant)
• RAID 1: Mirroring
• RAID 2: Striping + Error Correction
• RAID 3: Bit striping + Parity Disk
• RAID 4: Block striping + Parity Disk
• RAID 5: Block striping + Distributed Parity
• RAID 6: Multiple parity checks

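Non-Redundant Array
• Striped: write sequential blocks across disk array
• High performance
• Poor reliability: $\mathrm{MTTF}_{\mathrm{Array}} = \mathrm{MTTF}_{\mathrm{Disk}} / N$
  – $\mathrm{MTTF}_{\mathrm{Disk}} = 50{,}000$ hours (6 years)
  – $N = 70$ disks
  – $\mathrm{MTTF}_{\mathrm{Array}} \approx 700$ hours (1 month)
[Figure: two disks, one holding the odd blocks and one holding the even blocks]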
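
The reliability figure above can be checked with a few lines. This is a sketch of the slide's formula only; it assumes disks fail independently at a constant rate, and the function name is made up for the example.

```python
def array_mttf_hours(disk_mttf_hours, num_disks):
    """MTTF of a non-redundant array: any single disk failure loses data."""
    return disk_mttf_hours / num_disks


mttf = array_mttf_hours(50_000, 70)
print(round(mttf))       # 714 hours, which the slide rounds to 700
print(round(mttf / 24))  # 30 days, i.e. about one month
```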

Redundant Arrays of Disks
• Files are "striped" across multiple spindles
• Redundancy yields high data availability
• When disks fail, contents are reconstructed from data redundantly stored in the array
• High reliability comes at a cost:
  – Reduced storage capacity
  – Lower performance

RAID 1: Mirroring
• Each disk is fully duplicated onto its “shadow” → very high availability
• Bandwidth sacrifice on writes: one logical write = two physical writes
• Reads may be optimized
• Most expensive solution: 100% capacity overhead
• Used in high I/O rate, high availability environments

RAID 3: bit striping + parity
• A parity bit for every bit in the striped data
• Parity is relatively easy to compute
• How does it perform for small reads/writes?

Redundant Arrays of Disks
RAID 3: Parity Disk
[Figure: a logical record (10010011 11001101 10010011 . . .) is striped into physical records across the data disks, with a parity disk P]
• Parity computed across recovery group to protect against hard disk failures
  – 33% capacity cost for parity in this configuration
  – Wider arrays reduce capacity costs, decrease expected availability, increase reconstruction time
• Arms logically synchronized, spindles rotationally synchronized
  – Logically a single high capacity, high transfer rate disk
• Targeted for high bandwidth applications: Scientific, Image Processing
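
To make the parity idea concrete, here is a minimal sketch assuming even (XOR) parity computed over whole bytes rather than over individual striped bits; the principle is the same. The function names are hypothetical, and the three data bytes are taken from the logical record on the slide.

```python
from functools import reduce


def parity(blocks):
    """XOR equal-length byte blocks together to form the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the one missing block from the survivors plus parity."""
    return parity(list(surviving_blocks) + [parity_block])


data = [bytes([0b10010011]), bytes([0b11001101]), bytes([0b10010011])]
p = parity(data)

# Pretend the second disk failed and rebuild its contents.
rebuilt = reconstruct([data[0], data[2]], p)
assert rebuilt == data[1]
```

Reconstruction has to read every surviving disk in the recovery group, which is one reason wider arrays take longer to rebuild.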

RAID 4 (Block interleaved parity)

Redundant Arrays of Disks
RAID 5+: High I/O Rate Parity
• A logical write becomes four physical I/Os
• Independent writes possible because of interleaved parity
• Reed-Solomon Codes ("Q") for protection during reconstruction
• Targeted for mixed applications
[Figure: stripe units laid out across five disk columns, one stripe per row, with logical disk addresses increasing left to right and top to bottom]
D0    D1    D2    D3    P
D4    D5    D6    P     D7
D8    D9    P     D10   D11
D12   P     D13   D14   D15
P     D16   D17   D18   D19
D20   D21   D22   D23   P
. . .
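
The rotated-parity layout and the "four physical I/Os" of a small write can be sketched as follows. This assumes the five-disk layout shown on the slide and plain XOR parity; the function names are made up for illustration, and real controllers may use a different rotation.

```python
def raid5_location(block, num_disks=5):
    """Map a logical data block to (stripe, data disk, parity disk)."""
    stripe, offset = divmod(block, num_disks - 1)
    # Parity starts on the last disk and moves one disk left per stripe.
    parity_disk = (num_disks - 1) - (stripe % num_disks)
    # Data units fill the remaining disks left to right, skipping parity.
    disk = offset if offset < parity_disk else offset + 1
    return stripe, disk, parity_disk


def xor_blocks(*blocks):
    """Bytewise XOR of equal-length blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def small_write_parity(old_data, old_parity, new_data):
    """New parity after overwriting a single data block.

    The four I/Os are: read old data, read old parity,
    write new data, write new parity.
    """
    return xor_blocks(old_parity, old_data, new_data)


stripe = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
old_parity = xor_blocks(*stripe)
new_parity = small_write_parity(stripe[1], old_parity, b"ZZZZ")
stripe[1] = b"ZZZZ"
assert new_parity == xor_blocks(*stripe)  # same as recomputing from scratch
```

Because only the target disk and that stripe's parity disk are touched, two small writes to different stripes can proceed in parallel when their parity lands on different disks.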

Nested RAID levels
• RAID 01 and 10 combine mirroring and striping
  – Combine high performance (striping) and reliability (mirroring)
  – Get reliability without having to compute parities: higher performance and less complex controller
• RAID 05 and 50 combine striping with distributed parity (the name RAID 53 usually refers to RAID 03, striping over RAID 3 arrays)

Operating System can help (1)
Reducing access time
• Disk defragmentation: why does that work?
• Disk scheduling: operating system can reorder requests
  – How does it work? Reduce seek time
  – Example: Shortest seek distance first, Elevator algorithm, Typewriter algorithm
  – Let's do an example
• Log structured file systems
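
As one possible worked example of request reordering, the sketch below compares total head movement for first-come-first-served order, a greedy nearest-request-first policy, the elevator algorithm, and the typewriter algorithm. The request queue and starting track are assumed values, the function names are hypothetical, and the elevator here reverses at the last pending request rather than at the edge of the disk.

```python
def total_seek(start, order):
    """Total head movement (in tracks) when serving requests in this order."""
    distance, position = 0, start
    for track in order:
        distance += abs(track - position)
        position = track
    return distance


def nearest_first(start, requests):
    """Greedy: always serve the pending request closest to the head."""
    pending, position, order = list(requests), start, []
    while pending:
        nearest = min(pending, key=lambda t: abs(t - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


def elevator(start, requests):
    """Sweep upward serving requests, then reverse and sweep downward."""
    up = sorted(t for t in requests if t >= start)
    down = sorted((t for t in requests if t < start), reverse=True)
    return up + down


def typewriter(start, requests):
    """Serve in one direction only; jump back to the lowest request and repeat."""
    up = sorted(t for t in requests if t >= start)
    wrapped = sorted(t for t in requests if t < start)
    return up + wrapped


queue = [98, 183, 37, 122, 14, 124, 65, 67]  # assumed pending track numbers
head = 53                                    # assumed current head position

print(total_seek(head, queue))                       # 640 (arrival order)
print(total_seek(head, nearest_first(head, queue)))  # 236
print(total_seek(head, elevator(head, queue)))       # 299
print(total_seek(head, typewriter(head, queue)))     # 322
```

The greedy policy moves the head least here, but it can starve requests far from the head; the two sweep-based policies trade some extra movement for bounded waiting.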

Log structured file systems
• Idea: most reads to disk are serviced from cache (locality!)
• But what about writes? → They have to go to disk; if the system crashes, the file system is compromised
• How can we make updates perform better?
  – Save them in a log (sequentially) instead of their original location; why does that help?
  – Tricky to manage

Operating System can help (2)
Reliability
• RAID protects against disk failures, not CPU failures or software bugs
• If the CPU writes corrupt data to all redundant disks, what can we do?
  – Backups
  – Reliability in the operating system

How are files allocated on disk?
• Index block: has pointers to the other blocks in the file
• Alternative: linked allocation
• Data and metadata are both stored on disk
• What do we do for bigger files?
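
The question about bigger files comes down to how many data blocks an index block can point to. A rough sketch, assuming 4 KB blocks and 4-byte block pointers (both values are assumptions, as is the function name): one index block caps the file size, and letting index blocks point to further index blocks, as the indirect blocks of Unix inodes do, raises that cap.

```python
def max_file_bytes(block_size, pointer_size, levels=1):
    """Largest file addressable through `levels` levels of index blocks."""
    pointers_per_block = block_size // pointer_size
    return pointers_per_block ** levels * block_size


MIB, GIB = 2**20, 2**30
print(max_file_bytes(4096, 4) // MIB)            # 4 (MiB) with one index block
print(max_file_bytes(4096, 4, levels=2) // GIB)  # 4 (GiB) with two levels
```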

Unix Inodes

Disk reliability
• Any update to disk changes both data and metadata
  – Requires several writes
  – Operating system may reorder them, as we saw
• What happens if there is a crash?
  – Let's look at examples
• Solution: journaling file system
  – Update journal before updating filesystem
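
The "journal first" rule can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the class, its methods, and the crash flags are hypothetical, a dictionary stands in for disk blocks, and a list stands in for the on-disk journal. It does not model any particular file system.

```python
class JournaledStore:
    def __init__(self):
        self.blocks = {}   # stands in for on-disk data and metadata blocks
        self.journal = []  # stands in for the on-disk journal

    def update(self, txn_id, writes, crash_before_commit=False,
               crash_before_apply=False):
        """Log every write, then a commit record, then update in place."""
        for block, value in writes.items():
            self.journal.append(("write", txn_id, block, value))
        if crash_before_commit:
            return  # simulated crash: journal holds an incomplete transaction
        self.journal.append(("commit", txn_id))
        if crash_before_apply:
            return  # simulated crash: committed but not yet applied
        self.blocks.update(writes)

    def recover(self):
        """After a crash, replay committed transactions and drop the rest."""
        committed = {rec[1] for rec in self.journal if rec[0] == "commit"}
        for rec in self.journal:
            if rec[0] == "write" and rec[1] in committed:
                _, _, block, value = rec
                self.blocks[block] = value
        self.journal.clear()


store = JournaledStore()
store.update(1, {"inode 7": "size=1", "data 40": "hello"},
             crash_before_apply=True)
store.update(2, {"inode 9": "size=1", "data 41": "world"},
             crash_before_commit=True)
store.recover()
print(sorted(store.blocks))  # ['data 40', 'inode 7']
```

After recovery, transaction 1 is fully present and transaction 2 is fully absent, so data and metadata never end up half-updated.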

Flash Memory
• Emerging technology for non-volatile storage
  – Competitor to hard disks, especially for embedded market
• Can be used as cache for the disk (much larger than RAM disks for the same price, and persistent)
• Floating gate transistors: semiconductor technology (like microprocessors and memory)
  – We know how to build them big (or small!) and cheap
• Faster, lower power than disk drives
• ...but still more expensive, and has some limitations
• Two types of flash memory: NAND and NOR

NOR Flash
• NOR is accessed like regular memory and has faster read time
• Used for executables/firmware that don't need to change often (code for PDAs, cellphones, etc.)
  – Can be executed in place
• Bad write/erase performance (2 seconds to erase a block!)
• Bad wear properties (100,000 writes average lifetime)

NAND Flash
• Accessed like a block device (like a disk drive)
• Higher density, lower cost
• Faster write/erase time; longer write life expectancy
• Well suited for cameras, mp3 players, USB drives...
• Less reliable than NOR (requires error correction codes)

Different properties from Disks
• Flash memory has quite different properties from disks
  – Emphasis on seek time gone
• Needs to erase a segment before writing (small writes are expensive!)
  – Erases must be done in large segments (10s of KBytes)
• Slow... (especially NOR erase/write and NAND random access reads)
• Can only be rewritten a limited number of times
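
A back-of-the-envelope sketch of why small writes are expensive: if an update is done in place, every erase segment it touches must be erased and rewritten in full. The function name and the 64 KB segment size are assumptions, and the model assumes the update is aligned to a segment boundary with no remapping layer in between.

```python
def write_amplification(segment_bytes, update_bytes):
    """Bytes physically rewritten per byte logically updated."""
    segments_touched = -(-update_bytes // segment_bytes)  # ceiling division
    return segments_touched * segment_bytes / update_bytes


print(write_amplification(64 * 1024, 512))        # 128.0
print(write_amplification(64 * 1024, 64 * 1024))  # 1.0
```

Each of those rewrites also counts against the limited number of times a segment can be rewritten.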

Summary of Flash circa 2006
