First magnetic disk storage, the IBM 305 RAMAC (2 units shown), introduced in 1956. One platter is shown at top right. A RAMAC stored 5 million characters on fifty 24-inch-diameter platters. Two access arms moved up and down to select a platter, and in and out to select a track. Right: a variety of disk drives: 8”, 5.25”, 3.5”, 1.8” and 1”.
Storage Anselmo Lastra
Outline • Magnetic Disks • RAID • Advanced Dependability/Reliability/Availability • I/O Benchmarks, Performance and Dependability • Conclusion
Disk Figure of Merit: Areal Density • Bits recorded along a track • Metric is Bits Per Inch (BPI) • Number of tracks per surface • Metric is Tracks Per Inch (TPI) • Disk metric is bit density per unit area • Metric is Bits Per Square Inch: Areal Density = BPI x TPI
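A quick back-of-the-envelope check of the metric; the BPI and TPI values below are assumed, illustrative numbers, not figures from the slides:

```python
# Areal density = BPI x TPI (bits per square inch).
# Both input values are assumptions chosen only for illustration.
bpi = 1_000_000      # bits per inch along a track (assumed)
tpi = 150_000        # tracks per inch across the surface (assumed)

areal_density = bpi * tpi                      # bits per square inch
print(f"{areal_density / 1e9:.0f} Gbit per square inch")   # 150 for these inputs
```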
Historical Perspective • 1956 IBM RAMAC to early 1970s Winchester • For mainframe computers, with proprietary interfaces • Steady shrink in form factor: 27 in. to 14 in. • Form factor and capacity drive the market more than performance • 1970s developments • 5.25 inch floppy disk form factor • Emergence of industry-standard disk interfaces • Early 1980s: PCs and first-generation workstations • Mid 1980s: Client/server computing • Centralized storage on file server • Disk downsizing: 8 inch to 5.25 inch • Mass-market disk drives become a reality • Industry standards: SCSI, IPI, IDE • 5.25 inch to 3.5 inch drives for PCs; end of proprietary interfaces • 1990s: Laptops => 2.5 inch drives • 2000s: 1.8” drives used in media players (the 1” Microdrive didn’t do as well)
Current Disks • Caches to hold recently accessed blocks • Microprocessor and command buffer to enable reordering of accesses
Future Disk Size and Performance • Continued advance in capacity (60%/yr) and bandwidth (40%/yr) • Slow improvement in seek, rotation (8%/yr) • Time to read whole disk, sequentially vs. randomly (1 sector per seek): 1990: 4 minutes vs. 6 hours; 2000: 12 minutes vs. 1 week(!); 2006 (SCSI): 56 minutes vs. 3 weeks; 2006 (SATA): 171 minutes vs. 7 weeks • Cost has dropped by a factor of 100,000 since 1983
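The "time to read the whole disk" rows follow a simple model: capacity divided by bandwidth for a sequential read, and sectors times (seek + half rotation) for random single-sector reads. The drive parameters below are assumed round numbers chosen only to show the shape of the calculation; they will not exactly reproduce the table, whose drive parameters are not given.

```python
# Rough model of whole-disk read time; all drive parameters are assumptions.
def read_times(capacity_gb, bandwidth_mb_s, seek_ms, rpm, sector_bytes=512):
    capacity_bytes = capacity_gb * 1e9
    sequential_s = capacity_bytes / (bandwidth_mb_s * 1e6)
    # Random: one seek plus half a rotation per sector; transfer time ignored.
    half_rotation_ms = 0.5 * 60_000 / rpm
    sectors = capacity_bytes / sector_bytes
    random_s = sectors * (seek_ms + half_rotation_ms) / 1000
    return sequential_s / 60, random_s / 86_400   # minutes, days

# Assumed 2006-era SATA-like drive: 500 GB, 50 MB/s, 8 ms seek, 7200 RPM.
minutes, days = read_times(500, 50, 8, 7200)
print(f"sequential: {minutes:.0f} min, random: {days:.0f} days")
# Roughly 170 minutes vs. 140 days for these assumed parameters: random
# single-sector reads are orders of magnitude slower than sequential ones.
```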
Arrays of Small Disks? • Katz and Patterson asked in 1987: can smaller disks be used to close the gap in performance between disks and CPUs? • Conventional approach: 4 disk designs (3.5”, 5.25”, 10”, 14”), from low end to high end • Disk array: 1 disk design (3.5”)
Advantages of Small Form Factor Disk Drives • Low cost/MB • High MB/volume • High MB/watt • Low cost/actuator • Cost and environmental efficiencies
Replace Small Number of Large Disks with Large Number of Small Disks! (1988 disks) • IBM 3390K (large disk): capacity 20 GBytes, volume 97 cu. ft., power 3 KW, data rate 15 MB/s, I/O rate 600 I/Os/s, MTTF 250 KHrs, cost $250K • IBM 3.5" 0061 (small disk): capacity 320 MBytes, volume 0.1 cu. ft., power 11 W, data rate 1.5 MB/s, I/O rate 55 I/Os/s, MTTF 50 KHrs, cost $2K • x70 (array of 70 small disks): capacity 23 GBytes, volume 11 cu. ft. (9X smaller), power 1 KW (3X lower), data rate 120 MB/s (8X higher), I/O rate 3900 I/Os/s (6X higher), MTTF ??? Hrs, cost $150K • Disk arrays have potential for large data and I/O rates, high MB per cu. ft., and high MB per KW, but what about reliability?
Array Reliability • Reliability of N disks = Reliability of 1 Disk ÷ N • 50,000 hours ÷ 70 disks ≈ 700 hours • Disk system MTTF drops from 6 years to 1 month! • Arrays (without redundancy) are too unreliable to be useful! • Hot spares support reconstruction in parallel with access: very high media availability can be achieved
Redundant Arrays of (Inexpensive) Disks • Files are "striped" across multiple disks • Redundancy yields high data availability • Availability: service still provided to the user, even if some components failed • Disks will still fail • Contents reconstructed from data redundantly stored in the array • Capacity penalty to store redundant info • Bandwidth penalty to update redundant info
RAID 0 • Performance only • No redundancy • Stripe data to get higher bandwidth • Latency not improved
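A minimal sketch of what striping means, assuming a simple round-robin layout with a stripe unit of one block; the disk count and block numbers are illustrative:

```python
# Round-robin striping (RAID 0): logical block i lives on disk (i mod N)
# at per-disk block (i div N). No redundancy: losing any disk loses data.
def raid0_map(logical_block, num_disks):
    disk = logical_block % num_disks
    offset = logical_block // num_disks
    return disk, offset

for lb in range(8):
    print(lb, raid0_map(lb, num_disks=4))
# Consecutive blocks land on different disks, so a large sequential transfer
# is spread across all 4 spindles (higher bandwidth), while a single-block
# access still pays one disk's full seek and rotation latency.
```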
Redundant Arrays of Inexpensive Disks RAID 1: Disk Mirroring/Shadowing • Each disk is fully duplicated onto its “mirror”; the pair forms a recovery group • Very high availability can be achieved • Bandwidth sacrifice on write: logical write = two physical writes • Reads may be optimized (either copy can service a read) • Most expensive solution: 100% capacity overhead • (RAID 2 not interesting, so skip)
Redundant Arrays of Inexpensive Disks RAID 3: Parity Disk • A logical record (e.g., 10010011 11001101 ...) is striped as physical records across the data disks • P contains the sum of the other disks per stripe, mod 2 (“parity”) • If a disk fails, subtract P from the sum of the other disks to find the missing information
RAID 3 • Sum computed across the recovery group to protect against hard disk failures, stored on the P (parity) disk • Logically a single high-capacity, high-transfer-rate disk: good for large transfers • Wider arrays reduce the capacity cost of parity, but decrease availability • 33% capacity cost for parity with 3 data disks and 1 parity disk
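A minimal sketch of the parity idea: "sum mod 2" per bit is just XOR, so the parity block is the XOR of the data blocks, and a missing block is the XOR of the parity with the surviving blocks. The byte values reuse the bit patterns from the figure above.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks ('sum mod 2' per bit position)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [bytes([0b10010011]), bytes([0b11001101]), bytes([0b10010011])]
parity = xor_blocks(data)                 # stored on the P disk

# Disk 1 fails: rebuild its block from parity plus the surviving disks.
rebuilt = xor_blocks([parity, data[0], data[2]])
assert rebuilt == data[1]
```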
Inspiration for RAID 4 • RAID 3 relies on the parity disk to discover errors on a read • But every sector has its own error detection field • To catch errors on a read, rely on the sector’s error detection field instead of the parity disk • This allows independent reads to different disks simultaneously
Redundant Arrays of Inexpensive Disks RAID 4: High I/O Rate Parity • Insides of 5 disks: four data columns plus one dedicated parity column, with increasing logical disk addresses running down the disk columns • Stripe 0 holds D0 D1 D2 D3 with parity P, stripe 1 holds D4-D7 with its P, stripe 2 D8-D11, stripe 3 D12-D15, and so on, with every stripe’s parity on the same disk • Example: small reads of D0 & D5, large write of D12-D15
Inspiration for RAID 5 • RAID 4 works well for small reads • Small writes (write to one disk): • Option 1: read the other data disks, create the new sum, and write it to the parity disk • Option 2: since P holds the old sum, compare old data to new data and add the difference to P (2 reads, 2 writes) • Small writes are limited by the parity disk: writes to D0 and D5 must both also write to the P disk
Redundant Arrays of Inexpensive Disks RAID 5: High I/O Rate Interleaved Parity • Increasing logical disk addresses run down the disk columns; independent writes are possible because of the interleaved parity • Stripe 0: D0 D1 D2 D3 P; stripe 1: D4 D5 D6 P D7; stripe 2: D8 D9 P D10 D11; stripe 3: D12 P D13 D14 D15; stripe 4: P D16 D17 D18 D19; stripe 5: D20 D21 D22 D23 P; and so on • Example: a write to D0 and D5 uses only disks 0, 1, 3, and 4
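A minimal sketch of the interleaved-parity placement shown above: the parity block rotates one disk to the left each stripe, so writes in different stripes can update parity on different disks in parallel. The mapping below is written to reproduce exactly this layout; other rotation orders exist in practice.

```python
# RAID 5 block placement matching the layout above: 5 disks, parity rotates
# left by one disk per stripe, data blocks fill the remaining disks in order.
N = 5  # disks

def parity_disk(stripe):
    return (N - 1 - stripe) % N

def data_block_location(b):
    stripe = b // (N - 1)                # each stripe holds N-1 data blocks
    idx = b % (N - 1)
    p = parity_disk(stripe)
    disk = idx if idx < p else idx + 1   # skip over the parity disk
    return stripe, disk

# Reproduce the example "a write to D0 and D5 uses only disks 0, 1, 3, 4":
# D0 -> stripe 0, disk 0 (parity on disk 4); D5 -> stripe 1, disk 1 (parity on disk 3).
for b in (0, 5):
    s, d = data_block_location(b)
    print(f"D{b}: stripe {s}, data disk {d}, parity disk {parity_disk(s)}")
```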
Downside of Disk Arrays: Cost of Small Writes • RAID 5 small write algorithm: 1 logical write = 2 physical reads + 2 physical writes • To replace D0 with new data D0': (1) read the old data D0, (2) read the old parity P, XOR the old data with the new data and with the old parity to form the new parity P', then (3) write D0' and (4) write P'
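A minimal sketch of that read-modify-write: the new parity is computed from the old data, the new data, and the old parity alone, so only two disks are read and two are written. The block contents are arbitrary example bytes.

```python
def raid5_small_write(old_data, new_data, old_parity):
    """RAID 5 small write: given the 2 reads (old data, old parity),
    return the new parity to write alongside the new data (2 writes)."""
    delta = bytes(a ^ b for a, b in zip(old_data, new_data))      # what changed
    new_parity = bytes(a ^ b for a, b in zip(old_parity, delta))  # fold change into P
    return new_parity

# Sanity check against full-stripe parity when D0 is overwritten with D0'.
d0, d1, d2, d3 = (bytes([x]) for x in (0x12, 0x34, 0x56, 0x78))
p_old = bytes([0x12 ^ 0x34 ^ 0x56 ^ 0x78])
d0_new = bytes([0xAB])
p_new = raid5_small_write(d0, d0_new, p_old)
assert p_new == bytes([0xAB ^ 0x34 ^ 0x56 ^ 0x78])
```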
RAID 6: Recovering from 2 failures • Like the standard RAID schemes, it uses redundant space based on a parity calculation per stripe • The idea is that an operator may make a mistake and swap the wrong disk, or a 2nd disk may fail while the 1st is being replaced • Since it protects against a double failure, it adds two check blocks per stripe of data • With p+1 disks total, p-1 disks hold data; assume p = 5
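The scheme described here (p+1 disks, p-1 of them data) is row-diagonal parity. A different but also common RAID 6 construction, used for example by Linux software RAID, is P+Q parity: P is plain XOR and Q is a Reed-Solomon syndrome over GF(2^8). The sketch below is that alternative construction, not the text's algorithm; it is included only to show why two independent check blocks allow recovery from any two failures. The disk contents are arbitrary single bytes.

```python
# P+Q RAID 6 over GF(2^8) (polynomial 0x11d): P is the XOR of the data bytes,
# Q is the syndrome sum of g^i * D_i with generator g = 2.
EXP, LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def div(a, b):
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]

def encode(data):
    """Return the two check bytes (P, Q) for one byte position of a stripe."""
    p, q = 0, 0
    for i, d in enumerate(data):
        p ^= d
        q ^= mul(EXP[i], d)        # g^i * D_i
    return p, q

def recover_two(survivors, i, j, p, q):
    """Recover lost bytes D_i and D_j from P, Q and the surviving bytes.
    survivors maps disk index -> byte for every data disk except i and j."""
    a, b = p, q
    for k, d in survivors.items():
        a ^= d                     # a becomes D_i ^ D_j
        b ^= mul(EXP[k], d)        # b becomes g^i*D_i ^ g^j*D_j
    dj = div(mul(EXP[i], a) ^ b, EXP[i] ^ EXP[j])
    return a ^ dj, dj              # (D_i, D_j)

data = [0x10, 0x20, 0x30, 0x40]
p, q = encode(data)
survivors = {0: data[0], 2: data[2]}              # disks 1 and 3 have failed
assert recover_two(survivors, 1, 3, p, q) == (data[1], data[3])
```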
Summary of RAID Techniques • Disk Mirroring/Shadowing (RAID 1): each disk is fully duplicated onto its "shadow"; logical write = two physical writes; 100% capacity overhead • Parity Data Bandwidth Array (RAID 3): parity computed horizontally; logically a single high-data-bandwidth disk • High I/O Rate Parity Array (RAID 5): interleaved parity blocks; independent reads and writes; logical write = 2 reads + 2 writes
HW Failures in Real Systems: Tertiary Disk • A cluster of 20 PCs in seven 7-foot-high, 19-inch-wide racks, with 368 IBM disks (8.4 GB, 7200 RPM, 3.5-inch) • The PCs are P6-200MHz with 96 MB of DRAM each.
Does Hardware Fail Fast? • Case study: 4 disks that failed in Tertiary Disk • The author says that almost all disk failures began as transient failures; the operator had to decide when to replace a drive.
Internet Archive • Good section in the text about the Internet Archive • In 2006, over a petabyte of disk (10^15 bytes) • Growing at 20 terabytes (10^12 bytes) per month • Now says ~3 PB • Each PC was a 1 GHz VIA with 512 MB, dissipating 80 W • Each node had 4 500 GB drives • 40 nodes/rack • A petabyte takes 12 racks • PC cost $500, each disk $375, 40-port Ethernet switch $3000
Capricorn PS now • AMD Athlon 64 (x2) • 4 SATA disks (1-4 TB) • 92 Watts/node • 40 nodes/rack • So 160 TB/rack • 24 kW/petabyte • So 576 kWh/day, ~17,000 kWh/month • The average U.S. house used 920 kWh a month in 2006 • Best housed in KY?
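The rack and power figures on this slide follow from simple arithmetic. A quick check, assuming 1 TB per SATA drive (the low end of the 1-4 TB range) and a 30-day month; the results land within rounding of the slide's 24 kW, 576 kWh/day, and ~17,000 kWh/month:

```python
# Back-of-the-envelope check of the numbers above.
# Assumptions: 1 TB per drive (low end of 1-4 TB) and a 30-day month.
drives_per_node, tb_per_drive = 4, 1
nodes_per_rack, watts_per_node = 40, 92

tb_per_rack = nodes_per_rack * drives_per_node * tb_per_drive      # 160 TB
racks_per_pb = 1000 / tb_per_rack                                  # 6.25 racks
kw_per_pb = racks_per_pb * nodes_per_rack * watts_per_node / 1000  # ~23 kW
kwh_per_day = kw_per_pb * 24                                       # ~552 kWh
kwh_per_month = kwh_per_day * 30                                   # ~16,560 kWh
print(tb_per_rack, round(kw_per_pb), round(kwh_per_day), round(kwh_per_month))
```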
Drives Today • Just ordered simple RAID enclosure (just levels 0, 1) • $63 • Two 1TB SATA drives • $85/ea
Summary • Disks: Areal Density now 30%/yr vs. 100%/yr in 2000s • Components often fail slowly • Real systems: problems in maintenance, operation as well as hardware, software