The 5 Minute Rule • Jim Gray, Microsoft Research • Gray@Microsoft.com • http://www.Research.Microsoft.com/~Gray/talks • Kilo 10^3 • Mega 10^6 • Giga 10^9 • Tera 10^12 (today, we are here) • Peta 10^15 • Exa 10^18
Storage Hierarchy (9 levels) • Cache 1, 2 • Main (1, 2, 3 if NUMA) • Disk (1 (cached), 2) • Tape (1 (mounted), 2)
Meta-Message: Technology Ratios Are Important • If everything gets faster & cheaper at the same rate THEN nothing really changes. • Things getting MUCH BETTER: • communication speed & cost 1,000x • processor speed & cost 100x • storage size & cost 100x • Things staying about the same • speed of light (more or less constant) • people (10x more expensive) • storage speed (only 10x better)
Today’s Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs • [Figure, two log-log panels. "Size vs Speed": typical system size (bytes, 10^3 to 10^15) against access time (seconds, 10^-9 to 10^3). "Price vs Speed": $/MB (10^-4 to 10^4) against access time (seconds, 10^-9 to 10^3). Levels plotted in both panels: cache, main, online secondary (disc), online tape, nearline tape, offline tape.]
Storage Ratios Changed • 10x better access time • 10x more bandwidth • 4,000x lower media price • DRAM/DISK 100:1 to 10:1 to 50:1
Thesis: Performance = Storage Accesses, not Instructions Executed • In the “old days” we counted instructions and I/Os • Now we count memory references • Processors wait most of the time
The Pico Processor • 1 M SPECmarks • 10^6 clocks/fault to bulk RAM • Event horizon on chip • VM reincarnated • Multi-program cache • Terror Bytes!
Storage Latency: How Far Away is the Data? (access time in clock ticks, with a human-scale analogy) • Registers: 1 (My Head, 1 min) • On-chip cache: 2 (This Room) • On-board cache: 10 (This Campus, 10 min) • Memory: 100 (Sacramento, 1.5 hr) • Disk: 10^6 (Pluto, 2 years) • Tape/optical robot: 10^9 (Andromeda, 2,000 years)
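The travel times on this slide follow from one scaling step: treat one clock tick as one minute of human time. The sketch below reproduces that arithmetic. It assumes the slide's numbers are clock ticks and uses a 365-day year; the function name `human_scale` is hypothetical.

```python
# Minimal sketch: map access latency in clock ticks to "human" time,
# assuming 1 clock tick corresponds to 1 minute (the slide's analogy).
MINUTES_PER_HOUR = 60
MINUTES_PER_YEAR = 60 * 24 * 365  # 365-day year, an assumption

# Latencies read off the slide, in clock ticks.
LATENCY_TICKS = {
    "registers": 1,
    "on-chip cache": 2,
    "on-board cache": 10,
    "memory": 100,
    "disk": 10**6,
    "tape/optical robot": 10**9,
}

def human_scale(ticks):
    """Return the human-scale duration for a latency given in clock ticks."""
    minutes = ticks  # one tick is treated as one minute
    if minutes < MINUTES_PER_HOUR:
        return f"{minutes:g} min"
    if minutes < MINUTES_PER_YEAR:
        return f"{minutes / MINUTES_PER_HOUR:.1f} hr"
    return f"{minutes / MINUTES_PER_YEAR:,.0f} years"

if __name__ == "__main__":
    for level, ticks in LATENCY_TICKS.items():
        print(f"{level:>20}: {ticks:>13,} ticks ~ {human_scale(ticks)}")
```

The computed values (about 1.7 hours for memory, about 2 years for disk, about 1,900 years for the robot) agree with the slide's rounded figures.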
The Five Minute Rule • Trade DRAM for disk accesses • Cost of an access (DriveCost / Access_per_second) • Cost of a DRAM page ($/MB / pages_per_MB) • Break-even has two terms: a technology term and an economic term • Grew page size to compensate for changing ratios • Still at 5 minutes for random, 1 minute for sequential
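The break-even point can be worked out from the two costs the slide names. Keeping a page in DRAM costs ($/MB / pages_per_MB). Re-reading it from disk every T seconds consumes 1/T of an access per second, which costs (DriveCost / Access_per_second) / T. Setting the two equal and solving for T gives the product of a technology term and an economic term. The sketch below is one way to compute it; the function name and the input values are illustrative assumptions, not figures taken from the slides.

```python
# Minimal sketch of the break-even reference interval.
# A page re-referenced more often than this is cheaper to keep in DRAM;
# a page referenced less often is cheaper to re-read from disk.

def break_even_seconds(pages_per_mb, accesses_per_sec_per_disk,
                       price_per_drive, price_per_mb_dram):
    """Return the break-even interval in seconds."""
    technology_term = pages_per_mb / accesses_per_sec_per_disk
    economic_term = price_per_drive / price_per_mb_dram
    return technology_term * economic_term

if __name__ == "__main__":
    # Illustrative inputs (assumptions): 8 KB pages, so 128 pages per MB;
    # 64 random accesses/s per drive; 2,000 $ per drive; 15 $ per MB of DRAM.
    t = break_even_seconds(128, 64, 2000.0, 15.0)
    print(f"break-even interval: {t:.0f} s (~{t / 60:.1f} minutes)")
```

With these inputs the interval comes out near 267 seconds, which is in line with the "5 minutes for random" figure. Growing the page size lowers pages_per_MB, which is how a larger page compensates for a shift in the other ratios.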
Standard Storage Metrics • Capacity: • RAM: MB and $/MB: today at 10 MB and 100 $/MB • Disk: GB and $/GB: today at 10 GB and 200 $/GB • Tape: TB and $/TB: today at 0.1 TB and 25 k$/TB (nearline) • Access time (latency): • RAM: 100 ns • Disk: 10 ms • Tape: 30 second pick, 30 second position • Transfer rate: • RAM: 1 GB/s • Disk: 5 MB/s (arrays can go to 1 GB/s) • Tape: 5 MB/s (striping is problematic)
New Storage Metrics: Kaps, Maps, SCAN? • Kaps: How many kilobyte objects served per second • The file server, transaction processing metric • This is the OLD metric. • Maps: How many megabyte objects served per second • The Multi-Media metric • SCAN: How long to scan all the data • the data mining and utility metric • And • Kaps/$, Maps/$, TBscan/$
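As a rough illustration of how the three metrics relate to latency, bandwidth, and capacity, the sketch below models one request as access latency plus transfer time. That service model, the function names, and the choice of a disk with 10 ms access, 5 MB/s transfer, and 10 GB capacity are assumptions made for the example.

```python
# Minimal sketch: estimate Kaps, Maps, and SCAN time for one device,
# assuming each request costs (access latency + size / bandwidth).

def objects_per_second(object_mb, latency_s, bandwidth_mb_s):
    """Objects of the given size served per second by one device."""
    return 1.0 / (latency_s + object_mb / bandwidth_mb_s)

def scan_seconds(capacity_mb, bandwidth_mb_s):
    """Time to read the whole device sequentially."""
    return capacity_mb / bandwidth_mb_s

if __name__ == "__main__":
    latency_s, bandwidth_mb_s, capacity_mb = 0.010, 5.0, 10_000.0
    kaps = objects_per_second(0.001, latency_s, bandwidth_mb_s)  # 1 KB objects
    maps = objects_per_second(1.0, latency_s, bandwidth_mb_s)    # 1 MB objects
    scan = scan_seconds(capacity_mb, bandwidth_mb_s)
    print(f"Kaps ~ {kaps:.0f}/s, Maps ~ {maps:.1f}/s, SCAN ~ {scan:.0f} s")
```

Under this model, Kaps is limited almost entirely by access latency while Maps is limited by transfer rate, which is why the two metrics reward different devices.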
For the Record (good 1998 devices packaged in a system: http://www.tpc.org/results/individual_results/Dell/dell.6100.9801.es.pdf) • [Table of device metrics not preserved in this transcript.]
How To Get Lots of Maps, SCANs • Parallelism: use many little devices in parallel • Beware of the media myth • Beware of the access time myth • At 10 MB/s: 1.2 days to scan • 1,000x parallel: 100 seconds to scan • Parallelism: divide a big problem into many smaller ones to be solved in parallel.
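The two scan times on this slide are consistent with a data set of about 1 TB, which is an inference from the numbers and not something the slide states. A minimal sketch of the arithmetic, assuming the data is split evenly across devices that all stream at full rate:

```python
# Minimal sketch: scan time for a data set spread evenly over n devices.
# Assumes perfect parallelism (no skew, no shared bottleneck).

def scan_seconds(total_bytes, bytes_per_sec_per_device, n_devices=1):
    """Seconds to read total_bytes when n_devices stream in parallel."""
    return total_bytes / (bytes_per_sec_per_device * n_devices)

if __name__ == "__main__":
    total = 1e12   # 1 TB, inferred from the slide's figures
    rate = 10e6    # 10 MB/s per device
    serial = scan_seconds(total, rate)
    parallel = scan_seconds(total, rate, n_devices=1000)
    print(f"1 device:      {serial:,.0f} s (~{serial / 86400:.1f} days)")
    print(f"1,000 devices: {parallel:,.0f} s")
```

One device needs 100,000 seconds, a little under 1.2 days; a thousand devices need 100 seconds.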
The Disk Farm On a Card • The 100 GB disc card: an array of discs on a 14" card • Can be used as: • 100 discs • 1 striped disc • 10 fault-tolerant discs • ...etc. • LOTS of accesses/second and bandwidth • Life is cheap, it’s the accessories that cost ya. • Processors are cheap, it’s the peripherals that cost ya (a 10k$ disc card).
Tape Farms for Tertiary Storage, Not Mainframe Silos • One robot: 10K$, 14 tapes, 500 GB, 5 MB/s, 20$/GB, 30 Maps, scan in 27 hours • 100 robots: 1M$, 50TB, 50$/GB, 3K Maps, 27 hr scan • Many independent tape robots (like a disc farm)
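The farm figures scale in a simple way: cost, capacity, and Maps grow with the number of robots, while scan time stays fixed because every robot scans only its own tapes. The sketch below checks that arithmetic with the single-robot numbers from the slide. It assumes fully independent robots and 1 GB = 1,000 MB; the names are hypothetical.

```python
# Minimal sketch: aggregate metrics for a farm of identical, independent
# tape robots, using the per-robot figures quoted on the slide.
ROBOT_COST_USD = 10_000
ROBOT_CAPACITY_GB = 500
ROBOT_RATE_MB_S = 5
ROBOT_MAPS = 30

def tape_farm(n_robots):
    """Return cost, capacity, Maps, and scan time for n_robots in parallel."""
    scan_s = ROBOT_CAPACITY_GB * 1000 / ROBOT_RATE_MB_S  # each robot scans its own data
    return {
        "cost_usd": n_robots * ROBOT_COST_USD,
        "capacity_gb": n_robots * ROBOT_CAPACITY_GB,
        "maps": n_robots * ROBOT_MAPS,
        "scan_hours": scan_s / 3600,  # independent of n_robots
    }

if __name__ == "__main__":
    for n in (1, 100):
        print(n, tape_farm(n))
```

For 100 robots this gives 1M$, 50 TB, 3,000 Maps, and a scan time just under 28 hours.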
The Metrics: Disk and Tape Farms Win • Data Motel: data checks in, but it never checks out • [Figure: log-scale chart (0.01 to 1,000,000) comparing GB/K$, Kaps, Maps, and SCANS/Day for a 1000x disc farm, a 100x DLT tape farm, and an STC tape robot (6,000 tapes, 8 readers).]
Tape & Optical: Beware of the Media Myth • Optical is cheap: 200 $/platter, 2 GB/platter => 100 $/GB (2x cheaper than disc) • Tape is cheap: 30 $/tape, 20 GB/tape => 1.5 $/GB (100x cheaper than disc)
Tape & Optical Reality: Media is 10% of System Cost • Tape needs a robot (10 k$ ... 3 M$) • 10 ... 1000 tapes (at 20 GB each) => 20 $/GB ... 200 $/GB (1x…10x cheaper than disc) • Optical needs a robot (100 k$) • 100 platters = 200 GB (TODAY) => 400 $/GB (more expensive than mag disc) • Robots have poor access times • Not good for Library of Congress (25 TB) • Data motel: data checks in but it never checks out!
The Access Time Myth • The myth: seek or pick time dominates • The reality: (1) queuing dominates, (2) transfer dominates BLOBs, (3) disk seeks are often short • Implication: many cheap servers are better than one fast expensive server: • shorter queues • parallel transfer • lower cost/access and cost/byte • This is now obvious for disk arrays • This will be obvious for tape arrays
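One way to see why queuing can dominate is the textbook M/M/1 result, in which mean response time is service time divided by (1 - utilization). The slides do not name a queuing model, so M/M/1 is an assumption here, as are the 10 ms service time and the load levels. The sketch compares one busy device with the same load spread evenly over several identical devices.

```python
# Minimal sketch: mean response time under an M/M/1 queue (an assumed model),
# showing how spreading load over more devices shortens queues.

def mm1_response_seconds(service_s, utilization):
    """Mean response time (waiting + service) for an M/M/1 queue."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_s / (1.0 - utilization)

def spread_response_seconds(service_s, total_utilization, n_servers):
    """Same total load split evenly over n identical, independent servers."""
    return mm1_response_seconds(service_s, total_utilization / n_servers)

if __name__ == "__main__":
    service_s = 0.010  # 10 ms per access, illustrative
    one = mm1_response_seconds(service_s, 0.9)
    two = spread_response_seconds(service_s, 0.9, 2)
    print(f"1 device at 90% busy:       {one * 1000:.0f} ms per request")
    print(f"2 devices at 45% busy each: {two * 1000:.1f} ms per request")
```

At 90% utilization a request spends about nine times longer waiting than being served, so the raw access time is a small part of what the user sees. Adding a second device cuts the response time from roughly 100 ms to roughly 18 ms in this model.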