Computers are Free, Now What? Premise: You're a Fortune 1,000 CIO I’m a DB+OS guy selling CyberBricks What can I say in an hour that you do not know? How can I help you plan for CyberBricks?. Jim Gray Microsoft Research Gray@Microsoft.com http://research.Microsoft.com/~Gray 415 778 8222.
Outline • Why cost per transaction dropped 100,000x in 10 years. • How does that change things? • What next (technology trends) • Clusters of Hardware and Software CyberBricks
Systems 30 Years Ago • MegaBuck per Mega Instruction Per Second (mips) • MegaBuck per MegaByte • Sys Admin & Data Admin per MegaBuck
Disks of 30 Years Ago • 10 MB • Failed every few weeks
1988: IBM DB2 + CICS Mainframe, 65 tps • IBM 4391 • Simulated network of 800 clients • 2 M$ computer • Staff of 6 to do benchmark • Figure labels: 2 x 3725 network controllers; refrigerator-sized CPU; 16 GB disk farm (4 x 8 x .5 GB)
1987: Tandem Mini @ 256 tps • 14 M$ computer (Tandem) • A dozen people (1.8 M$/y) • False floor, 2 rooms of machines • Figure labels: 32 node processor array; 40 GB disk array (80 drives); simulate 25,600 clients • Staff: admin expert, performance expert, hardware experts, network expert, auditor, manager, DB expert, OS expert
1997: 9 years later, 1 Person and 1 box = 1250 tps • 1 Breadbox ~ 5x 1987 machine room • 23 GB is hand-held • One person does all the work • Cost/tps is 100,000x less: 5 micro dollars per transaction • Figure labels: 4 x 200 MHz cpu; 1/2 GB DRAM; 12 x 4 GB disk; 3 x 7 x 4 GB disk arrays • One person is the hardware expert, OS expert, net expert, DB expert, and app expert
Cost Per Transaction • Industry uses $/tps (or $/tpm): 5 year cost of hardware and software to get 1 tps. • There are about 1 Million seconds in 3 years • So, if $/tps is 1$, $/t is 1 micro-dollar. • 1988: mini: 50K$/tps, mainframe: 150K$/tps • 5 cents to 15 cents per transaction • 1998: micro: 30$/tpmC = 50¢/tpsC • 5 micro-dollars per transaction • Note: it is actually 6x less than this, since tpcC is 6x tpcA
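The conversion on this slide can be sketched as a one-line division. This is a minimal illustration, not the author's own calculation: the function name and the window constant are hypothetical, and the window of 1 million seconds is simply the figure the slide uses for its rule of thumb. For comparison, three calendar years of continuous operation is about 95 million seconds, so the slide's window is best read as a deliberately rough planning number.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds in a calendar year

# Assumption: the amortization window the slide's rule of thumb implies
# (1 $/tps <-> 1 micro-dollar per transaction).
RULE_OF_THUMB_WINDOW_S = 1_000_000


def dollars_per_transaction(dollars_per_tps, window_seconds):
    """Spread the price of 1 tps of capacity over every transaction
    that capacity can run during the amortization window."""
    return dollars_per_tps / window_seconds


# 1988 figures from the slide, in dollars per transaction.
mini_1988 = dollars_per_transaction(50_000, RULE_OF_THUMB_WINDOW_S)
mainframe_1988 = dollars_per_transaction(150_000, RULE_OF_THUMB_WINDOW_S)

# Calendar check: three years of continuous operation, in seconds.
three_years_s = 3 * SECONDS_PER_YEAR

print(f"mini: {mini_1988 * 100:.0f} cents/t, mainframe: {mainframe_1988 * 100:.0f} cents/t")
print(f"three calendar years = {three_years_s:,} s")
```

With the slide's window, the 1988 prices reproduce its 5 cents and 15 cents per transaction; a longer window lowers every figure by the same factor, so the 100,000x ratio between eras is unaffected.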
UNIX vs WindowsNT • Solaris on SPARC range 11,559 tpmC @ 57 $/tpmC (Sybase) to 51,871 tpmC @ 135 $/tpmC (Oracle) • SQL on NT/Compaq range 11,748 tpmC @ 27 $/tpmC to 18,129 tpmC @ 27 $/tpmC • NT price per transaction is 2x to 4x less, peak performance per node is 3x less. • Markup is in Oracle and SPARC (disk and DRAM prices OK.) • Note: current NT prices are 27 $/tpmC, not 33 $/tpmC, so 23% lower than shown • UNIX is 5x less than MVS according to David Matthews, “Large Server TCO: The UNIX advantage”, Unix Review Feb 1998 Reseller Supplement, pp 3-11
What Happened? Where did the 100,000x come from? • Moore’s law: 100X (at most) • Software improvements: 10X (at most) • Commodity Pricing: 100X (at least) • Total 100,000X • 100x from commodity • (DBMS was 100K$ to start: now 1k$ to start) • IBM 390 MIPS is 7.5K$ today • Intel MIPS is 10$ today • Commodity disk is 50$/GB vs 1,500$/GB • ... • Figure: price vs. time curves for mainframe, mini, and micro
Outline • Why cost per transaction has dropped 100,000x in 10 years. • How does that change things? • What next (technology trends) • Clusters of Hardware and Software CyberBricks
What does 1 μ$/t Mean? • Human Attention is the precious resource. • Content is the precious resource • Impressions (eyeballs) sell for 10,000 μ$ to 100,000 μ$ • All costs (and value) are in content and admin. • Aside: this month, the TerraServer got 400M hits, 40 M impressions: a 2M$/mo asset (for satellite photos.) • That’s why everyone is hot on portals.
Administration Costs • Vendor Rule of thumb (1970s mainframe) • one systems programmer per MIPS • one data admin per 10 GB • DataCenter Rule of thumb: • Hardware & Facilities is 40% • Labor is 60% • => 100 sys pgmrs and 1 data admin per laptop! • 1995 Federal study of their data centers • 1 to 3 MIPS per admin! (http://research.microsoft.com/~gray/NC_Servers.doc) • Thin client: • move admin to server • claim: save admin costs • reality: move admin costs to expensive fixed staff • Time will tell.
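The "100 sys pgmrs and 1 data admin per laptop" line follows from applying the 1970s ratios to current hardware. A minimal sketch, assuming a laptop of roughly 100 MIPS and 10 GB of disk (those two figures are inferred from the slide's result, not stated on it, and the function name is hypothetical):

```python
def staff_by_1970s_rule(mips, disk_gb):
    """Apply the 1970s mainframe vendor rule of thumb from the slide:
    one systems programmer per MIPS, one data admin per 10 GB."""
    sys_programmers = mips / 1.0
    data_admins = disk_gb / 10.0
    return sys_programmers, data_admins


# Assumed laptop: about 100 MIPS and 10 GB of disk (inferred, not from the slide).
laptop_sys_programmers, laptop_data_admins = staff_by_1970s_rule(mips=100, disk_gb=10)

print(f"{laptop_sys_programmers:.0f} systems programmers, "
      f"{laptop_data_admins:.0f} data admin per laptop")
```

The absurd staffing level is the point: ratios tied to raw capacity stop making sense once capacity is nearly free.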
Content Costs • For most web sites • Most staff are doing content • Admin is small fraction of content • RULE OF THUMB: • Hardware/software/facilities/admin is 10% of content • Content is 90% of cost • This seems to apply to • microsoft.com, msn, WebTV, HotMail, Inktomi • MAIN CONCLUSION • Hardware, software, admin is in micro$/t range • Unix and mainframes are 2x or 10x more micro$ • Who cares? Cost is in content • Look for content creation/management tools
Legacy Latency: a personal tale • 1970s helped company X convert to IMS/Fast Path • 1980s helped company X experiment with Tandem mini-computers • 1990s visit and ask: • Why are you still buying those mainframes? • Answers: 1. They are up all the time (99.99% up). 2. 25 years ago ROI was 18 months, now it is 1 week. 3. A rewrite would cost more than it would ever save. 4. My career would not survive a rewrite. 5. The devil you know is better than the devil you don’t.
Put Another Way • You are ATT or the airline industry or... You do 300 M transactions/day • The capital cost of these transactions is • 300 $/day on NT • 1,000 $/day on Solaris • 10,000 $/day on MVS • Who cares? Revenue and costs are 200,000,000 $/day. So, transaction cost is .01% or .0001%. • But, if productivity is higher on Solaris or NT… Or if tools exist on them, then… Or if cost of 2nd or 3rd environment is huge (staff), then... • New apps should not go on MVS! • Investing in SNA? Investing in IMS? Investing in TPF?..
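The arithmetic behind this slide can be checked directly. A minimal sketch using only the slide's own numbers; the variable and function names are hypothetical:

```python
TX_PER_DAY = 300_000_000        # transactions per day (from the slide)
REVENUE_PER_DAY = 200_000_000   # dollars per day (from the slide)

# Daily capital cost of running those transactions, per platform (from the slide).
capital_cost_per_day = {"NT": 300, "Solaris": 1_000, "MVS": 10_000}


def micro_dollars_per_tx(platform):
    """Capital cost of one transaction, in micro-dollars."""
    return capital_cost_per_day[platform] / TX_PER_DAY * 1e6


def percent_of_revenue(platform):
    """Daily capital cost as a percentage of daily revenue."""
    return capital_cost_per_day[platform] / REVENUE_PER_DAY * 100


for name in capital_cost_per_day:
    print(f"{name}: {micro_dollars_per_tx(name):.1f} micro-$/t, "
          f"{percent_of_revenue(name):.5f}% of revenue")
```

The computed shares run from about 0.00015% (NT) to 0.005% (MVS), the same order of magnitude as the slide's rounded ".01% or .0001%", which is the sense in which the platform choice is noise next to revenue.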
What Happens Next? • Last 10 years: 100,000x improvement • Next 10 years: ???? • Today: text and image servers are free: 25 m$/hit => advertising pays for them • Future: video, audio, … servers are free • “You ain’t seen nothing yet!” • Figure: performance plotted for 1985, 1995, 2005
And So... • Traditional transaction processing is a zero-billion dollar industry • Growth is in new apps • Figure: grid of Point-to-Point vs Broadcast against Immediate (Network) vs Time Shifted (Data Base), with cells labeled conversation, money, lecture, concert, mail, book, newspaper • It's ALL going electronic • Immediate is being stored for analysis (so ALL database) • Analysis & Automatic Processing are being added
Why Put Everything in Cyberspace? • Low rent: min $/byte • Shrinks time: now or later • Shrinks space: here or there • Automate processing: knowbots • Figure: Point-to-Point OR Broadcast, Immediate OR Time Delayed, over Network and Data Base; operations: Locate, Process, Analyze, Summarize
Some Tera-Byte Databases (scale: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta) • The Web: 1 TB of HTML • TerraServer 1 TB of images • Many 1 TB (file) servers • Hotmail: 7 TB of email • Sloan Digital Sky Survey: 40 TB raw, 2 TB cooked • EOS/DIS (picture of planet each week) • 15 PB by 2007 • Federal Clearing house: images of checks • 15 PB by 2006 (7 year history) • Nuclear Stockpile Stewardship Program • 10 Exabytes (???!!)
Scale of information, from Kilo to Yotta (Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta) • A letter • A novel • A Movie • Library of Congress (text) • LoC (image) • LoC (sound + cinema) • All Photos • All Disks • All Tapes • All Information!
Michael Lesk’s Points www.lesk.com/mlesk/ksg97/ksg.html • Soon everything can be recorded and kept • Most data will never be seen by humans • Precious Resource: Human attention • Auto-Summarization and Auto-Search will be a key enabling technology.
Outline • Why cost per transaction has dropped 100,000x in 10 years. • How does that change things? • What next (technology trends) • Clusters of Hardware and Software CyberBricks
Technology (hardware) • NOW: CPU: nearing 1 BIPS, but CPI rising fast (2-10), so less than 100 mips; 1$/mips to 10$/mips • DRAM: 3 $/MB • DISK: 30 $/GB • TAPE: 20 GB/tape, 6 MBps; lags disk; 2$/GB offline, 15$/GB nearline • 2003 Forecast (10x better): CPU: 1 BIPS real (smp), 0.1$ - 1$/mips • DRAM: 1 Gb chip, 0.1 $/MB • Disk: 10 GB smart cards, 500 GB RAID packs (NT inside), 3 $/GB • Tape: ?
System On A Chip • Integrate Processing with memory on one chip • chip is 75% memory now • 1MB cache >> 1960 supercomputers • 256 Mb memory chip is 32 MB! • IRAM, CRAM, PIM,… projects abound • Integrate Networking with processing on one chip • system bus is a kind of network • ATM, FiberChannel, Ethernet,.. Logic on chip. • Direct IO (no intermediate bus) • Functionally specialized cards shrink to a chip.
Thesis: Many little beat few big • Figure: price axis ($1 million, $100 K, $10 K) against machine class (Mainframe, Mini, Micro, Nano, Pico Processor); memory hierarchy labels: 10 pico-second ram, 10 nano-second ram, 10 microsecond ram, 10 millisecond disc, 10 second tape archive; capacity labels: 1 MB, 100 MB, 10 GB, 1 TB, 100 TB; disk form factors: 14", 9", 5.25", 3.5", 2.5", 1.8"; 1 MM3 • 1 M SPEC marks, 1 TFLOP • 10^6 clocks to bulk ram • Event-horizon on chip • VM reincarnated • Multi-program cache, On-Chip SMP • Smoking, hairy golf ball • How to connect the many little parts? • How to program the many little parts? • Fault tolerance?
Storage Latency: How Far Away is the Data? (relative latency, with the distance and travel-time analogy) • Registers: 1 (My Head, 1 min) • On Chip Cache: 2 (This Room) • On Board Cache: 10 (This Campus, 10 min) • Memory: 100 (Sacramento, 1.5 hr) • Disk: 10^6 (Pluto, 2 Years) • Tape/Optical Robot: 10^9 (Andromeda, 2,000 Years)
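The travel-time analogy on this slide is a unit conversion: treat one unit of latency (a register access) as one minute and scale every other level by the same factor. A minimal sketch, assuming the slide's numbers 1, 2, 10, 100, 10^6 and 10^9 are relative latencies; the names below are hypothetical:

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_YEAR = 60 * 24 * 365

# Relative latency of each storage level, read off the slide.
relative_latency = {
    "registers": 1,
    "on-chip cache": 2,
    "on-board cache": 10,
    "memory": 100,
    "disk": 10**6,
    "tape/optical robot": 10**9,
}


def human_scale(units):
    """Express a latency on a human time scale, taking 1 unit = 1 minute."""
    minutes = units
    if minutes < MINUTES_PER_HOUR:
        return f"{minutes} min"
    if minutes < MINUTES_PER_YEAR:
        return f"{minutes / MINUTES_PER_HOUR:.1f} hr"
    return f"{minutes / MINUTES_PER_YEAR:,.0f} years"


for level, units in relative_latency.items():
    print(f"{level:>20}: {human_scale(units)}")
```

The results (about 1.7 hours for memory, about 2 years for disk, about 1,900 years for the tape robot) agree with the slide's rounded 1.5 hr, 2 Years and 2,000 Years.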
Gilder’s Telecosm Law: 3x bandwidth/year for 25 more years • Today: • 10 Gbps per channel • 4 channels per fiber: 40 Gbps • 32 fibers/bundle = 1.2 Tbps/bundle • In lab 3 Tbps/fiber (400 x WDM) • In theory 25 Tbps per fiber • 1 Tbps = USA 1996 WAN bisection bandwidth • 1 fiber = 25 Tbps
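The bandwidth figures here multiply out as follows, and the 3x/year rate gives a rough feel for how quickly the theoretical limit could be approached. A minimal sketch; applying the 3x/year growth rate to a single fiber is an assumption made for illustration, not a claim from the slide, and the names are hypothetical:

```python
import math

GBPS_PER_CHANNEL = 10     # from the slide
CHANNELS_PER_FIBER = 4    # from the slide
FIBERS_PER_BUNDLE = 32    # from the slide

gbps_per_fiber = GBPS_PER_CHANNEL * CHANNELS_PER_FIBER    # 40 Gbps
gbps_per_bundle = gbps_per_fiber * FIBERS_PER_BUNDLE      # 1,280 Gbps, about 1.2 Tbps


def years_to_reach(start_gbps, target_gbps, growth_per_year=3.0):
    """Years of compound growth needed to get from start to target."""
    return math.log(target_gbps / start_gbps) / math.log(growth_per_year)


# Assumption: per-fiber bandwidth itself grows 3x per year.
years_to_theoretical_limit = years_to_reach(gbps_per_fiber, 25_000)

print(f"{gbps_per_fiber} Gbps/fiber, {gbps_per_bundle} Gbps/bundle")
print(f"about {years_to_theoretical_limit:.1f} years from 40 Gbps to 25 Tbps at 3x/year")
```

Under that assumption the gap between today's 40 Gbps and the theoretical 25 Tbps per fiber closes in about six years.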
Networking: BIG!! Changes coming! • Technology • 10 GBps bus “now” • 1 Gbps links “now” • 1 Tbps links in 10 years • Fast & cheap switches • Standard interconnects • processor-processor • processor-device (=processor) • Deregulation WILL work someday • CHALLENGE: reduce software tax on messages • Today: 30 K ins + 10 ins/byte • Goal: 1 K ins + .01 ins/byte • Best bet: • SAN/VIA • Smart NICs • Special protocol • User-Level Net IO (like disk)
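The "software tax" on this slide is a linear cost model: a fixed number of instructions per message plus a per-byte charge. A minimal sketch of what the today and goal figures imply for one message; the 8 KB message size and the names are assumptions chosen for illustration:

```python
def instructions_per_message(fixed_ins, ins_per_byte, size_bytes):
    """CPU instructions spent sending one message under a
    fixed-plus-per-byte cost model."""
    return fixed_ins + ins_per_byte * size_bytes


TODAY = (30_000, 10)    # 30 K ins + 10 ins/byte (from the slide)
GOAL = (1_000, 0.01)    # 1 K ins + .01 ins/byte (from the slide)

MESSAGE_BYTES = 8 * 1024  # assumed example size: one 8 KB message

today_cost = instructions_per_message(*TODAY, MESSAGE_BYTES)
goal_cost = instructions_per_message(*GOAL, MESSAGE_BYTES)

print(f"today: {today_cost:,} ins, goal: {goal_cost:,.2f} ins, "
      f"ratio: {today_cost / goal_cost:.0f}x")
```

For this message size the per-byte term dominates today's cost (81,920 of 111,920 instructions), while under the goal the fixed term dominates, which is why the slide's best bets target data copying and checksumming (smart NICs, user-level IO) as well as protocol overhead.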
What if Networking Was as Cheap As Disk IO? • TCP/IP: Unix/NT 100% cpu @ 40MBps • Disk: Unix/NT 8% cpu @ 40MBps • Why the Difference? • Host does TCP/IP packetizing, checksum,… flow control, small buffers • Host Bus Adapter does SCSI packetizing, checksum,… flow control, DMA
The Promise of SAN/VIA: 10x better in 2 years • Today: • wires are 10 MBps (100 Mbps Ethernet) • ~20 MBps tcp/ip saturates 2 cpus • round-trip latency is ~300 us • In two years • wires are 100 MBps (1 Gbps Ethernet, ServerNet,…) • tcp/ip ~ 100 MBps 10% of each processor • round-trip latency is 20 us • works in lab today • uses Winsock2 api. See http://www.viarch.org/
SAN: Standard Interconnect • LAN faster than memory bus? • 1 GBps links in lab. • 100$ port cost soon • Port is computer • Bandwidths: Gbps Ethernet: 110 MBps; PCI: 70 MBps; UW Scsi: 40 MBps; FW scsi: 20 MBps; scsi: 5 MBps • Figure labels: RIP FDDI, RIP ATM, RIP FC, RIP SCI, RIP SCSI, RIP ?
Data Gravity: Processing Moves to Transducers • Move Processing to data sources • Move to where the power (and sheet metal) is • Processor in • Modem • Display • Microphones (speech recognition) & cameras (vision) • Storage: Data storage and analysis
CyberBricks: Functionally Specialized Cards • Storage • Network • Display • Each card is an ASIC plus a P mips processor and M MB DRAM • Today: P = 20 mips, M = 2 MB • In a few years: P = 200 mips, M = 64 MB
Tera Byte Backplane With Tera Byte Interconnect and Super Computer Adapters • Processing is incidental to • Networking • Storage • UI • Disk Controller/NIC is • faster than device • close to device • Can borrow device package & power • So use idle capacity for computation. • Run app in device.
All Device Controllers will be Cray 1’s • TODAY • Disk controller is 10 mips risc engine with 2MB DRAM • NIC is similar power • SOON • Will become 100 mips systems with 100 MB DRAM. • They are nodes in a federation (can run Oracle on NT in disk controller). • Advantages • Uniform programming model • Great tools • Security • economics (CyberBricks) • Move computation to data (minimize traffic) • Figure labels: Central Processor & Memory; Tera Byte Backplane
It’s Already True of Printers: Peripheral = CyberBrick • You buy a printer • You get • several network interfaces • a Postscript engine • cpu, • memory, • software, • a spooler (soon) • and… a print engine.
Disk = Node • has magnetic storage (100 GB?) • has processor & DRAM • has SAN attachment • has execution environment • Software stack: Applications; Services; DBMS; RPC, ...; File System; SAN driver; Disk driver; OS Kernel
Outline • Why cost per transaction has dropped 100,000x in 10 years. • How does that change things? • What next (technology trends): CyberBricks • Clusters of Hardware and Software CyberBricks
All God’s Children Have Clusters! Buying Computing By the Slice • People are buying computers by the dozens • Computers only cost 1k$/slice! • Clustering them together
A cluster is a cluster is a cluster • It’s so natural, even mainframes cluster! • Looking closer at usage patterns, a few models emerge • Looking closer at sites, you see • hierarchies • bunches • functional specialization
“Commercial” NT Clusters • 16-node Tandem Cluster • 64 cpus • 2 TB of disk • Decision support • 45-node Compaq Cluster • 140 cpus • 14 GB DRAM • 4 TB RAID disk • OLTP (Debit Credit) • 1 B tpd (14 k tps)
Tandem Oracle/NT • 27,383 tpmC • 71.50 $/tpmC • 4 x 6 cpus • 384 disks=2.7 TB
The Microsoft.Com Site • Microsoft.com: ~150x4 nodes • Figure: site diagram (MidYear98a.vsd, 12/15/97) • Locations: Building 11 (staging servers, log processing, internal WWW; all servers in Building 11 are accessible from corpnet), MOSWest, European Data Center, Japan Data Center • Server groups: www.microsoft.com, home.microsoft.com, premium.microsoft.com, search.microsoft.com, register.microsoft.com, support.microsoft.com, activex.microsoft.com, cdm.microsoft.com, msid.msn.com, register.msn.com, FTP.microsoft.com, FTP and HTTP download servers, live SQL servers, SQL consolidators, SQL reporting, IDC and DMZ staging servers • Network: FDDI rings (MIS1 to MIS4), primary and secondary Gigaswitch, switched Ethernet, admin LAN, feeder LAN, routers, OC3 and DS3 (45 Mb/Sec each) Internet links • Typical server: 4xP6 (some 4xP5), 256 to 1 GB RAM, 12 to 180 GB HD, average cost $24K to $128K, with an FY98 forecast count per group
The Microsoft TerraServer Hardware • Compaq AlphaServer 8400 • 8 x 400 MHz Alpha cpus • 10 GB DRAM • 324 9.2 GB StorageWorks Disks • 3 TB raw, 2.4 TB of RAID5 • STK 9710 tape robot (4 TB) • WindowsNT 4 EE, SQL Server 7.0
Inktomi (hotbot), WebTV: > 200 nodes • Inktomi: ~250 UltraSparcs • web crawl • index crawled web and save index • Return search results on demand • Track Ads and click-thrus • ACID vs BASE (basic Availability, Serialized Eventually) • Web TV • ~200 UltraSparcs • Render pages, Provide Email • ~ 4 Network Appliance NFS file servers • A large Oracle app tracking customers
Loki: Pentium Clusters for Science http://loki-www.lanl.gov/ • 16 Pentium Pro Processors x 5 Fast Ethernet interfaces + 2 Gbytes RAM + 50 Gbytes Disk + 2 Fast Ethernet switches + Linux = 1.2 real Gflops for $63,000 (but that is the 1996 price) • Beowulf project is similar http://cesdis.gsfc.nasa.gov/pub/people/becker/beowulf.html • Scientists want cheap mips.
Your Tax Dollars At Work: ASCI for Stockpile Stewardship • Intel/Sandia: 9000x1 node Ppro • LLNL/IBM: 512x8 PowerPC (SP2) • LANL/Cray: ? • Maui Supercomputer Center: 512x1 SP2
Berkeley NOW (network of workstations) Project http://now.cs.berkeley.edu/ • 105 nodes • Sun UltraSparc 170, 128 MB, 2x2GB disk • Myrinet interconnect (2x160MBps per node) • SBus (30MBps) limited • GLUNIX layer above Solaris • Inktomi (HotBot search) • NAS Parallel Benchmarks • Crypto cracker • Sort 9 GB per second