Scaleable Computing. Jim Gray, Microsoft Corporation, Gray@Microsoft.com
Thesis: Scaleable Servers • Scaleable Servers • Commodity hardware allows new applications • New applications need huge servers • Clients and servers are built of the same “stuff” • Commodity software and • Commodity hardware • Servers should be able to • Scale up (grow node by adding CPUs, disks, networks) • Scale out (grow by adding nodes) • Scale down (can start small) • Key software technologies • Objects, Transactions, Clusters, Parallelism
1987: 256 tps Benchmark • 14 M$ computer (Tandem) • A dozen people • False floor, 2 rooms of machines • Figure labels: a 32 node processor array; a 40 GB disk array (80 drives); simulate 25,600 clients; staff: manager, auditor, admin expert, hardware experts, network expert, performance expert, OS expert, DB expert
1988: DB2 + CICS Mainframe, 65 tps • IBM 4391 • Simulated network of 800 clients • 2 M$ computer • Staff of 6 to do benchmark • Figure labels: 2 x 3725 network controllers; refrigerator-sized CPU; 16 GB disk farm (4 x 8 x .5 GB)
1997: 10 years later, 1 Person and 1 box = 1250 tps • 1 Breadbox ~ 5x 1987 machine room • 23 GB is hand-held • One person does all the work • Cost/tps is 1,000x less: 25 micro dollars per transaction • Figure labels: 4 x 200 MHz CPU; 1/2 GB DRAM; 12 x 4 GB disk; 3 x 7 x 4 GB disk arrays; one person acting as hardware, OS, net, DB, and app expert
What Happened? • Moore’s law: Things get 4x better every 3 years (applies to computers, storage, and networks) • New Economics: Commodity • mainframe: price 10,000 $/mips, software 100 k$/year • minicomputer: price 100 $/mips, software 10 k$/year • microcomputer: price 10 $/mips, software 1 k$/year • GUI: Human-computer tradeoff: optimize for people, not computers • Figure: price vs. time curves for mainframe, mini, and micro
What Happens Next • Last 10 years: 1000x improvement • Next 10 years: ???? • Today: text and image servers are free; 25 m$/hit => advertising pays for them • Future: video, audio, … servers are free • “You ain’t seen nothing yet!” • Figure: performance vs. time (1985, 1995, 2005), with the future marked “?”
Kinds Of Information Processing • Immediate (network), point-to-point: conversation, money • Immediate (network), broadcast: lecture, concert • Time-shifted (database), point-to-point: mail • Time-shifted (database), broadcast: book, newspaper • It’s ALL going electronic • Immediate is being stored for analysis (so ALL database) • Analysis and automatic processing are being added
Why Put Everything In Cyberspace? • Low rent: min $/byte • Shrinks time: now or later • Shrinks space: here or there • Automate processing: knowbots • Network: point-to-point OR broadcast, immediate OR time-delayed • Database: locate, process, analyze, summarize
Magnetic Storage Cheaper Than Paper • File cabinet: cabinet (four drawer) 250$, paper (24,000 sheets) 250$, space (2x3 @ 10$/ft2) 180$, total 700$, or 3¢/sheet • Disk: disk (4 GB) 800$ • ASCII: 2 million pages, 0.04¢/sheet (80x cheaper) • Image: 200,000 pages, 0.4¢/sheet (8x cheaper) • Store everything on disk
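The per-sheet figures on this slide can be rechecked with a few lines of arithmetic. The sketch below uses only the totals quoted on the slide; the function and variable names are illustrative.

```python
# Cost-per-sheet arithmetic using only the totals quoted on the slide.
CENTS_PER_DOLLAR = 100


def cents_per_sheet(total_dollars: float, sheets: int) -> float:
    """Return the storage cost of one sheet (or page) in cents."""
    return total_dollars * CENTS_PER_DOLLAR / sheets


paper = cents_per_sheet(700, 24_000)          # file cabinet, slide's rounded total
disk_ascii = cents_per_sheet(800, 2_000_000)  # 4 GB disk holding ASCII pages
disk_image = cents_per_sheet(800, 200_000)    # same disk holding page images

print(f"paper:      {paper:.2f} cents/sheet")
print(f"disk ASCII: {disk_ascii:.2f} cents/sheet ({paper / disk_ascii:.0f}x cheaper)")
print(f"disk image: {disk_image:.2f} cents/sheet ({paper / disk_image:.0f}x cheaper)")
```

With the unrounded paper cost the ratios come out near 73x and 7x; the slide's 80x and 8x are rounder versions of the same comparison.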
Databases: Information at Your Fingertips™, Information Network™, Knowledge Navigator™ • All information will be in an online database (somewhere) • You might record everything you • Read: 10 MB/day, 400 GB/lifetime (eight tapes today) • Hear: 400 MB/day, 16 TB/lifetime (three tapes/year today) • See: 1 MB/s, 40 GB/day, 1.6 PB/lifetime (maybe someday)
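The daily and lifetime volumes on this slide are consistent with a single multiplier. The sketch below assumes a lifetime of 40,000 days (roughly 110 years); that figure is not stated on the slide and is inferred from the ratio of the quoted numbers.

```python
# Lifetime data volume = daily volume x days per lifetime.
# ASSUMPTION: 40,000 days per lifetime, inferred from the slide's ratios.
DAYS_PER_LIFETIME = 40_000
MB = 10**6

daily_bytes = {
    "read": 10 * MB,      # 10 MB/day
    "hear": 400 * MB,     # 400 MB/day
    "see": 40_000 * MB,   # 40 GB/day
}

lifetime_bytes = {k: v * DAYS_PER_LIFETIME for k, v in daily_bytes.items()}

for kind, total in lifetime_bytes.items():
    print(f"{kind}: {total / 10**12:g} TB per lifetime")
```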
Database: Store ALL Data Types • The old world: • Millions of objects • 100-byte objects • The new world: • Billions of objects • Big objects (1 MB) • Objects have behavior (methods) • Paperless office • Library of Congress online • All information online • Entertainment • Publishing • Business • WWW and Internet • Figure (old world): People table with columns Name, Address and rows David/NY, Mike/Berk, Won/Austin • Figure (new world): the same People table with added columns Papers, Picture, Voice
Billions Of Clients • Every device will be “intelligent” • Doors, rooms, cars… • Computing will be ubiquitous
Billions Of Clients Need Millions Of Servers • All clients networked to servers • May be nomadic or on-demand • Fast clients want faster servers • Servers provide • Shared Data • Control • Coordination • Communication • Figure labels: clients (mobile clients, fixed clients); servers (server, super server)
Thesis: Many little beat few big • Figure: price ($1 million, $100 K, $10 K) against processor class (mainframe, mini, micro, nano, pico processor); storage hierarchy of 10 pico-second ram, 10 nano-second ram, 10 microsecond ram, 10 millisecond disc, 10 second tape archive; capacities 1 MB, 100 MB, 10 GB, 1 TB, 100 TB; disc form factors 14", 9", 5.25", 3.5", 2.5", 1.8" • 1 M SPECmarks, 1 TFLOP • 10^6 clocks to bulk ram • Event-horizon on chip • VM reincarnated • Multiprogram cache, on-chip SMP • Smoking, hairy golf ball • How to connect the many little parts? • How to program the many little parts? • Fault tolerance?
Future Super Server: 4T Machine • Array of 1,000 4B machines • 1 bps processors • 1 BB DRAM • 10 BB disks • 1 Bbps comm lines • 1 TB tape robot • A few megabucks • Challenge: • Manageability • Programmability • Security • Availability • Scaleability • Affordability • As easy as a single system • Future servers are CLUSTERS of processors, discs • Distributed database techniques make clusters work • Figure: Cyber Brick, a 4B machine (CPU, 5 GB RAM, 50 GB disc)
The Hardware Is In Place… And then a miracle occurs? • SNAP: scaleable network and platforms • Commodity-distributed OS built on: • Commodity platforms • Commodity network interconnect • Enables parallel applications
Thesis: Scaleable Servers • Scaleable Servers • Commodity hardware allows new applications • New applications need huge servers • Clients and servers are built of the same “stuff” • Commodity software and • Commodity hardware • Servers should be able to • Scale up (grow node by adding CPUs, disks, networks) • Scale out (grow by adding nodes) • Scale down (can start small) • Key software technologies • Objects, Transactions, Clusters, Parallelism
Scaleable Servers: BOTH SMP And Cluster • Grow up with SMP; 4xP6 is now standard • Grow out with cluster • Cluster has inexpensive parts • Figure labels: personal system, departmental server, SMP superserver; cluster of PCs
SMPs Have Advantages • Single system image: easier to manage, easier to program (threads in shared memory, disk, net) • 4x SMP is commodity • Software capable of 16x • Problems: • >4 not commodity • Scale-down problem (starter systems expensive) • There is a BIGGEST one • Figure labels: personal system, departmental server, SMP superserver
Building the Largest Node • There is a biggest node (size grows over time) • Today, with NT, it is probably 1 TB • We are building it (with help from DEC and SPIN2) • 1 TB GeoSpatial SQL Server database (1.4 TB of disks = 320 drives) • 30K BTU, 8 KVA, 1.5 metric tons • Will put it on the Web as a demo app • 10 meter image of the ENTIRE PLANET • 2 meter image of interesting parts (2% of land) • One pixel per meter = 500 TB uncompressed • Better resolution in US (courtesy of USGS) • Figure: 1-TB home page (www.SQL.1TB.com) backed by a 1-TB SQL Server DB of satellite and aerial photos, plus support files
What’s a TeraByte? • 1 Terabyte is: • 1,000,000,000 business letters (150 miles of book shelf) • 100,000,000 book pages (15 miles of book shelf) • 50,000,000 FAX images (7 miles of book shelf) • 10,000,000 TV pictures (mpeg) (10 days of video) • 4,000 LandSat images (16 earth images at 100m) • 100,000,000 web pages (10 copies of the web HTML) • Library of Congress (in ASCII) is 25 TB • 1980: $200 million of disc (10,000 discs); $5 million of tape silo (10,000 tapes) • 1997: 200 k$ of magnetic disc (48 discs); 30 k$ of nearline tape (20 tapes) • Terror Byte!
TB DB User Interface • Figure: sequence of user-interface screenshots
TPC-C Web-Based Benchmarks • Client is a Web browser (7,500 of them!) • Submits Order, Invoice, or Query to the server via a Web page interface • Web server translates to DB • SQL does DB work • Net: • easy to implement • performance is GREAT! • Figure: HTTP to IIS (Web server), then ODBC to SQL
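The flow on this slide (browser request, Web server translation, SQL doing the database work) can be sketched in a few lines. This is a minimal illustration, not the benchmark kit: it assumes Python's built-in sqlite3 in place of SQL Server and ODBC, a plain function in place of IIS, and a hypothetical `orders` table and URL layout.

```python
import sqlite3
from urllib.parse import parse_qs


def make_db() -> sqlite3.Connection:
    """Create the (hypothetical) orders table in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, item TEXT, qty INTEGER)"
    )
    return db


def handle_request(db: sqlite3.Connection, path: str, query_string: str) -> str:
    """Web tier: translate a page request into SQL and render the result as HTML."""
    params = {k: v[0] for k, v in parse_qs(query_string).items()}
    if path == "/order":
        with db:  # one transaction per request
            cur = db.execute(
                "INSERT INTO orders (customer, item, qty) VALUES (?, ?, ?)",
                (params["customer"], params["item"], int(params["qty"])),
            )
        return f"<p>Order {cur.lastrowid} accepted</p>"
    if path == "/query":
        rows = db.execute(
            "SELECT item, qty FROM orders WHERE customer = ? ORDER BY id",
            (params["customer"],),
        ).fetchall()
        items = "".join(f"<li>{item} x {qty}</li>" for item, qty in rows)
        return f"<ul>{items}</ul>"
    return "<p>Unknown request</p>"


db = make_db()
order_page = handle_request(db, "/order", "customer=alice&item=widget&qty=3")
query_page = handle_request(db, "/query", "customer=alice")
print(order_page)
print(query_page)
```

The Web tier holds no data of its own, which is what lets many browser clients share one database server.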
Grow UP and OUT • Cluster: • a collection of nodes • as easy to program and manage as a single node • Figure labels: personal system, departmental server, SMP superserver; 1 Terabyte DB; 1 billion transactions per day
Clusters Have Advantages • Clients and servers made from the same stuff • Inexpensive: • Built with commodity components • Fault tolerance: • Spare modules mask failures • Modular growth • Grow by adding small modules • Unlimited growth: no biggest one
Windows NT Clusters • Microsoft & 60 vendors defining NT clusters • Almost all big hardware and software vendors involved • No special hardware needed - but it may help • Fault-tolerant first, scaleable second • Microsoft, Oracle, SAP giving demos today • Enables • Commodity fault-tolerance • Commodity parallelism (data mining, virtual reality…) • Also great for workgroups!
Billion Transactions per Day Project • Building a 20-node Windows NT Cluster (with help from Intel), > 800 disks • All commodity parts • Using SQL Server & DTC distributed transactions • Each node has 1/20th of the DB • Each node does 1/20th of the work • 15% of the transactions are “distributed”
How Much Is 1 Billion Transactions Per Day? • 1 Btpd = 11,574 tps (transactions per second) ~ 700,000 tpm (transactions/minute) • AT&T • 185 million calls (peak day worldwide) • Visa ~20 M tpd • 400 M customers • 250,000 ATMs worldwide • 7 billion transactions / year (card+cheque) in 1994 • Figure: bar chart on a log scale (0.1 to 1,000 Mtpd, millions of transactions per day) comparing AT&T, Visa, BofA, NYSE, and 1 Btpd
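The conversion from transactions per day to per-second and per-minute rates is simple division. The sketch below also divides the load across the 20 nodes mentioned for the Billion Transactions per Day project; an even split is an assumption, and the variable names are illustrative.

```python
# Convert 1 billion transactions per day into per-second and per-minute rates.
SECONDS_PER_DAY = 24 * 60 * 60

tpd = 1_000_000_000
tps = tpd / SECONDS_PER_DAY   # transactions per second
tpm = tps * 60                # transactions per minute

# ASSUMPTION: load splits evenly over the 20-node cluster from the project slide.
NODES = 20
tps_per_node = tps / NODES

print(f"{tps:,.0f} tps, {tpm:,.0f} tpm, {tps_per_node:,.0f} tps per node")
```

The slide's "~700,000 tpm" is the per-minute figure rounded up from about 694,000.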
Parallelism: The OTHER aspect of clusters • Clusters of machines allow two kinds of parallelism • Many little jobs: online transaction processing • TPC-A, B, C… • A few big jobs: data search and analysis • TPC-D, DSS, OLAP • Both give automatic parallelism
Kinds of Parallel Execution • Pipeline: any sequential program feeds any other sequential program • Partition: outputs split N ways, inputs merge M ways; each partition runs any sequential program • Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey
Partitioned Execution • Spreads computation and IO among processors • Partitioned data gives NATURAL parallelism
N x M way Parallelism • N inputs, M outputs, no bottlenecks • Partitioned data • Partitioned and pipelined data flows
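Partitioned and pipelined execution can be sketched as follows. This is a minimal illustration under stated assumptions: records are (key, value) pairs with integer keys, the two stages (`scan`, `aggregate`) are made-up sequential programs, and threads stand in for the cluster nodes. In standard CPython the threads show the shape of the data flow rather than a real CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n):
    """Split the input N ways by key (integer keys assumed)."""
    parts = [[] for _ in range(n)]
    for key, value in records:
        parts[key % n].append((key, value))
    return parts


def scan(part):
    """Stage 1, an ordinary sequential program: keep even values."""
    for key, value in part:
        if value % 2 == 0:
            yield key, value


def aggregate(rows):
    """Stage 2, another sequential program: sum the values it is fed."""
    return sum(value for _, value in rows)


def run_partition(part):
    # Pipeline: scan feeds aggregate one row at a time via the generator.
    return aggregate(scan(part))


def parallel_sum(records, n):
    parts = partition(records, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(run_partition, parts))
    return sum(partials)  # merge the N partial results


records = [(k, k) for k in range(100)]
total = parallel_sum(records, 4)
print(total)
```

Neither stage knows it is running in parallel; the split and merge steps supply the parallelism around unchanged sequential code.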
The Parallel Law Of Computing • Grosch's Law: 2x $ is 4x performance (1 MIPS for 1 $; 1,000 MIPS for 32 $, or .03 $/MIPS) • Parallel Law: 2x $ is 2x performance (1 MIPS for 1 $; 1,000 MIPS for 1,000 $) • Needs: • Linear speedup and linear scale-up • Not always possible
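The two laws differ only in the exponent that relates cost to performance. The sketch below reproduces the slide's numbers from a 1 MIPS for 1 $ starting point; the function names are illustrative.

```python
import math


def grosch_cost(mips: float, base_mips: float = 1.0, base_cost: float = 1.0) -> float:
    """Grosch's Law: performance grows as cost squared, so cost grows as sqrt(performance)."""
    return base_cost * math.sqrt(mips / base_mips)


def parallel_cost(mips: float, base_mips: float = 1.0, base_cost: float = 1.0) -> float:
    """Parallel Law: performance grows linearly with cost (needs linear scale-up)."""
    return base_cost * (mips / base_mips)


g = grosch_cost(1000)
p = parallel_cost(1000)
print(f"Grosch:   {g:.0f} $ for 1,000 MIPS ({g / 1000:.2f} $/MIPS)")
print(f"Parallel: {p:.0f} $ for 1,000 MIPS ({p / 1000:.2f} $/MIPS)")
```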
Thesis: Scaleable Servers • Scaleable Servers • Commodity hardware allows new applications • New applications need huge servers • Clients and servers are built of the same “stuff” • Commodity software and • Commodity hardware • Servers should be able to • Scale up (grow node by adding CPUs, disks, networks) • Scale out (grow by adding nodes) • Scale down (can start small) • Key software technologies • Objects, Transactions, Clusters, Parallelism
The BIG Picture: Components and transactions • Software modules are objects • Object Request Broker (a.k.a. Transaction Processing Monitor) connects objects (clients to servers) • Standard interfaces allow software plug-ins • Transaction ties execution of a “job” into an atomic unit: all-or-nothing, durable, isolated
Linking And Embedding: Objects are data modules; transactions are execution modules • Link: pointer to object somewhere else • Think URL in Internet • Embed: bytes are here • Objects may be active; can call back to subscribers
Objects Meet Databases: The basis for universal data servers, access, & integration • Object-oriented (COM-oriented) programming interface to data • Breaks DBMS into components • Anything can be a data source • Optimization/navigation “on top of” other data sources • A way to componentize a DBMS • Makes an RDBMS an O-RDBMS (assumes optimizer understands objects) • Figure: a DBMS engine over data sources such as database, spreadsheet, photos, mail, map, document
The Three Tiers • Web client: HTML, VB and Java plug-ins, VBScript, JavaScript • Middleware (ORB, TP monitor, Web server...): object server pool, VB or Java script engine, VB or Java virtual machine • Object & data server • Connections shown: HTTP + DCOM; ORB; Internet; DCOM (OLE DB, ODBC,...); LU6.2 to IBM legacy gateways
Server Side Objects: Easy Server-Side Execution • Give simple execution environment • Object gets • start • invoke • shutdown • Everything else is automatic • Drag & Drop Business Objects • Figure: a server, with parts labeled network, receiver, queue management, connections, security, context, configuration, thread pool, service logic, synchronization, shared data
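The division of labor on this slide, where the object supplies only start, invoke, and shutdown while the server supplies the rest, can be sketched as below. `Container` and `Greeter` are hypothetical names for illustration; the container here covers only the thread pool and synchronization from the figure, not networking, security, or configuration.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class Container:
    """Hosts one object and supplies the thread pool and synchronization."""

    def __init__(self, obj, workers: int = 4):
        self._obj = obj
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._obj.start()  # lifecycle call 1

    def submit(self, request) -> Future:
        """Queue a request; a pool thread will call the object's invoke."""
        return self._pool.submit(self._guarded_invoke, request)

    def _guarded_invoke(self, request):
        with self._lock:  # the object never sees concurrent calls
            return self._obj.invoke(request)  # lifecycle call 2

    def close(self) -> None:
        self._pool.shutdown(wait=True)
        self._obj.shutdown()  # lifecycle call 3


class Greeter:
    """Service logic only: no threads, locks, or queues in sight."""

    def start(self):
        self.count = 0

    def invoke(self, name):
        self.count += 1
        return f"hello, {name}"

    def shutdown(self):
        self.closed = True


greeter = Greeter()
server = Container(greeter)
futures = [server.submit(n) for n in ["ann", "bob", "cy"]]
replies = [f.result() for f in futures]
server.close()
print(replies, greeter.count)
```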
A new programming paradigm • Develop objects on the desktop • Better yet: download them from the Net • Script work flows as method invocations • All on desktop • Then, move work flows and objects to server(s) • Gives • desktop development • three-tier deployment • Software Cyberbricks
Transactions Coordinate Components (ACID) • Transaction properties • Atomic: all or nothing • Consistent: old and new values • Isolated: automatic locking or versioning • Durable: once committed, effects survive • Transactions are built into modern OSs • MVS/TM, Tandem TMF, VMS DEC-DTM, NT-DTC
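The "all or nothing" property can be seen in a small example. The sketch below uses Python's built-in sqlite3 rather than any of the transaction managers named on the slide, and a hypothetical `account` table; a failed transfer leaves both balances exactly as they were.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
db.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
db.commit()


def transfer(db: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money atomically: both updates commit, or neither does."""
    try:
        with db:  # commits on success, rolls back if an exception escapes
            db.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            db.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected an overdraft
        return False


def balances(db: sqlite3.Connection) -> dict:
    return dict(db.execute("SELECT name, balance FROM account"))


first = transfer(db, "a", "b", 60)   # succeeds: a=40, b=60
second = transfer(db, "a", "b", 60)  # would overdraw a, so the credit to b is undone
print(first, second, balances(db))
```

The credit to `b` is deliberately applied first so that the second call fails halfway through; the rollback is what keeps `b` at 60.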
Transactions & Objects • Application requests transaction identifier (XID) • XID flows with method invocations • Object Managers join (enlist) in transaction • Distributed Transaction Manager coordinates commit/abort
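The four steps above can be sketched in a single process. The slide does not name the commit protocol; the sketch assumes a simple two-phase commit (prepare, then commit or abort), and the class and method names are hypothetical rather than taken from DTC or any other product.

```python
import itertools


class TransactionManager:
    """Hands out XIDs, tracks enlisted managers, coordinates the outcome."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._enlisted = {}

    def begin(self) -> int:
        xid = next(self._ids)
        self._enlisted[xid] = []
        return xid

    def enlist(self, xid: int, manager) -> None:
        if manager not in self._enlisted[xid]:
            self._enlisted[xid].append(manager)

    def commit(self, xid: int) -> str:
        managers = self._enlisted.pop(xid)
        # Phase 1: every enlisted manager must vote yes.
        if all(m.prepare(xid) for m in managers):
            for m in managers:  # Phase 2: commit everywhere
                m.commit(xid)
            return "committed"
        for m in managers:      # Phase 2: abort everywhere
            m.abort(xid)
        return "aborted"


class ObjectManager:
    """Holds data; buffers each transaction's writes until the outcome is known."""

    def __init__(self, tm: TransactionManager, can_commit: bool = True):
        self.tm = tm
        self.can_commit = can_commit
        self.data = {}
        self.pending = {}

    def put(self, xid: int, key, value) -> None:
        self.tm.enlist(xid, self)  # the XID arrives with the method invocation
        self.pending.setdefault(xid, {})[key] = value

    def prepare(self, xid: int) -> bool:
        return self.can_commit

    def commit(self, xid: int) -> None:
        self.data.update(self.pending.pop(xid, {}))

    def abort(self, xid: int) -> None:
        self.pending.pop(xid, None)


tm = TransactionManager()
orders = ObjectManager(tm)
stock = ObjectManager(tm, can_commit=False)  # simulate a manager that must refuse

xid = tm.begin()
orders.put(xid, "o1", "widget")
stock.put(xid, "widget", 9)
outcome1 = tm.commit(xid)  # stock votes no, so nothing becomes visible

stock.can_commit = True
xid = tm.begin()
orders.put(xid, "o1", "widget")
stock.put(xid, "widget", 9)
outcome2 = tm.commit(xid)
print(outcome1, outcome2, orders.data, stock.data)
```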
Distributed Transactions Enable Huge Throughput • Each node capable of 7 KtpmC (7,000 active users!) • Can add nodes to cluster (to support 100,000 users) • Transactions coordinate nodes • ORB / TP monitor spreads work among nodes
Distributed Transactions Enable Huge DBs • Distributed database technology spreads data among nodes • Transaction processing technology manages nodes
Thesis: Scaleable Servers • Scaleable Servers Built from Cyberbricks • Allow new applications • Servers should be able to • Scale up, out, down • Key software technologies • Clusters (tie the hardware together) • Parallelism (uses the independent CPUs, stores, wires) • Objects (software CyberBricks) • Transactions (mask errors)