Windows NT Scalability Jim Gray Microsoft Research Gray@Microsoft.com http://www.research.Microsoft.com/~Gray/talks/
Outline • Scalability: What & Why? • Scale UP: NT SMP scalability • Scale OUT: NT Cluster scalability • Key Message: • NT can do the most demanding apps today. • Tomorrow will be even better. [Figure labels: Scale Up, Scale Out, Scale Down]
What is Scalability? • Grow without limits: capacity, throughput • Do not add complexity: design, administer, operate, use [Figure: scale-up, scale-out, and scale-down spectrum with labels Super Server, Server Cluster, Server, PC Workstation, Portable, Win Term, NetPC, Handheld, TV]
Scale UP & OUT • Grow without limits: SMP with 4, 8, 16, 32 CPUs; 64-bit addressing; huge storage • Cluster requirements: auto manage; high availability; transparency; programming tools & apps [Figure: "Focus Here" marks Super Server and Server (Scale Up) and Server Cluster (Scale Out)]
Scalability is Important Server • Automation benefits growing • ROI of 1 month.... • Slice price going to zero • Cyberbrick costs 5k$ • Design, Implement & Manage cost going down • DCOM & Viper make it easy! • NT Clusters are easy! • Billions of clients imply millions of HUGE servers. • Thin clients imply huge servers.
Q: Why Does Microsoft Care? A: Billions of clients need millions of servers. [Chart: Servers shipped per year, 1994 to 2001 (97-01 are MS estimates); series: Windows NT Server, NetWare, Unix; vertical axis 0 to 2,700] Expect Microsoft to work hard on Scaleable Windows NT and Scaleable BackOffice. Key technique: INTEGRATION.
How Scaleable is NT? The Single Node Story • 64-bit file system in NT 1, 2, 3, 4, 5 • 8 node SMP in NT 4.E, 32 node OEM • 64-bit addressing in NT 5 • 1 Terabyte SQL Databases (PetaByte capable) • 10,000 users (TPC-C benchmark) • 100 Million web hits per day (IIS) • 50 GB Exchange mail store; next release designed for 16 TB • 50,000 POP3 users on Exchange (1.8 M messages/day) • And more coming...
Windows NT Server Enterprise Edition • Scalability • 8x SMP support (32x in OEM kit) • Larger process memory (3 GB Intel) • Unlimited Virtual Roots in IIS (web) • Transactions • DCOM transactions (Viper TP mon) • Message Queuing (Falcon) • Availability • Clustering (WolfPack) • Web, File, Print, DB … servers fail over.
What Happened? [Figure: price versus time curves for mainframe, mini, and micro]
• Moore’s law: Things get 4x better every 3 years (applies to computers, storage, and networks)
• New Economics: Commodity
  class            price ($/mips)   software (k$/year)
  mainframe        10,000           100
  minicomputer     100              10
  microcomputer    10               1
• GUI: Human-computer tradeoff; optimize for people, not computers
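The "4x better every 3 years" rule compounds quickly. A minimal sketch of the arithmetic, assuming smooth exponential growth between the 3-year steps; the function name and its parameters are illustrative and do not come from the talk.

```python
def moore_growth(years: float, factor: float = 4.0, period_years: float = 3.0) -> float:
    """Improvement multiple after `years`, if things get `factor`x better every `period_years`."""
    return factor ** (years / period_years)

per_year = moore_growth(1)      # annual improvement implied by 4x per 3 years
per_decade = moore_growth(10)   # improvement over a 10-year span

print(f"per year:   {per_year:.2f}x")
print(f"per decade: {per_decade:.0f}x")
```

Under that assumption the rule works out to a bit under 1.6x per year and about 100x per decade.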
Billions Of Clients Need Millions Of Servers • All clients networked to servers • May be nomadic or on-demand • Fast clients want faster servers • Servers provide • Shared Data • Control • Coordination • Communication [Figure: mobile clients and fixed clients connected to servers and a super server]
Thesis: Many little beat few big • Smoking, hairy golf ball • How to connect the many little parts? • How to program the many little parts? • Fault tolerance? [Figure: machine classes Mainframe, Mini, Micro, Nano, and Pico Processor, with price labels $1 million, $100 K, $10 K; disk form factors 14", 9", 5.25", 3.5", 2.5", 1.8"; storage hierarchy labels 10 pico-second ram, 10 nano-second ram, 10 microsecond ram, 10 millisecond disc, 10 second tape archive; capacity labels 1 MB, 100 MB, 10 GB, 1 TB, 100 TB; size label 1 MM^3; annotations: 1 M SPECmarks, 1 TFLOP; 10^6 clocks to bulk ram; event horizon on chip; VM reincarnated; multiprogram cache; on-chip SMP]
Future Super Server: 4T Machine • Array of 1,000 4B machines • 1 Bips processors • 1 BB DRAM • 10 BB disks • 1 Bbps comm lines • 1 TB tape robot • A few megabucks • Challenge: • Manageability • Programmability • Security • Availability • Scaleability • Affordability • As easy as a single system • Future servers are CLUSTERS of processors, discs • Distributed database techniques make clusters work [Figure: Cyber Brick, a 4B machine: CPU, 5 GB RAM, 50 GB Disc]
The Hardware Is In Place… And then a miracle occurs? • SNAP: scaleable network and platforms • Commodity distributed OS built on: • Commodity platforms • Commodity network interconnect • Enables parallel applications
Thesis: Scaleable Servers • Scaleable Servers • Commodity hardware allows new applications • New applications need huge servers • Clients and servers are built of the same “stuff” • Commodity software and • Commodity hardware • Servers should be able to • Scale up (grow node by adding CPUs, disks, networks) • Scale out (grow by adding nodes) • Scale down (can start small) • Key software technologies • Objects, Transactions, Clusters, Parallelism
Scaleable Servers: BOTH SMP And Cluster • Grow up with SMP; 4xP6 is now standard • Grow out with cluster • Cluster has inexpensive parts [Figure: personal system, departmental server, SMP super server, cluster of PCs]
SMPs Have Advantages • Single system image: easier to manage, easier to program (threads in shared memory, disk, net) • 4x SMP is commodity • Software capable of 16x • Problems: • >4 not commodity • Scale-down problem (starter systems expensive) • There is a BIGGEST one [Figure: personal system, departmental server, SMP super server]
TPC-C Web-Based Benchmarks • Client is a Web browser (9,200 of them!) • Submits Order, Invoice, and Query to server via Web page interface • Web server translates to DB • SQL does DB work • Net: • easy to implement • performance is GREAT! [Figure: HTTP to IIS (Web), ODBC to SQL]
What Happens in 10 Years? • 1987: 256 tps • $14 million computer • A dozen people • Two rooms of machines • 1997: 1,250 tps • 50 k$ computer • One person • 1 micro-dollar per transaction • (1,000x cheaper) • Ready for the next 10 years?
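The "1,000x cheaper" claim can be sanity-checked from the slide's own numbers by comparing hardware dollars per unit of throughput. A minimal sketch; the function name is illustrative, and it counts only the purchase price, not staff or floor space.

```python
def dollars_per_tps(system_price_dollars: float, tps: float) -> float:
    """Hardware purchase price divided by sustained transactions per second."""
    return system_price_dollars / tps

cost_1987 = dollars_per_tps(14_000_000, 256)   # $14 million computer at 256 tps
cost_1997 = dollars_per_tps(50_000, 1_250)     # 50 k$ computer at 1,250 tps
improvement = cost_1987 / cost_1997

print(f"1987: {cost_1987:,.0f} $/tps")
print(f"1997: {cost_1997:,.0f} $/tps")
print(f"improvement: {improvement:,.0f}x")
```

On hardware price alone the ratio comes out somewhat above 1,000x, before counting the drop from a dozen people to one.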
1988: DB2 + CICS Mainframe, 65 tps • IBM 4391 • Simulated network of 800 clients • 2 M$ computer • Staff of 6 to do benchmark [Figure: 2 x 3725 network controllers; refrigerator-sized CPU; 16 GB disk farm (4 x 8 x .5 GB)]
NT vs UNIX SMPs • NT traditionally ran on 1 to 4 cpus • Scales near-linear on them • UNIX boxes: 32-64 way SMPs • They do 3x more tpmC • They cost 10x more. • 10 way NT machines are available • They cost more • They are faster • My view (shared by many) • Need clusters for availability • Cluster commodity servers to make huge systems • a la Tandem, Teradata, VMScluster, IBM Sysplex, IBM SP2 • Clusters reduce need for giant SMPs
Transaction Throughput TPC-C • On comparable hardware: NT scales better! • SQL Server & NT Improving 250% per year • NT has best Price Performance (2x cheaper)
NT Scales Better Than Solaris • Microsoft SQL/NT/Intel scales to 6x • Beats Sybase/Solaris/UltraSPARC up to 11-way [Chart series: MS SQL/NT/Intel, Sybase/Solaris/UltraSPARC]
New News: WOW! HPUX-HPPA-Sybase • Sybase on HP 16x SMP scales to 40 ktpmC! • Price/Performance is flat (no diseconomy)
Low end More Competitive • premium on CPUs, disks, & Oracle
Only NT Has Economy of Scale • NT is 2x less expensive: 40 $/tpmC vs 110 $/tpmC • Only NT has economy of scale • Unix has dis-economy of scale
TPC-D Decision Support Benchmark • NT has good performance and price/performance.
Scaleup To Big Databases? • NT 4 and SQL Server 6.5 • DBs up to 1 Billion records, 100 GB • Covers most (80%) data warehouses • SQL Server 7.0 • Designed for Terabytes • Hundreds of disks per server • SMP parallel search • Data Mining and Multi-Media • TerraServer is good MM example [Figure: database size scale: Excel spreadsheet, Manhattan phone book (15 MB), Human Genome (3 GB), Dayton-Hudson sales records (300 GB), satellite photos of Earth (1 TB)]
Database Scaleup: TerraServer™ • Demo NT and SQL Server scalability • Stress test SQL Server 7.0 • Requirements • 1 TB • Unencumbered (put on www) • Interesting to everyone everywhere • And not offensive to anyone anywhere • Loaded • 1.1 M place names from Encarta World Atlas • 1 M Sq Km from USGS (1 meter resolution) • 2 M Sq Km from Russian Space agency (2 m) • Will be on web (world’s largest atlas) • Sell images with commerce server. • USGS CRDA: 3 TB more coming.
TerraServer System • DEC Alpha 4100 (4x SMP) + 324 StorageWorks drives (1.4 TB) • RAID 5 protected • SQL Server 7.0 • USGS 1-meter data (30% of US) • Russian Space data: two-meter resolution images (2 M km², 2% of Earth) [Image label: SPIN-2]
Demo http://msrlab/terraserver
Manageability: Windows NT 5.0 and Windows 98 • Active Directory tracks all objects in net • Integration with IE 4 • Web-centric user interface • Management Console • Component architecture • Zero Admin Kit and Systems Management Server • PlugNPlay, Instant On, Remote Boot, ... • Hydra and IntelliMirror
Thin Client Support: TSO comes to NT • Lower per-client costs [Figure: dedicated Windows terminal, Net PC, existing desktop PC, and MS-DOS, UNIX, and Mac clients connected to a Windows NT Server with “Hydra” Server]
Windows NT 5.0 IntelliMirror™ • Extends CMU Coda File System ideas • Files and settings mirrored on client and server • Great for disconnected users • Facilitates roaming • Easy to replace PCs • Optimizes network performance • Best of PC and centralized computing advantages
Outline • Scalability: What & Why? • Scale UP: NT SMP scalability • Scale OUT: NT Cluster scalability • Key Message: • NT can do the most demanding apps today. • Tomorrow will be even better. [Figure labels: Scale Up, Scale Out, Scale Down]
Scale OUT: Clusters Have Advantages • Fault tolerance: • Spare modules mask failures • Modular growth without limits • Grow by adding small modules • Parallel data search • Use multiple processors and disks • Clients and servers made from the same stuff • Inexpensive: built with commodity CyberBricks
How Scaleable is NT? The Cluster Story • 16-node Tandem Cluster: 64 CPUs; 2 TB of disk; decision support • 45-node Compaq Cluster: 140 CPUs; 14 GB DRAM; 4 TB RAID disk; OLTP (Debit Credit); 1 B tpd (14 k tps)
microsoft.com • Production: Windows NT 4 and IIS 3; 20 HTTP, 3 download, 3 FTP; 5 SQL 6.5; Index Server + 3 search • Stagers: Site Server for content; DCOM publishing wizard • Network: 6 DS3; 4 TB/day download capacity; replicas in UK and Japan • 90 M hits/day • 17 M page views • #4 site on Internet • 900 k visitors per day • Not cheap: data centers; bandwidth; 27 people on content; 22 people on systems
Tandem 2 Ton 2 TB SQL database 1.2 TB user data 16 node cluster 64 cpus, 480 disks Decision support parallel data-mining Will be Wolf Pack aware Demoed at DB Expo in ServerNet™ interconnect
Billion Transactions per Day Project • Built a 45-node Windows NT Cluster (with help from Intel & Compaq), > 900 disks • All off-the-shelf parts • Using SQL Server & DTC distributed transactions • DCOM & ODBC clients on 20 front-end nodes • DebitCredit Transaction • Each server node has 1/20th of the DB • Each server node does 1/20th of the work • 15% of the transactions are “distributed”
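One way to picture "each server node has 1/20th of the DB" is a partition function that maps every account to its owning node; a transaction is then "distributed" when it touches more than one node. A minimal sketch: the talk does not say how the data was partitioned, so the range scheme, the account count per node, and every name below are hypothetical.

```python
NUM_SERVER_NODES = 20           # from the slide: each node holds 1/20th of the DB
ACCOUNTS_PER_NODE = 1_000_000   # hypothetical size, chosen only for illustration

def node_for_account(account_id: int) -> int:
    """Range partitioning (an assumption): consecutive account ids share a node."""
    node = account_id // ACCOUNTS_PER_NODE
    if not 0 <= node < NUM_SERVER_NODES:
        raise ValueError(f"account {account_id} is outside the partitioned range")
    return node

def nodes_touched(account_ids: list[int]) -> set[int]:
    """Set of server nodes that own the accounts a transaction updates."""
    return {node_for_account(a) for a in account_ids}

def is_distributed(account_ids: list[int]) -> bool:
    """True when the transaction spans nodes and needs a distributed commit."""
    return len(nodes_touched(account_ids)) > 1

print(is_distributed([5, 6]))            # both accounts live on node 0
print(is_distributed([5, 1_000_000]))    # second account lives on node 1
```

With a scheme like this, a local transaction commits on a single node, while a cross-node one has to be coordinated across nodes, which is the role the slides give to DTC.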
Billion Transactions Per Day Hardware
  Type                                  nodes  model                  CPUs    DRAM (MB)  ctlrs   disks                        RAID space
  Workflow (MTS)                        20     Compaq Proliant 2500   20x 2   20x 128    20x 1   20x 1                        20x 2 GB
  SQL Server                            20     Compaq Proliant 5000   20x 4   20x 512    20x 4   20x (36x4.2GB + 7x9.1GB)     20x 130 GB
  Distributed Transaction Coordinator   5      Compaq Proliant 5000   5x 4    5x 256     5x 1    5x 3                         5x 8 GB
  TOTAL                                 45                            140     13 GB      105     895                          3 TB
• 45 nodes (Compaq Proliant) • Clustered with 100 Mbps Switched Ethernet • 140 CPUs, 13 GB, 3 TB (RAID 1, 5).
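The TOTAL row follows from the per-node figures. A minimal sketch that recomputes it, reading the per-node DRAM values as megabytes (an assumption that the 13 GB total supports); the tuple layout and function name are illustrative.

```python
# (role, nodes, CPUs per node, DRAM MB per node, controllers, disks, RAID GB per node)
NODE_TYPES = [
    ("Workflow (MTS)", 20, 2, 128, 1, 1, 2),
    ("SQL Server", 20, 4, 512, 4, 36 + 7, 130),   # 36 x 4.2 GB plus 7 x 9.1 GB disks
    ("Distributed Transaction Coordinator", 5, 4, 256, 1, 3, 8),
]

def cluster_totals(node_types):
    """Sum each per-node quantity over all nodes of every type."""
    totals = {"nodes": 0, "cpus": 0, "dram_mb": 0, "ctlrs": 0, "disks": 0, "raid_gb": 0}
    for _role, nodes, cpus, dram_mb, ctlrs, disks, raid_gb in node_types:
        totals["nodes"] += nodes
        totals["cpus"] += nodes * cpus
        totals["dram_mb"] += nodes * dram_mb
        totals["ctlrs"] += nodes * ctlrs
        totals["disks"] += nodes * disks
        totals["raid_gb"] += nodes * raid_gb
    return totals

totals = cluster_totals(NODE_TYPES)
print(totals)
```

The sums give 45 nodes, 140 CPUs, 105 controllers, and 895 disks exactly; DRAM comes to 14,080 MB and RAID space to 2,680 GB, which the slide states as 13 GB and 3 TB.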
Cluster Architecture [Figure: cluster diagram with Control, Driver, Database, and DTC nodes (hosts labeled VIPDCn and VIPDTCn) connected through a Switch]
Local Debit Credit [Figure: numbered call sequence (steps 1 to 14) with Init, Run, and Loop phases; labels: Driver Thread, DebitCredit Driver, DebitCredit Component, Database, DCOM, ODBC]
Distributed Debit Credit - Same DTC [Figure: numbered call sequence (steps 11 to 29); labels: DebitCredit, UpdateAcct, Database1, Database2, DTC]
Distributed Debit Credit - Different DTC [Figure: numbered call sequence (steps 11 to 35); labels: DebitCredit, UpdateAcct, Database1, Database2, DTC1, DTC2]
1.2 B tpd • 1 B tpd ran for 24 hrs. • Out-of-the-box software • Off-the-shelf hardware • AMAZING! • Sized for 30 days • Linear growth • 5 micro-dollars per transaction
How Much Is 1 Billion Tpd? • 1 billion tpd = 11,574 tps ~ 700,000 tpm (transactions/minute) • ATT • 185 million calls per peak day (worldwide) • Visa ~20 million tpd • 400 million customers • 250K ATMs worldwide • 7 billion transactions (card+cheque) in 1994 • New York Stock Exchange • 600,000 tpd • Bank of America • 20 million tpd checks cleared (more than any other bank) • 1.4 million tpd ATM transactions • Worldwide Airlines Reservations: 250 Mtpd
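The rate conversions on this slide are simple unit changes. A minimal sketch, assuming load is spread evenly over a 24-hour day (real workloads have peaks); the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

def tpd_to_tps(tpd: float) -> float:
    """Average transactions per second for a daily transaction count."""
    return tpd / SECONDS_PER_DAY

def tpd_to_tpm(tpd: float) -> float:
    """Average transactions per minute for a daily transaction count."""
    return tpd / MINUTES_PER_DAY

for label, tpd in [("1 B tpd", 1e9), ("1.2 B tpd", 1.2e9), ("Visa, ~20 M tpd", 20e6)]:
    print(f"{label}: {tpd_to_tps(tpd):,.0f} tps, {tpd_to_tpm(tpd):,.0f} tpm")
```

The first row reproduces the slide's 11,574 tps and roughly 700,000 tpm; the 1.2 B tpd row lands near the 14 k tps quoted for the cluster run.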
1 B tpd: So What? • Shows what is possible, easy to build • Grows without limits • Shows scaleup of DTC, MTS, SQL… • Shows (again) that shared-nothing clusters scale • Next task: make it easy. • auto partition data • auto partition application • auto manage & operate
Parallelism: The OTHER aspect of clusters • Clusters of machines allow two kinds of parallelism • Many little jobs: online transaction processing • TPC-A, B, C… • A few big jobs: data search and analysis • TPC-D, DSS, OLAP • Both give automatic parallelism
Kinds of Parallel Execution [Figure: Pipeline: one "any sequential program" feeds another; Partition: outputs split N ways and inputs merge M ways across copies of "any sequential program"]
Data Rivers: Split + Merge Streams • Producers add records to the river • Consumers consume records from the river • Purely sequential programming • River does flow control and buffering • River does partition and merge of data records • River = Split/Merge in Gamma = Exchange operator in Volcano [Figure: N producers, a river of N x M data streams, M consumers]
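The river idea can be sketched with bounded queues: each producer and each consumer is ordinary sequential code, and the river does the partitioning, merging, buffering, and flow control. This is an illustration only, not the Gamma or Volcano implementation; the class, function, and parameter names are assumptions, and error handling is omitted.

```python
import queue
import threading

_DONE = object()  # sentinel that ends a consumer's stream

class River:
    """Routes records from N producers to M consumers through bounded queues."""

    def __init__(self, num_consumers, partition, buffer_size=64):
        self._partition = partition
        self._queues = [queue.Queue(maxsize=buffer_size) for _ in range(num_consumers)]

    def put(self, record):
        # Split: the partition function picks the consumer. A full queue
        # blocks the producer, which provides the flow control.
        self._queues[self._partition(record) % len(self._queues)].put(record)

    def close(self):
        for q in self._queues:
            q.put(_DONE)

    def stream(self, consumer_index):
        # Merge: records from all producers arrive interleaved on one stream.
        q = self._queues[consumer_index]
        while True:
            record = q.get()
            if record is _DONE:
                return
            yield record

def run_river(producers, num_consumers, partition, consume):
    """Run every producer and consumer in its own thread; return consumer results."""
    river = River(num_consumers, partition)
    results = [None] * num_consumers

    def produce(source):
        for record in source:
            river.put(record)

    def drain(index):
        results[index] = consume(river.stream(index))

    consumers = [threading.Thread(target=drain, args=(i,)) for i in range(num_consumers)]
    feeders = [threading.Thread(target=produce, args=(p,)) for p in producers]
    for t in consumers + feeders:
        t.start()
    for t in feeders:
        t.join()
    river.close()  # every producer has finished, so end each consumer stream
    for t in consumers:
        t.join()
    return results

# Three sequential producers, two sequential consumers, partitioned by record value.
results = run_river(
    producers=[range(0, 100), range(100, 200), range(200, 300)],
    num_consumers=2,
    partition=lambda record: record,
    consume=sum,
)
print(results)
```

Here `sum` stands in for any sequential consumer: it sees a plain iterator and never learns how many producers exist or how records were routed.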