Scaleability: Scale Up and Scale Out

Presentation Transcript


  1. Scaleability: Scale Up and Scale Out
     • Grow up with SMP: 4xP6 is now standard
     • Grow out with cluster: cluster has inexpensive parts
     [Figure labels: SMP Super Server, Departmental Server, Personal System, Cluster of PCs]

  2. Thesis: Many little beat few big
     [Figure labels: Mainframe, Mini, Micro, Nano, Pico Processor; $1 million, $100 K, $10 K; 10 pico-second RAM, 10 nano-second RAM, 10 microsecond RAM, 10 millisecond disc, 10 second tape archive; 1 MB, 100 MB, 10 GB, 1 TB, 100 TB; disk sizes 14", 9", 5.25", 3.5", 2.5", 1.8"; 1 M SPECmarks, 1 TFLOP; 10^6 clocks to bulk RAM; event horizon on chip; VM reincarnated; multi-program cache, on-chip SMP]
     • Smoking, hairy golf ball
     • How to connect the many little parts?
     • How to program the many little parts?
     • Fault tolerance & management?
     Gray - Microsoft @ LANL 12/17/98

  3. 4 B PCs (1 Bips, .1 GB DRAM, 10 GB disk, 1 Gbps net, B=G): The Bricks of Cyberspace
     • Cost $1,000
     • Come with
       • NT
       • DBMS
       • High-speed net
       • System management
       • GUI / OOUI
       • Tools
     • Compatible with everyone else
     • CyberBricks

  4. Computers shrink to a point
     [Figure labels: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta]
     • Disks 100x in 10 years: 2 TB 3.5” drive
     • Shrink to 1” is 200 GB
     • Disk is super computer!
     • This is already true of printers and “terminals”

  5. Microsoft.com: ~150x4 nodes: a crowd
     [Figure: Microsoft.com site topology diagram. Sites: Building 11, MOSWest, European Data Center, Japan Data Center. Server groups: www.microsoft.com, home.microsoft.com, search.microsoft.com, premium.microsoft.com, register.microsoft.com, support.microsoft.com, activex.microsoft.com, cdm.microsoft.com, msid.msn.com, register.msn.com, FTP.microsoft.com, FTP and HTTP download servers, live SQL servers, SQL consolidators, SQL reporting, internal WWW, and staging servers (IDC, DMZ). Configurations shown: 4xP6 or 4xP5, 256 RAM to 1 GB RAM, 12 GB to 160 GB HD, average cost $25K to $83K. Networks: FDDI rings MIS1 to MIS4, switched Ethernet, feeder LAN, admin LAN, primary and secondary Gigaswitch, and routers to the Internet over OC3 and Ethernet (100 Mb/sec each) and DS3 (45 Mb/sec each).]

  6. HotMail: ~400 Computers Crowd

  7. DB Clusters (crowds)
     • 16-node cluster
       • 64 CPUs
       • 2 TB of disk
       • Decision support
     • 45-node cluster
       • 140 CPUs
       • 14 GB DRAM
       • 4 TB RAID disk
       • OLTP (Debit Credit)
       • 1 B tpd (14 k tps)

  8. Windows NT Versus UNIX: Best Results on an SMP
     • Semi-log plot shows 3x (2 year) lead by UNIX
     • Does not show Oracle/Alpha cluster at 100,000 tpmC
     • All these numbers are off-scale huge (20,000 active users?)

  9. Bottleneck Analysis
     • Drawn to linear scale
     • Theoretical bus bandwidth: 422 MBps = 66 MHz x 64 bits
     • Memory read/write: ~150 MBps
     • MemCopy: ~50 MBps
     • Disk R/W: ~9 MBps

  10. Bottleneck Analysis
      [Figure labels: Adapter ~70 MBps; PCI ~110 MBps; Memory Read/Write ~250 MBps; 155 MBps]
      • NTFS read/write
      • 18 Ultra 3 SCSI on 4 strings (2x4 and 2x5), 3 PCI 64
      • ~155 MBps unbuffered read (175 raw)
      • ~95 MBps unbuffered write
      • Good, but 10x down from our UNIX brethren (SGI, SUN)
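A short worked calculation can make the bottleneck reading concrete. The sketch below takes the approximate rates quoted on the two Bottleneck Analysis slides (disk ~9 MBps, adapter ~70 MBps, PCI ~110 MBps, memory ~250 MBps) and asks how many units of a slower stage it takes to saturate the next stage up. The function name and the choice to chain the stages in this order are illustrative assumptions, not something the slides state.

```python
import math

# Approximate rates quoted on the slides, in MBps.
RATES_MBPS = {
    "disk": 9,
    "adapter": 70,
    "pci": 110,
    "memory": 250,
}

def units_to_saturate(lower: str, upper: str, rates=RATES_MBPS) -> int:
    """Smallest number of `lower` units whose combined rate meets or exceeds `upper`."""
    return math.ceil(rates[upper] / rates[lower])

# Hypothetical chain: disks feed an adapter, adapters feed a PCI bus,
# PCI buses feed memory.
disks_per_adapter = units_to_saturate("disk", "adapter")
adapters_per_pci = units_to_saturate("adapter", "pci")
pci_per_memory = units_to_saturate("pci", "memory")

print("disks per adapter:", disks_per_adapter)
print("adapters per PCI bus:", adapters_per_pci)
print("PCI buses per memory system:", pci_per_memory)
```

Adding units past these counts buys no extra throughput at the next stage, which is the point of drawing the stages to scale.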

  11. Sandia/Compaq/ServerNet/NT Sort
      • Sort 1.1 terabytes (13 billion records) in 47 minutes
      • 68 nodes (dual 450 MHz processors), 543 disks, 1.5 M$
      • 1.2 GBps network rap (2.8 GBps pap)
      • 5.2 GBps of disk rap (same as pap)
      • (rap = real application performance, pap = peak advertised performance)
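The headline sort figures can be turned into rates with a few divisions. The sketch below is only arithmetic on the numbers quoted on the slide; it assumes decimal terabytes (10^12 bytes), and the function name and dictionary keys are made up for illustration.

```python
TERABYTE = 10**12  # assumption: the slide uses decimal terabytes

def sort_rates(total_bytes, records, minutes, nodes, disks):
    """Convert a sort result into end-to-end rates (sorted bytes delivered per second)."""
    seconds = minutes * 60
    bytes_per_sec = total_bytes / seconds
    return {
        "aggregate_MBps": bytes_per_sec / 1e6,
        "per_node_MBps": bytes_per_sec / nodes / 1e6,
        "per_disk_MBps": bytes_per_sec / disks / 1e6,
        "records_per_sec": records / seconds,
    }

# Figures from the slide: 1.1 TB, 13 billion records, 47 minutes, 68 nodes, 543 disks.
rates = sort_rates(1.1 * TERABYTE, 13e9, 47, 68, 543)
for name, value in rates.items():
    print(f"{name}: {value:,.2f}")
```

These are end-to-end rates for sorted output. Every byte is read from disk and written back at least once, so the devices themselves run faster than the per-disk figure suggests.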

  12. Progress on Sorting: NT now leads both price and performance
      • Speedup comes from Moore’s law: 40%/year
      • Processor/disk/network arrays: 60%/year (this is a software speedup)

  13. The Microsoft TerraServer Hardware
      • Compaq AlphaServer 8400
      • 8x400 MHz Alpha CPUs
      • 10 GB DRAM
      • 324 9.2 GB StorageWorks disks: 3 TB raw, 2.4 TB of RAID5
      • STK 9710 tape robot (4 TB)
      • Windows NT 4 EE, SQL Server 7.0

  14. TerraServer: Lots of Web Hits

                      Total      Average    Peak
      Hits            1,065 m    8.1 m      29 m
      Queries         877 m      6.7 m      18 m
      Images          742 m      5.6 m      15 m
      Page Views      170 m      1.3 m      6.6 m
      Users           6.4 m      48 k       76 k
      Sessions        10 m       77 k       125 k

      [Figure: counts by date, 6/22/98 to 10/26/98; series: Sessions, Hit, Page View, DB Query, Image]
      • A billion web hits!
      • 1 TB, largest SQL DB on the Web
      • 100 Qps average, 1,000 Qps peak
      • 877 M SQL queries so far

  15. SQL 7 TerraServer Availability
      • Operating for 4 months: 3,133 hrs
      • Unscheduled outage: 36.5 minutes (99.98% scheduled up)
      • Scheduled outage: 60 minutes
      • Availability: 99.95% overall up
      • No NT failures (ever)
      • One SQL7 Beta2 bug
      • No failures in Aug, Oct
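The two availability percentages follow directly from the hours and outage minutes quoted on the slide. The sketch below reproduces that arithmetic; the function name is an illustrative choice, and it assumes the 99.98% figure counts only unscheduled downtime while the 99.95% figure counts both kinds.

```python
def availability(total_hours: float, outage_minutes: float) -> float:
    """Percentage of the operating period during which the service was up."""
    total_minutes = total_hours * 60
    return 100.0 * (1 - outage_minutes / total_minutes)

OPERATING_HOURS = 3133
UNSCHEDULED_MIN = 36.5
SCHEDULED_MIN = 60

scheduled_up = availability(OPERATING_HOURS, UNSCHEDULED_MIN)
overall_up = availability(OPERATING_HOURS, UNSCHEDULED_MIN + SCHEDULED_MIN)

print(f"scheduled up: {scheduled_up:.2f}%")
print(f"overall up:   {overall_up:.2f}%")
```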

  16. Backup / Restore

  17. NCSA Super Cluster
      • National Center for Supercomputing Applications, University of Illinois @ Urbana
      • 512 Pentium II CPUs, 2,096 disks, SAN
      • Compaq + HP + Myricom + Windows NT
      • A supercomputer for 3 M$
      • Classic Fortran/MPI programming
      • DCOM programming model
      http://access.ncsa.uiuc.edu/CoverStories/SuperCluster/super.html

  18. Data Rivers: Split + Merge Streams
      [Figure labels: N producers, River, N x M data streams, M consumers]
      • Producers add records to the river
      • Consumers consume records from the river
      • Purely sequential programming
      • River does flow control and buffering
        • does partition and merge of data records
      • River = Split/Merge in Gamma = Exchange operator in Volcano / SQL Server
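The river idea can be sketched in a few lines: producers and consumers each run plain sequential code, while the river splits records by partition, merges the N producer streams for each consumer, and applies flow control through bounded buffers. The sketch below is a single-process illustration using threads and queues. The `River` class, its method names, and the hash-partitioning rule are assumptions made for the example, not the design of Gamma, Volcano, or SQL Server.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of one producer's stream

class River:
    """Hypothetical N-producer, M-consumer river with hash partitioning."""

    def __init__(self, n_producers, m_consumers, key=lambda r: r, capacity=64):
        self._n = n_producers
        self._key = key
        # One bounded queue per consumer: a blocking put() is the flow control.
        self._queues = [queue.Queue(maxsize=capacity) for _ in range(m_consumers)]

    def put(self, record):
        """Split: route a record to the consumer that owns its partition."""
        self._queues[hash(self._key(record)) % len(self._queues)].put(record)

    def close_producer(self):
        """Called once by each producer when it has no more records."""
        for q in self._queues:
            q.put(_DONE)

    def records(self, consumer_id):
        """Merge: yield this consumer's records until every producer has closed."""
        open_producers = self._n
        q = self._queues[consumer_id]
        while open_producers:
            item = q.get()
            if item is _DONE:
                open_producers -= 1
            else:
                yield item

def run_demo(n=3, m=2, per_producer=100):
    river = River(n, m)
    results = [[] for _ in range(m)]

    def producer(pid):  # purely sequential: just put records
        for i in range(per_producer):
            river.put(pid * per_producer + i)
        river.close_producer()

    def consumer(cid):  # purely sequential: just iterate
        results[cid].extend(river.records(cid))

    threads = [threading.Thread(target=producer, args=(p,)) for p in range(n)]
    threads += [threading.Thread(target=consumer, args=(c,)) for c in range(m)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `run_demo()` returns one list per consumer. With integer records and two consumers, the hash partitioning sends even values to consumer 0 and odd values to consumer 1, and no record is lost or duplicated.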

  19. Generalization: Object-oriented Rivers
      • Rivers transport sub-class of record-set (= stream of objects)
        • record type and partitioning are part of subclass
      • Node transformers are data pumps
        • an object with river inputs and outputs
        • do late-binding to record-type
      • Programming becomes data flow programming
        • specify the pipelines
      • Compiler/Scheduler does data partitioning and “transformer” placement

  20. NT Cluster Sort as a Prototype
      • Using
        • data generation and
        • sort as a prototypical app
      • “Hello world” of distributed processing
      • Goal: easy install & execute

  21. Remote Install
      • Add Registry entry to each remote node
        • RegConnectRegistry()
        • RegCreateKeyEx()
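The remote-install step can be sketched with Python's standard `winreg` module, whose `ConnectRegistry` and `CreateKeyEx` functions wrap the Win32 calls named on the slide. This is a minimal sketch under stated assumptions: the registry path, value name, and node names are hypothetical, the slide does not say which keys or values are written, and the registry module is passed in as a parameter so the functions can also be exercised with a stand-in on non-Windows machines.

```python
def add_registry_entry(reg, node, subkey, name, value):
    """Write one string value under HKEY_LOCAL_MACHINE on a remote node.

    `reg` is expected to be the standard-library `winreg` module (Windows only).
    `node` is a machine name such as r"\\node01" (hypothetical).
    """
    # winreg.ConnectRegistry wraps RegConnectRegistry().
    hive = reg.ConnectRegistry(node, reg.HKEY_LOCAL_MACHINE)
    try:
        # winreg.CreateKeyEx wraps RegCreateKeyEx(); it opens the key if it exists.
        key = reg.CreateKeyEx(hive, subkey, 0, reg.KEY_SET_VALUE)
        try:
            reg.SetValueEx(key, name, 0, reg.REG_SZ, value)
        finally:
            reg.CloseKey(key)
    finally:
        reg.CloseKey(hive)

def install_on_cluster(reg, nodes, subkey, name, value):
    """Add the same registry entry to every remote node, one node at a time."""
    for node in nodes:
        add_registry_entry(reg, node, subkey, name, value)
```

On Windows, a call might look like `install_on_cluster(winreg, [r"\\node01", r"\\node02"], r"SOFTWARE\ExampleClusterSort", "InstallDir", r"C:\ClusterSort")` after `import winreg`. It assumes the caller has administrative rights on each node and that remote registry access is enabled there.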

  22. Cluster Startup Execution
      [Figure labels: MULTI_QI, COSERVERINFO, HANDLE (x3), Sort() (x3)]
      • Setup:
        • MULTI_QI struct
        • COSERVERINFO struct
      • CoCreateInstanceEx()
      • Retrieve remote object handle
        • from MULTI_QI struct
      • Invoke methods as usual

  23. Cluster Sort Conceptual Model
      • Multiple data sources
      • Multiple data destinations
      • Multiple nodes
      • Disks -> Sockets -> Disk -> Disk
      [Figure: sources A, B, and C each hold a mix of AAA, BBB, and CCC records; the destinations hold all AAA, all BBB, and all CCC records respectively]
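The conceptual model amounts to range partitioning followed by a local sort at each destination. The sketch below shows that idea in a single process: in-memory lists stand in for the disks and sockets, and the splitter keys are hand-picked assumptions for the example. The slides do not say how the prototype chose its partition boundaries.

```python
import bisect

def cluster_sort(sources, splitters):
    """Range-partition records from every source, then sort each destination locally.

    sources:   list of record lists, one per source node
    splitters: sorted boundary keys; destination i receives the records that
               fall between splitters[i-1] and splitters[i]
    Returns len(splitters) + 1 sorted partitions whose concatenation is globally sorted.
    """
    destinations = [[] for _ in range(len(splitters) + 1)]
    for source in sources:  # in a real cluster each source runs on its own node
        for record in source:
            # bisect_right finds the key range (destination) that owns this record
            destinations[bisect.bisect_right(splitters, record)].append(record)
    return [sorted(d) for d in destinations]  # each destination sorts its own range

# Three sources, each holding a mix of AAA, BBB, and CCC records, as in the figure.
sources = [["CCC", "AAA", "BBB"], ["BBB", "CCC", "AAA"], ["AAA", "BBB", "CCC"]]
partitions = cluster_sort(sources, splitters=["B", "C"])
print(partitions)
```

Because every key in one destination is smaller than every key in the next, reading the destinations in order gives the fully sorted result with no final merge across nodes.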
