
FT NT: A Tutorial on Microsoft Cluster Server ™ (formerly “Wolfpack”)

FT NT: A Tutorial on Microsoft Cluster Server ™ (formerly “Wolfpack”). Joe Barrera Jim Gray Microsoft Research {joebar, gray} @ microsoft.com http://research.microsoft.com/barc. Outline. Why FT and Why Clusters Cluster Abstractions Cluster Architecture Cluster Implementation


Presentation Transcript


  1. FT NT: A Tutorial on Microsoft Cluster Server™(formerly “Wolfpack”) Joe Barrera Jim Gray Microsoft Research {joebar, gray} @ microsoft.com http://research.microsoft.com/barc

  2. Outline • Why FT and Why Clusters • Cluster Abstractions • Cluster Architecture • Cluster Implementation • Application Support • Q&A

  3. DEPENDABILITY: The 3 ITIES • RELIABILITY / INTEGRITY: Does the right thing. (also large MTTF) • AVAILABILITY: Does it now. (also small MTTR / (MTTF + MTTR)) • System Availability: If 90% of terminals up & 99% of DB up? (=> 89% of transactions are serviced on time) • Holistic vs. Reductionist view (diagram: Security, Integrity, Reliability, Availability)
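
The arithmetic behind the 89% figure can be checked in a few lines. This is a minimal sketch, assuming the usual definition of availability as MTTF / (MTTF + MTTR) and assuming that terminals and database fail independently, so their availabilities multiply. The function names are illustrative and do not come from the talk.

```python
def availability(mttf: float, mttr: float) -> float:
    """Fraction of time a component is up: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


def serial_availability(*parts: float) -> float:
    """Availability of a chain in which every part must be up.

    Assumes the parts fail independently of each other.
    """
    result = 1.0
    for part in parts:
        result *= part
    return result


# Slide example: 90% of terminals up and 99% of the DB up.
print(f"{serial_availability(0.90, 0.99):.1%}")  # prints 89.1%
```

The product is always lower than the weakest part, which is why the slide argues for a holistic view of the whole path from terminal to database.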

  4. Case Study - Japan • "Survey on Computer Security", Japan Info Dev Corp., March 1986 (trans: Eiichi Watanabe) • 1,383 institutions reported (6/84 - 7/85): 7,517 outages, MTTF ~ 10 weeks, avg duration ~ 90 minutes • Share of outages and MTTF by cause: Vendor (hardware and software) 42%, 5 months; Application software 25%, 9 months; Communications lines 12%, 1.5 years; Operations 9.3%, 2 years; Environment 11.2%, 2 years • To get 10-year MTTF, must attack all these areas

  5. Case Studies - Tandem Trends • MTTF improved • Shift from Hardware & Maintenance (from 50% to 10%) to Software (62%) & Operations (15%) • NOTE: Systematic under-reporting of Environment, Operations errors, Application Software

  6. Summary of FT Studies • Current Situation: ~4-year MTTF => Fault Tolerance Works. • Hardware is GREAT (maintenance and MTTF). • Software masks most hardware faults. • Many hidden software outages in operations: • New Software. • Utilities. • Must make all software ONLINE. • Software seems to define a 30-year MTTF ceiling. • Reasonable Goal: 100-year MTTF. • class 4 today => class 6 tomorrow.

  7. Fault Tolerance vs Disaster Tolerance • Fault Tolerance: masks local faults • RAID disks • Uninterruptible Power Supplies • Cluster Failover • Disaster Tolerance: masks site failures • Protects against fire, flood, sabotage, ... • Redundant system and service at remote site.

  8. The Microsoft “Vision”: Plug & Play Dependability • Transactions for reliability • Clusters for availability • Security • All built into the OS (diagram: Security, Integrity / Reliability, Availability)

  9. Cluster Goals • Manageability • Manage nodes as a single system • Perform server maintenance without affecting users • Mask faults, so repair is non-disruptive • Availability • Restart failed applications & servers • Unavailability ~ MTTR / MTBF, so repair quickly. • Detect/warn administrators of failures • Scalability • Add nodes for incremental processing, storage, bandwidth

  10. Fault Model • Failures are independent, so single fault tolerance is a big win • Hardware fails fast (blue-screen) • Software fails fast (or goes to sleep) • Software often repaired by reboot: • Heisenbugs • Operations tasks: major source of outage • Utility operations • Software upgrades

  11. Cluster: Servers Combined to Improve Availability & Scalability (diagram: Client PCs, Printers, Server A, Server B, Interconnect, Disk array A, Disk array B) • Cluster: A group of independent systems working together as a single system. Clients see scalable & FT services (single system image). • Node: A server in a cluster. May be an SMP server. • Interconnect: Communications link used for intra-cluster status info such as “heartbeats”. Can be Ethernet.

  12. Microsoft Cluster Server™ • 2-node availability Summer 97 (20,000 Beta Testers now) • Commoditize fault-tolerance (high availability) • Commodity hardware (no special hardware) • Easy to set up and manage • Lots of applications work out of the box. • 16-node scalability later (next year?)

  13. Failover Example (diagram: Browser; Server 1 and Server 2, each labeled Web site and Database; Web site files; Database files)

  14. MS Press Failover Demo • Client/Server • Software failure • Admin shutdown • Server failure Resource States - Pending - Partial - Failed - Offline !

  15. Demo Configuration • Server “Alice”: SMP Pentium® Pro Processors, Windows NT Server with Wolfpack, Microsoft Internet Information Server, Microsoft SQL Server • Server “Betty”: SMP Pentium® Pro Processors, Windows NT Server with Wolfpack, Microsoft Internet Information Server, Microsoft SQL Server • Interconnect: standard Ethernet • Disks: Local Disks per server, Shared Disks in SCSI Disk Cabinet • Client: Windows NT Workstation, Internet Explorer, MS Press OLTP app • Administrator: Windows NT Workstation, Cluster Admin, SQL Enterprise Mgr (diagram: Windows NT Server Cluster)

  16. Demo Administration • Cluster Admin Console • Windows GUI • Shows cluster resource status • Replicates status to all servers • Define apps & related resources • Define resource dependencies • Orchestrates recovery order • SQL Enterprise Mgr • Windows GUI • Shows server status • Manages many servers • Start, stop, manage DBs • Server “Alice”: runs SQL Trace, runs Globe • Server “Betty”: runs SQL Trace (diagram: Client, Local Disks, Shared Disks, SCSI Disk Cabinet, Windows NT Server Cluster)

  17. Generic Stateless Application: Rotating Globe • Mplay32 is generic app. • Registered with MSCS • MSCS restarts it on failure • Move/restart ~ 2 seconds • Fail-over if 4 failures (= process exits) in 3 minutes (settable default)
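
The "4 failures in 3 minutes" rule reads like a sliding-window threshold. The sketch below is one way to express it, not MSCS code: the class name, the method name, and the choice to fail over on the fourth failure itself (rather than after a fourth restart) are all assumptions made for illustration.

```python
from collections import deque


class FailoverPolicy:
    """Restart locally until `threshold` failures occur within `period` seconds."""

    def __init__(self, threshold: int = 4, period: float = 180.0) -> None:
        self.threshold = threshold
        self.period = period
        self._failures = deque()  # timestamps of recent failures

    def on_failure(self, now: float) -> str:
        """Record a failure at time `now`; return 'restart' or 'failover'."""
        self._failures.append(now)
        # Drop failures that have fallen out of the sliding window.
        while self._failures and now - self._failures[0] > self.period:
            self._failures.popleft()
        if len(self._failures) >= self.threshold:
            self._failures.clear()  # start counting afresh on the new node
            return "failover"
        return "restart"


policy = FailoverPolicy()
decisions = [policy.on_failure(t) for t in (0, 50, 100, 150)]
print(decisions)  # ['restart', 'restart', 'restart', 'failover']
```

Failures spread further apart than the window never accumulate, so an application that crashes occasionally is simply restarted in place.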

  18. Demo: Moving or Failing Over an Application • Alice fails or operator requests move (diagram: AVI Application on each node, Local Disks, Shared Disks, SCSI Disk Cabinet, Windows NT Server Cluster)

  19. Generic Stateful Application: NotePad • Notepad saves state on shared disk • Failure before save => lost changes • Failover or move (disk & state move)

  20. Demo Step 1: Alice Delivering Service (diagram: SQL Activity / No SQL Activity; SQL, ODBC, IIS on each node; IP, HTTP; Local Disks, Shared Disks, SCSI Disk Cabinet, Windows NT Server Cluster)

  21. Demo Step 2: Request Move to Betty (diagram: No SQL Activity / SQL Activity; SQL, ODBC, IIS, IP on each node; HTTP; Local Disks, Shared Disks, SCSI Disk Cabinet, Windows NT Server Cluster)

  22. Demo Step 3: Betty Delivering Service (diagram: No SQL Activity / SQL Activity; SQL, ODBC, IIS on each node; IP; Local Disks, Shared Disks, SCSI Disk Cabinet, Windows NT Server Cluster)

  23. Demo Step 4: Power Fail Betty, Alice Takeover (diagram: No SQL Activity / SQL Activity; SQL, ODBC, IIS, IP on each node; Local Disks, Shared Disks, SCSI Disk Cabinet, Windows NT Server Cluster)

  24. Demo Step 5: Alice Delivering Service (diagram: SQL Activity / No SQL Activity; SQL, ODBC, IIS; IP, HTTP; Local Disks, Shared Disks, SCSI Disk Cabinet, Windows NT Server Cluster)

  25. Demo Step 6: Reboot Betty, now can takeover (diagram: SQL Activity / No SQL Activity; SQL, ODBC, IIS on each node; IP, HTTP; Local Disks, Shared Disks, SCSI Disk Cabinet, Windows NT Server Cluster)

  26. Outline • Why FT and Why Clusters • Cluster Abstractions • Cluster Architecture • Cluster Implementation • Application Support • Q&A

  27. Cluster and NT Abstractions • Cluster Abstractions: Resource, Group, Cluster • NT Abstractions: Service, Node, Domain

  28. Basic NT Abstractions: Service, Node, Domain • Service: program or device managed by a node • e.g., file service, print service, database server • can depend on other services (startup ordering) • can be started, stopped, paused, failed • Node: a single (tightly-coupled) NT system • hosts services; belongs to a domain • services on node always remain co-located • unit of service co-location; involved in naming services • Domain: a collection of nodes • cooperation for authentication, administration, naming

  29. Cluster Abstractions: Resource, Resource Group, Cluster • Resource: program or device managed by a cluster • e.g., file service, print service, database server • can depend on other resources (startup ordering) • can be online, offline, paused, failed • Resource Group: a collection of related resources • hosts resources; belongs to a cluster • unit of co-location; involved in naming resources • Cluster: a collection of nodes, resources, and groups • cooperation for authentication, administration, naming

  30. Resources • Resources have... • Type: what it does (file, DB, print, web…) • An operational state (online/offline/failed) • Current and possible nodes • Containing Resource Group • Dependencies on other resources • Restart parameters (in case of resource failure)

  31. Resource Types • Built-in types: Generic Application, Generic Service, Internet Information Server (IIS) Virtual Root, Network Name, TCP/IP Address, Physical Disk, FT Disk (Software RAID), Print Spooler, File Share • Added by others: Microsoft SQL Server, Message Queues, Exchange Mail Server, Oracle, SAP R/3 • Your application? (use developer kit wizard)

  32. Physical Disk

  33. TCP/IP Address

  34. Network Name

  35. File Share

  36. IIS (WWW/FTP) Server

  37. Print Spooler

  38. Resource States • Resource states: • Offline: exists, not offering service • Online: offering service • Failed: not able to offer service • Resource failure may cause: • local restart • other resources to go offline • resource group to move • (all subject to group and resource parameters) • Resource failure detected by: • Polling failure • Node failure (diagram: states Offline, Online Pending, Online, Offline Pending, Failed; messages “I’m here!”, “Go Online!”, “I’m Online!”, “Go Off-line!”, “I’m Off-line!”)
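
The five states named on this slide can be modeled as a small state machine. The sketch below is illustrative only: the event names are hypothetical, loosely modeled on the messages in the slide's diagram, and the transitions into and out of Failed are assumptions, since the transcript does not preserve the diagram's arrows.

```python
from enum import Enum


class State(Enum):
    OFFLINE = "Offline"
    ONLINE_PENDING = "Online Pending"
    ONLINE = "Online"
    OFFLINE_PENDING = "Offline Pending"
    FAILED = "Failed"


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.OFFLINE, "go_online"): State.ONLINE_PENDING,
    (State.ONLINE_PENDING, "im_online"): State.ONLINE,
    (State.ONLINE, "go_offline"): State.OFFLINE_PENDING,
    (State.OFFLINE_PENDING, "im_offline"): State.OFFLINE,
    # Assumption: a failure can be reported while coming online or while
    # online, and a failed resource is taken offline before any retry.
    (State.ONLINE_PENDING, "fail"): State.FAILED,
    (State.ONLINE, "fail"): State.FAILED,
    (State.FAILED, "go_offline"): State.OFFLINE_PENDING,
}


def step(state: State, event: str) -> State:
    """Return the next state, or raise ValueError for a disallowed transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.value}") from None


state = State.OFFLINE
for event in ("go_online", "im_online", "go_offline", "im_offline"):
    state = step(state, event)
print(state.value)  # prints Offline
```

Keeping the pending states explicit makes it possible to time out a resource that never answers with its "I'm Online!" or "I'm Off-line!" message.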

  39. Resource Dependencies • Similar to NT Service Dependencies • Orderly startup & shutdown • A resource is brought online after any resources it depends on are online. • A resource is taken offline before any resources it depends on. • Interdependent resources • Form dependency trees • Move among nodes together • Fail over together • As per resource group (diagram: File Share, Network Name, IIS Virtual Root, IP Address, Resource DLL)
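
The two ordering rules amount to a topological sort of the dependency graph for startup and its reverse for shutdown. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The resource names come from the slide's diagram, but the dependency edges between them are hypothetical.

```python
from graphlib import TopologicalSorter

# resource -> set of resources it depends on (edges are hypothetical).
DEPENDS_ON = {
    "IP Address": set(),
    "Network Name": {"IP Address"},
    "File Share": {"Network Name"},
    "IIS Virtual Root": {"Network Name"},
}


def online_order(deps):
    """Order in which to bring resources online: dependencies first."""
    return list(TopologicalSorter(deps).static_order())


def offline_order(deps):
    """Order in which to take resources offline: dependents first."""
    return list(reversed(online_order(deps)))


print(online_order(DEPENDS_ON))
print(offline_order(DEPENDS_ON))
```

The relative order of File Share and IIS Virtual Root is not fixed, because neither depends on the other. `TopologicalSorter` raises `CycleError` if the dependencies form a cycle, which matches the slide's requirement that they form trees.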

  40. Dependencies Tab

  41. NT Registry • Stores all configuration information • Software • Hardware • Hierarchical (name, value) map • Has an open, documented interface • Is secure • Is visible across the net (RPC interface) • Typical Entry: \Software\Microsoft\MSSQLServer\MSSQLServer\ • DefaultLogin = “GUEST” • DefaultDomain = “REDMOND”
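
A hierarchical (name, value) map can be pictured as nested dictionaries, with keys forming the path and values stored at the leaves. The sketch below only models that data structure using the entry from the slide. It does not call the real registry API, and the `lookup` helper is a hypothetical name.

```python
# Nested dictionaries standing in for registry keys; leaves hold values.
REGISTRY = {
    "Software": {
        "Microsoft": {
            "MSSQLServer": {
                "MSSQLServer": {
                    "DefaultLogin": "GUEST",
                    "DefaultDomain": "REDMOND",
                }
            }
        }
    }
}


def lookup(root: dict, path: str, name: str) -> str:
    """Walk a backslash-separated key path, then read the named value."""
    node = root
    for key in path.strip("\\").split("\\"):
        node = node[key]  # raises KeyError if a key is missing
    return node[name]


print(lookup(REGISTRY, r"\Software\Microsoft\MSSQLServer\MSSQLServer", "DefaultLogin"))
```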

  42. Cluster Registry • Separate from local NT Registry • Replicated at each node • Algorithms explained later • Maintains configuration information: • Cluster members • Cluster resources • Resource and group parameters (e.g. restart) • Stable storage • Refreshed from “master” copy when node joins cluster

  43. Other Resource Properties • Name • Restart policy (restart N times, failover…) • Startup parameters • Private configuration info (resource type specific) • Per-node as well, if necessary • Poll Intervals (LooksAlive, IsAlive, Timeout) • These properties are all kept in Cluster Registry

  44. General Resource Tab

  45. Advanced Resource Tab

  46. Resource Groups • Every resource belongs to a resource group. • Resource groups move (failover) as a unit • Dependencies NEVER cross groups. (Dependency trees contained within groups.) • Group may contain forest of dependency trees (diagram: Payroll Group containing Web Server, SQL Server, IP Address, Drive E:, Drive F:)
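
The rule that dependencies never cross groups is easy to check mechanically. The sketch below is a hypothetical validator, not part of MSCS. The Payroll Group members come from the slide, while the dependency edges and the extra "Print Group" are made up to show a violation.

```python
def cross_group_dependencies(group_of, depends_on):
    """Return (resource, dependency) pairs whose two ends sit in different groups."""
    violations = []
    for resource, deps in depends_on.items():
        for dep in deps:
            if group_of[resource] != group_of[dep]:
                violations.append((resource, dep))
    return violations


GROUP_OF = {
    "Web Server": "Payroll Group",
    "SQL Server": "Payroll Group",
    "IP Address": "Payroll Group",
    "Drive E:": "Payroll Group",
    "Drive F:": "Payroll Group",
    "Print Spooler": "Print Group",  # hypothetical second group
}

# Hypothetical edges: two separate trees inside one group (a forest).
PAYROLL_DEPS = {
    "Web Server": ["IP Address", "Drive E:"],
    "SQL Server": ["Drive F:"],
}

print(cross_group_dependencies(GROUP_OF, PAYROLL_DEPS))  # prints []

# A spooler in another group that depends on Drive E: breaks the rule.
bad_deps = {**PAYROLL_DEPS, "Print Spooler": ["Drive E:"]}
print(cross_group_dependencies(GROUP_OF, bad_deps))
```

Because the group is the unit of failover, a dependency that crossed groups could leave a resource on one node waiting for a disk that has moved to the other.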

  47. Moving a Resource Group

  48. Group Properties • CurrentState: Online, Partially Online, Offline • Members: resources that belong to group • members determine which nodes can host group. • Preferred Owners: ordered list of host nodes • FailoverThreshold: How many faults cause failover • FailoverPeriod: Time window for failover threshold • FailbackWindowStart: When can failback happen? • FailbackWindowEnd: When can failback happen? • Everything (except CurrentState) is stored in registry
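
Two of these properties lend themselves to a short illustration: choosing a host from the ordered Preferred Owners list, and testing whether the current time falls inside the failback window. The sketch below assumes the window is given as hours of the day (0-23), treated as a half-open interval that may wrap past midnight. The function names and those conventions are assumptions, not taken from the slides.

```python
from typing import Optional, Sequence, Set


def pick_host(preferred_owners: Sequence[str], up_nodes: Set[str]) -> Optional[str]:
    """Return the first preferred owner that is currently up, if any."""
    for node in preferred_owners:
        if node in up_nodes:
            return node
    return None


def in_failback_window(hour: int, start: int, end: int) -> bool:
    """True if `hour` lies in [start, end), allowing wrap-around past midnight."""
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


print(pick_host(["Alice", "Betty"], {"Betty"}))  # prints Betty
print(in_failback_window(23, start=22, end=4))   # prints True
print(in_failback_window(12, start=22, end=4))   # prints False
```

Restricting failback to a quiet window avoids moving a group back to its preferred node in the middle of the working day, when the move itself would interrupt users.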

  49. Failover and Failback • Failover parameters • timeout on LooksAlive, IsAlive • # local restarts in failure window; after this, offline. • Failback to preferred node • (during failback window) • Do resource failures affect group? (diagram: Node \\Alice and Node \\Betty, each running Cluster Service; IPaddr, name; Failover, Failback)

  50. Cluster Concepts: Clusters (diagram: Resource, Group, Cluster; three Resource Groups)
