FT NT: A Tutorial on Microsoft Cluster Server™ (formerly “Wolfpack”) • Joe Barrera, Jim Gray • Microsoft Research • {joebar, gray}@microsoft.com • http://research.microsoft.com/barc
Outline • Why FT and Why Clusters • Cluster Abstractions • Cluster Architecture • Cluster Implementation • Application Support • Q&A
DEPENDABILITY: The 3 ITIES • RELIABILITY / INTEGRITY: Does the right thing (also large MTTF). • AVAILABILITY: Does it now (also small MTTR). • System Availability = MTTF / (MTTF + MTTR). • If 90% of terminals are up & 99% of DB is up, then ~89% of transactions are serviced on time. • Holistic vs. reductionist view. [Figure: Security, Integrity, Reliability, Availability]
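The availability arithmetic on this slide can be checked with a short sketch. It assumes steady-state availability MTTF / (MTTF + MTTR) and that terminals and the database fail independently and are both needed to service a transaction. The function names and the sample MTTF/MTTR figures are illustrative, not from the talk.

```python
def availability(mttf, mttr):
    """Steady-state availability; both arguments use the same time unit."""
    return mttf / (mttf + mttr)


def series_availability(parts):
    """Availability of a chain in which every part must be up.

    Assumes the parts fail independently, so their availabilities multiply.
    """
    result = 1.0
    for part in parts:
        result *= part
    return result


# The slide's example: 90% of terminals up and 99% of the DB up.
end_to_end = series_availability([0.90, 0.99])
print(f"{end_to_end:.1%}")  # about 89%, as on the slide

# Hypothetical node: fails every 1000 hours, takes 1 hour to repair.
print(f"{availability(1000.0, 1.0):.4f}")
```

Multiplying the parts is what makes the reductionist view pessimistic: every extra component in the chain can only lower the end-to-end number.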
Case Study - Japan • Source: “Survey on Computer Security”, Japan Info Dev Corp., March 1986 (trans: Eiichi Watanabe). • 1,383 institutions reported (6/84 - 7/85): 7,517 outages, MTTF ~ 10 weeks, avg duration ~ 90 minutes. • Share of outages and MTTF by cause: • Vendor (hardware and software): 42%, MTTF 5 months • Application software: 25%, MTTF 9 months • Communications lines: 12%, MTTF 1.5 years • Environment: 11.2%, MTTF 2 years • Operations: 9.3%, MTTF 2 years • Overall MTTF: 10 weeks • To get a 10-year MTTF, must attack all these areas.
Case Studies - Tandem Trends • MTTF improved. • Shift from Hardware & Maintenance (from 50% to 10%) to Software (62%) & Operations (15%). • NOTE: Systematic under-reporting of Environment, Operations errors, and Application Software.
Summary of FT Studies • Current Situation: ~4-year MTTF => Fault Tolerance Works. • Hardware is GREAT (maintenance and MTTF). • Software masks most hardware faults. • Many hidden software outages in operations: • New Software. • Utilities. • Must make all software ONLINE. • Software seems to define a 30-year MTTF ceiling. • Reasonable Goal: 100-year MTTF. • Class 4 today => class 6 tomorrow.
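The “class” shorthand is not defined on the slide. The sketch below assumes the common convention that an availability class counts the leading nines (class 4 = 99.99%, class 6 = 99.9999%). The function names are illustrative.

```python
import math


def availability_class(availability):
    """Number of leading nines, e.g. 0.9999 -> 4 (assumed convention)."""
    if not 0.0 <= availability < 1.0:
        raise ValueError("availability must be in [0, 1)")
    unavailability = 1.0 - availability
    # Small epsilon guards against floating-point error at exact powers of ten.
    return math.floor(-math.log10(unavailability) + 1e-9)


def downtime_minutes_per_year(availability):
    """Expected downtime in a 365-day year."""
    return (1.0 - availability) * 365 * 24 * 60


for a in (0.9999, 0.999999):
    print(availability_class(a), round(downtime_minutes_per_year(a), 1))
```

Under this convention, moving from class 4 to class 6 means cutting yearly downtime by a factor of 100.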
Fault Tolerance vs Disaster Tolerance • Fault Tolerance: masks local faults • RAID disks • Uninterruptible Power Supplies • Cluster Failover • Disaster Tolerance: masks site failures • Protects against fire, flood, sabotage, ... • Redundant system and service at remote site.
The Microsoft “Vision”: Plug & Play Dependability • Transactions for reliability • Clusters for availability • Security • All built into the OS [Figure: Security, Integrity / Reliability, Availability]
Cluster Goals • Manageability • Manage nodes as a single system • Perform server maintenance without affecting users • Mask faults, so repair is non-disruptive • Availability • Restart failed applications & servers • Unavailability ~ MTTR / MTBF, so repair quickly • Detect failures and warn administrators • Scalability • Add nodes for incremental processing, storage, and bandwidth
Fault Model • Failures are independent, so single-fault tolerance is a big win • Hardware fails fast (blue screen) • Software fails fast (or goes to sleep) • Software is often repaired by reboot: • Heisenbugs • Operations tasks are a major source of outage: • Utility operations • Software upgrades
Cluster: Servers Combined to Improve Availability & Scalability [Figure: Client PCs, Printers, Server A, Server B, Interconnect, Disk array A, Disk array B] • Cluster: A group of independent systems working together as a single system. Clients see scalable & FT services (single system image). • Node: A server in a cluster. May be an SMP server. • Interconnect: Communications link used for intra-cluster status info such as “heartbeats”. Can be Ethernet.
Microsoft Cluster Server™ • 2-node availability Summer 97 (20,000 Beta Testers now) • Commoditize fault-tolerance (high availability) • Commodity hardware (no special hardware) • Easy to set up and manage • Lots of applications work out of the box. • 16-node scalability later (next year?)
Failover Example [Figure: Browser; Server 1 and Server 2, each with Web site and Database; Web site files; Database files]
MS Press Failover Demo • Client/Server • Software failure • Admin shutdown • Server failure • Resource States: Pending, Partial, Failed, Offline
Demo Configuration • Server “Alice”: SMP Pentium® Pro Processors; Windows NT Server with Wolfpack; Microsoft Internet Information Server; Microsoft SQL Server • Server “Betty”: SMP Pentium® Pro Processors; Windows NT Server with Wolfpack; Microsoft Internet Information Server; Microsoft SQL Server • Interconnect: standard Ethernet • Client: Windows NT Workstation; Internet Explorer; MS Press OLTP app • Administrator: Windows NT Workstation; Cluster Admin; SQL Enterprise Mgr [Figure: Windows NT Server Cluster with Local Disks on each server and Shared Disks in a SCSI Disk Cabinet]
Demo Administration • Cluster Admin Console • Windows GUI • Shows cluster resource status • Replicates status to all servers • Define apps & related resources • Define resource dependencies • Orchestrates recovery order • SQL Enterprise Mgr • Windows GUI • Shows server status • Manages many servers • Start, stop, manage DBs [Figure: Server “Alice” runs SQL Trace and Globe; Server “Betty” runs SQL Trace; Client; Local Disks; Shared Disks; SCSI Disk Cabinet; Windows NT Server Cluster]
Generic Stateless Application: Rotating Globe • Mplay32 is a generic app. • Registered with MSCS • MSCS restarts it on failure • Move/restart ~ 2 seconds • Fail-over if 4 failures (= process exits) occur in 3 minutes (settable default)
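The restart-then-failover rule on this slide can be sketched as a sliding-window counter. This is a minimal illustration, not MSCS code: the class and method names are hypothetical, and it assumes the fourth failure inside the window is the one that triggers failover.

```python
from collections import deque


class RestartPolicy:
    """Restart locally until too many failures land inside one time window."""

    def __init__(self, threshold=4, period_s=180.0):
        self.threshold = threshold  # slide default: 4 failures
        self.period_s = period_s    # slide default: 3 minutes
        self._failures = deque()    # timestamps of recent failures

    def record_failure(self, now_s):
        """Record a failure at time now_s and return 'restart' or 'failover'."""
        self._failures.append(now_s)
        # Drop failures that have aged out of the window.
        while now_s - self._failures[0] > self.period_s:
            self._failures.popleft()
        if len(self._failures) >= self.threshold:
            self._failures.clear()  # assumption: counting starts afresh after a move
            return "failover"
        return "restart"


policy = RestartPolicy()
print([policy.record_failure(t) for t in (0, 30, 60, 90)])
```

Failures spaced further apart than the window never accumulate, so an application that crashes occasionally keeps being restarted in place.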
Demo: Moving or Failing Over an Application • Alice fails or operator requests move [Figure: AVI Application on each server, X marks, Local Disks, Shared Disks, SCSI Disk Cabinet, Windows NT Server Cluster]
Generic Stateful Application: NotePad • Notepad saves state on shared disk • Failure before save => lost changes • Failover or move (disk & state move)
Demo Step 1: Alice Delivering Service [Figure: cluster diagram; labels: SQL Activity, No SQL Activity, SQL, ODBC, IIS, IP, HTTP, Local Disks, Shared Disks, SCSI Disk Cabinet, Windows NT Server Cluster]
Demo Step 2: Request Move to Betty [Figure: cluster diagram; labels: No SQL Activity, SQL Activity, SQL, ODBC, IIS, IP, HTTP, Local Disks, Shared Disks, SCSI Disk Cabinet, Windows NT Server Cluster]
Demo Step 3: Betty Delivering Service [Figure: cluster diagram; labels: No SQL Activity, SQL Activity, SQL, ODBC, IIS, IP, Local Disks, Shared Disks, SCSI Disk Cabinet, Windows NT Server Cluster]
Demo Step 4: Power Fail Betty, Alice Takeover [Figure: cluster diagram; labels: No SQL Activity, SQL Activity, SQL, ODBC, IIS, IP, Local Disks, Shared Disks, SCSI Disk Cabinet, Windows NT Server Cluster]
Demo Step 5: Alice Delivering Service [Figure: cluster diagram; labels: SQL Activity, No SQL Activity, SQL, ODBC, IIS, IP, HTTP, Local Disks, Shared Disks, SCSI Disk Cabinet, Windows NT Server Cluster]
Demo Step 6: Reboot Betty, Now Can Takeover [Figure: cluster diagram; labels: SQL Activity, No SQL Activity, SQL, ODBC, IIS, IP, HTTP, Local Disks, Shared Disks, SCSI Disk Cabinet, Windows NT Server Cluster]
Outline • Why FT and Why Clusters • Cluster Abstractions • Cluster Architecture • Cluster Implementation • Application Support • Q&A
Cluster and NT Abstractions • Cluster Abstractions: Resource, Group, Cluster • NT Abstractions: Service, Node, Domain
Basic NT Abstractions [Figure: Service, Node, Domain] • Service: program or device managed by a node • e.g., file service, print service, database server • can depend on other services (startup ordering) • can be started, stopped, paused, failed • Node: a single (tightly-coupled) NT system • hosts services; belongs to a domain • services on node always remain co-located • unit of service co-location; involved in naming services • Domain: a collection of nodes • cooperation for authentication, administration, naming
Cluster Abstractions [Figure: Resource, Resource Group, Cluster] • Resource: program or device managed by a cluster • e.g., file service, print service, database server • can depend on other resources (startup ordering) • can be online, offline, paused, failed • Resource Group: a collection of related resources • hosts resources; belongs to a cluster • unit of co-location; involved in naming resources • Cluster: a collection of nodes, resources, and groups • cooperation for authentication, administration, naming
Resources • Resources have: • Type: what it does (file, DB, print, web…) • An operational state (online/offline/failed) • Current and possible nodes • Containing Resource Group • Dependencies on other resources • Restart parameters (in case of resource failure)
Resource Types • Built-in types: Generic Application; Generic Service; Internet Information Server (IIS) Virtual Root; Network Name; TCP/IP Address; Physical Disk; FT Disk (Software RAID); Print Spooler; File Share • Added by others: Microsoft SQL Server, Message Queues, Exchange Mail Server, Oracle, SAP R/3 • Your application? (use developer kit wizard)
Resource States • Resource states: • Offline: exists, not offering service • Online: offering service • Failed: not able to offer service • Resource failure may cause: • local restart • other resources to go offline • resource group to move • (all subject to group and resource parameters) • Resource failure detected by: • Polling failure • Node failure [Figure: state diagram with states Offline, Online Pending, Online, Offline Pending, Failed and messages “I’m here!”, “Go Online!”, “I’m Online!”, “Go Off-line!”, “I’m Off-line!”]
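One way to read the state diagram on this slide is as a small transition table. The sketch below is an interpretation, not the MSCS implementation: the event names are hypothetical, and the transitions into and out of Failed are assumptions, since the flattened figure does not show its arrows.

```python
from enum import Enum


class State(Enum):
    OFFLINE = "Offline"
    ONLINE_PENDING = "Online Pending"
    ONLINE = "Online"
    OFFLINE_PENDING = "Offline Pending"
    FAILED = "Failed"


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.OFFLINE, "go_online"): State.ONLINE_PENDING,
    (State.ONLINE_PENDING, "im_online"): State.ONLINE,
    (State.ONLINE, "go_offline"): State.OFFLINE_PENDING,
    (State.OFFLINE_PENDING, "im_offline"): State.OFFLINE,
    # Assumption: failure can be reported while coming online or while online.
    (State.ONLINE_PENDING, "failure"): State.FAILED,
    (State.ONLINE, "failure"): State.FAILED,
    # Assumption: a local restart retries from Failed.
    (State.FAILED, "go_online"): State.ONLINE_PENDING,
}


def step(state, event):
    """Return the next state, or raise ValueError for an illegal transition."""
    key = (state, event)
    if key not in TRANSITIONS:
        raise ValueError(f"{event!r} is not valid in state {state.value}")
    return TRANSITIONS[key]


state = State.OFFLINE
for event in ("go_online", "im_online", "failure", "go_online", "im_online"):
    state = step(state, event)
print(state.value)  # Online
```

Keeping the pending states explicit lets a controller distinguish "asked to come online" from "confirmed online", which matters when a timeout has to be applied to the request.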
Resource Dependencies • Similar to NT Service Dependencies • Orderly startup & shutdown • A resource is brought online after any resources it depends on are online. • A resource is taken offline before any resources it depends on. • Interdependent resources • form dependency trees • move among nodes together • fail over together • as per resource group [Figure: File Share, Network Name, IIS Virtual Root, IP Address, Resource DLL]
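The startup and shutdown rules above amount to a topological sort of the dependency graph, with the offline order being the reverse of the online order. The sketch uses Python's standard `graphlib` module (Python 3.9+). The specific dependency edges are hypothetical, chosen only to resemble the resources named in the figure.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency tree: resource -> resources it depends on.
depends_on = {
    "IP Address": set(),
    "Network Name": {"IP Address"},
    "File Share": {"Network Name"},
    "IIS Virtual Root": {"Network Name"},
}


def online_order(depends_on):
    """Dependencies first: a resource appears after everything it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


def offline_order(depends_on):
    """Dependents first: the reverse of the online order."""
    return list(reversed(online_order(depends_on)))


print(online_order(depends_on))
print(offline_order(depends_on))
```

`TopologicalSorter` raises `CycleError` if the dependencies form a cycle, which is a cheap way to reject an invalid configuration before trying to bring anything online.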
NT Registry • Stores all configuration information • Software • Hardware • Hierarchical (name, value) map • Has an open, documented interface • Is secure • Is visible across the net (RPC interface) • Typical entry: \Software\Microsoft\MSSQLServer\MSSQLServer\ with DefaultLogin = “GUEST” and DefaultDomain = “REDMOND”
Cluster Registry • Separate from local NT Registry • Replicated at each node • Algorithms explained later • Maintains configuration information: • Cluster members • Cluster resources • Resource and group parameters (e.g. restart) • Stable storage • Refreshed from “master” copy when node joins cluster
Other Resource Properties • Name • Restart policy (restart N times, failover…) • Startup parameters • Private configuration info (resource type specific) • Per-node as well, if necessary • Poll Intervals (LooksAlive, IsAlive, Timeout) • These properties are all kept in the Cluster Registry
Resource Groups • Every resource belongs to a resource group. • Resource groups move (failover) as a unit. • Dependencies NEVER cross groups. (Dependency trees are contained within groups.) • A group may contain a forest of dependency trees. [Figure: Payroll Group containing Web Server, SQL Server, IP Address, Drive E:, Drive F:]
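The rule that dependencies never cross groups is easy to check mechanically. The sketch below is illustrative only: the function name, the dependency edges inside the Payroll group, and the second "Printing" group are all hypothetical.

```python
def cross_group_dependencies(group_of, depends_on):
    """Return sorted (resource, dependency) pairs whose groups differ."""
    violations = []
    for resource, deps in depends_on.items():
        for dep in deps:
            if group_of[resource] != group_of[dep]:
                violations.append((resource, dep))
    return sorted(violations)


group_of = {
    "Web Server": "Payroll",
    "SQL Server": "Payroll",
    "IP Address": "Payroll",
    "Drive E:": "Payroll",
    "Drive F:": "Payroll",
    "Print Spooler": "Printing",  # hypothetical second group
}

# Hypothetical edges: a forest of two trees inside the Payroll group.
valid = {"Web Server": ["IP Address", "Drive E:"], "SQL Server": ["Drive F:"]}
print(cross_group_dependencies(group_of, valid))  # []

# A dependency that reaches into another group is reported.
invalid = dict(valid, **{"Print Spooler": ["Drive E:"]})
print(cross_group_dependencies(group_of, invalid))
```

Because the group is the unit of failover, a cross-group edge would let a resource move to a node where its dependency is not present, which is why the rule is absolute.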
Group Properties • CurrentState: Online, Partially Online, Offline • Members: resources that belong to group • members determine which nodes can host group. • PreferredOwners: ordered list of host nodes • FailoverThreshold: How many faults cause failover • FailoverPeriod: Time window for failover threshold • FailbackWindowStart, FailbackWindowEnd: When can failback happen? • Everything (except CurrentState) is stored in registry
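How the preferred-owner list and the failback window might combine can be sketched as follows. This is an assumption-laden illustration rather than the MSCS algorithm: the function names are hypothetical, the window bounds are taken to be whole hours (0-23), and the window is treated as half-open and allowed to wrap past midnight.

```python
def in_failback_window(hour, start, end):
    """True if hour (0-23) lies in [start, end); the window may wrap midnight."""
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


def choose_host(preferred_owners, up_nodes):
    """First node in the ordered preferred list that is currently up, else None."""
    for node in preferred_owners:
        if node in up_nodes:
            return node
    return None


def should_fail_back(current, preferred_owners, up_nodes, hour, start, end):
    """Fail back only if a better host is up and the window is open."""
    best = choose_host(preferred_owners, up_nodes)
    return best is not None and best != current and in_failback_window(hour, start, end)


# Group prefers Alice; it is running on Betty after a failover; Alice is back up.
print(should_fail_back("Betty", ["Alice", "Betty"], {"Alice", "Betty"}, 2, 1, 5))
print(should_fail_back("Betty", ["Alice", "Betty"], {"Alice", "Betty"}, 12, 1, 5))
```

Restricting failback to a window lets administrators defer the second, voluntary interruption to a quiet period instead of moving the group the moment the preferred node returns.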
Failover and Failback • Failover parameters • timeout on LooksAlive, IsAlive • # local restarts in failure window; after this, offline. • Failback to preferred node • (during failback window) • Do resource failures affect group? [Figure: Node \\Alice and Node \\Betty, each running Cluster Service; IPaddr and name resources; Failover and Failback arrows]
Cluster Concepts: Clusters [Figure: Resource, Group, Cluster; three Resource Groups]