Clustering Technology For Fault Tolerance
Jim Gray
Microsoft Research
http://www.research.Microsoft.com/~Gray
What is Wolfpack?
• A consortium of 60 HW & SW vendors (everybody who is anybody)
• A set of APIs for clustering and fault tolerance
• An enhancement to NT™ Server (in beta test)
• Key concepts
  • System: a particular node
  • Cluster: a collection of systems working together
  • Resource: a hardware or software module
  • Resource dependency: one resource needs another
  • Resource group: fails over as a unit; dependencies do not cross group boundaries
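
The key concepts above can be illustrated with a small model. This is a minimal sketch in Python, not the Wolfpack API: the `ResourceGroup` class, its methods, and the resource names are all hypothetical. It shows a group that records resource dependencies, rejects any dependency that crosses the group boundary, and derives an order in which resources could be brought online.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class ResourceGroup:
    """Hypothetical model of a resource group: it fails over as a unit."""
    name: str
    # Maps each resource in the group to the resources it needs.
    depends_on: dict[str, set[str]] = field(default_factory=dict)

    def add(self, resource, *needs):
        """Register a resource and the resources it depends on."""
        self.depends_on[resource] = set(needs)

    def validate(self):
        """Raise if any dependency points outside this group."""
        for resource, needs in self.depends_on.items():
            outside = needs.difference(self.depends_on)
            if outside:
                raise ValueError(
                    f"{resource} depends on {sorted(outside)}, "
                    f"which is outside group {self.name}"
                )

    def start_order(self):
        """Return the resources with every dependency before its dependents."""
        self.validate()
        return list(TopologicalSorter(self.depends_on).static_order())


# Example group (resource names are illustrative assumptions).
group = ResourceGroup("sql-group")
group.add("disk")
group.add("ip-address")
group.add("net-name", "ip-address")
group.add("sql-server", "disk", "net-name")
order = group.start_order()
```

Because no dependency leaves the group, the whole group can move to another system without leaving a needed resource behind.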
What Wolfpack Supports in V1
• Two-node failover (twin-tail SCSI)
• Apps:
  • File, Print, web server, IP address, Net Name
  • Most of Microsoft BackOffice (SQL, Exchange, Viper, Falcon, …)
  • Oracle
  • SAP
  • Many others
• Easy to program, operate, use
Cluster Advantages
• Clients and servers made from the same stuff
• Inexpensive: built with commodity components
• Fault tolerance:
  • Spare modules mask failures
• Modular growth:
  • Grow by adding small modules
• Parallel data search:
  • Use multiple processors and disks
What Happens When a Component Fails?
• Redundant disk or path: configure around it.
• Non-redundant software: restart.
• Non-redundant hardware: migrate software to surviving nodes.
• Fault detection: 1 ms to 10 sec.
• Failover: 0.1 sec to 1 min.
• This is standard in Tandem, Teradata, VMScluster.
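
The recovery rules above can be sketched as a small decision function plus a timeout-based failure detector. This is an illustrative sketch only. The slide does not say how faults are detected, so the heartbeat mechanism, the `HeartbeatMonitor` class, the `recovery_action` function, and the 1-second timeout (chosen from within the 1 ms to 10 sec range) are all assumptions.

```python
from dataclasses import dataclass, field


def recovery_action(kind, redundant):
    """Map a failed component to the recovery step listed on the slide."""
    if redundant:
        return "configure around it"
    if kind == "software":
        return "restart"
    if kind == "hardware":
        return "migrate software to surviving nodes"
    raise ValueError(f"unknown component kind: {kind}")


@dataclass
class HeartbeatMonitor:
    """Assumed detector: a node is failed if it is silent longer than `timeout`."""
    timeout: float  # seconds
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, node, now):
        """Record that `node` was alive at time `now` (seconds)."""
        self.last_seen[node] = now

    def failed_nodes(self, now):
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


# Two nodes start together; only node A keeps sending heartbeats.
monitor = HeartbeatMonitor(timeout=1.0)
monitor.heartbeat("A", now=0.0)
monitor.heartbeat("B", now=0.0)
monitor.heartbeat("A", now=1.5)
failed = monitor.failed_nodes(now=2.0)
action = recovery_action("hardware", redundant=False)
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch deterministic. A shorter timeout detects faults sooner but raises the risk of declaring a slow node dead.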