High Availability Software for Windows NT: NeoCLUSTER • WHY : Demand for Availability • WHAT : Technology and Product • HOW : Configuration
Demand for Availability • Information is a capital asset of an organization. • The server systems for archiving, processing, and conveying information must be constantly monitored and carefully managed to provide reliable, timely, and continuous services. • Downtime is inevitable • Scheduled and unscheduled
Trends • Distributed processing and multi-tier client/server applications • Multiple servers collaborate to improve • Load sharing • Performance • Availability • Windows NT is becoming a major server platform for mission-critical applications.
Factors of System Availability • CPU, memory, I/O cards 24% • Disk 27% • Application software 22% • Common hardware & system software 21% • Human error 6% • Source : Strategic Research Division of Find SPV
System Availability Hierarchy • Figure layers: Applications, Hosts, I/O Paths, Storage Subsystem, Disk & Tape • Figure axes: Cost, Technology
Ask Your Customers • Do you use Windows NT as the platform for your mission-critical applications? • Does system downtime mean losses to you? • Do you need a technically and economically affordable solution to make your NT servers fault resilient? • Do you need a guardian angel to watch over your NT servers around the clock so that you can sleep better at night?
Level of System Availability • Non-stop systems: Stratus, Tandem, Netware SFT III • Tightly coupled, fully duplicated configuration • Proprietary OS • Non-redundant systems • Hot-plug and self-diagnostic hardware components • Auto-retry and pro-active software
Level of System Availability • High Availability systems • A cluster of loosely coupled servers • Software based implementation • Provide better availability/price ratio than non-stop systems
Cluster • Server farm : Single Network Identity • Database Cluster : Cluster Manager, Distributed Lock Manager • Computing Cluster : Parallel Computing
NeoCLUSTER • A pure software solution for building highly available server clusters • Microsoft Windows NT Server 4.0, Standard Edition • i386 and Alpha platforms • Functions • Cluster configuration and administration • Failure detection, logging, notification, isolation, and recovery
Features • Technically and economically affordable • Fully compatible with Windows NT • Requires no software modification or proprietary hardware • No single point of failure • Reliable and efficient mechanism for error detection and fault recovery
Features • Intuitive and user-friendly Windows GUI • Fully user configurable • Supports automatic and manual switch back • Negligible impact on resource consumption and server performance • Minimum human intervention • No intrusion into routine workflow
Servers • Active Server is a pre-designated computer responsible for providing critical services that will be guarded by NeoCLUSTER. • Backup Server is a pre-designated computer that will take over from the active server under the administration of NeoCLUSTER. • Neither identically configured servers nor a dedicated backup server is required.
Private Network • Dedicated interconnect for inter-server communication. • Three types of interconnect for redundancy • TCP/IP : back-to-back or LAN connection of two network interface cards • RS-232 : serial cable with null modem support to connect two COM ports • Disk volume : two dedicated partitions on the shared disks
Private Network • If all instances of the private net are unavailable • A server can still rely on the public net to detect the availability of the peer server. • If the peer server is still available, no takeover action will be triggered. • If the peer server is unavailable, a takeover action will be activated immediately.
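The fallback rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NeoCLUSTER code: the function name `should_take_over` and its parameters are hypothetical, and each flag simply records whether a peer heartbeat was recently seen on that channel.

```python
from typing import Sequence


def should_take_over(private_links_alive: Sequence[bool], public_net_alive: bool) -> bool:
    """Return True only when the peer looks unavailable on every channel.

    private_links_alive: one flag per private interconnect (for example TCP/IP,
    RS-232, disk volume), True if the peer was recently heard on that link.
    public_net_alive: True if the peer can still be detected over the public net.
    Both inputs are hypothetical; the slides do not describe the real interface.
    """
    # Any working private interconnect proves the peer is still available.
    if any(private_links_alive):
        return False
    # All private links are down: fall back to the public net before acting.
    return not public_net_alive
```

With all three private links down, `should_take_over([False, False, False], True)` returns `False` (no takeover), while `should_take_over([False, False, False], False)` returns `True`.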
Public Network • Dedicated network for clients to access servers. • TCP/IP and NetBEUI protocols • Each active server will carry a switchable network ID (i.e., IP address or computer name). • The original network IDs of both servers can remain intact. • Clients will connect to the switchable network ID. • If the active server is unavailable, the backup server will take over the switchable network ID.
Public Network • NeoCLUSTER provides a built-in mechanism to identify network failures. • Self-diagnosis of network availability • Supported NICs : Intel EtherExpress PRO/100B, 3Com 3C905B, DEC 21x4x. • Supported NIC add-on software : NIC Express from IPMetrics (load balancing and fault tolerance).
Private Drives and Public Drives • Private drives are disk volumes for storing OS and the data that is not required to be accessible by the backup server. • Public drives are disk volumes on the shared disks for storing the application software and related data that must be accessible by the backup server. • Shared SCSI bus or independent host channels • Mirroring or RAID subsystems.
Clients • Computer systems that access the active servers via TCP/IP or NetBEUI protocols.
Operation Scenario: Software Perspective • Block diagram • Figure components: Administration Tool, Cluster Service, Cluster Monitor Service, Agent, Script, Resource Object, Windows NT
Operation Scenario: Software Perspective • Module interaction of NeoCLUSTER • Figure components: Active Server, Backup Server, Cluster Service (on each server), Cluster Monitor Service, Agent, Resource Object • Figure links: Server Heartbeat, Agent Heartbeat, Resource Monitoring
Cluster Service and Cluster Monitor Service • The core processes of NeoCLUSTER • Two mutually guarding NT services • User-transparent auto-restart • Functions • Resource object management • Event logging and notification • Fault isolation and recovery
Server Heartbeat • Periodic messages • Servers exchange heartbeats with each other over the private net • Inform the receiving server of the availability of the sending server
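A heartbeat receiver is usually implemented as a timeout check. The sketch below shows one common way to do it; the class name `HeartbeatMonitor`, the heartbeat interval, and the missed-heartbeat limit are all assumptions, since the slides do not state how NeoCLUSTER tunes these values.

```python
from typing import Optional


class HeartbeatMonitor:
    """Tracks heartbeats from one peer (hypothetical helper, not NeoCLUSTER's API).

    Timestamps are passed in explicitly so the logic can be tested without a clock.
    """

    def __init__(self, interval_s: float, missed_limit: int) -> None:
        # The peer is declared unavailable after this many seconds of silence.
        self.timeout_s = interval_s * missed_limit
        self.last_seen: Optional[float] = None

    def record(self, now: float) -> None:
        """Call whenever a heartbeat message arrives."""
        self.last_seen = now

    def peer_alive(self, now: float) -> bool:
        """True if a heartbeat arrived within the timeout window."""
        if self.last_seen is None:
            return False
        return (now - self.last_seen) <= self.timeout_s
```

With an assumed interval of 1 second and a limit of 3 missed heartbeats, a peer last heard at t = 10 is still considered alive at t = 13 and unavailable at t = 13.5.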
Resource Object • Components of mission critical services • Repository of service related files : Volume • Switchable network identity for clients to access the services : IP Address or Computer Alias Name • The service itself : File Share, NT Services, or User Defined
Resource Object • Volume • Disk partitions on the public drives. • The drive letter mapping and partition information of a volume must be identical when viewed from both servers. This ensures that no matter which server is the active server, the volume can be accessed with the same drive letter. • NeoCLUSTER provides “volume locking” to ensure exclusive volume access.
Resource Object • IP Address • A switchable network identity for TCP/IP. • Computer Alias Name • A switchable network identity for NetBEUI. • File Share • Shared directories that are accessible by clients. • Both servers must use the same share name.
Resource Object • NT Services • Most application software for Windows NT is implemented as NT services. • User Defined • For configuring application software that is not implemented as NT services. • For grouping related resource objects into a resource hierarchy.
Resource Hierarchy • Each mission critical service is formulated and manipulated as a resource hierarchy
Resource Hierarchy • A resource hierarchy is an integrated entity. • A resource hierarchy identifies the required resource objects and the proper sequence to activate those resource objects. • A single resource object is a generic resource hierarchy.
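One way to picture "the proper sequence to activate those resource objects" is as a dependency graph that is sorted before activation. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the resource names and the dependency layout are hypothetical examples, and the slides do not say how NeoCLUSTER stores or orders a hierarchy internally.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical hierarchy: each resource maps to the resources that must be
# active before it can be started.
depends_on: Dict[str, List[str]] = {
    "volume_E": [],
    "ip_address": [],
    "file_share": ["volume_E", "ip_address"],
    "database_service": ["volume_E", "ip_address"],
}


def activation_order(deps: Dict[str, List[str]]) -> List[str]:
    """Return the resources in an order where every dependency comes first."""
    return list(TopologicalSorter(deps).static_order())
```

Calling `activation_order(depends_on)` always places the volume and the IP address ahead of the file share and the database service; the relative order of independent resources is not fixed.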
Agents • Windows NT executable files • Availability monitoring and error detection • Intelligent and lightweight • Least system resource consumption • Minimum impact on system performance • Efficient and reliable • No critical failure will be neglected • Real-time response to failures to reduce downtime • No false alarms
Agents • Built-in agents • Server, public net, public drives • Resource objects • Agent API and template • Custom agent development • An open interface to communicate and interact with other programmable third party hardware and software management tools
Agent Heartbeat • Periodic messages • Agents send heartbeats to the Cluster Service to inform it of the availability of the resource object monitored by the agent
Scripts • Windows NT executable files • Auto-initiated • Start a series of programs • Terminate a series of programs • Monitor a series of programs • Trigger event notification programs
Administration Tool • Intuitive and user friendly • Interactive point-and-click Windows GUI • Menu-driven and form-based interface • Icon-based real-time status monitoring • Support dynamic configuration and real-time synchronization • Remote administration using Web browser is freely available from third parties
Availability Recovery • Critical factors of failover/takeover : Volume, NT Service, User Defined • Mechanisms • Failover is initiated by the active server • Takeover is initiated by the backup server • Failover/Takeover • The active server deactivates the corresponding resource hierarchy • The backup server reactivates the resource hierarchy
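The two-step move described above (deactivate on the active server, reactivate on the backup server) can be sketched as follows. Several details are assumptions rather than documented NeoCLUSTER behavior: the function and parameter names are hypothetical, deactivation is done in reverse activation order, and the deactivation step is skipped when the active server cannot be reached, as in a takeover after a crash.

```python
from typing import Callable, Sequence


def move_hierarchy(
    activation_order: Sequence[str],
    deactivate_on_active: Callable[[str], None],
    activate_on_backup: Callable[[str], None],
    active_server_reachable: bool,
) -> None:
    """Move one resource hierarchy from the active server to the backup server."""
    if active_server_reachable:
        # Assumed: stop resources in the reverse of their activation order,
        # so a service is stopped before the volume or network ID it uses.
        for name in reversed(activation_order):
            deactivate_on_active(name)
    # Bring the hierarchy up on the backup server in activation order.
    for name in activation_order:
        activate_on_backup(name)
```

For a hierarchy ordered as volume, IP address, service, a failover stops the service first and the volume last on the active server, then starts the volume first and the service last on the backup server.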
Availability Recovery • Switch back/Fail back • Switch a resource hierarchy back to the original active server from the backup server • The original active server has recovered • The backup server detects that the active server has recovered • Retain the original load distribution • Asymmetric configuration : active/backup servers with different capacity • Symmetric configuration : two active, mutual takeover
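The Features slides list both automatic and manual switch back. A minimal decision sketch, assuming a hypothetical `SwitchBackMode` setting and an operator confirmation flag (neither is named in the slides), might look like this:

```python
from enum import Enum


class SwitchBackMode(Enum):
    AUTOMATIC = "automatic"
    MANUAL = "manual"


def should_switch_back(
    mode: SwitchBackMode,
    original_server_recovered: bool,
    operator_confirmed: bool = False,
) -> bool:
    """Decide whether a hierarchy returns to its original active server."""
    # Nothing to switch back to until the original server has recovered.
    if not original_server_recovered:
        return False
    # Automatic mode acts as soon as recovery is detected.
    if mode is SwitchBackMode.AUTOMATIC:
        return True
    # Manual mode waits for an administrator to approve the move.
    return operator_confirmed
```

In manual mode the hierarchy stays on the backup server until an administrator approves the move, which lets the switch back be scheduled for a quiet period.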
Clients • Client-end applications will connect to switchable network IDs • No need to reconfigure or modify the client-end applications • Reconnection after a failover operation is application dependent
Clients • Stateless applications • NFS service or UDP-based applications • User transparent • Stateful applications • Client/server RDBMS applications or TCP-based applications • The client applications will lose their connection to the server • Manual reconnection to the server is required
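Because reconnection is application dependent, a stateful client may choose to retry against the same switchable network ID instead of asking the user to reconnect by hand. The sketch below is a client-side illustration only, not a NeoCLUSTER feature; `reconnect`, its parameters, and the retry counts are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def reconnect(
    connect: Callable[[], T],
    attempts: int = 5,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds or the attempts run out.

    connect is any callable that opens a session to the switchable network ID
    and raises ConnectionError while the failover is still in progress.
    """
    last_error: Optional[ConnectionError] = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            # Wait before the next try, but not after the final attempt.
            if attempt < attempts - 1:
                sleep(delay_s)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

The client keeps using the same address throughout, since the backup server takes over the switchable network ID; any transaction state lost with the old session still has to be rebuilt by the application.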
Supported Applications • File Sharing • Printer Spooler • Internet Servers (FTP, WWW, etc.) • RDBMS (Microsoft, Oracle, Sybase, Informix) • Microsoft Exchange Server, Lotus Notes Server • NT Service-based application software • TCP/IP or NetBEUI-based client/server applications
Future Improvements • Multiple error notification facilities • Server side visual and audio alarm • Message broadcasting • E-mail • Pager • SNMP agent • Simplified GUI • N to 1 cluster configuration
Supported Configurations • Active/Backup
Supported Configurations • Active/Active