High Availability: 24 hours a day, 7 days a week, 365 days a year…
Vik Nagjee, Product Manager, Core Technologies, InterSystems Corporation
Topics
• What is High Availability (HA)?
• Current HA strategies
• What’s coming?
• Questions & Discussion
What is High Availability (HA)?
• Reliability
• Fault Tolerance
• High Uptime
• Operational Continuity
• Redundancy
• Minimal Disruption
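"High uptime" is often made concrete as a downtime budget. The sketch below is back-of-the-envelope arithmetic, assuming a 365-day year and treating availability simply as uptime divided by total time; the targets it prints are illustrative, not figures from this presentation.

```python
# Yearly downtime budget implied by an availability target.
# Assumes a 365-day year (as in the "365 days a year" goal); the targets
# printed below are illustrative examples, not figures from the slides.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum minutes of downtime per year allowed by an availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999, 0.99999):
        print(f"{target:.3%} available: {downtime_minutes_per_year(target):.1f} min/year")
```

For example, 99.999% ("five nines") leaves about 5.3 minutes of downtime per year, which is why automated failover, rather than manual intervention, becomes necessary at that level.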
High Availability vs. Disaster Recovery
• High Availability = fault detection and correction procedures, often automated, that maximize the availability of critical services and applications
• Disaster Recovery = the process of preparing to recover, or continue running, the technology infrastructure critical to an organization after a natural or human-induced disaster
High Availability ≠ Disaster Recovery!
Current HA Strategies
• Failover = automatic switch to a redundant system
• Relies on heartbeat software (e.g., IBM HACMP) to detect that the active system has failed
• Current failover options:
  • Failover Clusters
  • Concurrent Clusters
  • ECP Clusters
    • With a Failover Cluster for the database
    • With a Concurrent Cluster for the database
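The heartbeat idea behind failover can be sketched as a simple timeout detector: a node that has not sent a heartbeat within a configured interval is treated as failed, and the active role moves to the standby. The class, function names, and timeout are hypothetical; products such as HACMP or Serviceguard use their own protocols and settings. Time is passed in explicitly so the example is deterministic; a real monitor would read a clock.

```python
# Minimal heartbeat-based failure detector: a node silent for longer than
# `timeout` seconds is treated as failed, and the active role moves to the
# standby. Names and the timeout value are hypothetical.
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    timeout: float  # seconds of silence before a node is declared failed
    last_seen: dict = field(default_factory=dict)  # node -> time of last heartbeat

    def heartbeat(self, node: str, now: float) -> None:
        """Record a heartbeat from `node` at time `now`."""
        self.last_seen[node] = now

    def failed_nodes(self, now: float) -> list:
        """Nodes whose most recent heartbeat is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


def choose_active(monitor: HeartbeatMonitor, primary: str, standby: str, now: float) -> str:
    """Keep the primary active unless it has missed its heartbeats."""
    return standby if primary in monitor.failed_nodes(now) else primary


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout=3.0)
    monitor.heartbeat("PROD", now=0.0)
    monitor.heartbeat("STDBY", now=0.0)
    print(choose_active(monitor, "PROD", "STDBY", now=2.0))  # PROD
    monitor.heartbeat("STDBY", now=4.0)  # PROD has gone silent
    print(choose_active(monitor, "PROD", "STDBY", now=5.0))  # STDBY
```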
Failover Clusters
• One active system (PROD) and one standby system (STDBY), linked by a heartbeat connection
• Cluster software examples: Windows Cluster, IBM HACMP, Sun Cluster, HP Serviceguard, Red Hat Cluster Suite, Veritas Cluster Server…
• Requires shared disk for the install directory, WIJ (write image journal), database files, and journal files
• Users and applications connect through a DNS name that is mapped to PROD
• In the event of failure, the third-party cluster software fails Caché over to the STDBY node
• Caché performs recovery on the STDBY node before allowing connections: open transactions are rolled back, open locks are released, etc.
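The recovery step on STDBY (open transactions rolled back, open locks released) can be illustrated with a toy journal that records before-images. This is a simplified model under stated assumptions: the record layout and all names are hypothetical, and Caché's actual recovery (which also involves the WIJ and its own journal format) is not reproduced here.

```python
# Toy crash recovery on the standby node: undo updates made by transactions
# that never committed, then release the locks they held. The journal format
# and all names are hypothetical and much simpler than Caché's real recovery.
from dataclasses import dataclass
from typing import Optional


@dataclass
class JournalRecord:
    tx: int                       # transaction id
    kind: str                     # "set" or "commit"
    key: Optional[str] = None
    before: Optional[str] = None  # value before the update (None = key absent)


def recover(db: dict, journal: list, locks: dict) -> set:
    """Roll back open transactions; `locks` maps lock name -> owning tx id.

    Assumes `db` already reflects every journaled update, committed or not.
    Returns the ids of the transactions that were rolled back.
    """
    committed = {r.tx for r in journal if r.kind == "commit"}
    open_txs = {r.tx for r in journal if r.tx not in committed}
    # Undo newest-first so each key ends with its oldest before-image.
    for rec in reversed(journal):
        if rec.kind == "set" and rec.tx in open_txs:
            if rec.before is None:
                db.pop(rec.key, None)
            else:
                db[rec.key] = rec.before
    # Release locks still owned by rolled-back transactions.
    for name in [n for n, owner in locks.items() if owner in open_txs]:
        del locks[name]
    return open_txs


if __name__ == "__main__":
    db = {"acct1": "90", "acct2": "110", "acct3": "5"}
    journal = [
        JournalRecord(1, "set", "acct1", before="100"),
        JournalRecord(1, "set", "acct2", before="100"),
        JournalRecord(1, "commit"),
        JournalRecord(2, "set", "acct3", before=None),  # tx 2 never commits
    ]
    locks = {"acct3": 2}
    print(recover(db, journal, locks), db, locks)
    # {2} {'acct1': '90', 'acct2': '110'} {}
```

Transaction 1 committed, so its updates survive; transaction 2 was still open at the crash, so its insert is undone and its lock is freed before users are let back in.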
Concurrent Clusters
• AKA Caché Clusters
• Can be configured on OpenVMS and Tru64 UNIX
• Two or more servers, each running an instance of Caché and each with access to all disks, concurrently provide access to all data
• Users connect to any one of the clustered nodes; Caché provides data and lock synchronization across nodes
• If one machine fails, users can immediately reconnect to any of the remaining cluster nodes
• Caché performs cluster-wide recovery during failover, so logical and physical data integrity is maintained
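Lock synchronization across nodes can be pictured as a single cluster-wide lock table: a lock held by a process on one node blocks the same lock on every other node, and when a node fails its locks are released during recovery. This is a hypothetical, in-process model for illustration only; it is not Caché's actual lock protocol.

```python
# Toy cluster-wide lock table: every node routes lock requests through one
# shared table, so a lock held on node A blocks the same lock on node B.
# When a node fails, its locks are released as part of recovery.
# Hypothetical model only; not Caché's actual lock protocol.
import threading


class ClusterLockTable:
    def __init__(self) -> None:
        self._mutex = threading.Lock()  # serializes requests from all nodes
        self._owners = {}               # lock name -> node that holds it

    def acquire(self, name: str, node: str) -> bool:
        """Grant `name` to `node` if it is free or already held by `node`."""
        with self._mutex:
            owner = self._owners.get(name)
            if owner is None or owner == node:
                self._owners[name] = node
                return True
            return False

    def release(self, name: str, node: str) -> None:
        """Release `name` if `node` holds it."""
        with self._mutex:
            if self._owners.get(name) == node:
                del self._owners[name]

    def node_failed(self, node: str) -> list:
        """Release every lock held by a failed node; return their names."""
        with self._mutex:
            freed = sorted(n for n, o in self._owners.items() if o == node)
            for n in freed:
                del self._owners[n]
            return freed


if __name__ == "__main__":
    table = ClusterLockTable()
    print(table.acquire("order42", "node1"))  # True
    print(table.acquire("order42", "node2"))  # False: held on node1
    print(table.node_failed("node1"))         # ['order42']
    print(table.acquire("order42", "node2"))  # True
```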
ECP Clusters – with DB as Failover Cluster
• Enterprise Cache Protocol (ECP) provides a distributed, tiered system
• Typical configuration:
  • N+1 application servers
  • Users load-balanced across the app servers
• If any app server goes down, its users can be reconnected to the remaining app servers
• If the database server goes down, users on the app servers experience a pause while the database failover completes (here the database is configured as a failover cluster)
• Application servers reconnect once the database has performed recovery
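The application tier of this configuration can be sketched as a least-loaded balancer: N+1 is usually read as N servers' worth of capacity plus one spare, so the survivors can absorb the users of a failed server. The class and server/user names are hypothetical, and least-loaded placement is an assumed policy; the slides do not say how users are balanced.

```python
# Hypothetical load balancer for an N+1 application-server tier: users go to
# the least-loaded live app server; if one fails, its users are reconnected to
# the survivors. Least-loaded placement is an assumed policy.
class AppTier:
    def __init__(self, servers: list) -> None:
        self.sessions = {s: set() for s in servers}  # server -> connected users

    def connect(self, user: str) -> str:
        """Place `user` on the live server with the fewest sessions (ties by name)."""
        target = min(self.sessions, key=lambda s: (len(self.sessions[s]), s))
        self.sessions[target].add(user)
        return target

    def server_failed(self, server: str) -> dict:
        """Remove a failed server and reconnect its users elsewhere."""
        orphans = self.sessions.pop(server)
        if not self.sessions:
            raise RuntimeError("no application servers left")
        return {u: self.connect(u) for u in sorted(orphans)}


if __name__ == "__main__":
    tier = AppTier(["app1", "app2", "app3"])  # e.g. N = 2 plus one spare
    for user in ["u1", "u2", "u3", "u4", "u5", "u6"]:
        tier.connect(user)
    print(tier.server_failed("app2"))  # {'u2': 'app1', 'u5': 'app3'}
```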
ECP Clusters – with DB as Concurrent Cluster
• Similar to the previous configuration, except that the database server is configured as a concurrent cluster (OpenVMS or Tru64 UNIX)
• App servers can connect to any one of the database nodes
• If any node fails, the app server(s) connected to that node reconnect to another surviving node after failover
• Caché performs cluster-wide recovery during failover, so logical and physical data integrity is maintained
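An app server's reconnection to a surviving node can be sketched as a retry loop over the other cluster nodes, retrying in rounds because a survivor may refuse connections until cluster-wide recovery completes. The function, its parameters, and the caller-supplied `try_connect` callback are hypothetical; real ECP reconnection is handled by Caché itself.

```python
# Hypothetical reconnect loop for an app server whose database node failed:
# try the surviving nodes of the concurrent cluster, retrying in rounds because
# a survivor may refuse connections until cluster-wide recovery completes.
import time
from typing import Callable, Sequence


def reconnect(nodes: Sequence[str], failed: str,
              try_connect: Callable[[str], bool],
              rounds: int = 5, delay: float = 0.5) -> str:
    """Return the first surviving node that accepts a connection.

    `try_connect(node)` is supplied by the caller and returns True on success.
    """
    survivors = [n for n in nodes if n != failed]
    for _ in range(rounds):
        for node in survivors:
            if try_connect(node):
                return node
        time.sleep(delay)  # give recovery time to finish before retrying
    raise ConnectionError(f"no surviving node accepted a connection in {rounds} rounds")


if __name__ == "__main__":
    refusals = {"db2": 1, "db3": 1}  # both survivors refuse once during recovery

    def fake_connect(node: str) -> bool:
        if refusals.get(node, 0) > 0:
            refusals[node] -= 1
            return False
        return True

    print(reconnect(["db1", "db2", "db3"], failed="db1",
                    try_connect=fake_connect, delay=0.0))  # db2
```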
High Availability: What’s Coming?
Database Mirroring:
• Delivers faster, automated failover
• Eliminates the requirement for shared-disk configurations
• Reduces dependency on third-party clustering software
• Uses multiple redundant servers
• Integrates ECP recovery
Database Mirroring
• Multiple servers form a Mirror Set: one is the Primary, the others (one or more) are Backups
• Mirror members communicate over TCP connections
• The Primary PUSHES journal updates to the Backups, which acknowledge them and continuously de-journal (apply) them
• The Primary role can move from one server to another within moments: automated failover
• All clients (except ECP) connect to a Mirror Virtual IP; the mirror redirects them to the current Primary
• The ECP protocol is “mirror aware”: app servers connect directly to the current Primary and fail over to a new Primary as appropriate; ECP performs recovery on reconnection
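The push/acknowledge/de-journal cycle and the role flip can be sketched as an in-process simulation. No TCP, Mirror Virtual IP, or real Caché API is modeled; the class names are hypothetical, and promoting the Backup that has applied the most journal records is an assumed rule for illustration.

```python
# Toy, in-process model of journal-based mirroring: the Primary journals each
# update and pushes it to every Backup, which applies ("de-journals") it in
# order and acknowledges it. On failover the most up-to-date Backup becomes
# Primary. Names and the promotion rule are assumptions for illustration.
class Member:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data = {}
        self.applied = 0  # count of journal records applied so far

    def apply(self, seq: int, key: str, value) -> int:
        """Apply journal record `seq` and return it as the acknowledgement."""
        if seq != self.applied:
            raise ValueError(f"{self.name}: expected record {self.applied}, got {seq}")
        self.data[key] = value
        self.applied += 1
        return seq


class MirrorSet:
    def __init__(self, primary: Member, backups: list) -> None:
        self.primary = primary
        self.backups = list(backups)
        self.journal = []  # (key, value) records in order

    def write(self, key: str, value) -> int:
        """Journal and apply on the Primary, push to Backups; return the ack count."""
        seq = len(self.journal)
        self.journal.append((key, value))
        self.primary.apply(seq, key, value)
        acks = [b.apply(seq, key, value) for b in self.backups]
        return len(acks)

    def fail_over(self) -> Member:
        """Promote the Backup that has applied the most journal records."""
        if not self.backups:
            raise RuntimeError("no Backup available for failover")
        new_primary = max(self.backups, key=lambda m: m.applied)
        self.backups.remove(new_primary)
        self.primary = new_primary
        return new_primary


if __name__ == "__main__":
    a, b, c = Member("A"), Member("B"), Member("C")
    mirror = MirrorSet(a, [b, c])
    mirror.write("x", 1)
    mirror.write("y", 2)
    new = mirror.fail_over()   # A is lost; B and C are equally current
    print(new.name, new.data)  # B {'x': 1, 'y': 2}
```

Because each Backup has already applied every acknowledged record, the promoted member can take over without replaying anything, which is the property that makes the "within moments" failover on this slide plausible.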
Wrap-up
Questions & Discussion