ASE136: How to build highly available applications using OpenSwitch? Ganesan Gopal, Senior Manager, ganesan.gopal@sybase.com. August 15-19, 2004.
The Enterprise. Unwired. Industry and Cross Platform Solutions Manage Information Unwire Information Unwire People • Adaptive Server Enterprise • Adaptive Server Anywhere • Sybase IQ • Dynamic Archive • Dynamic ODS • Replication Server • OpenSwitch • Mirror Activator • PowerDesigner • Connectivity Options • EAServer • Industry Warehouse Studio • Unwired Accelerator • Unwired Orchestrator • Unwired Toolkit • Enterprise Portal • Real Time Data Services • SQL Anywhere Studio • M-Business Anywhere • Pylon Family (Mobile Email) • Mobile Sales • XcelleNet Frontline Solutions • PocketBuilder • PowerBuilder Family • AvantGo Sybase Workspace
Agenda • What is OpenSwitch? • Key OpenSwitch functionalities • OpenSwitch deployment options • New features in OpenSwitch 12.5 and 12.5.1 • What is the Replication Coordination Module (RCM)? • Q & A
What is OpenSwitch? Open Server Gateway • Passthru gateway built on top of Sybase Open Server libraries. • Provides functionality to manually or automatically manage client connections. • Can manage any number of remote servers • Routing and load balancing capabilities • Includes error detection and recovery features • Integrates into any HA environment • Works with existing HA solution (such as Replication Server, HA-CMP, etc.). • Makes HA transparent
OpenSwitch Features • Connection Failover: transparent connection management; failure detection and recovery; HA coordination • Load Balancing & Routing: server pooling and routing; server chaining and balancing; connection caching; connection suspend and resume; resource governing • Management: central management of all client connections; notification events; dynamic configuration
Transparent Connection Management [Diagram: Open Client applications (Jaguar CTS, ISQL, PowerBuilder; any Open Client application and platform) connect through OpenSwitch to ASE Server A and ASE Server B.]
Connection Management (cont.) [Diagram: the same topology (Jaguar CTS, ISQL, PowerBuilder; any Open Client application and platform; OpenSwitch; ASE Server A and ASE Server B), with an administrator (isql) sending an RPC switch request to OpenSwitch.]
Transparent Connection Management Benefits • Allows maintenance periods • Index rebuilds • DBCC checks • Non-intrusive data loads • Server reboot
Failure Detection and Recovery • The status of outgoing OpenSwitch connections is monitored. • As each connection to the primary server is lost, OpenSwitch moves it to the next available server. • The switch decision is made on a per-connection basis. • OpenSwitch attempts to recover: • Database context • Client language • Client charset • Client-side cursors [Diagram: Jaguar CTS, ISQL, PowerBuilder, and other Open Client applications connect through OpenSwitch to ASE Server A and ASE Server B.]
Transaction Management • OpenSwitch tracks: • Client activity • Database context • Client-side cursors • A “deadlock” is issued to clients with: • Open transactions • Active communications • Open client-side cursors [Diagram: four clients (Jaguar CTS, ISQL, PowerBuilder, ISQL) connected to ASE Server A and ASE Server B; the client states shown are Receiving Results, Idle, Idle, and In Transaction, and two clients receive DEADLOCK!]
Transaction Management (cont.) • Failover is transparent to clients that are: • Not in a transaction • Idle • Administrative switch requests: • Behavior is configurable • Can queue the switch until client transactions are committed. • Can specify a maximum number of seconds for a client to commit its transaction. • Can force transactions to be broken (generating a “deadlock” message). [Diagram: four clients (Jaguar CTS, ISQL, PowerBuilder, ISQL) connected to ASE Server A and ASE Server B; the client states shown are Receiving Results, Idle, Idle, and In Transaction, and two clients receive DEADLOCK!]
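The three configurable behaviors for an administrative switch can be sketched as a small decision function. This is only an illustration of the rules on the slide: the policy names `queue`, `timed`, and `force`, the `Client` type, and the returned action strings are hypothetical and are not OpenSwitch settings.

```python
from dataclasses import dataclass


@dataclass
class Client:
    name: str
    in_transaction: bool
    seconds_waited: float = 0.0  # time since the switch was requested


def switch_action(client, policy, max_wait=None):
    """Decide what happens to one client when a switch is requested.

    policy (hypothetical names):
      "queue" - hold the switch until the transaction commits
      "timed" - give the client max_wait seconds, then break the transaction
      "force" - break the transaction immediately
    """
    if not client.in_transaction:
        return "switch"  # idle clients are moved transparently
    if policy == "queue":
        return "wait"
    if policy == "timed":
        if max_wait is None:
            raise ValueError("the 'timed' policy needs max_wait")
        return "wait" if client.seconds_waited < max_wait else "deadlock"
    if policy == "force":
        return "deadlock"  # client receives a "deadlock" message
    raise ValueError(f"unknown policy: {policy}")
```

A client that is idle is switched under every policy; only clients inside a transaction are affected by the choice.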
Pooling and Routing • What is a pool? • Defines a named group of servers • Servers within a pool are a self-contained failover group • Connections go to the first server in the pool • Connections fail over to the next available server in the pool • Connections are dropped from OpenSwitch if no servers remain • Connections can be routed to pools based on: • Username • Application name • Client hostname • “Wildcard” names can be used • Pools can be configured in chained or balanced mode [Diagram: Pool A, Pool B, Pool C]
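As a rough sketch of routing by username, application name, or client hostname with wildcards, the following uses shell-style patterns. The rule table, the pool names, and the first-match-wins ordering are assumptions made for the example; the slide does not say how OpenSwitch resolves overlapping rules.

```python
from fnmatch import fnmatchcase

# Hypothetical routing rules: (connection attribute, wildcard pattern, pool).
ROUTES = [
    ("username", "report_*", "POOL_C"),
    ("appname", "AppA", "POOL_A"),
    ("hostname", "*", "POOL_B"),  # catch-all
]


def route(conn, routes=ROUTES):
    """Return the pool for a connection; the first matching rule wins."""
    for attr, pattern, pool in routes:
        if fnmatchcase(conn.get(attr, ""), pattern):
            return pool
    return None  # no rule matched
```

With these rules, a login named `report_jane` lands in `POOL_C` even when it comes from `AppA`, because the username rule is listed first.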
Pooling and Routing Example
[Diagram: Server A hosts DB A, Server B hosts DB B, and Server C hosts DB A/DB B; App A uses POOL A, App B uses POOL B, and Reporting uses POOL C.]
Pool Name   Who      Primary    Failover
POOL A      App A    Server A   Server C
POOL B      App B    Server B   Server C
POOL C      Report   Server C   --
Chaining and Balancing [Diagram: Pool A shown in chained mode and Pool B shown in balanced mode.]
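A minimal sketch of the two pool modes follows. Chained mode matches the pooling slide (first server, then the next available one). The deck does not define balanced mode, so round-robin over the available servers is an assumption here, and the `Pool` class is hypothetical.

```python
from itertools import count


class Pool:
    def __init__(self, name, servers, mode="CHAINED"):
        self.name = name
        self.servers = list(servers)
        self.mode = mode
        self.up = set(servers)  # servers currently available
        self._turn = count()    # round-robin counter for BALANCED mode

    def pick(self):
        """Choose the server for the next connection, or None if none is up."""
        available = [s for s in self.servers if s in self.up]
        if not available:
            return None  # no servers remaining: the connection is dropped
        if self.mode == "CHAINED":
            return available[0]  # always the first server that is up
        # BALANCED (assumed round-robin): spread connections across servers.
        return available[next(self._turn) % len(available)]
```

In chained mode every connection stays on the first server until it fails; in the assumed balanced mode successive connections alternate between servers.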
Connection Caching • Caching per pool • Each pool may specify a caching period • Outgoing connections are retained by OpenSwitch for the specified period. • A cached connection is handed back to the user the next time he or she connects (with the same password) • The connection is closed if the caching period expires. • Performance boost for: • Web servers (CGIs) without persistent connections. • Site handlers • Any application that performs a rapid connect/query/release. [Diagram: a disconnected client's connection is held in the connection cache and restored when the client reconnects.]
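The caching rules above (keep the outgoing connection for a period, hand it back only to the same user with the same password, drop it on expiry) can be sketched as follows. The `ConnectionCache` class and its method names are hypothetical, and the caller supplies the clock value so the example stays deterministic.

```python
class ConnectionCache:
    """Toy per-pool cache of idle outgoing connections."""

    def __init__(self, cache_seconds):
        self.cache_seconds = cache_seconds
        self._idle = {}  # (user, password) -> (connection, time released)

    def release(self, user, password, conn, now):
        """Client disconnected: keep its server connection for later reuse."""
        self._idle[(user, password)] = (conn, now)

    def acquire(self, user, password, now):
        """Client connected: return a cached connection, or None."""
        entry = self._idle.pop((user, password), None)
        if entry is None:
            return None  # nothing cached for this user/password pair
        conn, released = entry
        if now - released > self.cache_seconds:
            return None  # caching period expired; a gateway would close conn
        return conn
```

Keying on the plain password is for illustration only; a real gateway would not need to hold it in this form.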
HA Coordination • Without coordination • OpenSwitch makes all decisions about routing and failover on its own. • Decisions are made per connection • Coordination Module (CM) • Either a custom-developed OpenSwitch client that uses the Coordination Module API for complex administration, or the Sybase-provided RCM, which controls a high availability, warm standby replication environment • When a CM is present, OpenSwitch defers major decisions to it. • Events raised by clients (such as logins and failures) are forwarded to the CM. • Events include an indication of what OpenSwitch would like to do. • The client hangs until the CM responds. [Diagram: OpenSwitch, between the application and ASE Server A, asks the CM “What do I do?” and acts on the CM response.]
HA Coordination (cont.) • Custom Coordination Module: • Developed by or for each customer to address questions: • What constitutes a failure? • What should the response to a failure be? • Used to coordinate with the HA solution (Rep Server, HP Service Guard, EMC SRDF, etc.) • Hook point for client-specific logic, such as application security. • Can be used to override OpenSwitch decisions on routing and pooling. • Can perform any action an OpenSwitch administrator may perform. [Diagram: an application's connection to ASE Server A is lost; the CM asks “Really Fail?” and checks “In Sync?” with the HA solution (Rep Server) before the switch to ASE Server B.]
Connection Suspend/Resume • One or more connections may be temporarily suspended. • Can wait for transactions to complete or “deadlock” them. • Allows quick maintenance periods. • Can safely bounce SQL Server. • Can safely establish and break hardware or OS mirroring. [Diagram: four panels for a reporting server, labeled Suspend Activity, Enable Mirror (TimeFinder), Resume Activity, and Break Mirror.]
Dynamic Configuration • Configuration in external .cfg file. • Server can change dynamically from altered .cfg file, or via built-in registered procedures. • No need to restart OpenSwitch. • Coordination Module can alter configuration.
OpenSwitch Deployment: Optimal Configuration? • It depends on the HA requirement • Run multiple copies of OpenSwitch so that OpenSwitch itself is not a single point of failure. • Need to determine: • What are the recovery requirements? • How much of the environment can be affected by a failure? • How much hardware is available? • What is most likely to fail?
Reducing Points of Failure • Increasing the number of OpenSwitch instances (reducing the number of clients per OpenSwitch) reduces the impact of a failure. • Client failover uses existing Open Client technology [Diagram: Server A and Server B]
OpenSwitch with ASE • ASE requires data replication to keep the primary and replicate databases in sync. • OpenSwitch switches between the two ASE servers, Server A and Server B. • If either ASE server becomes unavailable, OpenSwitch shields the client from the disconnect. • The CM requires a ping user connection to each ASE server to determine availability • The CM requires a connection and code to communicate with Rep Server [Diagram: ASE Server A, ASE Server B, HA solution (Rep Server), and CM.]
What is NEW in OSW 12.5? • Replication Server Co-ordination Module for Warm-Standby Configuration (out of the box) • Installation & install time Configuration • Dynamic SQL Support • OCS 12.5 integration - Optimal Resource Usage • HA-Aware Support • Quality
What is RCM? • RCM is an OpenSwitch Coordination Module (CM) • Registers callback routines for events within OpenSwitch • Sets timer events and registers timer callbacks • Connects to OpenSwitch • Executes a “run” command. At this point nothing happens until a defined event occurs • OpenSwitch contacts the coordination module when: • A user requests a connection • A login attempt fails • An existing connection fails • The OpenSwitch connection fails • The coordination module determines which server the user should connect to • A coordination module is an Open Client C executable • An OpenSwitch can have more than one coordination module • A coordination module can connect to more than one OpenSwitch [Diagram: OpenSwitch and Coordination Module]
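The register-then-wait pattern described above can be illustrated with a small dispatcher. This is a conceptual sketch in Python, not the Coordination Module API (which is a C library): the class, the event names `login` and `connection_failed`, and the `proposed` argument are all hypothetical. The only behavior taken from the deck is that each event carries what OpenSwitch would like to do and that the CM answers with a server.

```python
class CoordinationModule:
    """Toy event dispatcher standing in for a coordination module."""

    def __init__(self):
        self._callbacks = {}

    def register(self, event, callback):
        """Register the routine to call when the named event arrives."""
        self._callbacks[event] = callback

    def dispatch(self, event, **details):
        """Handle one event and return the server the client should use.

        details["proposed"] is what the gateway would do on its own; with
        no callback registered, that proposal is accepted unchanged.
        """
        callback = self._callbacks.get(event)
        if callback is None:
            return details.get("proposed")
        return callback(**details)


def on_login(user, proposed):
    # Hypothetical site rule: send one reporting login to the standby.
    return "StandbyBook" if user == "alice" else proposed
```

After `cm.register("login", on_login)`, a login event for `alice` is answered with `StandbyBook`, while other users get whatever the gateway proposed.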
What is RCM? • RCM provides all the functionality needed to control a high availability, warm standby replication environment • Prior to RCM, customers and/or professional services had to write custom coordination modules • RCM is part of the OpenSwitch product
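RCM Warm-Standby Replication Environment Topology [Diagram: application end users connect through an OpenSwitch to the active ASE; decision support users connect through an OpenSwitch to the standby ASE; a warm-standby Replication Server sits between the active and standby ASE.]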
RCM Overview • The RCM provides the following features: • Coordinates user access to the active and standby ASE servers • Switches the warm standby connections in the Replication Server • Switches users from the active to the standby ASE if the active server is unavailable • Supports failover of multiple databases in the active ASE • Coordinates two OpenSwitch servers to provide redundancy • Note: redundant OpenSwitch servers are optional
Configuring the OpenSwitch and RCM • RCM is configured using a separate configuration file • RCM settings must be coordinated with OpenSwitch settings • OpenSwitch configuration parameters • The following parameters appear in both the OpenSwitch and the RCM configuration files: • SERVER_NAME • COORD_USER • COORD_PASSWORD • The parameter COORD_MODE must be set to ‘ALWAYS’ • Example:
    [CONFIG]
    SERVER_NAME = ws_os
    COORD_USER = os_coord
    COORD_PASSWORD = os_coord_pwd
    COORD_MODE = ALWAYS
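A quick consistency check between the two files might look like the sketch below. It takes already-parsed settings as dictionaries. Pairing the OpenSwitch SERVER_NAME with the RCM parameter OPENSWITCH follows the RCM example elsewhere in this deck and is an assumption, as is the function itself.

```python
# (OpenSwitch key, RCM key) pairs that are expected to hold the same value.
# SERVER_NAME <-> OPENSWITCH is an assumed pairing, not a documented rule.
SHARED_KEYS = [
    ("SERVER_NAME", "OPENSWITCH"),
    ("COORD_USER", "COORD_USER"),
    ("COORD_PASSWORD", "COORD_PASSWORD"),
]


def check_coordination(osw, rcm):
    """Return a list of problems found between the two sets of settings."""
    problems = []
    for osw_key, rcm_key in SHARED_KEYS:
        if osw.get(osw_key) != rcm.get(rcm_key):
            problems.append(f"{osw_key} / {rcm_key} mismatch")
    if osw.get("COORD_MODE") != "ALWAYS":
        problems.append("COORD_MODE must be ALWAYS")
    return problems
```

An empty list means the settings shown in the example above agree with the RCM side.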
Configuring the OpenSwitch and RCM • Application end users must connect to the active server unless it is unavailable • OpenSwitch must be configured with exactly one POOL for application end users • MODE = ‘CHAINED’ • STATUS = ‘UP’ • List the active server followed by the standby server • Make sure all application end users connect using this POOL • Example:
    [POOL=Application:MODE=CHAINED, STATUS=UP]
    servers:
        BookServer
        StandbyBook
    connections:
        username:bob
        username:fred
Configuring the OpenSwitch and RCM • Optionally, OpenSwitch can be configured with pools for decision support users • Zero or more pools • Mode is either ‘CHAINED’ or ‘BALANCED’ • STATUS = ‘UP’ • Example:
    [POOL=DSS:MODE=CHAINED, STATUS=UP]
    servers:
        StandbyBook
        BookServer
    connections:
        username:alice
Configuring the OpenSwitch and RCM • RCM configuration parameters • Three replication failover modes: SWITCH, QUIESCE, NONE • Multiple database support • Timing parameters • Required database list
Configuring the OpenSwitch and RCM • Standard RCM configuration parameters
    # Open Switch Server; These parameters match OpenSwitch parameters
    OPENSWITCH = ws_os
    COORD_USER = os_coord
    COORD_PASSWORD = os_coord_pwd
    APP_POOL = Application
    # Active and Standby ASE parameters
    ACTIVE_ASE = BookServer
    STANDBY_ASE = StandbyBook
    ASE_USER = sa
    #ASE_PASSWORD - ASE password is blank
    # Wait 5 minutes before starting the failover
    FAILOVER_WAIT = 300
    # Wait 2 minutes for the Rep Server to perform the switch over
    MONITOR_WAIT = 120
    # Wait 5 seconds between ping/monitor commands
    TIMER_INTERVAL = 5
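To see what the timing values above imply, the sketch below reads `NAME = value` lines and works out how many ping and monitor rounds fit into each wait. The parser is a simplification written for this example, not the RCM's own reader; in particular it treats everything after `#` as a comment, which would break a value that contains `#`.

```python
SAMPLE = """\
# Wait 5 minutes before starting the failover
FAILOVER_WAIT = 300
# Wait 2 minutes for the Rep Server to perform the switch over
MONITOR_WAIT = 120
# Wait 5 seconds between ping/monitor commands
TIMER_INTERVAL = 5
#ASE_PASSWORD - ASE password is blank
"""


def parse_cfg(text):
    """Collect NAME = value pairs, skipping comments and blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments (simplified)
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


cfg = parse_cfg(SAMPLE)
interval = int(cfg["TIMER_INTERVAL"])
pings = int(cfg["FAILOVER_WAIT"]) // interval   # pings before failover starts
checks = int(cfg["MONITOR_WAIT"]) // interval   # monitor rounds during switch
print(pings, checks)  # prints: 60 24
```

With these values the active ASE gets up to 60 pings over five minutes to recover before the Replication Server commands are issued.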
Configuring the OpenSwitch and RCM • RCM configuration parameters - Failover Mode = ‘SWITCH’
    # On failover, switch the flow of replication
    RS_FAILOVER_MODE = SWITCH
    # Replication Server
    REP_SERVER = ws_rs
    RS_USER = sa
    #RS_PASSWORD - Replication Server password is blank
    # Identify the database connection in the warm-standby environment
    LOGICAL_CONN = LDS.LDB
    DATABASES = pubs3
Configuring the OpenSwitch and RCM • RCM configuration parameters - Failover Mode = ‘QUIESCE’
    # On failover, quiesce the Replication Server
    # No database information is needed
    RS_FAILOVER_MODE = QUIESCE
    # Replication Server
    REP_SERVER = ws_rs
    RS_USER = sa
    #RS_PASSWORD - Replication Server password is blank
Configuring the OpenSwitch and RCM • RCM configuration parameters - Multiple database support
    # On failover, switch the flow of replication
    RS_FAILOVER_MODE = SWITCH
    # Replication Server
    REP_SERVER = ws_rs
    RS_USER = sa
    #RS_PASSWORD - Replication Server password is blank
    # Identify the databases in the warm-standby environment
    LOGICAL_CONN = LDS.pubs3, LDS.sales, LDS.signings
    #DATABASES - Omitted, so RCM will use pubs3, sales, signings
    # The loss of the signings database will not trigger a failover
    REQUIRED_DBS = pubs3, sales
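The two rules in the comments above (database names default to the second part of each logical connection, and only required databases trigger a failover) can be sketched like this. The helper functions are hypothetical, and treating every database as required when REQUIRED_DBS is absent is an assumption the slide does not confirm.

```python
def split_list(value):
    """Split a comma-separated setting into trimmed names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def databases(cfg):
    """Database names: DATABASES if given, else taken from LOGICAL_CONN."""
    if "DATABASES" in cfg:
        return split_list(cfg["DATABASES"])
    # "LDS.pubs3" -> "pubs3": keep the part after the first dot
    return [conn.split(".", 1)[1] for conn in split_list(cfg["LOGICAL_CONN"])]


def triggers_failover(cfg, lost_db):
    """True if losing lost_db should start the failover process."""
    if "REQUIRED_DBS" in cfg:
        return lost_db in split_list(cfg["REQUIRED_DBS"])
    return lost_db in databases(cfg)  # assumed default: all are required


cfg = {
    "LOGICAL_CONN": "LDS.pubs3, LDS.sales, LDS.signings",
    "REQUIRED_DBS": "pubs3, sales",
}
```

For this configuration `databases(cfg)` yields pubs3, sales, and signings, and only the loss of pubs3 or sales counts as a failover trigger.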
Starting the RCM • The executable and startup script are in the OpenSwitch bin directory • Start the OpenSwitch before starting the RCM • Starting RCM as a Windows service is not supported • RCM command line syntax:
    rcm [-v] [-h] [-a] [-R] [-c config_file] [-e system_log] [-i interfaces_file_directory] [-T trace_flags]
• Display the RCM version string:
    rcm -v
• Start the RCM:
    rcm -c ../config/rcm.cfg -I ../../interfaces -T EF
RCM Failover Handling • Identifying the ASE failure • The OpenSwitch notifies the RCM when: • A user fails to connect to an ASE • An existing connection to an ASE fails • A switch over to an ASE fails • Active ASE: • If an application end user fails, the RCM starts the failover process • If a decision support user fails, the RCM switches them to the next available server • Standby ASE: • If any user fails, the RCM switches them to the next available server • Application users cannot log into the standby ASE unless the failover process has already occurred
RCM Failover Handling • RCM Failover Processing • Starts only when an application end user fails on the active ASE • Ping the active ASE • Attempt to log into the ASE • If successful, the ASE is not down, abort the failover process • Suspend connections to the active ASE • Stop new users from logging into the ASE (rp_stop) • Suspend existing connections (rp_server_status, LOCKED) • Wait for Recovery • Wait for the ASE to automatically recover, or for the network to stabilize • Wait a configurable amount of time (FAILOVER_WAIT) • Ping the ASE at a configurable interval (TIMER_INTERVAL) • If successful, abort the failover process
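The wait-for-recovery step can be sketched as a bounded polling loop: ping at each TIMER_INTERVAL and give up once FAILOVER_WAIT has passed. The function name and the `ping` and `sleep` callables are hypothetical; `sleep` defaults to a no-op so the sketch runs instantly, and a real caller would pass `time.sleep`.

```python
def wait_for_recovery(ping, failover_wait, timer_interval, sleep=lambda s: None):
    """Poll the active ASE; return True if it answers before the wait ends.

    ping: callable returning True when a login to the ASE succeeds.
    A True result means the failover should be aborted.
    """
    waited = 0
    while waited < failover_wait:
        if ping():
            return True  # server recovered: abort the failover
        sleep(timer_interval)
        waited += timer_interval
    return False  # still down after FAILOVER_WAIT: continue with failover
```

With FAILOVER_WAIT = 300 and TIMER_INTERVAL = 5, a server that never answers is pinged 60 times before the failover continues.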
RCM Failover Handling • Issue the Replication Server failover commands • SWITCH – (switch active) • QUIESCE – (suspend log transfer from all, admin quiesce_force_rsi) • NONE – do not fail over the connections in the Replication Server • Monitor the Replication Server failover • Wait for the Replication Server to finish the failover commands • Wait a configurable amount of time (MONITOR_WAIT) • Monitor the Replication Server at a configurable interval (TIMER_INTERVAL) • SWITCH – (admin logical_status) • QUIESCE – (admin health)
RCM Failover Handling • Start the Replication Agent on the Standby ASE • Only if failover mode is ‘SWITCH’ (sp_start_rep_agent) • Switch the users to the Standby ASE • Set the server to ‘DOWN’ in the OpenSwitch (rp_server_status, ‘DOWN’) • Switch the connections from the active to standby (rp_switch) • Restart the existing connections (rp_start)
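Putting the failover slides together, the order of operations for each failover mode can be listed as follows. The strings are labels built from the command names on the slides, with arguments omitted, and the function is a hypothetical summary rather than RCM code.

```python
def failover_plan(mode):
    """Return the ordered failover steps for one RS_FAILOVER_MODE value."""
    if mode not in ("SWITCH", "QUIESCE", "NONE"):
        raise ValueError(f"unknown RS_FAILOVER_MODE: {mode}")
    plan = [
        "ping active ASE (abort if it answers)",
        "rp_stop",                   # stop new logins to the active ASE
        "rp_server_status LOCKED",   # suspend existing connections
        "wait FAILOVER_WAIT, pinging every TIMER_INTERVAL",
    ]
    if mode == "SWITCH":
        plan += [
            "switch active",         # Replication Server failover command
            "admin logical_status",  # polled until done or MONITOR_WAIT
            "sp_start_rep_agent",    # on the standby ASE, SWITCH mode only
        ]
    elif mode == "QUIESCE":
        plan += [
            "suspend log transfer from all",
            "admin quiesce_force_rsi",
            "admin health",          # polled until done or MONITOR_WAIT
        ]
    # NONE: no Replication Server commands are issued.
    plan += ["rp_server_status DOWN", "rp_switch", "rp_start"]
    return plan
```

Every mode ends with the same three OpenSwitch steps; the modes differ only in what is sent to the Replication Server in between.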
RCM Failover Handling • User permissions for the RCM • ASE User • Must be a valid login for both the active and the standby ASE • Permission to connect to all databases that participate in replication • Permission to start the replication agent on all databases (sp_start_rep_agent) • Replication Server • User must have ‘sa’ privileges
Redundant OpenSwitch using RCM • RCM supports dual OpenSwitch environments • Two OpenSwitch servers and two RCMs • The primary OpenSwitch coordinates application end user connections to the active ASE • The primary OpenSwitch coordinates the failover of the Replication Server and of the application end users to the standby ASE • All application end users must connect to the active ASE through the primary OpenSwitch • The secondary OpenSwitch is on standby and takes control of the failover processing if the primary OpenSwitch fails • The secondary OpenSwitch provides decision support users access to the standby ASE • The secondary RCM does not allow application end users to connect unless the primary OpenSwitch has failed • Connectivity provides multiple servers for end users. When the primary OpenSwitch fails, users connect to the secondary OpenSwitch (multiple entries in the interfaces file)
Redundant OpenSwitch using RCM • RCM establishes a connection to both the primary and the secondary OpenSwitch servers • RCM is notified when either OpenSwitch fails • Primary OpenSwitch fails: • Primary RCM terminates • Application end users reconnect to the secondary OpenSwitch • Secondary RCM coordinates the connections to the active and standby ASE servers and controls failover processing • Secondary OpenSwitch fails: • Secondary RCM terminates • Users reconnect to the primary OpenSwitch [Diagram: primary OpenSwitch with primary RCM, and secondary OpenSwitch with secondary RCM.]
RCM Notification Process • The RCM provides a simple notification feature that automatically performs a user-defined process when certain events occur • Executes a process (e.g. a script or a program) defined by the NOTIFICATION_PROCESS configuration parameter • The process is executed from the RCM’s current working directory • The process is executed with the same set of permissions that the RCM was executed with • Output from the process is redirected to a temporary file. The full path name of this file is written to the RCM log and starts with the prefix “rcm.” • The RCM does not delete the temporary output file
RCM Notification Process • Events that trigger the notification process: • The RCM detected a possible failover situation where the active ASE is not responding • The failover process has started • The failover has been aborted because the active ASE has recovered • The RCM cannot connect to the Replication Server • The RCM was unable to start the Replication Agents • Executing the failover process in the Replication Server failed • Switching the users in the OpenSwitch from the active ASE to the standby ASE failed • The RCM has exited. Under normal conditions the RCM should not exit • One of the OpenSwitch servers failed • Test notification. Executed when the user starts the RCM with the –a (analyze) option • A notification ID and a text message are passed to the notification process. The process can interpret the ID to determine the appropriate response
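A notification process could be as small as the script below. The deck says only that an ID and a text message are passed to the process; receiving them as command-line arguments, with the ID first, is an assumption, and the output wording is made up for the example.

```python
import sys


def handle(argv):
    """Format one notification; argv holds the ID followed by the message."""
    if len(argv) < 2:
        return "usage: notify <notification_id> <message>"
    notification_id = argv[0]
    message = " ".join(argv[1:])  # message may arrive as several words
    return f"RCM notification {notification_id}: {message}"


if __name__ == "__main__":
    # Whatever is printed ends up in the temporary output file the RCM creates.
    print(handle(sys.argv[1:]))
```

A real handler would branch on the ID, for example paging an operator when the failover starts, once the actual ID values are taken from the product documentation.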
Dynamic SQL Support • OpenSwitch 12.5 supports the following Dynamic SQL Statements: • CS_PREPARE • CS_EXECUTE_IMMEDIATE • CS_EXECUTE • CS_DESCRIBE_INPUT • CS_DESCRIBE_OUTPUT • CS_DEALLOC
OpenSwitch with ASE 12.x with pre-12 client • An ASE 12.x cluster shares a single copy of data, eliminating the need for data replication between ASE Server A and ASE Server B. • OpenSwitch switches between the two ASE servers, Server A and Server B. • If either ASE server becomes unavailable, OpenSwitch shields the client from the disconnect. • The CM still requires a ping user connection to each ASE server to determine availability. [Diagram: ASE Server A, ASE Server B, and CM.]
OpenSwitch with ASE 12.x with 12.x client (HA Aware) • OpenSwitch still fits into a true ASE 12.x clustered environment where the client has been written or rewritten to take advantage of the companion server. • OpenSwitch can switch between two or more ASE 12.x clusters. • In ASE 12.x a cluster can consist of only 2 servers, but OpenSwitch can switch between N servers, in this case N=4. [Diagram: SITE ONE and SITE TWO, each with an ASE Server A and ASE Server B cluster, plus a CM.]
What is new in OSW 12.5.1? • Connection caching • Encrypted username/password in the configuration file • Multi-threaded CM • Wider platform appeal through the addition of Linux • Cleaner error messages and improved user documentation • Moved OpenSwitch to a robust InstallShield-based installer • Extensive stress and functional testing to exercise OpenSwitch