A.R.T. is a hot standby system that keeps critical applications requiring socket connections running continuously. Designed to minimize downtime and lost productivity, it provides redundancy modeled on V.R.R.P. (Virtual Router Redundancy Protocol).
Application Redundancy Tool A.R.T. CS 495 Fall 2005 Kristi Olson
Description • A.R.T. is a hot standby system for applications. • Designed for applications which need to run continuously. • Intended for use with applications which require socket connections.
Internship • Internship with GCI’s Network Support Group, Operations System Support. • NSG OSS responsibilities: • Provision phone service, calling cards, internet services. • Internal Support (data collection) • Network Monitoring
Applications supporting these services • Homegrown. • Most run around the clock. • Some applications are mission critical. • Loss of productivity when applications are down. • No formal system of redundancy. • Lack of existing product to buy. • Lack of funds.
Applications continued • Methods of transmission vary widely. • Fiber optic • Reliable, fast • Satellite • Prone to weather-related outages, inherent 600 ms delay. • Microwave • Even more prone to weather: fog, rain, or even a hot day can cause outages.
Applications continued • Socket connections: • Some applications establish one or more “permanent” socket connections. • Others repeatedly establish multiple “temporary” socket connections.
How to provide redundancy? • V.R.R.P. • Virtual Router Redundancy Protocol • System of dynamic redundancy. • One router is designated master. • Other routers are backups. • Uses multicasting.
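The multicasting idea can be sketched for an application-level listener. This is not V.R.R.P. itself, which runs directly over IP between routers; it is a minimal Python illustration of joining a UDP multicast group, the transport A.R.T. relies on. The function names, the group address, and the port are assumptions chosen for the example, not values from the project.

```python
import socket
import struct

# Hypothetical values: an administratively scoped group and an arbitrary port.
EXAMPLE_GROUP = "239.1.2.3"
EXAMPLE_PORT = 5007


def membership_request(group: str) -> bytes:
    """Build the 8-byte IP_ADD_MEMBERSHIP argument: group address + any interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))


def open_listener(group: str, port: int) -> socket.socket:
    """Open a UDP socket that receives datagrams sent to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Allow several instances on one host to bind the same port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    return sock
```

Every instance that calls `open_listener(EXAMPLE_GROUP, EXAMPLE_PORT)` would hear the same datagrams, which is what lets one sender reach all backups at once.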
V.R.R.P. and A.R.T. • Establish “Master” and “backup” instances of the application. • Identical except for a configuration file. • The backups loop continuously listening for status polls from the Master. • If the Master stops sending polls, the backup comes online. • Put the application instances on the same multicast group.
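The backup's decision to come online can be sketched as a small timing check. This is a minimal Python illustration, not the A.R.T. Perl module; the class name, its fields, and the rule of counting whole missed intervals are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BackupMonitor:
    """Tracks Master polls and decides when the backup should come online."""

    broadcast_interval: float  # seconds between expected status polls
    allowed_missed: int        # polls that may be missed before taking over
    last_poll_time: float      # time at which the last Master poll was heard

    def record_poll(self, now: float) -> None:
        """Note that a valid poll from the Master arrived at time `now`."""
        self.last_poll_time = now

    def missed_polls(self, now: float) -> int:
        """Number of whole poll intervals that have elapsed with no poll."""
        return int((now - self.last_poll_time) // self.broadcast_interval)

    def should_take_over(self, now: float) -> bool:
        """True once more polls have been missed than the configuration allows."""
        return self.missed_polls(now) > self.allowed_missed
```

A backup loop would call `record_poll` whenever a poll arrives and check `should_take_over` once per interval. Passing the clock in as `now` keeps the logic easy to test without waiting in real time.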
Requirements • Minimal modifications to existing code. • Configurable for any type of application and transmission protocol. • Reliability: • A backup must not come online prematurely, nor too late. • The actual switchover should affect service as little as possible. • Applications should take control of or release sockets accordingly.
Configuration • Priority: • The instance with the highest priority becomes the master. • Broadcast Interval: • How often the application sends and listens for status polls. • Allowable Missed Polls: • How many polls a backup may miss before it comes online. • Needed because multicasting uses UDP, which does not guarantee delivery. • Also allows for less reliable transmission technologies.
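The three settings can be illustrated with a small loader. The deck does not show the real configuration file, so the INI layout, the `[art]` section, and every key name below are assumptions; the sketch is in Python rather than the project's Perl.

```python
import configparser
from dataclasses import dataclass

# Hypothetical file contents; the real A.R.T. format is not shown in the slides.
SAMPLE_CONFIG = """
[art]
application_name = billing
priority = 100
broadcast_interval = 2.0
allowed_missed_polls = 3
"""


@dataclass(frozen=True)
class InstanceConfig:
    name: str
    priority: int
    broadcast_interval: float   # seconds between status polls
    allowed_missed_polls: int   # polls a backup may miss before taking over


def load_config(text: str) -> InstanceConfig:
    """Parse one instance's settings from INI-formatted text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["art"]
    return InstanceConfig(
        name=section["application_name"],
        priority=section.getint("priority"),
        broadcast_interval=section.getfloat("broadcast_interval"),
        allowed_missed_polls=section.getint("allowed_missed_polls"),
    )


def elect_master(instances):
    """The instance with the highest priority becomes the master."""
    return max(instances, key=lambda cfg: cfg.priority)
```

Since the instances are identical except for this file, a backup would typically differ from the master only in its `priority` value.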
Status Polls • Application Name • Each application only reads messages pertaining to itself. • Timestamp: • Latency is possible. • A.R.T. ignores old messages. • Priority
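The filtering rules for status polls can be sketched as follows. The wire format is not given in the slides, so JSON encoding, the field names, and the `max_age` parameter are assumptions for illustration only (Python here, not the original Perl).

```python
import json
from typing import Optional


def encode_poll(app_name: str, priority: int, timestamp: float) -> bytes:
    """Serialize a status poll carrying the three fields listed above."""
    poll = {"app": app_name, "priority": priority, "timestamp": timestamp}
    return json.dumps(poll).encode("utf-8")


def accept_poll(raw: bytes, my_app: str, now: float, max_age: float) -> Optional[dict]:
    """Return the decoded poll, or None if it should be ignored."""
    try:
        poll = json.loads(raw.decode("utf-8"))
    except ValueError:  # covers both bad UTF-8 and malformed JSON
        return None
    if not isinstance(poll, dict):
        return None
    # Each application only reads messages pertaining to itself.
    if poll.get("app") != my_app:
        return None
    # Latency is possible, so messages older than max_age seconds are ignored.
    timestamp = poll.get("timestamp")
    if not isinstance(timestamp, (int, float)) or now - timestamp > max_age:
        return None
    return poll
```

A natural choice for `max_age` would be some multiple of the broadcast interval, so that a delayed poll cannot be mistaken for a fresh sign of life.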
Putting it all together • Configuration file: • Contains instance information. • A.R.T. Perl module: • Defines multicast groups. • Evaluates status polls. • Handles other related overhead. • Modifications to existing code: • Where to poll. • Don’t want to poll too often, or not often enough. • Encapsulate socket connections. • Refactoring opportunity.
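Encapsulating socket connections might look like the sketch below: the application hands its connection-opening code to one wrapper, which opens everything when the instance takes control and closes everything when it releases. The class and method names are hypothetical, and the example is in Python rather than the project's Perl.

```python
class ManagedConnections:
    """Opens connections on takeover and closes them on release."""

    def __init__(self, openers):
        # Each opener is a zero-argument callable returning an object with close().
        self._openers = list(openers)
        self._connections = []
        self.active = False

    def take_control(self):
        """Open every connection; undo partial work if any opener fails."""
        if self.active:
            return
        opened = []
        try:
            for opener in self._openers:
                opened.append(opener())
        except Exception:
            for conn in opened:
                conn.close()
            raise
        self._connections = opened
        self.active = True

    def release(self):
        """Close every open connection so another instance can take over."""
        for conn in self._connections:
            conn.close()
        self._connections = []
        self.active = False
```

For a real TCP connection an opener could be something like `lambda: socket.create_connection((host, port))`, with `host` and `port` supplied by the application. Keeping the openers as callables means the existing connection code needs little change beyond being wrapped.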
A.R.T. Monitoring • Email Alerts: • Notify admins should a switchover occur. • Web Page: • Traffic Light: • Green Light - Master is working. • Yellow Light - Backup is listening. • Red Light - Application is down.
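The traffic-light display amounts to a mapping from instance state to color. The sketch below assumes each instance reports one of three states; the enum and function names are hypothetical, and how the real web page gathers state is not described in the slides.

```python
from enum import Enum


class InstanceState(Enum):
    MASTER = "master"   # instance is online and serving
    BACKUP = "backup"   # instance is listening for status polls
    DOWN = "down"       # instance is not running


# Colors as described on the monitoring slide.
LIGHT_FOR_STATE = {
    InstanceState.MASTER: "green",
    InstanceState.BACKUP: "yellow",
    InstanceState.DOWN: "red",
}


def traffic_light(state: InstanceState) -> str:
    """Return the light color the web page would show for an instance."""
    return LIGHT_FOR_STATE[state]
```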
A.R.T. • Questions?