ABC Co. Network Implementation

ABC Co. Network Implementation • High reliability is the primary concern • Near 100% uptime required • Customer SLA has stiff penalty clauses • Everything is designed in a redundant fashion • Network redundancy not integrated with system design or application design. • Application and system design not integrated • Management added last (to fix problems)

The challenge is always politics • Politics prevents different parts of the company from working together. • Networking, Systems, and Applications are three different groups. • The Systems group owns the management issues. • Some requirements get in the way: • e.g. the management station must keep its data on the database server.

Network design • “Dual Everything” is the design rule • Dual routers/hubs (Cisco 5500s) • Dual Ethernet • Dual-attached systems

A simple picture • [Diagram: redundant net to customers; two Rtr/Hub units; dual-rail Ethernet; Server a through Server n; TNG, DNS, and WINS hosts]

More detail • No actual “Ethernet bus” • Systems connect to the 5500s via UTP • Each system connects to both 5500s • One connection is to the “primary” LAN, the other to the secondary LAN • Half have the “left” 5500 as primary, the other half have the “right” as primary. • 5500s run OSPF and “router cluster” software

Problems... • The server OSes (NT and Unix) do not switch off the primary interface if it fails and will keep trying to use it. Applications hang and connections time out. • DNS points to only one interface on each server. • No automatic failover built into applications.

Management software must: • Detect NIC failures • Continue to monitor system agents in presence of network failures • Correct server routing tables if primary interface fails (or the hub fails) • Update DNS • Notify operations as required.

Challenges • Get each system to report all status via both NICs. • Monitor system over both NICs. • Prevent duplicate notifications. • Fail over as fast as possible. • Show connectivity of each system to both networks.

What needs to be done to do this? • Modify auto discovery scripts to add each system twice, as independent systems. • Requires a private host file for name/address translation (cannot depend on access to DNS) • Invent a system to recognize which interface is “active” and block messages from the other NIC(s)

More work... • Duplicate any information in Object Repository that is needed to manage failover onto local system (cannot trust access to SQL server) • Store current connectivity state for all servers (added ILPs to class definitions).

Tricks used • Each system name in messages has a code added to the end to indicate the interface address (-p or -s) • Most of the work is done in event message processing. • Each “raw” message is suppressed and a script invoked to process it. • Ping successes/failures used to switch state • Agent messages dropped based on state and p/s flag
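The suffix trick can be sketched in a few lines of Python. This is only an illustration, not the scripts used at ABC Co. (those were REXX and Basic); the function name `split_interface_suffix` and the sample host name are hypothetical.

```python
from typing import NamedTuple, Optional


class TaggedName(NamedTuple):
    server: str               # real server name with the suffix removed
    interface: Optional[str]  # "P", "S", or None when no suffix is present


def split_interface_suffix(reported_name: str) -> TaggedName:
    """Split a name such as 'web01-p' into the server name and the interface flag."""
    lowered = reported_name.lower()
    if lowered.endswith("-p"):
        return TaggedName(reported_name[:-2], "P")
    if lowered.endswith("-s"):
        return TaggedName(reported_name[:-2], "S")
    # No recognised suffix: leave the name untouched.
    return TaggedName(reported_name, None)
```

Event processing can then compare the returned flag with the server's current state before deciding whether to keep a message.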

Basic set of flows • For each event (other than pings): • If the mode (P or S, kept in the NT Registry) does not match the interface the message came from (mode P with a message from S, or mode S with a message from P), discard it. • Else, reformat the message with the real server name, improve the content (system class, etc.) and send it back to the event console as a new message
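A minimal Python sketch of this flow, under stated assumptions: the mode is passed in as an argument rather than read from the NT Registry, the content enrichment step is omitted, and the name `process_event` and the message format are hypothetical.

```python
from typing import Optional


def process_event(mode: str, source_name: str, text: str) -> Optional[str]:
    """Return the reformatted message, or None if the event should be discarded.

    mode is "P" or "S": the interface currently considered active for the server.
    source_name is the name reported in the raw message, e.g. "web01-s".
    """
    server, sep, flag = source_name.rpartition("-")
    flag = flag.upper()
    if not sep or flag not in ("P", "S"):
        # Untagged source: pass through unchanged (an assumption of this sketch).
        return f"{source_name}: {text}"
    if flag != mode:
        # Message arrived via the interface that is not active: drop it.
        return None
    # Message arrived via the active interface: reissue under the real server name.
    return f"{server}: {text}"
```

For example, with mode "P" a message from "web01-s" is dropped, while the same message from "web01-p" is reissued under the name "web01".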

More flow • For each ping success/fail reported: • Remember DSM has already done the retries • If failure, check to see if the other port fails, too. If the other port is dead, too, then declare the node down, and reset state to primary. • If it's the primary, then do failover to secondary. If secondary, fail back to primary. • Update DNS in all cases.
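The ping-failure branch can be sketched as a small state update in Python. This is an illustrative sketch only: the `NodeState` class, the action strings, and the choice to return a list of actions instead of running failover scripts are all assumptions, and ping-success handling is left out because the slide does not spell it out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeState:
    mode: str = "P"   # active interface: "P" (primary) or "S" (secondary)
    up: bool = True   # False once both ports have failed


def on_ping_failure(state: NodeState, failed_port: str, other_port_alive: bool) -> List[str]:
    """Apply the failure flow and return the actions to carry out, in order."""
    actions = []
    if not other_port_alive:
        # Both ports dead: the node is down, and state resets to primary.
        state.up = False
        state.mode = "P"
        actions.append("declare node down")
    elif failed_port == "P":
        state.mode = "S"
        actions.append("fail over to secondary")
    else:
        state.mode = "P"
        actions.append("fail back to primary")
    # DNS is updated in all cases.
    actions.append("update DNS")
    return actions
```

Retries are deliberately absent here, since DSM has already done them before the failure is reported.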

Router / Hub failure • If the router/hub fails, invoke the primary failover script for each node connected to the primary side, and the secondary failover script for each node connected to the secondary side. • This is effectively all the nodes, so we don’t have to wait for each to have a ping failure. The system will stabilize faster.
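As a rough Python sketch of the bulk failover, assuming a table that records which hub each node uses as its primary: the function name `plan_hub_failover`, the script labels, and the node names are hypothetical placeholders.

```python
from typing import Dict, List, Tuple


def plan_hub_failover(failed_hub: str, primary_hub_of: Dict[str, str]) -> List[Tuple[str, str]]:
    """Choose a failover script for every node without waiting for ping failures.

    primary_hub_of maps node name -> the hub ("left" or "right") that is its primary.
    """
    plan = []
    for node, primary_hub in sorted(primary_hub_of.items()):
        if primary_hub == failed_hub:
            # The failed hub is this node's primary side.
            plan.append((node, "primary_failover"))
        else:
            # The failed hub is this node's secondary side.
            plan.append((node, "secondary_failover"))
    return plan
```

Because every node is attached to both hubs, the plan covers all nodes in one pass, which is why the system settles faster than it would by reacting to individual ping failures.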

Does it work? • You bet! It required: • Some special REXX scripts for failover • A few Basic programs • A hack to the auto discovery scripts. • Some magic with Trix and a few more Basic programs.
