Linux-HA Release 2 – An Overview

Alan Robertson
Project Leader – Linux-HA project
alanr@unix.sh (a.k.a. alanr@us.ibm.com)
IBM Linux Technology Center

Agenda
• High-Availability (HA) Clustering?
• What is the Linux-HA project?
• Linux-HA applications and customers
• Linux-HA Release 1 / Release 2 / Feature Comparison
• Release 2 Details
• Request for Feedback
• DRBD – an important component
• Thoughts about cluster security

What Can HA Clustering Do For You?
• It cannot achieve 100% availability – nothing can.
• HA clustering is designed to recover from single faults
• It can make your outages very short
  • From about a second to a few minutes
• It is like a magician's (illusionist's) trick:
  • When it goes well, the hand is faster than the eye
  • When it goes not-so-well, it can be reasonably visible
• A good HA clustering system adds a “9” to your base availability
  • 99->99.9, 99.9->99.99, 99.99->99.999, etc.
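To make the “adds a 9” claim concrete, here is a minimal sketch of the arithmetic behind it. It assumes a 365-day year and treats availability as a simple fraction of wall-clock time; the function name is illustrative only.

```python
# Downtime per year implied by an availability percentage.
HOURS_PER_YEAR = 365 * 24  # assumption: leap years ignored


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the expected unavailable hours per year for a given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct:>7}% -> {hours:8.3f} hours/year ({hours * 60:8.1f} minutes)")
```

Each added “9” cuts the yearly downtime by a factor of ten: roughly 87.6 hours at 99%, and roughly 8.8 hours at 99.9%.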

The Desire for HA Systems
• Who wants low-availability systems?
• Why are so few systems High-Availability?

Why isn't everything HA?
• Cost
• Complexity

Complexity
Complexity is the Enemy of Reliability

Commodity HA?
• Installations with more than 200 Linux-HA pairs:
  • Autostrada – Italy
  • Italian Bingo Authority
  • Oxfordshire School System
  • Many retailers (through IRES and others):
    • Karstadt's
    • Circuit City
    • etc.
• Also a component in commercial routers, firewalls, security hardware

The HA Continuum
• Single node HA system (monitoring w/o redundancy)
  • Provides for application monitoring and restart
  • Easy, near-zero-cost entry point – HA system starts init scripts instead of /etc/init.d/rc (or equivalent)
  • Addresses Solaris / Linux functional gap
• Multiple Virtual Machines – Single Physical machine
  • Adds OS crash protection, rolling upgrades of OS and application – good for security fixes, etc.
  • Many possibilities for interactions with virtual machines exist
• Multiple Physical Machines (“normal” cluster)
  • Adds protection against hardware failures
• Split-Site (“stretch”) Clusters
  • Adds protection against site-wide failures (power, air-conditioning, flood, fire)

How Does HA Work?
Manage redundancy to improve service availability
• Like a cluster-wide-super-init with monitoring
• Even complex services are now “respawn”
  • on node (computer) death
  • on “impairment” of nodes
  • on loss of connectivity
  • for services that aren't working (not necessarily stopped)
  • managing potentially complex dependency relationships

Redundant Data Access
• Replicated
  • Copies of data are kept updated on more than one computer in the cluster
• Shared
  • Typically Fibre Channel Disk (SAN)
  • Sometimes shared SCSI
• Back-end Storage (“Somebody Else's Problem”)
  • NFS, SMB
  • Back-end database
• All are supported by Linux-HA

The Linux-HA Project
• Linux-HA is the oldest high-availability project for Linux, with the largest associated community
• Linux-HA is the OSS portion of IBM's HA strategy for Linux
• Linux-HA is the best-tested Open Source HA product
• The Linux-HA package is called “Heartbeat” (though it does much more than heartbeat)
• Linux-HA has been in production since 1999, and is currently in use on more than ten thousand sites
• Linux-HA also runs on FreeBSD and Solaris, and is being ported to OpenBSD and others
• Linux-HA ships with every major Linux distribution except one
• Release 2 shipped at the end of July – more than 6000 downloads since then

Linux-HA Release 1 Applications
• Database Servers (DB2, Oracle, MySQL, others)
• Load Balancers
• Web Servers
• Custom Applications
• Firewalls
• Retail Point of Sale Solutions
• Authentication
• File Servers
• Proxy Servers
• Medical Imaging
Almost any type of server application you can think of – except SAP

Linux-HA Customers
• FedEx – Truck Location Tracking
• BBC – Internet infrastructure
• Oxfordshire Schools – Universal servers – an HA pair in every school
• The Weather Channel (weather.com)
• Sony (manufacturing)
• ISO New England manages the power grid using 25 Linux-HA clusters
• MAN Nutzfahrzeuge AG – truck manufacturing division of MAN AG
• Karstadt and Circuit City each use Linux-HA and databases in several hundred stores
• Citysavings Bank in Munich (infrastructure)
• Bavarian Radio Station (Munich) – coverage of the 2002 Olympics in Salt Lake City
• Emageon – medical imaging services
• Incredimail bases its mail service on Linux-HA on IBM hardware
• University of Toledo (US) – 20k student Computer Aided Instruction system

Linux-HA Release 1 Capabilities
• Supports 2-node clusters
• Can use serial, UDP bcast, mcast, ucast communication
• Fails over on node failure
• Fails over on loss of IP connectivity
• Capability for failing over on loss of SAN connectivity
• Limited command line administrative tools to fail over, query current status, etc.
• Active/Active or Active/Passive
• Simple resource group dependency model
• Requires external tool for resource (service) monitoring
• SNMP monitoring

Linux-HA Release 2 Capabilities
• Built-in resource monitoring
• Support for the OCF resource standard
• Much larger clusters supported (>= 8 nodes)
• Sophisticated dependency model
• Rich constraint support (resources, groups, incarnations, master/slave)
• XML-based resource configuration
• Coming in 2.0.x (later in 2005)
  • Configuration and monitoring GUI
  • Support for GFS cluster filesystem
  • Multi-state (master/slave) resource support
  • Monitoring of arbitrary external entities (temp, SAN, network)

Linux-HA Release 1 Architecture

Linux-HA Release 2 Architecture (add TE and PE)

Linux-HA Release 2 Architecture (more detail)

Resource Objects in Release 2
• Release 2 supports “resource objects” which can be any of the following:
  • Primitive Resources
  • Resource Groups
  • Resource Clones – “n” resource objects
  • Multi-state (master/slave) resources

Classes of Resource Agents in R2 (resource primitives)
• OCF – Open Cluster Framework – http://opencf.org/
  • Take parameters as name/value pairs through the environment
  • Can be monitored well by R2
• Heartbeat – R1-style heartbeat resources
  • Take parameters as command line arguments
  • Can be monitored by status action
• LSB – Standard LSB init scripts
  • Take no parameters
  • Can be monitored by status action
• Stonith – Node Reset Capability
  • Very similar to OCF resources

An OCF primitive object
<primitive id="WebIP" class="ocf" type="IPaddr" provider="heartbeat">
  <instance_attributes>
    <attributes>
      <nvpair name="ip" value="192.168.224.5"/>
    </attributes>
  </instance_attributes>
</primitive>
Attribute nvpairs are translated into environment variables
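The translation from nvpairs to environment variables can be sketched as follows. This is an illustration, not Heartbeat's own code: it assumes the OCF resource agent convention of prefixing each parameter name with OCF_RESKEY_, and the helper name ocf_environment is hypothetical.

```python
import xml.etree.ElementTree as ET

# The primitive from the slide, written with plain ASCII quotes.
PRIMITIVE_XML = """
<primitive id="WebIP" class="ocf" type="IPaddr" provider="heartbeat">
  <instance_attributes>
    <attributes>
      <nvpair name="ip" value="192.168.224.5"/>
    </attributes>
  </instance_attributes>
</primitive>
"""


def ocf_environment(primitive_xml: str) -> dict[str, str]:
    """Map every nvpair of a primitive to an OCF-style environment variable."""
    root = ET.fromstring(primitive_xml)
    # Assumption: OCF agents read parameter "x" from the variable OCF_RESKEY_x.
    return {
        f"OCF_RESKEY_{nv.get('name')}": nv.get("value")
        for nv in root.iter("nvpair")
    }


env = ocf_environment(PRIMITIVE_XML)
print(env)
```

A resource agent script would then read its IP address from the OCF_RESKEY_ip variable rather than from a command line argument.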

An LSB primitive resource object (i.e., an init script)
<primitive id="samba-smb-rsc" class="lsb" type="smb">
  <instance_attributes>
    <attributes/>
  </instance_attributes>
</primitive>

A STONITH primitive resource
<primitive id="st" class="stonith" type="ibmhmc" provider="heartbeat">
  <instance_attributes>
    <attributes>
      <nvpair name="ip" value="192.168.224.99"/>
    </attributes>
  </instance_attributes>
</primitive>

Resource Groups
Resource Groups provide a shorthand for creating ordering and co-location dependencies
• Each resource object in the group is declared to have linear start-after ordering relationships
• Each resource object in the group is declared to have co-location dependencies on each other
• This is an easy way of converting release 1 resource groups to release 2
<group id="webserver">
  <primitive/>
  <primitive/>
</group>
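A minimal sketch of what this shorthand expands to, assuming a group is simply an ordered list of member ids. The function expand_group and the member names WebFS and apache are hypothetical; only WebIP appears in the slides.

```python
def expand_group(members: list[str]) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Return the (start-after, co-location) pairs implied by a group."""
    # Linear ordering: each member starts after the member listed before it.
    start_after = [(later, earlier) for earlier, later in zip(members, members[1:])]
    # Co-location: every member must run on the same node as every other member.
    colocated = [(a, b) for i, a in enumerate(members) for b in members[i + 1:]]
    return start_after, colocated


start_after, colocated = expand_group(["WebIP", "WebFS", "apache"])
for later, earlier in start_after:
    print(f"{later} starts after {earlier}")
for a, b in colocated:
    print(f"{a} must be co-located with {b}")
```

For a group of n members this yields n-1 ordering pairs and n(n-1)/2 co-location pairs, which is the bookkeeping the group syntax saves.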

Resource Clones
• Resource Clones allow one to have a resource object which runs multiple (“n”) times on the cluster
• This is useful for managing
  • Load balancing clusters where you want “n” of them to be slave servers
  • Cluster filesystem mount points
  • Cluster Alias IP addresses
• A cloned resource object can be a primitive or a group

Multi-State (master/slave) Resources (coming in 2.0.3)
• Normal resources can be in one of two stable states:
  • running
  • stopped
• Multi-state resources can have more than two stable states. For example:
  • running-as-master
  • running-as-slave
  • stopped
• This is ideal for modeling replication resources like DRBD
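The three stable states can be modeled as a small enumeration. The sketch below is a toy illustration, not the Release 2 implementation: the promotion policy (promote the first slave in name order when no master remains) and the name promote_on_failure are assumptions made for the example.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    SLAVE = "running-as-slave"
    MASTER = "running-as-master"


def promote_on_failure(states: dict[str, State], failed_node: str) -> dict[str, State]:
    """Stop the resource on the failed node and make sure a master still exists."""
    new_states = dict(states)
    new_states[failed_node] = State.STOPPED
    if State.MASTER not in new_states.values():
        # Assumption: any slave may be promoted; pick the first by node name.
        for node in sorted(new_states):
            if new_states[node] is State.SLAVE:
                new_states[node] = State.MASTER
                break
    return new_states


before = {"node01": State.MASTER, "node02": State.SLAVE}
after = promote_on_failure(before, "node01")
print({node: state.value for node, state in after.items()})
```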

Basic Dependencies in Release 2
• Ordering Dependencies
  • start before (normally implies stop after)
  • start after (normally implies stop before)
• Mandatory Co-location Dependencies
  • must be co-located with
  • cannot be co-located with
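The relationship between start order and stop order can be illustrated with a topological sort. This sketch uses Python's standard graphlib module; the resource names other than WebIP are hypothetical, and it only models ordering, not co-location.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it must start after.
starts_after = {
    "WebFS": {"drbd0"},
    "apache": {"WebIP", "WebFS"},
}

# static_order() yields every resource after all of its prerequisites.
start_order = list(TopologicalSorter(starts_after).static_order())
# "Start after" normally implies "stop before": stopping reverses the start order.
stop_order = list(reversed(start_order))

print("start:", start_order)
print("stop: ", stop_order)
```

Resources with no ordering between them (here WebIP and drbd0) may appear in either relative order.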

Resource Location Constraints
• Mandatory Constraints:
  • Resource Objects can be constrained to run on any selected subset of nodes. Default depends on setting of symmetric_cluster.
• Preferential Constraints:
  • Resource Objects can also be preferentially constrained to run on specified nodes by providing weightings for arbitrary logical conditions
  • The resource object is run on the node which has the highest weight (score)

Advanced Constraints
• Nodes can have arbitrary attributes associated with them in name=value form
• Attributes have types: int, string, version
• Constraint expressions can use these attributes as well as node names, etc., in largely arbitrary ways
• Operators:
  • =, !=, <, >, <=, >=
  • defined(attrname), undefined(attrname)
  • colocated(resource id), notcolocated(resource id)
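Why attribute types matter can be shown with a small evaluator. This is a sketch under assumptions: the function names, the attribute names kernel and cpus, and the rule that a comparison on a missing attribute is false are all illustrative, and the colocated/notcolocated operators are left out.

```python
import operator

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       ">": operator.gt, "<=": operator.le, ">=": operator.ge}


def coerce(value: str, type_: str):
    """Convert an attribute string according to its declared type."""
    if type_ == "int":
        return int(value)
    if type_ == "version":
        # Compare dotted versions numerically, so "2.6.10" > "2.6.9".
        return tuple(int(part) for part in value.split("."))
    return value  # "string": plain lexical comparison


def evaluate(node_attrs: dict[str, str], attr: str, op: str,
             value: str = "", type_: str = "string") -> bool:
    if op == "defined":
        return attr in node_attrs
    if op == "undefined":
        return attr not in node_attrs
    if attr not in node_attrs:
        return False  # assumption: comparing a missing attribute is false
    return OPS[op](coerce(node_attrs[attr], type_), coerce(value, type_))


node = {"#uname": "node01", "kernel": "2.6.10", "cpus": "4"}
print(evaluate(node, "kernel", ">", "2.6.9", "version"))  # version-aware
print(evaluate(node, "kernel", ">", "2.6.9", "string"))   # lexical
```

The two calls give different answers for the same values, which is the reason a version type exists alongside string.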

Advanced Constraints (cont'd)
• Each constraint is associated with a particular resource, and is evaluated in the context of a particular node.
• A given constraint has a boolean predicate, built from the expressions described on the previous slide, and is associated with a weight and a condition. Weights can be constants – or attribute values.
• If the predicate is true, then the condition is used to compute the weight associated with locating the given resource on the given node.
• Conditions are given weights, positive or negative. Additionally, there are special values for modeling must-have conditions
  • +INFINITY
  • -INFINITY
• The total score is the sum of all the applicable constraint weights
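The scoring described above can be sketched as a sum over the constraints whose predicate holds on a node. Everything here is illustrative: the function names and the maintenance attribute are hypothetical, and letting -INFINITY win over +INFINITY is an assumption of this sketch, since the slides do not say how the two combine.

```python
INFINITY = float("inf")


def total_score(weights: list[float]) -> float:
    """Sum the applicable weights, honouring the special infinite values."""
    if -INFINITY in weights:
        return -INFINITY  # assumption: a "must not" always wins
    if INFINITY in weights:
        return INFINITY
    return sum(weights)


def score_node(constraints, node_attrs: dict[str, str]) -> float:
    """Evaluate each (predicate, weight) constraint in the context of one node."""
    weights = []
    for predicate, weight in constraints:
        if predicate(node_attrs):
            # A weight is either a constant or the name of a node attribute.
            weights.append(float(node_attrs[weight]) if isinstance(weight, str) else weight)
    return total_score(weights)


constraints = [
    (lambda n: n["#uname"] == "node01", 100),      # prefer node01
    (lambda n: "maintenance" in n, -INFINITY),     # never run on nodes in maintenance
]
nodes = [{"#uname": "node01", "maintenance": "1"}, {"#uname": "node02"}]
scores = {n["#uname"]: score_node(constraints, n) for n in nodes}
print(scores)
```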

Sample Dynamic Attribute Use
• Attributes are arbitrary – only given meaning by rules
• You can assign them values from external programs
• For example:
  • Create a rule which uses the attribute fc_status as its weight for some resource needing a Fibre Channel connection
  • Write a script to set the value of fc_status for a node to 0 if the FC connection is working, and -10000 if it is not
  • Now those resources automatically move to a place where the FC connection is working, if there is such a place. If not, they stay where they are.
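The fc_status example can be sketched as a placement decision. The function choose_node is hypothetical, and keeping the resource on its current node when scores tie is an assumption used here to model the "stay where they are" behaviour.

```python
def choose_node(fc_status: dict[str, int], current: str) -> str:
    """Pick the node with the highest fc_status weight, staying put on ties."""
    best = max(fc_status.values())
    if fc_status[current] == best:
        return current  # no better place exists, so do not move
    return next(node for node, score in fc_status.items() if score == best)


# FC link fails on node01 only: the resource moves to node02.
print(choose_node({"node01": -10000, "node02": 0}, current="node01"))
# FC link fails everywhere: the resource stays where it is.
print(choose_node({"node01": -10000, "node02": -10000}, current="node01"))
```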

rsc_location information
• We prefer the webserver group to run on host node01
<rsc_location id="run_Webserver" group="webserver">
  <rule id="rule_webserver" score="100">
    <expression attribute="#uname" operation="eq" value="node01"/>
  </rule>
</rsc_location>
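How such a rule contributes to a node's score can be sketched by parsing the XML and testing it against a node's attributes. This is not the cluster's actual evaluator: the function rule_score is hypothetical, it handles only the eq operation, and it assumes that all expressions in a rule must match.

```python
import xml.etree.ElementTree as ET

# The constraint from the slide, written with plain ASCII quotes.
RSC_LOCATION_XML = """
<rsc_location id="run_Webserver" group="webserver">
  <rule id="rule_webserver" score="100">
    <expression attribute="#uname" operation="eq" value="node01"/>
  </rule>
</rsc_location>
"""


def rule_score(xml_text: str, node_attrs: dict[str, str]) -> int:
    """Return the rule's score if every expression matches the node, else 0."""
    rule = ET.fromstring(xml_text).find("rule")
    for expr in rule.findall("expression"):
        if expr.get("operation") != "eq":
            raise ValueError("only 'eq' is handled in this sketch")
        if node_attrs.get(expr.get("attribute")) != expr.get("value"):
            return 0
    return int(rule.get("score"))


for uname in ("node01", "node02"):
    print(uname, rule_score(RSC_LOCATION_XML, {"#uname": uname}))
```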

Request for Feedback
• Linux-HA Release 2 is a good solid HA product
• At this point human and experience factors will likely be more helpful than most technical doo-dads and refinements
• This audience knows more about that than probably any other similar audience in the world
• So, check out Linux-HA Release 2 and tell us...
  • What we got right
  • What needs improvement
  • What we got wrong
• We are very responsive to comments
• We look forward to your critiques, brickbats, and other comments

DRBD – RAID1 over the LAN
• DRBD is a block-level replication technology
• Every time a block is written on the master side, it is copied over the LAN and written on the slave side
• Typically, a dedicated replication link is used
• It is extremely cost-effective – common with xSeries
• Worst-case around 10% throughput loss
• Recent versions have very fast “full” resync

Security Considerations
• Cluster: A computer whose backplane is the Internet
  • If this isn't scary, you don't understand...
• You may think you have a secure cluster network
  • You're probably mistaken now
  • You will be in the future

Secure Networks are Difficult Because...
• Security is not often well-understood by admins
• Security is well-understood by “black hats”
• Network security is easy to breach accidentally
  • Users bypass it
  • Hardware installers don't fully understand it
• Most security breaches come from “trusted” staff
• Staff turnover is often a big issue
• Virus/Worm/P2P technologies will create new holes, especially for Windows machines

Security Advice
• Good HA software should be designed to assume insecure networks
• Not all HA software assumes insecure networks
• Good HA installation architects use dedicated (secure?) networks for intra-cluster HA communication
• Crossover cables are reasonably secure – all else is suspect ;-)

References
• http://linux-ha.org/
• http://linux-ha.org/Talks (these slides)
• http://linux-ha.org/download/
• http://linux-ha.org/SuccessStories
• http://linux-ha.org/Certifications
• http://linux-ha.org/BasicArchitecture
• http://linux-ha.org/NewHeartbeatDesign
• www.linux-mag.com/2003-11/availability_01.html

Legal Statements
• IBM is a trademark of International Business Machines Corporation.
• Linux is a registered trademark of Linus Torvalds.
• Other company, product, and service names may be trademarks or service marks of others.
• This work represents the views of the author and does not necessarily reflect the views of the IBM Corporation.
