Reliable Server Pooling – A Novel IETF Architecture for Availability-Sensitive Services
Table of Contents • What is Reliable Server Pooling? • Prototype Demonstration • Terminology and Protocols • Motivation and Application Scenarios • Failure Detection • Dynamic Pools • “Unclean” Shutdowns • Session Monitoring • Failover Mechanism • Applying Client-Based State Sharing • Conclusion and Outlook Thomas Dreibholz's Reliable Server Pooling Page http://tdrwww.iem.uni-due.de/dreibholz/rserpool/
What is “Reliable Server Pooling”? • Prototype Demonstration
Reliable Server Pooling (RSerPool) • Terminology: • Pool Element (PE): Server • Pool: Set of PEs • PE ID: ID of a PE in a pool • Pool Handle: Unique pool ID • Handlespace: Set of pools • Pool Registrar (PR) • Pool User (PU): Client • Support for Existing Applications • Proxy Pool User (PPU) • Proxy Pool Element (PPE) • Protocols: • ASAP (Aggregate Server Access Protocol) • ENRP (Endpoint Handlespace Redundancy Protocol)
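To make the terminology concrete, here is a minimal sketch of how pools, PEs and the handlespace could be modelled as data structures. All class and field names are illustrative assumptions; the actual handlespace management and encoding are defined by the ASAP and ENRP specifications.

```python
# Minimal illustration of the RSerPool terminology as data structures.
# All names are illustrative assumptions, not the specified wire format.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PoolElement:
    pe_id: int            # PE ID: identifies the PE within its pool
    address: str          # transport address of the server
    load: float = 0.0     # load value, used by adaptive policies (e.g. Least Used)


@dataclass
class Pool:
    pool_handle: bytes            # unique pool ID
    policy: str = "RoundRobin"    # pool policy (pool-specific)
    elements: List[PoolElement] = field(default_factory=list)


# Handlespace: the set of pools, managed by the Pool Registrars (PRs)
Handlespace = Dict[bytes, Pool]

handlespace: Handlespace = {
    b"EchoPool": Pool(b"EchoPool", "LeastUsed",
                      [PoolElement(1, "10.0.0.1"), PoolElement(2, "10.0.0.2")])
}
```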
Session Failover using Client-Based State Sharing • Necessary to handle failover: a new PE must be able to recover the session state of the old PE • Simple solution for many applications: usage of “state cookies” [LCN2002] • Now part of the ASAP protocol! (see the sketch below)
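A rough sketch of the state-cookie idea: the current PE periodically hands its serialized session state to the PU as an opaque cookie; after a failure, the PU passes the latest cookie to the freshly selected PE, which resumes from that state. The class names and the pickle-based serialization are assumptions for illustration only, not the ASAP cookie format.

```python
import pickle

# Sketch of client-based state sharing with "state cookies".
# pickle is used only for illustration; ASAP treats the cookie as an
# opaque byte string whose content is chosen by the application.

class PoolElementSession:
    """Server (PE) side of one session."""
    def __init__(self, state=None):
        self.state = state or {"progress": 0}

    def process(self, work_units: int) -> bytes:
        self.state["progress"] += work_units
        # After some amount of work, send the current state as a cookie.
        return pickle.dumps(self.state)          # -> state cookie for the PU


class PoolUserSession:
    """Client (PU) side: keeps only the latest cookie."""
    def __init__(self):
        self.latest_cookie = None

    def on_cookie(self, cookie: bytes):
        self.latest_cookie = cookie

    def failover(self) -> PoolElementSession:
        # The new PE resumes from the state contained in the last cookie.
        state = pickle.loads(self.latest_cookie) if self.latest_cookie else None
        return PoolElementSession(state)


pu = PoolUserSession()
old_pe = PoolElementSession()
pu.on_cookie(old_pe.process(10))      # old PE reports progress = 10
new_pe = pu.failover()                # old PE fails; new PE resumes at 10
print(new_pe.state)                   # {'progress': 10}
```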
Server Selection Rules(Pool Policies) • What is a Pool Policy? • A rule for the selection of the PEs • Defined in our IETF Working Group draft (draft-ietf-rserpool-policies-07.txt) • Application of Policies • Registrar: Creates PE list upon request by PU • Pool User: Selection of a PE from the list • Both according to the pool policies (pool-specific!) • Non-Adaptive Policies • Stateless: Random (RAND) • Stateful: Round Robin (RR) (Default policy, must be supported) • Adaptive Policy • Least Used (LU) • Load definition is application-specific! • Round robin among multiple least-loaded PEs
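A compact sketch of the three basic policies named above, assuming PEs are simple (ID, load) pairs; a real registrar would keep per-pool round-robin state in the handlespace rather than a module-level counter.

```python
import random

# Sketch of the three basic pool policies. PEs are (pe_id, load) tuples;
# the load definition is application-specific and only matters for Least Used.

pes = [(1, 0.30), (2, 0.10), (3, 0.10)]
_rr_index = 0   # round-robin pointer (kept per pool in a real handlespace)

def select_random(pool):
    """RAND: stateless, uniform random choice."""
    return random.choice(pool)

def select_round_robin(pool):
    """RR: stateful, cycles through the PE list (default policy)."""
    global _rr_index
    pe = pool[_rr_index % len(pool)]
    _rr_index += 1
    return pe

def select_least_used(pool):
    """LU: adaptive; round robin among the currently least-loaded PEs."""
    min_load = min(load for _, load in pool)
    least_loaded = [pe for pe in pool if pe[1] == min_load]
    return select_round_robin(least_loaded)

print(select_least_used(pes))   # -> PE 2 or PE 3 (both have the lowest load)
```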
The Application Model • Server • PE Capacity • Shared among sessions (multi-tasking principle) • Client • Generates requests • Request Size (effort) • Request Interval (frequency) • Waiting queue for requests • Sequential processing • System Utilization • PU:PE Ratio • Provisioning for a certain Target Utilization, e.g. 80% (see the sketch below)
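Under this model, the load a single PU offers is roughly its request size divided by its request interval, so a back-of-envelope provisioning check looks as follows. The formula is a simplification assumed for illustration, not quoted from the slides.

```python
# Back-of-envelope provisioning check for the application model above
# (offered load per PE divided by PE capacity); an illustrative assumption.

def system_utilization(pu_to_pe_ratio: float,
                       request_size: float,      # effort per request [capacity units]
                       request_interval: float,  # time between requests [s]
                       pe_capacity: float) -> float:
    """Average fraction of PE capacity consumed by the offered load."""
    offered_load_per_pu = request_size / request_interval
    return pu_to_pe_ratio * offered_load_per_pu / pe_capacity

# Example: 10 PUs per PE, requests of 10^6 units every 12.5 s,
# PE capacity of 10^6 units/s  ->  utilization = 0.8 (the 80% target).
print(system_utilization(10, 1e6, 12.5, 1e6))
```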
Performance Metrics • Provider's Perspective: “Does my server capacity gain revenue?” • Average Utilization of server resources [%] • User's Perspective: “How much time is needed to process my requests?” • Avg. Handling Speed [% of average server capacity] • Depends on: • Queuing • Startup • Server processing • Failover
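Reading the handling-speed metric as request size divided by the total dwell time (queuing, startup, server processing, failover), expressed as a percentage of the average PE capacity, a small illustrative helper could look like this; the precise definition is given in the underlying paper.

```python
# Illustration of the user-side metric: handling speed as a percentage of
# the average server capacity. The dwell-time decomposition follows the
# list above; the exact metric definition is an assumption here.

def handling_speed_percent(request_size: float,
                           dwell_time: float,        # queuing + startup + processing + failover [s]
                           avg_pe_capacity: float) -> float:
    effective_speed = request_size / dwell_time       # [capacity units per second]
    return 100.0 * effective_speed / avg_pe_capacity

# A request of 10^6 units that spends 2 s in the system, on PEs with an
# average capacity of 10^6 units/s, was handled at 50% of that capacity.
print(handling_speed_percent(1e6, 2.0, 1e6))   # -> 50.0
```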
Dynamic Pools – A Proof of Concept (Figure: Handling Speed) • Ideal case: a “clean” shutdown • PEs abort their sessions before shutting down • Not critical ... • ... except for extremely low MTBF • Round Robin: no stable rounds -> random behaviour
“Unclean” Shutdowns (Figures: Utilization, Handling Speed) • Re-processing effort increases (due to lost work) • Session monitoring is crucial: fast failure detection -> quick failover
Session Monitoring (Figure: Handling Speed) • Session monitoring is crucial • Various possible mechanisms (see the sketch below): • Keep-Alives as part of the application protocol, e.g. transaction timeouts • Endpoint Keep-Alive Monitoring • Here: small impact • When is it useful? For short and frequent requests • Minimizes startup time • (see paper for details)
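A minimal sketch of timeout-based session monitoring in the spirit of the mechanisms above: the monitoring side records when it last heard a keep-alive and triggers failure handling once a timeout expires. The interval, timeout and callback names are assumptions, not values from RSerPool.

```python
import time

# Minimal sketch of keep-alive based session monitoring. The interval,
# timeout and callback names are illustrative assumptions.

KEEPALIVE_INTERVAL = 1.0    # seconds between keep-alives (sender side)
KEEPALIVE_TIMEOUT  = 3.0    # declare the peer dead after this much silence

class SessionMonitor:
    def __init__(self, on_failure):
        self.last_heard = time.monotonic()
        self.on_failure = on_failure    # e.g. report the failure to the PR, then fail over

    def keepalive_received(self):
        self.last_heard = time.monotonic()

    def check(self):
        # Called periodically: fast failure detection -> quick failover.
        if time.monotonic() - self.last_heard > KEEPALIVE_TIMEOUT:
            self.on_failure()

monitor = SessionMonitor(on_failure=lambda: print("peer failed, starting failover"))
monitor.check()    # nothing happens yet; in practice this runs on a timer
```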
Using Client-Based State Sharing (Figures: Utilization, Handling Speed) • More cookies -> less re-processing, better handling speed • But what about the overhead?
Configuring a Useful Cookie Interval (Figure: Cookies per Request) • Cookie size: a few bytes up to the ~64 KB limit • Idea: for a known MTBF (in request times), set the cookie interval to achieve a certain goodput (e.g. 98%) • Choice of goodput depends on the application's requirements => accepting a certain amount of re-processing work • Results: for realistic MTBFs, high goodput is already reached at a moderate cookie rate; the overhead rises significantly for a too-high goodput target -> inefficient! (see the sketch below)
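One way to turn this idea into numbers, under the strong simplifying assumption that on average half a cookie interval of work is lost per failure: goodput is then roughly 1 - cookieInterval / (2 * MTBF), with both quantities measured in request units, which can be solved for the cookie interval. This first-order model is an assumption for illustration, not the analysis behind the plotted results.

```python
# Back-of-envelope model for choosing the cookie interval (an assumption
# for illustration, not the analysis from the paper): if about half a
# cookie interval of work is lost per failure, then
#     goodput ~= 1 - cookie_interval / (2 * MTBF)
# with both the cookie interval and the MTBF measured in request units.

def cookie_interval_for_goodput(mtbf_in_requests: float,
                                target_goodput: float) -> float:
    """Cookie interval (in request units) that meets the goodput target."""
    return 2.0 * mtbf_in_requests * (1.0 - target_goodput)

def cookies_per_request(cookie_interval: float) -> float:
    """Cookie sending rate implied by that interval (overhead indicator)."""
    return 1.0 / cookie_interval

# Example: MTBF = 50 request times, target goodput 98%
interval = cookie_interval_for_goodput(50, 0.98)   # ~2 request units between cookies
print(interval, cookies_per_request(interval))     # ~2.0 and ~0.5 cookies per request
```

Pushing the goodput target much closer to 100% makes the implied cookie rate, and therefore the overhead, grow quickly, which matches the "inefficient" region noted above.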
Conclusion and Outlook • Conclusion • RSerPool is the IETF's upcoming standard for service availability • 3 basic server selection policies • Failure detection mechanisms: • Session monitoring • Endpoint keep-alives • Failover mechanism: • Client-based state sharing • Future Work • From simulation to reality: • Tests with our prototype implementation in the PlanetLab • First results already available [KiVS2007] • Security analysis and robustness against DoS attacks
Thank You for Your Attention! Any Questions? Visit Our Project Homepage: http://tdrwww.iem.uni-due.de/dreibholz/rserpool/ Thomas Dreibholz, dreibh@iem.uni-due.de To be continued ...
The RSerPool Protocol Stack • Aggregate Server Access Protocol (ASAP) • PR ↔ PE: Registration, Deregistration and Monitoring by the Home-PR (PR-H) • PR ↔ PU: Server Selection, Failure Reports • Endpoint Handlespace Redundancy Protocol (ENRP) • PR ↔ PR: Handlespace Synchronisation • ASAP is the IETF's first Session Layer standard! (summarized below)
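As a quick reference, the protocol roles above can be summarized as follows; the flow descriptions paraphrase the bullet points and are not the exact PDU names from the ASAP/ENRP specifications.

```python
# Illustrative summary of which protocol runs between which components.
# Flow descriptions paraphrase the slide, not the specified message names.

ASAP_FLOWS = {
    ("PR", "PE"): ["Registration", "Deregistration",
                   "Monitoring by the Home-PR (PR-H)"],
    ("PR", "PU"): ["Server selection (handle resolution)", "Failure reports"],
}
ENRP_FLOWS = {
    ("PR", "PR"): ["Handlespace synchronisation"],
}
```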
Motivation • Motivation of RSerPool: • A unified, application-independent solution for service availability • No such solution existed before => foundation of the IETF RSerPool Working Group • Application Scenarios for RSerPool: • Main motivation: Telephone Signalling (SS7) over IP • Under discussion in the IETF: • Load Balancing • Voice over IP (VoIP) with SIP • IP Flow Information Export (IPFIX) • ... and many more! • Requirements for RSerPool: • “Lightweight” (low resource requirements, e.g. for embedded devices!) • Real-Time (quick failover) • Scalability (e.g. to large (corporate) networks) • Extensibility (e.g. by new server selection rules) • Simple (automatic configuration: “just turn on, and it works!”)