
Availability in Wide-Area Service Composition

Presentation Transcript


  1. Availability in Wide-Area Service Composition
  Bhaskaran Raman and Randy H. Katz, SAHARA, EECS, U.C. Berkeley

  2. Problem Statement
  • Poor availability of wide-area (inter-domain) Internet paths: 10% of paths have only 95% availability [Labovitz, FTCS’99]
  • BGP recovery can take several tens of seconds [Labovitz, SIGCOMM’00]

  3. Architecture
  (figure: a source reaches a destination across the Internet along a service-level path of composed services; peering cluster managers exchange perf. info.)
  • Hardware platform: service clusters, i.e., compute clusters capable of running services
  • Logical platform: an overlay network of cluster managers with peering relations
  • Application plane: composed services along service-level paths
  • Functionalities at the Cluster-Manager: Service-Level Path Creation, Maintenance, and Recovery; Link-State Propagation; Finding Overlay Entry/Exit; Location of Service Replicas; At-least-once UDP; Perf. Meas.; Liveness Detection
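To make the cluster-manager functionalities concrete, here is a minimal sketch of the per-node state such a manager might keep. The class, field, and method names are assumptions for illustration; the slide lists the functionalities but specifies no API.

```python
from dataclasses import dataclass, field

@dataclass
class ClusterManager:
    """Hypothetical per-node state for the functionalities listed above."""
    name: str
    # Peering relations: overlay neighbors this manager exchanges perf. info. with
    peers: set = field(default_factory=set)
    # Link-state propagation: measured metric (e.g. latency/loss) per overlay edge
    link_state: dict = field(default_factory=dict)
    # Location of service replicas: service name -> set of clusters hosting it
    replicas: dict = field(default_factory=dict)

    def on_link_state_flood(self, edge, metric):
        """Apply a flooded link-state announcement for one overlay edge."""
        self.link_state[edge] = metric

    def create_service_level_path(self, services, src, dst):
        """Service-level path creation: pick one replica per service, in order.
        A real manager would run the modified Dijkstra's of slide 5 rather
        than this naive first-replica choice."""
        return [src] + [sorted(self.replicas[s])[0] for s in services] + [dst]

# Usage with fabricated data:
cm = ClusterManager("berkeley", peers={"stanford"})
cm.replicas["text-to-speech"] = {"cmu"}
print(cm.create_service_level_path(["text-to-speech"], "src", "dst"))
# -> ['src', 'cmu', 'dst']
```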

  4. “Failure” detection in the Wide-Area
  (figure: heart-beat timeline showing the timeout period for failure detection)
  • Two important characteristics: the distribution of outage periods, and their rate of occurrence
  • Wide-area traces: 12 pairs of hosts among Berkeley, Stanford, UIUC, CMU, TU-Berlin, and UNSW
  • 300 ms heart-beat, approx. 2 sec timeout
  • Low rate of occurrence (about once an hour): good for many real-time applications
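The detection mechanism translates almost directly into code. Below is a minimal sketch assuming UDP heart-beats every 300 ms and a 2 sec receive timeout, matching the trace parameters above; the loopback address and the simulated failure (the sender simply stops) are illustrative.

```python
import socket
import threading
import time

HEARTBEAT_INTERVAL = 0.3   # 300 ms heart-beat, as in the traces above
FAILURE_TIMEOUT = 2.0      # approx. 2 sec timeout before declaring failure

def send_heartbeats(peer, count):
    """Send `count` UDP heart-beats, then fall silent (a simulated failure)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for _ in range(count):
        sock.sendto(b"hb", peer)
        time.sleep(HEARTBEAT_INTERVAL)
    sock.close()

def monitor(listen_addr):
    """Declare the peer failed once no heart-beat arrives within the timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(listen_addr)
    sock.settimeout(FAILURE_TIMEOUT)
    while True:
        try:
            sock.recv(16)          # each heart-beat resets the timeout
        except socket.timeout:
            print("peer declared failed: trigger path recovery here")
            return

if __name__ == "__main__":
    addr = ("127.0.0.1", 9990)
    threading.Thread(target=send_heartbeats, args=(addr, 5), daemon=True).start()
    monitor(addr)  # fires ~2 sec after the fifth (last) heart-beat
```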

  5. Key Design Points
  • Overlay size: how many nodes?
  • A comparison: Akamai runs O(10,000) cache servers for Internet-wide operation, though probably across far fewer data-center locations
  • Link-state floods: twice for each failure
  • For a 1,000-node graph, estimate #edges = 10,000
  • Failures (>1.8 sec outage): O(once an hour) per edge in the worst case
  • That is only about 6 floods/second in the entire network (10,000 edges × 2 floods per failure / 3,600 sec ≈ 5.6)
  • Graph computation: a modified version of Dijkstra’s for service composition (see the sketch below)
  • O(k*E*log(N)) computation time, where k = #services composed
  • For a 6,510-node network, this takes 50 ms: a large per-path overhead, but path caching helps
  • Memory: a few MB
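The slide does not spell out how Dijkstra's is modified, but one standard construction that yields the quoted O(k*E*log(N)) bound is a layered graph: k+1 copies of the overlay, with a zero-cost edge from layer i to layer i+1 at every node hosting the (i+1)-th service. A sketch under that assumption, over a fabricated three-node overlay:

```python
import heapq

def compose_path(graph, replicas, services, src, dst):
    """Shortest service-level path via Dijkstra's on a layered graph.
    graph: {node: [(neighbor, cost), ...]} is the overlay; replicas maps
    each service to the set of nodes that can run it. States are
    (layer, node), so there are (k+1)*N states and O(k*E) edges,
    matching the O(k*E*log(N)) bound quoted above."""
    k = len(services)
    start, goal = (0, src), (k, dst)
    dist, prev, pq = {start: 0.0}, {}, [(0.0, start)]
    while pq:
        d, (layer, node) = heapq.heappop(pq)
        if (layer, node) == goal:
            break
        if d > dist.get((layer, node), float("inf")):
            continue  # stale queue entry
        nxt = [((layer, nb), d + c) for nb, c in graph[node]]  # overlay hop
        if layer < k and node in replicas[services[layer]]:
            nxt.append(((layer + 1, node), d))  # run the next service here
        for state, nd in nxt:
            if nd < dist.get(state, float("inf")):
                dist[state], prev[state] = nd, (layer, node)
                heapq.heappush(pq, (nd, state))
    path, state = [goal], goal
    while state in prev:
        state = prev[state]
        path.append(state)
    return dist.get(goal), path[::-1]

# Fabricated overlay: compose one service ("tts", hosted at B) from A to C.
g = {"A": [("B", 1.0)], "B": [("A", 1.0), ("C", 1.0)], "C": [("B", 1.0)]}
print(compose_path(g, {"tts": {"B"}}, ["tts"], "A", "C"))
# -> (2.0, [(0, 'A'), (0, 'B'), (1, 'B'), (1, 'C')])
```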

  6. Wide-Area experiments: setup
  • 8 nodes: Berkeley, Stanford, UCSD, CMU, a cable-modem host (Berkeley), a DSL host (San Francisco), UNSW (Australia), and TU-Berlin (Germany)
  • Text-to-speech composed sessions, with destinations split between Berkeley and CMU
  • Half the sessions had the recovery algorithm enabled, the other half had it disabled
  • 4 paths in the system at any time; each session lasted 2 min 30 sec; the experiment ran for 4 days
  • Metric: loss-rate measured in 5 sec intervals (see the sketch below)
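As one concrete reading of the metric, the sketch below buckets packet-arrival times into 5 sec intervals and reports the fraction of expected packets missing from each. The fixed packet rate and the injected 4 sec outage are fabricated demo values, not trace data.

```python
def interval_loss_rates(recv_times, duration, pkt_rate, interval=5.0):
    """Fraction of expected packets missing in each `interval`-second bucket,
    assuming the sender emits pkt_rate packets/sec for `duration` seconds."""
    n = int(duration / interval)
    expected = pkt_rate * interval
    got = [0] * n
    for t in recv_times:
        i = int(t / interval)
        if 0 <= i < n:
            got[i] += 1
    return [1.0 - min(g / expected, 1.0) for g in got]

# Demo: a 2 min 30 sec session at 50 pkts/sec with a 4 sec outage at t = 60 s.
recv = [t / 50 for t in range(50 * 150) if not (60 <= t / 50 < 64)]
rates = interval_loss_rates(recv, duration=150, pkt_rate=50)
print([round(r, 2) for r in rates[10:16]])   # -> [0.0, 0.0, 0.8, 0.0, 0.0, 0.0]
```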

  7. Loss-rate for a pair of paths

  8. CDF of loss-rates of all failed paths

  9. CDF of gaps seen at client

  10. Summary
  • Failure detection makes sense at a ~2 sec timescale: an improvement in availability for real-time applications
  • Text-to-speech composed application: about 3.5-4 sec recovery time (checked in the sketch below)
  • 2,000 ms failure-detection timeout
  • 1,000 ms recovery signaling
  • 500-1,000 ms state restoration (re-process the current text sentence)
  • Of the 2,872 paths, 18 were recovered (0.63%)
  • Availability metric: the number of 5 sec periods with >10% outage
  • Other issues: stability, scaling, load-balancing; studied using the Millennium emulation platform
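A quick arithmetic check of the quoted figures (the breakdown itself is from the slide):

```python
# Recovery-time budget from the slide: detection + signaling + restoration.
detection_ms, signaling_ms = 2000, 1000
restore_ms = (500, 1000)       # re-processing the current text sentence
lo, hi = ((detection_ms + signaling_ms + r) / 1000 for r in restore_ms)
print(f"end-to-end recovery: {lo}-{hi} sec")   # -> 3.5-4.0 sec
print(f"paths recovered: {18 / 2872:.2%}")     # -> 0.63%
```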
