Reliability and Relay Selection in Peer-to-Peer Communication Systems
Salman A. Baset and Henning Schulzrinne
Internet Real-time Laboratory, Department of Computer Science, Columbia University
August 3rd, 2010
Background: peer-to-peer communication system
[Figure: nodes A to E (node = user agent), some behind a NAT / firewall, plus a media relay (or relay) and a P2P / PSTN gateway; numbered steps show signaling, lookups such as "network address of node B?", and media paths. Open questions: reliability of p2p comm. systems? relay selection techniques?]
• nodes form an overlay
• nodes share responsibilities for message routing, signaling, and media relaying
• super nodes, ordinary nodes
Outline
• Motivation
• Reliability framework: sources of unreliability in p2p comm. systems?
• Model for relayed calls: how many relays per call are needed to achieve a 99.9% success rate?
• Improving reliability of relayed calls: how to improve the reliability of relayed calls?
• User annoyance: how to quantify the interference of relayed calls with other applications?
• Relay selection: how to find a relay in O(1) hops that minimizes latency and user annoyance?
Reliability framework
• Reliability = proportion of completed calls (target: 99.9%)
• Goal
  • understand the reasons for call failure
  • devise techniques to reduce them
• Reasons for call failure
  • (1) distributed search fails to find an online callee
    • DHT lookup
  • (2) distributed search fails to find a suitable relay
    • DHT lookup or any appropriate relay selection scheme
  • (3) relay fails during a voice/video session
• This talk
  • understand and improve the reliability of relayed calls
  • devise techniques for finding a relay
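Since reliability is defined as the proportion of completed calls, it can be measured directly from call outcomes and broken down by the three failure causes listed above. The sketch below assumes a hypothetical call log in which each call is tagged with one outcome label; the label strings and the function names `reliability` and `meets_target` are illustrative, not part of the original work.

```python
from collections import Counter

# Hypothetical outcome labels, one per failure cause in the framework.
COMPLETED = "completed"
CALLEE_NOT_FOUND = "callee_not_found"          # (1) search misses an online callee
NO_RELAY_FOUND = "no_relay_found"              # (2) search finds no suitable relay
RELAY_FAILED = "relay_failed_mid_session"      # (3) relay fails during the session


def reliability(outcomes):
    """Return (proportion of completed calls, Counter of failure causes)."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no calls recorded")
    failures = Counter({cause: n for cause, n in counts.items() if cause != COMPLETED})
    return counts[COMPLETED] / total, failures


def meets_target(outcomes, target=0.999):
    """True if the proportion of completed calls reaches the target."""
    return reliability(outcomes)[0] >= target


log = [COMPLETED] * 997 + [CALLEE_NOT_FOUND, NO_RELAY_FOUND, RELAY_FAILED]
ratio, causes = reliability(log)
print(ratio, meets_target(log))  # 0.997 False
```

Keeping the per-cause breakdown matters because each cause calls for a different fix: a better lookup for (1), a better relay search for (2), and more or replaceable relays for (3).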
Understanding reliability of relayed calls
• Percentage of VoIP calls that need relaying
  • the provider knows
  • 15-20% of calls for a commercial client-server IM / VoIP application
  • 341 relayed calls in 20 days for Skype [Suh05Infocom]
    • about 17 per day for a super node (~50K super nodes)
  • some client-server providers relay all calls
  • NAT studies
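The "17 per day" figure is simply the 20-day count averaged per day, and the 15-20% range turns into an expected relay load once a call volume is fixed. A minimal sketch of that arithmetic; the function names and the 10,000-call volume are hypothetical, only 341, 20, and the 15-20% range come from the slide.

```python
# The two counts come from the slide (341 relays in 20 days [Suh05Infocom]);
# the 10,000-call load below is a made-up example.
OBSERVED_RELAYED = 341
OBSERVED_DAYS = 20


def relayed_calls_per_day(relayed, days):
    """Average number of relayed calls per day over an observation window."""
    return relayed / days


def expected_relayed_calls(total_calls, relay_fraction):
    """Calls that need a relay, given the fraction of calls that need relaying."""
    return total_calls * relay_fraction


print(f"{relayed_calls_per_day(OBSERVED_RELAYED, OBSERVED_DAYS):.2f} per day")  # 17.05 per day
for fraction in (0.15, 0.20):
    print(fraction, expected_relayed_calls(10_000, fraction))
```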
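Understanding reliability of relayed calls
For a desired reliability, what is the minimum number of relays per call?
• let X_i and R_i denote the lifetime and residual lifetime of a relay candidate (i.i.d.)
• let D denote the call duration
• when the ith relay fails, the call is switched to the (i+1)st relay, which is instantly selected from the global pool of all relays
[Figure: timeline of a call of duration D covered by the residual lifetimes R_1, R_2, ..., R_{k-1}, R_k of relays 1, 2, ..., k]
• find the smallest k such that the call completion probability is greater than or equal to the desired reliability (99.9%):
$$k^* = \min\left\{ k : P\left( \sum_{i=1}^{k} R_i \ge D \right) \ge 0.999 \right\}$$
• k depends on the relationship between node lifetime and call duration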
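For the exponential case the smallest k has a closed form. By memorylessness, each residual lifetime R_i is exponential with the same mean as the node lifetime, so R_1 + ... + R_k is Erlang-distributed and P(R_1 + ... + R_k >= D) equals a Poisson sum. The sketch below assumes a fixed call duration D (rather than averaging over a call-duration distribution); the function names and the 720-minute / 60-minute values in the usage line are illustrative assumptions, not parameters from the study.

```python
import math


def completion_prob_exp(k, mean_lifetime, call_duration):
    """P(R_1 + ... + R_k >= D) for i.i.d. exponential residual lifetimes.

    The sum of k exponentials is Erlang(k), whose tail is the Poisson sum
    sum_{n=0}^{k-1} e^{-x} x^n / n!  with  x = D / mean_lifetime.
    """
    x = call_duration / mean_lifetime
    term, total = math.exp(-x), 0.0
    for n in range(k):
        total += term            # adds e^{-x} x^n / n!
        term *= x / (n + 1)      # next Poisson term
    return total


def min_relays_exp(mean_lifetime, call_duration, target=0.999, k_max=1000):
    """Smallest k whose completion probability reaches the target."""
    for k in range(1, k_max + 1):
        if completion_prob_exp(k, mean_lifetime, call_duration) >= target:
            return k
    raise ValueError("target not reached within k_max relays")


# Hypothetical: mean relay lifetime 720 min (12 h), call of 60 min.
print(min_relays_exp(720, 60))  # 3
```

Here one relay completes about 92% of such calls and two relays about 99.67%, so a third is needed to cross 99.9%.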
Understanding reliability of relayed calls
[Figure: two panels, "Exponential node lifetimes" and "Skype node lifetimes" (lifetimes approximated as Pareto); axes: mean node lifetime, mean call duration]
• 95% of Skype relayed calls last less than 60 min
• for 95% of Skype relayed call durations, a minimum of 3 relays is needed to maintain a 99.9% success rate
• What if the system does not have enough relays?
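With Pareto lifetimes there is no simple closed form for the sum of residuals, but a Monte Carlo estimate is straightforward. The sketch assumes the Lomax (shifted Pareto) form with CCDF (1 + x/beta)^(-alpha), whose equilibrium residual lifetime has CCDF (1 + x/beta)^(-(alpha - 1)); it draws residuals until they cover D and reports P(call completes with at most k relays). The parameters in the usage line (alpha = 3, beta = 1440 min, D = 60 min) are hypothetical and are not fitted to the Skype data.

```python
import random


def residual_lifetime(rng, alpha, beta):
    """Sample an equilibrium residual lifetime for a Lomax (shifted Pareto) node.

    If a lifetime X has CCDF (1 + x/beta)**(-alpha) with alpha > 1, its residual
    lifetime R has CCDF (1 + x/beta)**(-(alpha - 1)); invert that CCDF.
    """
    a = alpha - 1.0
    u = rng.random()                      # u in [0, 1), so 1 - u in (0, 1]
    return beta * ((1.0 - u) ** (-1.0 / a) - 1.0)


def relays_needed(rng, alpha, beta, call_duration, k_max):
    """Relays used until residual lifetimes cover the call; k_max + 1 means 'more'."""
    covered = 0.0
    for k in range(1, k_max + 1):
        covered += residual_lifetime(rng, alpha, beta)
        if covered >= call_duration:
            return k
    return k_max + 1


def completion_curve(alpha, beta, call_duration, k_max=10, trials=100_000, seed=1):
    """Estimated P(call completes with at most k relays) for k = 1..k_max."""
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for a finite mean lifetime")
    rng = random.Random(seed)
    hist = [0] * (k_max + 2)
    for _ in range(trials):
        hist[relays_needed(rng, alpha, beta, call_duration, k_max)] += 1
    curve, cum = [], 0
    for k in range(1, k_max + 1):
        cum += hist[k]
        curve.append(cum / trials)
    return curve


def min_relays(curve, target=0.999):
    """Smallest k whose estimated completion probability meets the target."""
    for k, p in enumerate(curve, start=1):
        if p >= target:
            return k
    return None


curve = completion_curve(alpha=3.0, beta=1440.0, call_duration=60.0)
print(min_relays(curve), curve[:4])
```

Counting "relays needed" per trial, instead of re-simulating for each k, keeps the estimated curve monotone in k and reuses the same samples for every k.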
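Improving reliability of relayed calls
• Approach 1: no-replacement
  • select k relays at the beginning of a call
  • do not replace failed relays
• Approach 2: with-replacement
  • select k relays at the beginning of a call
  • replace a failed relay after a repair time (rate μ)
  • no failure during switchover
• Skype uses a 2-relay with-replacement scheme
[Figure: Markov chain for k = 2; states 2, 1, 0 = number of working relays, state 0 = call failure; without replacement (μ = 0) it is a pure death process [Bir04]]
$$P = \begin{pmatrix} 1-2\lambda & 2\lambda & 0 \\ \mu & 1-(\lambda+\mu) & \lambda \\ 0 & 0 & 1 \end{pmatrix} \quad \text{(rows and columns ordered as states 2, 1, 0)}$$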
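The chain above can be iterated directly to compare the two approaches for a given call length. This sketch assumes discrete time slots with per-slot failure probability λ per relay and repair probability μ, as in the transition probabilities of the figure, and treats the call duration as a whole number of slots; `survival_prob` and the values in the usage line are hypothetical.

```python
def survival_prob(lam, mu, steps):
    """Probability that a 2-relay call has not reached state 0 after `steps` slots.

    Per slot: state 2 -> 1 w.p. 2*lam, state 1 -> 0 w.p. lam, state 1 -> 2 w.p. mu.
    mu = 0 gives the no-replacement (pure death) chain.
    """
    if not (0 <= 2 * lam <= 1 and 0 <= mu and lam + mu <= 1):
        raise ValueError("transition probabilities must lie in [0, 1]")
    p2, p1, p0 = 1.0, 0.0, 0.0          # the call starts with both relays up
    for _ in range(steps):
        # Right-hand side uses the previous slot's probabilities.
        p2, p1, p0 = (
            p2 * (1 - 2 * lam) + p1 * mu,
            p2 * 2 * lam + p1 * (1 - lam - mu),
            p0 + p1 * lam,
        )
    return p2 + p1


# Hypothetical: 1-minute slots, lam = 0.001 per slot, a 60-minute call.
print(survival_prob(0.001, 0.0, 60), survival_prob(0.001, 0.5, 60))
```

Setting `mu = 0` removes the 1 -> 2 transition, which is exactly the no-replacement scheme; any `mu > 0` can only move probability mass away from the absorbing failure state.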
Improving reliability of relayed calls
• No-replacement: add more relays?
  • diminishing returns
  • going from 1 to 2, 2 to 3, and 3 to 4 relays increases the MTTF by 50%, 22%, and 13% (exponential lifetimes)
• No-replacement (NR) vs. with-replacement (WR)
  • depends on mean lifetime, call duration, and repair time
[Figure: 2-relay with-replacement, search time = 60 s; Skype lifetimes with mean = 12 hours, median = 4 hours]
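The diminishing returns follow from a standard result: with k relays in parallel, i.i.d. exponential lifetimes, and no replacement, the call lasts until the last relay fails, so the MTTF is the mean lifetime times the harmonic number 1 + 1/2 + ... + 1/k. The ratios come out to 50%, 22.2%, and 13.6%, consistent with the rounded figures above. For comparison, the sketch also gives the mean time to absorption of the continuous-time 2-relay with-replacement chain, which assumes an exponential repair time with rate μ; the function names are hypothetical.

```python
from fractions import Fraction


def mttf_no_replacement(k, mean_lifetime=1):
    """MTTF of k parallel relays, i.i.d. exponential lifetimes, no replacement.

    Failures occur at rates k/m, (k-1)/m, ..., 1/m, so
    MTTF = m * (1 + 1/2 + ... + 1/k).
    """
    return mean_lifetime * sum(Fraction(1, i) for i in range(1, k + 1))


def mttf_two_with_replacement(lam, mu):
    """Continuous-time chain 2 -> 1 (rate 2*lam), 1 -> 0 (lam), 1 -> 2 (mu).

    Solving T2 = 1/(2*lam) + T1 and T1 = 1/(lam+mu) + mu/(lam+mu) * T2
    gives T2 = (3*lam + mu) / (2*lam**2).
    """
    return (3 * lam + mu) / (2 * lam ** 2)


for k in range(2, 5):
    gain = mttf_no_replacement(k) / mttf_no_replacement(k - 1) - 1
    print(f"{k - 1} -> {k} relays: MTTF +{float(gain):.1%}")
# 1 -> 2 relays: MTTF +50.0%
# 2 -> 3 relays: MTTF +22.2%
# 3 -> 4 relays: MTTF +13.6%
```

With `mu = 0` the with-replacement formula reduces to 3/(2*lam), the same as two relays without replacement, which is a quick consistency check on both functions.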
User annoyance
• Interference of a relayed call with other applications running on the relay machine
  • file sharing = mutually beneficial (tit-for-tat)
  • relaying = altruistic
  • provide incentives or minimize user annoyance
• How to quantify user annoyance?
  • automatically?
  • spare network capacity
• Issues in measuring spare capacity?
  • bandwidth tests, ALTO
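One way to make "spare network capacity" concrete is to compare the load a relayed call would add with the capacity left over after the host's own traffic. The sketch below is a hypothetical metric, not the one used in the study: it assumes link capacity comes from some bandwidth test, current usage is measured locally, and a host accepts a relay job only if the call would use at most a chosen fraction (here 0.5) of its spare capacity.

```python
from dataclasses import dataclass


@dataclass
class HostLink:
    """Hypothetical per-host measurements, in kbit/s."""
    capacity_kbps: float    # e.g., estimated by a bandwidth test
    in_use_kbps: float      # traffic of the user's own applications

    @property
    def spare_kbps(self) -> float:
        return max(self.capacity_kbps - self.in_use_kbps, 0.0)


def annoyance(link: HostLink, call_kbps: float) -> float:
    """Hypothetical score: fraction of spare capacity the relayed call would use.

    call_kbps is the relayed load on the link (both call directions pass
    through the relay). Values above 1.0 mean the call would squeeze other apps.
    """
    spare = link.spare_kbps
    return float("inf") if spare == 0 else call_kbps / spare


def can_relay(link: HostLink, call_kbps: float, max_annoyance: float = 0.5) -> bool:
    """Accept the relay job only if the annoyance score stays under the cap."""
    return annoyance(link, call_kbps) <= max_annoyance


link = HostLink(capacity_kbps=1000.0, in_use_kbps=600.0)
print(link.spare_kbps, annoyance(link, 128.0), can_relay(link, 128.0))
```

The hard part, as the slide notes, is the `capacity_kbps` input: active bandwidth tests cost traffic, and ALTO-style network information is one alternative source.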
Distributed relay selection
• Goal: O(1) hops
• 2-level hierarchical network
[Figure: 1-relay local-random scheme; a node behind a NAT asks "Give me a relay" and receives "Here is a randomly selected relay"; labels: close-by, search performance, dropped calls]
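In the local-random scheme the request is answered in one hop by the node's super node, which picks a relay at random from candidates it already knows. A minimal sketch under that reading; the `SuperNode` class, its local pool, and the `exclude` parameter (e.g., to skip the caller and callee themselves) are hypothetical.

```python
import random


class SuperNode:
    """Hypothetical super node in a 2-level overlay with a local pool of relays."""

    def __init__(self, relays, rng=None):
        self.relays = list(relays)
        self.rng = rng or random.Random()

    def give_me_a_relay(self, exclude=()):
        """Answer a relay request in one hop with a random local candidate."""
        pool = [r for r in self.relays if r not in exclude]
        if not pool:
            return None          # caller must fall back to some other search
        return self.rng.choice(pool)


sn = SuperNode(["relay-1", "relay-2", "relay-3"], rng=random.Random(42))
print(sn.give_me_a_relay(exclude={"relay-1"}))
```

Because the answer comes from local state, the lookup cost is constant; the trade-off is that the choice ignores delay and spare capacity, which the strategies on the next slide take into account.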
Distributed relay selection
• Metrics
  • delay
  • user annoyance
    • interference with user applications
    • file sharing (draft idle peers)
    • spare capacity
• Strategies
  • random
  • mindelay: select the relay with minimum delay
  • netmax: select the relay with maximum spare bandwidth
  • threshold: among relays with delay < 150 ms, select the one with maximum spare capacity
• Results
  • strategies perform similarly near the system collapse point
  • minimizing latency increases annoyance and the number of jobs per relay, and vice versa
  • the threshold approach performs reasonably well
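The four strategies differ only in how they rank the same candidate list, which makes them easy to compare side by side. The sketch assumes each candidate carries a delay estimate and a spare-capacity estimate; the `Candidate` type and function names are hypothetical, and returning `None` from `pick_threshold` when no candidate is under 150 ms is an assumption, since the slide does not say what happens in that case.

```python
import random
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    delay_ms: float      # e.g., added delay if the call goes through this relay
    spare_kbps: float    # spare network capacity of the relay host


def pick_random(cands, rng=random):
    """random: any candidate."""
    return rng.choice(cands) if cands else None


def pick_mindelay(cands):
    """mindelay: candidate with minimum delay."""
    return min(cands, key=lambda c: c.delay_ms, default=None)


def pick_netmax(cands):
    """netmax: candidate with maximum spare capacity."""
    return max(cands, key=lambda c: c.spare_kbps, default=None)


def pick_threshold(cands, max_delay_ms=150.0):
    """threshold: maximum spare capacity among candidates under the delay cap."""
    eligible = [c for c in cands if c.delay_ms < max_delay_ms]
    return pick_netmax(eligible)


cands = [Candidate("a", 40, 100), Candidate("b", 120, 800), Candidate("c", 300, 2000)]
print(pick_mindelay(cands).name, pick_netmax(cands).name, pick_threshold(cands).name)
```

The example shows the trade-off reported above: mindelay picks the lightly provisioned relay, netmax picks the distant one, and threshold settles on a relay that is acceptable on both counts.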
Related work
• Modeling
  • On lifetime-based node failure and stochastic resilience of decentralized peer-to-peer networks [Leonard09ToN]
• Minimizing churn
  • Minimizing churn in distributed systems [Godfrey06Sigcom]
• Relay selection
  • ASAP: an AS-aware peer relay protocol for high quality VoIP [Ren06ICDCS]
$ diff this related_work
  • focus on node isolation
  • minimizing churn is not sufficient
  • reliability, relay selection, user annoyance
Conclusion
• Framework for analyzing reliability in p2p communication systems
• A model for the reliability of relayed calls
• Reliability improvement schemes
• User annoyance
• Distributed relay selection