Handling Churn in Less-structured P2P Systems
Elders Know Best
Yi Qiao & Fabián E. Bustamante
Department of Electrical Engineering & Computer Science, Northwestern University
{yqiao,fabianb}@cs.northwestern.edu

Toward Massively Distributed Systems
• What scale may bring
  • Virtually infinite resources always available
  • Information everywhere at any time
  • Power to the people! (John Lennon, 1940-1980)
• … but not for free
  • Resource management
  • Heterogeneity
  • Naming
  • Administration
  • Measurement, testing & debugging in the midst of chaos
Qiao & Bustamante, EE&CS, Northwestern U. IEEE P2P 2005

Peers’ Transiency (a.k.a. Churn)
• The problem with peers’ transiency
  • Very large peer populations
  • Autonomous nature of peers
  • Architectural mutual dependencies of P2P systems
  • Median session length from 1 hr to 1 min [Saroiu ’02], [Bustamante ’03], [Rhea ’04] …
• Why should you care?
  • E.g., for data sharing applications: control traffic cost, spread of queries, cache effectiveness, degree of replication, …

Peer Lifespan Distribution
• Active probing of ~1 million peers’ lifespans
• RCDF of peers with lifespan in [~22 min, 3.5 days]
• Pareto distribution of the form $\lambda T^{k}$ ($k < 0$)
The Lifespan Approach
• A peer’s expected remaining session length is proportional to the peer’s age
• Basis for churn-resilient protocols and strategies!

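The proportionality between age and expected remaining session length can be checked with a short derivation. This is only a sketch under assumptions the slide does not state: the fitted form $\lambda T^{k}$ is read as the probability that a lifespan exceeds $T$, it is assumed to hold beyond the measured range, and $k < -1$ is required for the mean to be finite. The symbols $L$ (lifespan) and $s$ (additional time) are introduced here for illustration.

```latex
\begin{align*}
\Pr[L > T] &= \lambda T^{k}, \qquad k < -1 \\
\Pr[L > T + s \mid L > T] &= \frac{\lambda (T+s)^{k}}{\lambda T^{k}}
  = \left(1 + \frac{s}{T}\right)^{k} \\
\mathbb{E}[L - T \mid L > T] &= \int_{0}^{\infty} \left(1 + \frac{s}{T}\right)^{k} ds
  = \frac{T}{-(k+1)}
\end{align*}
```

Under these assumptions the expected remaining time grows linearly with the age $T$. For $-1 \le k < 0$ the integral diverges, but the conditional survival probability $(1 + s/T)^{k}$ still increases with $T$, so older peers remain the better bet.
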
Outline
• Motivation & background
• Lifespan-based protocols and strategies
  • Organizational protocols
  • Query-related strategies
• Evaluation
• Conclusions

Organizational Protocols
• The way a peer-to-peer system is structured
  • Unstructured (UDP) - All peers equal; e.g., Gnutella v0.4
  • Loosely structured (HDP) - Leaf & super-peers; e.g., Gnutella v0.6, Kazaa
  • Highly structured (DHT)
• Lifespan-based organizational protocols
  • Opt for longer-lived peers when choosing neighbors and/or recommending peers to others [Bustamante02]
  • Lifespan UDP (LUDP)
    • Opt for older peers for connections; random recommendations
  • Lifespan HDP (LHDP)
    • Leaf and super-peers opt for older super-peers for connections

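The "opt for older peers" rule can be illustrated with a minimal Python sketch. The `Peer` record, the `pick_neighbors` helper and the strict oldest-first ranking are assumptions made for illustration; the slides do not say how LUDP or LHDP rank candidates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    peer_id: str
    age_seconds: float  # estimated time since the peer joined (hypothetical field)


def pick_neighbors(candidates, slots):
    """Return up to `slots` candidates, oldest first (assumed ranking rule)."""
    ranked = sorted(candidates, key=lambda p: p.age_seconds, reverse=True)
    return ranked[:slots]


candidates = [
    Peer("a", 120.0),
    Peer("b", 86_400.0),
    Peer("c", 3_600.0),
    Peer("d", 30.0),
]
chosen = pick_neighbors(candidates, slots=2)
print([p.peer_id for p in chosen])  # prints ['b', 'c']
```

For LHDP the same helper would be applied to candidate super-peers only.
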
Query, Caching & Replication Strategies
• Flooding
  • Query is propagated to all neighbors within a radius
  • Inherently unscalable
• K-random walks
  • k parallel query messages randomly forwarded at each hop [Lv02]
  • Improvement factoring in node’s degree [Adamic01], capacity [Lv03], …
• Lifespan-based k-random walk query
  • Opt for older peers when forwarding a query walker
  • A simple weighted probabilistic approach works well
    • Avoids collision between walkers
    • Prevents hot spots

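One way to read the "weighted probabilistic approach" is to forward each walker to a neighbor with probability proportional to that neighbor's age. The sketch below assumes exactly that weighting; the `next_hop` name, the dictionary layout and the uniform fallback are hypothetical and not taken from the slides.

```python
import random


def next_hop(neighbor_ages, rng=random):
    """Pick one neighbor with probability proportional to its age.

    neighbor_ages: dict mapping neighbor id -> age in seconds (assumed layout).
    """
    if not neighbor_ages:
        raise ValueError("no neighbors to forward to")
    ids = list(neighbor_ages)
    weights = [max(neighbor_ages[i], 0.0) for i in ids]
    if sum(weights) == 0:
        return rng.choice(ids)  # all ages zero or unknown: fall back to uniform
    return rng.choices(ids, weights=weights, k=1)[0]


rng = random.Random(7)  # seeded so the example is repeatable
ages = {"young": 60.0, "old": 6000.0}
picks = [next_hop(ages, rng) for _ in range(1000)]
print(picks.count("old"), picks.count("young"))
```

Because the choice is randomized rather than always picking the oldest neighbor, parallel walkers are less likely to follow the same path, which matches the collision and hot-spot points above.
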
Query, Caching & Replication Strategies
• Neighbor Caching with incremental Update (NCU)
• Path Caching with eXpiration (PCX) [Roussopoulos03]
  • Effectiveness not obvious for less-structured systems
• Regional Caching with eXpiration (RCX) - new
  • Peers in query hit path push query hit entries to some of their neighbors
• Lifespan-based RCX
  • Caching in older neighbors along the path
  • Expiration threshold for cached entries is based on age of target peer

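The age-based expiration rule might look like the following sketch. The slides only say the threshold is "based on" the target peer's age; the linear rule, the `ttl_factor` and `max_ttl` parameters, and the `AgeAwareCache` class are all assumptions for illustration.

```python
class AgeAwareCache:
    """Query-hit cache whose entries live longer when the target peer is older."""

    def __init__(self, ttl_factor=0.5, max_ttl=3600.0):
        self.ttl_factor = ttl_factor  # hypothetical tuning knob
        self.max_ttl = max_ttl        # hypothetical cap, in seconds
        self._entries = {}            # key -> (target_peer, expires_at)

    def put(self, key, target_peer, target_age, now):
        # Assumed rule: time-to-live grows linearly with the target's age, up to a cap.
        ttl = min(self.ttl_factor * target_age, self.max_ttl)
        self._entries[key] = (target_peer, now + ttl)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None:
            return None
        target_peer, expires_at = entry
        if now >= expires_at:
            del self._entries[key]  # expired: drop lazily on lookup
            return None
        return target_peer


cache = AgeAwareCache()
cache.put("song.mp3", "peer-old", target_age=4000.0, now=0.0)  # lives 2000 s
cache.put("doc.pdf", "peer-new", target_age=100.0, now=0.0)    # lives 50 s
print(cache.get("song.mp3", now=60.0), cache.get("doc.pdf", now=60.0))  # prints peer-old None
```
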
Query, Caching & Replication Strategies
• Simple replication - make replicas on requesters
• Proactive replication (path replication) puts more replicas on multiple peers
• Regional replication - more effective than path replication
  • Put replicas on some neighbors of each peer along the query path
• Lifespan-based Regional Replication (LRRep)
  • Opt for in-the-path-region peers’ older neighbors for placing replicas
  • Upper bound for number of replicas each peer can store

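Replica placement in LRRep can be sketched as follows. The function name, the `per_peer` and `max_replicas` parameters and the dictionary layouts are hypothetical; the slides state only that older neighbors of path peers are preferred and that each peer has an upper bound on stored replicas.

```python
def place_replicas(path, neighbors, ages, stored, per_peer=1, max_replicas=3):
    """Choose replica holders among the older neighbors of each peer on the query path.

    path: peer ids along the query path
    neighbors: dict peer id -> list of neighbor ids
    ages: dict peer id -> age in seconds
    stored: dict peer id -> number of replicas already held
    """
    targets = []
    for peer in path:
        eligible = [
            n for n in neighbors.get(peer, [])
            if n not in targets and stored.get(n, 0) < max_replicas
        ]
        # Oldest neighbors first; neighbors with unknown age sort last.
        eligible.sort(key=lambda n: ages.get(n, 0.0), reverse=True)
        targets.extend(eligible[:per_peer])
    return targets


path = ["p1", "p2"]
neighbors = {"p1": ["a", "b"], "p2": ["b", "c"]}
ages = {"a": 100.0, "b": 9000.0, "c": 500.0}
print(place_replicas(path, neighbors, ages, stored={}))        # prints ['b', 'c']
print(place_replicas(path, neighbors, ages, stored={"b": 3}))  # prints ['a', 'c']
```

In the second call the oldest neighbor `b` has already reached the assumed bound, so the next-oldest neighbors are used instead.
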
Determining Peer’s Age
• Effectiveness of the lifespan-based approach depends on
  • Fitness of session length estimators
  • Accuracy of peers’ age information …
• A lightweight distributed protocol for age determination
• Some good characteristics
  • Age never directly requested from the peer itself
  • Trimming/sampling reduces the probability of small cabals
P trying to determine C’s age
1. Witness collection
   Get from C a list of potential witnesses & interaction windows
2. Witness sampling & trimming
   a. Trim witnesses with suspiciously large interaction windows
   b. Sample final list W
3. Collecting testimonies & determining age
   a. Validate C’s reported interaction windows by asking peers in W
   b. Determine C’s age

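The three steps can be sketched in a few lines of Python. Several details are assumptions, since the slide does not specify them: windows are `(start, end)` pairs in seconds, "suspiciously large" means longer than a fixed `max_window`, a testimony confirms a claim only when the two windows match exactly, and the age is taken as the time since the earliest confirmed interaction.

```python
import random


def estimate_age(claimed, testimonies, now, max_window, sample_size, rng=random):
    """Estimate candidate C's age from third-party testimony.

    claimed: dict witness id -> (start, end) interaction window reported by C
    testimonies: dict witness id -> (start, end) window reported by that witness
    Returns the estimated age in seconds, or None if nothing was confirmed.
    """
    # Step 2a: trim witnesses whose claimed window is suspiciously large.
    trimmed = [w for w, (start, end) in claimed.items() if end - start <= max_window]
    # Step 2b: sample the final witness list W.
    witnesses = rng.sample(trimmed, min(sample_size, len(trimmed)))
    # Step 3a: keep only the windows that the witness itself confirms.
    confirmed = [claimed[w] for w in witnesses if testimonies.get(w) == claimed[w]]
    if not confirmed:
        return None
    # Step 3b (assumption): age = time since the earliest confirmed interaction.
    return now - min(start for start, _ in confirmed)


claimed = {"w1": (100.0, 200.0), "w2": (0.0, 900.0), "w3": (300.0, 350.0)}
testimonies = {"w1": (100.0, 200.0), "w3": (300.0, 360.0)}
age = estimate_age(claimed, testimonies, now=1000.0, max_window=500.0, sample_size=3)
print(age)  # prints 900.0
```

Here `w2` is trimmed for its large window, `w3` contradicts C's claim, and only `w1` counts, so C's age is never taken from C's own word.
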
Outline
• Motivation & background
• Lifespan-based protocols and strategies
  • Organizational protocols
  • Query-related strategies
• Evaluation
• Conclusions
[Diagram labels: Query, Caching, Replication, Organizational protocol]

Evaluation Setups
• Simulation
  • Simulations driven by 4 of the 20 lifespan traces
  • ~150,000 peers, 3000-4000 online at any time
  • 4 query walkers, with TTL = 20
  • Simulated time 511,000 s (~6 days)
• Wide-area
  • Modified open-source Gnutella client
  • 150 PlanetLab nodes
  • 200-300 online peers during experiment
  • 3 query walkers, with TTL = 7
  • Simulated time 511,000 s (~6 days)

Basic Advantages of Lifespan Approach (Simulation)
• Compared: Random Unstructured (UDP) vs. Lifespan-based Unstructured (LUDP), each with k-random-walk query (RQuery) and simple replication (SRep)
• LUDP has 50-70% shorter query resolution time than UDP
• … and 50-70% more query hits

Basic Advantages of Lifespan Approach (Wide-Area)
• Compared: Random Unstructured (UDP) vs. Lifespan-based Unstructured (LUDP), each with k-random-walk query (RQuery) and simple replication (SRep)
• Comparable results in wide-area experiments
• LUDP delivers >40% more query hits than UDP

Basic Advantages of Lifespan Approach (Simulation)
• And with hierarchical protocols
• Compared: Random Hierarchical (HDP) vs. Lifespan-based Hierarchical (LHDP), each with k-random-walk query (RQuery) and simple replication (SRep)
• Significantly faster query - 3x faster for 50% of queries
• … and more query hits

Lifespan-based Query-related Strategies (Wide-Area)
• Compared, both on Unstructured (UDP): k-random-walk query (RQuery) with regional replication (RRRep) vs. lifespan k-random-walk query (LQuery) with lifespan-based regional replication (LRRep)
• Significantly faster query - 2-3x faster for 50% of queries
• … and more query hits

Combined Strengths (Simulation)
• Compared: RQuery with regional replication (RRRep) on Random Hierarchical (HDP) vs. LQuery with lifespan-based regional replication (LRRep) on Lifespan-based Hierarchical (LHDP)
• >4x faster query resolution times
• … and 3x improvement on query hits

Conclusions & Future Work
• Need to address churn resilience in massively distributed systems
• Lifespan is a good basis for structurally resilient systems
• Illustrative lifespan-based organizational protocols & strategies
• Demonstrated effectiveness through trace-driven simulations & wide-area experiments
  • Lower control overhead
  • Faster query resolution
  • More query hits
• Currently applying similar ideas to build structurally churn-resilient DHT systems

Basic Advantages of Lifespan Approach (Simulation)
• Relative query satisfaction: the percentage of queries achieving z satisfaction (i.e., at least z query hits)
• Why is lifespan-based LUDP better?
  • Queries are more likely to reach older peers
    • which store more replicas,
    • cache indexes longer, and
    • are much less likely to break down query/reply paths
• Using PCX, LUDP results in faster query resolution.

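The relative query satisfaction metric defined above is straightforward to compute. In this minimal sketch the function name and the list-of-hit-counts input are assumptions made for illustration.

```python
def relative_query_satisfaction(hits_per_query, z):
    """Percentage of queries that received at least z query hits."""
    if not hits_per_query:
        return 0.0
    satisfied = sum(1 for hits in hits_per_query if hits >= z)
    return 100.0 * satisfied / len(hits_per_query)


hits = [0, 1, 3, 5, 10]  # hit counts for five made-up queries
print(relative_query_satisfaction(hits, z=3))  # prints 60.0
```
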
Lifespan-based Query-related Strategies (Simulation)
• Compared, both on Unstructured (UDP) with simple replication (SRep): k-random-walk query (RQuery) vs. lifespan k-random-walk query (LQuery)
• Just from query: ~100% improvement on query resolution time & hit numbers

Lifespan-based Query-related Strategies (Simulation)
• Compared, both on Unstructured (UDP): RQuery with regional caching (RRCX) and regional replication (RRRep) vs. LQuery with lifespan-based regional caching (LRCX) and lifespan-based regional replication (LRRep)
• Median query hit number: 25 to 60
• 90% query resolution time: 0.2 sec to 0.55 sec