Characterizing Overlay Topologies & Dynamics in Peer-to-Peer Networks
Daniel Stutzbach, Reza Rejaie (University of Oregon); Subhabrata Sen (AT&T Labs)
IEEE Computer & Communications Workshop, Huntington Beach, October 25th, 2005

Motivation
• P2P file-sharing systems are very popular in practice.
  • Several million simultaneous users collectively.
  • 60% of all Internet traffic [CacheLogic Research 2005]
  • Most use an unstructured overlay.
• Understanding overlay properties & dynamics is important:
  • Understanding how existing P2P systems function
  • Developing and evaluating new systems
• Unstructured overlays are not well understood.
• We characterized overlay topology in Gnutella because of its:
  • Size: one of the largest P2P systems; more than 1 million users
  • Maturity: in use for several years; older studies for comparisons
  • Openness: no reverse-engineering needed
http://mirage.cs.uoregon.edu/P2P

Defining the Problem
• Gnutella uses a two-tier overlay.
  • Improves scalability.
  • Ultrapeers form an unstructured mesh.
  • Leaf peers connect to the ultrapeers.
  • eDonkey, FastTrack are similar.
• Studying the overlay requires snapshots.
  • Snapshots capture the overlay as a graph.
  • Individual snapshots reveal graph properties.
  • Consecutive snapshots reveal dynamics.
• However, capturing accurate snapshots is difficult.
[Figure: two-tier overlay; labels: Ultrapeer, Top-level overlay, Leaf]

Challenges in Capturing Accurate Snapshots
• Snapshots are captured iteratively by a crawler.
• An ideal snapshot is instantaneous.
• But the overlay is large and rapidly changing.
• Captured snapshots are likely to be distorted.
• Previous studies captured either
  • Complete snapshots with a slow crawler => distorted
  • Partial snapshots => less distorted, but unrepresentative
• Some types of analysis require the whole graph.
• Increasing crawler speed reduces distortion in captured snapshots.
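
The iterative capture described above can be illustrated with a minimal breadth-first crawl. This is only a sketch, not Cruiser's implementation: `get_neighbors` is a hypothetical callback standing in for "contact a peer and ask for its neighbor list", and the toy `overlay` dictionary is invented for illustration.

```python
from collections import deque


def crawl(seed, get_neighbors):
    """Breadth-first crawl starting at `seed`.

    `get_neighbors(peer)` is a hypothetical callback that returns the
    peer's neighbor list, or None if the peer cannot be reached.
    Returns the set of discovered nodes and the set of undirected edges.
    """
    seen = {seed}
    edges = set()
    queue = deque([seed])
    while queue:
        peer = queue.popleft()
        neighbors = get_neighbors(peer)
        if neighbors is None:  # peer departed or is unreachable
            continue
        for n in neighbors:
            edges.add(frozenset((peer, n)))
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return seen, edges


# Toy overlay (assumed data): each key lists that peer's neighbors.
overlay = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
nodes, edges = crawl("a", overlay.get)
```

In a live network each `get_neighbors` call takes time, so the overlay keeps changing while the queue drains; that delay is the source of the distortion discussed on this slide.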

Cruiser: a Fast Gnutella Crawler
• Features:
  • Distributed, highly parallelized implementation
  • Dynamic adaptation to bandwidth & CPU constraints
• Cruiser is orders of magnitude faster than other P2P crawlers:
  • Captures one million nodes in around 7 minutes
  • 140,000 peers/min, compared to 2,500 peers/min [Saroiu 02]
• We investigated the effects of speed on distortion.
  • 4% node distortion and 15% edge distortion
• Daniel Stutzbach and Reza Rejaie, "Capturing Accurate Snapshots of the Gnutella Network", Global Internet Symposium, March 2005.
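
As a quick check of the figures above, the crawl durations implied by the two quoted rates can be worked out directly. The one-million-node network size is taken from the slide; the variable names are illustrative only.

```python
network_size = 1_000_000  # nodes, as quoted on the slide
cruiser_rate = 140_000    # peers per minute (Cruiser)
earlier_rate = 2_500      # peers per minute [Saroiu 02]

cruiser_minutes = network_size / cruiser_rate  # about 7.1 minutes
earlier_minutes = network_size / earlier_rate  # 400 minutes
speedup = cruiser_rate / earlier_rate          # ratio of the two rates
```

At the slower rate, a full crawl of a network this size would take several hours, during which many peers arrive and depart.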

Data Set
• More than 80,000 snapshots, over the past year.
• To examine static properties, we focus on four:
• To examine dynamic properties, we use slices:
  • Each slice is 2 days of ~500 back-to-back snapshots
  • Captured starting 10/14/04, 10/21/04, 11/25/04, 12/21/04, and 12/27/04

Summary of Characterizations
Graph Properties
• Implementation heterogeneity
• Degree distribution:
  • Top-level degree distribution
  • Ultrapeer-leaf connectivity
  • Degree-distance correlation
• Reachability:
  • Path lengths
  • Eccentricity
• Small world properties
• Resiliency
Dynamic Properties
• Existence of stable core:
  • Uptime distribution
  • Biased connectivity
• Properties of stable core:
  • Largest connected component
  • Path lengths
  • Clustering coefficient

Top-level Degree
• This is the degree distribution among ultrapeers.
• There are obvious peaks at 30 and 70 neighbors.
• A substantial number of ultrapeers have fewer than 30.
• What happened to the power-law reported by prior studies?
[Figure: top-level degree distribution; annotations: "Max 30 in most clients", "Max 75 in some clients", "Custom"]
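
A degree distribution like the one plotted on this slide can be computed from a snapshot's edge list. The sketch below assumes the snapshot is available as a list of undirected `(u, v)` pairs; the peer names and edges are made up.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree value to the number of nodes with that degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


# Toy top-level overlay (assumed data): undirected ultrapeer links.
edges = [("u1", "u2"), ("u1", "u3"), ("u1", "u4"), ("u2", "u3")]
dist = degree_distribution(edges)
```

Plotting `dist` on log-log axes is the usual way to check for a power law; peaks at client-imposed maximum degrees would show up as spikes instead of a straight line.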

What happened to power-law?
• When a crawl is slow, many short-lived peers report long-lived peers as neighbors.
• But those neighbors are not all present at the same time.
• Degree distribution from a slow crawl resembles prior results. [Ripeanu 02 ICJ]
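
The inflation effect can be seen in a toy timeline. The sketch assumes a long-lived peer "L" that never has more than two neighbors at once, and a crawl that hears from every neighbor present during its time window; both are simplifications, and the data is invented.

```python
# neighbors_at[t] is the set of short-lived peers connected to the
# long-lived peer "L" at time step t (assumed data).
neighbors_at = [
    {"s0", "s1"},
    {"s1", "s2"},
    {"s2", "s3"},
    {"s3", "s4"},
]


def apparent_degree(neighbors_at, start, duration):
    """Degree of L as seen by a crawl spanning `duration` time steps.

    Every peer that was a neighbor of L at any step in the window
    reports L, so the crawl records the union of those neighbor sets.
    """
    reported = set()
    for t in range(start, start + duration):
        reported |= neighbors_at[t]
    return len(reported)


true_max = max(len(s) for s in neighbors_at)  # degree at any single instant
fast = apparent_degree(neighbors_at, 0, 1)    # crawl finishes within one step
slow = apparent_degree(neighbors_at, 0, 4)    # crawl spans all four steps
```

Here the slow crawl credits L with more neighbors than it ever had simultaneously, which stretches the tail of the measured degree distribution.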

Shortest-Path Distances
• Distribution of distances among ultrapeers (left)
  • 70% of distances are exactly 4 hops.
• Distribution of distances among all peers (right)
  • Most distances are 5 or 6 hops.
  • Shows the effect of the two-tier structure with multiple parents per leaf
• Despite the large size, pair-wise distances are short.
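
A hop-distance distribution of this kind can be obtained by running a breadth-first search from each node. The sketch below assumes the snapshot is an adjacency dictionary with sortable node labels; the four-node path graph is a toy example. On a snapshot with a million nodes, all-pairs BFS is expensive, so sampling source nodes may be more practical.

```python
from collections import Counter, deque


def distance_distribution(adj):
    """Count unordered node pairs by hop distance.

    Pairs with no connecting path are skipped.
    """
    counts = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for target, d in dist.items():
            if source < target:  # count each unordered pair once
                counts[d] += 1
    return counts


# Toy graph (assumed data): a path a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
hops = distance_distribution(adj)
```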

Is Gnutella a Small World?
• Small worlds arise naturally in many places.
  • Movie actors, power grid, co-authors of papers
• Small world graphs have short distances but significant clustering, compared to a similar random graph.
• Gnutella is a small world.
• Very high clustering adversely affects flooding queries.
• But Gnutella is not clustered enough to hurt performance.
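
The clustering side of the small-world test is commonly measured with the mean local clustering coefficient. The slides do not say which definition was used, so the sketch below is one standard choice, shown on an invented four-node graph. The resulting value would then be compared with the same measure on a random graph of similar size and degree.

```python
from itertools import combinations


def clustering_coefficient(adj):
    """Mean local clustering coefficient of an undirected graph.

    For each node, take the fraction of pairs of its neighbors that are
    themselves linked. Nodes with fewer than two neighbors count as 0.
    """
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Toy graph (assumed data): a triangle a-b-c with a pendant node d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cc = clustering_coefficient(adj)
```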

Resiliency to Node Failure
• Ratio of connected peers after node failure.
• The Gnutella topology is extremely resilient to random node failure.
• It's resilient even when the highest-degree nodes are removed.
• Complex algorithms are not necessary to achieve resiliency.
[Figure: two panels, "Random" and "Highest degree first"]
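
The two removal strategies can be compared with a short experiment. The sketch below interprets "ratio of connected peers" as the fraction of surviving nodes in the largest connected component, which is an assumption; the star-shaped toy graph is chosen only because it is maximally fragile to losing its hub.

```python
import random


def largest_component_fraction(adj, removed):
    """Fraction of surviving nodes that lie in the largest component."""
    alive = set(adj) - set(removed)
    if not alive:
        return 0.0
    seen = set()
    best = 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best / len(alive)


def removal_order(adj, strategy, rng):
    """Order in which nodes fail: "random" or highest degree first."""
    nodes = sorted(adj)
    if strategy == "random":
        rng.shuffle(nodes)
    else:
        nodes.sort(key=lambda n: len(adj[n]), reverse=True)
    return nodes


# Toy graph (assumed data): hub "h" connected to four leaves.
star = {"h": {"l1", "l2", "l3", "l4"},
        "l1": {"h"}, "l2": {"h"}, "l3": {"h"}, "l4": {"h"}}
rng = random.Random(0)
targeted = removal_order(star, "degree", rng)
after_hub_loss = largest_component_fraction(star, targeted[:1])
after_leaf_loss = largest_component_fraction(star, ["l1"])
```

Removing growing prefixes of each order and plotting the resulting fraction gives curves comparable to the two panels of the figure.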

Dynamic Properties
• How does node churn affect overlay dynamics?
  • Are some "regions" of the overlay more stable?
  • How can we identify such a region?
• Methodology:
  • Capture a long series of back-to-back snapshots
  • Estimate the uptime of individual peers in the last snapshot
  • Group peers with uptime higher than a threshold
  • Examine biased connectivity within each group
[Figure: peers across consecutive snapshots over time; labels: "Present for 5 snapshots", "Present for 2 snapshots", "Departed peer", "Newly arrived peer"]
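
The first three methodology steps can be sketched as follows. The estimate here is deliberately coarse and is an assumption rather than the authors' exact procedure: a peer's uptime is taken to be the length of its unbroken run of appearances ending at the last snapshot, multiplied by an assumed snapshot interval. Peers present in every snapshot only get a lower bound.

```python
def estimate_uptimes(snapshots, interval_minutes):
    """Estimate the uptime of every peer in the last snapshot.

    `snapshots` is a list of sets of peer IDs in capture order.
    Uptime is the number of consecutive snapshots, counting back from
    the last one, in which the peer appears, times the interval.
    """
    uptimes = {}
    for peer in snapshots[-1]:
        run = 0
        for snap in reversed(snapshots):
            if peer not in snap:
                break
            run += 1
        uptimes[peer] = run * interval_minutes
    return uptimes


def stable_core(uptimes, threshold_minutes):
    """Peers whose estimated uptime exceeds the threshold."""
    return {p for p, up in uptimes.items() if up > threshold_minutes}


# Toy series (assumed data): "b" departs, "c" and "d" arrive later.
snapshots = [
    {"a", "b"},
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "c", "d"},
    {"a", "c", "d"},
]
uptimes = estimate_uptimes(snapshots, interval_minutes=6)
core = stable_core(uptimes, threshold_minutes=20)
```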

Stable Core
• Most peers have a short uptime.
• Other peers have been around for a long time.
• Stable core: a set of peers with uptime higher than a threshold (T).
• Higher threshold => more stable group of peers
[Figure: stable cores for T > 20 h and T > 10 h]

Biased Connectivity
• Hypothesis: long-lived nodes tend to be more connected to other long-lived nodes.
• Rationale: once connected, they stay connected.
  • Long-lived peers have more opportunities to become neighbors.
• To quantify bias in the connectivity of the stable core:
  • Randomize the edges to create a graph without biased connectivity.
  • Compare the edges in the observed stable core with the randomized graph.
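
One way to carry out this comparison is sketched below. The slides do not say how the edges were randomized; the sketch assumes degree-preserving rewiring by repeated double-edge swaps, a common choice for building such a baseline. The graph, the core membership, and the swap and trial counts are all invented for illustration.

```python
import random
from collections import Counter


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def core_edge_count(edges, core):
    """Number of edges with both endpoints inside the core."""
    return sum(1 for u, v in edges if u in core and v in core)


def randomize_edges(edges, swaps, rng):
    """Degree-preserving rewiring by repeated double-edge swaps."""
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    for _ in range(swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: skip to avoid self-loops
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # swap would duplicate an existing edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


# Toy overlay (assumed data): a fully connected core plus a periphery.
observed = [("c1", "c2"), ("c1", "c3"), ("c2", "c3"),
            ("c1", "p1"), ("c2", "p2"), ("c3", "p3"),
            ("p1", "p2"), ("p3", "p4")]
core = {"c1", "c2", "c3"}

rng = random.Random(0)
observed_core = core_edge_count(observed, core)
trials = [core_edge_count(randomize_edges(observed, 100, rng), core)
          for _ in range(200)]
expected_core = sum(trials) / len(trials)
```

The ratio `observed_core / expected_core` then expresses how many more edges the stable core has than a graph with the same degrees but no bias.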

Stable Core Edges
• The stable core has 20% to 40% more edges than the randomized graph.
• The overlay exhibits onion-like biased connectivity: peers are more likely to connect to other peers with the same or higher uptime.
• We examined other properties of the stable core.
• Despite high churn, there is a relatively stable "backbone".

Summary
• Characterizations of the Gnutella overlay based on recent and accurate snapshots.
• Graph properties:
  • The degree distribution in Gnutella is not power law.
  • Gnutella exhibits small world characteristics.
  • Gnutella is resilient.
• Dynamic properties:
  • There is a stable core within the overlay topology.
  • Peer churn causes the stable core to exhibit an onion-like biased connectivity.
  • This effect is likely to occur in other unstructured P2P systems.
• Daniel Stutzbach, Reza Rejaie, Subhabrata Sen, "Characterizing Unstructured Overlay Topologies in Modern P2P File-Sharing Systems", Internet Measurement Conference, Berkeley, 2005

Future Work
• Examining underlying causes of the biased connectivity.
• Exploring long-term trends in overlay properties.
• Characterizing churn
• Characterizing properties of other widely deployed P2P systems
  • Kad (a DHT with more than 1 million users)
  • BitTorrent
• Developing sampling techniques for P2P