150 likes | 162 Views
Learn about a cluster-based architecture for P2P systems focusing on scalable content location, network-aware clustering, and improved query routing for enhanced performance. Explore how CAP design enhances P2P applications through trace analysis, simulations, and real-world deployment assessments.
E N D
Early Measurements of a Cluster-based Architecture for P2P Systems Yinglian Xie Carnegie Mellon University Balachander Krishnamurthy, Jia Wang ATT Labs---Research
Motivation • Peer-to-peer(P2P) applications provide us with a new content service model • End-hosts self organized into an overlay network and share content with each other • For a wide deployment of P2P applications • We need a scalable content location and routing scheme in the application layer • We need to study and understand P2P traffic patterns
Recent Work • Existing approaches for content location • Napster: uses a centralized server • Gnutella: relies on flooding of queries • Recent designs • Distributed indexing schemes based on hash functions • CAN, Chord, Pastry, Tapestry
Our Work • A Cluster-based architecture (CAP) for P2P systems • Example application: distributed search (support keyword searching) • Design: using network-aware clustering • Early measurements of CAP • trace analysis + simulations
CAP System Design • Network-aware clustering • B. Krishnamurthy and J.Wang. On Network-Aware Clustering of Web Clients. In proceedings of ACM Sigcomm, August 2000 • An effective technique to group clients that are topologically close and under common administrative domain • Apply network-aware clustering to P2P applications • An additional level in the hierarchy • Less dynamism • More scalability
Clustering server delegate client CAP Architecture • Three entities • Clustering server • Delegate • Client • Two operations • Node join and node leave • Query lookup
Inter-cluster Routing • Each query has a maximum search depth • Each delegate keeps a neighbor list • Assigned randomly when the delegate joins the network • Updated gradually based on application requirements • Depth-first search among neighbors
CAP Evaluation • Collect Gnutella traces, apply network-aware clustering in trace data analysis • To examine the potential advantage of using network-aware clustering • Trace-driven simulations • Measure CAP system performance based on real deployment (ongoing work)
Collecting Gnutella Trace • A modified open source Gnutella client (gnut) to passively monitor and log all Gnutella messages Table 1 Traces with unlimited connections Table 2 Traces with limited connections
Cluster Distribution • CMU trace • 5/24/2001 – 5/25/2001, 799,386 IP addresses, 45,129 clusters • Clustering helps reduce query latency by caching repeated queries
Client and Cluster Distribution along Time • Network-aware clustering helps reduce dynamism in the P2P network
Simulation • Trace-driven simulation • Use Gnutella trace to generate “join, leave, search” • Assume the query distribution follows the file distribution • Performance metrics • Hit rate • Overhead • Search Latency
Hit Rate • Use CMU trace • 1,000 node stationary network • 311 clusters • 4,615search messages • 3,793 unique files
Overhead and Search Latency • Overhead • Messages per search, forward operations per delegate • In Gnutella, overhead grows exponentially • In CAP, overhead grows linearly • Search Latency • Application level hop length • In CAP, search path length is short
Summary • CAP is promising to increase stability and scalability of distributed applications Ongoing work: We are implementing CAP, deploying it in machines around the world, and measuring the performance