Large-Scale Monitoring of DHT Traffic
Ghulam Memon – University of Oregon
Reza Rejaie – University of Oregon
Yang Guo – Corporate Research, Thomson
Daniel Stutzbach – Stutzbach Enterprises
International Workshop on Peer-to-Peer Systems (IPTPS) 2009, Boston, MA
Introduction
• Distributed Hash Tables (DHTs) provide a scalable approach to distributed content management, e.g. file sharing.
• DHTs have been an active area of research since 2001.
• DHTs have recently been deployed in real-world applications, e.g. Kad, Azureus, Mojito.
• Characterizing traffic in widely deployed DHTs allows us to:
  • Identify opportunities for performance improvement.
  • Detect anomalous behavior.
• Accurately capturing traffic in a widely deployed DHT is challenging.
Challenges in Capturing DHT Traffic
• The common approach to capturing DHT traffic is to use instrumented peers as monitors.
• A small number of monitors cannot capture an accurate view of the traffic.
• A large number of monitors is expensive and may change and/or disrupt the DHT, e.g. 8 monitors per peer [Steiner:DBISP2P 2007].
Goal: Capture a representative view of DHT traffic efficiently, without changing and/or disrupting the system.
Classifying DHT Traffic
[Figure: peers labeled Pr, Pi, Pi, Pt]
Two types of messages are observed by a peer p:
• Routing traffic: messages that are routed by, but not destined to, peer p.
  • Depends on DHT geometry and peer visibility.
• Destination traffic: messages that are destined to peer p.
  • Demonstrates DHT usage.
We focus on capturing destination traffic.
This paper presents
• Montra, a new approach to efficiently and accurately capture DHT traffic without disrupting the system.
  • Montra should be applicable to most DHTs.
• Validation of Montra over a deployed DHT, Kad.
• Preliminary characterization of Kad traffic.
Montra: Key Idea
• Real-world DHTs add redundancy to cope with churn:
  • Each file is published at multiple peers.
  • A search operation identifies multiple peers.
• If monitor peer Pm is the closest peer to the target peer Pt, Pm will observe all the destination traffic of Pt.
Montra: Key Idea
[Figure: ID space with target Pt at 0xe, monitor Pm at 0xf, and request originator Pr; the routing table shown includes entries 0x8, 0xe, and 0xf]
• The request originator (Pr) searches for the destination of content ID 0xe.
• Node 0xe (Pt) is closest to the requested ID 0xe.
• Monitor 0xf (Pm) captures the request.
• Placing one monitor per peer will provide an accurate view of traffic.
• How can the impact on the system be avoided or minimized?
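The closest-peer idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: Kad is Kademlia-based, so closeness is taken here to be the XOR of two IDs; the 4-bit IDs come from the slide, while the peer list and the helper name `closest_peers` are hypothetical.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance between two IDs (assumed metric)."""
    return a ^ b


def closest_peers(target_id: int, peer_ids: list[int], k: int = 2) -> list[int]:
    """Return the k peers closest to target_id, nearest first."""
    return sorted(peer_ids, key=lambda p: xor_distance(p, target_id))[:k]


# Illustrative 4-bit ID space; 0xe plays the target Pt and 0xf the monitor Pm.
peers = [0x0, 0x1, 0x8, 0x9, 0xE, 0xF]
nearest = closest_peers(0xE, peers, k=2)
print([hex(p) for p in nearest])  # ['0xe', '0xf']
```

Because a search returns several of the closest peers, a monitor placed at 0xf is the next-closest node to 0xe and therefore sees requests destined for 0xe.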
Montra: Minimally Visible Monitors (MVMs)
[Figure: ID space showing requests from originators Pr to target Pt, and the request/response exchange involving monitor Pm]
• To minimize disruption to the system, we use Minimally Visible Monitors (MVMs).
• MVMs are visible only to (i.e. exchange messages only with) their target peer.
• Deploying a large number of MVMs causes minimal or no disruption to the system.
• Each MVM slightly changes the routing table of its target peer.
Montra - MVMs: Identifying Destination Peers
[Figure: zone 0xa containing peers 0xa8, 0xa9, 0xac, 0xad, 0xae, and 0xaf, with monitors Pm]
• In the presence of churn and packet loss, a single peer (or MVM) cannot reliably identify its destination traffic.
  • Closer peers may exist.
  • Requires a regional view of traffic.
• We monitor all peers in a contiguous zone of the ID space, e.g. the 4-bit zone 0xa.
  • Periodically crawl to detect all the peers in the zone.
  • All the captured requests within a zone have a destination in that zone.
• Destination peers are identified during post-processing.
  • For a given captured request, find the closest monitored peer.
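A minimal sketch of the zone test and the post-processing step described on this slide, assuming a toy 8-bit ID space (real Kad IDs are 128 bits) and XOR closeness. The function names `in_zone` and `assign_destinations` and the request IDs are hypothetical; the peer IDs are the ones shown in the figure.

```python
ID_BITS = 8  # toy ID width for illustration; real Kad IDs are 128 bits


def in_zone(node_id: int, prefix: int, prefix_bits: int, id_bits: int = ID_BITS) -> bool:
    """True if the top prefix_bits of node_id equal the zone prefix."""
    return (node_id >> (id_bits - prefix_bits)) == prefix


def assign_destinations(requests: list[int], monitored: list[int]) -> list[tuple[int, int]]:
    """Pair each captured request ID with the closest monitored peer (XOR metric assumed)."""
    return [(req, min(monitored, key=lambda p: p ^ req)) for req in requests]


# Peers discovered by a crawl of the 4-bit zone 0xa (IDs taken from the slide).
monitored = [0xA8, 0xA9, 0xAC, 0xAD, 0xAE, 0xAF]
# Hypothetical captured request IDs; 0xb0 lies outside the zone and is dropped.
captured = [r for r in [0xAB, 0xA1, 0xAE, 0xB0] if in_zone(r, 0xA, 4)]

for req, peer in assign_destinations(captured, monitored):
    print(hex(req), "->", hex(peer))
```

Doing the assignment offline, over the full set of peers seen in the zone, is what gives the regional view that a single monitor lacks.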
Validation
• We quantify the accuracy of Montra from two angles, using the Kad network:
  • Content accuracy: what fraction of the destination traffic per zone is captured?
  • Peer accuracy: how accurately does Montra determine destination peers?
• Validation methodology:
  • Instrumented source
  • Instrumented destination
Validation: Instrumented Source
• Use an instrumented Kad client to send requests for random IDs in a zone (instrumented source).
• Log all requests and their destinations.
• Monitor the same zone using Montra.
• Compare the source and monitor logs to determine content and peer accuracy.
• Uses a synthetic workload, but the requests are distributed over the entire zone.
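The log comparison can be sketched as follows. The slides do not give exact formulas, so the definitions here are assumptions: content accuracy is taken as the fraction of source-logged requests that also appear in the monitor log, and peer accuracy as the fraction of those captured requests for which both logs name the same destination peer. The log layout (request ID mapped to destination peer ID) and all values are hypothetical.

```python
def accuracy(source_log: dict[str, int], monitor_log: dict[str, int]) -> tuple[float, float]:
    """Return (content_accuracy, peer_accuracy) under the assumed definitions.

    Both logs map a request ID to the destination peer recorded for it.
    """
    captured = [req for req in source_log if req in monitor_log]
    content = len(captured) / len(source_log) if source_log else 0.0
    matched = sum(1 for req in captured if source_log[req] == monitor_log[req])
    peer = matched / len(captured) if captured else 0.0
    return content, peer


# Hypothetical logs: four requests sent, three captured, two attributed correctly.
source = {"req1": 0xA8, "req2": 0xA9, "req3": 0xAC, "req4": 0xAD}
monitor = {"req1": 0xA8, "req2": 0xA8, "req3": 0xAC}

content, peer = accuracy(source, monitor)
print(f"content accuracy = {content:.2f}, peer accuracy = {peer:.2f}")
```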
Validation: Instrumented Destination
• Use an instrumented Kad client to passively observe and log requests (instrumented destination).
• Monitor the same zone simultaneously.
• Compare the destination and monitor logs, using some heuristics.
• Uses a real-world workload, but the requests are localized to the instrumented destination.
Validation: Results
[Figures: Content Accuracy; Peer Accuracy]
• Zone size decreases as the zone prefix length increases.
• Both figures show similar results.
• Instrumented source: zones larger than a 6-bit zone (i.e. prefixes of 5 bits or fewer) show degraded accuracy.
  • The time taken to crawl a zone of 5 bits or fewer hinders prompt addition of MVMs.
• Instrumented destination: zone size has minimal impact on accuracy.
  • MVMs are promptly added around the instrumented destination.
Kad Characterization: Publish Request Rate
[Figures: Keywords; Files]
• How does the request rate vary across different zones?
• The heavily skewed behavior is consistent across zones.
  • Each zone has some hot keywords and files.
• The publish rate for keywords is higher than for files.
  • Many common names occur in filenames.
• See the paper for more results.
Kad Characterization: Relation Between Published and Searched Content
[Figures: Files; Keywords]
• What is the balance between supply and demand for a file?
  • Balance = Pub. / (Sear. + Pub.)
• 15% of files are searched but never published.
  • Newly popular files that are not yet widely available.
• 60% of files are published but never searched.
  • Popular files from the past that are highly available.
• 95% of keywords are published but never searched.
  • A very small pool of keywords is actually used.
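The balance metric on this slide is simple enough to work through directly. In this sketch the function name `balance` and the request counts are hypothetical; the formula is the one given above. A balance of 0 means an item was searched but never published, and a balance of 1 means it was published but never searched.

```python
from typing import Optional


def balance(published: int, searched: int) -> Optional[float]:
    """Balance = Pub. / (Sear. + Pub.); None if the item was never requested."""
    total = published + searched
    if total == 0:
        return None
    return published / total


# Hypothetical per-file request counts: (publish requests, search requests).
examples = {
    "searched, never published": (0, 5),
    "published, never searched": (7, 0),
    "both": (3, 1),
}
for label, (pub, sear) in examples.items():
    print(label, balance(pub, sear))
```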
Conclusion
• Montra is a new technique for capturing DHT traffic accurately and efficiently without disrupting the system.
• Montra’s accuracy was validated over the Kad network.
• We presented an initial characterization of traffic in Kad.
• Ongoing work:
  • Further evaluation of Montra over other DHTs, e.g. Azureus, Mojito.
  • Further analysis of captured traffic in Kad and other DHTs.
  • Exploring other uses of Montra, e.g. detecting botnet C&C.
Kad Characterization: Search Request Rate
[Figures: Keywords; Files]
• Search-file and search-keyword requests have the lowest range of request rates.
  • Demonstrates user behavior.
• User behavior for keyword searches differs across zones.
  • Some zones have more popular keywords.
• User behavior for file searches is consistent across zones.