Analysis and Design of Algorithms for Peer-to-Peer Networks Moritz Steiner Thesis Defense Ernst Biersack Wolfgang Effelsberg
overlay edge Overlay networks
More about overlays • Unstructured overlays • No constraints on the overlay topology or data placement • Query – flood or random walk • Structured overlays (Distributed Hash Tables) • Constraints both on topology and data placement • log(N) hops, log(N) neighbors • Efficient support for exact match query
Analysis and Measurements of Real-World Peer-to-Peer Networks
What is a DHT? • A distributed database for publishing and searching information • Consists of many peers, each one is responsible for storing part of the database • What is a key: unique identifier • Key = hash(IP@), or • Key = hash(string) • Each peer and each object is identified by its key • How to partition content of DB • Use key of the object to decide on which peer to store information
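The partitioning idea on this slide can be sketched in a few lines. This is a minimal illustration, not KAD's actual code: MD5 stands in for whatever hash function the real system uses, the address strings are made up, and the rule "store on the peer whose key is closest under XOR" is an assumption borrowed from the distance metric introduced later in the talk.

```python
import hashlib

def key_of(text: str) -> int:
    """Hash a string (an address or a title) to a 128-bit integer key.
    MD5 is only a stand-in for the real hash function (assumption)."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")

def responsible_peer(object_key: int, peer_keys: list[int]) -> int:
    """Return the peer key closest to the object key under the XOR metric."""
    return min(peer_keys, key=lambda p: p ^ object_key)

# Six hypothetical peers, identified by the hash of their (made-up) addresses.
peers = [key_of(f"10.0.0.{i}") for i in range(1, 7)]

# Publisher and client both hash the same string, so both reach the same peer.
owner_for_publish = responsible_peer(key_of("title"), peers)
owner_for_lookup = responsible_peer(key_of("title"), peers)
```

Because the key depends only on the hashed string, publishing and lookup agree on the responsible peer without any central directory.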
What is a DHT? (cont’) N2 N3 N1 Internet Publish(hash(“title”)) ? Client Lookup(hash(“title”)) Publisher N4 N6 N5 • Important issues • How to partition key space • How to route • How to maintain information under churn
KAD • Study KAD which is a distributed index, running on many peers • Why interesting? • Only real-world “production” Distributed Hash Table • Very popular • eMule, aMule, (Azureus) • Permanent KAD id • Possible to track the peer behavior
Routing Table • Peer identifier: 128 bit string • Distance metric: bitwise XOR XOR - Distance from our peer 1* 01* 001* 000* 2-Buckets
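A small sketch of the XOR metric and of the distance prefixes shown in the figure (1*, 01*, 001*, ...). The function names are hypothetical, and the mapping from shared prefix length to bucket label is an illustration of the figure, not a description of the KAD client's data structures.

```python
ID_BITS = 128  # peer identifiers are 128-bit strings

def xor_distance(a: int, b: int) -> int:
    """Distance between two identifiers: bitwise XOR, read as an integer."""
    return a ^ b

def common_prefix_len(a: int, b: int) -> int:
    """Number of leading bits that a and b share (128 if a == b)."""
    return ID_BITS - xor_distance(a, b).bit_length()

def distance_prefix(shared_bits: int) -> str:
    """Prefix of the XOR distance for a peer sharing `shared_bits` leading bits."""
    return "0" * shared_bits + "1*"

me = 0
far_peer = 1 << 127    # differs from `me` in the very first bit
near_peer = 1 << 125   # shares the first two bits with `me`

far_label = distance_prefix(common_prefix_len(me, far_peer))
near_label = distance_prefix(common_prefix_len(me, near_peer))
```

Peers that differ in the first bit make up half of the identifier space, those with distance prefix 01* a quarter, and so on.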
KAD Architecture Lookup module is used by both the Publishing and the Retrieval module
Iterative Lookup • KAD uses iterative routing • Source is responsible for entire lookup process • At each step, source sends lookup request to the next hop and waits for reply • Advantages of iterative routing • Lookup messages cannot be lost • Iterative routing is easier to debug Recursive Routing
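The control flow of an iterative lookup can be sketched as follows. This is a simplified toy: the network is a plain dictionary, identifiers are 4 bits long, and only one peer is queried per step, whereas the real client sends requests in parallel. All names are hypothetical.

```python
def iterative_lookup(network: dict[int, list[int]], first_hop: int, target: int) -> list[int]:
    """Walk toward `target`; the source drives every step itself.

    `network` maps a peer id to the contacts it returns when asked.
    """
    current = first_hop
    path = [current]
    while True:
        # The source sends the request to `current` and waits for its reply.
        contacts = network[current]
        best = min(contacts, key=lambda p: p ^ target, default=current)
        if (best ^ target) >= (current ^ target):
            return path  # nobody closer was learned: the lookup ends here
        current = best
        path.append(current)

# Toy network with 4-bit identifiers (made-up topology).
toy_network = {
    0b0000: [0b1000, 0b0100],
    0b1000: [0b1100, 0b1010],
    0b1100: [0b1110],
    0b1110: [0b1111],
    0b1111: [],
    0b0100: [],
    0b1010: [],
}
lookup_path = iterative_lookup(toy_network, 0b0000, 0b1111)
```

Every reply comes back to the source, so the source always knows how far the lookup has progressed, which is what makes this style easier to debug than recursive forwarding.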
KAD Architecture Note: Lookup module is used by both the Publishing and the Search module
Publish: How and Where • Where to store the information for a given kID? • On 10 nodes whose first 8 bits are the same as those of the kID (the kID zone) • A zone defined by the first 8 bits is 1/256 of the entire key space, and contains several thousand peers • How to find a key later (contact several thousand peers?) • This high replication ensures that the key is found despite node churn
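A short sketch of the zone test and of the "several thousand peers" estimate. The peer totals of 1.5 and 4.5 million are the crawl figures reported later in the talk; assuming that identifiers are spread uniformly over the key space is an assumption of this example.

```python
ID_BITS = 128
ZONE_BITS = 8  # a zone is defined by the first 8 bits

def zone_of(kad_id: int) -> int:
    """Return the first 8 bits of a 128-bit identifier."""
    return kad_id >> (ID_BITS - ZONE_BITS)

def in_same_zone(a: int, b: int) -> bool:
    """True if both identifiers share their first 8 bits."""
    return zone_of(a) == zone_of(b)

def expected_peers_per_zone(total_peers: int) -> float:
    """Average zone population, assuming uniformly distributed identifiers."""
    return total_peers / 2**ZONE_BITS

low = expected_peers_per_zone(1_500_000)   # about 5,900 peers
high = expected_peers_per_zone(4_500_000)  # about 17,600 peers
```

With ten replicas among several thousand candidate peers, a later search must still land on one of the ten, which is the question the slide raises.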
Publishing and Retrieval • Iterative lookup (only 3 hops to get to target) • High redundancy (and overhead) to cope with churn in • Routing table • Parallel lookup • Publishing • Theory vs. Practice • Main issue is not number of hops, but • How to assure persistence under churn
User Behavior in KAD • Where are they? • For how long do they stay connected? • Do they come back? • Are there regional differences?
Challenge: The Full Peer Crawl • Our method: Full crawl to take a complete snapshot of all peers in KAD at a given instant • Contacts 1.5 million to 4.5 million peers • Takes 8-11 minutes • Saturates a 100 Mbit/sec link at Uni Mannheim • 8 GBytes of traffic • Carried out once a day for over a year Versatile Tool: KAD, Overnet, BitTorrent (Azureus), and the Storm Worm
The Full Peer Crawl • Our Approach • Single machine • Main memory • Stateless • Unsynchronized queries • Traditional Approach • Cluster of computers • Centralized database • Stateful client • Synchronized queries • Crucial: Synchronization between the machines
Discover the Peers • Functionality • Query seed peer for contacts using “route requests” • Breadth First Search to explore the full graph • Stop when no new peers are discovered
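The three steps above amount to a breadth-first search over the contact graph. The sketch below runs on a toy in-memory graph; `route_request` is a hypothetical callback standing in for the network query, and the loop is synchronous for readability, unlike the unsynchronized queries of the actual crawler.

```python
from collections import deque
from typing import Callable, Hashable, Iterable

def crawl(seed: Hashable, route_request: Callable[[Hashable], Iterable[Hashable]]) -> set:
    """Breadth-first discovery: ask every known peer for its contacts."""
    discovered = {seed}
    queue = deque([seed])
    while queue:
        peer = queue.popleft()
        for contact in route_request(peer):
            if contact not in discovered:
                discovered.add(contact)
                queue.append(contact)
    # The queue is empty only when no query returned an unseen peer.
    return discovered

# Toy contact graph (made up); "z" is not reachable from the seed "a".
toy_graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": [], "z": ["a"]}
snapshot = crawl("a", lambda peer: toy_graph.get(peer, []))
```

The crawl ends by itself once the queue drains, which matches the stop condition "no new peers are discovered".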
Diurnal Pattern 21:30 Beijing 21:00 Madrid, Paris, Rome
Total ~ 2500 China ~ 1500 New Peers • About 700,000 new KAD IDs join KAD every day for the first time. 260 million new peers/year THIS IS HARD TO BELIEVE! New peers: peers seen for the first time on day x in one zone
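The arithmetic behind these figures can be checked quickly. The sketch assumes that "one zone" means an 8-bit zone, that is 1/256 of the key space; the slide itself does not state the zone size.

```python
NEW_IDS_PER_ZONE_PER_DAY = 2_500  # "Total ~ 2500" read off the plot
ZONES = 2**8                      # assumption: 8-bit zones
NEW_IDS_PER_DAY = 700_000         # figure quoted on the slide

# Extrapolating one zone to the whole key space.
extrapolated_per_day = NEW_IDS_PER_ZONE_PER_DAY * ZONES  # 640,000

# Extrapolating the daily figure to one year.
new_ids_per_year = NEW_IDS_PER_DAY * 365                 # 255,500,000
```

Both results are in line with the rounded values on the slide (about 700,000 per day and about 260 million per year), which is why the number looks implausible if every new KAD ID were a new user.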
Session Length Weibull distribution provides a very good fit of session length distribution: Predicts the stability of a peer
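Why a Weibull fit allows predicting stability can be shown with its survival function. The shape and scale values below are hypothetical, chosen only so that the shape is below 1 (the heavy-tailed case); they are not the parameters fitted in the thesis.

```python
import math

def survival(t: float, shape: float, scale: float) -> float:
    """P(session length > t) for a Weibull distribution."""
    return math.exp(-((t / scale) ** shape))

def prob_stays(elapsed: float, extra: float, shape: float, scale: float) -> float:
    """P(peer stays `extra` more minutes | it has already stayed `elapsed`)."""
    return survival(elapsed + extra, shape, scale) / survival(elapsed, shape, scale)

SHAPE, SCALE = 0.5, 100.0  # hypothetical parameters, time in minutes

p_fresh = prob_stays(10, 60, SHAPE, SCALE)     # peer online for 10 minutes
p_veteran = prob_stays(600, 60, SHAPE, SCALE)  # peer online for 10 hours
```

With a shape below 1, the longer a peer has already been online, the more likely it is to stay another hour, so the observed uptime is a usable predictor.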
Crawl Conclusions • China, Europe, rest of the world • Chinese peers are distinct: they are connected for less time • KAD ID aliasing • KAD IDs are not persistent, as was assumed before • Peers come back over and over again • Mean lifetime greater than 7 months • Core of stable peers with extremely long session times • Up to 78 days • Session times are heavy tailed (Weibull distributed) • Possible to predict the future behavior Developed the only crawler (to date) for the full KAD network
Content in KAD • What content is shared? • Movies? Music? Legal material? • What keywords are popular? • How much control traffic is generated?
The Content Spy • How to spy on part of the hash space called I ? • Introduce a large number of spy peers that have KAD ids in I • How many spy peers? • Scalability of spy • All the spy peers are running on a single PC • To reduce the memory requirements, no state is kept 00…00 11…11
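One way to place spy peers inside a zone I is to fix the zone prefix and vary the remaining bits. The sketch spaces the identifiers evenly across the zone; that placement rule and the count of 256 are assumptions made for illustration, since the slide does not say how the spy IDs are chosen.

```python
ID_BITS = 128

def spy_ids(prefix: int, prefix_bits: int, count: int) -> list[int]:
    """Return `count` identifiers evenly spaced inside the zone with this prefix."""
    suffix_bits = ID_BITS - prefix_bits
    step = 2**suffix_bits // count
    base = prefix << suffix_bits
    return [base | (i * step) for i in range(count)]

# 256 hypothetical spies in the 8-bit zone e3.
spies = spy_ids(0xe3, 8, 256)
```

Since the identifiers are computed rather than stored per peer, generating them fits the "no state is kept" design mentioned above.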
Spying: Control Traffic • Spied on the 8-bit zone <e3> during 12 hours • Search • Messages 561,542 • Traffic 10.8 MBytes • Publish • Messages 5,549,183 • Traffic 966 MBytes • Route • Messages 9,761,278 • Traffic 342 MBytes x10 x100
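The x10 and x100 annotations can be reproduced from the numbers on the slide. The snippet only divides the reported counts; the dictionary layout and function name are illustrative.

```python
# Figures as reported for zone <e3> over 12 hours.
traffic = {
    "search":  {"messages": 561_542,   "mbytes": 10.8},
    "publish": {"messages": 5_549_183, "mbytes": 966.0},
    "route":   {"messages": 9_761_278, "mbytes": 342.0},
}

def ratio(kind: str, field: str, baseline: str = "search") -> float:
    """How many times larger `kind` is than the baseline for one field."""
    return traffic[kind][field] / traffic[baseline][field]

publish_vs_search_messages = ratio("publish", "messages")  # roughly 10
publish_vs_search_bytes = ratio("publish", "mbytes")       # roughly 90
```

Publishing generates about ten times as many messages as searching and roughly ninety times the bytes, which the slides round to a factor of 100.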
Spying: Keyword Popularity in zone <e3> The most popular words are so-called Stop Words
Spy Conclusion • Findings • Interesting methodology for spying • Publish traffic 100 times larger than search traffic • Large content base with more than 80 million files • Improvements • Don’t publish stop words • Modify re-publish frequency to increase time until next republish: • Reduces publish traffic by factor of 10
Contributions • Measurement Methodologies • The fastest crawler to date for the KAD network • First to crawl the entire KAD network • Content Spy • Instrumented client • Proposed Improvements • Publishing Overhead • Security • Content Retrieval
An Augmented Delaunay Overlay for Decentralized Virtual Worlds
Local knowledge • Networked Virtual Environment (NVE) based on the Delaunay Triangulation.
Ignoring the physical network • Neighbors in the overlay may be far away in the network topology (and the other way around…)
Goal: minimize the delay penalty • Small World • hops => hops • Topology awareness • Giving priority to nodes close in the physical network Augment the overlay, introduce a second type of neighbor relationship: shortcuts Contribution: Exploring these two approaches together
Shortcuts: How to find? • How to find? • Join procedure • Traveling the virtual world • Message forwarding • Learn from existing shortcuts • How to choose? • Network Proximity Awareness • delay < x ms • Network coordinate system or ping • Complete coverage of the virtual world in order to create a small-world
Shortcuts: How to use? • Greedy Walk • Use shortcuts in priority • Fallback to Delaunay routing • Minimize the remaining distance in the overlay • First using shortcuts • Travel long overlay distances • Travel short underlay distances • Close to the destination, using Delaunay neighbors • Travel short overlay distances • Travel long underlay distances
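The routing rule on this slide can be sketched as a greedy walk that tries shortcuts first and falls back to Delaunay neighbors. The example is a toy: ten nodes on a line stand in for a real Delaunay triangulation, the single shortcut is made up, and the function names are hypothetical.

```python
import math

def next_hop(node, dest, pos, shortcuts, delaunay):
    """Prefer a shortcut that reduces the remaining overlay distance;
    otherwise fall back to the best Delaunay neighbor."""
    def remaining(n):
        return math.dist(pos[n], pos[dest])
    here = remaining(node)
    for candidates in (shortcuts.get(node, []), delaunay.get(node, [])):
        closer = [n for n in candidates if remaining(n) < here]
        if closer:
            return min(closer, key=remaining)
    return None  # no neighbor is closer than the current node

def greedy_walk(src, dest, pos, shortcuts, delaunay):
    """Follow next_hop until the destination is reached or progress stops."""
    path = [src]
    while path[-1] != dest:
        hop = next_hop(path[-1], dest, pos, shortcuts, delaunay)
        if hop is None:
            break
        path.append(hop)
    return path

# Toy overlay: nodes 0..9 on a line, each linked to its direct neighbors.
pos = {i: (float(i), 0.0) for i in range(10)}
delaunay = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}

with_shortcut = greedy_walk(0, 9, pos, {0: [6]}, delaunay)  # [0, 6, 7, 8, 9]
without_shortcut = greedy_walk(0, 9, pos, {}, delaunay)     # 0, 1, ..., 9
```

The walk always terminates because every hop strictly reduces the remaining distance; the shortcut covers the long overlay stretch, and the Delaunay neighbors finish the last few hops.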
Simulation setup • Overlay based on the Delaunay Triangulation • Gt-itm Network Topology Generator • 2 Tier • A fraction of nodes with one neighbor are chosen to participate in the overlay • Random assignment between the nodes (underlay) and the peers (overlay)
Results: Intuition Path in the overlay Path in the underlay With Shortcuts Without Shortcuts
Results: Shortcut coverage Random distribution Clustered distribution
Results: Delay Polynomial -> logarithmic increase
Conclusion • Nodes that are close in the underlay may be far away in the overlay • Reducing average number of hops and delay by augmenting the overlay in very simple way • Short in the underlay • Long in the overlay • Approach is not limited to Delaunay based overlays
Contributions • Distributed algorithms for the construction and maintenance of a peer-to-peer network based on a Delaunay Triangulation (in n-dimensional spaces) • Dynamic and distributed clustering of peers • Augmenting the triangulation with (a few) shortcuts to reduce the delay penalty
Thank You for Your Attention! Questions!