Analysis and Design of Algorithms for Peer-to-Peer Networks

Moritz Steiner • Thesis Defense • Ernst Biersack • Wolfgang Effelsberg

Overlay networks • Figure label: overlay edge

More about overlays • Unstructured overlays • No constraints on the overlay topology or data placement • Query – flood or random walk • Structured overlays (Distributed Hash Tables) • Constraints both on topology and data placement • log(N) hops, log(N) neighbors • Efficient support for exact match query

Analysis and Measurements of Real-World Peer-to-Peer Networks

What is a DHT? • A distributed database for publishing and searching information • Consists of many peers, each of which is responsible for storing part of the database • What is a key? A unique identifier • Key = hash(IP address), or • Key = hash(string) • Each peer and each object is identified by its key • How to partition the content of the database? • Use the key of the object to decide on which peer to store the information

What is a DHT? (cont’d) • Figure: a publisher sends Publish(hash(“title”)) and a client sends Lookup(hash(“title”)) across the Internet to peers N1 to N6 • Important issues • How to partition key space • How to route • How to maintain information under churn

KAD • Study KAD, a distributed index running on many peers • Why is it interesting? • The only real-world “production” Distributed Hash Table • Very popular • eMule, aMule, (Azureus) • Permanent KAD ID • Possible to track peer behavior

KAD Architecture

Routing Table • Peer identifier: 128-bit string • Distance metric: bitwise XOR • Figure: routing tree ordered by XOR distance from our peer, with prefixes 1*, 01*, 001*, 000* • Figure label: 2-Buckets
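The XOR metric on this slide can be illustrated with a minimal Python sketch. It assumes identifiers are held as plain integers; the function names and the `depth` parameter are hypothetical, and the real KAD routing table has more structure than this.

```python
ID_BITS = 128  # KAD peer identifiers are 128-bit strings


def xor_distance(a: int, b: int) -> int:
    """Bitwise XOR distance between two identifiers held as Python ints."""
    return a ^ b


def bucket_label(own_id: int, other_id: int, depth: int = 3) -> str:
    """Leading bits of the XOR distance, written like the figure's prefixes.

    `depth` is a hypothetical cap on how many leading zero bits are told apart.
    """
    d = xor_distance(own_id, other_id)
    leading_zeros = ID_BITS - d.bit_length()
    if leading_zeros < depth:
        return "0" * leading_zeros + "1*"
    return "0" * depth + "*"


me = 0
for other in (1 << 127, 1 << 126, 1 << 125, 1 << 100):
    print(bucket_label(me, other))
```

A peer whose identifier differs from ours in the first bit falls under 1* (the far half of the key space), while peers sharing a longer prefix with us fall under 01*, 001*, and so on.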

KAD Architecture • The Lookup module is used by both the Publishing and the Retrieval modules

Iterative Lookup • KAD uses iterative routing • Source is responsible for the entire lookup process • At each step, the source sends a lookup request to the next hop and waits for the reply • Advantages of iterative routing • Lookup messages cannot be lost • Iterative routing is easier to debug • Figure label: Recursive Routing
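A minimal sketch of an iterative lookup, assuming a toy network with 4-bit identifiers. The callback `contacts_of` stands in for a real request/reply exchange and is hypothetical; the sketch asks one peer at a time, whereas KAD also uses parallel lookups.

```python
def iterative_lookup(target, start, contacts_of):
    """The source drives every step: ask the closest known peer, wait, repeat."""
    queried = []     # peers the source has already asked, in order
    known = {start}  # every peer the source has heard about so far
    while True:
        candidates = known.difference(queried)
        if not candidates:
            break
        next_hop = min(candidates, key=lambda p: p ^ target)
        if queried:
            best = min(queried, key=lambda p: p ^ target)
            if (next_hop ^ target) >= (best ^ target):
                break  # nobody left who is closer than a peer already asked
        queried.append(next_hop)
        known.update(contacts_of(next_hop))  # the reply carries new contacts
    return min(queried, key=lambda p: p ^ target), queried


# Toy routing tables (hypothetical): peer id -> contacts it returns.
TOY_TABLES = {
    0b0001: [0b1000, 0b0100],
    0b0100: [0b0001],
    0b1000: [0b1100],
    0b1100: [0b1110, 0b1111],
    0b1110: [0b1111],
    0b1111: [0b1110],
}

closest, hops = iterative_lookup(0b1111, 0b0001, lambda p: TOY_TABLES.get(p, []))
print(closest, hops)
```

Because every reply returns to the source, a missing reply is noticed at the source, which can then pick another candidate.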

KAD Architecture • Note: the Lookup module is used by both the Publishing and the Search modules

Publish: How and Where • Where to store the information for a given kID? • On 10 nodes whose first 8 bits are the same as those of the kID (the kID zone) • A zone defined by the first 8 bits is 1/256 of the entire key space and contains several thousand peers • How to find a key later (contact several thousand peers?) • This high replication ensures that the key is found despite node churn
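A minimal sketch of the zone idea, assuming identifiers are 128-bit integers. Choosing the 10 in-zone peers closest to the kID by XOR is an assumption made here for illustration; the slide only states that the 10 nodes share the kID's first 8 bits. All names are hypothetical.

```python
ID_BITS = 128
ZONE_BITS = 8  # a zone is fixed by the first 8 bits: 1/256 of the key space


def zone(kad_id: int) -> int:
    """The first 8 bits of a 128-bit identifier."""
    return kad_id >> (ID_BITS - ZONE_BITS)


def publish_targets(kid: int, peers, replicas: int = 10):
    """Pick up to `replicas` peers from the kID's zone (closest by XOR first)."""
    in_zone = [p for p in peers if zone(p) == zone(kid)]
    return sorted(in_zone, key=lambda p: p ^ kid)[:replicas]


# Toy population (hypothetical): 20 peers in each of the 256 zones.
peers = [(z << (ID_BITS - ZONE_BITS)) | i for z in range(256) for i in range(20)]
kid = (0xE3 << (ID_BITS - ZONE_BITS)) | 5

targets = publish_targets(kid, peers)
print(len(targets), hex(zone(targets[0])))
```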

Publishing and Retrieval • Iterative lookup (only 3 hops to get to target) • High redundancy (and overhead) to cope with churn in • Routing table • Parallel lookup • Publishing • Theory vs. Practice • Main issue is not number of hops, but • How to assure persistence under churn

User Behavior in KAD • Where are they? • For how long do they stay connected? • Do they come back? • Are there regional differences?

Challenge: The Full Peer Crawl • Our method: a full crawl that takes a complete snapshot of all peers in KAD at a given instant • Contacts 1.5 million to 4.5 million peers • Takes 8-11 minutes • Saturates a 100 Mbit/s link at Uni Mannheim • 8 GBytes of traffic • Carried out once a day for over a year • Versatile tool: KAD, Overnet, BitTorrent (Azureus), and the Storm Worm
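The crawl figures can be cross-checked with a little arithmetic. This sketch assumes decimal gigabytes and a link used at its full nominal rate; the function name is hypothetical.

```python
LINK_BITS_PER_S = 100e6  # 100 Mbit/s link, from the slide
TRAFFIC_BYTES = 8e9      # 8 GBytes, assuming decimal gigabytes


def transfer_minutes(n_bytes: float, bits_per_s: float) -> float:
    """Time to move n_bytes over a link running at bits_per_s, in minutes."""
    return n_bytes * 8 / bits_per_s / 60


minutes = transfer_minutes(TRAFFIC_BYTES, LINK_BITS_PER_S)
print(round(minutes, 1))  # 10.7
```

Under these assumptions the transfer alone takes a little under 11 minutes, which is consistent with the stated 8-11 minute crawl on a saturated link.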

The Full Peer Crawl • Our Approach • Single machine • Main memory • Stateless • Unsynchronized queries • Traditional Approach • Cluster of computers • Centralized database • Stateful client • Synchronized queries • Crucial: synchronization between the machines

Discover the Peers • Functionality • Query seed peer for contacts using “route requests” • Breadth First Search to explore the full graph • Stop when no new peers are discovered
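The discovery loop can be sketched as a breadth-first search. The callback `route_request` is a hypothetical stand-in for sending route requests to a peer and collecting the contacts it returns; the actual crawler is not shown in the slides, and this sketch keeps the whole discovered set in memory for simplicity.

```python
from collections import deque


def crawl(seed, route_request):
    """Breadth-first exploration starting from a seed peer.

    route_request(peer) -> iterable of contacts that `peer` returns.
    Stops when the queue is empty, i.e. no new peers are discovered.
    """
    discovered = {seed}
    queue = deque([seed])
    while queue:
        peer = queue.popleft()
        for contact in route_request(peer):
            if contact not in discovered:
                discovered.add(contact)
                queue.append(contact)
    return discovered


# Toy overlay (hypothetical): peer -> contacts in its routing table.
toy = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": ["e"], "e": []}
print(sorted(crawl("a", lambda p: toy.get(p, []))))
```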

Diurnal Pattern • Figure annotations: 21:30 Beijing; 21:00 Madrid, Paris, Rome

New Peers • New peers: peers seen for the first time on day x in one zone • Figure annotations: Total ~ 2500, China ~ 1500 • About 700,000 new KAD IDs join KAD every day for the first time • 260 million new peers/year: this is hard to believe!

Session Length • A Weibull distribution provides a very good fit of the session length distribution • Predicts the stability of a peer
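A minimal sketch of how a fitted Weibull distribution could be used to predict stability. The shape and scale values below are made up for illustration and are not the fitted parameters from the thesis; a shape below 1 is assumed because the session times are described as heavy-tailed.

```python
import math


def survival(t: float, shape: float, scale: float) -> float:
    """Weibull survival function: P(session length > t)."""
    return math.exp(-((t / scale) ** shape))


def stays_online(t: float, s: float, shape: float, scale: float) -> float:
    """P(session lasts at least s more minutes | it has already lasted t)."""
    return survival(t + s, shape, scale) / survival(t, shape, scale)


SHAPE, SCALE = 0.5, 100.0  # hypothetical parameters, minutes

young = stays_online(10, 60, SHAPE, SCALE)    # peer online for 10 minutes
old = stays_online(1000, 60, SHAPE, SCALE)    # peer online for 1000 minutes
print(young < old)
```

With a shape parameter below 1, the longer a peer has been online, the more likely it is to stay online for another hour, which is what makes the fit useful for prediction.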

Crawl Conclusions • China, Europe, rest of the world • Peers in China are distinct: they are connected for less time • KAD ID aliasing • KAD IDs are not persistent, contrary to what was assumed before • Peers come back over and over again • Mean lifetime greater than 7 months • Core of stable peers with extremely long session times • Up to 78 days • Session times are heavy-tailed (Weibull distributed) • Possible to predict future behavior • Developed the only crawler to date for the full KAD network

Content in KAD • What content is shared? • Movies? Music? Legal material? • What keywords are popular? • How much control traffic is generated?

The Content Spy • How to spy on a part of the hash space called I? • Introduce a large number of spy peers that have KAD IDs in I • How many spy peers? • Scalability of the spy • All the spy peers run on a single PC • To reduce the memory requirements, no state is kept • Figure: the hash space, from 00…00 to 11…11

Spying: Control Traffic • Spied on the 8-bit zone <e3> for 12 hours • Search: 561,542 messages, 10.8 MBytes of traffic • Publish: 5,549,183 messages, 966 MBytes of traffic • Route: 9,761,278 messages, 342 MBytes of traffic • Figure annotations: x10, x100
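The ratios between publish and search traffic can be recomputed from the numbers on this slide. The dictionary layout and names are hypothetical; the figures are those reported for zone <e3>.

```python
# Message counts and traffic volumes for 12 hours in zone <e3>, from the slide.
traffic = {
    "search": {"messages": 561_542, "mbytes": 10.8},
    "publish": {"messages": 5_549_183, "mbytes": 966.0},
    "route": {"messages": 9_761_278, "mbytes": 342.0},
}


def ratio(kind_a: str, kind_b: str, field: str) -> float:
    """How many times larger kind_a is than kind_b for the given field."""
    return traffic[kind_a][field] / traffic[kind_b][field]


msg_ratio = ratio("publish", "search", "messages")
volume_ratio = ratio("publish", "search", "mbytes")
print(round(msg_ratio, 1), round(volume_ratio, 1))
```

Publishing produces roughly 10 times as many messages as searching and roughly 90 times as many bytes, in line with the x10 and x100 annotations.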

Spying: Keyword Popularity in zone <e3> • The most popular words are so-called stop words

Spy Conclusion • Findings • Interesting methodology for spying • Publish traffic is 100 times larger than search traffic • Large content base with more than 80 million files • Improvements • Don’t publish stop words • Lower the re-publish frequency, i.e. increase the time until the next republish • Reduces publish traffic by a factor of 10

Contributions • Measurement Methodologies • The fastest crawler to date for the KAD network • First to crawl the entire KAD network • Content Spy • Instrumented client • Proposed Improvements • Publishing Overhead • Security • Content Retrieval

An Augmented Delaunay Overlay for Decentralized Virtual Worlds

Local knowledge • Networked Virtual Environment (NVE) based on the Delaunay Triangulation.

Ignoring the physical network • Neighbors in the overlay may be far away in the network topology (and the other way around…)

Goal: minimize the delay penalty • Small World • hops => hops • Topology awareness • Giving priority to nodes close in the physical network • Augment the overlay: introduce a second type of neighbor relationship, shortcuts • Contribution: exploring these two approaches together

Shortcuts: How to find? • How to find? • Join procedure • Traveling the virtual world • Message forwarding • Learn from existing shortcuts • How to choose? • Network Proximity Awareness • delay < x ms • Network coordinate system or ping • Complete coverage of the virtual world in order to create a small-world

Shortcuts: How to use? • Greedy Walk • Use shortcuts with priority • Fall back to Delaunay routing • Minimize the remaining distance in the overlay • First, use shortcuts • Travel long overlay distances • Travel short underlay distances • Close to the destination, use Delaunay neighbors • Travel short overlay distances • Travel long underlay distances
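A minimal sketch of the greedy walk, under one reading of the rules above: at each hop, prefer a shortcut that reduces the remaining overlay distance, and fall back to the Delaunay neighbors otherwise. The toy neighbor lists are written by hand rather than computed from a triangulation, and all names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two points of the virtual world."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def greedy_walk(src, dst, pos, delaunay, shortcuts):
    """Route from src to dst, trying shortcuts before Delaunay neighbors."""
    path, cur = [src], src
    while cur != dst:
        remaining = dist(pos[cur], pos[dst])
        nxt = None
        # Shortcuts are tried first; Delaunay neighbors are the fallback.
        for candidates in (shortcuts.get(cur, ()), delaunay.get(cur, ())):
            closer = [n for n in candidates if dist(pos[n], pos[dst]) < remaining]
            if closer:
                nxt = min(closer, key=lambda n: dist(pos[n], pos[dst]))
                break
        if nxt is None:
            raise RuntimeError("no neighbor is closer to the destination")
        path.append(nxt)
        cur = nxt
    return path


# Toy world (hypothetical): six peers on a line, one shortcut from 0 to 4.
pos = {i: (float(i), 0.0) for i in range(6)}
delaunay = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
shortcuts = {0: [4]}

print(greedy_walk(0, 5, pos, delaunay, shortcuts))
print(greedy_walk(0, 5, pos, delaunay, {}))
```

Each hop strictly reduces the remaining overlay distance, so the walk always terminates; the shortcut covers the long overlay distance in one hop and the Delaunay neighbors finish the route near the destination.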

Simulation setup • Overlay based on the Delaunay Triangulation • GT-ITM network topology generator • 2-tier topology • A fraction of the nodes with one neighbor is chosen to participate in the overlay • Random assignment between the nodes (underlay) and the peers (overlay)

Results: Intuition • Figure panels: path in the overlay and path in the underlay, with shortcuts and without shortcuts

Results: Shortcut coverage • Figure panels: random distribution, clustered distribution

Results: Delay • Polynomial -> logarithmic increase

Results: Delay distribution

Conclusion • Nodes that are close in the underlay may be far away in the overlay • Reducing the average number of hops and the delay by augmenting the overlay in a very simple way • Shortcuts: short in the underlay, long in the overlay • Approach is not limited to Delaunay-based overlays

Contributions • Distributed algorithms for the construction and maintenance of a peer-to-peer network based on a Delaunay Triangulation (in n-dimensional spaces) • Dynamic and distributed clustering of peers • Augmenting the triangulation with (a few) shortcuts to reduce the delay penalty

Thank You for Your Attention! Questions?
