Explore GIA properties for scalable decentralized networks like Gnutella: dynamic topology adaptation, active flow control, one-hop replication, and biased random walk search mechanism.
Making Gnutella-like P2P Systems Scalable (Yatin Chawathe, Sylvia Ratnasamy, Lee Breslau, Nick Lanham, Scott Shenker)
P2P Group Meeting (ICS/FORTH), Monday, 21 February 2005
What is it all about?
GIA*: a collection of properties that aim to increase scalability and system capacity in decentralized P2P networks similar to Gnutella.
*Gia is short for gianduia, the generic name for the hazelnut spread Nutella.
GIA Properties
Dynamic topology adaptation algorithm
Active flow control, based on tokens
One-hop replication of pointers to content
Search mechanism based on biased random walks
(1) Topology Adaptation
An algorithm that builds a specific topology for peer connections.
The goal is a network topology in which high-capacity peers have high degrees (a more stable presence in the system) and low-capacity peers are within short reach of high-capacity ones.
Capacity is transmitted during the handshake phase and in PONG packets.
Capacity is based only on a peer's available bandwidth and on its lifetime in the network.
Capacity is normalized by the peer's degree.
How does it work?
Let S ∈ [0,1] be a peer's satisfaction level. S grows linearly from 0 (dissatisfied) to 1 (fully satisfied).
Each peer tries to increase its S by picking peers from its cache and connecting to them.
Let X, Y, Z be arbitrary peers in the network.
X chooses to connect to Y, which ideally has higher capacity than X.
Y welcomes X if Y has fewer than max_nbrs - H neighbors (H = 5); otherwise Y is free to decide.
X may drop an already established connection in favor of Y.
X always drops its highest-degree neighbor Z, the one that has the least to lose from losing X.
Z is dropped only if Y has at least H fewer neighbors than Z.
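The accept/drop rules on this slide can be sketched in a few lines of Python. This is only a minimal sketch: the Peer record, the function names, and the way ties are broken are assumptions made for illustration, and only H = 5 and the two stated conditions come from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional

H = 5  # hysteresis constant quoted on the slide


@dataclass
class Peer:
    """Hypothetical peer record used only for this illustration."""
    name: str
    capacity: float
    degree: int  # current number of neighbors


def accepts_without_drop(y: Peer, max_nbrs: int) -> bool:
    """Y welcomes a newcomer outright while it has fewer than max_nbrs - H neighbors."""
    return y.degree < max_nbrs - H


def neighbor_to_drop(neighbors: List[Peer], y: Peer) -> Optional[Peer]:
    """Return the neighbor X would drop in favor of Y, or None to keep them all.

    The candidate Z is X's highest-degree neighbor; Z is dropped only if
    Y has at least H fewer neighbors than Z.
    """
    if not neighbors:
        return None
    z = max(neighbors, key=lambda p: p.degree)
    return z if y.degree <= z.degree - H else None
```

With neighbors of degree 3 and 20, a candidate Y of degree 15 displaces the degree-20 neighbor, while a Y of degree 16 does not.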
Capacity-Degree Contribution
Consider peers X, Y, Z, where Y and Z both have capacity C.
Let Dy and Dz be the degrees of Y and Z respectively, and suppose Dz < Dy.
Then Z contributes more to X's S than Y does.
A peer with capacity C will forward approximately C queries per unit time at full load, and it needs enough outgoing capacity across all of its neighbors to handle that load.
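One way to make the capacity-degree contribution concrete is shown here. It assumes that a neighbor contributes its capacity divided by its degree, and that S is the sum of those contributions divided by the peer's own capacity, capped at 1. The slides only state that capacity is normalized by degree, so the exact formula and the function names are assumptions.

```python
from typing import Iterable, Tuple


def contribution(capacity: float, degree: int) -> float:
    """Share of a neighbor's capacity available to one of its `degree` neighbors."""
    return capacity / degree


def satisfaction(own_capacity: float, neighbors: Iterable[Tuple[float, int]]) -> float:
    """Assumed satisfaction level S in [0, 1].

    `neighbors` is an iterable of (capacity, degree) pairs.
    """
    total = sum(contribution(c, d) for c, d in neighbors)
    return min(1.0, total / own_capacity)
```

Under this reading, two neighbors with the same capacity C = 100 but degrees 4 and 10 contribute 25 and 10 respectively, which matches the slide's claim that the lower-degree peer Z contributes more.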
Why?
A topology that keeps low-capacity peers within short range of high-capacity (high-degree) peers optimizes random-walk based search.
High-capacity peers are better candidates to handle a query successfully, since they have the needed bandwidth.
Do not confuse Ultrapeers with Gia's topology adaptation algorithm. The latter is not binary!
(2) Flow Control
Gia's peers periodically send query-acceptance tokens to their neighbors.
Peer X can forward a search query to peer Y only if X has received a token from Y.
Gia's flow control eliminates query packet dropping, which is vital because Gia uses random-walk based search.
Token assignment depends on each peer's capacity. A Start-time Fair Queuing (SFQ) implementation is used.
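The token rule can be illustrated with a small ledger. This sketch does not implement SFQ: allocate_tokens uses a plain proportional split by neighbor capacity as a stand-in, and all class and function names are hypothetical.

```python
from typing import Dict


class TokenLedger:
    """Tokens this peer has received from each neighbor (illustrative only)."""

    def __init__(self) -> None:
        self._tokens: Dict[str, int] = {}

    def receive(self, neighbor: str, count: int = 1) -> None:
        """Record `count` query-acceptance tokens sent by `neighbor`."""
        self._tokens[neighbor] = self._tokens.get(neighbor, 0) + count

    def forward(self, neighbor: str) -> bool:
        """Spend one token to forward a query; refuse if none is held."""
        if self._tokens.get(neighbor, 0) <= 0:
            return False
        self._tokens[neighbor] -= 1
        return True


def allocate_tokens(budget: int, neighbor_capacity: Dict[str, float]) -> Dict[str, int]:
    """Split `budget` tokens in proportion to capacity, rounding down.

    A simple stand-in for the SFQ scheduler named on the slide.
    """
    total = sum(neighbor_capacity.values())
    if total <= 0:
        return {n: 0 for n in neighbor_capacity}
    return {n: int(budget * c / total) for n, c in neighbor_capacity.items()}
```

Because a query is never sent without a token, the receiver has already agreed to accept it, which is why no query needs to be dropped.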
(3) One-hop Replication
Each peer maintains pointers to its neighbors' content (exchanged when a new connection is established).
Special care must be taken to keep the pointers up to date.
There are some thoughts about extending this property to one-hop replication of the content itself.
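A pointer index of this kind might look like the following. It assumes that content is identified by keywords and that neighbors push their full keyword set on connect and on every change. The class, its methods, and the SELF marker are hypothetical.

```python
from typing import Dict, Iterable, List, Set

SELF = "<self>"  # hypothetical marker for locally stored content


class OneHopIndex:
    """Keyword pointers for a peer's own content and its neighbors' content."""

    def __init__(self, own_keywords: Iterable[str]) -> None:
        self.own: Set[str] = set(own_keywords)
        self.neighbor_index: Dict[str, Set[str]] = {}

    def on_connect(self, neighbor: str, keywords: Iterable[str]) -> None:
        """Store the pointers exchanged when a connection is established."""
        self.neighbor_index[neighbor] = set(keywords)

    def on_update(self, neighbor: str, keywords: Iterable[str]) -> None:
        """Replace a neighbor's pointers so that they stay up to date."""
        self.neighbor_index[neighbor] = set(keywords)

    def on_disconnect(self, neighbor: str) -> None:
        """Forget pointers for a neighbor that went away."""
        self.neighbor_index.pop(neighbor, None)

    def lookup(self, keyword: str) -> List[str]:
        """Return the holders of `keyword` known to this peer, sorted by name."""
        holders = [n for n, kws in self.neighbor_index.items() if keyword in kws]
        if keyword in self.own:
            holders.append(SELF)
        return sorted(holders)
```

A peer can then answer a query on behalf of all its neighbors, which is what makes reaching a high-degree peer so valuable during a random walk.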
(4) Search Protocol
Gia's search protocol is a biased random walk: forward a query to the highest-capacity peer for which you have flow-control tokens available.
Each query packet has a MAX_RESPONSES field, with TTL-like behavior.
Returned Query Hits are treated as an implicit keep-alive for the query.
An empty Query Hit packet is returned explicitly when the query reaches the final peer.
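The forwarding choice described on this slide reduces to a filtered maximum. The sketch below assumes that capacities and token counts are kept in plain dictionaries keyed by neighbor name. The query_finished helper reflects one reading of the MAX_RESPONSES/TTL behavior and is an assumption, as are all names.

```python
from typing import Dict, Iterable, Optional


def next_hop(
    capacity: Dict[str, float],
    tokens: Dict[str, int],
    exclude: Iterable[str] = (),
) -> Optional[str]:
    """Pick the highest-capacity neighbor for which a token is available.

    Returns None when no neighbor can currently accept the query.
    """
    skipped = set(exclude)
    candidates = [n for n in capacity if tokens.get(n, 0) > 0 and n not in skipped]
    if not candidates:
        return None
    return max(candidates, key=lambda n: capacity[n])


def query_finished(ttl: int, responses_found: int, max_responses: int) -> bool:
    """Assumed stop rule: the walk ends when the TTL runs out or enough hits were found."""
    return ttl <= 0 or responses_found >= max_responses
```

Note that the highest-capacity neighbor overall is skipped whenever no token from it is held, so flow control and search bias interact directly.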
Simulation
Gia is compared with:
FLOOD: the original Gnutella model.
RWRT: searching using random walks over random topologies.
SUPER: searching in Ultrapeer-compliant networks.
Model – Capacity Distribution
Based on a Gnutella report (Saroiu et al.).
Model – Query rate
Let Ci be the capacity of peer i, representing the number of messages that it can process per unit time.
Let qi be the number of queries that peer i generates per unit time.
All peers have the same q, bounded by their capacity.
Assume incoming/outgoing buffer queues of infinite length.
Model – Other properties
Queries are keyword based.
The replication factor is constant in each simulation run (an rf of 1% implies that a query produces a hit at 1% of the peers in the system).
Topology/TTL
Gia: initial random graph, but with topology adaptation. TTL = 1024. min_nbrs = 3, max_nbrs = min(max_nbrs, Capacity/min_alloc), min_alloc = 4, max_nbrs = 128.
RWRT: random graph. TTL = 1024. Average degree = 8.
FLOOD: random graph. TTL = 10. Average degree = 8.
SUPER: random graph for supernodes. Ordinary nodes connect randomly to one supernode. TTL = 10.
Performance Metrics
Success Rate: the fraction of queries issued that successfully located the desired content.
Hop-count: the number of hops required to locate the desired content.
Delay: the time taken by a query from start to finish.
Please see Figure 2.
Final Metrics
Collapse Point (CP): the per-node query rate at the knee, which we define as the point beyond which the success rate drops below 90%. This metric reflects total system capacity.
Hop-count before collapse (CP-HC): the average hop-count prior to collapse.
Ideal case: a system with high success rate and low hop-count/delay.
There is no delay metric, since delay is effectively captured by the collapse point.
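Given measured (query rate, success rate) pairs, the collapse point as defined on this slide could be extracted as follows. This is a minimal sketch that assumes the samples are discrete and that the CP is taken as the highest sampled rate before the success rate first falls below the threshold. The function name and the sample layout are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def collapse_point(
    samples: Iterable[Tuple[float, float]],
    threshold: float = 0.9,
) -> Optional[float]:
    """Return the last per-node query rate whose success rate is still >= threshold.

    `samples` holds (query_rate, success_rate) pairs in any order.
    Returns None if even the lowest sampled rate is already below the threshold.
    """
    cp = None
    for rate, success in sorted(samples):
        if success < threshold:
            break  # past the knee: success rate has collapsed
        cp = rate
    return cp
```

The resolution of the result is limited by how densely the query rates were sampled around the knee.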
Performance Comparison
Single search result (i.e. MAX_RESPONSES = 1). Please see Figures 3 and 4.
System capacity in Gia is 3 to 5 orders of magnitude higher than in FLOOD/RWRT.
Gia copes better than SUPER.
Performance Comparison
Multiple search results (i.e. MAX_RESPONSES >= 1). Please see Tables 2 and 3.
A query for k responses (MAX_RESPONSES = k) at a replication factor of r is equivalent to a query for a single response at a replication factor of r/k.
Removing/Adding Components
Please see Table 4.
The properties do not contribute linearly to the performance achieved by their combination.
Removing the one-hop replication (OHR) property from Gia drops the CP markedly, whereas adding the property to RWRT does not change the CP markedly.
Other Performance Issues
Heterogeneity is vital for Gia. Please see Table 5.
Let MAXLIFETIME = 10 time units, i.e. 20% of the peers reset in every time unit (a typical Gnutella peer has a lifetime of 60 minutes).
Performance drops by less than an order of magnitude even at high churn rates. Please see Figures 5 and 6.
Technical Details
Let T be the maximum interval between adaptation operations.
Let K be an integer representing the aggressiveness of topology adaptation.
Adaptation interval: I = T × K^(-(1-S)).
If S < 1.0, topology adaptation is performed every I seconds.
If S = 1.0, a check of S is performed every T seconds.
Please see Figures 8 and 9.
For the PlanetLab experiment, T = 10 secs.
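The adaptation interval is easy to tabulate, reading the formula as I = T × K^(-(1-S)). The sketch below takes T = 10 seconds from the PlanetLab setting mentioned on the slide. K is left as a required argument because the slide gives no value for it, and the function name is hypothetical.

```python
def adaptation_interval(s: float, k: int, t: float = 10.0) -> float:
    """Seconds until the next topology adaptation for satisfaction level `s`.

    Computes I = T * K ** -(1 - S): a fully satisfied peer (S = 1) waits the
    maximum interval T, and a fully dissatisfied peer (S = 0) waits T / K.
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("satisfaction level must lie in [0, 1]")
    return t * k ** -(1.0 - s)
```

For example, with an illustrative K = 4 the interval shrinks from 10 seconds at S = 1 to 5 seconds at S = 0.5 and 2.5 seconds at S = 0, so a larger K makes dissatisfied peers adapt more aggressively.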
Remarks, IMHO
☺ Most of the extensions are algorithmic in nature.
☺ Smart use of the ad hoc heterogeneity of open decentralized networks.
☹ Some parts of the extensions need packet modification (MAX_RESPONSES, capacity/degree transmission).
☹ Topology adaptation might be a heavy operation in the real world (thousands of peers, real TCP/IP handshakes, etc.).
☹ Everyone must play by the rules.
☹ Possible race during the handshake phase.
Elias Athanasopoulos elathan@ics.forth.gr http://null.edunet.uoa.gr/~elathan/ ((TTL % 2) == 0) ? broadcast() : rndwalk(); Thank you for your time! :-)