On Improving the Performance Dependability of Unstructured P2P Systems via Replication. ANIRBAN MONDAL, YI LIFU, MASARU KITSUREGAWA. Institute of Industrial Science, University of Tokyo. E-mail: anirban@tkl.iis.u-tokyo.ac.jp
PRESENTATION OUTLINE Introduction Related Work System Overview Proposed Replication Scheme Performance Evaluation Conclusion and Future Work
INTRODUCTION • P2P systems are becoming increasingly popular • A dependable P2P system is the need of the hour • Two perspectives of dependability • system reliability • the availability of the individual peers • system performance • data availability • We define a performance-dependable P2P system as one that users can rely on to obtain the data files of their interest in real time. • We focus on improving the performance-dependability of unstructured P2P systems via dynamic replication.
Motivation • Free-riders • A majority of the peers typically download data from a small percentage of peers that offer data • High skews in the initial data distribution • A disproportionately high number of queries need to be answered by a few ‘hot’ peers • Severe load imbalance throughout the system • Job queues of the ‘hot’ peers keep increasing • Increased waiting times lead to high response times • This decreases the dependability of the system.
The Challenges • Sheer size of P2P networks • Heterogeneity • CPU capacity • Available disk space • Transfer rate of connections • Dynamism of the environment • Peers joining / leaving the system • Hot data becoming cold and vice versa
MAIN CONTRIBUTIONS • A dynamic data placement strategy involving data replication • Objective: to reduce the loads of the overloaded peers • A dynamic query redirection technique • Objective: to reduce response times
RELATED WORK • Broadcast (Gnutella) • Centralized (Napster) • Routing indices [Crespo2002] • Distributed hash tables • Chord [Stoica2001] • Pastry [Rowstron2001]
RELATED WORK (CONT.) • [Kangasharju2002] • investigates optimal replication of content in P2P systems • adaptive, fully distributed algorithm that dynamically replicates content in a near-optimal manner • [Cohen2002, Lv2002] facilitate search via replication. • Dependability via load-balancing in structured P2P systems (using DHTs) • [Dabek2001] • [Rao2003] • [Triantafillou2003] • divides system into clusters based on semantic categories • discusses dependability via inter-cluster and intra-cluster load-balancing
How does this proposal differ from our previous spatial GRID proposal? • Our GRID-related work • Imposes structure on the system • Data movement in the KB range • Data scattering is avoided • Individual nodes are usually dedicated and expected to be available most of the time • Main aim is load-balancing • This proposal • No structure imposed • Data movement in the MB/GB range • Data scattering is acceptable • Individual nodes may join/leave at any time • Main aim is replication, not load-balancing
SYSTEM OVERVIEW • Each peer is assigned a globally unique identifier PID • Broadcast-based search • Every peer maintains its own access statistics • Number of accesses made to each of its data files • List of peers which have downloaded each of its files • Given that very ‘hot’ files may be aggressively downloaded by hundreds of peers very quickly, a peer keeps track only of those peers which have downloaded directly from it. • Every peer provides a certain amount Space of its disk space for replication. • LRU scheme deployed for Space • Periodic deletion of unused replicas • We sacrifice replica consistency to improve query response times.
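The LRU-managed replica space described on this slide could be sketched as follows. This is a minimal illustration only: the class name `ReplicaSpace`, its methods, and the choice of bytes as the unit are assumptions, not part of the presentation.

```python
from collections import OrderedDict


class ReplicaSpace:
    """Hypothetical LRU-managed replica store; all sizes are in bytes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        # Maps file name -> size, ordered from least to most recently used.
        self.replicas = OrderedDict()

    def access(self, name):
        """Mark a replica as recently used; return True if it is stored here."""
        if name not in self.replicas:
            return False
        self.replicas.move_to_end(name)
        return True

    def store(self, name, size):
        """Store a replica, evicting least recently used replicas if needed.

        Returns the list of evicted file names.
        """
        if size > self.capacity:
            raise ValueError("replica is larger than the whole replica space")
        evicted = []
        if name in self.replicas:
            # Re-storing an existing replica: release its old space first.
            self.used -= self.replicas.pop(name)
        while self.used + size > self.capacity:
            old_name, old_size = self.replicas.popitem(last=False)
            self.used -= old_size
            evicted.append(old_name)
        self.replicas[name] = size
        self.used += size
        return evicted
```

Usage: `ReplicaSpace(100)` followed by `store("a", 40)`, `store("b", 40)`, `access("a")` and `store("c", 40)` evicts `"b"`, because `"a"` was touched more recently.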
SYSTEM OVERVIEW (CONT.) • Distance between two peers: communication time between them • Two peers are regarded as neighbours if they are directly connected to each other. • Periodic exchange of status messages between neighbours • Load information • Available disk space information
SYSTEM OVERVIEW (CONT.) • Load of a peer: number of queries waiting in peer’s job queue • Load normalized w.r.t. CPU capacity • Assumptions • Peers know transfer rates between themselves and other peers. • Every peer knows availability information of its neighbouring peers.
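The slide does not state how the load is normalized. One plausible reading, shown here purely as an assumption, is to divide the job-queue length by the peer's CPU capacity relative to a reference machine; the function name and the `cpu_capacity` parameter are hypothetical.

```python
def normalized_load(queue_length, cpu_capacity):
    """Queue length divided by relative CPU capacity (assumed normalization).

    cpu_capacity is a hypothetical relative measure: 1.0 for a reference
    machine, 2.0 for a machine twice as fast, and so on.
    """
    if cpu_capacity <= 0:
        raise ValueError("cpu_capacity must be positive")
    return queue_length / cpu_capacity
```

Under this reading, a peer with 30 queued queries and twice the reference capacity carries a normalized load of 15, comparable to a reference peer with 15 queued queries.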
Replication Scheme • Each peer P periodically checks its neighbours’ loads • If P’s load exceeds the average load of its neighbouring peers by 10%, replication is initiated. • Selection of hot data files • Using recent access statistics • P sorts its files in descending order of access frequency • P traverses this sorted list of data files and selects as ‘hot’ files the top N files whose access frequency exceeds a pre-defined threshold Tfreq. • Number of replicas • For every Nd accesses to a data file D, a new replica of D is created. • Tfreq and Nd are pre-specified at design time.
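The three decisions on this slide (when to replicate, which files are hot, how many replicas) can be sketched as below. This is an illustration under assumptions: "exceeds by 10%" is read as a relative margin over the neighbours' average, and all function and parameter names are hypothetical.

```python
def should_replicate(own_load, neighbour_loads, margin=0.10):
    """True if own_load exceeds the neighbours' average load by more than margin."""
    if not neighbour_loads:
        return False
    average = sum(neighbour_loads) / len(neighbour_loads)
    return own_load > average * (1 + margin)


def select_hot_files(access_counts, t_freq, top_n):
    """Return up to top_n file names whose access count exceeds t_freq, hottest first."""
    ranked = sorted(access_counts.items(), key=lambda item: item[1], reverse=True)
    hot = [name for name, count in ranked if count > t_freq]
    return hot[:top_n]


def replica_count(accesses, n_d):
    """One replica for every n_d accesses to the file (integer division)."""
    return accesses // n_d
```

For example, with access counts `{"a": 50, "b": 5, "c": 120, "d": 30}`, `t_freq=20` and `top_n=2`, the hot files are `"c"` and `"a"`; a file with 250 accesses and `n_d=100` gets two replicas.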
Criteria for Selection of destination peer Dest for replication • Dest should have a high probability of being online. • Dest should have adequate available disk space. • Load difference with Dest should be significant. • Transfer time TRep with Dest should be minimized. • Dest should be chosen from the peers which have already downloaded that data file. • This makes TRep effectively equal to 0.
Replication Strategy • For each ‘hot’ data file D, the ‘hot’ peer PHot sends a message to each peer which has downloaded D • The peers in which a copy of D exists reply to PHot with their respective load and available disk space • Only the peers with high availability and sufficient available disk space are candidates • Among these candidate peers, PHot first puts the peer MIN with the lowest load into a set Candidate. • Peers whose normalized load difference with MIN is less than δ are also put into Candidate. • δ is a small integer • The peer in Candidate whose available disk space is maximum is selected as the destination peer.
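The selection steps above can be sketched as follows. The `PeerStatus` record, the `min_availability` threshold and the byte-based space check are assumptions introduced for illustration; the slide only says that candidates need high availability and sufficient disk space.

```python
from dataclasses import dataclass


@dataclass
class PeerStatus:
    """Hypothetical reply from a peer that already holds a copy of D."""
    pid: str
    load: float          # normalized load
    free_space: int      # available disk space in bytes
    availability: float  # estimated probability of being online, in [0, 1]


def select_destination(replies, file_size, min_availability, delta):
    """Pick the destination peer for a replica, or None if no peer qualifies."""
    # Step 1: keep only highly available peers with enough free space.
    candidates = [p for p in replies
                  if p.availability >= min_availability and p.free_space >= file_size]
    if not candidates:
        return None
    # Step 2: peer MIN has the lowest load; peers within delta of it join Candidate.
    min_load = min(p.load for p in candidates)
    near_min = [p for p in candidates
                if p.load == min_load or p.load - min_load < delta]
    # Step 3: among those, choose the peer with the most available disk space.
    return max(near_min, key=lambda p: p.free_space)
```

Grouping peers within δ of the minimum load lets disk space break near-ties, so a marginally busier peer with far more free space can still be chosen.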
Query Redirection to replicas • What happens when a peer PIssue issues a query Q for a data item D to a ‘hot’ peer PHot? • PHot needs to redirect Q to a peer REDIRECT containing D’s replica, if any such replica exists. • Objective: To minimize Q’s response time • PHot checks the list of peers having D’s replica • Selection criteria for query redirection • REDIRECT should be highly available. • Load difference between PHot and REDIRECT should be significant. • Transfer time between REDIRECT and PIssue should be low.
Query Redirection (Cont.) • The ‘hot’ peer PHot first selects a set of peers • which contain a replica of the data file D • whose load difference with itself exceeds TDiff. • TDiff is a parameter which is application-dependent and subjective. • Among these selected peers, the peer with the maximum transfer rate with the query issuing peer PIssue is selected for query redirection.
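A minimal sketch of this redirection rule follows. The dictionary-based inputs and the function name are hypothetical, the availability criterion from the previous slide is left out for brevity, and returning `None` (meaning PHot answers the query itself) is an assumption about the fallback.

```python
def select_redirect(hot_load, replica_holders, rate_to_issuer, t_diff):
    """Choose the peer to which PHot redirects a query, or None.

    replica_holders: dict mapping peer id -> load of a peer holding D's replica.
    rate_to_issuer:  dict mapping peer id -> transfer rate between that peer
                     and the query-issuing peer PIssue.
    """
    # Keep replica holders that are less loaded than PHot by more than t_diff.
    eligible = [pid for pid, load in replica_holders.items()
                if hot_load - load > t_diff]
    if not eligible:
        return None
    # Among them, prefer the fastest connection to the issuing peer.
    return max(eligible, key=lambda pid: rate_to_issuer[pid])
```

For example, with PHot at load 10, holders `{"r1": 3.0, "r2": 8.0, "r3": 2.0}`, rates `{"r1": 10.0, "r2": 50.0, "r3": 4.0}` and `t_diff=5.0`, peer `"r2"` is excluded despite its fast link because it is nearly as loaded as PHot, and `"r1"` is chosen.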
Performance Evaluation • Investigates the following • Effect of variations in workload skew • Effect of variations in number of peers • Performance metric: • Average Response Time
CONCLUSION AND FUTURE WORK • We have proposed a strategy for enhancing the dependability of P2P systems via dynamic replication. • Our strategy takes free-riders into account. • Our performance evaluation demonstrates the effectiveness of our replication-based strategy. • Future Scope of Work • Dealing with very large data items, e.g. video files • Cost-effective integration into existing P2P systems • Load-balancing