A Step Back: Reflections on P2P Techniques • Indranil Gupta • March 16, 2006 • CS 598IG, SP06
Let’s keep it Short Today • 2 P2P or Not to P2P • Scooped, Again
2 P2P or Not 2 P2P? Mema Roussopoulos, Mary Baker, David S. H. Rosenthal, TJ Giuli, Petros Maniatis, Jeff Mogul • Presented by Kerry
Candidate problems • Internet Routing (RON) • Resource Sharing (PlanetLab) • Cooperative Web Caching • Internet Backup and Corporate Backup • Distributed Digital Libraries • Distributed Monitoring • Ad hoc Routing in Disaster Recovery • Metropolitan-area Cell Phone Forwarding
Ideal P2P properties • Self-organizing • P2P routing • Discovery • Symmetric communication • Peers are approximately equal • Decentralized control • No single point of failure
P2P Networks • [Figures: Gnutella and Usenet network topologies; images from http://www.cybergeography.org/atlas/more_topology.html]
2 P2P or Not 2 P2P? • Budget • Relevance • Trust
Budget Low Effect High • Lowest possible cost per peer, rather than lowest global cost • BitTorrent, Gnutella, Freenet, etc. • SETI@home • Dictates how many peers join • Decides if P2P is viable for the problem • Worries less about performance criticality • Favors centralized approaches, P2P irrelevant • Clusters, high-performance computing
Relevance Low Effect High • Personal data • Private data • Internet backup • Corporate backup • Web caching • Relevance of resources encourages peers to join • “When resource relevance is high, cooperation in a P2P solution evolves naturally” • File sharing • Freenet • Content distribution • Internet routing • BitTorrent • Gnutella • KaZaA
Trust Low Effect High • Encryption • Anonymity • Freenet • OceanStore • Ivy • Timestamping • MojoNation • Mutual trust • Risks • Gnutella • Napster • Overlays • File sharing • Usenet
Rate of Change Low Effect High • Tangler • Freenet • LOCKSS • Timestamping • Content distribution • Usenet • Flash crowds • Churn • Timeliness • Consistency • Internet routing • Online net monitoring
Criticality Low Effect High • Usenet • Content distribution • Offline net study • File sharing • Centralized control • Accountability • Fault tolerance • Ad hoc disaster recovery • Flash crowds • Internet monitoring • Routing
Conclusion • Framework for analyzing P2P applications • Captures constraints and app requirements • Limited budget is motivating factor • Problems with low relevance are inappropriate for P2P • Same as our “Penny Lane” motivation for P2P systems
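To make the framework concrete, here is a minimal sketch that walks the five factors in the order budget, relevance, trust, rate of change, criticality. It is our simplified reading, not the paper's exact decision tree: the boolean inputs, the `Problem`/`Verdict`/`assess` names, and the choice to treat distrust and a high rate of change as caveats while treating high criticality as a veto are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CENTRALIZED = "centralized solution is affordable and preferable"
    P2P_UNSUITABLE = "P2P is inappropriate"
    P2P_SUITABLE = "P2P is a reasonable fit"


@dataclass
class Problem:
    name: str
    limited_budget: bool       # owner cannot afford a centralized deployment
    high_relevance: bool       # participants value the shared resource
    mutual_trust: bool         # peers trust one another
    high_rate_of_change: bool  # churn, flash crowds, timeliness/consistency demands
    high_criticality: bool     # accountability and fault tolerance are essential


def assess(p: Problem) -> tuple[Verdict, list[str]]:
    """Hypothetical simplification of the framework, not the paper's tree."""
    if not p.limited_budget:
        return Verdict.CENTRALIZED, []          # well funded: clusters, HPC
    if not p.high_relevance:
        return Verdict.P2P_UNSUITABLE, []       # peers have little reason to join
    caveats = []
    if not p.mutual_trust:
        caveats.append("distrust: plan for encryption, anonymity, or replication")
    if p.high_rate_of_change:
        caveats.append("high rate of change: plan for churn and consistency")
    if p.high_criticality:
        return Verdict.P2P_UNSUITABLE, caveats  # assumed veto for critical services
    return Verdict.P2P_SUITABLE, caveats


for problem in [
    Problem("Internet backup", True, False, True, False, False),
    Problem("File sharing", True, True, True, False, False),
    Problem("Cluster computing", False, True, True, False, True),
]:
    verdict, notes = assess(problem)
    print(f"{problem.name}: {verdict.value} {notes}")
```

Encoding the ordering explicitly also makes the critique's question easy to explore: swapping two `if` blocks shows which problems change verdict under a different ordering.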
Critique • Strengths • Quantifies application requirements and suitable use cases • Generically describes the suitability of classes of P2P apps • Weaknesses • High churn: is P2P really “inappropriate”, or are most current non-Kelips solutions simply insufficient? • Why can’t P2P systems handle critical applications? It’s a question of developing the right technologies (e.g., real-time ones). • Why is the order of preference budget > relevance > trust > churn > criticality? Why not a different ordering? • Fuzzy requirements are not accounted for • Other requirements: will they evolve as new P2P applications emerge?
Scooped, Again Jonathan Ledlie Jeff Shneidman Margo Seltzer John Huth
Outline • Introduction • Grid Computing • P2P Systems (for both: what they are, goals, manifestations) • Fallacies preventing cooperation • Shared and Disjoint Problems • Conclusions
Introduction • Peer-to-Peer vs. Grid Computing • Overlapping problem domain • P2P focuses on research • Grid is concerned with concrete, tangible solutions • History, repeated – the Web
Introduction – cont. • Current trends • Divergent, parallel development • Duplication of work • Grid: risk of non-optimal solutions • Missing out on P2P’s strong achievements (search and storage scalability, decentralization, anonymity, denial of service prevention) • Cooperation is the key
Grids • What is the Grid? “a type of parallel and distributed system that enables the sharing, selection, and aggregation of resources distributed across multiple administrative domains based on the resources’ availability, capability, performance, cost, and user’s QoS requirements” • Short version: virtualizing computer resources • Large-scale heterogeneous resource sharing (different platforms, hardware/software architectures, and computer languages) • Functional classification: • Computational grids (run batch jobs during idle times) • Data grids
Grid Goals • Design goal: • Solve problems too big for a single supercomputer, but retain the flexibility to work on multiple smaller problems • Self-configuring, self-tuning, self-healing • Allow data sharing and support computation across administrative domains • Standardized programming interface • GGF (Global Grid Forum) • Globus toolkit – the de facto standard for grid middleware
Grid Manifestations • Protocols: • Resource management: • Grid Resource Allocation & Management Protocol (GRAM) • Information services: • Monitoring and Discovery Service (MDS) • Security services: • Grid Security Infrastructure (GSI) • Data movement and management: • Global Access to Secondary Storage (GASS), GridFTP • Tools: • Grid Portal Software (GridPort, OGCE) • Grid Packaging Toolkit • Grid-enabled MPI (MPICH-G2) • Network Weather Service • Condor (CPU cycle scavenging) and Condor-G (job submission) • APIs: • Web Services: Open Grid Services Architecture (OGSA)
P2P • What is P2P? “…a class of applications that take advantage of resources – storage, cycles, content, human presence – available at the edges of the Internet” • Decentralized, non-hierarchical node organization • Inherently untrusted (well…)
P2P Goals • Cost sharing / reduction • Every peer responsible for its own cost • Reduction of file storage costs • Reduction of computation costs • Improved scalability / reliability • Lack of centralization allows new algorithms (CAN, Chord, etc.) to be designed for improved scalability • Resource Aggregation • Every peer lends its own resources to the network • Increased Autonomy • Tasks are performed locally – no central service provider
P2P Goals – cont. • Anonymity / Privacy • Freenet • Dynamism • Nodes enter and leave the system in a transparent way • Ad-hoc communication • Members can join and leave based on their physical location or interests
Summary • Grids: parallel, distributed systems concerned with resource sharing, selection, aggregation; resource availability, capability, performance, cost, and user QoS requirements are considered; self-configuring, self-tuning, self-healing; idle cycle and storage utilization • P2P: distributed systems that take advantage of resources scattered throughout the Internet; decentralized, non-hierarchical node organization; concerned with fault tolerance, scalability, availability, etc.; idle cycle and storage utilization
Summary – cont. • Grid: distributed computation (distributed.net, SETI@home); data production / aggregation • P2P: distributed file sharing (Gnutella, KaZaA); distributed computation (distributed.net); anonymity (Freenet, Publius)
Fallacies preventing cooperation • “The technical problems in Grid systems are different from those in p2p systems” • Usage misconception: Grid for computing problems, P2P for file sharing • Data handling and data production have become important in Grid systems • P2P is used in desktop collaboration and network computation • Open problems in both camps have striking similarities
Fallacies preventing cooperation • “While the technical problems are similar, the architectures (physical topology, bandwidth availability and use, trust model, etc.) demand that the specific solutions be fundamentally different” • Solving common problems by sharing good ideas between the communities • Solutions are application dependent, with special requirements tailored to application needs; however, the technical approaches for solving a particular problem could benefit both communities
Fallacies preventing cooperation • “Grid projects do not have the flexibility to try new algorithms/ideas because they have to get real work done. P2P research is all about this flexibility” • Grid has room for flexible research, too • Testing new applications and protocols • Users willing to adopt different technologies to get the work done
Shared problems • Topology Formation • Node join and neighbor discovery • Work has been done by both groups: • Grid: “On fully decentralized resource discovery in grid environments” • P2P: “Self-organization in p2p systems” • Grid infrastructure is not flexible (hard-coded) • Could benefit from P2P research prototypes
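As an illustration of decentralized node join and neighbor discovery, here is a minimal sketch of a hypothetical gossip-style scheme, loosely in the spirit of peer-sampling protocols. It is not the algorithm from either paper cited above; `Node`, `join`, `gossip_round`, and the `view_size` bound are illustrative names. A newcomer only needs one bootstrap address; periodic push-pull view exchanges then spread knowledge of peers without any hard-coded topology.

```python
import random


class Node:
    """A peer holding a bounded partial view (ids of neighbors it knows)."""

    def __init__(self, node_id: int, view_size: int = 4):
        self.id = node_id
        self.view_size = view_size
        self.view: set[int] = set()

    def merge(self, candidates, rng: random.Random) -> None:
        # Learn new peers (never ourselves), then trim to view_size at random.
        self.view |= {c for c in candidates if c != self.id}
        if len(self.view) > self.view_size:
            self.view = set(rng.sample(sorted(self.view), self.view_size))


def join(network: dict, new_id: int, bootstrap_id: int, rng: random.Random) -> Node:
    """Join via a known bootstrap peer: copy its view and announce ourselves."""
    node = Node(new_id)
    network[new_id] = node
    boot = network[bootstrap_id]
    node.merge(boot.view | {bootstrap_id}, rng)
    boot.merge({new_id}, rng)
    return node


def gossip_round(network: dict, rng: random.Random) -> None:
    """Each peer swaps views with one random neighbor (push-pull)."""
    for node in list(network.values()):
        if not node.view:
            continue
        peer = network[rng.choice(sorted(node.view))]
        mine, theirs = node.view | {node.id}, peer.view | {peer.id}
        node.merge(theirs, rng)
        peer.merge(mine, rng)


rng = random.Random(7)
network = {0: Node(0)}
for i in range(1, 20):
    join(network, i, bootstrap_id=0, rng=rng)
for _ in range(5):
    gossip_round(network, rng)
print({n.id: sorted(n.view) for n in network.values()})
```

Bounding each view keeps per-node state constant as the system grows; the random trimming is a simplifying assumption, and real protocols usually prefer fresher entries when deciding what to drop.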
Shared problems – cont. • Utilization • Resource discovery, data retrieval • P2P hash-based look-up schemes are useful • Resource management / optimization • How to “best” utilize resources in a network • Data replication/caching examined by both communities • Scheduling and handling of contention • P2P focus: bandwidth usage (e.g. Gnutella) • Grid focus: scheduling • Load balancing: break large tasks into distributed smaller ones
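The hash-based look-up idea can be sketched with consistent hashing: node names and resource keys are hashed onto the same identifier ring, and each key belongs to the first node at or after its position. This is a minimal single-process sketch, assuming a small 16-bit ring; it omits the finger tables and multi-hop routing that Chord uses, and the names `Ring`, `ring_id`, `site0`, and `dataset-42` are hypothetical.

```python
import hashlib
from bisect import bisect_left

M = 16  # identifier bits (assumed; Chord uses SHA-1 with much larger identifiers)


def ring_id(name: str) -> int:
    """Map a node name or resource key onto the 2**M identifier ring."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


class Ring:
    def __init__(self, node_names):
        # Nodes sorted by their position on the ring.
        self.nodes = sorted((ring_id(n), n) for n in node_names)
        self.ids = [node_id for node_id, _ in self.nodes]

    def successor(self, key: str) -> str:
        """Return the node responsible for key: the first node at or
        clockwise after ring_id(key), wrapping around past the top."""
        idx = bisect_left(self.ids, ring_id(key))
        return self.nodes[idx % len(self.nodes)][1]


ring = Ring([f"site{i}" for i in range(8)])
print(ring.successor("dataset-42"))
```

Adding a node only moves the keys that now fall to that node; every other key keeps its owner, which is what makes such schemes attractive for resource discovery when sites come and go.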
Shared problems – cont. • Coping with Failure • P2P: lossy storage model (Freenet, Gnutella) • Considerations for Grid adaptability: • Different common loss model • Storage growth (O(petabytes/month)) • Security-related issues • Authenticity: verification of data/computation • Availability: resilience to DoS attacks • Authorization: ACLs
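For the data side of authenticity, one common technique is self-certifying, content-addressed storage: a block is named by the hash of its bytes, so a reader can check it no matter which untrusted peer served it. The sketch below assumes SHA-256 and a hypothetical in-memory `UntrustedStore`; it checks stored data only. Verifying computation results typically needs other techniques, such as running the same work on several nodes and comparing the answers.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Self-certifying name: the SHA-256 digest of the block itself."""
    return hashlib.sha256(data).hexdigest()


class UntrustedStore:
    """Hypothetical stand-in for a peer or remote site we do not trust."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = content_key(data)
        self.blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self.blocks[key]


def verified_get(store: UntrustedStore, key: str) -> bytes:
    """Fetch a block and reject it unless it hashes back to the requested key."""
    data = store.get(key)
    if content_key(data) != key:
        raise ValueError(f"block {key[:12]} failed integrity check")
    return data


store = UntrustedStore()
key = store.put(b"example block")
print(verified_get(store, key))
store.blocks[key] = b"tampered"  # simulate a misbehaving peer
try:
    verified_get(store, key)
except ValueError as err:
    print("rejected:", err)
```

Because the name itself is the checksum, no trusted server has to vouch for the data; the trade-off is that any change to a block produces a new name, so mutable data needs an extra naming layer on top.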
Shared problems – cont. • Maintenance • P2P: essentially no standards or APIs • Efforts by Berkeley BOINC, Google Compute, overlay standardization • Grid: pushes for a standardized API • GGF (Global Grid Forum) • OGSA (Open Grid Services Architecture) • Web services oriented API – Globus as reference implementation
Disjoint Problems • Anonymity • Not really useful for Grid systems, yet
Conclusions • A lot of overlap between the goals and research interests of the two communities • P2P community needs to consider the needs of the Grid users to see how existing research can be applied successfully to Grid problems • Aim for common standards as much as possible
Critique • Since this paper was published (2003), a little convergence has happened, but not as much as these authors or Foster et al. predicted • Will it just take more time? • (Skeptics’ viewpoint) Really? Aren’t P2P and Grid two different areas? • They still have mostly disjoint research communities • Or is that an opportunity for more research?
Have a good Break! Remember – Midterm report (with initial experimental data) is due April 2!