P2P: An Overview Dr. Tony White Carleton University
Outline • Introduction • Evolution of Network Computing • Definitions • The Rise of Edge Computing • Why Peer-to-Peer? What is it? • Applications • Cycle Sharing • Content Delivery • … • Open Problems • Summary
Evolution of Network Computing • Web introduced: • - A common protocol: HTTP • - A common document format: HTML • - A universal client: the browser • Client/server: • - Introduced inequalities • - Required homogeneity
P2P Definition • Peer-to-peer computing is the location and sharing of computer resources and services by direct exchange between servents. • A servent is a peer that can adopt the roles of both server and client when operating.
P2P Definition “P2P is a class of applications that takes advantage of resources -- storage, cycles, content, human presence -- available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, P2P nodes must operate outside the DNS system and have significant or total autonomy from central servers.” Clay Shirky, February 2000
Definitions I • Pure peer-to-peer is completely decentralized and characterized by lack of a central server or central entity; clients make direct contact with one another. • Computational peer-to-peer uses P2P technology to disseminate computational tasks over multiple clients; peers do not have a direct connection to one another.
Definitions II • Datacentric peer-to-peer: information and data reside on systems or devices and become accessible to others when users connect; sometimes called peer-assisted or grid-assisted delivery. Applications include distributed file and content sharing. • Usercentric/hybrid peer-to-peer: clients contact one another via a central server or entity to communicate, share data, or process data; often used in collaboration applications.
What is a P2P network? • It is an overlay network • Peer applications know IP addresses of other peer applications. • Link between two nodes is actually an application-level connection.
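To make the overlay idea concrete, here is a minimal sketch (class and method names are illustrative, not from the talk) of a peer that keeps a table of other peers' IP addresses and treats each application-level TCP connection as an overlay link:

```python
import socket

class OverlayPeer:
    """A node in an overlay network: each overlay 'link' is just an
    application-level TCP connection to another peer's IP address."""

    def __init__(self) -> None:
        # The peer's view of the overlay: addresses of known peers.
        self.neighbours: dict[tuple[str, int], socket.socket] = {}

    def connect(self, host: str, port: int) -> None:
        # Opening an overlay link is an ordinary application-level action;
        # the underlying IP route is invisible to the overlay.
        self.neighbours[(host, port)] = socket.create_connection(
            (host, port), timeout=5)

    def send_all(self, message: bytes) -> None:
        # Overlay links that look distinct up here may share physical links.
        for sock in self.neighbours.values():
            sock.sendall(message)
```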
What matters? • Topology of overlay matters • Where content is stored matters • Search protocol matters • Gnutella results in: • Poor performance • Poor reliability
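The sketch below (method names illustrative; real Gnutella also routes hits back along the reverse query path) shows the flooding search behind those problems: each query fans out to every neighbour until its TTL expires, so one request can touch an exponential number of peers.

```python
def flood_query(peer, query_id, keywords, ttl, seen):
    """Gnutella-style query flooding, simplified to a simulation.

    A query reaches up to O(degree**ttl) peers, which is why
    Gnutella's bandwidth usage and reliability scale poorly."""
    if ttl == 0 or query_id in seen:
        return []                        # drop expired and duplicate queries
    seen.add(query_id)
    hits = peer.local_matches(keywords)  # search locally shared files
    for neighbour in peer.neighbours:
        hits += flood_query(neighbour, query_id, keywords, ttl - 1, seen)
    return hits
```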
The Rise of Edge Computing … • In P2P, clients also are servers, hence are peers. • Driving P2P is the abundance of: • Computing power • Non-volatile storage • Network bandwidth • (This seems to turn thin clients on their heads.) • Sharing from the edge: • Physical Resources: cycles, disk • Information Resources: files, database access • Services: code mobility implied
P2P Enables Complete Access • P2P file swapping is the obvious application • Text, audio, video, executables, … • Searching and sharing • Resources • Information • Information processing capacity • Searches • More current than Google™ • Indexing web logs (blogs, klogs …) • More focused: search within a “peer group”
P2P Enables Complete Access … • Searching and sharing: • Instant messaging • Locate a user quickly, independent of service provider. • Buyers and sellers • P2P auctions – compete with eBay. • Blogging • Sharing of “self”. • Edge-based multi-media streaming: • Web radio • Web TV • Peer shells: • Script complex P2P applications from simpler ones. • Service creation using service composition.
P2P Enables Complete Access … • A New Style of Distributed Computing • P2P applications tolerate peers coming and going. • Result depends on which peers are available. • High availability comes from the probability that some peers are available, not from load-balancing and fail-over schemes (see the sketch below). • Must avoid the “tragedy of the commons”.
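A quick worked example of that availability argument: if each peer is up independently with probability a and an item is replicated on k peers, the item is reachable with probability 1 − (1 − a)^k, so even unreliable peers yield high availability at modest replication.

```python
def availability(a: float, k: int) -> float:
    """Probability that at least one of k replicas is reachable,
    assuming each peer is up independently with probability a."""
    return 1.0 - (1.0 - a) ** k

print(availability(0.3, 10))   # ~0.972: ten flaky replicas already suffice
print(availability(0.3, 20))   # ~0.999
```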
Examples of Early P2P • Some new Internet applications are different: • SETI@home • Instant messaging services (AIM, MS Messenger, …) • P2P applications – no central authority/server. • Napster – quasi-P2P • Gnutella • Freenet • These applications are vertically integrated: • Non-standard protocols • Closed namespaces • Stand alone
Problems I • Topology • Bandwidth usage • Fault tolerance • Search efficiency • Identity • Trust • Anonymity • Security • Authorization • Privacy
Problems II • Namespaces • Community Management • Overlaps traditional enterprise groups • Highly dynamic, user controlled • Firewall traversal • Political • IT loses control of content distribution • No control of information flow! • Legal • DRM
What is needed? • Interoperability (common protocols & standards): • Communication protocols (e.g. JXTA, Jabber, …) • Representation of identity (or not!) • Semantic content (meta-data) • Secure information exchange: • Must be able to guarantee trust within a network • Prevent unauthorized access to network • Policy-based control of information exchange • Ubiquity • Buy-in from large groups of users
Securing Distributed Computations in a Commercial Environment Philippe Golle, Stanford University Stuart Stubblebine, CertCo
Example of a Distributed Computation • 580,000 active participants • 565,800 years of CPU time since 1996 • 26.1 TeraFLOPS
Commercialization: supply • A dozen companies have recruited thousands of participants • $100 million in venture funding in 2000 www.mithral.com www.dcypher.net (with www.processtree.com) www.distributed.net www.entropia.com www.parabon.com www.uniteddevices.com www.popularpower.com www.distributedsciences.com www.datasynapse.com www.juno.com
Commercialization: demand • Super-computing market: $2 billion / year • Computationally intensive parallelizable projects: • Drug design research • Mathematical research • Economic simulations • Digital entertainment
Cheaters! “Fifty percent of the project's resources have been spent dealing with security problems.” “The really hard part has to do with verifying computational results.” David Anderson, SETI@home's director
Cycle Sharing Participants • Trusted supervisor • Maintains a pool of registered participants • Bids for large computations • Divides the computation into tasks that are assigned to participants • Collects the results and distributes payment to the participants • Examples: Distributed.net, Entropia.com, … • Untrusted participants • May range from large companies to individual users • Participants are anonymous (no “real-world” leverage) • Participants may collude; we distinguish between real-world entities (agents) and anonymous participants. • Participants may leave the computation at any time, either temporarily or for good.
Organization • Distribution of tasks • The unit of computation is a task • Assumption: all tasks have the same size and can be run by any participant within the same time bounds. • The supervisor runs a probabilistic algorithm to assign tasks to participants. • The supervisor keeps track of who did what
Security • Definition: a computation is secure if no rational, non-risk-seeking participant ever cheats. • Collusion may occur only before tasks are assigned. • A participant has 3 choices: • Request a computation and do it • Request a computation and NOT do it • Take a leave • Assumption: all errors are malicious
Utility function of an agent • An agent either runs the computation honestly (and is paid α) or cheats and “guesses” the result: undetected cheating pays α + E, while detected cheating costs L • Security condition: (α + E)P − L(1 − P) < 0, where P is the probability that cheating goes undetected • α: payment received per task • E: benefit of defecting (E = eα) • L: cost of getting caught cheating
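A small numeric check of the security condition (parameter values invented for illustration): cheating is rational only when the expected gain (α + E)P outweighs the expected penalty L(1 − P).

```python
def cheating_pays(alpha: float, e: float, L: float, P: float) -> bool:
    """Expected utility of cheating: (alpha + E)*P - L*(1 - P).
    The computation is secure when this is negative for everyone."""
    E = e * alpha                        # benefit of defecting, E = e*alpha
    return (alpha + E) * P - L * (1 - P) > 0

# Payment 1, defection benefit 2, penalty 10:
print(cheating_pays(alpha=1.0, e=2.0, L=10.0, P=0.9))  # True: rarely caught, cheating pays
print(cheating_pays(alpha=1.0, e=2.0, L=10.0, P=0.5))  # False: detection likely enough to deter
```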
Basic scheme • Registration: • Participant performs d+1 unpaid tasks • The supervisor verifies them (at limited cost) • The participant is accepted iff all the results are correct • Assignment of a task: • A task is given to N participants chosen uniformly and independently at random • The number N is chosen according to the scheme's probability distribution (see the sketch below) • Payment: a constant amount α per task if all the results agree • If not, the task is re-assigned to a new set of participants • Severance: a departing participant is paid an amount d·α
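A sketch of the supervisor's assignment step under this scheme (interfaces hypothetical; the actual distribution for N is derived in the Golle-Stubblebine paper and left abstract here):

```python
import random

def assign_task(task, participants, sample_N, alpha):
    """Redundant probabilistic assignment, as in the basic scheme.

    sample_N draws the redundancy N from the scheme's probability
    distribution; agreement of all N results triggers payment."""
    n = sample_N()
    workers = random.sample(participants, n)   # uniform, independent choice
    results = [w.run(task) for w in workers]
    if all(r == results[0] for r in results):
        for w in workers:
            w.pay(alpha)                       # constant payment per task
        return results[0]
    # Disagreement means someone cheated: retry with a fresh random draw.
    return assign_task(task, participants, sample_N, alpha)
```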
Properties • Security condition: (α + E)P − L(1 − P) < 0, i.e., cheating is irrational whenever P < L / (α + E + L) • Computational overhead: the expected number of participants assigned to each task, E[N], which stays close to 1 for “small” p
Content Delivery Networks • Swarmcast/OnionNetworks • File is stored in multiple locations • The idea is to retrieve portions of the file from separate hosts (sketched below): • File is split into small (32 KB) pieces • Requests are random • Space of packets is bigger than the file • Only a subset of the packets is required • The technique is Forward Error Correction • Kazaa/Morpheus • MojoNation (HiveCache) • Distributed backup and restore system
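A sketch of the FEC-based swarming idea (host and decoder interfaces are hypothetical): since any k of the n encoded packets reconstruct the file, the client can pull random packets from whichever hosts respond, with no coordination among them.

```python
import random

def swarm_download(k: int, hosts, decode):
    """Fetch a file encoded into n > k packets with forward error
    correction; any k distinct packets suffice to decode it."""
    packets = {}
    while len(packets) < k:
        host = random.choice(hosts)       # no coordination between hosts
        idx, data = host.random_packet()  # any packet the host happens to hold
        packets[idx] = data               # duplicate indices are simply ignored
    return decode(packets)                # e.g. Reed-Solomon erasure decoding
```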
Privacy Networks: Publius • Publius • Publishers: want to publish anonymously • Servers: host random-looking content • Storage • The publisher takes the key K that is used to encrypt the file and splits it into n shares, such that any k of them can reproduce the original K, but k−1 give no hints as to the key (see the sketch below). • Each server receives the encrypted Publius content and one of the shares. • Retrieval • A retriever must get the encrypted Publius content from some server and k of the shares. • Content is tied to a URL that is used to recover the data and the shares.
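A minimal sketch of the k-of-n key splitting Publius relies on, here implemented with Shamir secret sharing over a prime field (constants illustrative; real code would use the secrets module for randomness):

```python
import random

PRIME = 2**127 - 1   # a Mersenne prime; the key is treated as a field element

def split_key(key: int, n: int, k: int):
    """Split key into n shares so that any k reconstruct it and k-1
    reveal nothing: evaluate a random degree-(k-1) polynomial with
    constant term `key` at the points x = 1..n."""
    coeffs = [key] + [random.randrange(PRIME) for _ in range(k - 1)]
    def poly(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, poly(x)) for x in range(1, n + 1)]   # one share per server

def recover_key(shares):
    """Lagrange interpolation at x = 0 recovers the key from any k shares."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = split_key(123456789, n=5, k=3)
assert recover_key(shares[:3]) == 123456789   # any 3 of the 5 shares work
```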
Privacy Networks: Free Haven • Anonymity for: • Publishers that insert documents, • Readers that retrieve documents, • Servers that store documents. • Uses a free, low-latency, two-way mixnet for forward-anonymous communication. • Accountability: • Reputation and micropayment schemes limit the damage done by servers that misbehave. • Persistence: • The publisher of a document determines its lifetime. • Flexibility: • The system functions smoothly as peers dynamically join or leave.
OceanStore: Toward Global-Scale, Self-Repairing, Secure and Persistent Storage John Kubiatowicz University of California at Berkeley
OceanStore Context: Ubiquitous Computing • Computing everywhere: • Desktop, laptop, palmtop • Cars, cellphones • Shoes? Clothing? Walls? • Connectivity everywhere: • Rapid growth of bandwidth in the interior of the net • Broadband to the home and office • Wireless technologies such as CDMA, satellite, laser • Where is persistent data?
Utility-based Infrastructure? [Figure: a storage federation spanning providers such as Canadian OceanStore, Sprint, AT&T, IBM, and Pac Bell] • Data service provided by storage federation • Cross-administrative domain • Pay for service
OceanStore: Everyone’s Data, One Big Utility “The data is just out there” • How many files in the OceanStore? • Assume 10^10 people in the world • Say 10,000 files/person (very conservative?) • So 10^14 files in OceanStore! • If 1 GB files (ok, a stretch), we get about 1 mole of bytes! Truly impressive number of elements… but small relative to physical constants • Aside: new results: 1.5 exabytes/year (1.5×10^18 bytes)
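The arithmetic behind the slide, checked in a few lines (Avogadro's number is about 6.022 × 10^23):

```python
total_files = 10**10 * 10**4        # 10^10 people x 10,000 files each
total_bytes = total_files * 10**9   # at 1 GB per file: 10^23 bytes

avogadro = 6.022e23
print(total_bytes / avogadro)       # ~0.17: within an order of magnitude of a mole
```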
OceanStore Assumptions • Untrusted Infrastructure: • The OceanStore is composed of untrusted components • Individual hardware has finite lifetimes • All data encrypted within the infrastructure • Responsible Party: • Some organization (e.g., a service provider) guarantees that your data is consistent and durable • Not trusted with the content of data, merely its integrity • Mostly Well-Connected: • Data producers and consumers are connected to a high-bandwidth network most of the time • Exploit multicast for quicker consistency when possible • Promiscuous Caching: • Data may be cached anywhere, anytime
Key Observation:Want Automatic Maintenance • Can’t possibly manage billions of servers by hand! • System should automatically: • Adapt to failure • Exclude malicious elements • Repair itself • Incorporate new elements • System should be secure and private • Encryption, authentication • System should preserve data over the long term (accessible for 1000 years): • Geographic distribution of information • New servers added from time to time • Old servers removed from time to time • Everything just works
Attack Resistant P2P • Content can be compromised by: • Attack by malicious agents • Censorship • Faulty nodes • Remember: • Nodes have finite resources
Gnutella query [Figure: a query flooding from peer to peer across the unstructured overlay]
Morpheus/Kazaa [Figure: super-peer topology, in which ordinary peers attach to a super peer that handles search on their behalf]
Examples • Napster shut down by attacks on its central server • Gnutella spammed by Flatplanet • Removal of a few peers shatters Gnutella (63 of 1,800 peers in the figures)
Performance • After deletion of 2/3 of the peers, 99% of the remainder can still access 99% of the data items
DRN design [Jared Saia] • Topology based upon butterfly network (constant degree version of hypercube) • Each vertex of butterfly called a supernode • Each supernode represents a set of peers • Each peer is in multiple supernodes
DRN Topology • N peers, n supernodes • Each peer participates in C log n randomly chosen supernodes • Supernode X connected to supernode Y means all nodes in X are connected to all nodes in Y
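A sketch of that construction (the constant C and the logarithm base are illustrative): each of the N peers joins C log n randomly chosen supernodes, so every supernode has many members and the topology survives individual peer failures.

```python
import math
import random

def build_drn(num_peers: int, num_supernodes: int, C: int = 2):
    """Assign each peer to C*log(n) randomly chosen supernodes;
    a supernode is represented as the set of its member peers."""
    per_peer = max(1, round(C * math.log2(num_supernodes)))
    supernodes = [set() for _ in range(num_supernodes)]
    for peer in range(num_peers):
        for s in random.sample(range(num_supernodes), per_peer):
            supernodes[s].add(peer)
    return supernodes

# Supernode X "connected to" supernode Y means every peer in X keeps a
# link to every peer in Y, i.e. a complete bipartite graph between them.
```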
Conclusion • P2P systems popular today • LimeWire, Kazaa … • Existing P2P systems vulnerable and inefficient • Many challenges ahead: • Search • Resource Management • Security and Privacy • Lots of good research to be done …
Appendix I Open Problems in P2P Data Sharing
Open Problems in Data Sharing Peer-To-Peer Systems Hector Garcia-Molina ICDT Conference, January 10, 2003 Contributors: Mayank Bawa, Brian Cooper, Arturo Crespo, Neil Daswani, Prasanna Ganesan, Sergio Marti, Qi Sun, Beverly Yang and others
P2P Challenges • Search • Resource Management • Security & Privacy • These are not independent challenges!
Search • Search Options • Query Expressiveness • Comprehensiveness • Topology • Data Placement • Message Routing