This survey paper provides a high-level view of various aspects of Peer-to-Peer (P2P) systems, including distributed hash tables. It discusses the history of P2P systems, major applications, system properties, typologies, challenges, and management issues.
Peer-to-Peer Systems Rodrigo Rodrigues Peter Druschel Max Planck Institute for Software Systems
Paper Overview • Survey paper • Long list of references • Presents a high-level view of various aspects of P2P systems • Describes distributed hash tables in more detail
In the beginning • 1999 • Napster music sharing system • Gave a bad reputation to P2P systems • Freenet anonymous data store • SETI@home volunteer-based distributed computational project
Now • BitTorrent • Skype P2P telephony system • Skinkers enterprise communication management system • PPLive, Coolstreaming, BBC’s iPlayer
What I think • Previous list mentioned commercial products • Does it mean all major issues have been solved?
Email from Skype To our valued customers: As 2010 draws to a close, I would like to take a moment to thank each of you for your patience, understanding, and support during Skype’s recent outage. … Kind regards, Tony Bates, CEO, Skype
Defining properties of P2P systems • High degree of decentralization • Few or no dedicated central nodes • Multiple administrative domains • Low barriers to deployment • Organic growth • Resilience to faults and attacks • Abundance and diversity of resources
What I think • P2P systems are • Very cheap • Very easy to deploy • Highly scalable
Applications (I) • Sharing and distributing files: • Napster (quickly shut down) • Gnutella, FastTrack aka Kazaa (all decentralized) • eDonkey, BitTorrent (faster) • Streaming media: • PPLive, Coolstreaming (academia) • BBC’s iPlayer, Skinkers Livestation (industry)
Applications (II) • Telephony: • Skype • Scientific Computing: • SETI@home • BOINC • Other: • Distributed storage systems (Freenet) • Content-delivery networks (CoralCDN, CoDeeN)
Typology (I) • Degree of centralization: • Partly decentralized: • BitTorrent tracker, • Skype billing subsystem • Fully decentralized: • More scalable • Resilient to failure, attacks and legal challenges • Can have supernodes
Typology (II) • Overlay maintenance • Overlay is graph G = (N, E) describing set of links E among members of set N of participating nodes • If there is a link in E between two nodes, they are aware of each other • Overlays can be structured or unstructured
Unstructured overlays • When a node joins, it acquires a set of "neighbors" by • Contacting the tracker (BitTorrent) • Contacting a system participant • Must have a mechanism advertising these nodes
Structured overlays (I) • Use key-based routing • Each node has a unique identifier • 160-bit integer • Identifiers are uniformly distributed • Addressing is based on keys • Each key is mapped to exactly one of the current overlay nodes • The node whose identifier is the smallest one "larger" than the key value; the identifier space is made circular so that such a node always exists
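The key-to-node mapping on this slide can be sketched as follows. This is a minimal illustration, not the scheme of any particular system: it assumes SHA-1 to produce 160-bit identifiers, treats "larger" as "greater than or equal", and uses the hypothetical helper names `make_id` and `responsible_node`.

```python
import hashlib
from bisect import bisect_left

ID_BITS = 160  # identifier length given on the slide


def make_id(name):
    """Hash a name to a 160-bit integer (SHA-1 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def responsible_node(key, node_ids):
    """Return the first node identifier >= key, wrapping around the ring."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key)   # index of the first identifier >= key
    return ring[i % len(ring)]   # past the largest id: wrap to the smallest
```

With nodes 10, 20 and 30, key 25 maps to node 30 and key 31 wraps around to node 10, which is what making the identifier space circular buys.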
Structured overlays (II) • Key-based routing implements a primitive KBR(n0, k) that produces a path going from a starting node n0 to the node holding key k • Big tradeoff is between • Keeping paths short • Minimizing state information kept by nodes
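One way to picture the routing primitive and its tradeoff is a Chord-style sketch, shown below under stated assumptions: a small 8-bit identifier space, a globally known sorted node list used only to build each node's routing state, and one "finger" per bit. The slides do not prescribe this design; the function names are hypothetical.

```python
from bisect import bisect_left

BITS = 8            # small identifier space, for illustration only
SPACE = 1 << BITS


def successor(ring, x):
    """First node identifier >= x on the circular, sorted ring."""
    i = bisect_left(ring, x % SPACE)
    return ring[i % len(ring)]


def in_interval(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b   # interval wraps; a == b covers the whole ring


def fingers(ring, n):
    """Routing state of node n: one entry per bit, so O(log SPACE) entries."""
    return [successor(ring, n + (1 << i)) for i in range(BITS)]


def kbr(nodes, n0, k):
    """Return the path of nodes visited from n0 to the node holding key k."""
    ring = sorted(nodes)
    path, node = [n0], n0
    while True:
        pred = ring[ring.index(node) - 1]
        if in_interval(k, pred, node):        # current node holds k
            return path
        nxt = successor(ring, node + 1)       # default: immediate successor
        if not in_interval(k, node, nxt):
            # jump to the farthest finger that does not overshoot k
            for f in reversed(fingers(ring, node)):
                if f != k and in_interval(f, node, k):
                    nxt = f
                    break
        path.append(nxt)
        node = nxt
```

For example, `kbr([1, 18, 40, 77, 120, 200, 250], 1, 130)` visits far fewer nodes than walking the ring one successor at a time would. Keeping more fingers per node shortens paths further at the cost of more state, which is the tradeoff named on the slide.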
Typology (III) • Distributed state • In partly decentralized systems state is maintained by • The central node(s) • The peers assigned by it/them to each node • In decentralized systems, state is kept by • The content providers • Individual peers
Locating data • In unstructured systems, nodes wanting to access a specific object flood their neighbors, which flood their neighbors and so on • Structured systems use distributed hash tables • All data have keys • Stored at node responsible for key value and replicated at its successors
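The two lookup styles can be contrasted with a short sketch. It assumes a hop limit (`ttl`) on flooding, which the slide does not mention, and hypothetical helper names. `flood_search` counts every message sent, and `replica_nodes` lists where a DHT would store a key and its replicas.

```python
from bisect import bisect_left
from collections import deque


def flood_search(neighbors, holders, start, ttl):
    """Flood a query breadth-first; return (node_found_or_None, messages_sent)."""
    seen = {start}
    frontier = deque([(start, ttl)])
    messages = 0
    while frontier:
        node, hops_left = frontier.popleft()
        if node in holders:
            return node, messages
        if hops_left == 0:
            continue                      # hop limit reached, do not forward
        for nb in neighbors[node]:
            messages += 1                 # duplicates still cost a message
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, hops_left - 1))
    return None, messages


def replica_nodes(node_ids, key, replicas):
    """Node responsible for key, followed by its next successors on the ring."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key) % len(ring)
    count = min(replicas, len(ring))
    return [ring[(i + j) % len(ring)] for j in range(count)]
```

Flooding can miss an object that lies beyond the hop limit and sends messages to many nodes that do not hold it, whereas the DHT placement is determined by the key alone.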
Typology (IV) • Distributed control: • In unstructured systems, it is typically done by epidemic techniques • Can also build a spanning tree among the nodes if membership is fairly stable • In structured systems, it is much easier to build spanning trees
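A toy simulation gives a feel for epidemic (gossip) dissemination. It is a sketch under simplifying assumptions that are not in the slides: synchronous rounds, every informed node pushing to `fanout` peers chosen uniformly at random (possibly including itself), and no failures.

```python
import random


def gossip_rounds(n_nodes, fanout, seed=0):
    """Count push-gossip rounds until all n_nodes nodes are informed."""
    rng = random.Random(seed)
    informed = {0}                      # node 0 starts with the update
    rounds = 0
    while len(informed) < n_nodes:
        newly = set()
        for _ in informed:              # every informed node pushes once
            newly.update(rng.sample(range(n_nodes), fanout))
        informed |= newly
        rounds += 1
    return rounds
```

Calling `gossip_rounds(1000, 3)` with different seeds typically finishes in a number of rounds that grows slowly with the number of nodes, which is why epidemic techniques suit overlays without a fixed structure.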
Content distribution • Tree-based protocols • Main disadvantage is that leaf nodes do not contribute anything • Full binary tree of height n has 2^(n+1) - 1 nodes and 2^n leaves • Swarm-based protocols • BitTorrent • All nodes can participate
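The node and leaf counts explain why the tree's disadvantage is serious: a full binary tree of height n has 2^(n+1) - 1 nodes, of which 2^n are leaves, so more than half of the nodes forward nothing. A small check, using the hypothetical helper `full_binary_tree`:

```python
def full_binary_tree(height):
    """Return (total nodes, leaves) of a full binary tree of the given height."""
    nodes = 2 ** (height + 1) - 1
    leaves = 2 ** height
    return nodes, leaves


def idle_fraction(height):
    """Fraction of nodes that are leaves and therefore upload nothing."""
    nodes, leaves = full_binary_tree(height)
    return leaves / nodes
```

For height 3 this gives 15 nodes and 8 leaves, so about 53% of the participants contribute no upload bandwidth.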
Challenges (I) • Controlling membership: • Preventing Sybil attacks • One node pretending to be many • Can require proof of work or use trusted identities (FARSITE)
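Proof of work makes each identity cost some computation, which raises the price of creating many fake ones. The sketch below is a generic hash puzzle, not the mechanism of any system named on the slides. It assumes SHA-256, an 8-byte nonce, and a `challenge` issued by whoever admits new nodes.

```python
import hashlib
from itertools import count


def _digest_value(challenge, nonce):
    """SHA-256 of challenge plus nonce, read as a 256-bit integer."""
    data = challenge + nonce.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def solve(challenge, difficulty_bits):
    """Search for a nonce whose digest has difficulty_bits leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        if _digest_value(challenge, nonce) < target:
            return nonce


def verify(challenge, nonce, difficulty_bits):
    """Checking a solution takes one hash; finding one takes many."""
    return _digest_value(challenge, nonce) < (1 << (256 - difficulty_bits))
```

A joining node would call `solve(b"join-request", 12)` and present the nonce. Each extra difficulty bit roughly doubles the expected work for the solver while verification stays a single hash.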
Challenges (II) • Protecting data: • Integrity and Authenticity: • Can use digital signatures • Data stored in DHTs can be self-certifying by making DHT keys a function of the data themselves • Can use voting (LOCKSS) • Availability and Durability • Replicate data and keep system alive
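Self-certifying data can be sketched by using a hash of the content as its key, so any reader can check what an untrusted node returns. The example assumes SHA-256 and models the DHT as a plain dictionary; `put` and `get` are hypothetical names.

```python
import hashlib


def put(store, data):
    """Store data under the hash of its content and return that key."""
    key = hashlib.sha256(data).hexdigest()
    store[key] = data
    return key


def get(store, key):
    """Fetch data and verify it against the key before trusting it."""
    data = store[key]
    if hashlib.sha256(data).hexdigest() != key:
        raise ValueError("data does not match its key")
    return data
```

If a storing node alters the bytes, the recomputed hash no longer matches the key and `get` rejects the result. This protects integrity only; it says nothing about who published the data.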
Challenges (III) • Incentives: • Fighting free riding • Big problem • BitTorrent tit-for-tat • Not always feasible • Managing P2P Systems: • Lack of centralized control can make system hard to manage • Skype collapses
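The idea behind tit-for-tat can be sketched as a peer-selection rule: upload to the peers that have recently uploaded the most to you, plus one randomly chosen peer so that newcomers get a chance. This is a simplified illustration with hypothetical names and a made-up slot count, not BitTorrent's actual choking algorithm.

```python
import random


def choose_unchoked(bytes_received_from, slots=3, rng=random):
    """Pick peers to upload to: best contributors plus one random other."""
    ranked = sorted(bytes_received_from,
                    key=bytes_received_from.get, reverse=True)
    regular = ranked[:slots]             # reward the best contributors
    others = ranked[slots:]
    optimistic = [rng.choice(others)] if others else []
    return regular + optimistic
```

With `{"a": 500, "b": 0, "c": 900, "d": 100, "e": 300}` and `slots=2`, peers "c" and "a" are always served, and one of the remaining peers is picked at random. A free rider such as "b" only ever receives the occasional random slot.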
P2P and ISPs • P2P systems consume a lot of bandwidth • Current ISP billing models assume that customers send far fewer bits than they receive • Flat-rate pricing for residential customers • Bandwidth-based pricing for information providers • ISPs have no way to bill anyone for P2P traffic
Conclusions • P2P is a disruptive technology with great potential • Major strength is lack of centralized control • Also creates new challenges that can be • Technical • Commercial • Legal