A Survey of Peer-to-Peer Content Distribution Technologies. Stephanos Androutsellis-Theotokis and Diomidis Spinellis, ACM Computing Surveys, December 2004. Presenters: Seung-hwan Baek, Ja-eun Choi.
Outline • Overview of P2P • P2P Motivation • P2P Characteristics & Benefits • P2P Application Types • P2P Classification • Unstructured: Gnutella, Kazaa, Napster • Structured: Freenet, Chord, CAN, Tapestry • Other Aspects • Conclusions
P2P Motivation Client/Server Architecture: • A well-known, powerful, reliable server is the data source • Clients request data from the server • Very successful model: WWW (HTTP), FTP, Web services, etc.
P2P Motivation (Cont’d) Client/Server Limitations: • Scalability is hard to achieve • The server is a single point of failure • Requires administration • Resources at the network edge go unused • P2P systems try to address these limitations
P2P Characteristics P2P Computing: • P2P computing is the sharing of computer resources and services by direct exchange between systems. • These resources and services include the exchange of information, processing cycles, cache storage, and disk storage for files. • P2P computing takes advantage of existing computing power, computer storage and networking connectivity, allowing users to leverage their collective power to the ‘benefit’ of all.
P2P Characteristics (Cont’d) P2P Characteristics: • All nodes are both clients and servers • Provide and consume data • Any node can initiate a connection • No centralized data source • Nodes collaborate directly with each other (not through well-known servers) • Network is dynamic • Nodes enter and leave the network “frequently”
P2P Benefits • Ease of administration • Nodes self-organize adaptively • No need to deploy servers to satisfy demand (cf. scalability) • Built-in fault tolerance, replication, and load balancing • Scalability • Consumers of resources also donate resources • Aggregate resources grow naturally with utilization • Reliability • Geographic distribution • No single point of failure
P2P Application Types • Direct real-time communication: instant messaging • Combine processing power of multiple distributed machines to perform complex computations: analysis of SETI data, prime computation • Distributed database systems • Store and distribute digital content: mp3 file sharing (Content Distribution)
P2P Classification Architecture Types: • Unstructured • Structured • Loosely structured • By structure, we refer to whether the overlay network is created non-deterministically or based on specific rules
P2P Classification (Cont’d) • [Table: systems classified along two axes: data organization and centralization]
Unstructured Architectures • Placement of content is unrelated to the overlay topology • A search mechanism is required • Appropriate for highly transient node populations • Degrees of centralization: • Purely Decentralized • Partially Centralized • Hybrid Decentralized
Purely Decentralized • [Figure: message flow with labels registration, query, reply, request, download] • No central coordination • Users (servents) connect to each other directly • Gnutella architecture • Query: flooding • Send messages to all neighbors • Response: routed back along the query path • Scalability issues • With TTL: virtual horizon • Without TTL: unlimited flooding • E.g., Gnutella, FreeHaven
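The flooding query and its TTL "virtual horizon" can be illustrated with a minimal sketch. This is not Gnutella's wire protocol: the overlay is a hypothetical in-memory adjacency map, and `flood_query`, `overlay`, and `holders` are names invented for the example.

```python
from collections import deque

def flood_query(graph, start, has_file, ttl):
    """Flood a query from `start`; return peers holding the file within `ttl` hops."""
    seen = {start}                      # peers that already received the query
    hits = []
    frontier = deque([(start, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if node in has_file and node != start:
            hits.append(node)           # a reply would be routed back from here
        if remaining == 0:
            continue                    # TTL exhausted: the virtual horizon
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return hits

# Hypothetical overlay: A - B - D - E, plus A - C.
overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"],
           "D": ["B", "E"], "E": ["D"]}
holders = {"E"}
print(flood_query(overlay, "A", holders, ttl=2))  # E is 3 hops away, so no hit
print(flood_query(overlay, "A", holders, ttl=3))  # E is now inside the horizon
```

With `ttl=2` the file exists in the network but is never found, which is the scalability trade-off the slide points to.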
Partially Centralized • [Figure: message flow with labels registration, query, reply, request, download] • Supernodes • Index and cache files for a small subpart of the peer network • Peers are automatically elected to become supernodes • Advantages • Reduced discovery time • Normal nodes are lightly loaded • E.g., Kazaa, Edutella, Gnutella (later version)
Hybrid Decentralized • [Figure: message flow with labels registration, reply, query, request, download] • Central directory server • User connection info • File and metadata info • Advantages • Simple to implement • Locates files quickly and efficiently • Disadvantages • Vulnerable to technical failure • Inherently unscalable • E.g., Napster, Publius
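The role of the central directory can be sketched as follows. This is a toy model under stated assumptions, not Napster's actual protocol: `DirectoryServer`, its methods, and the peer addresses are hypothetical. The server only answers "who has this file"; the download itself would happen directly between peers.

```python
from collections import defaultdict

class DirectoryServer:
    """Toy central index: file name -> addresses of peers sharing it."""

    def __init__(self):
        self._index = defaultdict(set)

    def register(self, peer, files):
        # A peer announces the files it shares when it connects.
        for name in files:
            self._index[name].add(peer)

    def unregister(self, peer):
        # Drop a departing peer from every entry.
        for peers in self._index.values():
            peers.discard(peer)

    def query(self, name):
        # Return candidate peers; the requester then contacts one directly.
        return sorted(self._index.get(name, ()))

server = DirectoryServer()
server.register("10.0.0.1:6699", ["song.mp3", "talk.mp3"])
server.register("10.0.0.2:6699", ["song.mp3"])
print(server.query("song.mp3"))
server.unregister("10.0.0.1:6699")
print(server.query("talk.mp3"))
```

Every lookup passes through the one `server` object, which is why the slide lists failure vulnerability and limited scalability as disadvantages.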
Structured Architectures • Features • Mapping between content and location • Scalable solution for exact-match queries • Examples • Freenet (loosely structured) • Chord • CAN • Tapestry
Freenet • Loosely Structured System • Chain mode propagation • Each node • Local data store • Dynamic routing table • ( node address, file key ) • Each file • Unique binary key
Freenet (Cont’d) • Messages • Node ID, Timeout, Src ID, Dst ID • Message types • Data insert: key, data • Data request: key • Data reply: file • Data failed: failure location, reason
Freenet (Cont’d) • Data Insert • The inserting node calculates a binary key for the file • It sends a Data Insert message to itself • On receiving a Data Insert message • If the key is not taken • Store the data • Forward to the node associated with the closest key • If the key is taken • Return the preexisting file
Freenet (Cont’d) • Data Request • Chain mode propagation • On receiving a Data Request • If the data is stored locally • The search stops and the data is forwarded back • If not • Forward to the node associated with the closest key
Freenet (Cont’d) • Data Failed • Timeout (hops-to-live) • On receiving a Data Failed message • Forward the request to the next-best node • After failing through all neighbors, send a Data Failed message back to the request sender
Freenet (Cont’d) • Data Reply • Includes the actual data • Passed back through the chain • The data is cached in all intermediate nodes • A subsequent request w/ the same key → served immediately • A request for a similar key → forwarded to the node that previously provided the data
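The request, failure, and reply steps above can be sketched together. This is a simplified model, not Freenet's implementation: keys are small integers, "closest" is assumed to mean smallest absolute difference, and the routing entry added on success points at the neighbor that returned the data rather than the original source. `Node` and `request` are hypothetical names.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}    # key -> data held (or cached) locally
        self.routes = {}   # key -> neighbor Node associated with that key

def request(node, key, hops_to_live, visited=None):
    """Return the data for `key`, or None (standing in for Data Failed)."""
    visited = visited if visited is not None else set()
    visited.add(node.name)
    if key in node.store:
        return node.store[key]           # search stops, data flows back
    if hops_to_live == 0:
        return None                      # timeout
    # Try neighbors in order of key closeness; fall back to the next best.
    for known_key in sorted(node.routes, key=lambda k: abs(k - key)):
        neighbor = node.routes[known_key]
        if neighbor.name in visited:
            continue                     # avoid loops in the chain
        data = request(neighbor, key, hops_to_live - 1, visited)
        if data is not None:
            node.store[key] = data       # cache on the way back
            node.routes[key] = neighbor  # remember who supplied this key
            return data
    return None

a, b, c = Node("a"), Node("b"), Node("c")
c.store[80] = "file"
a.routes = {75: b, 20: c}                # b looks closer to key 80 but lacks it
print(request(a, 80, hops_to_live=3))    # tries b, fails, then finds it at c
print(request(a, 80, hops_to_live=0))    # now served from a's own cache
```

The second call succeeds even with zero hops to live because the first reply was cached at `a`, which mirrors the "served immediately" bullet.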
Freenet (Cont’d) • Indirect Files • A special class of lightweight files • Named according to search keywords • Contain pointers to the real file • Multiple files w/ the same key
Freenet (Cont’d) • Indirect Files
Freenet (Cont’d) • Properties • Nodes specialize in searching for similar keys • Nodes store similar keys • Similarity of keys does not reflect similarity of files • Routing does not reflect the underlying network topology
Chord • Nodes and Files are identified by keys • m-bit identifiers • a deterministic hash function • Mapping File ID onto Node ID • Nodes store (key, data item) pairs
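A minimal sketch of the identifier mapping, assuming SHA-1 as the deterministic hash and a small identifier space for readability. `M`, `chord_id`, and `successor` are names chosen for this example; the ring of node IDs is hypothetical.

```python
import hashlib
from bisect import bisect_left

M = 6  # identifier bits (assumption; real deployments use far more)

def chord_id(name, m=M):
    """Hash a node address or file name to an m-bit identifier."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def successor(node_ids, key_id):
    """The node responsible for key_id: first node ID >= key_id, wrapping around."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]

nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
print(successor(nodes, 54))   # stored at node 56
print(successor(nodes, 60))   # wraps around the circle to node 1
print(successor(nodes, chord_id("song.mp3")))
```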
Chord (Cont’d) • A Chord Identifier Circle
Chord (Cont’d) • Simple Key Location
Chord (Cont’d) • Scalable Key Location
Chord (Cont’d) • Simple Key Location • Routing information: successor pointer • Lookup cost: O(n) hops • Scalable Key Location • Routing information: finger table • Lookup cost: O(log n) hops
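The difference between the two lookup styles can be seen by counting hops. The sketch below simulates both on a static ring with global knowledge, which a real node would not have; the ring, the 6-bit identifier space, and all function names are assumptions for illustration.

```python
from bisect import bisect_left

def successor(ring, ident):
    """First node ID >= ident on the sorted ring, wrapping around."""
    return ring[bisect_left(ring, ident) % len(ring)]

def between(x, a, b, size, inclusive_right):
    """True if x lies in the circular interval (a, b) or (a, b]."""
    span = (b - a) % size or size      # a == b means the whole circle
    offset = (x - a) % size or size    # x == a sits a full turn away
    return offset <= span if inclusive_right else offset < span

def simple_lookup(ring, start, key, m):
    """Follow successor pointers only; returns (owner, hops)."""
    size, node, hops = 2 ** m, start, 0
    while True:
        succ = successor(ring, (node + 1) % size)
        if between(key, node, succ, size, True):
            return succ, hops
        node, hops = succ, hops + 1

def finger_lookup(ring, start, key, m):
    """Jump via the closest preceding finger; returns (owner, hops)."""
    size, node, hops = 2 ** m, start, 0
    while True:
        succ = successor(ring, (node + 1) % size)
        if between(key, node, succ, size, True):
            return succ, hops
        # Finger i points at the successor of node + 2^i.
        fingers = [successor(ring, (node + 2 ** i) % size) for i in range(m)]
        nxt = next((f for f in reversed(fingers)
                    if between(f, node, key, size, False)), succ)
        node, hops = nxt, hops + 1

ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
print(simple_lookup(ring, 8, 54, m=6))   # (56, 7)
print(finger_lookup(ring, 8, 54, m=6))   # (56, 2)
```

From node 8 the finger-based lookup visits 42 and 51 before reaching the owner 56, while the successor-only walk passes through every node in between.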
Chord (Cont’d) • Node Joining • Certain keys assigned to its successor are reassigned to it • Node Departing • Keys are reassigned to its successor
Chord (Cont’d) • Node Joining • N26 joins the network
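Key reassignment on join and departure can be sketched on the same kind of ring. The code assumes global knowledge of the ring and a non-empty network, and the key values 24 and 30 are illustrative; `join`, `leave`, and `stored` are hypothetical names.

```python
from bisect import bisect_left, insort

def successor(ring, ident):
    """First node ID >= ident on the sorted ring, wrapping around."""
    return ring[bisect_left(ring, ident) % len(ring)]

def join(ring, stored, new_node):
    """Add new_node and move the keys it now owns away from its successor."""
    insort(ring, new_node)
    succ = successor(ring, new_node + 1)
    moved = [k for k in stored[succ] if successor(ring, k) == new_node]
    stored[new_node] = moved
    stored[succ] = [k for k in stored[succ] if k not in moved]

def leave(ring, stored, node):
    """Remove node and hand all of its keys to its successor."""
    ring.remove(node)
    succ = successor(ring, node)
    stored[succ] = sorted(stored[succ] + stored.pop(node))

ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
stored = {n: [] for n in ring}
stored[32] = [24, 30]            # keys in (21, 32] live on node 32

join(ring, stored, 26)
print(stored[26], stored[32])    # [24] [30]
leave(ring, stored, 26)
print(stored[32])                # [24, 30]
```

Only the successor of the joining or departing node is involved, so the rest of the ring keeps its keys untouched.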
CAN • Content Addressable Network • Hash table • Maps file names to their locations • (key K, value V) pairs are stored • Each node stores a part of the hash table • This part is called a “zone”
CAN (Cont’d) • Virtual coordinate space • A zone corresponds to a segment of space • Key K is mapped onto a point P • A deterministic function • ( K, V ) is stored at the node responsible for P
CAN (Cont’d) • Virtual coordinate space
CAN (Cont’d) • Retrieve • Map K to P • Retrieve the value from the node covering P • Routing • The request is routed to the node covering P • Nodes maintain a routing table • Addresses of nodes holding adjoining zones • Requests follow the straight-line path through the coordinate space
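A two-dimensional sketch of the key-to-point mapping and greedy routing follows. Several simplifications are assumed: the space is the unit square without wraparound, zones are axis-aligned rectangles `(x0, y0, x1, y1)`, SHA-1 supplies the deterministic function, and each hop picks the adjoining zone nearest to P. All names are invented for the example.

```python
import hashlib

def key_to_point(key):
    """Deterministically map a key to a point in the unit square."""
    d = hashlib.sha1(key.encode("utf-8")).digest()
    return (int.from_bytes(d[:4], "big") / 2 ** 32,
            int.from_bytes(d[4:8], "big") / 2 ** 32)

def contains(zone, p):
    x0, y0, x1, y1 = zone
    return x0 <= p[0] < x1 and y0 <= p[1] < y1

def adjoining(a, b):
    """Zones adjoin if they share a border segment of positive length."""
    overlap_x = min(a[2], b[2]) - max(a[0], b[0])
    overlap_y = min(a[3], b[3]) - max(a[1], b[1])
    return (overlap_x == 0 and overlap_y > 0) or (overlap_y == 0 and overlap_x > 0)

def distance(zone, p):
    """Euclidean distance from point p to the nearest point of the zone."""
    x0, y0, x1, y1 = zone
    dx = max(x0 - p[0], 0.0, p[0] - x1)
    dy = max(y0 - p[1], 0.0, p[1] - y1)
    return (dx * dx + dy * dy) ** 0.5

def route(zones, start, p):
    """Greedy routing: hop to the adjoining zone closest to p until p is covered."""
    path, current = [start], start
    while not contains(zones[current], p):
        neighbors = [n for n in zones
                     if n != current and adjoining(zones[current], zones[n])]
        current = min(neighbors, key=lambda n: distance(zones[n], p))
        path.append(current)
        if len(path) > len(zones):
            raise RuntimeError("greedy routing did not converge")
    return path

zones = {"A": (0, 0, 0.5, 0.5), "B": (0.5, 0, 1, 0.5),
         "C": (0, 0.5, 0.5, 1), "D": (0.5, 0.5, 1, 1)}
print(route(zones, "A", (0.9, 0.6)))          # ['A', 'B', 'D']
print(route(zones, "A", key_to_point("song.mp3")))
```

Each node only needs the zones in `neighbors`, which matches the routing table of adjoining zones described on the slide.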
CAN (Cont’d) • Routing
CAN (Cont’d) • Node Joining • Allocated its own portion of the space • By splitting the zone of an existing node • Node Departing • Hands over its hash table entries to one of its neighbors
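Zone splitting on join can be sketched as below. The choice to halve the zone along its longer side is an assumption made for the example, as are the names `split_zone` and `join`; entries are keyed directly by their point P, and departure is left out.

```python
def contains(zone, p):
    x0, y0, x1, y1 = zone
    return x0 <= p[0] < x1 and y0 <= p[1] < y1

def split_zone(zone):
    """Halve a zone along its longer side; returns (kept half, given half)."""
    x0, y0, x1, y1 = zone
    if (x1 - x0) >= (y1 - y0):
        mid = (x0 + x1) / 2
        return (x0, y0, mid, y1), (mid, y0, x1, y1)
    mid = (y0 + y1) / 2
    return (x0, y0, x1, mid), (x0, mid, x1, y1)

def join(zones, tables, existing, newcomer):
    """Split `existing`'s zone and hand the newcomer the entries in its half."""
    kept, given = split_zone(zones[existing])
    zones[existing], zones[newcomer] = kept, given
    entries = tables[existing]
    tables[newcomer] = {p: v for p, v in entries.items() if contains(given, p)}
    tables[existing] = {p: v for p, v in entries.items() if not contains(given, p)}

zones = {"A": (0, 0, 1, 1)}
tables = {"A": {(0.2, 0.3): "v1", (0.8, 0.1): "v2"}}
join(zones, tables, "A", "B")
print(zones)    # A keeps the left half, B takes the right half
print(tables)   # the entry at (0.8, 0.1) moves to B
```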
Tapestry • Location and Routing Infrastructure • Self-administration • Fault tolerance • Stability • By bypassing failed routes and nodes • Plaxton Mesh • Routing mechanism • Location mechanism
Tapestry (Cont’d) • Routing Mechanism • Neighbor Maps • Local routing maps • Incrementally route messages • Multiple levels • Level l → node ID matched w/ l digits • Multiple entries per level • The number of entries equals the base of the ID • Each entry points to the closest matching node in the network
Tapestry (Cont’d) • Neighbor Map of Node w/ ID 67493
Tapestry (Cont’d) • Routing Path from 67493 to 34567 • xxxx7 → xxx67 → xx567 → x4567→ 34567
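Digit-by-digit suffix routing can be sketched as follows. Instead of per-node neighbor maps, the sketch scans a global node list, and it picks the candidate that resolves the fewest extra digits rather than the closest node in the network; both are simplifications. The node IDs other than 67493 and 34567 are made up.

```python
def shared_suffix_len(a, b):
    """Number of trailing digits that two equal-length IDs have in common."""
    n = 0
    while n < len(a) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def next_hop(nodes, current, dest):
    """Pick a node that matches at least one more suffix digit of dest."""
    need = shared_suffix_len(current, dest) + 1
    candidates = [n for n in nodes if shared_suffix_len(n, dest) >= need]
    return min(candidates, key=lambda n: (shared_suffix_len(n, dest), n))

def route(nodes, source, dest):
    """Route from source to dest (dest must be in nodes)."""
    path, current = [source], source
    while current != dest:
        current = next_hop(nodes, current, dest)
        path.append(current)
    return path

nodes = ["67493", "20187", "55167", "01567", "94567", "34567"]
print(route(nodes, "67493", "34567"))
# ['67493', '20187', '55167', '01567', '94567', '34567']
```

The hops match the pattern xxxx7 → xxx67 → xx567 → x4567 → 34567, so the path length is bounded by the number of digits in an ID.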
Tapestry (Cont’d) • Location Mechanism • Root node • Provides a guaranteed node from which the object can be located • Assigned when an object is inserted • By a globally consistent deterministic algorithm • When inserted • Server node Ns, object O, root node Nr • A message is routed from Ns to Nr • (O, Ns) is stored at each node along the routing path
Tapestry (Cont’d) • Location Mechanism • Location query • Messages destined for O • Initially routed toward Nr • On meeting a node that holds the (O, Ns) mapping, redirected to Ns
Tapestry (Cont’d) • Advantages of Plaxton Mesh • Simple fault handling • Routing by choosing a node w/ a similar suffix • Scalability • The only bottleneck is the root nodes • Limitations • The need for global knowledge • Assigning and identifying root nodes • The vulnerability of the root nodes
Tapestry (Cont’d) • Extending the Plaxton Mesh Design • The Plaxton mesh assumes a static node population • Tapestry adapts it to a transient population • Adaptability • Fault tolerance • Optimizations
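Publishing and locating an object can be sketched on top of suffix routing. The root-selection rule used here (longest shared suffix with the object ID, ties broken by smallest node ID) is only a stand-in for the deterministic algorithm mentioned above, and the node IDs, object ID, and function names are all hypothetical.

```python
def shared_suffix_len(a, b):
    n = 0
    while n < len(a) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def route(nodes, source, dest):
    """Suffix routing over a global node list (dest must be in nodes)."""
    path, current = [source], source
    while current != dest:
        need = shared_suffix_len(current, dest) + 1
        candidates = [n for n in nodes if shared_suffix_len(n, dest) >= need]
        current = min(candidates, key=lambda n: (shared_suffix_len(n, dest), n))
        path.append(current)
    return path

def root_for(nodes, obj_id):
    """Assumed root rule: longest shared suffix, ties broken by smallest ID."""
    return min(nodes, key=lambda n: (-shared_suffix_len(n, obj_id), n))

def publish(nodes, pointers, obj_id, server):
    """Store (O, Ns) at every node on the path from the server to the root."""
    for hop in route(nodes, server, root_for(nodes, obj_id)):
        pointers.setdefault(hop, {})[obj_id] = server

def locate(nodes, pointers, obj_id, start):
    """Route toward the root; stop at the first node holding a mapping for O."""
    for hop in route(nodes, start, root_for(nodes, obj_id)):
        server = pointers.get(hop, {}).get(obj_id)
        if server is not None:
            return server, hop
    return None, None

nodes = ["67493", "20187", "55167", "88867", "01567", "94567", "34567"]
pointers = {}
publish(nodes, pointers, "04567", server="67493")
print(root_for(nodes, "04567"))                   # 34567
print(locate(nodes, pointers, "04567", "88867"))  # ('67493', '01567')
```

The query from 88867 is answered at 01567, before it reaches the root, because the publish path already left a pointer there.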
Tapestry (Cont’d) • Optimizations • Back-pointers for dynamic node insertion • Flexible concept of distance between nodes • Maintain cached content for failures • Multiple roots to each object • Adapt to environment changes
Other Aspects • Content Caching, Replication and Migration • Security • Provisions for Anonymity • Provisions for Deniability • Incentive Mechanisms and Accountability • Resource Management Capability • Semantic Grouping of Information
Conclusions • Study of P2P Content Distribution Systems • Properties • Design features • Location and routing algorithms • Two Categories • Unstructured systems • Structured systems • Open research problems remain