Seminar: Information Management in the Web Gnutella, Freenet and more: an overview of file sharing architectures Thomas Zahn
Peer-to-Peer - Introduction • "opposite" of Client/Server • no central servers • information highly distributed • every peer acts as a client AND server • -> can query, reply to queries and route messages at the same time • every peer can directly "talk" to any other peer
Popular Peer-to-Peer Networks • Napster • Gnutella • Freenet • FastTrack (Kazaa) • CHORD, CAN, PASTRY, TAPESTRY
Napster • was used primarily for file sharing • NOT a pure peer-to-peer network • => hybrid system • peer turns to central DB for querying (client/server) • peer downloads directly from other peer(s) (peer-to-peer)
Napster • [diagram: peers 1 to 6 around a central DB] • 1. Query (peer to central DB) • 2. Response (central DB to peer) • 3. Download Request (peer to peer) • 4. File (peer to peer)
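The four steps of the hybrid lookup can be sketched in a few lines. This is only an illustration of the idea, not Napster's actual protocol: the class names `CentralIndex` and `Peer`, the `register` step, and the file name `song.mp3` are all assumptions made for the example.

```python
# Hypothetical sketch of a Napster-style hybrid lookup; all names are illustrative.

class CentralIndex:
    """Central DB: maps file names to the peers that offer them."""

    def __init__(self):
        self.index = {}

    def register(self, peer, filenames):
        # Assumed step: a peer announces its shared files to the central DB.
        for name in filenames:
            self.index.setdefault(name, []).append(peer)

    def query(self, name):
        # Steps 1 and 2: query and response (client/server part).
        return list(self.index.get(name, []))


class Peer:
    def __init__(self, name, files):
        self.name = name
        self.files = dict(files)  # file name -> content

    def download(self, other, filename):
        # Steps 3 and 4: download request and file transfer (peer-to-peer part).
        self.files[filename] = other.files[filename]


db = CentralIndex()
peer1 = Peer("peer1", {})
peer2 = Peer("peer2", {"song.mp3": b"..."})
db.register(peer2, peer2.files)

holders = db.query("song.mp3")          # ask the central DB who has the file
peer1.download(holders[0], "song.mp3")  # fetch it directly from that peer
```

The central index only ever sees metadata; the file itself travels between the two peers, which is why the system is called a hybrid.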
Gnutella - overview • pure peer-to-peer • used for file sharing • very popular => practically proven ? • very simple protocol • no routing "intelligence" • messages are always broadcast
Gnutella - PING/PONG • [diagram: peer 1 broadcasts "Ping 1" to its neighbours, which forward it; the Pongs of peers 2 to 8 are routed back to peer 1] • Pongs arriving at peer 1: Pong 2; Pong 3,4,5; Pong 6,7,8 • Known Hosts (at peer 1): 2, 3, 4, 5, 6, 7, 8 • Query/Response analogous
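The broadcast behaviour can be simulated with a short breadth-first flood. This is a sketch under assumptions: the overlay below (peer 1 connected to 2, 3 and 6; peer 3 to 4 and 5; peer 6 to 7 and 8) is one plausible reading of the diagram, the function name `flood` is made up, and Pong routing is not modelled, only the set of peers a Ping reaches and the number of Ping messages sent.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Broadcast from origin; return (peers reached, number of messages sent)."""
    seen = {origin}
    sent = 0
    queue = deque([(origin, None, ttl)])
    while queue:
        peer, came_from, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # TTL used up: do not forward any further
        for neighbour in graph[peer]:
            if neighbour == came_from:
                continue  # never send a message back where it came from
            sent += 1
            if neighbour not in seen:
                # A peer forwards a message only the first time it sees it;
                # duplicates still cost a message but are dropped on arrival.
                seen.add(neighbour)
                queue.append((neighbour, peer, hops_left - 1))
    return seen - {origin}, sent


# Assumed overlay, read off the diagram (adjacency lists).
overlay = {
    1: [2, 3, 6], 2: [1],
    3: [1, 4, 5], 4: [3], 5: [3],
    6: [1, 7, 8], 7: [6], 8: [6],
}

known_hosts, pings = flood(overlay, origin=1, ttl=2)
```

With a TTL of 2, peer 1 learns about all seven other hosts at the cost of seven Ping messages. Adding a single extra link between two neighbours (a cycle) already produces redundant messages, which hints at why broadcasting scales badly.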
Gnutella - Pro & Con • VERY simple protocol => easy to implement • very little overhead • practically proven functionality (?) • message broadcasts flood network • => heavy network traffic • => bad, bad scalability
Gnutella – Generated Traffic in Bytes (1) • query message length: 83 bytes • simple query relaying (no responses)
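A rough way to estimate the traffic of simple query relaying is sketched below. Only the 83-byte query length comes from the slide; everything else is an assumption: every peer is taken to hold `n` connections, the overlay is treated as loop-free (so each peer relays a query to `n - 1` further peers), and the values `n = 4` with `ttl = 5` or `ttl = 7` are purely illustrative.

```python
QUERY_BYTES = 83  # query message length, taken from the slide


def relayed_queries(n, ttl):
    """Query messages generated if every peer relays to n - 1 further peers.

    Hop 1 produces n messages, hop 2 produces n * (n - 1), and so on.
    """
    return sum(n * (n - 1) ** (hop - 1) for hop in range(1, ttl + 1))


def query_traffic_bytes(n, ttl, size=QUERY_BYTES):
    """Bytes generated by relaying one query, with no responses counted."""
    return relayed_queries(n, ttl) * size


small = query_traffic_bytes(n=4, ttl=5)  # 484 messages
large = query_traffic_bytes(n=4, ttl=7)  # 4372 messages
```

Under these assumptions a single query costs about 40 KB with a TTL of 5 and about 363 KB with a TTL of 7; the growth is exponential in the TTL, before any response traffic is added.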
Gnutella – Generated Traffic in Bytes (2) • Mean percentage of users who typically share content: 30% • Mean percentage of users who typically have responses to search queries: 40% • Mean number of search responses the typical respondent offers: 10 • Mean length of search responses the typical respondent offers: 60 • "Standard client settings yield a whopping 17MB generated in response to […] search query"
Freenet - Concepts • peer-to-peer file storage & retrieval system • every document has a globally unique ID • efficient (?) retrieval algorithm • documents are retrieved with sublinear effort • routing based on likelihood of answer capability • focus on security
Freenet – Query Routing (1) • every peer maintains a routing table • table contains known peers along with the IDs of the documents they are storing • a request is routed to the peer most likely to have an answer (closest matching ID) • responses are sent back upstream • intermediate peers also store the document and augment their routing tables
Freenet – Query Routing (2) • [diagram: peer A queries for doc 17; the query travels over peers B, C and D] • 1. Query for doc 17 (A -> B) • 2. Forward to best match (B -> C) • 3. C has no match -> backtrack • 4. Forward query to 2nd best match (B -> D) • 5. Send back doc 17 (D -> B) • 6. Route back response (B -> A) • Before the query: • A: Routing Table B: 14, 20; X: 47, 60; Doc Cache 5, 89 • B: Routing Table C: 19, 30; D: 45, 51; Doc Cache 14, 20 • C: Routing Table B: 14, 20; Doc Cache 19, 30 • D: Routing Table B: 14, 20; Z: 105, 110; Doc Cache 17, 45, 51, 102, 205 • After the response: • A: Routing Table B: 14, 17, 20; X: 47, 60; Doc Cache 5, 17, 89 • B: Routing Table C: 19, 30; D: 17, 45, 51; Doc Cache 14, 17, 20
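The example on this slide can be replayed with a small depth-first search that always tries the closest matching ID first. This is a simplified sketch, not Freenet's real implementation: the names `Node`, `closest_distance` and `query` are invented for the example, "closest" is assumed to mean the smallest absolute difference between numeric IDs, the hops-to-live limit is left out, and peers X and Z are skipped because the slide does not show their tables.

```python
def closest_distance(doc_id, known_ids):
    """Distance between the wanted ID and the nearest ID a peer is known to store."""
    return min(abs(doc_id - known) for known in known_ids)


class Node:
    def __init__(self, routing, cache):
        self.routing = {peer: set(ids) for peer, ids in routing.items()}
        self.cache = set(cache)


def query(nodes, name, doc_id, visited, trace):
    """Route a query; return True if the document was found."""
    node = nodes[name]
    visited.add(name)
    trace.append(name)
    if doc_id in node.cache:
        return True
    # Try known peers in order of their closest matching document ID.
    ranked = sorted(node.routing,
                    key=lambda peer: closest_distance(doc_id, node.routing[peer]))
    for peer in ranked:
        if peer in visited or peer not in nodes:
            continue
        if query(nodes, peer, doc_id, visited, trace):
            # The response travels back upstream: store the document and
            # remember which neighbour was able to deliver it.
            node.cache.add(doc_id)
            node.routing[peer].add(doc_id)
            return True
    return False  # no match via this peer -> backtrack


# Routing tables and document caches as shown on the slide.
nodes = {
    "A": Node({"B": [14, 20], "X": [47, 60]}, [5, 89]),
    "B": Node({"C": [19, 30], "D": [45, 51]}, [14, 20]),
    "C": Node({"B": [14, 20]}, [19, 30]),
    "D": Node({"B": [14, 20], "Z": [105, 110]}, [17, 45, 51, 102, 205]),
}

trace = []
found = query(nodes, "A", 17, set(), trace)
```

After the call, `trace` lists the peers in the order they were visited, and the tables of A and B contain the additional entries for doc 17, matching the "after" state in the diagram. The dead end at C also shows why the worst case degenerates into a plain depth-first search.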
Freenet – Document Insert • analogous to query routing • insert is routed to the peer most likely to be interested in new doc (closest matching ID) • intermediate peers cache document and augment routing tables • until TTL is reached
Freenet - Discussion • efficient routing algorithm (compared to Gnutella) • adequate security features/heuristics (the more popular a document, the more frequently it gets cached) • no metasearch • no updates or deletes possible • worst-case query routing = DFS
FUtella – Concepts • peer-to-peer platform for general knowledge sharing • tries to model the learning style of humans • content-based routing • combines and extends approaches from: • Gnutella (message format) • JXTA (peer groups) • JXTA Search (queryspaces and registrations) • Freenet (routing of registration discoveries)
FUtella - Knowledge Groups • [diagram: a knowledge group inside the FUtella net] • Knowledge Group: Queryspace "Computer Architecture" • Group Head: Peer E • Group Head inserts Registration • Members: M1 ... Mi
FUtella - Knowledge Group Discovery 1 • [diagram: peer A looks for the knowledge group "computer architecture"] • 1. Discovery request "computer architecture" (A -> B) • 2. Forward discovery request (B -> C) • 3. C has no cached registration for "computer architecture" -> backtrack • 4. Forward discovery request to 2nd best match (B -> D) • A: Routing Table "computer" -> B, "data base" -> X; Registration Cache "computer": B, "data base": X • B: Routing Table "computer analysis" -> C, "computer systems" -> D, "data base" -> A; Registration Cache "computer analysis": Y, "computer systems": Z, "data base": X • C: Routing Table "computer" -> B, "computer analysis" -> Y; Registration Cache "computer": B, "computer analysis": Y • D: Routing Table "computer" -> B, "computer systems" -> Z, "computer architecture" -> E; Registration Cache "computer systems": Z, "computer": B, "computer architecture": E
FUtella - Knowledge Group Discovery 2 • 5. Discovery response (D -> B), containing registration "computer architecture": E • 6. Forward discovery response (B -> A) • A (updated): Routing Table "computer" -> B, "computer architecture" -> D, "data base" -> X; Registration Cache "computer": B, "computer architecture": E, "data base": X • B (updated): Routing Table "computer analysis" -> C, "computer architecture" -> D, "computer systems" -> D, "data base" -> A; Registration Cache "computer analysis": Y, "computer architecture": E, "computer systems": Z, "data base": X • D: Routing Table "computer" -> B, "computer systems" -> Z, "computer architecture" -> E; Registration Cache "computer systems": Z, "computer": B, "computer architecture": E
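The discovery walk-through can be replayed in code, with one important caveat: the slides do not say how the "best match" between queryspaces is computed. The sketch below simply assumes the length of the longest common prefix as the match score, which happens to reproduce the order C first, D second. The names `Peer`, `score` and `discover` are invented, and peers that the diagram does not detail (X, Y, Z, E) are skipped.

```python
from os.path import commonprefix


def score(wanted, known):
    """ASSUMED match metric: length of the longest common prefix."""
    return len(commonprefix([wanted, known]))


class Peer:
    def __init__(self, routing, cache):
        self.routing = dict(routing)  # queryspace -> next peer to ask
        self.cache = dict(cache)      # queryspace -> group head (registration)


def discover(peers, name, space, visited):
    """Return (group head, responding peer), or None if nothing was found."""
    peer = peers[name]
    visited.add(name)
    if space in peer.cache:
        return peer.cache[space], name
    ranked = sorted(peer.routing, key=lambda known: score(space, known), reverse=True)
    for known in ranked:
        nxt = peer.routing[known]
        if nxt in visited or nxt not in peers:
            continue
        result = discover(peers, nxt, space, visited)
        if result:
            head, responder = result
            # Peers on the way back cache the registration and note who answered.
            peer.cache[space] = head
            peer.routing[space] = responder
            return result
    return None  # no cached registration reachable from here -> backtrack


peers = {
    "A": Peer({"computer": "B", "data base": "X"},
              {"computer": "B", "data base": "X"}),
    "B": Peer({"computer analysis": "C", "computer systems": "D", "data base": "A"},
              {"computer analysis": "Y", "computer systems": "Z", "data base": "X"}),
    "C": Peer({"computer": "B", "computer analysis": "Y"},
              {"computer": "B", "computer analysis": "Y"}),
    "D": Peer({"computer": "B", "computer systems": "Z", "computer architecture": "E"},
              {"computer systems": "Z", "computer": "B", "computer architecture": "E"}),
}

result = discover(peers, "A", "computer architecture", set())
```

Afterwards A and B both hold the registration for "computer architecture" with group head E and a routing entry pointing to D, as in the updated tables above. A different similarity measure (for example word overlap) could rank C and D differently, so the metric should be treated as a placeholder.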
FUtella - Query Processing • 1. Discovery request "computer architecture" • 2. Forward discovery request • 3. C has no cached registration for "computer architecture" -> backtrack • 4. Forward discovery request to 2nd best match • 5. Discovery response containing cached registration • 6. Forward discovery response • 7. Send query (to group head E) • 8. Forward query to members M1 ... Mi of knowledge group "computer architecture" • 9. Query responses (from M1 ... Mi)
FUtella - Test Results (1) • [chart: Total Number of Messages (# msg, axis from 0 to 250000) for static peers, semi-dynamic peers and dynamic peers; series: threshold 2, no threshold, Gnutella]
Conclusion • first and second generation P2P systems still most widely used • practically proven • very flexible in terms of topology • bad scalability (Gnutella) • no guaranteed lower bound on query effort (Freenet) • (scientifically) far better approach: DHTs (see next presentation)