
Seminar: Information Management in the Web

Seminar: Information Management in the Web. Gnutella, Freenet and more: an overview of file sharing architectures. Thomas Zahn. Peer-to-Peer - Introduction. "opposite" of Client/Server • no central servers => information highly distributed • every peer acts as a client AND server



Presentation Transcript


  1. Seminar: Information Management in the Web Gnutella, Freenet and more: an overview of file sharing architectures Thomas Zahn

  2. Peer-to-Peer - Introduction • "opposite" of Client/Server • no central servers => information highly distributed • every peer acts as a client AND server • -> can query, reply to queries and route messages at the same time • every peer can directly "talk" to any other peer

  3. Popular Peer-to-Peer Networks • Napster • Gnutella • Freenet • FastTrack (Kazaa) • CHORD, CAN, PASTRY, TAPESTRY

  4. Napster • was used primarily for file sharing • NOT a pure peer-to-peer network • => hybrid system • peer turns to central DB for querying (client/server) • peer downloads directly from other peer(s) (peer-to-peer)

  5. Napster [Figure: peers 1 to 6 and a central DB] • 1. Query • 2. Response • 3. Download Request • 4. File
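The hybrid lookup described for Napster (query the central DB, then download directly from a peer) can be sketched as follows. This is a minimal illustration only: the class names `CentralIndex` and `Peer`, the `download` helper and the sample file are hypothetical and not taken from the Napster protocol.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central DB: maps a file name to the peers offering it."""

    def __init__(self):
        self._owners = defaultdict(set)

    def register(self, peer_name, file_names):
        for name in file_names:
            self._owners[name].add(peer_name)

    def query(self, file_name):
        # Steps 1 and 2: a peer sends a query, the central DB responds.
        return sorted(self._owners.get(file_name, set()))


class Peer:
    def __init__(self, name, files):
        self.name = name
        self.files = dict(files)  # file name -> content

    def serve(self, file_name):
        # Step 4: the file goes directly to the requesting peer.
        return self.files[file_name]


def download(file_name, index, peers):
    owners = index.query(file_name)           # 1. Query, 2. Response
    if not owners:
        return None
    return peers[owners[0]].serve(file_name)  # 3. Download Request, 4. File


# Assumed sample data for illustration.
peers = {
    "p1": Peer("p1", {}),
    "p2": Peer("p2", {"song.mp3": b"abc"}),
}
index = CentralIndex()
for peer in peers.values():
    index.register(peer.name, peer.files)

result = download("song.mp3", index, peers)
```

Only the lookup touches the central component; the transfer itself never passes through it, which is what makes the system a hybrid rather than pure client/server.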

  6. Gnutella - overview • pure peer-to-peer • used for file sharing • very popular => practically proven ? • very simple protocol • no routing "intelligence" • messages are always broadcast

  7. Gnutella - PING/PONG [Figure: peers 1 to 8; peer 1's Ping is broadcast to every peer and the Pongs of peers 2 to 8 are passed back to peer 1] • Known Hosts: 2; 3,4,5; 6,7,8 • Query/Response analogous

  8. Gnutella - Pro & Con • VERY simple protocol => easy to implement • very little overhead • practically proven functionality (?) • message broadcasts flood network • => heavy network traffic • => bad, bad scalability
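The cost of broadcasting can be made concrete with a small flooding sketch. It assumes a hop limit (`ttl`) and that a peer does not send a message back to the neighbour it came from; the topology is loosely modelled on the eight peers of the PING/PONG slide and is an assumption, as is the function name `flood`.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Broadcast from origin; return (peers reached, messages sent)."""
    seen = {origin}
    messages = 0
    queue = deque([(origin, None, ttl)])
    while queue:
        peer, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # hop limit used up, do not relay further
        for neighbour in graph[peer]:
            if neighbour == sender:
                continue  # never send straight back to the sender
            messages += 1  # every relay costs a message, duplicates included
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, peer, hops_left - 1))
    return seen - {origin}, messages


# Assumed topology: peer 1 knows 2, 3 and 6; 3 knows 4 and 5; 6 knows 7 and 8.
tree = {
    1: [2, 3, 6], 2: [1], 3: [1, 4, 5], 4: [3],
    5: [3], 6: [1, 7, 8], 7: [6], 8: [6],
}
reached, sent = flood(tree, 1, ttl=2)

# One extra link between peers 4 and 5 already produces duplicate messages.
cyclic = {**tree, 4: [3, 5], 5: [3, 4]}
reached_cyclic, sent_cyclic = flood(cyclic, 1, ttl=3)
```

In the tree every message reaches a new peer, while the cyclic variant spends additional messages on peers that have already seen the broadcast.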

  9. Gnutella – Reachable Peers

  10. Gnutella – Generated Traffic in Bytes (1) • query message length: 83 bytes • simple query relaying (no responses)
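A rough way to estimate the traffic of simple query relaying is sketched below. Only the 83-byte query length comes from the slide; the idealised tree model (every peer has the same number of connections and no peer receives the query twice) and the parameter values in the example are assumptions.

```python
QUERY_BYTES = 83  # query message length from the slide


def reachable_peers(connections, ttl):
    """Peers reached if each peer relays to (connections - 1) new peers per hop."""
    return sum(
        connections * (connections - 1) ** (hop - 1)
        for hop in range(1, ttl + 1)
    )


def relay_traffic_bytes(connections, ttl, message_bytes=QUERY_BYTES):
    # Without duplicates each reached peer receives the query exactly once.
    return reachable_peers(connections, ttl) * message_bytes


# Assumed example values: 4 connections per peer, hop limit 5.
example_peers = reachable_peers(4, 5)
example_bytes = relay_traffic_bytes(4, 5)
```

Because the number of reachable peers grows exponentially with the hop limit, the relayed bytes grow at the same rate, even before any responses are counted.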

  11. Gnutella – Generated Traffic in Bytes (2) • Mean percentage of users who typically share content: 30% • Mean percentage of users who typically have responses to search queries: 40% • Mean number of search responses the typical respondent offers: 10 • Mean length of search responses the typical respondent offers: 60 • => "Standard client settings yield a whopping 17MB generated in response to […] search query"

  12. Freenet - Concepts • peer-to-peer file storage & retrieval system • every document has a globally unique ID • efficient (?) retrieval algorithm • documents are retrieved with sublinear effort • routing based on likelihood of answer capability • focus on security

  13. Freenet – Query Routing (1) • every peer maintains a routing table • table contains known peers along with the IDs of the documents they are storing • a request is routed to the peer most likely to have an answer (closest matching ID) • responses are sent back upstream • intermediate peers also store the document and augment their routing tables

  14. Freenet – Query Routing (2) [Figure: peers A, B, C and D with their routing tables and document caches] • 1. Query for doc 17 • 2. Forward to best match • 3. C has no match -> backtrack • 4. Forward query to 2nd best match • 5. Send back doc 17 • 6. Route back response • Tables shown in the figure: Routing Table B: 14, 20 / Doc Cache 19, 30 • Routing Table C: 19, 30; D: 17, 45, 51 / Doc Cache 14, 17, 20 • Routing Table C: 19, 30; D: 45, 51 / Doc Cache 14, 20 • Routing Table B: 14, 20; X: 47, 60 / Doc Cache 5, 89 • Routing Table B: 14, 17, 20; X: 47, 60 / Doc Cache 5, 17, 89 • Routing Table B: 14, 20; Z: 105, 110 / Doc Cache 17, 45, 51, 102, 205
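The query for doc 17 can be replayed with a small depth-first sketch. It is not Freenet's actual implementation: the assignment of the routing tables to peers A, B, C and D is inferred from the figure, the contents of X and Z, the hop limit `ttl` and the names `Node` and `route_query` are assumptions, and "closest matching ID" is taken to mean the smallest absolute difference.

```python
class Node:
    def __init__(self, name, routing, cache):
        self.name = name
        # routing: neighbour name -> document IDs that neighbour is known to hold
        self.routing = {peer: set(ids) for peer, ids in routing.items()}
        self.cache = set(cache)


def route_query(network, start, doc_id, ttl=10):
    """Depth-first search ordered by closest known ID; returns the successful path or None."""
    visited = set()

    def visit(name, hops_left):
        visited.add(name)
        node = network[name]
        if doc_id in node.cache:
            return [name]
        if hops_left == 0:
            return None
        # Rank neighbours by the distance of their closest known document ID.
        ranked = sorted(
            node.routing,
            key=lambda peer: min(
                (abs(doc_id - known) for known in node.routing[peer]),
                default=float("inf"),
            ),
        )
        for peer in ranked:
            if peer in visited:
                continue
            path = visit(peer, hops_left - 1)
            if path is not None:
                # The response travels back upstream: store the document
                # and remember which neighbour delivered it.
                node.cache.add(doc_id)
                node.routing[peer].add(doc_id)
                return [name] + path
        return None  # no neighbour could answer -> backtrack

    return visit(start, ttl)


# Table-to-peer assignment inferred from the figure; X and Z contents are assumed.
network = {
    "A": Node("A", {"B": [14, 20], "X": [47, 60]}, [5, 89]),
    "B": Node("B", {"C": [19, 30], "D": [45, 51]}, [14, 20]),
    "C": Node("C", {"B": [14, 20]}, [19, 30]),
    "D": Node("D", {"B": [14, 20], "Z": [105, 110]}, [17, 45, 51, 102, 205]),
    "X": Node("X", {}, [47, 60]),
    "Z": Node("Z", {}, [105, 110]),
}
path = route_query(network, "A", 17)
```

After the call, A and B both hold doc 17 and have added it to the routing entry of the neighbour that delivered it, which mirrors the updated tables in the figure.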

  15. Freenet – Document Insert • analogous to query routing • insert is routed to the peer most likely to be interested in new doc (closest matching ID) • intermediate peers cache document and augment routing tables • until TTL is reached
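An insert can be sketched as a greedy walk that stops when the TTL is used up. This is a simplified illustration under assumptions: the function name `insert_document`, the sample tables, the choice to record the new ID under the next hop, and stopping when no unvisited neighbour remains are not taken from the slides.

```python
def insert_document(routing, caches, start, doc_id, ttl):
    """Forward doc_id hop by hop to the neighbour with the closest known ID."""
    path = [start]
    caches[start].add(doc_id)  # the inserting peer holds the document itself
    current = start
    while ttl > 0:
        candidates = [peer for peer in routing[current] if peer not in path]
        if not candidates:
            break  # nowhere left to forward to
        best = min(
            candidates,
            key=lambda peer: min(abs(doc_id - known) for known in routing[current][peer]),
        )
        routing[current][best].add(doc_id)  # augment the routing table
        caches[best].add(doc_id)            # intermediate peer caches the document
        path.append(best)
        current = best
        ttl -= 1
    return path


# Assumed sample network: neighbour name -> known document IDs.
routing = {
    "A": {"B": {14, 20}, "X": {47, 60}},
    "B": {"C": {19, 30}, "D": {45, 51}},
    "C": {"B": {14, 20}},
    "D": {"B": {14, 20}},
    "X": {},
}
caches = {name: set() for name in routing}
insert_path = insert_document(routing, caches, "A", 18, ttl=2)
```

Documents with similar IDs end up clustered on the same peers, which is what later lets queries be routed by closest matching ID.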

  16. Freenet - Discussion • efficient routing algorithm (compared to Gnutella) • adequate security features/heuristics (the more popular a document, the more frequently it gets cached) • no metasearch • no updates, deletes possible • worst case query routing = DFS

  17. FUtella – Concepts • peer-to-peer platform for general knowledge sharing • tries to model learning style of humans • content-based routing • combines and extends approaches from: • Gnutella (message format) • JXTA (peer groups) • JXTA Search (queryspaces and registrations) • Freenet (routing of registration discoveries)

  18. FUtella - Knowledge Groups [Figure: a knowledge group inside the FUtella Net] • Knowledge Group: Queryspace "Computer Architecture" • Group Head: Peer E • Inserts Registration • Members M1 - Mi

  19. FUtella - Knowledge Group Discovery 1 [Figure: peers A, B, C and D with their routing tables and registration caches] • 1. Discovery request "computer architecture" • 2. Forward discovery request • 3. C has no cached registration for "computer architecture" -> backtrack • 4. Forward discovery request to 2nd best match • Tables shown in the figure: Routing Table "computer analysis" -> C; "computer systems" -> D; "data base" -> A / Registration Cache "computer analysis": Y; "computer systems": Z; "data base": X • Routing Table "computer" -> B; "computer analysis" -> Y / Registration Cache "computer": B; "computer analysis": Y • Routing Table "computer" -> B; "computer systems" -> Z; "computer architecture" -> E / Registration Cache "computer systems": Z; "computer": B; "computer architecture": E • Routing Table "computer" -> B; "data base" -> X / Registration Cache "computer": B; "data base": X

  20. FUtella - Knowledge Group Discovery 2 [Figure: peers A, B and D with their updated routing tables and registration caches] • 5. Discovery response containing registration "computer architecture": E • 6. Forward discovery response • Tables shown in the figure: Routing Table "computer analysis" -> C; "computer architecture" -> D; "computer systems" -> D; "data base" -> A / Registration Cache "computer analysis": Y; "computer architecture": E; "computer systems": Z; "data base": X • Routing Table "computer" -> B; "computer systems" -> Z; "computer architecture" -> E / Registration Cache "computer systems": Z; "computer": B; "computer architecture": E • Routing Table "computer" -> B; "computer architecture" -> D; "data base" -> X / Registration Cache "computer": B; "computer architecture": E; "data base": X

  21. FUtella - Query Processing [Figure: peers A, B, C, D and the knowledge group "computer architecture" with head E and members M1 ... Mi] • 1. Discovery request "computer architecture" • 2. Forward discovery request • 3. C has no cached registration for "computer architecture" -> backtrack • 4. Forward discovery request to 2nd best match • 5. Discovery response containing cached registration • 6. Forward discovery response • 7. Send query • 8. Forward query to member (E to M1 ... Mi) • 9. Query response (from M1 ... Mi)
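The two phases of FUtella query processing (discover the knowledge group, then query it) can be sketched as follows. The slides do not say how a "best match" between queryspaces is computed, so the common-prefix measure used here is purely an assumption, as are the function names, the assignment of the tables to peers A to D and the member callables.

```python
def common_prefix_len(a, b):
    """Hypothetical similarity: length of the shared prefix of two queryspaces."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


def discover(peers, start, queryspace, visited=None):
    """Return (group head, peer holding the registration) or None."""
    visited = set() if visited is None else visited
    visited.add(start)
    peer = peers[start]
    if queryspace in peer["registrations"]:
        return peer["registrations"][queryspace], start
    ranked = sorted(
        peer["routing"].items(),
        key=lambda item: -common_prefix_len(item[0], queryspace),
    )
    for _, neighbour in ranked:
        if neighbour in visited or neighbour not in peers:
            continue
        found = discover(peers, neighbour, queryspace, visited)
        if found is not None:
            head, holder = found
            peer["registrations"][queryspace] = head  # cache the registration
            peer["routing"][queryspace] = holder      # remember where it was found
            return found
    return None  # no cached registration reachable -> backtrack


def query_group(members, head, question):
    """Steps 7 to 9: the head forwards the query to its members and collects responses."""
    return [answer(question) for answer in members[head]]


# Table-to-peer assignment inferred from the discovery figures.
peers = {
    "A": {"routing": {"computer": "B", "data base": "X"},
          "registrations": {"computer": "B", "data base": "X"}},
    "B": {"routing": {"computer analysis": "C", "computer systems": "D", "data base": "A"},
          "registrations": {"computer analysis": "Y", "computer systems": "Z", "data base": "X"}},
    "C": {"routing": {"computer": "B", "computer analysis": "Y"},
          "registrations": {"computer": "B", "computer analysis": "Y"}},
    "D": {"routing": {"computer": "B", "computer systems": "Z", "computer architecture": "E"},
          "registrations": {"computer systems": "Z", "computer": "B", "computer architecture": "E"}},
}
# Hypothetical members of the group headed by E.
members = {"E": [lambda q: f"M1 answers {q!r}", lambda q: f"Mi answers {q!r}"]}

head, holder = discover(peers, "A", "computer architecture")
answers = query_group(members, head, "what is pipelining?")
```

Once the registration is cached along the discovery path, later requests for the same queryspace are answered from the first peer's own registration cache without forwarding.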

  22. FUtella - Test Results (1) [Chart: Total Number of Messages (# msg, axis 0 to 250000) for static peers, semi-dynamic peers and dynamic peers; series: threshold 2, no threshold, Gnutella]

  23. FUtella - Test Results (2)

  24. Conclusion • first and second generation P2P systems still most widely used • practically proven • very flexible in terms of topology • bad scalability (Gnutella) • no guaranteed lower bound on query effort (Freenet) • (scientifically) far better approach: DHTs (see next presentation)

  25. Questions?
