
CSC407: Software Architecture Winter 2007 Peer to Peer

Greg Wilson, BA 4234, gvwilson@cs.utoronto.ca


Presentation Transcript


  1. CSC407: Software Architecture, Winter 2007: Peer to Peer • Greg Wilson • BA 4234 • gvwilson@cs.utoronto.ca

  2. Overview • A peer-to-peer (P2P) system is one that relies on the computing power and bandwidth of all participating machines, rather than on that of a relatively small number of distinguished servers • Each servent (SERVer + cliENT) has the same capabilities, and fills the same role • Fluid, asynchronous membership • No single point of failure • Or censorship

  3. Centralization • First architectural issue is how the overlay network is structured • Purely decentralized: all nodes perform exactly the same tasks • Partially centralized: some otherwise-normal nodes (temporarily) play a special role • “You’re the boss today” • Hybrid decentralized: central server(s) coordinate or bootstrap the P2P overlay network

  4. Network Structure • Is the overlay network completely ad hoc, or are rules followed when adding nodes? • Unstructured: placement of content and capabilities is completely arbitrary • Means that content and capabilities must somehow be located each time they’re needed • Best for highly-transient populations with simple requirements • Structured: content placement follows rules • Rules help participants know where to look for things • More scalable, but only if you know exactly what you’re looking for

  5. SETI@Home • Screensaver looking for signals from outer space • Anyone can participate… • …but reliant on central servers • Is it P2P or client/server? • Does it matter to anyone besides the marketing team?

  6. Napster • A relatively small number of distinguished servers provide an indexing service • Once files are located, further communication between network participants is direct • [Diagram labels: Napster, you, me; steps 1 to 4]
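The split between a central index and direct transfers can be sketched as follows. This is a minimal illustration, not Napster's actual protocol: the `CentralIndex` and `Peer` classes, the addresses, and the file names are all hypothetical.

```python
class CentralIndex:
    """Hypothetical stand-in for the distinguished indexing servers."""

    def __init__(self):
        self._holders = {}  # filename -> set of peer addresses sharing it

    def register(self, address, filenames):
        for name in filenames:
            self._holders.setdefault(name, set()).add(address)

    def lookup(self, filename):
        # The index only says who has the file; it never stores the file.
        return sorted(self._holders.get(filename, set()))


class Peer:
    def __init__(self, address, files):
        self.address = address
        self.files = dict(files)  # filename -> bytes

    def serve(self, filename):
        # Direct peer-to-peer transfer, bypassing the index.
        return self.files[filename]


index = CentralIndex()
peers = {}
for peer in (Peer("me:6699", {"song.mp3": b"abc"}), Peer("you:6699", {})):
    peers[peer.address] = peer
    index.register(peer.address, peer.files)

holders = index.lookup("song.mp3")          # ask the central index
data = peers[holders[0]].serve("song.mp3")  # then fetch from the peer directly
```

If the index goes down, no new lookups succeed even though every file is still held by some peer, which is why this design counts as hybrid rather than purely decentralized.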

  7. Instant Messaging • Most IM systems work the same way as Napster • Signing in tells the system where you are located • Communication with your friends can then travel point-to-point • Q1: where does account information live? • In particular, is it replicated or not? • Q2: how are multi-party chats implemented? • Centralized, leader/follower, broadcast, or other?

  8. Gnutella • No centralization of any kind • Protocol (on top of TCP/IP) uses four message types: • Ping: ask a host if it’s a member of the network • Pong: confirmation (including IP and port, and inventory of files being shared) • Query: what to look for, and speed requirements • Query Hits: IP, port, and speed of host, number of matching files, and an indexed result set

  9. …Gnutella • Bootstrap via gnutellahosts.com • Ping any node to “get on the network” • Use flood (broadcast) to find files • Ask your neighbors, who ask their neighbors • Prevent overload by including a time to live (TTL) header in each message • Use unique message IDs to prevent cycles • Once a file is found, download point-to-point
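A flooded query with a TTL and duplicate suppression can be sketched as below. This is a simplified, single-process model rather than the Gnutella wire protocol: the `flood` function, the adjacency-dict `graph`, and the `files` mapping are assumptions made for illustration, and the `seen` set stands in for each node remembering the message ID.

```python
from collections import deque


def flood(graph, files, start, wanted, ttl):
    """Return the nodes holding `wanted` that a flood from `start` reaches."""
    seen = {start}  # nodes that have already handled this message ID
    hits = []
    queue = deque([(start, ttl)])
    while queue:
        node, hops_left = queue.popleft()
        if wanted in files.get(node, ()):
            hits.append(node)
        if hops_left == 0:
            continue  # TTL expired: answer if possible, but do not forward
        for neighbor in graph[node]:
            if neighbor not in seen:  # drop duplicates to prevent cycles
                seen.add(neighbor)
                queue.append((neighbor, hops_left - 1))
    return hits


# A four-node ring (a-b-c-d) with an extra node e hanging off c.
graph = {
    "a": ["b", "d"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["a", "c"],
    "e": ["c"],
}
files = {"c": {"x.txt"}, "e": {"x.txt"}}

near = flood(graph, files, "a", "x.txt", ttl=2)
far = flood(graph, files, "a", "x.txt", ttl=3)
```

With a TTL of 2 the query reaches `c` but expires before `e`; raising the TTL to 3 finds both copies, at the cost of more messages.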

  10. Random Walks • Flooding (even with TTL) quickly overloads the network • Use random walks instead • Message wanders around until it finds the desired file • Works best with proactive object replication • Eventually evolve into distributed agent systems • Move a bit of code from place to place instead of trying to squeeze the query into a straitjacket
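A single random walker can be sketched as follows, under the same simplifying assumptions as a toy in-memory network: `random_walk`, `graph`, `files`, and the step limit are hypothetical names, and `rng` is a `random.Random` instance passed in so that runs can be reproduced.

```python
import random


def random_walk(graph, files, start, wanted, max_steps, rng):
    """Wander from `start`; return the path to a holder of `wanted`, or None."""
    node = start
    path = [node]
    for _ in range(max_steps):
        if wanted in files.get(node, ()):
            return path
        node = rng.choice(graph[node])  # forward to one neighbor, not all
        path.append(node)
    return path if wanted in files.get(node, ()) else None


# Each node has exactly one neighbor here, so the walk is deterministic.
ring = {"a": ["b"], "b": ["c"], "c": ["a"]}
files = {"c": {"x.txt"}}
found = random_walk(ring, files, "a", "x.txt", max_steps=5, rng=random.Random(0))
gave_up = random_walk(ring, files, "a", "x.txt", max_steps=1, rng=random.Random(0))
```

Each step costs one message instead of one per neighbor, so the load on the network grows with the length of the walk rather than with the size of the neighborhood. The more copies of a file exist, the shorter the expected walk, which is why proactive replication helps.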

  11. Kazaa • Some nodes elect to be supernodes • Chosen based on bandwidth and processing power • Nodes may opt out (configuration file) • Supernodes index the files shared by peers connected to them, and proxy search requests • Reduces discovery time • Takes advantage of heterogeneity • Without introducing single point of failure
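Supernode selection and indexing can be sketched as below. Kazaa's real election procedure is not described here, so this is only an assumed ranking by bandwidth and then processing power; the field names (`bandwidth_kbps`, `cpu_score`, `opt_out`) and the `Supernode` class are hypothetical.

```python
def choose_supernodes(peers, count):
    """Pick the `count` best-provisioned peers that have not opted out."""
    eligible = [p for p in peers if not p["opt_out"]]
    ranked = sorted(
        eligible,
        key=lambda p: (p["bandwidth_kbps"], p["cpu_score"]),
        reverse=True,
    )
    return [p["name"] for p in ranked[:count]]


class Supernode:
    def __init__(self, name):
        self.name = name
        self.index = {}  # filename -> set of attached peers sharing it

    def attach(self, peer_name, filenames):
        for filename in filenames:
            self.index.setdefault(filename, set()).add(peer_name)

    def search(self, filename):
        # Answered from the local index; attached peers are not contacted.
        return sorted(self.index.get(filename, set()))


peers = [
    {"name": "dialup", "bandwidth_kbps": 56, "cpu_score": 1, "opt_out": False},
    {"name": "campus", "bandwidth_kbps": 10000, "cpu_score": 5, "opt_out": False},
    {"name": "shy", "bandwidth_kbps": 50000, "cpu_score": 9, "opt_out": True},
]
chosen = choose_supernodes(peers, 1)
hub = Supernode(chosen[0])
hub.attach("dialup", ["a.mp3"])
```

The best-provisioned peer, `shy`, is skipped because it opted out, so `campus` takes the role. Because any eligible peer can be promoted, losing one supernode does not take the network down.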

  12. Freenet • Loosely structured: nodes can estimate which other node is most likely to store certain content • Use chain mode propagation to forward messages along the most likely path • Each file identified by three keys: • Simplest is hash of short descriptive text string • Files are placed at nodes possessing files with similar keys (and replicated) • Propagation radius limited

  13. …Freenet • Search messages are propagated most-likely-first • When successful replies come back, intermediate nodes remember them to speed up future searches • Freenet also supports indirect files • Named according to likely keyword searches • “Content” is a reference to the real file • Distributed equivalent of symbolic links (?)

  14. …Freenet • Nodes tend to specialize in searching for similar keys over time • Nodes store similar keys over time (due to caching of files after successful queries) • System stays balanced because similarity of keys does not reflect similarity of files • Routing independent of underlying network topology
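The most-likely-first search with caching on the return path can be sketched as follows. This is a loose model, not Freenet's implementation: the `FreenetNode` class is hypothetical, keys are treated as integers derived from a SHA-1 hash of the descriptive string, and "similar" is assumed to mean numerically close.

```python
import hashlib


def key_for(description):
    """Hash a short descriptive string into an integer key."""
    return int(hashlib.sha1(description.encode("utf-8")).hexdigest(), 16)


class FreenetNode:
    def __init__(self, name):
        self.name = name
        self.store = {}   # key -> data held (or cached) locally
        self.routes = {}  # key -> neighbor believed to hold that key

    def request(self, key, hops_to_live, visited=None):
        visited = set() if visited is None else visited
        visited.add(self.name)
        if key in self.store:
            return self.store[key]
        if hops_to_live == 0:
            return None  # propagation radius exhausted
        # Try neighbors in order of how close their known keys are to `key`.
        candidates = sorted(self.routes.items(), key=lambda kv: abs(kv[0] - key))
        for _, neighbor in candidates:
            if neighbor.name in visited:
                continue
            data = neighbor.request(key, hops_to_live - 1, visited)
            if data is not None:
                self.store[key] = data       # cache the file on the way back
                self.routes[key] = neighbor  # remember who answered
                return data
        return None


a, b, c = FreenetNode("a"), FreenetNode("b"), FreenetNode("c")
k = key_for("my essay")
c.store[k] = b"text"
b.routes[k] = c
a.routes[k + 1] = b  # a only knows that b handles a nearby key
answer = a.request(k, hops_to_live=5)
```

After the request succeeds, both `a` and `b` hold a copy and a route for the key, so a repeat request stops at `a`. This is the mechanism by which nodes drift toward storing and routing similar keys.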

  15. Replication • Passive: occurs naturally as nodes copy files • Cache-based: keep a copy of everything that passes through you • Active: proactively migrate content to: • Balance load • Reduce search radius • Accommodate failure

  16. Validating Content • How to be sure the file you’re downloading is the file that was uploaded? • Self-certifying data: data is indexed by a hash of its content • Doesn’t support fuzzy or partial lookup • Separate forwarding from storing, so that file location(s) are hidden • Only defers the problem
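Self-certifying data can be checked by the downloader alone, as in the sketch below. It assumes the lookup key is the SHA-256 hash of the file's bytes; the choice of hash and the function names `content_key` and `verify` are illustrative, not taken from any particular system.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Derive the lookup key from the file's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify(key: str, data: bytes) -> bool:
    """True if `data` is exactly the content that `key` was derived from."""
    return content_key(data) == key


original = b"the file that was uploaded"
key = content_key(original)
genuine = verify(key, original)
tampered = verify(key, b"the file that was uploaded!")
```

Changing a single byte produces a different key, which is also why this scheme cannot answer fuzzy or partial lookups: the searcher must already know the exact key.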

  17. …Validating Content • What about malicious routing? • A node joins the network, then pretends to be forwarding messages when in fact it’s responding locally with fabricated data • We don’t have an answer to this even on centralized systems

  18. Garbage Collection • Every file system is eventually 99% full • Owner deletes? • Hard in asynchronous network • Content expiration? • Requires confirmation that the file found is the file the user was searching for • In practice, people don’t fill in forms

  19. Anonymity • The raison d’être of many P2P systems • Politics, payment, and porn • May want to anonymize any or all of: • The author/publisher of content • The identity of a node storing content • The identity and details of the content itself • The details of a query for content retrieval

  20. …Anonymity • Freenet replies retrace the request’s steps to make tracing as difficult as possible • Any node in the chain can claim to be the source, or claim that someone else was • Hops-to-live value is randomized to obscure search radius • OceanStore and PAST store encrypted content without keys • How you get the key is your business

  21. Incentives • The tragedy of the commons: everyone wants to be a client, no-one wants to be a server • Paying people per download will bankrupt you • eBay centralized reputations? • EigenTrust collates upload histories from a dynamic set of servents • Possible (though not easy) to lie • Resource trading becoming popular • But again, how to verify?

  22. Legal Issues • Is this part of software architecture? • Accessibility and safety are part of physical architecture • Never mind the hosting: to what extent are the designers and coders responsible? • If you make a bomb, you’re an accomplice to murder • What if you publish a description of how to smuggle pamphlets past a dictator’s border guards?
