Peer-to-Peer (P2P) and Sensor Networks Shivkumar Kalyanaraman Rensselaer Polytechnic Institute shivkuma@ecse.rpi.edu http://www.ecse.rpi.edu/Homepages/shivkuma Based in part upon slides of Don Towsley, Ion Stoica, Scott Shenker, Joe Hellerstein, Jim Kurose, Hung-Chang Hsiao, Chung-Ta King
Overview • P2P networks: Napster, Gnutella, Kazaa • Distributed Hash Tables (DHTs) • Database perspectives: data-centricity, data-independence • Sensor networks and their connection to P2P
P2P: Key Idea • Share the content, storage and bandwidth of individual (home) users
What is P2P (Peer-to-Peer)? • P2P as a mindset • Slashdot • P2P as a model • Gnutella • P2P as an implementation choice • Application-layer multicast • P2P as an inherent property • Ad-hoc networks
P2P Application Taxonomy • P2P Systems: • Distributed Computing (e.g., SETI@home) • File Sharing (e.g., Gnutella) • Collaboration (e.g., Jabber) • Platforms (e.g., JXTA)
A Straightforward Idea • Use a BIG server: store the objects and provide a directory • How to do it in a distributed way?
Why Distributed? • Client-server model: • Client is dumb • Server does most things (compute, store, control) • Centralization makes things simple, but introduces • Single point of failure, performance bottleneck, tighter control, access fees and management costs, … • ad hoc participation? • Estimate of net PCs • 10 billion MHz of CPU • 10,000 terabytes of storage • Clients are not that dumb after all • Use the resources in the clients (at the net edges)
First Idea: Napster • Distributing the objects, centralizing the directory
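A minimal sketch of the centralized-directory idea (hypothetical class and method names, not the actual Napster protocol): the server only keeps the mapping from file names to the peers that hold them, while the file bytes move peer-to-peer.

# Sketch of a Napster-style central directory (hypothetical API).
# The server stores only "file name -> peers that hold it"; downloads
# then happen directly between peers.

class CentralDirectory:
    def __init__(self):
        self.index = {}                      # file name -> set of peer addresses

    def register(self, peer, files):
        """A peer announces the files it is willing to share."""
        for name in files:
            self.index.setdefault(name, set()).add(peer)

    def unregister(self, peer):
        """Remove a departing peer from every entry."""
        for holders in self.index.values():
            holders.discard(peer)

    def lookup(self, name):
        """Return the peers that claim to have the file."""
        return set(self.index.get(name, ()))

# Usage
directory = CentralDirectory()
directory.register("peer_a:6699", ["song.mp3", "talk.pdf"])
directory.register("peer_b:6699", ["song.mp3"])
print(directory.lookup("song.mp3"))          # {'peer_a:6699', 'peer_b:6699'}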
Today: P2P video traffic is dominant • Source: CacheLogic; video via BitTorrent, eDonkey, etc.
40-60%+ of total traffic is P2P
2006 P2P Data • Between 50 and 65 percent of all download traffic is P2P related. Between 75 and 90 percent of all upload traffic is P2P related. • And it seems that more people are using P2P today • In 2004, one CacheLogic server registered 3 million IP addresses in 30 days. In 2006, one CacheLogic server registered 3 million IP addresses in 8 days. • So what do people download? • 61.4 percent video, 11.3 percent audio, 27.2 percent games/software/etc. • The average file size of shared files is 1 gigabyte! • Source: http://torrentfreak.com/peer-to-peer-traffic-statistics/
A More Aggressive Idea • Distribute both the objects and the directory • How to find objects without a directory? Blind flooding!
Gnutella • Distribute the file location • Idea: flood the request • How to find a file: • Send the request to all neighbors • Neighbors recursively multicast the request • Eventually a machine that has the file receives the request, and it sends back the answer • Advantages: • Totally decentralized, highly robust • Disadvantages: • Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)
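A sketch of TTL-bounded flooding over a simulated overlay (simplified: real Gnutella uses asynchronous messages and message IDs for duplicate suppression; a shared "seen" set plays that role here).

# Sketch of Gnutella-style query flooding with a TTL, over an in-memory overlay.

def flood_query(overlay, files, start, wanted, ttl):
    """overlay: node -> list of neighbor nodes
       files:   node -> set of file names stored at that node
       Returns the set of nodes that would answer the query."""
    hits = set()
    seen = {start}
    frontier = [start]
    while frontier and ttl >= 0:
        next_frontier = []
        for node in frontier:
            if wanted in files.get(node, ()):
                hits.add(node)               # this node would send a QueryHit back
            for nbr in overlay.get(node, ()):
                if nbr not in seen:          # duplicate suppression
                    seen.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
        ttl -= 1                             # each extra ring of neighbors costs one hop
    return hits

# Usage: a tiny 5-node overlay
overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"], "D": ["B"], "E": ["C"]}
files = {"D": {"xyz"}, "E": {"xyz"}}
print(flood_query(overlay, files, start="A", wanted="xyz", ttl=2))   # {'D', 'E'}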
Gnutella: Unstructured P2P • Ad-hoc topology • Queries (e.g., for “xyz”) are flooded for a bounded number of hops • No guarantees on recall
Lessons and Limitations • Client-Server performs well • But not always feasible • Ideal performance is often not the key issue! • Things that flood-based systems do well • Organic scaling • Decentralization of visibility and liability • Finding popular stuff • Fancy local queries • Things that flood-based systems do poorly • Finding unpopular stuff [Loo, et al VLDB 04] • Fancy distributed queries • Vulnerabilities: data poisoning, tracking, etc. • Guarantees about anything (answer quality, privacy, etc.)
BitTorrent – joining a torrent • Peers are divided into: • seeds: have the entire file • leechers: still downloading • Steps: 1. obtain the metadata file (from a website) 2. contact the tracker 3. obtain a peer list (contains seeds & leechers) 4. contact peers from that list and request data
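A toy sketch of the join sequence above. The "metadata" and "tracker" here are in-memory stand-ins, not the real bencoded .torrent file or HTTP/UDP announce protocol.

# Toy sketch of joining a torrent with an in-memory tracker.

class Tracker:
    def __init__(self):
        self.swarm = {}                      # info_hash -> set of peer addresses

    def announce(self, info_hash, peer):
        """Register the caller and hand back the current peer list."""
        peers = self.swarm.setdefault(info_hash, set())
        others = set(peers)                  # seeds and leechers already in the swarm
        peers.add(peer)
        return others

def join_torrent(metadata, tracker, my_addr):
    # 1. obtain the metadata file (here: a dict carrying the torrent's info_hash)
    info_hash = metadata["info_hash"]
    # 2-3. contact the tracker and obtain a peer list
    peer_list = tracker.announce(info_hash, my_addr)
    # 4. the new leecher would now contact these peers to request data
    return peer_list

# Usage
tracker = Tracker()
meta = {"info_hash": "abc123"}
tracker.announce("abc123", "seed:6881")              # a seed is already in the swarm
print(join_torrent(meta, tracker, "leecher:6881"))   # {'seed:6881'}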
BitTorrent – exchanging data ● Verify pieces using hashes ● Download sub-pieces in parallel ● Advertise received pieces (“I have!”) to the entire peer list ● Look for the rarest pieces
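A sketch of rarest-first piece selection: count how many known peers advertise each piece, then pick the least-replicated piece we still need. Function and variable names are illustrative.

# Sketch of rarest-first piece selection.

from collections import Counter

def pick_rarest_piece(my_pieces, peer_bitfields):
    """my_pieces: set of piece indices we already have
       peer_bitfields: dict peer -> set of piece indices that peer advertises"""
    availability = Counter()
    for pieces in peer_bitfields.values():
        availability.update(pieces)
    candidates = [p for p in availability if p not in my_pieces]
    if not candidates:
        return None
    return min(candidates, key=lambda p: availability[p])   # rarest first

# Usage
peers = {"A": {0, 1, 2}, "B": {1, 2}, "C": {2}}
print(pick_rarest_piece(my_pieces={2}, peer_bitfields=peers))   # 0 (only peer A has it)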
BitTorrent – unchoking ● Periodically calculate data-receiving rates ● Upload to (unchoke) the fastest downloaders ● Optimistic unchoking ▪ periodically select a peer at random and upload to it ▪ continuously look for the fastest partners
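A sketch of one unchoke round under these rules (top-k peers by measured download rate, plus one random optimistic unchoke); the number of regular slots and the timers are assumptions, and a real client rotates the optimistic slot on a longer interval.

# Sketch of one unchoke round: reward the fastest uploaders-to-us, plus one
# randomly chosen "optimistic" peer so newcomers get a chance.

import random

def choose_unchoked(download_rates, regular_slots=4):
    """download_rates: dict peer -> bytes/sec recently received from that peer."""
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = set(ranked[:regular_slots])       # the fastest downloaders
    choked = [p for p in ranked if p not in unchoked]
    if choked:
        unchoked.add(random.choice(choked))      # optimistic unchoke
    return unchoked

# Usage: would be re-run periodically (e.g., every 10 seconds)
rates = {"A": 120_000, "B": 95_000, "C": 40_000, "D": 5_000, "E": 0}
print(choose_unchoked(rates, regular_slots=2))   # e.g. {'A', 'B', 'D'}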
Back to … P2P Structures • Unstructured P2P architecture • Napster, Gnutella, Freenet • No “logically” deterministic structure to organize the participating peers • No guarantee that objects will be found • How to find objects within some bounded number of hops? • Extend hashing • Structured P2P architecture • CAN, Chord, Pastry, Tapestry, Tornado, … • Viewed as a distributed hash table for the directory
How to Bound Search Quality? • Many ideas … again, work on placement!
High-Level Idea: Indirection • Indirection in space • Logical (content-based) IDs, routing to those IDs • “Content-addressable” network • Tolerant of churn (nodes joining and leaving the network) • Indirection in time • Want some scheme to temporally decouple send and receive • Persistence required. Typical Internet solution: soft state • Combination of persistence via storage and via retry • “Publisher” requests a TTL on storage and republishes as needed • Metaphor: Distributed Hash Table
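A small sketch of the soft-state idea above: stored entries carry a TTL and silently expire unless the publisher refreshes them. Class and method names are illustrative.

# Sketch of soft state: entries expire unless the publisher re-publishes them.

import time

class SoftStateStore:
    def __init__(self):
        self.entries = {}                    # key -> (value, expiry time)

    def publish(self, key, value, ttl):
        """Store (or refresh) an entry for ttl seconds."""
        self.entries[key] = (value, time.time() + ttl)

    def get(self, key):
        item = self.entries.get(key)
        if item is None:
            return None
        value, expiry = item
        if time.time() > expiry:             # expired: publisher stopped refreshing
            del self.entries[key]
            return None
        return value

# Usage: the publisher calls publish() again periodically to keep the entry alive
store = SoftStateStore()
store.publish("fileA", "held by peer 10.0.0.7", ttl=30)
print(store.get("fileA"))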
Basic Idea • Objects have hash keys H(y); peer nodes also have hash keys H(x) in the same hash space • A peer “x” joins with Join(H(x)); an object “y” is published with Publish(H(y)) • Place each object at the peer with the closest hash key
Distributed Hash Tables (DHTs) • Abstraction: a distributed hash-table data structure • insert(id, item); • item = query(id); (or lookup(id);) • Note: item can be anything: a data object, document, file, pointer to a file… • Proposals • CAN, Chord, Kademlia, Pastry, Tapestry, etc. • Goals: • Make sure that an inserted item (file) can always be found • Scale to hundreds of thousands of nodes • Handle rapid arrival and failure of nodes
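A minimal sketch of the abstraction, assuming node IDs and keys share one hash space and each key lives on the node with the numerically closest ID. Real DHTs (Chord, CAN, Kademlia, …) locate that node in O(log n) hops and measure distance on a ring or torus; here the caller simply knows all node IDs.

# Minimal sketch of the DHT abstraction: keys and node IDs share one hash space.

import hashlib

def hash_id(s, bits=32):
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % (2 ** bits)

class ToyDHT:
    def __init__(self, node_names):
        self.nodes = {hash_id(n): {} for n in node_names}   # node id -> local store

    def _closest_node(self, key_id):
        # numerically closest id; a real DHT would wrap around the id space
        return min(self.nodes, key=lambda nid: abs(nid - key_id))

    def put(self, key, item):
        self.nodes[self._closest_node(hash_id(key))][key] = item

    def get(self, key):
        return self.nodes[self._closest_node(hash_id(key))].get(key)

# Usage
dht = ToyDHT(["node1", "node2", "node3", "node4"])
dht.put("song.mp3", "pointer to peer 10.0.0.5")
print(dht.get("song.mp3"))                   # 'pointer to peer 10.0.0.5'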
Internet Viewed as a Distributed Hash Table • The hash space (e.g., 0 to 2^128 - 1) is divided among the peer nodes • Each peer is responsible for a range of the hash table, according to its hash key • Objects are placed at the peer with the closest key • Note that the peers are at the Internet edges
How to Find an Object? • Simplest idea: everyone knows everyone else! • Then only one hop is needed to find the object • But each peer wants to keep only a few routing entries!
Structured Networks • Distributed Hash Tables (DHTs) • Hash table interface: put(key, item), get(key) • e.g., put(K1, I1) stores item I1 under key K1; get(K1) retrieves it • O(log n) hops • Guarantees on recall
Content Addressable Network (CAN) • Distributed hash table • The hash space is a Cartesian coordinate space • A peer only needs to know its logical neighbors • Dimension-ordered multihop routing
Content Addressable Network (CAN) • Associate to each node and item a unique id in a d-dimensional Cartesian space on a d-torus • Properties • Routing table size O(d) • Guarantees that a file is found in at most d·n^(1/d) steps, where n is the total number of nodes
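A back-of-the-envelope check of that tradeoff with illustrative numbers (n and the dimension values below are assumptions): per-node state grows with d, while the worst-case path shrinks as d·n^(1/d).

# Illustrative check of the CAN tradeoff: state O(d) vs. path ~ d * n**(1/d).

n = 1_000_000                        # total nodes (assumed)
for d in (2, 3, 5, 10):
    neighbors = 2 * d                # ~2 neighbors per dimension
    path = d * n ** (1 / d)          # worst-case hops ~ d * n^(1/d)
    print(f"d={d:2d}  neighbors~{neighbors:2d}  path~{path:8.1f} hops")

# d= 2  neighbors~ 4  path~  2000.0 hops
# d= 3  neighbors~ 6  path~   300.0 hops
# d= 5  neighbors~10  path~    79.2 hops
# d=10  neighbors~20  path~    39.8 hops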
CAN Example: Two-Dimensional Space • The space is divided between the nodes • Together, the nodes cover the entire space • Each node covers either a square or a rectangular area with ratio 1:2 or 2:1 • Example: node n1:(1, 2) is the first node to join and covers the entire space
CAN Example: Two-Dimensional Space • Node n2:(4, 2) joins; the space is divided between n1 and n2
CAN Example: Two-Dimensional Space • Node n3:(3, 5) joins; the space is divided between n1 and n3
CAN Example: Two-Dimensional Space • Nodes n4:(5, 5) and n5:(6, 6) join
CAN Example: Two-Dimensional Space • Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6) • Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)
CAN Example: Two-Dimensional Space • Each item is stored by the node that owns its mapping in the space
CAN: Query Example • Each node knows its neighbors in the d-space • Forward the query to the neighbor that is closest to the query id • Example: assume n1 queries f4 • Can route around some failures
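A sketch of the greedy forwarding rule on the 2-D example above. The neighbor lists and zone ownership below are simplified assumptions (ownership is approximated by "node closest to the point", and the torus wrap-around is ignored), so the exact path may differ from the figure.

# Sketch of CAN-style greedy routing in 2-D: at each step, forward to the
# neighbor whose coordinates are closest to the query point.

def dist(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def route(neighbors, coords, start, target_point):
    """neighbors: node -> list of neighbor nodes; coords: node -> (x, y)."""
    path, current = [start], start
    while True:
        best = min(neighbors[current],
                   key=lambda n: dist(coords[n], target_point), default=None)
        if best is None or dist(coords[best], target_point) >= dist(coords[current], target_point):
            return path                      # no neighbor is closer: stop here
        current = best
        path.append(current)

# Usage: nodes from the example, querying f4 at (7, 5) starting from n1
coords = {"n1": (1, 2), "n2": (4, 2), "n3": (3, 5), "n4": (5, 5), "n5": (6, 6)}
neighbors = {"n1": ["n2", "n3"], "n2": ["n1", "n4"], "n3": ["n1", "n4"],
             "n4": ["n2", "n3", "n5"], "n5": ["n4"]}
print(route(neighbors, coords, "n1", (7, 5)))   # ['n1', 'n3', 'n4', 'n5']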