Peer to Peer Technologies Roy Werber Idan Gelbourt prof. Sagiv’s Seminar The Hebrew University of Jerusalem, 2001
Lecture Overview • 1st Part: • The P2P communication model, architecture and applications • 2nd Part: • Chord and CFS
Peer to Peer - Overview • A class of applications that takes advantage of resources: • Storage, CPU cycles, content, human presence • Available at the edges of the Internet • A decentralized system that must cope with the unstable nature of computers located at the network edge
Client/Server Architecture • An architecture in which each process is a client or a server • Servers are powerful computers dedicated to providing services (storage, traffic, etc.) • Clients rely on servers for resources
Client/Server Properties • Big, strong server • Well-known port/address of the server • Many-to-one relationship • Different software runs on the client and on the server • Client can be dumb (lacks functionality); the server does the work for the client • Client usually initiates the connection
[Figure: Client/Server Architecture: several clients connected through the Internet to a single server]
[Figure: Client/Server Architecture: the client sends “GET /index.html HTTP/1.0” and the server replies “HTTP/1.1 200 OK ...”]
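The request/response exchange in the client/server figure can be reproduced in a few lines. This is a minimal sketch, not a real web server: `socket.socketpair()` stands in for a TCP connection, and the `DOCUMENTS` table and the `serve_one`/`fetch` helpers are hypothetical. Only the request line and status line mirror the slide. Note the asymmetry the slides describe: the client initiates, the server only answers.

```python
import socket
import threading

# Hypothetical server content; only the request/status lines come from the slide.
DOCUMENTS = {"/index.html": b"<html>hello</html>"}


def serve_one(conn: socket.socket) -> None:
    """Server side: read one request, look up the path, send a reply, close."""
    data = b""
    while b"\r\n\r\n" not in data:  # the request headers end with an empty line
        chunk = conn.recv(1024)
        if not chunk:
            break
        data += chunk
    request_line = data.split(b"\r\n", 1)[0].decode("ascii")
    method, path, _version = request_line.split(" ")
    body = DOCUMENTS.get(path)
    if method == "GET" and body is not None:
        status = "200 OK"
    else:
        status, body = "404 Not Found", b""
    header = f"HTTP/1.1 {status}\r\nContent-Length: {len(body)}\r\n\r\n"
    conn.sendall(header.encode("ascii") + body)
    conn.close()


def fetch(path: str) -> bytes:
    """Client side: initiate the connection, send the request, read until close."""
    client, server = socket.socketpair()  # stands in for a client-to-server TCP link
    worker = threading.Thread(target=serve_one, args=(server,))
    worker.start()
    client.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))
    response = b""
    while chunk := client.recv(1024):
        response += chunk
    client.close()
    worker.join()
    return response
```

For example, `fetch("/index.html")` returns a response starting with `HTTP/1.1 200 OK`, while an unknown path yields a 404.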
Disadvantages of C/S Architecture • Single point of failure • Strong, expensive server • Dedicated maintenance (a sysadmin) • Not scalable: more users require more servers
Solutions • Replication of data (several servers) • Problems: • redundancy, synchronization, expensive • Brute force (a bigger, faster server) • Problems: • Not scalable, expensive, single point of failure
The Client Side • Although the model hasn’t changed over the years, the entities in it have • Today’s clients can perform more roles than just forwarding users’ requests • Today’s clients have: • More computing power • Storage space
Thin Client • Performs simple tasks: • I/O • Properties: • Cheap • Limited processing power • Limited storage
Fat Client • Can perform complex tasks: • Graphics • Data manipulation • Etc… • Properties: • Strong computation power • Bigger storage • More expensive than thin
Evolution at the Client Side • ’70s: DEC’s VT100, no storage • ’80s: IBM PC @ 4.77MHz, 360k diskettes • 2001: a PC @ 2GHz, 40GB HD
What Else Has Changed? • The number of home PCs is increasing rapidly • PCs with dynamic IPs • Most of the PCs are “fat clients” • Software has not kept pace with hardware development • As Internet usage grows, more and more PCs are connecting to the global net • Most of the time PCs are idle • How can we use all this?
Sharing • Definition: • To divide and distribute in shares • To partake of, use, experience, occupy, or enjoy with others • To grant or give a share in intransitive senses (Merriam-Webster’s online dictionary, www.m-w.com) • A cooperative network has a direct advantage over a single computer
Resources Sharing • What can we share? • Computer resources • Shareable computer resources: • “CPU cycles” - seti@home • Storage - CFS • Information - Napster / Gnutella • Bandwidth sharing - Crowds
SETI@Home • SETI – Search for ExtraTerrestrial Intelligence • @Home – On your own computer • A radio telescope in Puerto Rico scans the sky for radio signals • Fills a DAT tape of 35GB in 15 hours • That data has to be analyzed
SETI@Home (cont.) • The problem – analyzing the data requires a huge amount of computation • Even a supercomputer cannot finish the task on its own • Accessing a supercomputer is expensive • What can be done?
SETI@Home (cont.) • Can we use distributed computing? • YEAH • Fortunately, the problem can be solved in parallel, for example by: • Analyzing different parts of the sky • Analyzing different frequencies • Analyzing different time slices
SETI@Home (cont.) • The data can be divided into small segments • A PC is capable of analyzing a segment in a reasonable amount of time • An enthusiastic UFO searcher will lend his spare CPU cycles for the computation • When? Screensavers
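The segmentation step above can be sketched as follows. The tape size and recording time come from the slides; the 350,000-byte segment size, the `WorkUnit` type and the round-robin assignment are illustrative assumptions (a real scheduler would more likely hand out units as volunteers ask for them). The point is that segments are independent, so each volunteer PC can analyze its share without talking to the others.

```python
from dataclasses import dataclass

TAPE_BYTES = 35 * 10**9   # one 35GB DAT tape (from the slide)
RECORDING_HOURS = 15      # the tape fills in 15 hours (from the slide)
UNIT_BYTES = 350_000      # hypothetical segment size, chosen for illustration


@dataclass(frozen=True)
class WorkUnit:
    unit_id: int
    offset: int   # byte offset into the recording
    length: int   # size of this segment in bytes


def split_into_work_units(total_bytes: int, unit_bytes: int) -> list:
    """Cut the recording into contiguous, non-overlapping segments."""
    return [
        WorkUnit(i, offset, min(unit_bytes, total_bytes - offset))
        for i, offset in enumerate(range(0, total_bytes, unit_bytes))
    ]


def assign_round_robin(units: list, volunteers: list) -> dict:
    """Hand segments to volunteer PCs in turn; each PC works independently."""
    plan = {v: [] for v in volunteers}
    for i, unit in enumerate(units):
        plan[volunteers[i % len(volunteers)]].append(unit.unit_id)
    return plan
```

With these numbers the telescope produces about `TAPE_BYTES / (RECORDING_HOURS * 3600)`, roughly 648,000 bytes per second, and one tape splits into 100,000 hypothetical segments.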
SETI@Home - Summary • SETI reverses the C/S model • Clients can also provide services • Servers can be weaker, used mainly for storage • Distributed peers serving the center • Not yet P2P but we’re close • Outcome - great results: • Thousands of otherwise unused CPU hours harnessed for the mission • 3+ million users
What Exactly is P2P? • A distributed communication model with the properties: • All nodes have identical responsibilities • All communication is symmetric
P2P Properties • Cooperative, direct sharing of resources • No central servers • Symmetric clients [Figure: clients connected directly to one another through the Internet, with no server]
P2P Advantages • Harnesses client resources • Scales with new clients • Provides robustness under failures • Redundancy and fault-tolerance • Immune to DoS • Load balance
P2P Disadvantages -- A Tough Design Problem • How do you handle a dynamic network (nodes join and leave frequently)? • A number of constraints and uncontrolled variables: • No central servers • Clients are unreliable • Clients vary widely in the resources they provide • Heterogeneous network (different platforms)
Two Main Architectures • Hybrid Peer-to-Peer • Preserves some of the traditional C/S architecture. A central server links clients, stores index tables, etc. • Pure Peer-to-Peer • All nodes are equal and no functionality is centralized
Hybrid P2P • A main server is responsible for various administrative operations: • Users’ login and logout • Storing metadata • Directing queries • Example: Napster
Examples - Napster • Napster is a program for sharing information (mp3 music files) over the Internet • Created by Shawn Fanning in 1999, although similar services already existed (but lacked popularity and functionality)
Napster Sharing Style: hybrid center+edge
• Shared libraries: “beastieboy”: song1.mp3, song2.mp3, song3.mp3 • “kingrook”: song4.mp3, song5.mp3, song6.mp3 • “slashdot”: song5.mp3, song6.mp3, song7.mp3
• Napster directory (Title / User / Speed):
song1.mp3 / beastieboy / DSL
song2.mp3 / beastieboy / DSL
song3.mp3 / beastieboy / DSL
song4.mp3 / kingrook / T1
song5.mp3 / kingrook / T1
song5.mp3 / slashdot / 28.8
song6.mp3 / kingrook / T1
song6.mp3 / slashdot / 28.8
song7.mp3 / slashdot / 28.8
• Steps: 1. Users launch Napster and connect to the Napster server 2. Napster creates a dynamic directory from users’ personal .mp3 libraries 3. beastieboy enters the search criterion “song5” 4. Napster displays the matches to beastieboy 5. beastieboy makes a direct connection to kingrook to transfer song5.mp3
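The five Napster steps can be sketched as a central index that stores only metadata, while the transfer itself is left to the two peers. `CentralDirectory`, `pick_peer` and the speed ranking are hypothetical names and rules chosen for illustration, not Napster’s actual protocol; the users, titles and speeds are taken from the directory table.

```python
from collections import defaultdict

# Illustrative ranking of the link speeds in the table (an assumption, not
# Napster's real policy): higher is faster.
SPEED_RANK = {"28.8": 1, "DSL": 2, "T1": 3}


class CentralDirectory:
    """Hybrid P2P: the server keeps only metadata (who shares what);
    the files themselves move directly between peers."""

    def __init__(self):
        self.index = defaultdict(list)   # title -> [(user, speed), ...]

    def login(self, user, speed, library):
        # Steps 1-2: the user's .mp3 library is added to the dynamic directory.
        for title in library:
            self.index[title].append((user, speed))

    def logout(self, user):
        # The directory is dynamic: a departing user's entries disappear.
        for title in list(self.index):
            self.index[title] = [e for e in self.index[title] if e[0] != user]
            if not self.index[title]:
                del self.index[title]

    def search(self, keyword):
        # Steps 3-4: return every (title, user, speed) whose title matches.
        return [(title, user, speed)
                for title, entries in self.index.items() if keyword in title
                for user, speed in entries]


def pick_peer(matches):
    # Step 5: the client picks a source and then connects to it directly.
    return max(matches, key=lambda m: SPEED_RANK[m[2]])
```

Loading the three libraries from the table and searching for “song5” yields kingrook (T1) and slashdot (28.8); with the assumed ranking, beastieboy would connect to kingrook, as in the slide.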
What About Communication Between Servers? • Each Napster server creates its own mp3 exchange community: • rock.napster.com, dance.napster.com, etc… • This separation is undesirable • We would like multiple servers to share common ground: this reduces the centralized nature of each server and expands searchability
Various HP2P Models – 1. Chained Architecture • Chained architecture – a linear chain of servers • Clients log in to a random server • Queries are submitted to the server • If the server satisfies the query – Done • Otherwise – Forward the query to the next server • Results are forwarded back to the first server • The server merges the results • The server returns the results to the client • Used by the OpenNap network
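A minimal sketch of the chained architecture, assuming a query counts as “satisfied” once a server has collected a hypothetical `wanted` number of results (the slide does not define the criterion). Each server answers locally, forwards the remainder down the chain, and the results flow back to the server the client logged in to, which merges them. `ChainServer` and `chain` are illustrative names.

```python
class ChainServer:
    """One server in a linear chain (a sketch of the chained architecture)."""

    def __init__(self, name, files):
        self.name = name
        self.files = set(files)   # titles shared by users logged in here
        self.next = None          # next server in the chain, or None at the end

    def local_matches(self, keyword):
        return [(self.name, f) for f in sorted(self.files) if keyword in f]

    def handle(self, keyword, wanted):
        results = self.local_matches(keyword)
        if len(results) >= wanted or self.next is None:
            return results        # query satisfied (or nowhere left to forward)
        # Otherwise forward to the next server; its results come back here.
        return results + self.next.handle(keyword, wanted - len(results))

    def query(self, keyword, wanted=1):
        # Entry point on the server the client logged in to: it merges the
        # forwarded results and returns them to the client.
        return sorted(self.handle(keyword, wanted))


def chain(*servers):
    """Link the servers into a linear chain and return the first one."""
    for a, b in zip(servers, servers[1:]):
        a.next = b
    return servers[0]
```

The design trade-off is visible in `handle`: a popular file is found after one or two hops, but a rare one makes the query walk the entire chain.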
2. Full Replication Architecture • Replication of constantly updated metadata • A client logs on to a random server • The server sends the updated metadata to all servers • Result: • All servers can answer queries immediately
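Full replication can be sketched as a login that is pushed to every server, so that any server answers from its own copy. This is an illustrative model under the assumption that every server knows all the others; `ReplicatedServer` and `connect_all` are hypothetical names, and real systems would also have to handle logouts and failed updates.

```python
class ReplicatedServer:
    """Full replication: every server holds a complete copy of the metadata."""

    def __init__(self, name):
        self.name = name
        self.metadata = {}   # title -> set of users sharing it
        self.peers = []      # every other server in the network

    def login(self, user, library):
        # The server the client logged on to sends the update to all servers.
        for server in [self, *self.peers]:
            for title in library:
                server.metadata.setdefault(title, set()).add(user)

    def query(self, title):
        # Any server answers immediately from its own copy; no forwarding.
        return sorted(self.metadata.get(title, set()))


def connect_all(servers):
    """Give every server a list of all the other servers."""
    for s in servers:
        s.peers = [p for p in servers if p is not s]
```

Queries become cheap, but every login now costs one update per server, which is where this design pays.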
3. Hash Architecture • Each server holds a portion of the metadata • Each server holds the complete inverted list for a subset of all words • Client directs a query to a server that is responsible for at least one of the keywords • That server gets the inverted lists for all the keywords from the other servers • The server returns the relevant results to the client
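The hash architecture can be sketched by partitioning the inverted lists by word: each word’s complete list lives on exactly one server, and a multi-keyword query intersects lists fetched from their owners. Hashing with SHA-1 modulo the number of servers is an assumed partitioning rule for illustration; `owner`, `publish` and `query` are hypothetical helpers.

```python
import hashlib


def owner(word, n_servers):
    # Hypothetical partitioning rule: hash the word, reduce modulo server count.
    digest = hashlib.sha1(word.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n_servers


class HashServer:
    def __init__(self):
        self.inverted = {}   # word -> complete set of file ids containing it


def publish(servers, file_id, title):
    # Each word's entry goes only to the server responsible for that word.
    for word in set(title.lower().split()):
        servers[owner(word, len(servers))].inverted.setdefault(word, set()).add(file_id)


def query(servers, keywords):
    words = [k.lower() for k in keywords]
    n = len(servers)
    # The client contacts the server responsible for (at least) the first keyword.
    entry = servers[owner(words[0], n)]
    result = set(entry.inverted.get(words[0], set()))
    for word in words[1:]:
        # That server fetches the other keywords' inverted lists from their owners.
        result &= servers[owner(word, n)].inverted.get(word, set())
    return sorted(result)
```

Each query touches at most one server per keyword, regardless of how many servers exist.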
4. Unchained Architecture • Independent servers which do not communicate with each other • A client who logs on to one server can only see the files of other users at the same local server • A clear disadvantage of separating users into distinct domains • Used by Napster
Pure P2P • All nodes are equal • No centralized server • Example: Gnutella
A completely distributed P2P network • The Gnutella network is composed of clients • Client software is made of two parts: • A mini search engine – the client • A file serving system – the “server” • Relies on broadcast search
Gnutella - Operations • Connect – establishing a logical connection • PingPong – discovering new nodes (my friend’s friends) • Query – look for something • Download – download files (simple HTTP)
Gnutella – Form an Overlay [Figure: a new node sends Connect and receives OK, then sends Ping messages that spread through the overlay; each node reached answers with a Pong]
How to find a node? • Initially, ad hoc ways • Email, online chat, newsgroups… • Bottom line: you’ve got to know someone! • Set up some long-lived nodes • A newcomer contacts the well-known nodes • Useful for building a better overlay topology
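Bootstrapping and PingPong discovery can be sketched together: the newcomer connects to one well-known, long-lived node, and the Ping spreads outward while each node reached answers with a Pong carrying its address. The `overlay` dictionary (a snapshot of who is connected to whom) and the `max_hops` limit are assumptions for illustration; this models which addresses are learned, not the actual Gnutella message format.

```python
from collections import deque


def discover(overlay, well_known, max_hops):
    """Addresses a newcomer learns by connecting to a well-known node and
    sending Ping; every node reached answers with Pong and forwards the Ping."""
    known = {well_known}              # hop 1: the well-known, long-lived node
    queue = deque([(well_known, 1)])
    while queue:
        node, hops = queue.popleft()
        if hops >= max_hops:
            continue                  # hop limit reached: stop forwarding Ping
        for neighbour in overlay[node]:
            if neighbour not in known:
                known.add(neighbour)  # its Pong reveals a new address
                queue.append((neighbour, hops + 1))
    return known
```

With `max_hops=3` the newcomer learns the well-known node, its friends, and its friends’ friends, matching the slide’s description of PingPong.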
Gnutella – Search [Figure: a query for “Green Toad” spreads through the overlay; several nodes, including A and B, reply “I have Green Toad”] • Toad A – looks nice • Toad B – too far
Gnutella – Scalability Issue • Can the system withstand flooding from every node? • Use a TTL to limit the range of propagation • 5 ^ 5 = 3125: how far can you get? • Creates a “horizon” of computers • The promise: you can expect a different horizon each time you log in
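Flooding with a TTL can be sketched as a hop-by-hop expansion that drops duplicate queries and stops when the TTL reaches zero, which is exactly what creates the “horizon”. The `overlay` and `files` dictionaries are hypothetical inputs. The `horizon` helper reproduces the slide’s 5 ^ 5 = 3125 under the assumption that every node forwards to 5 nodes not yet reached (a tree-shaped best case); real overlays contain cycles, so actual reach is lower.

```python
def flood_search(overlay, files, origin, wanted, ttl):
    """Flood a query hop by hop until the TTL runs out."""
    seen = {origin}          # nodes that already saw this query
    frontier = [origin]
    hits = []
    while frontier and ttl > 0:
        next_frontier = []
        for node in frontier:
            for neighbour in overlay[node]:
                if neighbour in seen:
                    continue                  # duplicate: drop, don't re-flood
                seen.add(neighbour)
                if wanted in files.get(neighbour, ()):
                    hits.append(neighbour)    # "I have Green Toad" goes back
                next_frontier.append(neighbour)
        frontier = next_frontier
        ttl -= 1                              # one hop used up
    return hits, seen


def horizon(fanout, ttl):
    """Best case (tree-shaped overlay, no overlap): nodes reached at the last
    hop and in total, if every node forwards to `fanout` new nodes."""
    per_hop = [fanout ** hop for hop in range(1, ttl + 1)]
    return per_hop[-1], sum(per_hop)
```

`horizon(5, 5)` gives 3125 nodes at the fifth hop and 3905 in total; any node beyond the TTL never sees the query, even if it holds the file.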
The Differences • While the pure P2P model is completely symmetric, the hybrid model combines elements of both PP2P and C/S • Each model has its disadvantages • PP2P still has problems locating information • HP2P has scalability problems, as ordinary server-oriented models do
P2P – Summary • Current conditions (many powerful, networked, mostly idle PCs) have allowed P2P to enter the world of PCs • P2P dominates the niche of resource sharing • The model is being studied from both academic and commercial points of view • There are still open problems…
Part II • Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications • Robert Morris, Ion Stoica, David Karger, M. Frans Kaashoek, Hari Balakrishnan (MIT and Berkeley) • Roy Werber, Idan Gelbourt
A P2P Problem • Every application in a P2P environment must handle an important problem: The lookup problem • What is the problem?
A Peer-to-peer Storage Problem • 1000 scattered music enthusiasts • Willing to store and serve replicas • How do you find the data?
The Lookup Problem [Figure: a Publisher stores Key=“title”, Value=MP3 data… on one of nodes N1 to N6, connected through the Internet; a Client issues Lookup(“title”) without knowing which node holds it] • Dynamic network with N nodes: how can the data be found?
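Stated concretely, a lookup must map a key to the node responsible for it. Chord’s starting point is consistent hashing: node names and keys are hashed with SHA-1 onto an identifier circle, and each key belongs to its successor, the first node whose ID is equal to or follows the key’s ID. The sketch shows only that mapping and cheats by assuming a global, sorted list of node IDs; such a global view is exactly what a dynamic network with N nodes lacks. `M = 16` and the helper names are illustrative choices.

```python
import hashlib
from bisect import bisect_left

M = 16  # identifier bits, kept small for readability (Chord uses SHA-1's 160)


def chord_id(name):
    """Map a node name or a key onto the identifier circle [0, 2**M)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def successor(node_ids, key_id):
    """First node ID equal to or following key_id, wrapping around the circle."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]
```

For example, with `nodes = {chord_id(f"N{i}"): f"N{i}" for i in range(1, 7)}`, the publisher and the client both compute `nodes[successor(nodes, chord_id("title"))]` and arrive at the same node without any central directory.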