450 likes | 677 Views
Complex Adaptive Systems C.d.L. Informatica – Università di Bologna Peer-to-peer systems and overlay networks Fabio Picconi Dipartimento di Scienze dell’Informazione. Outline. Introduction to P2P systems Common topologies Data location Churn Newscast algorithm Security.
E N D
Complex Adaptive Systems C.d.L. Informatica – Università di Bologna Peer-to-peer systems and overlay networks Fabio Picconi Dipartimento di Scienze dell’Informazione
Outline • Introduction to P2P systems • Common topologies • Data location • Churn • Newscast algorithm • Security
Peer-to-peer vs. client-server client-server peer-to-peer • Only nodes located on the “periphery” of the Internet • Tasks distributed across all nodes • Clients talk to other clients • Server well connected to the “center” of the Internet • Servers carries out critical tasks • Clients only talk to server
Example – Video sharing Client-server: YouTube • Advantages • Client can disconnect after upload • Uploader needs little bandwidth • Other users can find the file easily(just use search on server webpage) • Disadvantages • Server may not accept file or remove it later (according to content policy) • Whole system depends on the server(what if shut down like Napster?) • Server storage and bandwidthare expensive! downloader downloader downloader uploader downloader downloader client-server
Example – Video sharing Peer-to-peer: BitTorrent • Advantages • Does not depend on a central server • Bandwidth shared across nodes(downloaders also act as uploaders) • High scalability, low cost • Disadvantages • Seeder must remain on-line to guarantee file availability • Content is more difficult to find(downloaders must find .torrent file) • Freeloaders cheat in order to download without uploading downloader downloader downloader seeder downloader downloader peer-to-peer
Comparison: P2P vs. client-server • Client-server • Asymmetric: client and servers carry out different tasks • Global knowledge: servers have a global view of the network • Centralization: communications and management are centralized • Single point of failure: a server failure brings down the system • Limited scalability: servers easily overloaded • Expensive: server storage and bandwidth capacity is not cheap • Peer-to-peer • Symmetric: each node carries out the same tasks • Local knowledge: nodes only know a small set of other nodes • Decentralization: nodes must self-organize in a decentralized way • Robustness: several nodes may fail with little or no impact • High scalability: high aggregate capacity, load distribution • Low-cost: storage and bandwidth are contributed by users
Characterizing peer-to-peer systems • The main characteristics of P2P systems are: • decentralization (i.e., no central server) • self-organization (e.g., adding new nodes and removing disconnected ones) • symmetric communications (e.g., peers act as clients and servers) • scalability (thanks to high aggregate capacity and load distribution) • shared ownership (i.e., storage and bandwidth are contributed by peers) • overlay construction and routing (i.e., nodes form a logical network on top of the underlying IP network) a message from one peer to another is sent through the underlying IP network
P2P environment • P2P systems are deployed in a challenging environment: • High latency and low bandwidth between nodes- a high hop count will result in a high end-to-end latency - transferring large files may take a long time • Churn- nodes may disconnect temporarily- new nodes are constantly joining the system, while others leave the overlay permanently • Security- P2P clients run on machines under full control of their users- data sent to other nodes may be erased, corrupted, disclosed, etc.- malicious users may try to bring down the system (e.g., routing attack) • Selfishness- users may run hacked P2P clients in order to avoid contributing resources
Problems • Some of the problems that a P2P systems designer must face: • Overlay construction and maintenance- maintain a given overlay topology (e.g., random, two-level, ring, etc.) • Data location- locate a given data object among a large number of nodes • Data dissemination- propagate data in an efficient and robust manner • Per-node state- keep the amount of state per node small • Tolerance to churn- maintain system invariants (e.g., topology, data location, data availability) despite node arrivals and departures
Topology • Some common topologies: • Flat unstructured: a node can connect to any other node- only constraint: maximum degree dmax- fast join procedure- usually very tolerant to churn- good for data dissemination, bad for location • Two-level unstructured: nodes connect to a supernode- supernodes form a small overlay- used for indexing and forwarding- large state and high load on supernodes • Flat structured: constraints based on node ids - allows for efficient data location- constraints require long join and leave procedures- less robust in high-churn environments
Data location - Flooding • Problem: find the set of nodes S that store a copy of object O • Solutions: • (1) Flooding: send a search message to all nodes [first Gnutella protocol] • A search message contains either keywords or an object id • Advantages: • - simplicity • - no topology constraints • Disadvantages: • - high network overhead (huge traffic generated by each search request) • - flooding stopped by TTL (which produces search horizon) • - only applicable to small number of nodes
Data location - Flooding (1) Flooding (cont.) Flooding in a flat unstructured network: obj search horizon for TTL = 2 search • Objects that lie outside of the horizon are not found
Data location - Superpeers • (2) Two-level overlay: use superpeers to index the locations of an object [eMule, Gnutella 2, BitTorrent] • Each node connects to a superpeer and advertises the list of objects it stores • Search requests are sent to the superpeer, which forwards them to other superpeers • Advantages: • - highly scalable • Disadvantages: • - superpeers must be realiable, powerful and well connected to the Internet (expensive) • - superpeers must maintain large state • - the system relies on a small number of superpeers
Data location - Superpeers (2) Two-level overlay (cont.) request obj response • A two-level overlay is a partially centralized system • In some systems superpeers do not connect to each other (e.g., BitTorrent)
Data location - KBR • (3) Structured networks: use a routing algorithm that implements Key-Based Routing [Overnet, Kad, BitTorrent trackerless] • Key-Based Routing (also known as Distributed Hash Tables, or DHTs) works as follows: • each node is given a unique node identifier, or nodeid • given a key k, the node whose nodeidis numerically closest to kamong all nodes in the network is known as the root of key k • given a routing key k, a KBR algorithm can route a message to theroot of k in a small number of hops, usually O(log N) • the location of an object with id objectid is tracked by the root ofk = objectid • thus, one can find the location of an object by routing a message to theroot of k = objectid and querying the root for the location of the object
Data location - KBR (cont.) • Key-Based Routing [Pastry] • Source node id: 04F2 • k = object id: 8955 • Hop # Hop id Shared prefix length • 0 04F2 0 • 1 85E0 1 • 2 8909 2 • 3 8957 3 • 4 8954 3 (root of k) route(k=8955,msg) 04F2 E25A obj 8955 C52A 3A79 AC78 5230 8957 620F 8954 obj 8955 obj8955stored onnodes 620F,C52A 8909 85E0 • Object 8955 is tracked by node 8954, which knows of two copies storedat nodes 620F and C52A 8821 overlay address space [0000,FFFF]
Data location - KBR (cont.) • Routing table for node 4F28 [Pastry] used to find next hop with longer shared prefix used to find the nodeid closest to a key that is close to the local nodeid • In this example the routing table size is 4 x 15 = 60 entries, for a maximum network size of N = 65536nodes. • The average route length in this case is 4 hops.
Data location - KBR (cont.) • (3) Structured networks (cont.) • Advantages: • - completely decentralized (no need for superpeers) • - routing algorithm achieves low hop count for large network sizes • Disadvantages: • - each object must be tracked by a different node • - objects are tracked by unreliable nodes (i.e., which may disconnect) • - keyword-based searches are more difficult to implement than with superpeers (because objects are located by their objectid) • - the overlay must be structured according to a given topology in order to achieve a low hop count • - routing tables must be updated every time a node joins or leaves the overlay
Data location - Loosely structured overlays • (4) Loosely structured networks: use hints on the location of objects [Freenet] • Nodes locate objects by sending search requests containing the object id • Requests are propagated using a technique similar to flooding • Objects with similar identifiers are grouped on the same nodes requestfor AE5J C A B AE5J 5B20 response F D E AF02
Data location - Loosely structured overlays • (4) Loosely structured networks (cont.) • A search response leaves routing hints on the path back to the source • Hints are used when propagating future requests for similar object ids HintsAE5J: B HintsAE5J: D5B20: E C A requestfor AF02 B AE5J 5B20 F D AF02 E HintsAE5J: F
Data location - Loosely structured overlays • (4) Loosely structured networks (cont.) • Advantages: • - no topology constraints, flat architecture • - searches are more efficient than with plain flooding • Disadvantages: • - does not support keyword-based searches • - search requests have a TTL • - contrary to structured overlays, loosely structured overlay do not guarantee a low number of hops, nor that the object will be found
Data location - Summary • The location schemes described previously can be classified according to: • degree of structure • decentralization
Effects of churn • Churn (node arrivals and departures) can have several effects on a P2P system: • data objects may be become unavailable if all replicas disconnect • routing tables may become inconsistent(e.g., entries may point to nodes which have disconnected) • the overlay may become partitioned if several nodes suddenly disconnect:
Churn tolerance • Node arrivals and departures must not disrupt the normal behavior of the system • system invariants must be maintained • - connected overlay (i.e., no partitions), low average path length • - data objects accessible from anywhere in the network • two types of churn tolerance: • - dynamic repair: ability to react to changes in the overlay to maintain system invariants (e.g., heal partitions) • - static resilience: ability to continue operating correctly before adaptation occurs (e.g., route messages through alternate paths)
Churn – Preventing partitions An simple way to prevent partitions is to increase the node degree Ring partitions can be avoided by keeping a list of successor nodes
Churn – Static resilience (Ring topology) Percentage of failed paths for various numbers of successor nodes 1 successor 16 successors 48 successors More successors reduce the chances of “opening” the ring
Churn – Static resilience (various topologies) Percentage of failed paths for various overlay topologies Some topologies provide more alternate paths between nodes than others
Churn – Dynamic repair (Ring topology) • Dynamically update routing information to adapt to overlay changes • Two types of repair algorithms: • - reactive: start maintenance procedure immediately after detection • - periodic: execute maintenance procedure periodically • A reactive algorithm can bring down the system instead of repairing it: • Each node disconnection triggers a maintenance procedure on a set of nodes • Over a given churn rate, the maintenance traffic congests the network • The network congestion leads to nodes being considered as disconnected • This triggers even more maintenance procedures (positive feedback), eventually bringing down the network • This effect may be avoided using a periodic maintenance algorithm
Churn – Summary • Churn is an important issue in P2P overlays: • Data may become unavailable, and routing information outdated • Static resilience - depends on the topology (i.e., the number of alternate paths) - increases with the average node degree • Dynamic repair protocols must be carefully designed - reactive protocols are usually faster - periodic protocols can handle higher churn rates
Newscast • Peer-to-peer protocol that creates and maintains an unstructured overlay • Highly resilient to churn • Can be used to propagate information • Extremely simple design based on information gossip: • - Each node only knows about a small set of other knows (view) • - Each node periodically picks a random node from this set • - Both nodes exchange their views and update them • The random view exchange makes the algorithm very robust to failures and changes in the overlay
Newscast - View exchange • Each node maintains a view v containing c entries, where • entry = {node address, timestamp} • Each node executes the following code every T seconds: • 1. select random entry r from local view • 2. send local view plus an entry with the local address to node r • 3. retrieve view of node r and merge it with local view • 4. keep the c entries with the most recent timestamps • The view of a node changes on each round
Newscast - View exchange (cont.) Example view exchange initiated by node A B,10 D,5 E,3 X,8 S,12 W,2 B D W E A view of node A (c = 6) S X
Newscast - View exchange (cont.) 1. Select random node B,10 D,5 E,3 X,8 S,12 W,2 selected node view of node A 2. Exchange views (plus local entry) with selected node E,15 J,14 C,9 D,8 H,10 L,14 Z,2 A,15 B,10 D,5 E,3 X,8 S,12 W,2 view of node E view of node A E A
Newscast - View exchange (cont.) 3. Merge views, and order by timestamp 4. Keep c most recent entries E,15J,14L,14 S,12 H,10 B,10 E,15J,14L,14 S,12 H,10 B,10C,9 X,8 D,8 D,5 E,3 W,2 Z,2 merge result new view of node A (c = 6)
Newscast - Average path length • Newscast overlays have a low average path length, i.e., O(log N)(average number of hops between any two nodes)
Newscast - Static resilience • The overlay shows high static resilience
Newscast - Summary • Simple peer-to-peer algorithm • Nodes use only local information • Periodic peer-wise data exchanges • Emergent properties: • - low average path length • - resistance to high churn
Security • Security in peer-to-peer systems is hard to enforce: • Users have full control on their computers • Modified clients may not follow the standard protocol • Communications may be eavesdropped • Data may be corrupted • Private data stored on remote computers may be disclosed
Security - Weak identities • The user may leave the system and rejoin it with a new identity (i.e, user id) • If an attack is detected, the attacker can reenter the system with a new id • An attacker may create a large number of false identities (Sybil attack) • Example of Sybil Attack: • Nodes S1 to S6 are actually 6 instances of the P2P client running on the same machine • The attacker can intercept all traffic coming from or going to node A S1 S2 S6 S3 A S5 S4
Security - Strong identities • The user cannot change its identity • Solution: use a centralized, trusted Certification Authority (CA) • - Each new user must obtain an identity certificate: • certificate = { user id, IP address, user’s public key, signatureCA } • - The certificate is digitally signed by the CA, whose public key is known by all users • - A certificate cannot be forged (would require the CA’s private key) • To prove his identity, a user signs a message with his private key, and attaches the corresponding certificate signed by the CA • Strong identities prevent Sybil Attacks • If an attacker is caught, it cannot easily rejoin the system
Security – Weak vs. strong identities • Strong identities requires a centralized CA • new nodes must contact the CA before joining the network: • - the CA response may be slow (a few days) • - if the CA is unavailable, new nodes cannot join the system • the security of the system depends on the CA: • - the CA must correctly verify the identity of the requester • - the CA’s private key must be kept secret • Many P2P systems use weak identities • IP address already gives some identity information • Some systems ensure anonymity (e.g., FreeNet)
The end • You can get these slides from • http://www.cs.unibo.it/~picconi/slides • For any questions, mail me to: • picconi@cs.unibo.it