This presentation reviews P2P systems and P2P storage systems, discusses the problems they face, and introduces MUREX, a mutable replica control scheme for structured peer-to-peer storage systems. It concludes with an analysis and simulation of MUREX and a summary of its benefits.
MUREX: A Mutable Replica Control Scheme for Structured Peer-to-Peer Storage Systems Presented by Jehn-Ruey Jiang National Central University Taiwan, R. O. C.
Outline • P2P Systems • P2P Storage Systems • The Problems • MUREX • Analysis and Simulation • Conclusion
Client/Server Architecture • The client sends a request to the server (e.g., GET /index.html HTTP/1.0) • The server returns a response (e.g., HTTP/1.1 200 OK ...)
Disadvantages of C/S Architecture • Single point of failure • Requires a powerful, expensive server • Dedicated maintenance (a sysadmin) • Not scalable: more users require more servers
The Client Side • Today’s clients can perform more roles than just forwarding users’ requests • Today’s clients have: • More computing power • Storage space • Thin client → Fat client
Evolution at the Client Side • ’70s: DEC’s VT100 – no storage • ’80s: IBM PC @ 4.77MHz – 360KB diskettes • 2005: a PC @ 4GHz – 100GB HD
What Else Has Changed? • The number of home PCs is increasing rapidly • Most of these PCs are “fat clients” • As Internet usage grows, more and more PCs are connecting to the global net • Most of the time these PCs are idle • How can we use all this?
Resource Sharing • What can we share? • Computer resources • Shareable computer resources: • CPU cycles - seti@home, GIMPS • Data - Napster, Gnutella • Bandwidth sharing - Promise • Storage space - OceanStore, CFS, PAST
SETI@Home • SETI – Search for ExtraTerrestrial Intelligence • @Home – On your own computer • A radio telescope in Puerto Rico scans the sky for radio signals • Fills a DAT tape of 35GB in 15 hours • That data has to be analyzed
SETI@Home (cont.) • The problem – analyzing the data requires a huge amount of computation • Even a supercomputer cannot finish the task on its own • Accessing a supercomputer is expensive • What can be done?
SETI@Home (cont.) • Can we use distributed computing? • YEAH • Fortunately, the problem can be solved in parallel - examples: • Analyzing different parts of the sky • Analyzing different frequencies • Analyzing different time slices
SETI@Home (cont.) • The data can be divided into small segments • A PC is capable of analyzing a segment in a reasonable amount of time • An enthusiastic UFO searcher will lend his spare CPU cycles for the computation • When? Screensavers
SETI@Home - Summary • SETI reverses the C/S model • Clients can also provide services • Servers can be weaker, used mainly for storage • Distributed peers serving the center • Not yet P2P, but we’re close • Outcome - great results: • Thousands of otherwise unused CPU hours tamed for the mission • 3+ million users
GIMPS (Great Internet Mersenne Prime Search) – statistics as of Nov. 2003
History of Napster (1/2) • 5/99: Shawn Fanning (freshman, Northeastern University) founds Napster Online (supported by Groove) • 12/99: First lawsuit • 3/00: 25% Univ. of Wisconsin traffic on Napster
History of Napster (2/2) • 2000: estimated 23M users • 7/01: 160K simultaneous online users • 6/02: files for bankruptcy • … • 10/03: Napster 2 (supported by Roxio; users pay $9.99/month) • From 1984 to 2000, 23M domain names were registered; in just 16 months, 23M Napster-style names were registered at Napster
Napster Sharing Style: Hybrid Center+Edge • Users such as “beastieboy” (song1.mp3, song2.mp3, song3.mp3), “kingrook” (song4.mp3, song5.mp3, song6.mp3), and “slashdot” (song5.mp3, song6.mp3, song7.mp3) each share a local .mp3 library • 1. Users launch Napster and connect to the Napster server • 2. Napster creates a dynamic directory (Title / User / Speed, e.g., song5.mp3 / kingrook / T1, song5.mp3 / slashdot / 28.8) from users’ personal .mp3 libraries • 3. beastieboy enters search criteria (“song5”) • 4. Napster displays the matches to beastieboy • 5. beastieboy makes a direct connection to kingrook to transfer song5.mp3
About Gnutella • No centralized directory servers • Pings the net to locate Gnutella friends • File requests are broadcast to friends • Flooding, breadth-first search (see the sketch below) • When the provider is located, the file is transferred via HTTP • History: • 3/14/00: released by AOL, almost immediately withdrawn
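As a rough illustration of the flooded, breadth-first search just described, the sketch below models peers as a dictionary of file sets and neighbor lists. The peer names, the data structure, and the TTL value are illustrative assumptions only, not Gnutella’s actual wire protocol.

```python
# Minimal sketch of Gnutella-style flooded search (hypothetical model):
# each peer forwards a query to its neighbors breadth-first until the TTL
# runs out or a peer holding the file is found.
from collections import deque

def flooded_search(peers, start, filename, ttl=7):   # ttl=7 is only an illustrative default
    """peers: dict peer -> {'files': set, 'neighbors': list}"""
    visited = {start}
    queue = deque([(start, ttl)])
    while queue:
        peer, hops_left = queue.popleft()
        if filename in peers[peer]['files']:
            return peer                      # provider located; transfer would follow via HTTP
        if hops_left == 0:
            continue                         # query dies when its TTL is exhausted
        for neighbor in peers[peer]['neighbors']:
            if neighbor not in visited:      # avoid re-flooding the same peer
                visited.add(neighbor)
                queue.append((neighbor, hops_left - 1))
    return None                              # not found within the TTL horizon

# Example: three peers, where 'c' holds xyz.mp3
peers = {
    'a': {'files': set(),        'neighbors': ['b']},
    'b': {'files': set(),        'neighbors': ['a', 'c']},
    'c': {'files': {'xyz.mp3'},  'neighbors': ['b']},
}
print(flooded_search(peers, 'a', 'xyz.mp3'))   # -> 'c'
```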
Peer-to-Peer Overlay Network • Focus is at the application layer
Peer-to-Peer Overlay Network • End systems form the overlay • One overlay hop (end-to-end comm.) corresponds to a TCP connection through the Internet
Gnutella: Issue a Request xyz.mp3 ?
Gnutella: Reply with the File Fully distributed storage and directory! xyz.mp3
So Far (n: number of participating nodes) • Centralized directory: directory size O(n), number of hops O(1) • Flooded queries: directory size O(1), number of hops O(n)
We Want • Efficiency : O(log(n)) messages per lookup • Scalability : O(log(n)) state per node • Robustness : surviving massive failures
How Can It Be Done? • How do you search in O(log(n)) time? • Binary search • You need an ordered array • How can you order nodes in a network and data objects? • Hash function!
Example of Hashing • SHA-1 maps both object names (e.g., “Shark”) and node addresses (e.g., 194.90.1.5:8080) to IDs (keys) such as AABBCC and DE11AC
Basic Idea • Objects have hash keys: object “y” → H(y), published to the P2P network with Publish(H(y)) • Peer nodes also have hash keys in the same hash space: peer “x” → H(x), joining with Join(H(x)) • Place each object at the peer with the closest hash key
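A minimal sketch of this basic idea, under assumed names and an assumed 128-bit key space (not any particular DHT’s API): SHA-1 maps object names and node addresses into one key space, and an object is placed at the node whose key is closest on the ring.

```python
import hashlib

M = 2**128                                   # size of the (illustrative) key space

def h(name):
    """Hash a node address or object name into the key space (H(x), H(y) above)."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % M

def closest_node(object_key, node_keys):
    """Return the node key closest to object_key on the circular key space."""
    def ring_distance(a, b):
        d = abs(a - b)
        return min(d, M - d)                 # wrap around the ring
    return min(node_keys, key=lambda nk: ring_distance(nk, object_key))

nodes = [h("194.90.1.5:8080"), h("10.0.0.7:8080"), h("172.16.3.2:8080")]
print(closest_node(h("Shark"), nodes))       # the node responsible for object "Shark"
```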
Mapping Keys to Nodes • Nodes and data objects are mapped by their keys onto the ID space 0 … M
Internet Viewed as a Distributed Hash Table • The hash-table ID space 0 … 2^128 − 1 is partitioned among the peer nodes
DHT • Distributed Hash Table • Input: key (file name); Output: value (file location) • Each node is responsible for a range of the hash table, according to the node’s hash key; objects are placed at (managed by) the node with the closest key • It must adapt to dynamic node joining and leaving
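The toy class below sketches this DHT abstraction (an assumed interface, not the API of any real DHT): put/get route a key to the responsible node, chosen here by a successor-style rule, and join simply adds a node to the ring; the re-assignment of existing keys after a join or leave is omitted for brevity.

```python
import bisect, hashlib

M = 2**128

def h(name):
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % M

class ToyDHT:
    def __init__(self):
        self.node_keys = []                  # sorted node keys
        self.store = {}                      # node_key -> {object_key: value}

    def _responsible(self, key):
        """Successor-style rule: the first node key >= key, wrapping around."""
        i = bisect.bisect_left(self.node_keys, key) % len(self.node_keys)
        return self.node_keys[i]

    def join(self, address):
        nk = h(address)
        bisect.insort(self.node_keys, nk)
        self.store.setdefault(nk, {})        # key re-assignment on join omitted for brevity

    def put(self, key, value):
        self.store[self._responsible(h(key))][h(key)] = value

    def get(self, key):
        return self.store[self._responsible(h(key))].get(h(key))

dht = ToyDHT()
for addr in ["194.90.1.5:8080", "10.0.0.7:8080"]:
    dht.join(addr)
dht.put("index.html", "stored at some peer")
print(dht.get("index.html"))
```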
How to Find an Object? • Given a key, which peer node in the hash-table ID space 0 … 2^128 − 1 is responsible for it?
Simple Idea • Track peers that allow us to move quickly across the hash space • A peer p tracks the peers responsible for hash keys (p + 2^(i-1)) mod 2^m, i = 1, …, m • (Figure: a peer i pointing toward i+2^2, i+2^4, i+2^8 on the ID space 0 … 2^128 − 1)
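A small sketch of the formula above (assumed helper name): for a peer with key p in an m-bit ID space, compute the target keys (p + 2^(i-1)) mod 2^m that it keeps pointers for.

```python
def finger_targets(p, m):
    """Keys (p + 2^(i-1)) mod 2^m, for i = 1..m, that peer p tracks."""
    return [(p + 2**(i - 1)) % 2**m for i in range(1, m + 1)]

# Example: peer with key 2 in a 4-bit ID space, as in the Chord example below.
print(finger_targets(2, 4))   # -> [3, 4, 6, 10]
```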
DHT Example: Chord – Ring Structure • Circular 6-bit ID space • N8 knows of only six other nodes • O(log n) state per node
Chord Lookup – with Finger Table • O(log n) hops (messages) for each lookup!! • Example (circular 4-bit ID space with nodes 1, 2, 3, 7, 10, 12, 14, 15): node 2 asks “Please find key 14!”; at node 2, 14 ∈ [10, 2), so the query is forwarded to node 10; at node 10, 14 ∈ [14, 2), so it reaches node 14
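A simplified, self-contained sketch of Chord-style lookup under assumed simplifications: every node’s finger table is derived from a global node list, and the query is routed greedily to the closest preceding finger, on the same 4-bit ring as the example above. This is a condensed model, not the full Chord protocol.

```python
M_BITS = 4
RING = 2**M_BITS
NODES = sorted([1, 2, 3, 7, 10, 12, 14, 15])        # node keys on the 4-bit ring

def succ_of_key(k):
    """The node responsible for key k: the first node at or after k, wrapping around."""
    return next((n for n in NODES if n >= k), NODES[0])

def next_node(n):
    """The node immediately after node n on the ring."""
    return next((x for x in NODES if x > n), NODES[0])

def in_interval(x, a, b, inclusive_right=False):
    """x in (a, b) (or (a, b]) on the circular ID space."""
    if a < b:
        return a < x < b or (inclusive_right and x == b)
    return x > a or x < b or (inclusive_right and x == b)

def finger_table(p):
    """Node p's fingers: the nodes responsible for keys (p + 2^(i-1)) mod 2^m."""
    return [succ_of_key((p + 2**(i - 1)) % RING) for i in range(1, M_BITS + 1)]

def lookup(start, key):
    """Route from 'start' toward the node responsible for 'key'; return the hop list."""
    hops, node = [start], start
    while not in_interval(key, node, next_node(node), inclusive_right=True):
        fingers = [f for f in finger_table(node) if in_interval(f, node, key)]
        node = max(fingers, key=lambda f: (f - node) % RING) if fingers else next_node(node)
        hops.append(node)
    hops.append(next_node(node))                     # the node responsible for the key
    return hops

print(lookup(2, 14))   # greedy routing from node 2 toward the node holding key 14
```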
Classification of P2P Systems • Hybrid P2P – preserves some of the traditional C/S architecture; a central server links clients, stores index tables, etc. • Napster • Unstructured P2P – no control over topology and file placement • Gnutella, Morpheus, Kazaa, etc. • Structured P2P – topology is tightly controlled and placement of files is not random • Chord, CAN, Pastry, Tornado, etc.
Outline • P2P Systems • P2P Storage Systems • The Problems • MUREX • Analysis and Simulation • Conclusion
P2P Storage Systems • Aggregate idle storage across the Internet into a huge storage space • Towards global storage systems • Massive numbers of nodes • Massive capacity
Replication • Data objects are replicated for fault tolerance and high data availability • Some DHTs provide replication utilities, but these are usually used to replicate routing state • The proposed protocol replicates data objects at the application layer, so it can be built on top of any DHT
Two Types of P2P Storage Systems • Non-mutable (read-only): CFS, PAST, Charles • Mutable (our focus!!): Ivy, Eliot, Oasis, Om
One-Copy Equivalence • Data consistency criterion: the set of replicas must behave as if there were only a single copy • Conditions: • no two write operations can proceed at the same time, • no read operation can proceed at the same time as a write operation, • a read operation always returns the value written by the last write operation
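A minimal, illustrative sketch (an assumed single-process model with a coarse lock, not MUREX itself) of how the three conditions can be enforced over a replica set: writes are exclusive, reads exclude writes, and every read returns the value of the latest completed write.

```python
import threading

class ReplicatedObject:
    def __init__(self, replicas=3):
        self.replicas = [None] * replicas      # all copies of the data object
        self.version = 0                       # identifies the "last write"
        self.lock = threading.Lock()           # no write/write or read/write overlap

    def write(self, value):
        with self.lock:                        # conditions 1 and 2: exclusive access
            self.version += 1
            for i in range(len(self.replicas)):
                self.replicas[i] = (self.version, value)   # update every replica before returning

    def read(self):
        with self.lock:                        # condition 2: reads do not overlap writes
            return self.replicas[0]            # condition 3: any replica reflects the last write

obj = ReplicatedObject()
obj.write("v1")
print(obj.read())                              # -> (1, 'v1')
```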
Two Types of Methods to Achieve One-Copy Equivalence • Synchronous Replication (our focus) • Each write operation must finish updating all replicas before the next write operation proceeds • Strict data consistency • Long operation latency • Asynchronous Replication • A write operation updates the local replica; the data object is then asynchronously written to the other replicas • May violate data consistency • Shorter latency • Log-based mechanisms are needed to roll back the system
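A hedged sketch (an assumed toy model, not MUREX’s actual protocol) contrasting the two styles: a synchronous write updates every replica before it returns, while an asynchronous write returns after updating the local replica and propagates to the others in the background, risking temporary inconsistency.

```python
import threading, time

replicas = {"local": None, "peer1": None, "peer2": None}

def synchronous_write(value):
    for name in replicas:                      # all replicas updated before the write completes
        replicas[name] = value

def asynchronous_write(value):
    replicas["local"] = value                  # the write returns after the local update
    def propagate():
        time.sleep(0.1)                        # simulated network delay
        for name in ("peer1", "peer2"):
            replicas[name] = value
    threading.Thread(target=propagate, daemon=True).start()

asynchronous_write("v2")
print(replicas)                                # remote replicas may still hold the old value here
time.sleep(0.2)
print(replicas)                                # eventually all replicas converge
```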