Data Currency in Replicated DHTs
Reza Akbarinia, Esther Pacitti and Patrick Valduriez
University of Nantes, France, INRIA
ACM SIGMOD 2007
Presenter: Jerry Wu
Motivation
• P2P data sharing systems
  • Enable a large number of users to share a massive number of files
  • [Figure: query, reply, send request, download]
• Message forwarding in these systems
  • Flooding: KaZaA, Gnutella
  • DHT: CAN, Chord, Pastry, etc.
Distributed Hash Table (DHT)
• Use hash functions to locate files
  • h(meta data) = k (for identification)
  • g(k) = k1 (for routing)
• [Figure: a ring of peers A to F; user U stores the metadata of FreeLoop.mp3 under k1 at peer A, since g(k) = k1 maps to A]
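Data Replication
• What if node A fails?
• Store several replicas of each file, one per hash function
  • g(h(FreeLoop.mp3)) = k1 (peer A)
  • g2(h(FreeLoop.mp3)) = k2 (peer D)
  • g3(h(FreeLoop.mp3)) = k3 (peer E)
• [Figure: a ring of peers A to F; the metadata of FreeLoop.mp3 is stored at A, D and E]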
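As a rough illustration of the replica placement described above, here is a minimal Python sketch. The peer names, the use of salted SHA-256 for the routing hashes, and the helper names `make_routing_hash` and `replica_peers` are all assumptions made for this example; the slides do not say how h or the g functions are built.

```python
import hashlib

PEERS = ["A", "B", "C", "D", "E", "F"]  # hypothetical peers on the ring


def h(metadata: str) -> int:
    """Identification hash: map file metadata to a key k."""
    digest = hashlib.sha256(metadata.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def make_routing_hash(salt: str):
    """Build one routing hash g_i (salted SHA-256 is an assumption of this sketch)."""
    def g(k: int) -> str:
        digest = hashlib.sha256(f"{salt}:{k}".encode()).digest()
        return PEERS[int.from_bytes(digest[:8], "big") % len(PEERS)]
    return g


# The set H of routing hashes; its size |H| is the replication level.
H = [make_routing_hash(salt) for salt in ("g", "g2", "g3")]


def replica_peers(metadata: str) -> list[str]:
    """Return the peer chosen by each routing hash for this file."""
    k = h(metadata)
    return [g(k) for g in H]


print(replica_peers("FreeLoop.mp3"))
```

In this toy version two routing hashes can pick the same peer, so the number of distinct replica holders may be smaller than |H|.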
Basic Operations
• put_H(meta key k, File D)
  • Insert a file into the DHT
• get_H(meta key k)
  • Retrieve the file from the DHT
• H = { g | g is used as a hash function }
• |H|: the replication level of the system
• Each file is stored at |H| peers
Additional Problems
• What if the owner can modify the data?
• The nature of P2P systems
  • Peers can join and leave dynamically
  • What happens to an update made while some peers are away and rejoin later?
  • What about concurrent updates?
Solution
• What if we have a timestamp for each insert or update transaction?
  • The currency of a file is judged by its timestamp
  • FileX = File + timestamp
  • Put (k, FileX) instead of (k, File) into the DHT
• Then we know the freshness of the file
• Only the latest update can succeed
How Can We Get A Timestamp?
• KTS (Key-based Timestamp Service)
  • Issues a timestamp for each transaction
• gen_ts(key k)
  • Generate a new timestamp w.r.t. key k
• last_ts(key k)
  • Return the last timestamp issued for key k
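The two KTS calls can be pictured with a small single-process stand-in. This is only a sketch: the class name `ToyKTS`, the use of integer counters, and the convention that 0 means "no timestamp issued yet" are assumptions for illustration, not details taken from the slides.

```python
class ToyKTS:
    """Toy stand-in for KTS: one monotonically increasing counter per key."""

    def __init__(self):
        self._counters = {}  # key -> last issued timestamp

    def gen_ts(self, k):
        """Issue a new timestamp for key k, larger than any issued before."""
        self._counters[k] = self._counters.get(k, 0) + 1
        return self._counters[k]

    def last_ts(self, k):
        """Return the last timestamp issued for key k (0 if none, by assumption)."""
        return self._counters.get(k, 0)


kts = ToyKTS()
t1 = kts.gen_ts("k")
t2 = kts.gen_ts("k")
print(t1 < t2, kts.last_ts("k") == t2)
```

Timestamps are kept per key, so two different keys never need to agree on a global clock.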
The New DHT Functions
• Based on the KTS service
• Insert(key k, FileX D, hash function set H_r)
  • Insert or update a file with identity key k in the DHT
• Retrieve(k, H_r)
  • Retrieve the latest copy of the file with identity key k
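Insert A File
[Figure: user U inserts P.avi. The key is h(P.avi) = k, and the KTS timestamp service returns gen_ts(k) = tA. The pair (tA, P.avi) is stored with put_g(k, (tA, P.avi)) at peer A, where g(k) = k1, and with put_g2(k, (tA, P.avi)) at peer C, where g2(k) = k2.]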
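Retrieve A File
[Figure: user U retrieves P.avi. The key is h(P.avi) = k, and the KTS timestamp service returns last_ts(k) = tA. get_g(k) and get_g2(k) query peer A, where g(k) = k1, and peer C, where g2(k) = k2. One replica carries (t0, P.avi) and the other (tA, P.avi), so the copy stamped tA is the current one.]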
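The insert and retrieve walk-throughs can be sketched end to end in a few lines of Python. Everything here is a toy under stated assumptions: the class name `ToyReplicatedDHT`, the salted SHA-256 routing, the in-process counters standing in for KTS, and the fallback of returning the most recent replica found when no current one is available are all choices made for this example.

```python
import hashlib


class ToyReplicatedDHT:
    """In-memory sketch of Insert/Retrieve over |H_r| routing hashes."""

    def __init__(self, peers, num_hashes):
        self.peers = list(peers)
        self.num_hashes = num_hashes            # replication level |H_r|
        self.stores = {p: {} for p in peers}    # peer -> {key: (ts, data)}
        self._counters = {}                     # stand-in for the KTS counters

    def _peer_for(self, i, k):
        """Peer chosen by the i-th routing hash (salted SHA-256, an assumption)."""
        digest = hashlib.sha256(f"{i}:{k}".encode()).digest()
        return self.peers[int.from_bytes(digest[:8], "big") % len(self.peers)]

    def gen_ts(self, k):
        self._counters[k] = self._counters.get(k, 0) + 1
        return self._counters[k]

    def last_ts(self, k):
        return self._counters.get(k, 0)

    def insert(self, k, data):
        """Stamp the data, then put it at every replica peer if it is newer."""
        ts = self.gen_ts(k)
        for i in range(self.num_hashes):
            store = self.stores[self._peer_for(i, k)]
            if k not in store or store[k][0] < ts:
                store[k] = (ts, data)
        return ts

    def retrieve(self, k):
        """Probe replicas one by one; stop at the first one stamped last_ts(k)."""
        latest = self.last_ts(k)
        best = None
        for i in range(self.num_hashes):
            pair = self.stores[self._peer_for(i, k)].get(k)
            if pair is None:
                continue
            if pair[0] == latest:
                return pair[1]                  # current replica found
            if best is None or pair[0] > best[0]:
                best = pair
        return best[1] if best else None        # most recent stale copy, if any


dht = ToyReplicatedDHT(["A", "B", "C", "D", "E", "F"], num_hashes=2)
dht.insert("k", "P.avi v1")
dht.insert("k", "P.avi v2")
print(dht.retrieve("k"))
```

The early return is the point of the timestamp: a reader can stop as soon as one replica matches the last issued timestamp instead of fetching every replica.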
Update A File
• If ts_x > ts_0, then update File D:
  • put_g(k, (ts_x, File D))
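A minimal sketch of this guarded update at a single replica follows. It assumes that ts_0 is the timestamp already stored with the replica and that a missing entry always accepts the write; the function name `put_if_newer` is hypothetical.

```python
def put_if_newer(store: dict, k, ts_x, data) -> bool:
    """Overwrite the replica for key k only if ts_x is newer than the stored ts_0."""
    ts_0 = store[k][0] if k in store else None
    if ts_0 is None or ts_x > ts_0:
        store[k] = (ts_x, data)
        return True
    return False  # stale or duplicate update is ignored


replica = {}
print(put_if_newer(replica, "k", 2, "File D, new"))   # accepted
print(put_if_newer(replica, "k", 1, "File D, old"))   # rejected
```

With this check, a late-arriving older update cannot overwrite a newer one, which is how only the latest update succeeds.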
Retrieval Cost Analysis
• C = C_kts + N * C_ret
• C_kts = C_ret = O(log n), n = number of peers
• N: number of retries needed to get the latest copy
• Let X be the random variable for N
• p_t: the probability of finding a fresh copy
• Prob(X = i) = p_t * (1 - p_t)^(i-1)
• |H_r| = number of replicas in the system
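To make the cost formula concrete, the sketch below computes the expected number of retrievals and the resulting expected cost. It assumes that probing stops after at most |H_r| replicas, so the quantity of interest is E[min(X, |H_r|)], which for the geometric distribution above equals the sum of (1 - p_t)^i for i from 0 to |H_r| - 1 and is always below 1/p_t. The function names and the sample cost values are hypothetical.

```python
def expected_retrievals(p_t: float, num_replicas: int) -> float:
    """E[min(X, |H_r|)] for geometric X: sum of Prob(X > i), i = 0..|H_r|-1."""
    return sum((1 - p_t) ** i for i in range(num_replicas))


def expected_cost(c_kts: float, c_ret: float, p_t: float, num_replicas: int) -> float:
    """Expected value of C = C_kts + N * C_ret under the assumption above."""
    return c_kts + expected_retrievals(p_t, num_replicas) * c_ret


# Hypothetical numbers: both costs are 10 messages, p_t = 0.3, |H_r| = 13.
print(expected_retrievals(0.3, 13))
print(expected_cost(10, 10, 0.3, 13))
```

Since both C_kts and C_ret are O(log n), the expected cost stays O(log n) as long as p_t is bounded away from zero.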
Retrieval Cost Analysis
• Then, how can we get a timestamp?
  • Key-based Timestamp Service (KTS)
The KTS Service
• Use the same DHT, but with a different hash function h_ts
• [Figure: a timestamp request for key k is routed through the peers' hash tables as Req(k, h_ts), in steps 1 to 4, until it reaches peer p, the peer selected by h_ts for key k]
The KTS Service
• How can node p generate timestamps w.r.t. key k?
• Direct initialization
  • Receive the counters from a leaving peer
  • The DHT system distributes the load of the leaving peer to its neighbors
• Indirect initialization
  • Send a file request w.r.t. key k to obtain the latest timestamp
  • Takes place if the leaving peer fails
The KTS Service
• Indirect initialization
  • The probability of failure is p_f
  • p_f = (1 - p_t)^|H|
  • If p_t = 30% and |H| = 13, then p_f < 1%
• After initialization, increment the timestamp on every timestamp request
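The failure probability of indirect initialization is easy to check numerically. This short sketch evaluates p_f = (1 - p_t)^|H| for the values quoted above; the function name is hypothetical.

```python
def init_failure_probability(p_t: float, replication_level: int) -> float:
    """p_f = (1 - p_t)^|H|: none of the |H| replicas probed is a fresh copy."""
    return (1 - p_t) ** replication_level


p_f = init_failure_probability(0.30, 13)
print(f"p_f = {p_f:.4f}")  # about 0.0097, which is below 1%
```

With one replica fewer (|H| = 12) the value is about 0.0138, so 13 is the smallest replication level that pushes p_f under 1% when p_t = 30%.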
Experiments And Simulations
• Environments
  • A 64-node cluster
  • 10,000 nodes simulated on the SimJava platform
• Metrics
  • Response time: time to return a current replica in response to a query
  • Communication cost: number of messages sent to answer a query
The Competitor - BRICKS
• Uses a function to map key k to multiple keys (k1, k2, k3, k4, …)
• Each replica has a version number
• Has problems with concurrent updates
• Must retrieve all replicas to find the newest one
Conclusion
• Pros
  • Using the DHT itself to provide the timestamp service is a smart idea
  • Addresses the concurrent update problem
  • Easy to apply to existing DHTs
• Cons
  • The KTS service can incur additional communication overhead