Maintaining Replicas in Unstructured P2P Systems
CoNEXT, Madrid, 12/11/2008
Christof Leng, TU Darmstadt
Wesley W. Terpstra, TU Darmstadt
Bettina Kemme, McGill University
Wilhelm Stannat, TU Darmstadt
Alejandro P. Buchmann, TU Darmstadt
http://www.bubblestorm.net
http://www.dvs.tu-darmstadt.de
Databases and Distributed Systems
Replicas far and small
• Our research focuses on peer-to-peer search
• The data we replicate is usually small (< 10 kB)
• Modern unstructured overlays (Sarshar’04, Ferreira’05, BubbleStorm) can have several hundred copies of an object
• Why replicate so widely?
  • The more copies, the easier it is to find one
  • The more providers, the harder it is to overload them
  • When a node leaves, its copy is gone with the wind
To be clear: this talk is not about replicating files
Two kinds of replicated data
Maintained
• “I’m online @132.160.222.1”
• “Tell me when a paper is published with ‘P2P’ & ‘search’”
• “I can provide these files”
• “I am waiting for event X”
• Service lists
• Subscriptions
Collective
• Wiki articles
• Information about physical objects outside the network
• Distributed file systems
• System backups
• “Persistent” information
Who maintains replicated data?
A Maintainer
• Ideal for data that should never outlive its owner
• The owner can manage it
• If the owner crashes, any remaining replicas are junk
The Collective
• Ideal for data that should live until explicitly deleted
• No clear managing authority
• Replicas of undeleted objects should remain in the system
Our Paper: Maintainer-based Replication
“Let there be replicas!”
Maintaining Replicas
• We want to be able to
  • Ensure the system has no junk replicas (of objects whose maintainer has left) – they cause bad search results!
  • Ensure that there are exactly the number of replicas requested
• We need to be able to
  • Keep junk contained so the system remains useful
  • Increase/decrease the density of replicas in the system
  • Hold the density of replicas steady against network churn
  • Update or destroy all the replicas of an object
...you can’t always get what you want…
The Good (aka assumptions)
• Nodes all run our software – so they mostly cooperate
• Computing a sum over participants in the network is easy
…and since we do unstructured peer-to-peer:
• We don’t care too much about who has which particular replica
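To make the “easy sum” claim concrete: gossip aggregation is one standard way to compute network-wide sums in unstructured overlays. Below is a minimal push-sum simulation (Kempe et al. 2003); it illustrates the idea only and is not necessarily the mechanism BubbleStorm uses.

```python
import random

def push_sum(values, rounds=60):
    """Push-sum gossip: every node converges on the network-wide sum.
    Illustrative round-based simulation, not a real network protocol."""
    n = len(values)
    s = [float(v) for v in values]   # mass
    w = [1.0] * n                    # weights; total weight stays n
    for _ in range(rounds):
        inbox = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            j = random.randrange(n)  # random gossip partner
            s[i] /= 2                # keep half of the mass ...
            w[i] /= 2
            inbox[j][0] += s[i]      # ... and send the other half
            inbox[j][1] += w[i]
        for i in range(n):
            s[i] += inbox[i][0]
            w[i] += inbox[i][1]
    return [n * s[i] / w[i] for i in range(n)]  # per-node sum estimates

values = [random.randint(0, 9) for _ in range(200)]
print(sum(values), round(push_sum(values)[0]))  # the two agree closely
```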
The Bad (aka Reality)
• Churn is out of our control
  • We can’t stop it
  • Its rate changes over time
  • We cannot influence participant lifetimes
• Nodes sometimes crash
  • They don’t say good-bye; they just leave
  • This happens a lot
The Ugly
• Storage providing peers crash silently
  • Fixed replication is impossible
  • Guarantee a probability distribution instead?
  • That is sufficient to prove the correctness of stochastic algorithms
• Maintainer crashes are silent
  • Replicas that should have been deleted linger as junk
  • Zero junk is impossible
  • Perhaps guarantee junk stays below a threshold?
  • That is sufficient to prove the performance of stochastic algorithms
A common maintenance strategy
Maintainer:
1. Push desired replicas into the system
2. Wait for X minutes
3. Goto 1
Storage providing peer:
• After Y minutes, the replica is deleted
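In code, the strategy fits on half a slide. This is a minimal sketch with illustrative timer values; the `overlay.random_peers` call and the object’s `id` attribute are assumed interfaces, not the paper’s API.

```python
import time

X_REPUSH = 10 * 60   # X: maintainer re-push period (illustrative)
Y_EXPIRE = 15 * 60   # Y: storage-peer replica timeout (illustrative)

def maintainer_loop(overlay, obj, copies):
    """1. push desired replicas, 2. wait X minutes, 3. goto 1."""
    while True:
        for peer in overlay.random_peers(copies):
            peer.store(obj, expires=time.time() + Y_EXPIRE)
        time.sleep(X_REPUSH)

class StoragePeer:
    def __init__(self):
        self.replicas = {}   # object id -> expiry timestamp

    def store(self, obj, expires):
        self.replicas[obj.id] = expires

    def expire(self):
        """After Y minutes without a re-push, the replica is deleted."""
        now = time.time()
        self.replicas = {k: t for k, t in self.replicas.items() if t > now}
```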
Why this is bad
• What should the parameters X and Y be?
  • If X is too slow, replicas lost to churn are not replaced quickly enough
  • If Y is too slow, junk replicas of crashed maintainers linger in the system
  • If X or Y are too fast, more traffic is expended than necessary
• Problem: the churn rate is out of our control (unboundable)
• Correctness requires setting X for the worst possible situation (costly)
Pull on Join
Observation: Density
• When a storage providing peer leaves the system
  • The replica count might be reduced
  • The expected replica density remains the same
• Holding replica density fixed dodges the problem of crashes
  • Not just a semantic difference!
  • We must compensate by replicating whenever peers join
• We can adjust the density as the population changes
  • It is still possible to hold the replica count at a fixed value (or √n)
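One way to verify the invariant (the notation n, r, d is mine, not the slides’): with n peers, r replicas, and density d = r/n, each peer holds a replica independently with probability d.

```latex
\text{Peer leaves: } \mathbb{E}[r'] = r - d = d(n-1)
  \;\Rightarrow\; \mathbb{E}[d'] = \tfrac{d(n-1)}{n-1} = d
  \quad\text{(count drops, density does not)}
\text{Peer joins: } d' = \tfrac{r + d}{n + 1} = d
  \;\text{iff the joiner receives a replica with probability } d
```

So a departure leaves the expected density untouched for free, while an arrival must trigger replication: exactly the “pull on join” step.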
Our replication algorithm
Maintainer:
• Push out initial replicas
• Record who has received replicas (a superset)
• Push extra replicas to increase density
• Probabilistically delete existing replicas to decrease density
Storage providing peer:
• On join, ask randomly selected maintainers for replicas
• Accept replicas pushed out from maintainers
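A sketch of the two roles in code. All names (`Maintainer`, `StoragePeer`, the `overlay` calls, the object’s `id`) are assumptions for illustration; message handling, failure cases, and the superset bookkeeping are simplified.

```python
import random

class Maintainer:
    def __init__(self, overlay, obj, density):
        self.overlay, self.obj, self.density = overlay, obj, density
        self.holders = set()          # (superset of) peers given replicas

    def push_initial(self, n_peers):
        for peer in self.overlay.random_peers(round(self.density * n_peers)):
            peer.accept(self.obj)
            self.holders.add(peer)

    def change_density(self, new, n_peers):
        if new > self.density:        # push extra replicas
            extra = round((new - self.density) * n_peers)
            for peer in self.overlay.random_peers(extra):
                peer.accept(self.obj)
                self.holders.add(peer)
        else:                         # probabilistically delete replicas
            for peer in list(self.holders):
                if random.random() > new / self.density:
                    peer.delete(self.obj.id)
                    self.holders.discard(peer)
        self.density = new

    def answer_pull(self, joiner):
        """A freshly joined peer asked us, a randomly selected
        maintainer, for replicas."""
        if random.random() < self.density:
            joiner.accept(self.obj)
            self.holders.add(joiner)

class StoragePeer:
    def __init__(self):
        self.replicas = {}            # object id -> object

    def accept(self, obj):
        self.replicas[obj.id] = obj

    def delete(self, oid):
        self.replicas.pop(oid, None)

    def pull_on_join(self, overlay, k):
        for m in overlay.random_maintainers(k):   # assumed overlay call
            m.answer_pull(self)
```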
Visual Example
[Animated slide: peers join (“Hello!”, “Give me more replicas!”), the maintainer trims density (“I don’t need so many.”), and peers leave (“Ciao!”, “Bye Bye!”)]
Observation: Pulls have no Junk
• Recall: junk is bad
  • It unnecessarily consumes storage
  • It results in spurious query results
• When a node joins, all the replicas it receives are valid
• So, to control junk…
  • Blow everything away
  • Reload fresh replicas
But the cost?!
• Sounds expensive
• Not all storage providing peers (c)rash, only c%
• We require only g% of replicas to be (g)ood
• If c = 10% and g = 80%, the overhead of flushing is
  • ½ · c / (1/g − 1) = 20% extra replica transfers
  • This only applies to especially long-lived storage peers
• It might be possible to optimize this cost away. We don’t.
  • Be careful! Most of the obvious optimizations are wrong
  • It’s easy to introduce statistical defects that accumulate over time
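The quoted 20% is a direct plug-in of the slide’s formula:

```python
def flush_overhead(c, g):
    """Extra replica transfers from flushing: 0.5 * c / (1/g - 1),
    where c is the crash fraction and g the required fraction of
    good (non-junk) replicas."""
    return 0.5 * c / (1.0 / g - 1.0)

print(flush_overhead(c=0.10, g=0.80))   # -> 0.2, i.e. 20% extra transfers
```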
When to flush?
• The expected number of replicas stored is easy to compute (a sum)
• Flush when: stored replicas > expected replicas + tolerable junk
• Only one problem:
  • Peers that happen to store more than average flush earlier
  • Those peers are preferentially destroyed
  • So there are fewer replicas than there should be
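In code, the test is a one-liner once the expectation is summed up; assuming, purely for illustration, that the per-object replica densities are known locally:

```python
def expected_replicas(object_densities):
    """E[#replicas stored] = sum over objects o of P(we store o),
    i.e. the sum of the per-object densities."""
    return sum(object_densities)

def should_flush(stored_count, object_densities, tolerable_junk):
    """Flush when stored replicas exceed expectation plus slack."""
    return stored_count > expected_replicas(object_densities) + tolerable_junk
```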
Independence is needed
• P(v stores o | v flushes) = P(v stores o)
• This equality fails if a node flushes because it is full
• Solution:
  • Use the flow of replicas through one bucket to flush the other
  • The buckets’ replica flows are statistically independent: done!
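A sketch of how the two-bucket trick might look; the random bucket assignment and the `flow_threshold` stand-in are my simplifications of the paper’s actual rule.

```python
import random

class TwoBucketStore:
    def __init__(self, flow_threshold):
        self.buckets = [set(), set()]   # two independent replica stores
        self.flow = [0, 0]              # replicas seen flowing into each
        self.flow_threshold = flow_threshold

    def accept(self, replica):
        b = random.randrange(2)         # bucket choice independent of content
        self.buckets[b].add(replica)
        self.flow[b] += 1
        if self.flow[b] >= self.flow_threshold:
            self.buckets[1 - b].clear() # flush the OTHER bucket ...
            self.flow[b] = 0
            # ... then re-pull fresh replicas for it (pull on join, omitted)
```

Because a bucket’s flush is triggered by the flow through the other bucket, whether a bucket stores object o says nothing about when that bucket flushes, which is the independence the slide asks for.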
Things you’ll find in the paper
• How pull on join handles different replication densities
• The details of the flush threshold equation
• A formal stochastic proof of correctness
• How to support peers with different storage capacities
• How to support maintainers behind NATs/firewalls
• The cost (in operations) of the proposed algorithms
• Simulations of what-if scenarios
Conclusion
• Providing replication guarantees in peer-to-peer systems is feasible
  • Strong guarantees are impossible
  • Probabilistic guarantees are tricky
  • Seemingly innocent choices result in statistically bad behaviour
• Maintainer-based replication is a relevant sub-problem
  • It is required for service lists and query subscriptions
• Compared to collective replication, maintainer-based replication
  • requires junk control
  • has an obvious node (the maintainer/owner) to manage replication
Thanks for listening!
Questions?
http://www.bubblestorm.net
http://www.dvs.tu-darmstadt.de