Data Management in Peer-to-Peer Systems Qi Sun Beverly Yang
Introduction • What is P2P? • Distributed nodes • Equal roles and functionality • Providing/exchanging resources • Why now? • PCs are becoming valuable resources! • Computing devices becoming pervasive
Many Applications • Grid computing • e.g., SETI@home • Ubiquitous computing • Cell phones, wireless devices, handhelds • Cars, refrigerators, microwaves • Preservation/Archival systems • File-sharing
File-sharing model • Data: (Title string, File blob) • Query: “Find songs by Madonna” • Result: • 63.274.18.3: Madonna – “Vogue” • 63.274.18.3: Madonna – “Beautiful Stranger” • 27.48.3.124: Madonna – “Like a Prayer” • 17.64.75.18: Madanna – “Vogue” • How is this “search” implemented?
Many Approaches • Napster • Gnutella • KaZaA • OverNet • BitTorrent
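Napster • “Hybrid” P2P system [Figure: peers, holding files labeled A through F, connect to a central server that keeps the index; a query (“?”) and the list “C, E, F” are shown at the server]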
Napster • Benefits • Efficient • Comprehensive • Can handle complex queries • Disadvantages • Server is single point of failure • Server is performance bottleneck • Server costs money to maintain!!!
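A minimal sketch of the hybrid model, assuming the server keeps a simple title-to-peers map. The class `CentralIndex`, its methods, and the peer names are hypothetical and not taken from Napster itself; the point is only that one lookup at the server answers the whole query, which is also why the server is the bottleneck.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical server-side index: title -> set of peer addresses."""

    def __init__(self):
        self.titles = defaultdict(set)

    def register(self, peer, titles):
        # A peer announces the titles it is willing to share.
        for title in titles:
            self.titles[title].add(peer)

    def unregister(self, peer):
        # Called when a peer leaves; its entries disappear from the index.
        for peers in self.titles.values():
            peers.discard(peer)

    def search(self, keyword):
        # Case-insensitive substring match over every indexed title.
        kw = keyword.lower()
        return sorted(
            (peer, title)
            for title, peers in self.titles.items()
            if kw in title.lower()
            for peer in peers
        )


index = CentralIndex()
index.register("peer-1", ["Madonna - Vogue", "Madonna - Beautiful Stranger"])
index.register("peer-2", ["Madonna - Like a Prayer"])
results = index.search("madonna")
```

The file transfer itself would still happen directly between peers; only the search goes through the server.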
Gnutella • “Pure” P2P system [Figure: peers connected by TCP links form an “overlay network”]
Gnutella [Figure: query flooding through the overlay. Legend: source; forward query; processed query; found result; forward response]
Gnutella • Benefits • No server needed (cost) • Robust (nodes can come and go) • Can handle complex queries per node • Disadvantages • Not comprehensive (can miss results) • Inefficient! (many messages)
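The trade-off above (no server, but many messages and possibly missed results) can be illustrated with a small flooding simulation. This is a sketch under assumptions: the overlay, the file lists, and the hop limit `ttl` are invented for illustration, and real Gnutella clients also avoid echoing a query back to its sender.

```python
from collections import deque


def flood_search(overlay, files, source, keyword, ttl):
    """Breadth-first flood from source; returns (hits, message_count)."""
    seen = {source}
    queue = deque([(source, ttl)])
    hits, messages = [], 0
    while queue:
        node, remaining = queue.popleft()
        # Each node evaluates the query against its own files.
        hits += [(node, f) for f in files.get(node, []) if keyword in f]
        if remaining == 0:
            continue  # hop limit reached: do not forward further
        for neighbor in overlay[node]:
            messages += 1  # every forward costs a message, even a redundant one
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages


# Hypothetical overlay: adjacency lists of TCP neighbors.
overlay = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b", "e"],
    "e": ["d"],
}
files = {"c": ["vogue.mp3"], "e": ["vogue_live.mp3"]}

near_hits, near_msgs = flood_search(overlay, files, "a", "vogue", ttl=1)
all_hits, all_msgs = flood_search(overlay, files, "a", "vogue", ttl=3)
```

With a small `ttl` the copy at node `e` is missed (not comprehensive); with a larger `ttl` it is found, at the price of more messages.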
KaZaA • “Super-peer” P2P system [Figure: several super-peers, each holding an index for its clients]
KaZaA • “Super-peer” P2P system [Figure: a query (“?”) goes from a client to its super-peer's index, “like Napster”; super-peers connect to one another “like Gnutella”]
KaZaA • Change the ratio of clients to super-peers • Napster: everyone (minus one) is a client • Gnutella: no one is a client • Combines strengths of hybrid and pure systems • Leverages heterogeneity of peers • e.g., bandwidth, memory, processing power
OverNet • Uses all peers to build a distributed index [Figure: key ABC is hashed, Hash(ABC) = 3561246; peers W, X, Y, Z, ... each hold the index for one range of the hash space, e.g., 0 to 10^6, 10^6 to 2×10^6, 3×10^6 to 4×10^6, 7×10^6 to 8×10^6]
OverNet: Searching • Given key k, which peer has the index? • Distributed Hash Table (DHT) [Figure: identifier ring with peers 0, 1, 2, 4, 8, 16, 24, 25, 31; peer 0 is looking for k=25]
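One way to answer "which peer has the index for key k?" is greedy routing on an identifier ring. The sketch below assumes a Chord-style ring with power-of-two "finger" links in a 5-bit ID space; that is an assumption made for illustration and may not match how OverNet itself routes. All function names are hypothetical.

```python
from bisect import bisect_left

M = 5            # assumed 5-bit identifier space: IDs 0..31
RING = 2 ** M


def successor(peers, ident):
    """First live peer at or clockwise after ident (peers is a sorted list)."""
    i = bisect_left(peers, ident % RING)
    return peers[i % len(peers)]


def between(x, a, b, inclusive_right=False):
    """True if x lies clockwise after a and before b on the ring."""
    x, a, b = x % RING, a % RING, b % RING
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b  # interval wraps past zero


def lookup(peers, start, key):
    """Greedy routing over finger links; returns (owner, path)."""
    node, path = start, [start]
    while True:
        succ = successor(peers, node + 1)
        if between(key, node, succ, inclusive_right=True):
            if succ != node:
                path.append(succ)
            return succ, path  # succ owns the key
        # Jump to the farthest finger that still precedes the key.
        fingers = [successor(peers, node + 2 ** i) for i in range(M)]
        nxt = next((f for f in reversed(fingers) if between(f, node, key)), succ)
        node = nxt
        path.append(node)


PEERS = [0, 1, 2, 4, 8, 16, 24, 25, 31]
owner, path = lookup(PEERS, start=0, key=25)
```

Each hop roughly halves the remaining distance to the key, so under these assumptions a lookup takes a logarithmic number of hops instead of flooding.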
BitTorrent • Downloading of a single file [Figure: the file is split into blocks Blk1, Blk2, Blk3, ..., Blk n; the tracker lists peers 2, 3, 6]
BitTorrent: Downloading • Tit-for-Tat strategy • Choking Mechanism • Periodic un-choke • Rare blocks first [Figure: block lists for peers A, B, C in two snapshots: (A: 1,2,3,4; B: 3,5; C: 2,3,4) and (A: 1,2,3,4; B: 3; C: 4)]
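The "rare blocks first" rule can be sketched as a counting problem: among the blocks still missing, request the one held by the fewest peers. The function name, the tie-break by block number, and the availability map are assumptions for illustration, not the exact policy of any BitTorrent client.

```python
from collections import Counter


def rarest_first(have, peer_blocks):
    """Return the missing block held by the fewest peers, or None.

    Ties are broken by the lowest block number (an arbitrary choice here).
    """
    counts = Counter(
        block
        for blocks in peer_blocks.values()
        for block in blocks
        if block not in have
    )
    if not counts:
        return None
    return min(counts, key=lambda block: (counts[block], block))


# Hypothetical availability: which blocks each connected peer holds.
peer_blocks = {"A": {1, 2, 3, 4}, "B": {3, 5}, "C": {2, 3, 4}}
first_choice = rarest_first(have={2}, peer_blocks=peer_blocks)
second_choice = rarest_first(have={1, 2}, peer_blocks=peer_blocks)
```

Fetching rare blocks early spreads them to more peers, so the swarm is less likely to stall if the only holder of a block leaves.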
Challenges • Performance, Performance, Performance! • Find rare/popular files quickly • Minimize maintenance cost • Spread workload evenly • Etc. • Zillions of heuristics/variants
Challenges (2) • Participation: Peers are selfish! • Do not want to “donate” bandwidth • Do not want to share their files • Do not care about others • Need some incentive mechanism!!
Challenges (3) • Authenticity of data • How do you know you have the right file? • Bogus copies • Corrupt copies • Need detection/correction mechanisms
Techniques • Performance • Routing Indices • Network Awareness • Participation • SLIC • Micropayments • Correctness • DoS Prevention • Reputation Systems
Routing Indices [Figure: a query (“?”) arriving at a node that must decide where to forward it]
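Routing Indices (2) [Figure: network of nodes 1 to 13, each labeled with a topic (DB, OS, AI, EE); a query “DB?” enters at node 1. Routing-index entries shown (topic → neighbors to forward to): (DB → 2,4; OS → 2; AI → 2,3,4; EE → 3), (DB → 5; OS → 5,6,7), (AI → 8,9; EE → 10), (DB → 11,13; AI → 11,12)]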
Routing Indices (3) • Benefits • Potentially reduce # messages • Drawbacks • Update cost (any time you have state) • Size of index
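A sketch of how a routing index can cut the message count: each node forwards a query only to the neighbors its index lists for the topic. The index values follow the entries in the figure, but which node owns which entry and what each node stores locally are assumptions made here for illustration; the code also assumes a tree-shaped network with no cycles.

```python
def route_query(indices, local_topics, start, topic):
    """Follow routing indices from start; returns (holders, visited)."""
    found, visited, stack = [], [], [start]
    while stack:
        node = stack.pop()
        visited.append(node)
        if topic in local_topics.get(node, ()):
            found.append(node)
        # Forward only to neighbors that the index says lead to the topic.
        stack.extend(reversed(indices.get(node, {}).get(topic, [])))
    return found, visited


# Assumed assignment of the figure's index entries to nodes 1 to 4.
indices = {
    1: {"DB": [2, 4], "OS": [2], "AI": [2, 3, 4], "EE": [3]},
    2: {"DB": [5], "OS": [5, 6, 7]},
    3: {"AI": [8, 9], "EE": [10]},
    4: {"DB": [11, 13], "AI": [11, 12]},
}
# Hypothetical local content per node.
local_topics = {
    2: {"AI"}, 5: {"DB", "OS"}, 6: {"OS"}, 7: {"OS"},
    8: {"AI"}, 9: {"AI"}, 10: {"EE"},
    11: {"DB", "AI"}, 12: {"AI"}, 13: {"DB"},
}

holders, visited = route_query(indices, local_topics, start=1, topic="DB")
```

Only 6 of the 13 nodes see the query in this example. The cost is the drawback listed above: every index entry must be updated when content changes.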
Reputation Systems [Figure: Alice and Bob. One peer asks “Who has file X?”; the other answers “I do!” but delivers File Y]
Reputation Systems • Have an “opinion list” • Based on personal experience? • Problem: sparse [Figure: Nodes 0 to 8; most opinions are unknown (“?”)]
Reputation Systems • Have a “trust list” • Based on personal experience? • Problem: sparse • Ask friends • Efficient • Automatic [Figure: Nodes 1, 2, 4, 6]
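"Ask friends" can be sketched as a fallback: use your own opinion when you have one, otherwise combine your friends' opinions, weighted by how much you trust each friend. The scoring scale (0 to 1), the weighting rule, and all names below are assumptions for illustration, not a specific published reputation scheme.

```python
def reputation(me, target, trust, friends):
    """Own opinion if present, else trust-weighted average of friends' opinions.

    Returns None when neither the node nor its friends know the target.
    """
    own = trust.get(me, {})
    if target in own:
        return own[target]
    weighted_sum = total_weight = 0.0
    for friend in friends.get(me, []):
        opinion = trust.get(friend, {}).get(target)
        if opinion is not None:
            weight = own.get(friend, 0.0)  # how much I trust this friend
            weighted_sum += weight * opinion
            total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else None


# Hypothetical sparse trust lists: node -> {other node: score in [0, 1]}.
trust = {
    "n0": {"n1": 0.9, "n2": 0.5},
    "n1": {"n6": 0.2},
    "n2": {"n6": 0.6},
}
friends = {"n0": ["n1", "n2"]}

score = reputation("n0", "n6", trust, friends)
```

Node `n0` has never dealt with `n6`, so its estimate leans toward the opinion of the friend it trusts more.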
Micropayments • People will do things for you only if you have money! • Like a vending machine • Goods are cheap • Security can’t be too expensive
Micropayments • Server is needed… • Handle accounts • Distribute and cash coins • Security • Scalability and performance bottleneck
Micropayments • Peers can do work too! • Challenge: SECURITY
SLIC: Link-based Incentive • Use quality of service as incentive • They need each other to reach more nodes ⇒ Can retaliate [Figure: node A in Fragment A linked to node B in Fragment B]
SLIC (2) • Adjust weights, and use them to reward good neighbors and to penalize bad ones [Figure: node A with neighbors B, C, D and link weights W(A,B), W(A,C), W(A,D)]
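A sketch of the weight idea, assuming an exponential moving average for the update and a proportional split of capacity for the reward. Both rules, the parameter `alpha`, and the numbers are hypothetical stand-ins; the slides do not give SLIC's actual formulas.

```python
def update_weight(old_weight, service_received, alpha=0.5):
    """Blend the previous weight with the service just observed (both in [0, 1])."""
    return (1 - alpha) * old_weight + alpha * service_received


def allocate_capacity(weights, capacity):
    """Split forwarding capacity among neighbors in proportion to their weights."""
    total = sum(weights.values())
    if total == 0:
        return {n: capacity / len(weights) for n in weights}
    return {n: capacity * w / total for n, w in weights.items()}


# Node A's weights W(A,B), W(A,C), W(A,D), all starting equal.
weights = {"B": 0.5, "C": 0.5, "D": 0.5}
# Hypothetical service A observed from each neighbor in the last period.
observed = {"B": 1.0, "C": 0.5, "D": 0.0}

weights = {n: update_weight(w, observed[n]) for n, w in weights.items()}
shares = allocate_capacity(weights, capacity=12)
```

Neighbor D, which served none of A's queries, now gets the smallest share of A's capacity. That is the retaliation the previous slide refers to.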
Network Awareness • Overlay network can be poor! [Figure: overlay links among peers in Timbuktu (Mali, Africa), San Francisco, and Palo Alto]
Network Awareness (2) • Form only “good” links • Probe a few and pick the best [Figure: peers in Timbuktu (Mali, Africa), San Francisco, and Palo Alto]
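"Probe a few and pick the best" can be sketched as sampling some candidates, measuring each, and keeping the closest. The `probe` callback, the latency figures, and the parameter names are assumptions for illustration; a real client would measure round-trip time over the network.

```python
import random


def pick_neighbors(candidates, probe, sample_size, keep, rng=random):
    """Probe a random sample of candidates and keep the lowest-latency ones."""
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    return sorted(sample, key=probe)[:keep]


# Hypothetical round-trip times in milliseconds, seen from a Palo Alto peer.
latency_ms = {"palo-alto-2": 2, "san-francisco": 15, "timbuktu": 300}
candidates = list(latency_ms)

best = pick_neighbors(candidates, latency_ms.get, sample_size=3, keep=1)
sampled = pick_neighbors(
    candidates, latency_ms.get, sample_size=2, keep=1, rng=random.Random(0)
)
```

Probing only a sample keeps the measurement cost bounded; the result is good rather than optimal, since the best candidate may not be in the sample.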
Network Awareness (3) • “Swap” peers around [Figure: peers in Timbuktu (Mali, Africa), San Francisco, and Palo Alto]
Denial of Service • Malicious peers can flood queries on unstructured networks • Rate limit • Incentive • Micro-payment
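Rate limiting against query floods is often done with a token bucket per neighbor. The sketch below assumes that approach; the class, the rate, and the burst size are hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Allow at most `rate` queries per second, with bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: drop the query


# One bucket per neighbor; here a single flooding neighbor sends 4 queries at t=0.
bucket = TokenBucket(rate=1, burst=2)
decisions = [bucket.allow(0.0) for _ in range(4)]
later = bucket.allow(1.0)
```

A well-behaved neighbor rarely notices the limit, while a flooder has most of its queries dropped at the first hop.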
Denial of Service • Malicious peers can drop queries and indices in structured networks • Tracing/Audit • Reorganization • Alternate path
Concluding Remarks • P2P provides a cheap infrastructure for leveraging the capacities of the masses. • P2P’s “openness” is both its strength and its weakness.