P2P Apps Chandrasekar Ramachandran and Rahul Malik. Papers: 1. Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility 2. Colyseus: A distributed architecture for interactive multiplayer games 3. OverCite: A Distributed, Cooperative CiteSeer. CS525, 02/19/2008
Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility. Antony Rowstron (Microsoft Research) and Peter Druschel (Rice University)
Contents • Introduction • Background • An Overview of PAST • Pastry • Operations • Improvements • Storage Management • Caching • Experimental Evaluation • Setup • Results • Conclusions
Introduction – Focus and Common Themes • Recent Focus: • Decentralized Control • Self-Organization • Adaptability/Scalability • P2P Utility Systems • Large-Scale • Common Themes in P2P Systems : • Symmetric Communication • Nearly-Identical Capabilities Source:1
Background • Characteristic Features of the Internet: • Geography • Ownership • Administration • Jurisdiction • Need for Strong Persistence and High Availability • Obviates Physical Transport of Storage Media and Explicit Mirroring • Enables Sharing of Storage and Bandwidth Source:2
An Overview of PAST • Any Host Connected to the Internet Can Be a PAST Node • Overlay Network • PAST Node: Access Point for a User • Operations Exported to Clients: • Insert • Lookup • Reclaim • Terms: NodeId and FileId • NodeId: 128 Bits, the SHA-1 Hash of the Node's Public Key Source:3
Pastry – Overview and Routing Table • P2P Routing Substrate • Given a Message, Pastry: • Routes It to the Node whose NodeId is Numerically Closest to the 128 msb of the FileId • In Fewer than log_{2^b}(N) Steps • Eventual Delivery Guaranteed • Routing Table: • log_{2^b}(N) Levels with 2^b − 1 Entries per Level • Each Entry: a NodeId with the Appropriate Prefix • Plus a Leaf Set and a Neighborhood Set
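A minimal sketch of the prefix-routing step described above, assuming hex-string nodeIds, a routing table stored as a dict keyed by (prefix length, next digit), and a non-empty leaf set; the names are illustrative, not Pastry's actual API:

```python
# Sketch of Pastry-style prefix routing (illustrative names, simplified rules).
# NodeIds and keys are hex strings; the routing table has log_{2^b}(N) rows
# with 2^b - 1 entries each, as summarized on the slide.

def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(local_id: str, key: str, routing_table: dict, leaf_set: list) -> str:
    ids = leaf_set + [local_id]
    lo, hi = min(ids, key=lambda n: int(n, 16)), max(ids, key=lambda n: int(n, 16))
    # 1. Key inside the leaf set's range: deliver to the numerically closest node.
    if int(lo, 16) <= int(key, 16) <= int(hi, 16):
        return min(ids, key=lambda n: abs(int(n, 16) - int(key, 16)))
    # 2. Otherwise consult the routing table: row = length of the shared prefix,
    #    column = the key's next digit. Each hop extends the matched prefix by
    #    one digit, which is where the sub-log_{2^b}(N) step bound comes from.
    row = shared_prefix_len(local_id, key)
    hop = routing_table.get((row, key[row]))
    # 3. Fallback (rare): route to any known node numerically closer to the key.
    return hop or min(ids, key=lambda n: abs(int(n, 16) - int(key, 16)))
```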
Basic PAST Operations • Insert • Store the File on the k PAST Nodes whose NodeIds are Numerically Closest to the 128 msb of the FileId • Balances Storage Utilization, since NodeIds and FileIds are Uniformly Distributed • Storage Quota Debited • Store Receipts Returned • Routing via Pastry • Lookup • Nodes Respond with the Content and the Stored File Certificate • Data Usually Found Near the Client. Why? Pastry's Proximity Metric • Reclaim • Reclaim Certificate • Reclaim Receipt
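To make the insert path concrete: per the paper, the fileId is the SHA-1 hash of the file name, the owner's public key, and a random salt, and the file is stored on the k nodes whose nodeIds are numerically closest to its 128 most significant bits. A rough sketch (function names are ours, not PAST's):

```python
import hashlib

def file_id(filename: str, owner_pubkey: bytes, salt: bytes) -> int:
    """fileId = SHA-1(file name, owner's public key, random salt); keep the 128 msb for routing."""
    digest = hashlib.sha1(filename.encode() + owner_pubkey + salt).digest()
    return int.from_bytes(digest[:16], "big")

def replica_holders(fid: int, node_ids: list, k: int) -> list:
    """The k nodes whose nodeIds are numerically closest to the fileId store the replicas."""
    return sorted(node_ids, key=lambda n: abs(n - fid))[:k]
```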
PAST - Security • Smartcards • Private/Public Key Pair • Certificates • Storage Quotas • Assumptions • Computationally Infeasible to Break the Cryptographic Functions • Most Nodes are Well Behaved • Attackers Cannot Tamper with the Smartcards • Features • Integrity Maintained • Store Receipts • Randomized Pastry Routing Scheme • Redundant Routing Information Source:4 Ingenious?
Storage Management - Overview • Aims: • High Global Storage Utilization • Graceful Degradation as Maximal Utilization is Approached • Rely on Local Coordination Only • Why is the Storage Load Not Uniform? • Statistical Variation in NodeId/FileId Assignments • Skewed File Size Distribution • Different Node Storage Capacities • How Much Can a Node Store? • Advertised Capacities May Differ by No More Than Two Orders of Magnitude • Advertised Capacity is Compared Against the Leaf Set • Expectation: Cheap Hardware, ~60 GB on Average • Node Too Large? Split It into Multiple Nodes
Storage Management - Replication • Replica Diversion • Purpose: Balance the Remaining Free Storage Among Leaf-Set Nodes • Store Succeeds? • Forward to the Other k−1 Nodes • Return a Store Receipt • Store Fails at Node A? • A Chooses a Node B from its Leaf Set that is Not Among the k Closest • B Stores the Replica, A Keeps a Pointer to B • Replacement Replicas Created on Failure • Policies (Decisions, Decisions…): • When to Accept a Replica Locally • How to Select the Diversion Node B • File Diversion • Balances Free Storage Across the NodeId Space • The Client Retries the Insert with a Different FileId
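The accept-or-divert decision boils down to the threshold test used in the paper: node N rejects a file of size S_D when S_D / F_N (F_N = N's free space) exceeds a threshold t, with t_pri for primary replicas and a stricter t_div for diverted ones. A rough sketch under that assumption (the node methods are hypothetical helpers):

```python
T_PRI = 0.1    # threshold for primary replicas (a value used in the evaluation)
T_DIV = 0.05   # stricter threshold for diverted replicas

def accepts(file_size: int, free_space: int, diverted: bool) -> bool:
    """Reject when S_D / F_N > t; diverted replicas face the stricter t_div."""
    if free_space <= 0 or file_size > free_space:
        return False
    return file_size / free_space <= (T_DIV if diverted else T_PRI)

def store_replica(node, file):
    """Try to store locally; otherwise divert within the leaf set and keep a pointer."""
    if accepts(file.size, node.free_space, diverted=False):
        node.store(file)
        return "stored"
    for other in node.leaf_set_not_among_k_closest():      # hypothetical helper
        if accepts(file.size, other.free_space, diverted=True):
            other.store(file)
            node.store_pointer(file.id, other)              # replica diversion
            return "diverted"
    return "rejected"   # leads to file diversion: the client retries with a new fileId
```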
Storage Management – Maintenance • Maintain k Copies of Each Inserted File • Tracked via the Leaf Set • Failures? • Detected with Keep-Alive Messages • Leaf Sets Adjusted • Nodes: "Please Give Me Replicas of All Files!" • Not Possible: Time-Consuming and Inefficient • Solution: Migrate Files Gradually, Installing Pointers to the FileIds in the Meantime • Assumption: the Total Amount of Storage in the System Never Decreases
Caching • Goals: • Minimize Client Access Latency • Balance the Query Load • Maximize Query Throughput • How: Create Additional Replicas Beyond the Required k • Where to Cache? • In Nodes' Unused Disk Space • Cached Copies are Evicted When the Space is Needed • Insert into the Cache Only If the File Size is Less than a Fraction c of the Node's Storage Capacity • Eviction: GreedyDual-Size Policy
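A minimal sketch of the GreedyDual-Size policy named on the slide: every cached file gets a value H = L + cost/size, and on eviction the inflation value L rises to the victim's H, so recently used, small, or costly-to-refetch files survive longer (this is the generic algorithm, not PAST's exact code):

```python
class GreedyDualSizeCache:
    """Evict the entry with the smallest H = L + cost/size."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.L = 0.0              # inflation value, raised on every eviction
        self.entries = {}         # file_id -> (H, size)

    def access(self, file_id, cost=1.0):
        if file_id not in self.entries:
            return False
        size = self.entries[file_id][1]
        self.entries[file_id] = (self.L + cost / size, size)   # refresh H on a hit
        return True

    def insert(self, file_id, size, cost=1.0):
        if size > self.capacity:        # e.g. larger than fraction c of the node's capacity
            return
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda f: self.entries[f][0])
            self.L = self.entries[victim][0]                    # inflate L to the victim's H
            self.used -= self.entries[victim][1]
            del self.entries[victim]
        self.entries[file_id] = (self.L + cost / size, size)
        self.used += size
```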
Performance Evaluation • Implemented in Java • Configured to Run in a Single Java VM • Hardware • Compaq AlphaServer ES40 • Tru64 Unix • 6 GB Main Memory • Data • 8 Web Proxy Logs from NLANR: 4 Million Entries, 18.7 GB of Content • Institutional File Systems: 2 Million Files, 167 GB
Results • Acceptance Test: a Node Rejects a File When S_D / F_N > t (t_pri for Primary Replicas, t_div for Diverted Replicas) • Storage • With a Lower t_pri, the Number of Files Stored Increases, but Storage Utilization Drops and the Insertion Failure Rate is Higher • The Number of Diverted Replicas Stays Small Even at High Utilization • Caching • The Global Cache Hit Ratio Decreases as Storage Utilization Increases (measured with t_pri = 0.1 and t_div = 0.05)
References • Images: • 1. bahaiviews.blogspot.com/2006_02_01_archive.html • 2. http://images.jupiterimages.com/common/detail/21/05/22480521.jpg • 3. http://www.masternewmedia.org/news/images/p2p_swarming.jpg • 4. http://www.theage.com.au/news/national/smart-card-back-on-the-agenda/2006/03/26/1143330931688.html
Discussion • Comparison with CFS and Ivy • How can External Factors such as globally known information help in Local Coordination?
Colyseus: A Distributed Architecture for Online Multiplayer Games. Ashwin Bharambe, Jeffrey Pang, Srini Seshan. ACM/USENIX NSDI 2006
Networked games are rapidly evolving (chart source: www.mmogchart.com)
Centralized Scheme • (Figure: Quake II server bandwidth, kbps) • Slow-paced games with little interaction between server and client may scale well • Not true of FPS games (e.g. Quake): • Demand high interactivity • Need a single game world • Common shared state between clients • High outgoing traffic at the server
Game Model • (Screenshot of Serious Sam) • Immutable state: the interactive 3-D environment (maps, models, textures) • Mutable state: ammo, monsters, game status, the player • Behavior driven by each object's think function
Distributed Architecture • (Diagram: objects and their replicas spread across nodes) • Two tasks: • Create the replicas • Discover the objects
Replication • Each object has a primary copy that resides on exactly one node • The primary executes the object's think function • Replicas are read-only • Updates to replicas are serialized at the primary
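A rough sketch of that split, with illustrative names rather than Colyseus's actual classes: the primary runs the think function and pushes serialized updates, while replicas only apply them.

```python
class PrimaryObject:
    """Authoritative copy: runs think(), serializes writes, pushes updates to replicas."""

    def __init__(self, state):
        self.state = dict(state)
        self.version = 0
        self.replicas = []          # remote read-only copies

    def think(self):
        # Game logic mutates state here (toy example: drift along x).
        self.state["x"] = self.state.get("x", 0) + 1
        self.version += 1
        for r in self.replicas:
            r.apply_update(self.version, dict(self.state))


class ReplicaObject:
    """Read-only copy: accepts only updates serialized by the primary."""

    def __init__(self):
        self.state = {}
        self.version = -1

    def apply_update(self, version, state):
        if version > self.version:  # drop stale or reordered updates
            self.version, self.state = version, state
```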
Object Location • Matching publications against subscriptions: • Subscription: "Find objects in range [x1,x2], [y1,y2], [z1,z2]" • Publication: "My location is (x,y,z)" • Challenge: overcome the delay between issuing a subscription and receiving a matching publication
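The matching itself is plain interval containment; the hard part, as the slide says, is the network delay. A toy matcher assuming axis-aligned range subscriptions:

```python
def matches(sub, pub):
    """sub = ((x1, x2), (y1, y2), (z1, z2)) area of interest; pub = (x, y, z) location."""
    return all(lo <= c <= hi for (lo, hi), c in zip(sub, pub))

# Example: an object at (5, 2, 0) matches a subscription for the nearby region.
print(matches(((0, 10), (0, 10), (-1, 1)), (5, 2, 0)))   # True
```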
Distributed Hash Tables (DHT) • (Diagram: Chord-style ring of node IDs 0x00-0xf0; the key x = 1 hashes to 0xb2; finger pointers let a lookup complete in O(log n) hops)
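A compact sketch of the lookup the ring diagram illustrated, assuming a Chord-like ring whose finger i points roughly 2^i positions ahead in ID space (toy 8-bit IDs to match the 0x00-0xf0 labels on the slide):

```python
RING = 2 ** 8   # toy 8-bit identifier space

def between(x, a, b):
    """True if x lies on the clockwise arc (a, b]."""
    return x != a and (x - a) % RING <= (b - a) % RING

class ChordNode:
    def __init__(self, node_id, fingers):
        self.id = node_id
        self.fingers = fingers            # fingers[0] is the successor; fingers[i] ~ id + 2^i

    def closest_preceding(self, key):
        for f in reversed(self.fingers):  # farthest finger that still precedes the key
            if between(f.id, self.id, key) and f.id != key:
                return f
        return self

    def lookup(self, key):
        """Each hop roughly halves the remaining ID distance, hence O(log n) hops."""
        node = self
        while not between(key, node.id, node.fingers[0].id):
            nxt = node.closest_preceding(key)
            if nxt is node:
                break
            node = nxt
        return node.fingers[0]            # the key's successor is responsible for it
```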
Using DHTs for Range Queries • Query: 6 ≤ x ≤ 13 • With cryptographic hashing the keys scatter across the ring (key 6 -> 0xab, key 7 -> 0xd3, …, key 13 -> 0x12) • So: no cryptographic hashing for the key identifier; consecutive keys map to a contiguous region of the ring
Using DHTs for Range Queries • Nodes in popular regions can be overloaded • Load imbalance!
DHTs with Load Balancing • Load balancing strategy • Re-adjust responsibilities • Range ownerships are skewed!
DHTs with Load Balancing • (Diagram: ring with a popular region into which many node IDs crowd) • Finger pointers get skewed! • Each routing hop may not reduce the node-space by half • No O(log n) hop guarantee
Ideal Link Structure • (Diagram: the same ring and popular region, with links spaced by node-distance rather than ID-distance)
Need to establish links based on node-distance • (Diagram: nodes laid out against the values they own, e.g. the 4th and 8th nodes own values v4 and v8) • If we had this information: • For finger i, estimate the value v for which the 2^i-th node is responsible
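A sketch of that estimation step, assuming the node-density information arrives as a sorted list of (range_start, range_end, node_count) buckets covering the value space beyond the local node (the bucket format is our assumption):

```python
def finger_target(histogram, i: int) -> float:
    """Return the value v for which roughly the 2^i-th node (in node-distance) is responsible."""
    wanted = 2 ** i
    seen = 0
    for start, end, count in histogram:
        if count and seen + count >= wanted:
            frac = (wanted - seen) / count      # interpolate inside this bucket
            return start + frac * (end - start)
        seen += count
    return histogram[-1][1]   # fewer than 2^i nodes known: point at the end of known space
```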
Histogram Maintenance • Measure node-density over the value space locally • Gossip about it! • (Diagram: ring nodes exchange (range, density) samples and can request samples from one another)
Load Balancing • Basic idea – leave-join: "light" nodes leave and re-join near "heavy" nodes, splitting the range of the heavier node • (Figure: load histogram over the ID space)
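A toy illustration of one leave-join step, assuming every node can see the load histogram (the dict layout is ours, purely for illustration):

```python
def rebalance_once(nodes):
    """nodes: list of {"id": ..., "range": (lo, hi), "load": ...} dicts."""
    light = min(nodes, key=lambda n: n["load"])
    heavy = max(nodes, key=lambda n: n["load"])
    if light is heavy:
        return
    lo, hi = heavy["range"]
    mid = (lo + hi) / 2
    heavy["range"] = (lo, mid)      # the heavy node keeps the lower half of its range
    light["range"] = (mid, hi)      # the light node leaves and re-joins owning the upper half
    heavy["load"] /= 2              # rough approximation: load splits with the range
    light["load"] = heavy["load"]
```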
Prefetching • On-demand object discovery can cause stalls or render an incorrect view • So, use game physics for prediction • Predict which areas the player will move to and subscribe to objects in those areas
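A toy version of that prediction, using simple dead reckoning (our choice of predictor, not necessarily Colyseus's): extrapolate the player's position and subscribe to the region they are about to enter.

```python
def predicted_area(pos, velocity, lookahead_s=1.0, radius=50.0):
    """Dead-reckon the position lookahead_s seconds ahead; return a cubic area of interest."""
    px, py, pz = (pos[i] + velocity[i] * lookahead_s for i in range(3))
    return ((px - radius, px + radius),
            (py - radius, py + radius),
            (pz - radius, pz + radius))

# Subscribe to the predicted area in addition to the current one, e.g.:
# node.subscribe(predicted_area(player.pos, player.velocity))
```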
Proactive Replication • Standard object discovery and replica instantiation are too slow for short-lived objects • Uses the observation that most objects originate close to their creator • Piggyback object-creation messages onto updates of other objects
Soft-State Storage • Objects need to tailor their publication rate to how fast they change • Ammo or health-packs don't move much • Add TTLs to subscriptions and publications • Both are stored at the rendezvous node(s) • Stored publications act as triggers for later-arriving subscriptions
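A small sketch of the soft state kept at a rendezvous node, with illustrative names: both sides carry TTLs, and a stored publication immediately answers a later-arriving subscription, which is the trigger behavior described above.

```python
import time

def contains(area, point):
    return all(lo <= c <= hi for (lo, hi), c in zip(area, point))

class RendezvousNode:
    def __init__(self):
        self.subs = []   # (area, subscriber, expiry)
        self.pubs = []   # (location, payload, expiry)

    def _expire(self):
        now = time.time()
        self.subs = [s for s in self.subs if s[2] > now]
        self.pubs = [p for p in self.pubs if p[2] > now]

    def publish(self, location, payload, ttl):
        self._expire()
        self.pubs.append((location, payload, time.time() + ttl))
        # Deliver to every live subscription whose area contains this location.
        return [sub for area, sub, _ in self.subs if contains(area, location)]

    def subscribe(self, area, subscriber, ttl):
        self._expire()
        self.subs.append((area, subscriber, time.time() + ttl))
        # Stored publications act as triggers: answer from soft state right away.
        return [payload for loc, payload, _ in self.pubs if contains(area, loc)]
```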
Experimental Setup • Emulab-based evaluation • Synthetic game • Workload based on Quake III traces • P2P scenario • 1 player per server • Unlimited bandwidth • Modeled end-to-end latencies • More results, including a Quake II evaluation, in the paper
Evaluation: Per-node Bandwidth Scaling • (Plot: mean outgoing bandwidth, kbps, vs. number of nodes)
Discussion • Bandwidth costs scale well with the number of nodes • Compared with the single-server model, this makes P2P deployment more feasible • However, overall bandwidth costs are 4-5x higher, so there is overhead • View inconsistency is small and is quickly repaired
Discussion Questions • Avenue for cheating: • Nodes can modify objects in local storage • Nodes can withhold publications • Nodes can subscribe to regions of world they should not see • How scalable is the architecture? • Feasibility in real world
OverCite: A Distributed, Cooperative CiteSeer Jeremy Stribling, Jinyang Li, Isaac G. Councill, M. Frans Kaashoek, and Robert Morris
Contents • Introduction • Characteristics of CiteSeer • Problems and Possible Solutions • Structure of OverCite • Experimental Evaluation
Introduction • What is CiteSeer? • An Online Repository of Papers • Crawls, Indexes, Links, and Ranks Papers • Periodically Updates its Index with Newly Discovered Documents • Stores Several Meta-Data Tables to Identify Documents and Filter Out Duplicates • OverCite: • A CiteSeer-Like System • Provides: • Scalable and Load-Balanced Storage • Automatic Data Management • Efficient Query Processing
Characteristics of CiteSeer - Problems • 35 GB Network Traffic Per Day • 1 TB of Disk Storage • Significant Human Maintenance • Coordinating Crawling Activities Across All Sites • Reducing Inter-Site Communication • Parallelizing Storage to Minimize Per-Site Burden • Tolerating Network and Site Failures • Adding New Resources Difficult
Possible Solutions • Solutions Mentioned in the Paper: • Donate Resources • Run Your Own Mirrors • Partition the Network • Use Content Distribution Networks
Structure of OverCite • 3-Tier, DHT-Backed Design: • Web-Based Front End • Application Server • DHT Back End • Multi-Site Deployment of CiteSeer • Indexed Keyword Search • Parallelized Similarly to Cluster-Based Search Engines
Features of Search and Crawl • Crawling: • Coordinated via the DHT • Searching: • Divide Documents into Partitions and Hosts into Groups • Less Search Work per Host
OverCite: DHT Storage and Partitioning • Papers Stored in the DHT for Durability • Meta-Data Tables Keyed by Document ID (Title, etc.) • Partitioning: • By Document • The Index is Divided into k Partitions • Each Query is Sent to k Nodes, One per Partition
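A sketch of the k-way fan-out this partitioning implies: each query goes to one node per index partition and the partial results are merged. The partition handles, their search method, and the hit format are assumptions for illustration, not OverCite's interface.

```python
def search(query: str, partitions, limit: int = 20):
    """Send the query to one node per partition (k requests), then merge by score."""
    hits = []
    for part in partitions:                      # in OverCite these k queries go out in parallel
        hits.extend(part.search(query, limit))   # assumed hit format: {"doc": ..., "score": ...}
    hits.sort(key=lambda h: h["score"], reverse=True)
    return hits[:limit]
```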
OverCite Implementation and Deployment • Storage: Chord/DHash DHT • Index: Keyword Search Engine • Web Server: OKWS • Deployment: • 27 Nodes Across North America • 9 RON/IRIS Nodes Plus Private Machines • 47 Physical Disks, 3 DHash Nodes per Disk