P2P Apps CS525 Advanced Topics in Distributed Systems Spring 07 Presented By: Imranul Hoque, Sonia Jahid
P2P Apps • Applications that use P2P techniques for better performance OR • Applications that are built on P2P protocols • We’ll consider both • PAST: Built on top of Pastry • OverCite: Uses DHT instead of DB for storage • Colyseus: Uses range-queriable DHT for pub-sub
Storage management & caching in PAST Antony Rowstron, Peter Druschel
Background • P2P storage utility • Built on top of Pastry • Aims • Strong persistence: k replicas maintained per file • High availability • Caching • Scalability • High utilization • Security • Quota, Key, Encrypted Routing Table
Operations • Insert • Parameters: name, owner-credentials, k, file • Returns: 160 bit identifier (fileId) • Lookup • Parameter: fileId • Returns: file from one of the (near) k nodes • Reclaim • Parameters: fileId, owner-credentials
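As a rough illustration of this client-facing API (a minimal sketch, not PAST's actual code; PastClient, its helpers, and the message formats are hypothetical names), in Python:

    # Minimal sketch of the PAST client API listed above. The Pastry layer,
    # key handling, and message formats are all assumptions for illustration.
    import hashlib
    import os

    class PastClient:
        def __init__(self, pastry, owner_public_key):
            self.pastry = pastry                      # assumed Pastry routing layer
            self.owner_public_key = owner_public_key  # bytes of the owner's public key

        def insert(self, name, k, file_bytes):
            # Returns the 160-bit fileId under which k replicas are stored.
            salt = os.urandom(20)
            file_id = hashlib.sha1(
                name.encode() + self.owner_public_key + salt).digest()
            self.pastry.route(file_id, ("INSERT", file_bytes, k, salt))
            return file_id

        def lookup(self, file_id):
            # Returns the file from one of the (near) k storing nodes.
            return self.pastry.route(file_id, ("LOOKUP",))

        def reclaim(self, file_id):
            # Releases the storage held by the file's replicas.
            self.pastry.route(file_id, ("RECLAIM", self.owner_public_key))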
Operations: Insert • fileId = SHA-1(name, public-key_owner, salt) • Client quota: storage = storage - k * fileSize • fileCert = (fileId, SHA-1(file), k, salt, date, owner) • [file, fileCert, {fileCert}_private-key] is routed via Pastry • Before insertion, the receiving node verifies the certificate • The insert request is forwarded to the other (k - 1) nodes • Once done, each of the k replicas issues a store receipt • The client receives the acks and verifies them
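A sketch of how the node responsible for fileId might process the insert, following the steps above (verification, forwarding to the other k-1 replicas, store receipts); the node interface here is an assumption, not PAST's implementation:

    import hashlib

    def handle_insert(node, file_bytes, file_cert, signature, k):
        # 1. Verify the signed certificate and the file contents before storing.
        if not node.verify_signature(file_cert, signature, file_cert.owner_public_key):
            return "NACK"
        if hashlib.sha1(file_bytes).digest() != file_cert.content_hash:
            return "NACK"

        # 2. Charge this replica against the node's free space (the client's
        #    quota was already debited by k * fileSize at insert time).
        node.free_space -= len(file_bytes)
        node.store(file_cert.file_id, file_bytes)

        # 3. Forward the insert to the other k-1 nodes with nodeIds closest
        #    to fileId, then return a signed store receipt to the client.
        for peer in node.other_closest_nodes(file_cert.file_id, k - 1):
            peer.send(("INSERT", file_bytes, file_cert, signature))
        return node.sign_store_receipt(file_cert.file_id)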
Operations: Lookup • The client node issues a request message • Pastry routes the request to a (nearby) node storing the file • The node replies with [file, fileCert, {fileCert}_private-key] • The client verifies the file against the certificate
Operations: Reclaim • The client issues a reclaim certificate • The reclaim certificate is routed via Pastry • The replicas verify the reclaim certificate • Each storing node issues a reclaim receipt • A client receiving a reclaim receipt verifies it, then storage = storage + fileSize • Reclaim vs. delete?
Storage Management • Goals: high global storage utilization, and graceful degradation as maximal utilization is approached • Responsibilities: balancing free storage space across nodes, while file copies are maintained by the k nodes with nodeIds closest to the fileId • These responsibilities conflict: the k closest nodes may not have the free space to hold the file • Solution: Replica Diversion & File Diversion
Replica Diversion • Purpose: balance the remaining free storage among the nodes in a leaf set • Policy, at a primary node pri asked to store a replica:

    if (fileSize / freeSpace_pri > t_pri) {
        choose a node div from pri's leaf set such that:
            fileSize / freeSpace_div < t_div
            div is not among the k nodes closest to fileId
            div does not already hold a replica of the file
            freeSpace_div is maximal among all such candidates
        if such a node exists:
            store the replica on div; pri keeps a pointer to div
        else:
            nodes that already stored a replica discard it
            a NACK is sent to the client, triggering File Diversion
    }
Replica Diversion (2)-(4) [Diagrams] A file insert is routed to node A, whose leaf set contains nodes 1-4. A diverts the replica to node 4 and stores a pointer to node 4; A also inserts a pointer at C, the (k+1)-th closest node to the fileId.
File Diversion • Purpose: balance the remaining free storage among different portions of the nodeId space • Method: generate a new fileId with a different salt and retry the insert operation • The process is repeated up to three times; if the third attempt also fails, an insert failure is reported
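A minimal sketch of the client-side retry loop implied by file diversion (helper names such as try_insert are assumptions):

    import hashlib
    import os

    def insert_with_file_diversion(client, name, k, file_bytes, max_attempts=3):
        for attempt in range(max_attempts):
            # A fresh salt yields a new fileId, which maps the file to a
            # different portion of the nodeId space.
            salt = os.urandom(20)
            file_id = hashlib.sha1(
                name.encode() + client.owner_public_key + salt).digest()
            if client.try_insert(file_id, salt, k, file_bytes):  # ack received
                return file_id
            # NACK received: that region of the ring is full, so retry.
        raise RuntimeError("insert failed after %d attempts" % max_attempts)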
Maintaining Replicas • Nodes exchange keep-alive messages to keep track of failed nodes • Unresponsive nodes are replaced by new leaf-set entries • Newly joining nodes also change the leaf set • If a joining node becomes one of the k nodes closest to a fileId stored at a node N: the joining node initially keeps a pointer to the file table of N, and affected files are gradually migrated in the background • A node that discovers one of its diverted replicas is stored outside its leaf set gradually migrates the file to a node within its leaf set
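A rough sketch of the maintenance loop this slide describes; the probe interval, method names, and migration mechanics are assumptions, not PAST's code:

    import time

    def maintain_replicas(node, probe_interval=5.0):
        while True:
            for peer in list(node.leaf_set):
                if not peer.answers_keepalive():
                    # Replace the unresponsive entry with a new leaf-set member.
                    replacement = node.leaf_set.replace(peer)
                    # Gradually re-create the replicas that peer was holding.
                    for file_id in node.files_replicated_on(peer):
                        node.schedule_migration(file_id, replacement)
            time.sleep(probe_interval)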
Maintaining Replicas (2) • A node failure may cause a storage shortage: the remaining nodes in the leaf set cannot absorb the failed node's files • In that case, the node contacts the two most distant members of its leaf set to locate space, and fails otherwise • Open Pastry and proprietary Pastry differ in their replica-management schemes
Optimization • File encoding • Idea 1: instead of storing k full replicas, add m checksum blocks (pros and cons?) • Idea 2: store fragments of the file at separate nodes (pros and cons?) • File caching • Idea: use the unused portion of the advertised disk space to cache files • Cached copies can be discarded or evicted at any time
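A minimal sketch of the caching idea, where unused advertised space holds evictable copies; the eviction rule shown (largest first) is only a placeholder and not necessarily the policy PAST actually uses:

    class NodeStorage:
        def __init__(self, capacity):
            self.capacity = capacity
            self.primary = {}   # fileId -> bytes: replicas that must be kept
            self.cache = {}     # fileId -> bytes: cached copies, evictable

        def used(self):
            return (sum(len(b) for b in self.primary.values())
                    + sum(len(b) for b in self.cache.values()))

        def cache_file(self, file_id, data):
            # Only cache when spare space is available; never evict for a cache copy.
            if len(data) <= self.capacity - self.used():
                self.cache[file_id] = data

        def store_replica(self, file_id, data):
            # Evict cached copies (largest first) until the primary replica fits.
            while self.cache and len(data) > self.capacity - self.used():
                victim = max(self.cache, key=lambda f: len(self.cache[f]))
                del self.cache[victim]
            if len(data) > self.capacity - self.used():
                raise MemoryError("no space left for a primary replica")
            self.primary[file_id] = data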
Experimental Setup • 2,250 PAST nodes • Number of replicas k = 5 • For Pastry, b = 4 • Two workloads: proxy logs from NLANR, and file names and sizes from several file systems at Microsoft
Experimental Results • With t_div = 0.05: a lower t_pri gives a higher insertion success rate but lower storage utilization (why?) • The remaining experiments use t_pri = 0.1 and t_div = 0.05
Experimental Results (2) File diversions are negligible as long as storage utilization stays below 83%
Experimental Results (3) Even at 80% utilization, fewer than 10% of stored replicas are diverted
Experimental Results (4) • At low storage utilization the hit rate is high • The number of hops increases as utilization increases
Discussion • CFS: built on top of Chord; stores blocks rather than whole files; relies on caching for small files only • Ivy: a read/write P2P file system based on a set of logs and DHash; provides an NFS-like file system view to the user; uses version vectors for synchronization • OceanStore: uses untrusted servers; data is encrypted; write access via ACLs, read access via keys; data migrates based on access patterns
OverCite: A Distributed, Cooperative CiteSeer Jeremy Stribling, Jinyang Li, Isaac G. Councill, M. Frans Kaashoek, Robert Morris
Motivation • CiteSeer allows users to search and browse a large archive of research papers • Centralized design: crawler, database, indexer • OverCite aggregates donated resources at multiple sites • Challenges: load-balancing storage, and query processing with automatic data management
Contribution • A 3-tier, DHT-backed design: OverCite • Experimental evaluation with 27 nodes, the full CiteSeer document set, and a trace of CiteSeer user queries
Architecture [Diagram] • Tier 1: web servers • Tier 2: keyword-search servers and crawlers, each with a local index file • Tier 3: DHT storage for documents & metadata
Life of a Query [Diagram: Front End → Index Servers → DHT] • A front end (FE) is chosen via round-robin DNS • The FE sends the query to k index servers (one per index partition) • The index servers contact the DHT for document metadata • Results are forwarded back to the FE, which aggregates them
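A sketch of this query path in Python; FrontEnd, the index-server objects, and the DHT interface are illustrative assumptions, not OverCite's API:

    from concurrent.futures import ThreadPoolExecutor

    class FrontEnd:
        def __init__(self, index_servers, dht):
            self.index_servers = index_servers  # one reachable server per partition
            self.dht = dht

        def query(self, keywords, limit=10):
            # Fan the query out to all k index servers in parallel.
            with ThreadPoolExecutor(max_workers=len(self.index_servers)) as pool:
                partial = list(pool.map(lambda s: s.search(keywords, limit),
                                        self.index_servers))
            # Each partial result is a list of (score, doc_id) pairs; merge
            # them, keep the overall best hits, and fetch metadata from the DHT.
            merged = sorted((hit for hits in partial for hit in hits),
                            reverse=True)[:limit]
            return [self.dht.get(doc_id) for _, doc_id in merged]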
Global Data Structure • Document ID (DID) for each document for which a PDF/PS file is found • Citation ID (CID) to every bib entry in a document • Group ID (GID) for use in contexts where a file is not required
Global Data Structure (2) [Diagram] Example: a document whose PDF/PS file exists is assigned DID 110, GID 150, and FID 425; a cited paper with no file is assigned CID 231 and GID 118
Local Data Structures • Required for keyword searches • Local data includes an inverted index table • OverCite uses k index partitions; with n nodes in total (n > k), there are n/k copies of each index partition • Large k vs. small k?
Local Data Structures (2) [Diagram] The crawler assigns a document with file ID 124 to index partition 124 % 3 = 1 (partitions 0, 1, 2) and stores the document itself in the DHT
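A small sketch of this assignment rule; the table and DHT interfaces are assumed:

    def index_new_document(dht, partitions, fid, doc_text, k):
        # With k index partitions, a document is indexed by partition FID mod k,
        # e.g. FID 124 with k = 3 lands in partition 124 % 3 == 1.
        partition_id = fid % k
        dht.put(fid, doc_text)   # the document itself lives in the DHT
        partitions[partition_id].add_to_index(fid, doc_text)
        return partition_id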
Web Crawler • Builds on several existing proposals for distributed crawling • The crawler performs a lookup for each new PDF/PS link in the URLs table • Download, parse, extract metadata • Duplicate checks: FID in Files, title in Titles, shingle in Shins • Update Files, Docs, Cites, Groups, Titles
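An illustrative version of the duplicate check; the Files/Titles/Shins table interfaces are assumed, and the word-window shingling below is only a crude stand-in:

    import hashlib

    def is_duplicate(files_tbl, titles_tbl, shins_tbl, pdf_bytes, title, text):
        # 1. Exact file duplicate: the content hash is already in Files.
        fid = hashlib.sha1(pdf_bytes).hexdigest()
        if files_tbl.contains(fid):
            return True
        # 2. Same title already indexed in Titles.
        if titles_tbl.contains(title.strip().lower()):
            return True
        # 3. Near-duplicate text: any shingle already present in Shins.
        words = text.split()
        shingles = {hash(" ".join(words[i:i + 8])) for i in range(len(words) - 7)}
        return any(shins_tbl.contains(s) for s in shingles)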
Implementation • The current implementation does not include the crawler module • It is populated with existing CiteSeer docs and indexes the first 5,000 words of each doc • Built on the OK Web Server and the DHash DHT
Performance • 27 nodes: 16 at MIT, 11 across N. America • 47 physical disks (35–400 GB each) • Inserted 674,720 original copies of documents from the CiteSeer repository • k = 2, m = 20 • Each node has a complete on-disk cache of the text files for all documents in its index partition
Performance (2) • Query throughput: the total number of nodes in each configuration is twice the number of front ends; a client at MIT keeps 128 queries active • File download: the client requests 128 files concurrently
Performance (3) • Adding n nodes would decrease per-node storage costs by a factor of roughly n/4 • [Table: storage costs, centralized server vs. OverCite deployment]
Discussion • Future work: detecting plagiarism, an automatic alert system, shallow papers • Comparison with Google Scholar (http://scholar.google.com): • Search query "Impossibility of Consensus": top result on Google Scholar; not even in CiteSeer's top 20; not in OverCite's top 20 • Search query "Chord": top result on Google Scholar; not in CiteSeer's top 20; top result on OverCite
Colyseus: A Distributed Architecture for Online Multiplayer Games Ashwin Bharambe, Jeffrey Pang, Srinivasan Seshan
Background & Motivation • Contemporary Game Design: • Client-Server architecture • e.g., Quake, World of Warcraft, Final Fantasy, etc. • Problems? • Single server: a computation & communication bottleneck • Optimization? • Area of interest filtering • Delta encoding
Background & Motivation • Quake II server running on a PIII 1 GHz machine with 512 MB RAM • Each player simulated with a server-side AI bot • AOI filtering & delta encoding implemented • Each game run for 10 minutes at 10 frames/sec
Colyseus Architecture • Challenges: arrive at a scalable & efficient state & logic partitioning that enables reasonably consistent, low-latency game play • Objects: • Immutable (map geometry, graphics, etc.): updated infrequently, so globally replicated • Mutable (players' avatars, doors, etc.): updated frequently
Colyseus Architecture (2) [Diagram: Colyseus components] Each node runs the game application on top of an object placer, a replica manager, a local object store holding primary (P) and replica (R) objects, and an object locator connecting the nodes
Colyseus Architecture: Object Location • Subscription: range queries describing areas of interest are sent to and stored in the DHT • Publication: other objects periodically publish their metadata (e.g., their x, y, z coordinates) in the DHT • Challenge: overcoming the delay between the submission of a subscription and the reception of a matching publication
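A toy sketch of this publish/subscribe interaction; the RangeDHT class below is an assumption standing in for the range-queriable DHT, and it also stores publications so that a later subscription can still match them (the soft-state slide below):

    class RangeDHT:
        def __init__(self):
            self.subs = []   # (x_range, y_range, callback)
            self.pubs = []   # (x, y, metadata)

        def subscribe(self, x_range, y_range, callback):
            self.subs.append((x_range, y_range, callback))
            # Match against stored publications to hide the pub/sub delay.
            for x, y, meta in self.pubs:
                if x_range[0] <= x <= x_range[1] and y_range[0] <= y <= y_range[1]:
                    callback(meta)

        def publish(self, x, y, metadata):
            self.pubs.append((x, y, metadata))
            for xr, yr, callback in self.subs:
                if xr[0] <= x <= xr[1] and yr[0] <= y <= yr[1]:
                    callback(metadata)

    # Example: a player subscribes to the square around its position and
    # discovers any object that publishes coordinates inside it.
    dht = RangeDHT()
    dht.subscribe((0, 100), (0, 100), lambda meta: print("discovered", meta))
    dht.publish(42, 17, {"object_id": "monster-7"})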
Range-Queriable DHT: Overview [Diagram] In a traditional DHT, keys 6, 7, and 8 hash to unrelated nodes scattered across the ring; in a range-queriable DHT they are stored on contiguous nodes, so a range query over [6, 8] can be answered by a few neighboring nodes
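A toy illustration of the contrast in the diagram, with made-up node positions: hashing scatters consecutive keys, while value-based placement keeps them on contiguous nodes so a range query touches few nodes:

    import hashlib

    nodes = [0, 64, 128, 192]            # hypothetical positions on an 8-bit ring

    def successor(point):
        # First node at or after the point, wrapping around the ring.
        return min((n for n in nodes if n >= point), default=nodes[0])

    def traditional_placement(key):
        return successor(hashlib.sha1(str(key).encode()).digest()[0])

    def range_queriable_placement(key):
        return successor(int(key * 16))  # keys assumed to lie in [0, 16)

    print([traditional_placement(k) for k in (6, 7, 8)])      # likely scattered
    print([range_queriable_placement(k) for k in (6, 7, 8)])  # contiguous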
Optimization • Pre-fetching: primary objects predict the set of objects they may need in the near future (their area of interest), and Colyseus pre-fetches them • Pro-active replication: short-lived objects are allowed to attach themselves to other objects • Soft-state storage: the object locator stores both publications and subscriptions, TTLs are attached to them, and objects are published at different rates
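A minimal sketch of the soft-state idea: entries carry a TTL and expire unless refreshed, so stale publications and subscriptions disappear without explicit removal (illustrative only, not Colyseus code):

    import time

    class SoftStateStore:
        def __init__(self):
            self.entries = {}                    # key -> (value, expiry time)

        def put(self, key, value, ttl):
            self.entries[key] = (value, time.time() + ttl)

        def get(self, key):
            value, expiry = self.entries.get(key, (None, 0.0))
            if time.time() >= expiry:
                self.entries.pop(key, None)      # expired: drop it lazily
                return None
            return value

    # A fast-moving object republishes its position often with a short TTL;
    # a slow or static object can publish less frequently with a longer TTL.
    store = SoftStateStore()
    store.put("player-42/position", (10, 20, 0), ttl=0.5)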
Experimental Results: Communication Cost • p2p rect configuration • The workload keeps the mean player density constant by increasing the map size • At very small scale, object-location overhead is very high • Per-node bandwidth rises very slowly in Colyseus
Discussion • Colyseus enables low latency game play through optimizations • Range-queriable DHT achieves better scalability & load balance than traditional DHT (Consistency penalty?) • Security?