PeerDB: A P2P-based System for Distributed Data Sharing

PeerDB: A P2P-based System for Distributed Data Sharing Wee Siong Ng, Beng Chin Ooi, Kian-Lee Tan, Aoying Zhou Shawn Jeffery CS294-4 Peer-to-Peer Systems 11/05/03

Overview • A P2P “database” system • Allows content-based search • No global schema • Utilizes mobile agents • Provides flexibility and extensibility • Dynamically adjusts topology Shawn Jeffery PeerDB

Background: P2P vs Distributed Databases Shawn Jeffery PeerDB

BestPeer • Generic P2P platform • Mobile Agents • Carry code and data • Collect stats • Security issues? • Dynamic Reconfiguration • How does this compare to Gia? • Location Independent Global Names Lookup (LIGLO) Servers • Small number • Provides a global identity for peers and peer status • Why not use a DHT/KBR/DOLR? Shawn Jeffery PeerDB

BestPeer Security • Private and sharable data • Agents only able to access sharable data • Does this adequately restrict the power of mobile agents? • Communications on the wire also encrypted • What’s missing? Shawn Jeffery PeerDB

Architecture Sharable Data Local Data Shawn Jeffery PeerDB Database

Schema “Mediation” • Problems with supporting SQL queries: • No global schema information • Different nodes could name the same table/attribute differently (“len”, “length”) • Solution: User supplies metadata for each relation name and attribute • Users expected to do a lot • Formula based on matching relation keywords and attribute keywords to determine if a query matches a table • What about other schema mediation work (such as Piazza)? Shawn Jeffery PeerDB

Local Query Processing – Phase I • “Master Agent” coordinates the entire affair • Check Local Dictionary for matching relations • Use the relation matching strategy even for the local DB • Create “Relation Matching Agents” and flood to all neighbors • Wait for responses • Display results to user as they arrive Shawn Jeffery PeerDB

Local Query Processing – Phase II • User selects the relations he/she wants • Create a “Data Retrieval Agent” • Rewrite query in terms of new relations • If local, submit SQL to local db • Contact remote nodes directly to access the data • Creates remote join plans locally - optimization? Shawn Jeffery PeerDB

Remote Query Processing • Phase I: Find relations • Relation Matching Agents flood with TTL • Check Export Dictionary for a match • Return matches directly • Phase II: Get data • Data Retrieval Agent submits SQL to DBMS • Return data to the requesting node directly • Run further data processing before returning • Again, security issues Shawn Jeffery PeerDB

Statistics • Master Agents monitor stats in the network • Keywords for some relations returned during Phase I • Update metadata • Number of objects returned for selected relations • Can be used for topology change decisions • Use most recently returned results as metric to determine who to connect with • Frequent updates – might need to change neighbors after each result returned Shawn Jeffery PeerDB

Caches • Cache all query results locally • Soft state • LRU replacement • Users choose which copy they want • Only provided with peer id and an indication of which is the source • What about timestamp, etc? • Again, user heavily involved Shawn Jeffery PeerDB

Relation Matching Performance • Significant tradeoff between precision and recall • Which is more important? • Is their approach acceptable? Shawn Jeffery PeerDB

Experimental Methodology • Compare P2P Model vs Client/Server model • CS returns via the search path (?) • Compare static vs reconfigurable networks • Compare agent vs message based approach • 32 Nodes • Is this enough? Shawn Jeffery PeerDB

Evaluation Scenarios (Metrics?) • Fixed set of nodes • Easily test P2P protocols, Reconfiguration strategies • Latency • Quality and Quantity • What else is important? Shawn Jeffery PeerDB

Performance • As you increase the amount of storage on each node, latency decrease • Due to caching • In general, reconfiguration performs better • Response times O(1 Minute) • Is this acceptable? • Agent based shown to be better • What if agent produces more data than it processes? Shawn Jeffery PeerDB

Discussion: A P2P DBMS? • PeerDB represents a tiny step towards a P2P DB (also PIER, Piazza) • What does it do right? • What else is needed? • Is it ideal to have a P2P DB? • Is it feasible? Shawn Jeffery PeerDB

PeerDB: A P2P-based System for Distributed Data Sharing