
PeerDB: A P2P-based System for Distributed Data Sharing

Wee Siong Ng, Beng Chin Ooi, Kian-Lee Tan, Aoying Zhou. Shawn Jeffery, CS294-4 Peer-to-Peer Systems, 11/05/03.


Presentation Transcript


  1. PeerDB: A P2P-based System for Distributed Data Sharing
     Wee Siong Ng, Beng Chin Ooi, Kian-Lee Tan, Aoying Zhou
     Shawn Jeffery, CS294-4 Peer-to-Peer Systems, 11/05/03

  2. Overview
     • A P2P “database” system
     • Allows content-based search
     • No global schema
     • Utilizes mobile agents
     • Provides flexibility and extensibility
     • Dynamically adjusts topology
     Shawn Jeffery PeerDB

  3. Background: P2P vs Distributed Databases

  4. BestPeer
     • Generic P2P platform
     • Mobile Agents
       • Carry code and data
       • Collect stats
       • Security issues?
     • Dynamic Reconfiguration
       • How does this compare to Gia?
     • Location Independent Global Names Lookup (LIGLO) Servers
       • Small number
       • Provides a global identity for peers and peer status
       • Why not use a DHT/KBR/DOLR?

  5. BestPeer Security
     • Private and sharable data
       • Agents only able to access sharable data
       • Does this adequately restrict the power of mobile agents?
     • Communications on the wire also encrypted
     • What’s missing?

  6. Architecture
     [Figure: architecture diagram; surviving labels: Sharable Data, Local Data, Database]

  7. Schema “Mediation”
     • Problems with supporting SQL queries:
       • No global schema information
       • Different nodes could name the same table/attribute differently (“len”, “length”)
     • Solution: User supplies metadata for each relation name and attribute
       • Users expected to do a lot
       • Formula based on matching relation keywords and attribute keywords to determine if a query matches a table
     • What about other schema mediation work (such as Piazza)?
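The slide mentions a keyword-matching formula but does not state it. As a minimal sketch only, assuming a simple overlap score, matching a query against user-supplied metadata might look like the following. The `RelationMeta` layout, the `match_score` function, and the scoring rule are all hypothetical and are not PeerDB's actual formula.

```python
from dataclasses import dataclass, field


@dataclass
class RelationMeta:
    """User-supplied keywords for one relation (hypothetical layout)."""
    name: str
    keywords: set                                   # keywords describing the relation
    attributes: dict = field(default_factory=dict)  # attribute name -> keyword set


def match_score(relation_kw, attribute_kws, candidate):
    """Fraction of query terms that share a keyword with the candidate.

    The overlap rule is an assumption for illustration, not PeerDB's formula.
    """
    # The relation term counts as a hit if any keyword is shared.
    hits = 1 if relation_kw & candidate.keywords else 0
    for kws in attribute_kws:
        # A query attribute matches if any candidate attribute shares a keyword.
        if any(kws & cand for cand in candidate.attributes.values()):
            hits += 1
    return hits / (1 + len(attribute_kws))


# A remote table that names its columns differently from the query.
protein = RelationMeta(
    name="prot",
    keywords={"protein", "kinase"},
    attributes={"len": {"len", "length"}, "seq": {"sequence"}},
)
score = match_score({"protein"}, [{"length"}, {"name"}], protein)
```

In this example the query attribute "length" matches the remote column "len" through a shared keyword, while "name" finds no counterpart, so two of the three query terms match.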

  8. Local Query Processing – Phase I
     • “Master Agent” coordinates the entire affair
     • Check Local Dictionary for matching relations
       • Use the relation matching strategy even for the local DB
     • Create “Relation Matching Agents” and flood to all neighbors
     • Wait for responses
     • Display results to user as they arrive

  9. Local Query Processing – Phase II
     • User selects the relations he/she wants
     • Create a “Data Retrieval Agent”
       • Rewrite query in terms of new relations
       • If local, submit SQL to local DB
       • Contact remote nodes directly to access the data
     • Creates remote join plans locally - optimization?
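To make "rewrite query in terms of new relations" concrete, here is a deliberately naive sketch. It assumes the user's selection yields a flat mapping from local names to remote names and substitutes whole-word identifiers; the `rewrite_query` helper and the example names are hypothetical, and a real system would rewrite a parsed query rather than raw text.

```python
import re


def rewrite_query(sql, name_map):
    """Replace whole-word identifiers in sql according to name_map.

    Naive text substitution for illustration only: it does not parse SQL,
    so it would also rewrite matching words inside string literals.
    """
    if not name_map:
        return sql
    # Try longer names first so overlapping names resolve predictably.
    names = sorted(name_map, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: name_map[m.group(1)], sql)


local_sql = "SELECT name, length FROM protein WHERE length > 100"
remote_sql = rewrite_query(local_sql, {"protein": "prot_tbl", "length": "len"})
```

The rewritten text can then be submitted to the local DB or carried to the remote node, as the slide describes.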

  10. Remote Query Processing
      • Phase I: Find relations
        • Relation Matching Agents flood with TTL
        • Check Export Dictionary for a match
        • Return matches directly
      • Phase II: Get data
        • Data Retrieval Agent submits SQL to DBMS
        • Return data to the requesting node directly
        • Run further data processing before returning
        • Again, security issues
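Phase I can be sketched as a TTL-bounded flood over the peer graph. This is a single-process simulation under stated assumptions, not PeerDB's agent code: the adjacency map, the `flood_relation_match` name, the duplicate suppression through a `seen` set, and the single-keyword lookup are all illustrative choices.

```python
from collections import deque


def flood_relation_match(neighbors, export_dicts, origin, keyword, ttl):
    """Return {peer: [relation names]} for peers within ttl hops of origin.

    neighbors:    peer -> list of directly connected peers
    export_dicts: peer -> {relation name: keyword set} (the Export Dictionary)
    """
    matches = {}
    seen = {origin}
    queue = deque()
    for peer in neighbors.get(origin, []):
        if peer not in seen:
            seen.add(peer)
            queue.append((peer, ttl))
    while queue:
        peer, remaining = queue.popleft()
        if remaining <= 0:
            continue  # TTL exhausted: the agent is dropped here
        # Check this peer's Export Dictionary for a match.
        hits = [rel for rel, kws in export_dicts.get(peer, {}).items()
                if keyword in kws]
        if hits:
            matches[peer] = hits  # stands in for a direct reply to the origin
        for nxt in neighbors.get(peer, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, remaining - 1))
    return matches


# A chain of peers: A - B - C - D
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
export_dicts = {
    "B": {"genes": {"gene"}},
    "C": {"proteins": {"protein"}},
    "D": {"prot2": {"protein"}},
}
found = flood_relation_match(neighbors, export_dicts, "A", "protein", ttl=2)
```

With `ttl=2` the flood from A reaches B and C but not D, so only C's relation is reported.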

  11. Statistics
      • Master Agents monitor stats in the network
        • Keywords for some relations returned during Phase I
          • Update metadata
        • Number of objects returned for selected relations
          • Can be used for topology change decisions
      • Use most recently returned results as metric to determine who to connect with
        • Frequent updates – might need to change neighbors after each result returned
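One way to read the reconfiguration rule is "keep the peers that returned the most objects for the latest query as direct neighbors". The sketch below assumes exactly that; the `choose_neighbors` function, the tie-breaking by peer id, and the fixed neighbor budget `k` are illustrative assumptions rather than details from the slides.

```python
def choose_neighbors(result_counts, k):
    """Pick the k peers with the most objects returned for the latest query.

    result_counts: peer id -> number of objects returned most recently.
    Ties are broken by peer id so the choice is deterministic (an assumption).
    """
    ranked = sorted(result_counts, key=lambda peer: (-result_counts[peer], peer))
    return ranked[:k]


latest = {"p1": 5, "p2": 12, "p3": 0, "p4": 12}
new_neighbors = choose_neighbors(latest, k=2)
```

Because the metric uses only the most recent results, the chosen set can change after every query, which is the churn the slide warns about.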

  12. Caches
      • Cache all query results locally
        • Soft state
        • LRU replacement
      • Users choose which copy they want
        • Only provided with peer id and an indication of which is the source
        • What about timestamp, etc.?
        • Again, user heavily involved
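A minimal sketch of a result cache with LRU replacement follows. The `ResultCache` class and its capacity counted in entries are assumptions for illustration; the slides do not say how PeerDB sizes or keys its cache.

```python
from collections import OrderedDict


class ResultCache:
    """Query-result cache with least-recently-used replacement (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # query text -> cached rows

    def put(self, query, rows):
        self._entries[query] = rows
        self._entries.move_to_end(query)       # mark as most recently used
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used

    def get(self, query):
        if query not in self._entries:
            return None                        # soft state: a miss just means re-query
        self._entries.move_to_end(query)
        return self._entries[query]


cache = ResultCache(capacity=2)
cache.put("q1", ["row-a"])
cache.put("q2", ["row-b"])
cache.get("q1")             # touch q1, so q2 becomes the eviction candidate
cache.put("q3", ["row-c"])  # evicts q2
```

Treating entries as soft state means an evicted or missing entry is never an error: the query is simply sent to the network again.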

  13. Relation Matching Performance
      • Significant tradeoff between precision and recall
        • Which is more important?
      • Is their approach acceptable?
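For reference, precision is the fraction of returned relations that are relevant, and recall is the fraction of relevant relations that were returned. A small sketch using the standard definitions (the `precision_recall` helper and the example relation names are illustrative, not data from the paper):

```python
def precision_recall(returned, relevant):
    """Compute (precision, recall) for a set of returned relations."""
    returned, relevant = set(returned), set(relevant)
    correct = len(returned & relevant)
    precision = correct / len(returned) if returned else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall


# Four relations returned, two of them relevant; one relevant relation missed.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
```

A looser matching threshold returns more relations, which tends to raise recall and lower precision. That is the tradeoff the slide refers to.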

  14. Experimental Methodology
      • Compare P2P Model vs Client/Server model
        • CS returns via the search path (?)
      • Compare static vs reconfigurable networks
      • Compare agent vs message based approach
      • 32 Nodes
        • Is this enough?

  15. Evaluation Scenarios (Metrics?)
      • Fixed set of nodes
        • Easily test P2P protocols, Reconfiguration strategies
      • Latency
      • Quality and Quantity
      • What else is important?

  16. Performance
      • As the amount of storage on each node increases, latency decreases
        • Due to caching
      • In general, reconfiguration performs better
      • Response times on the order of one minute
        • Is this acceptable?
      • Agent-based approach shown to be better
        • What if an agent produces more data than it processes?

  17. Discussion: A P2P DBMS?
      • PeerDB represents a tiny step towards a P2P DB (also PIER, Piazza)
      • What does it do right?
      • What else is needed?
      • Is it ideal to have a P2P DB?
      • Is it feasible?
