Peer-to-Peer Databases David Andersen Advanced Databases
What is Peer-to-Peer? • Shared Resources: Each peer shares its resources with others, acting as both a client and a server. • Decentralization and Self-organization: Peers coordinate their activities with other peers rather than through a centralized server. • Autonomy: Peers are free to come and go at will.
Napster • Hybrid P2P • Data is stored on the peers, but a central server maintains an index of file locations. • File sharing, not a DBMS.
Gnutella • True P2P: a peer need only know one other peer to join. • [Figure: The Gnutella Protocol]
Gnutella • Uses Flooding: queries hop from peer to peer. A TTL (time-to-live) sent with the query prevents endless searching. • Very high bandwidth usage. • File sharing, not a DBMS.
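A minimal sketch of TTL-limited flooding may help make the bandwidth cost concrete. It is not the Gnutella wire protocol: the overlay graph, the peer names, and the `flood_query` helper are all assumptions made for illustration, and the `seen` set stands in for the duplicate suppression a real peer would perform.

```python
from collections import deque

def flood_query(graph, start, has_file, ttl):
    """Breadth-first flood: each peer forwards the query to its neighbours
    until the TTL reaches zero. Returns (peers holding a match, messages sent)."""
    hits = set()
    seen = {start}              # peers that have already accepted this query
    messages = 0
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if has_file(peer):
            hits.add(peer)
        if remaining == 0:
            continue            # TTL exhausted: do not forward any further
        for neighbour in graph[peer]:
            messages += 1       # every forward costs bandwidth, even duplicates
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, remaining - 1))
    return hits, messages

# Hypothetical five-peer overlay; only peer "E" holds the requested file.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
holders = {"E"}
near_hits, near_msgs = flood_query(overlay, "A", lambda p: p in holders, ttl=2)
far_hits, far_msgs = flood_query(overlay, "A", lambda p: p in holders, ttl=3)
```

With a TTL of 2 the query never reaches "E", so nothing is found; raising the TTL to 3 finds the file but sends more messages. That trade-off between reach and traffic is the cost the slide refers to.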
P2P and Databases • Advantages • No Bottlenecks • Vast Resources Available • Improved Scalability • Improved Robustness • Less Management • Access to a tremendous amount of data
P2P and Databases • Challenges • Coordinating Semantics • Query Processing Efficiency • Topology/Bandwidth Considerations • Indexing • Replication • Performing Updates and Avoiding Stale Data • Security - Access Control and Peer Reputation
Case Study – Hyperion Project • Each peer has its own local DBMS. • A PeerDBMS layer augments the local DBMS to support peer-to-peer functionality. • Peers can form acquaintances. • Metadata is exchanged, and the semantics of the acquainted peer are mapped onto the local system. • Uses pair-wise mappings to resolve queries.
The Hyperion PDBMS • Query Service • Handles Local Queries • Uses Mapping Tables to Rewrite or Translate Queries destined for Remote Databases • Peer Coordination Service • Manages and Executes Updates • Uses Event-Condition-Action Rules
The Hyperion PDBMS • P2P User Interface • Local and peer queries are posed through the interface • The user is unaware of differing semantics at the peer • Peer Manager: messaging system used to communicate with peers • Acquaintance Manager: manages the exchange of schemas, mapping tables, and rules for updating data
Hyperion Mapping Tables • [Figure: a table from Airline ‘A’, a table from Airline ‘B’, and the mapping tables that relate them]
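The idea of rewriting a query through a mapping table can be sketched as follows. This is an illustration only, not Hyperion's implementation: the flight codes, the remote table and column names, and the `rewrite_query` helper are hypothetical.

```python
# Hypothetical mapping table: flight codes at airline A -> equivalent codes at airline B.
MAPPING = [
    ("A100", "B7001"),
    ("A100", "B7002"),
    ("A200", "B7100"),
]

def translate_value(local_value, mapping):
    """Return the values the acquainted peer uses for one local value."""
    return sorted(b for a, b in mapping if a == local_value)

def rewrite_query(local_value, mapping, remote_table="flights", remote_column="fno"):
    """Rewrite a local selection into a parameterised query for the remote peer.
    Returns None when the mapping table has no entry for the local value."""
    remote_values = translate_value(local_value, mapping)
    if not remote_values:
        return None             # nothing to forward to this acquaintance
    placeholders = ", ".join("?" for _ in remote_values)
    sql = f"SELECT * FROM {remote_table} WHERE {remote_column} IN ({placeholders})"
    return sql, remote_values

rewritten = rewrite_query("A100", MAPPING)
unmapped = rewrite_query("A999", MAPPING)
```

A local value may map to several remote values, which is why the rewritten query uses IN rather than equality.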
Case Study – The Piazza Project • Project Goals • Focus on developing query reformulation algorithms • Assist in defining mappings • Indexing • Enforcing access control
Piazza Schema Mappings • Two types of mappings • Peer Description: relates two or more peer schemas. Example: DBProjects:Member(pName, member) = UW:Member(mid, pid, member), UW:Project(pid, pName) • Storage Description: relates the data stored at a peer to that peer’s view of the world. Example: UPenn:student(sid, name, advisor) UPenn:Student(sid, name), UPenn:Advisor(sid, fid), UPenn:Faculty(fid, advisor)
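The peer description example reads as a join: DBProjects:Member is whatever results from joining UW:Member and UW:Project on pid and keeping pName and member. A small sketch, using made-up tuples and a hypothetical `dbprojects_member` function, evaluates that right-hand side directly. It does not attempt Piazza's query reformulation.

```python
def dbprojects_member(uw_member, uw_project):
    """Evaluate UW:Member(mid, pid, member), UW:Project(pid, pName):
    join on pid, then project onto (pName, member)."""
    return {
        (p_name, member)
        for _mid, member_pid, member in uw_member
        for project_pid, p_name in uw_project
        if member_pid == project_pid
    }

# Hypothetical UW relations, invented for this example.
uw_member = [
    (1, 10, "alice"),
    (2, 10, "bob"),
    (3, 20, "carol"),
    (4, 30, "dave"),    # pid 30 has no matching project, so it drops out of the join
]
uw_project = [
    (10, "P2P"),
    (20, "Indexing"),
]

members = dbprojects_member(uw_member, uw_project)
```

Here `members` holds the (pName, member) pairs that a query against DBProjects:Member would see from the UW data.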
Piazza Indexing • Challenge: how to send a query to the peers most likely to have the answer and avoid flooding the entire network. • Piazza attempts to index schema and value mappings. • The current implementation is centralized. • Peers upload summaries, at differing granularities, of the data they possess. • Peers periodically refresh their data summaries at the index.
Piazza Indexing • Peers upload attribute-value pairs. • The index maintains a table of these pairs together with the object id of their origin. • Users query the index, which returns the objects that contain at least a partial match. • An example of an object that is indexed: s2 = [name = "Por%", age IN [50, 70], disease = "tuberculosis", type = "%"]
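A rough sketch of such an index is shown below. It assumes exact attribute-value pairs only; the wildcard patterns and ranges in the s2 example are not handled, and the `PairIndex` class, object ids, and sample data are hypothetical rather than taken from Piazza.

```python
from collections import defaultdict

class PairIndex:
    """Central table of (attribute, value) pairs -> ids of the objects they came from."""

    def __init__(self):
        self._objects_by_pair = defaultdict(set)

    def upload(self, object_id, pairs):
        """Register the attribute-value pairs a peer reports for one object."""
        for attribute, value in pairs.items():
            self._objects_by_pair[(attribute, value)].add(object_id)

    def lookup(self, query):
        """Return ids of objects matching at least one query pair,
        ordered by number of matching pairs, then by id."""
        match_counts = defaultdict(int)
        for pair in query.items():
            for object_id in self._objects_by_pair.get(pair, ()):
                match_counts[object_id] += 1
        return sorted(match_counts, key=lambda oid: (-match_counts[oid], oid))

index = PairIndex()
index.upload("o1", {"disease": "tuberculosis", "type": "clinic"})
index.upload("o2", {"disease": "tuberculosis", "type": "hospital"})
index.upload("o3", {"disease": "influenza", "type": "hospital"})

ranked = index.lookup({"disease": "tuberculosis", "type": "hospital"})
nothing = index.lookup({"disease": "measles"})
```

Ranking by match count is a design choice of this sketch: full matches come first, and partial matches still appear so the user can decide which peers to contact.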
Update Management • Data is often replicated in traditional distributed databases. • The problem is to avoid reading stale data. • Technique: use Read Consensus and Write Consensus. • Example: write to a majority of replicas when performing an update and/or read from a majority and accept the newest version.
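The majority technique can be sketched with versioned replicas. This is a simplified illustration under stated assumptions: the replica count is known, there are no failures or concurrent writers, and the `Replica`, `quorum_write`, and `quorum_read` names are hypothetical. With N replicas, a read of R replicas is guaranteed to overlap a write to W replicas when R + W > N.

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

def quorum_write(replicas, write_set, value):
    """Install a new version on the replicas listed in write_set.
    When write sets are majorities, any two of them overlap, so the
    version number keeps increasing."""
    new_version = max(replicas[i].version for i in write_set) + 1
    for i in write_set:
        replicas[i].version = new_version
        replicas[i].value = value
    return new_version

def quorum_read(replicas, read_set):
    """Read the replicas listed in read_set and accept the newest version."""
    newest = max((replicas[i] for i in read_set), key=lambda r: r.version)
    return newest.value, newest.version

replicas = [Replica() for _ in range(5)]        # N = 5
quorum_write(replicas, [0, 1, 2], "first")      # W = 3
quorum_write(replicas, [2, 3, 4], "second")     # W = 3, overlaps at replica 2

fresh = quorum_read(replicas, [0, 1, 2])        # R = 3, so R + W > N
stale = quorum_read(replicas, [0, 1])           # R = 2, below the quorum
```

The majority read sees the second write through replica 2, while the two-replica read can return the older value.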
Update Management • Quorum consensus can work with P2P too, but without a 100% guarantee: the actual number of replicas is not known, so setting a quorum is very difficult. • Allow users to set quorum thresholds and accept the consequences of their decisions.
Update Management • Trade-offs
References • Flexible Update Management in Peer-to-Peer Database Systems, David Del Vecchio and Sang H. Son, Department of Computer Science, University of Virginia • An Overview on Peer-to-Peer Information Systems, Karl Aberer and Manfred Hauswirth, Swiss Federal Institute of Technology (EPFL), Switzerland • Data Sharing in the Hyperion Peer Database System, Patricia Rodríguez-Gianolli et al., Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005 • The Hyperion Project: From Data Integration to Data Coordination, Marcelo Arenas et al., SIGMOD Record, Vol. 32, No. 3, September 2003 • The Piazza Peer Data Management Project, Igor Tatarinov et al., SIGMOD Record, Vol. 32, No. 3, September 2003 • Distributed Query Processing in P2P Systems with Incomplete Schema Information, Marcel Karnstedt, Katja Hose, and Kai-Uwe Sattler, Department of Computer Science and Automation, TU Ilmenau, P.O. Box 100565, D-98684 Ilmenau, Germany