Implementing Database Coordination in P2P Networks *

Implementing Database Coordination in P2P Networks * Ilya Zaihrayeu SemPGRID-04, 18 May 2004, New York, USA * work with Fausto Giunchiglia

Why P2P Databases • P2P data sharing: files … relational data? • File sharing: KaZaa + Morpheus = more than 460 million downloads (download.com, May 2004) • P2P databases: academia testbeds so far.. • Promises: large-scale fault-tolerant multi-database system with low start-up and maintenance costs, and high “output” for an individual party • Difficulties: data integration solutions are not applicable due to centralized nature • Challenges: new methodologies, theories and algorithms, models, mechanisms and tools need to be developed

Why P2P Databases, cont’d • Application: non performance critical domains, where local autonomy of each party is essential • Medical care scenario • John is going for skiing and suffers an accident • John is taken to local clinic for treatment – doctors need to know whether John has contraindication against some drugs • John does not know these details, but his database layer has a link to family doctor’s databases • Cooperating real estate agents example • Agents coordinate their data to push sales • When on the site of a customer who wants to sell, agent updates his database and makes data available for other agents • When on the site of a customer who may want to buy, agent shows details from his database, and may query other agent’s databases • Other examples: scientific databases (genomic data), tourism, etc

Data Coordination Model • Interest Groups – group of peers able to answer queries about a certain topic • e.g., group topic – “Tourism in Trentino”, “Real Estate in Scotland”, etc • each Interest Group has group manager (GM) which helps in maintenance of the group • Acquaintances – “known” nodes that contribute data • acquaintance query – a query over the relations of an acquaintance which results satisfy some local relation • Correspondence Rules – solve heterogeneity problem at instance level • semantic heterogeneity at structure level is solved by acquaintance queries • Coordination Rules – coordinate data (queries and updates) with acquaintances

Interest Groups • Help to cope with large number of nodes by clustering the network • Nodes self-organize into interest groups • A node may form a child interest group • One node may belong to multiple groups • Use schema matching to monitor group constitution • GM is to support group constitution, “talk” to other GMs and provide information about the group to newcomers All topics … Arts Shopping … … Publications Computers Movies Music Books Lyrics

C :- E,F 3 A loop 1 2 D :- I,G F :- G B :- C,D 4 F A B C G D I E I :- A,B Acquaintance query • Acquaintance query is a conjunctive query: • q(X) :- r1(X1), …, rn(Xn) • q(X) – head, refers to local relation; • r1(X1), …, rn(Xn) – subgols of the body, refers to the relation of an acquaintance; and comparison predicates • X, X1,…, Xn – variables or constants; • E.g., P1: films (title, year, genre) :- P2: movie (title, year, director); genres (title, genre); year>1995

Correspondence Rules and Coordination Rules • Correspondence rules define how constants from the local domain are translated into constants in the domain of an acquaintance (forward translation) and vice versa (backward translation) • not necessarily symmetric, e.g. currency translation • Coordination Rules’ goal is data coordination with acquaintances and acquainted nodes • activated by user (user query) or from the network (network query, results, update)

Algorithmic notes • Query answering algorithm • Use acquaintance queries and correspondence rules to translate queries and data • Propagate to acquaintances if acquaintance queries are relevant • Compute only new tuples, reconcile results • Process loops in query propagation, define termination point (no propagation using acquaintance queries that have been already used) • “Getting acquainted” protocol • Retrieve database schemas and then apply a matching operator on them • Based on the matching results, generate (with help of user) acquaintance queries, correspondence rules, tune up coordination rules • Updates handling (work with E. Franconi, G. Kuper, A. Lopatenko) • Data may go through a loop more than once, define termination point

Implementing P2P databases on top of JXTA • Benefits • system platform, networking protocol independence • IP-independence (location independence) • gives basic blocks for building P2P applications • We implement Interest Groups and Acquaintances in JXTA • We encode database related functionalities into a set of custom JXTA services (DB-related services) DB-related services Node-level services Group-level services … … Queries handler DB operations Screening service GM service

Architecture User A node Nodes on the network PDBMS A P2P database network User Interface (UI) User-1 Database Manager (DBM) User-2 JXTA Layer Wrapper User-n SS Source Database (SDB)

User Interface (UI) DBM Updates Handler P2P Management Coordination Rules Acquaintance queries Query Planner Acquaintances Query Propagation Correspondence Rules Results Handler JXTA Layer Peer Groups In Advertisements Out Services Pipes Gr. topic GM in-pipe adv SS DB-related services Disco-very Pipe Adv Peer Gr. Adv Peer Adv JXTA Core Services Wrapper Architecture, cont’d

Demo: toy databases and topology Relations: • Movie (title, year, genre) • Credits (name, title, role) • Movie2 (title, year, director) • Genre (title, genre) Rendezvous peer [1,2] 1 (2:-2) [2] 3 (2:-2) (1:-1) (1:-3,4) Q (2:-2) (4:-1) 0 2 (3:-3) [3] [1,2] 4 [2,3,4] (4:-4) Mediator peer 5 [4]

Query example 1 “List titles of movies featuring Tom Hanks” Q(t) :- Credits (n,t,r); n=“Tom Hanks” [1,2] 1 (2:-2) [2] 3 (2:-2) (1:-1) (1:-3,4) Q (2:-2) (4:-1) 0 2 (3:-3) [3] [1,2] (2:-2) 4 [2,3,4] (4:-4) 5 [4]

Query example 2 “Titles of drama movies issued after 1995” Q(t) :- Movie (t,y,g); g=“Drama”; y>1995; [1,2] 1 (2:-2) [2] 3 (2:-2) (1:-1) (1:-3,4) Q (2:-2) (4:-1) 0 2 (3:-3) [3] [1,2] 4 [2,3,4] (4:-4) 5 [4]

Query example 3 “Names of actors playing in action movies in 2003” Q(n) :- Movie (t,y,g); Credits (n,t,r); r=“Actor”; g=“Action”; y=2003; [1,2] 1 (2:-2) [2] 3 (2:-2) (1:-1) (1:-3,4) Q (2:-2) (4:-1) 0 2 (3:-3) [3] [1,2] 4 [2,3,4] (4:-4) 5 [4]

References • F. Giunchiglia and I. Zaihrayeu. Making peer databases interact - a vision for an architecture supporting data coordination. 6th International Workshop on Cooperative Information Agents (CIA-2002), Madrid, Spain, September 18 -20, 2002. • P. Bernstein, F. Giunchiglia, A. Kementsietsidis, J. Mylopoulos, L. Serafini, and I. Zaihrayeu, “Data management for peer-to-peer computing: A vision,” WebDB, 2002. • A. Halevy, Z. Ives, D. Suciu, and I. Tatarinov, “Schema mediation in a peer data management system,” ICDE, 2003. • V. Kantere, I. Kiringa, J. Mylopoulos, A. Kementsietsidis, and M. Arenas, “Coordinating peer databases using ECA rules,” DBISP2P, September 2003. • Enrico Franconi, Gabriel Kuper, Andrei Lopatenko, Ilya Zaihrayeu (2004). The coDB Robust Peer-to-Peer Database System. Proc. of the 2nd Workshop on Semantics in Peer-to-Peer and Grid Computing (SemPGrid'04), 2004 • JXTA project, see http://www.jxta.org

Announcement Submission deadline: 30 June, 2004 www.p2pkm.org

Thank you

Implementing Database Coordination in P2P Networks *