IST-2001-34825 Technique for query answering in the context of one Brokering Agent Domenico Beneventano
Summary • The Mechanical scenario • Brokering Agent (BA) Ontology, SINode Ontologies, Data Source Schemata • Query Management • How to write a query? • How to answer a query? • Final release of the prototype for Query Management in the context of one Brokering Agent
The Mechanical Scenario [Figure: three levels, top to bottom: Brokering Agent GVV, Mapping m1, SINode GVVs, Mapping m2, Source Schemata]
BA and SINode Ontologies: example of mappings
[Figure: BA GVV with the Mapping Table of Company (mapping m2); SINodes SN1 and SN2 with the Mapping Table of SN2.Company (mapping m1); Source Schemata]
Source Schemata:
• S1.aziende(ID, INDIRIZZO, ...) (Source S1: TUTTOSTAMPI)
• S2.Company(COMPANY_ID, REGION, …) (Source S2: DEFORMAZIONE)
• S3.Company(COMPANY_ID, ADDRESS, …) (Source S3: SUBFORN)
The Query Management
[Figure: components shown: End User Query Tool, Query Agent, Brokering Agent, Play Maker (EXPANDER, UNFOLDER), BAOntology, Librarian, SINodeAgent1, SINodeAgent2, SEWASIE_DB]
Example query: "Give me the subcontracting companies in Veneto with a big capital stock in the plastic and rubber sector"
End-User Query Tool • The query interface is meant to support a user in formulating a precise query, one that best captures her/his information needs, even when the user is completely unfamiliar with the vocabulary of the underlying information system holding the data • The final purpose of the tool is to generate a conjunctive query ready to be executed by the evaluation engine associated with the information system
The role of the Ontology for the End-User • The intelligence of the interface is driven by an ontology describing the domain of the data in the information system • The ontology defines a vocabulary which is richer than the logical schema of the underlying data, and it is meant to be closer to the user’s rich vocabulary • The user can exploit the ontology’s vocabulary to formulate the query, and she/he is guided by such a richer vocabulary in order to understand how to express her/his information needs more precisely
Intensional Navigation • It helps an unskilled user during query formulation by overcoming problems related to poor understanding of the schema • Queries can be specified through an iterative refinement process supported by the ontology • Users may specify their requests using generic terms, refine some terms of the query or introduce new terms, and iterate the process • Users explore and discover general information about the domain, because classification gives an explicit meaning to a query and to its subparts
WP6: The End-User Query Tool summary • Technical challenges: a logic-based framework; reasoning support; use of web standards • Innovation: a novel query formation paradigm; the role of the ontology; a linear paradigm for easy query formulation; multi-language support
Query Management: the three main components • The Query Agent – coordination of query processing: accepts the query from the End User Query Tool, interacts with both the BA and the SINode Agents, and returns the result to the End User Query Tool • The Playmaker – reformulation w.r.t. m1: one of the modules of the Brokering Agent (BA); accepts a query and reformulates it according to the semantics of the BA Ontology • The SINode Query Manager – reformulation w.r.t. m2: one of the modules of the SINode Agent; accepts a query, reformulates it according to the semantics of the SINode Ontology, and returns the result to the Query Agent [Figure: Brokering Agent GVV, Mapping m1, SINode GVVs, Mapping m2, Source Schemata]
The playmaker: EXPANDER + UNFOLDER
• EXPANDER (by UNIROMA)
  • Query expansion: the query is expanded by taking into account the constraints in the BA-GVV. All constraints in the ontology are "compiled into" the expansion, so that the expanded query (EXPQuery) can be processed while ignoring the constraints. This is the first technique of this kind in the data integration literature: all other approaches to GAV (Global as View) data integration are based on unfolding alone, which is an incomplete technique in our case.
  • Subquery identification: relevant subqueries (EXPAtoms) are extracted from the expanded query. An EXPAtom is a Single Class Query, i.e., a query on a single Global Class of the BA-GVV.
• UNFOLDER (by UNIMO)
  • Query unfolding: each EXPAtom is unfolded by taking into account the mappings in the BA Ontology, so that it is rewritten w.r.t. the SINode GVVs. The unfolding is based on the full disjunction operator, which is used to perform Object Fusion. The output is an SQL query (FDQuery) that computes the full disjunction; the atoms of FDQuery (FDAtoms) are Single Class Queries over the SINode GVV.
  • Resolution Functions: the Resolution Functions needed to deal with conflicts among the attributes involved in the query are identified.
The playmaker: EXPANDER
[Figure: the Query reaches the EXPANDER, which outputs the Expanded Query (EXPQuery) and the ExpAtoms]
ExpAtoms:
scq1: SELECT CATEGORY_ID FROM Mould_Making
scq2: SELECT NAME, COMPANY_ID, CAPITAL_STOCK, REGION, SUBCONTRACTOR, ADDRESS FROM company WHERE CAPITAL_STOCK > 50 AND REGION LIKE 'VENETO' AND SUBCONTRACTOR LIKE 'yes'
scq3: ...
EXPQuery:
SELECT r2.NAME, r2.ADDRESS, r2.NATION FROM scq1 r1, scq2 r2, scq3 r3 WHERE r1.CATEGORY_ID = r3.CATEGORY_ID AND r2.COMPANY_ID = r3.COMPANY_ID
UNION
SELECT r2.NAME, r2.ADDRESS, r2.NATION FROM scq4 r1, scq2 r2, scq3 r3 WHERE …
UNION …
The playmaker: UNFOLDER
[Figure: the ExpAtoms reach the UNFOLDER, which outputs the Unfolding: FDQuery, FDAtoms, ResFunctions]
Input ExpAtom:
scq2: SELECT NAME, COMPANY_ID, CAPITAL_STOCK, REGION, SUBCONTRACTOR, ADDRESS FROM company WHERE CAPITAL_STOCK > 50 AND REGION LIKE 'VENETO' AND SUBCONTRACTOR LIKE 'yes'
Full Disjunction:
FDQuery: SELECT * FROM FDAtom1 OUTER JOIN FDAtom2 ON (FDAtom1.COMPANY_ID = FDAtom2.COMPANY_ID)
FDAtoms:
FDAtom1: ...
FDAtom2: SELECT COMPANY_ID, NAME, REGION, ADDRESS, SUBCONTRACTOR FROM company WHERE ((REGION) like ('VENETO') and (SUBCONTRACTOR) like ('yes'))
Resolution Function:
precedence(${SI-NMAgent2.company.ADDRESS},${SI-NMAgent1.company.ADDRESS})
Object Fusion: Object Identification
• Object fusion: grouping together information about the same real-world object stored in different sources (SINodes).
• Merging data from different sources requires identifying the different representations of the same real-world object; this process is called object identification.
• In our system the object identification problem is solved by defining Join Conditions among the local classes of the same Global Class. A Join Condition can be a generic expression, defined by using SQL or external functions.
• In this prototype a simple equality condition is implemented. For example, with L1 = SN1.Company and L2 = SN2.Company:
  JC(L1,L2) : L1.COMPANY_ID = L2.COMPANY_ID
[Figure: objects O1 in L1 and O2 in L2 are identified as the same object O through JC(L1,L2)]
Object Fusion: Full Disjunction
• A global class is expressed by means of the full disjunction of its local classes, an operator that has been recognized as providing a natural semantics for data merging queries.
• Definition of full disjunction [Rajaraman, Ullman - PODS 1996]: "Computing the natural outerjoin of many relations in a way that preserves all possible connections among facts"
• Given a global class G = { L1, L2, …, Ln }, its instance is the full disjunction of L1, L2, …, Ln, written FDG(L1, L2, …, Ln), computed on the basis of the Join Conditions.
• Example with L1 = SN1.Company and L2 = SN2.Company:
  FDG(L1,L2) : select S(L1) ∪ S(L2) from L1 outer join L2 on JC(L1,L2)
Full Disjunction Computation
[Figure: triangle of local classes L1, L2, L3 connected by JC(L1,L2), JC(L1,L3), JC(L2,L3)]
• Goal: to compute the Full Disjunction by means of an SQL query.
• [Rajaraman, Ullman - PODS 1996]: there is a natural outerjoin sequence producing the full disjunction if and only if the set of relation schemes forms a connected, acyclic hypergraph.
• However, a Global Class with more than 2 local classes is a cyclic hypergraph.
• Naive evaluation (current implementation) – Example n = 3:
  select * from (L1 outer join L2 on JC(L1,L2)) outer join L3 on (JC(L1,L3) OR JC(L2,L3))
• New proposed method: outerjoin pseudo-sequence – Example n = 3:
  select * from (L1 outer join L2 on JC(L1,L2)) outer join (L1 outer join L3 on JC(L1,L3)) on JC(L2,L3)
• Implementation of methods proposed in the literature
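The naive evaluation for n = 3 can be sketched in plain Python. This is only an illustration under stated assumptions, not the prototype's code: the `full_outer_join` and `jc` helpers, the row layout (a dict from class name to record, with `None` for a missing class) and the sample data are all hypothetical, and the Join Conditions are taken to be the simple equality on COMPANY_ID.

```python
from typing import Callable, Optional

Record = Optional[dict]   # one tuple of a local class, or None if absent
Row = dict                # hypothetical layout: class name -> Record


def full_outer_join(left: list, right: list,
                    left_names: list, right_names: list,
                    on: Callable[[Row, Row], bool]) -> list:
    """Full outer join of two row lists under an arbitrary join condition."""
    out = []
    matched_right = set()
    for lrow in left:
        hit = False
        for i, rrow in enumerate(right):
            if on(lrow, rrow):
                out.append({**lrow, **rrow})
                matched_right.add(i)
                hit = True
        if not hit:  # dangling left row: pad the right side with None
            out.append({**lrow, **{n: None for n in right_names}})
    for i, rrow in enumerate(right):
        if i not in matched_right:  # dangling right row: pad the left side
            out.append({**{n: None for n in left_names}, **rrow})
    return out


def jc(row_a: Row, a: str, row_b: Row, b: str) -> bool:
    """Equality Join Condition JC(a, b) on COMPANY_ID; False if a side is missing."""
    ra, rb = row_a.get(a), row_b.get(b)
    return ra is not None and rb is not None and ra["COMPANY_ID"] == rb["COMPANY_ID"]


def naive_fd3(l1: list, l2: list, l3: list) -> list:
    """(L1 outer join L2 on JC(L1,L2)) outer join L3 on (JC(L1,L3) OR JC(L2,L3))."""
    r1 = [{"L1": t} for t in l1]
    r2 = [{"L2": t} for t in l2]
    r3 = [{"L3": t} for t in l3]
    l12 = full_outer_join(r1, r2, ["L1"], ["L2"],
                          lambda a, b: jc(a, "L1", b, "L2"))
    return full_outer_join(l12, r3, ["L1", "L2"], ["L3"],
                           lambda a, b: jc(a, "L1", b, "L3") or jc(a, "L2", b, "L3"))


# Made-up sample data for three local classes of one global class.
L1 = [{"COMPANY_ID": 1, "NAME": "Alfa"}]
L2 = [{"COMPANY_ID": 1, "REGION": "VENETO"}, {"COMPANY_ID": 2, "REGION": "LAZIO"}]
L3 = [{"COMPANY_ID": 2, "ADDRESS": "Via Roma 1"}, {"COMPANY_ID": 3, "ADDRESS": "Via Po 9"}]
fd = naive_fd3(L1, L2, L3)
```

On this sample the result has one row per company: company 1 is known to L1 and L2, company 2 to L2 and L3, and company 3 to L3 only.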
Object Fusion: Resolution Functions • Data coming from different SINodes may be inconsistent • Resolution functions resolve data conflicts on a global attribute that is mapped into more than one SINode, i.e. when instances of the same object coming from different SINodes have different values for the local attributes mapped into the same global attribute • An attribute with no data conflicts is a Homogeneous Attribute • An example: precedence(L1.ADDRESS, L2.ADDRESS) [Figure: application of the resolution functions]
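A resolution function such as `precedence` can be sketched as follows. The slides do not spell out its semantics, so this sketch assumes that `precedence` returns the value from the first listed source that is not null; the function body and the sample addresses are hypothetical.

```python
from typing import Optional


def precedence(*values: Optional[str]) -> Optional[str]:
    """Assumed semantics: return the first non-null value, in argument order."""
    for value in values:
        if value is not None:
            return value
    return None


# One fused object: the same company as seen by two SINodes (made-up data).
fused = {"L1.ADDRESS": "Via Verdi 3, Padova", "L2.ADDRESS": "V. Verdi 3"}
address = precedence(fused["L1.ADDRESS"], fused["L2.ADDRESS"])
```

Under this assumption the order of the arguments encodes which SINode is trusted more for the attribute.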
Query unfolding: Local Queries Computation
• An EXPAtom is a query Q on a Global Class G = { L1, L2, …, Ln }:
  Q = select <Q_select-list> from G where <Q_condition>
• An FDAtom is a local query LQ_L on a local class L:
  LQ_L = select <Q_L_select-list> from L where <Q_L_condition>
• Constraint Mapping, <Q_L_condition>: the constraints of <Q_condition> that can be solved in L, rewritten w.r.t. L
• Residual Constraints, <Q_residual_condition>: the constraints that are not included in all local <Q_L_condition>
• Local Select List, <Q_L_select-list>: the attributes of the <Q_select-list> of Q, plus the attributes needed by the residual constraints and by the Join Conditions
Constraint mapping for Homogeneous Attributes
• An atomic constraint (GA op value) is mapped onto the local class L as:
  (MTF[GA][L] op value) if MT[GA][L] is not null and the op operator is supported in L
  true otherwise
• An atomic constraint (GA1 op GA2) is mapped onto the local class L as:
  (MTF[GA1][L] op MTF[GA2][L]) if MT[GA1][L] and MT[GA2][L] are not null and the op operator is supported in L
  true otherwise
• The current implementation of the prototype assumes that each operator OP used in the global query is supported in every local class, i.e. a constraint including OP can be solved in the local class.
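The mapping rule for constraints of the form (GA op value) can be sketched in Python. The contents of the mapping table `MT`, the tuple encoding of constraints and the helper names are hypothetical; the sketch also adopts the prototype's assumption that every operator is supported in every local class, and reads "not included in all local conditions" as "missing from at least one local condition".

```python
from typing import Optional

# Hypothetical mapping table MT[global attribute][local class] -> local attribute.
# None means that the global attribute is not mapped in that local class.
MT: dict = {
    "REGION":        {"SN1.Company": "REGION", "SN2.Company": "REGION"},
    "SUBCONTRACTOR": {"SN1.Company": "SUBCONTRACTOR", "SN2.Company": "SUBCONTRACTOR"},
    "CAPITAL_STOCK": {"SN1.Company": None, "SN2.Company": None},
}

# A constraint is encoded as (global attribute, operator, SQL literal).


def map_constraint(constraint: tuple, local_class: str) -> str:
    """Map (GA op value) onto a local class; 'true' when GA is not mapped there."""
    ga, op, value = constraint
    local_attr: Optional[str] = MT[ga][local_class]
    if local_attr is None:
        return "true"
    return f"{local_attr} {op} {value}"


def local_condition(constraints: list, local_class: str) -> str:
    """Conjunction of the mapped constraints, dropping the trivial 'true' ones."""
    mapped = [map_constraint(c, local_class) for c in constraints]
    mapped = [m for m in mapped if m != "true"]
    return " and ".join(mapped) if mapped else "true"


def residual(constraints: list, local_classes: list) -> list:
    """Constraints missing from at least one local condition."""
    return [c for c in constraints
            if any(MT[c[0]][lc] is None for lc in local_classes)]


query = [("CAPITAL_STOCK", ">", "50"),
         ("REGION", "like", "'VENETO'"),
         ("SUBCONTRACTOR", "like", "'yes'")]
classes = ["SN1.Company", "SN2.Company"]
cond_sn1 = local_condition(query, "SN1.Company")
left_over = residual(query, classes)
```

With this made-up table, the region and subcontractor constraints are pushed to both local classes, while the capital stock constraint stays residual and has to be applied after fusion.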
Query unfolding example
Global Class: Company = { SN1.Company, SN2.Company }
Global Query:
scq2: SELECT NAME, COMPANY_ID, CAPITAL_STOCK, REGION, SUBCONTRACTOR, ADDRESS FROM company WHERE CAPITAL_STOCK > 50 AND REGION LIKE 'VENETO' AND SUBCONTRACTOR LIKE 'yes'
Local queries:
FDAtom1: SELECT COMPANY_ID, NAME, REGION, ADDRESS, SUBCONTRACTOR FROM SN1.company WHERE (REGION like 'VENETO' and SUBCONTRACTOR like 'yes')
FDAtom2: SELECT COMPANY_ID, NAME, REGION, ADDRESS, SUBCONTRACTOR FROM SN2.company WHERE (REGION like 'VENETO' and SUBCONTRACTOR like 'yes')
The Query Agent
[Figure: architecture diagram showing the flow so far: Query, EXPANDER, Expanded Query (EXPQuery) and ExpAtoms, UNFOLDER, Unfolding (FDQuery, FDAtoms, ResFunctions); SINodeAgent1, SINodeAgent2 and SEWASIE_DB below]
The Query Agent: EXECUTION
[Figure: the Query Agent sends the FDAtoms to SINodeAgent1 and SINodeAgent2 and receives the answers to the FDAtoms]
• EXECUTION, for each FDAtom (Parallel Execution):
  • INPUT: FDAtom
  • MESSAGES: from QA to SINode Agent
  • OUTPUT: a table storing the FDAtom result in the SEWASIE_DB
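The EXECUTION step can be sketched with a thread pool. This is a sketch only: `send_to_sinode_agent` is a hypothetical stub standing in for the message exchange between the Query Agent and a SINode Agent, the pairing of FDAtoms with agents is made up, and a plain dict stands in for the tables stored in the SEWASIE_DB.

```python
from concurrent.futures import ThreadPoolExecutor


def send_to_sinode_agent(agent: str, fd_atom_sql: str) -> list:
    """Hypothetical stub for the QA -> SINode Agent message exchange."""
    # A real agent would evaluate the query; here we only echo the request.
    return [{"answered_by": agent, "query": fd_atom_sql}]


def execute(fd_atoms: dict) -> dict:
    """Run every FDAtom in parallel and return one result table per FDAtom."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(send_to_sinode_agent, agent, sql)
                   for name, (agent, sql) in fd_atoms.items()}
        # The returned dict stands in for the tables stored in the SEWASIE_DB.
        return {name: future.result() for name, future in futures.items()}


# FDAtom name -> (agent, local query); both columns are illustrative.
fd_atoms = {
    "FDAtom1": ("SINodeAgent1", "SELECT COMPANY_ID, NAME FROM company"),
    "FDAtom2": ("SINodeAgent2", "SELECT COMPANY_ID, ADDRESS FROM company"),
}
sewasie_db = execute(fd_atoms)
```

Each FDAtom is independent of the others, which is what makes the parallel execution straightforward.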
The Query Agent: FUSION
• FUSION, for each EXPAtom (Parallel Execution):
  • INPUT: FDAtoms, FDQuery, Resolution Functions
  • Execution of FDQuery (Full Disjunction of the FDAtoms)
  • Application of the Resolution Functions to the result of the previous action
  • OUTPUT: a view storing the EXPAtom result in the SEWASIE_DB
FUSION: Detailed steps
• An EXPAtom is a query Q on a Global Class G = { L1, L2, …, Ln }:
  Q = select <Q_select-list> from G where <Q_condition>
• Local queries: for each local class L, a local query over L: Q_L
• Full Disjunction of the local query answers: Q_FD = FDG(Q_L1, …, Q_Ln)
• Resolution Functions applied to Q_FD: Q_FD_RES
• EXPAtom result = select <Q_select-list> from Q_FD_RES where <Q_residual_condition>
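The steps above can be traced end to end on a toy example with two local classes. This is a minimal sketch under several assumptions: the Join Condition is equality on COMPANY_ID, the resolution function prefers L1 over L2 for every attribute, and the helper names, the sample answers and the residual condition (CAPITAL_STOCK > 50) are all made up for illustration.

```python
def full_disjunction(q_l1: list, q_l2: list, key: str = "COMPANY_ID") -> list:
    """Full outer join of two local answers on an equality Join Condition."""
    by_key2 = {}
    for r2 in q_l2:
        by_key2.setdefault(r2[key], []).append(r2)
    pairs, seen = [], set()
    for r1 in q_l1:
        seen.add(r1[key])
        # A dangling L1 record is paired with None.
        pairs.extend((r1, r2) for r2 in by_key2.get(r1[key], [None]))
    # Dangling L2 records are paired with None on the L1 side.
    pairs.extend((None, r2) for r2 in q_l2 if r2[key] not in seen)
    return pairs


def resolve(pair: tuple, attrs: list) -> dict:
    """Assumed resolution: for every attribute, prefer L1 and fall back to L2."""
    r1, r2 = pair
    fused = {}
    for attr in attrs:
        v1 = r1.get(attr) if r1 else None
        v2 = r2.get(attr) if r2 else None
        fused[attr] = v1 if v1 is not None else v2
    return fused


def expatom_result(q_l1, q_l2, attrs, select_list, residual_condition):
    """select <select_list> from Q_FD_RES where <residual_condition>."""
    q_fd = full_disjunction(q_l1, q_l2)
    q_fd_res = [resolve(pair, attrs) for pair in q_fd]
    return [{a: obj[a] for a in select_list}
            for obj in q_fd_res if residual_condition(obj)]


# Made-up local answers Q_L1 and Q_L2.
q_l1 = [{"COMPANY_ID": 1, "NAME": "Alfa", "ADDRESS": "Via Verdi 3"},
        {"COMPANY_ID": 3, "NAME": "Gamma", "ADDRESS": "Via Po 9"}]
q_l2 = [{"COMPANY_ID": 1, "NAME": "Alfa", "ADDRESS": "V. Verdi", "CAPITAL_STOCK": 80}]

result = expatom_result(
    q_l1, q_l2,
    attrs=["NAME", "ADDRESS", "CAPITAL_STOCK"],
    select_list=["NAME", "ADDRESS"],
    # The residual condition must cope with nulls produced by the outer join.
    residual_condition=lambda o: o["CAPITAL_STOCK"] is not None and o["CAPITAL_STOCK"] > 50,
)
```

In this toy run the company known only to L1 has no capital stock value after fusion, so the residual condition discards it; the surviving object takes its address from L1.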
The Query Agent: FINAL RESULT
• FINAL RESULT:
  • INPUT: output of the FUSION step
  • Execution of the Expanded Query
  • OUTPUT: the final query result, a view stored in the SEWASIE_DB
Query Management: main theoretical features • Technique for a GAV (Global-as-View) data integration system structured in two levels • At each level, the semantics of the schema (BA GVV and SINode GVV, respectively) is taken into account by a novel technique (query expansion). This is the first algorithm of this type proved correct (i.e., sound and complete w.r.t. the semantics) • By virtue of the separation between query expansion on one side and query rewriting and evaluation on the other, query processing is polynomial time in data complexity (i.e., with respect to the size of the data at the sources) • The Object Fusion problem is addressed with a novel technique based on the combination of the full disjunction operation and the resolution functions
IST-2001-34825 Technique for query answering in the context of more than one Brokering Agent Maurizio Lenzerini
[Figure: three Brokering Agents side by side, each with its own stack: Brokering Agent Ontology, Mapping, SINode Global View, Mapping, Data Sources; mappings also appear among the Brokering Agents, and a Query is posed to one of them] Problem: How to answer a query posed to a BA?
Peer-to-peer data integration • Query answering in the context of more than one Brokering Agent can be seen as the problem of answering queries in a peer-to-peer data integration system • Correspondence between the two settings: Peer: Brokering Agent; P2P mapping: mapping between BAs; Peer data source: SINode; Local mapping: mapping between a BA and a SINode • One basic problem in P2P data integration is the semantics of P2P mappings
The three main components - see also [Franconi&al ‘04]
Current and future work • Algorithm already implemented • Future work: • Testing • Dealing with inconsistency • Dealing with preferences