Navigational Plans For Data Integration
Marc Friedman, Alon Levy, Todd Millstein
Presented by Avinash Ponnala
Introduction
• Data integration with webs of data as sources.
• Previous work is inappropriate for incorporating data webs as sources in data integration.
• Data integration systems pose many hard technical problems.
• Because of the growing number of sources, they should be modeled as webs of data.
GOAL
• A procedure for modeling data webs, i.e., incorporating them into a data integration system.
• The GLAV language for source descriptions.
• An algorithm for reformulating user queries into execution plans that both query and navigate the data sources.
Incorporating Data Webs
• A data web consists of pages and the links between them.
• The structure of a data web is represented with a web schema.
• In a web schema:
  Nodes: sets of pages
  Directed edges: sets of directed links between them
• The node Univ represents the home page of a university.
• Univ(u1) denotes the home page object of university u1.
• Every website has a set of entry points, i.e., nodes.
• The data integration system can access entry points directly by URL.
There are three kinds of logical information stored on each page:
1) Ordinary contents of the page: p(Y1, Y2, ..., Yk).
2) Outgoing edges from the page: p(X, Y) --> M(Y).
3) Search forms on the page: p(X, Y) --form--> M(Y).
• Search forms map binary relations to other pages.
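One way to picture a web schema in code is sketched below. This is only an illustration of the nodes, edges, search forms, and entry points described above; the class names (`Edge`, `WebSchema`) and every relation name in the example (`collegeLink`, `deptSearch`, `deptLink`, `deptInfo`) are assumptions made for the sketch, not names from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Edge:
    source: str            # node the link or form sits on, e.g. "Univ"
    target: str            # node reached by following it, e.g. "College"
    relation: str          # binary relation labeling the edge
    is_form: bool = False  # True for a search form, False for a plain link


@dataclass
class WebSchema:
    entry_points: Set[str]  # nodes that can be fetched directly by URL
    # node -> relations stored as ordinary page contents
    contents: Dict[str, List[str]] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def outgoing(self, node: str) -> List[Edge]:
        """Return the edges (links and forms) that leave the given node."""
        return [e for e in self.edges if e.source == node]


# Hypothetical university web; all relation names are illustrative.
schema = WebSchema(
    entry_points={"Univ"},
    contents={"Dept": ["deptInfo"]},
    edges=[
        Edge("Univ", "College", "collegeLink"),
        Edge("Univ", "Dept", "deptSearch", is_form=True),
        Edge("College", "Dept", "deptLink"),
    ],
)
```

Here `schema.outgoing("Univ")` lists one plain link and one search form, which is the distinction between items 2) and 3) above.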
Mediated Schemas
• A mediated schema is a set of relations that serves as a uniform query interface for all sources.
• Example of a mediated schema for our university domain:
  collegeOf(College, University)
  deptOf(Department, College)
  profOf(Professor, Department)
  courseOf(Course, Department)
  chairOf(Professor, Department)
  prereqOf(Course, Course)
• The user poses queries in terms of the relations and attributes of a mediated database schema.
• The relations in the mediated schema are virtual.
• The mediated schema captures the aspects of the domain that are of interest to the users of the application.
Source Descriptions
• Why Source Descriptions?
• Sample Source Description
The mediated schema relations do not match the source relations one-to-one because:
1) The source schemas contain different levels of detail from one another.
2) The sources split attributes into relations differently.
• In addition to the mediated schema, the system has a set of source descriptions that specify a semantic mapping between the mediated schema and the source schemas.
• The GAV and LAV source description languages address this mismatch.
• A LAV source description has the form
  $v(\bar{X}) = r_1(\bar{X}_1, \bar{Z}_1) \wedge \dots \wedge r_k(\bar{X}_k, \bar{Z}_k)$
  where $v$ is a source relation and the $r_i$ are mediated schema relations.
• LAV descriptions can capture details that are not present in every source.
• A GAV source description has the form
  $V_1(\bar{X}_1, \bar{Y}_1) \wedge \dots \wedge V_j(\bar{X}_j, \bar{Y}_j) \Rightarrow r(\bar{X})$
• Using either language on its own has undesirable consequences.
• Neither one alone offers enough flexibility.
• GLAV combines the expressive power of both GAV and LAV.
• A GLAV source description has the form
  $V(\bar{X}, \bar{Y}) \Rightarrow r_1(\bar{X}_1, \bar{Z}_1) \wedge \dots \wedge r_k(\bar{X}_k, \bar{Z}_k)$
• It allows source descriptions that contain recursive queries over sources.
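The three shapes can be compared with a small sketch. It only looks at how many atoms appear on each side of a description: a single source atom gives the LAV shape, a single mediated atom gives the GAV shape, and conjunctions on both sides need GLAV. The types (`Atom`, `SourceDescription`), the `classify` helper, and the source relation names (`chairPage`, `facultyList`, `deptPage`) are hypothetical, and recursive queries over sources are not modeled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Atom:
    relation: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class SourceDescription:
    sources: Tuple[Atom, ...]   # left-hand side: atoms over source relations
    mediated: Tuple[Atom, ...]  # right-hand side: atoms over mediated relations


def classify(desc: SourceDescription) -> str:
    """Name the simplest description language whose shape fits desc."""
    one_source = len(desc.sources) == 1
    one_mediated = len(desc.mediated) == 1
    if one_source and one_mediated:
        return "LAV or GAV"
    if one_source:
        return "LAV"
    if one_mediated:
        return "GAV"
    return "GLAV only"


# Hypothetical descriptions over the mediated relations profOf and chairOf.
lav = SourceDescription(
    sources=(Atom("chairPage", ("P",)),),
    mediated=(Atom("profOf", ("P", "D")), Atom("chairOf", ("P", "D"))),
)
gav = SourceDescription(
    sources=(Atom("facultyList", ("P", "D")), Atom("deptPage", ("D",))),
    mediated=(Atom("profOf", ("P", "D")),),
)
glav = SourceDescription(sources=gav.sources, mediated=lav.mediated)
```

Every LAV-shaped or GAV-shaped description also fits the GLAV form, which is the sense in which GLAV combines both.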
Data Integration Domain
• A set of mediated schema relations, a set of web schemas, and a set of source descriptions together form a data integration domain.
• It is denoted as the triple D = (R, {Gi}, SD), where
  R: set of mediated schema relations
  Gi: web schemas
  SD: source descriptions
How to answer a Query?
• Using a query processor.
• The user query is translated into a lower-level procedural program called an execution plan.
• A logical plan is constructed first.
• A navigational plan is then formed by augmenting the logical plan with navigational information.
• A navigational plan describes how to locate the desired relations in the data webs.
Logical Plan
• A logical plan is a Datalog program whose EDB relations are the source relations and whose answer predicate is q.
• The result of applying a Datalog program to a database is the set of tuples computed for the query predicate.
• Given a conjunctive query Q, a sound and complete logical plan is constructed using GlavInverse, an inverse-rules algorithm for GLAV.
• Let T contain the sentences in the source descriptions; GlavInverse converts the theory T into a Datalog program.
Theorem: Let D = (R, {Gi}, SD) be a data integration domain and let Q be a conjunctive query. Then the logical plan ∆ returned by GlavInverse is sound and complete.
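The inverse-rules idea behind a logical plan can be sketched for the simplest case, a description with a single source atom on the left. Each mediated atom on the right becomes the head of a Datalog rule whose body is the source atom, and any variable that does not appear in the source atom is replaced by a Skolem function term over the source atom's variables. This is not GlavInverse itself: the function name `invert`, the Skolem naming scheme, and the source relation `chairPage` are assumptions, all arguments are taken to be variables, and conjunctive or recursive left-hand sides are not handled.

```python
from typing import List, Sequence, Tuple

# An atom is written as (relation name, list of variable names).
AtomT = Tuple[str, Sequence[str]]


def invert(source_atom: AtomT, mediated_atoms: Sequence[AtomT]) -> List[str]:
    """Turn  v(X) => r1(...) ^ ... ^ rk(...)  into one Datalog rule per ri."""
    v_name, v_vars = source_atom
    body = f"{v_name}({', '.join(v_vars)})"
    rules = []
    for r_name, r_vars in mediated_atoms:
        args = []
        for var in r_vars:
            if var in v_vars:
                # Variable is exported by the source: keep it as is.
                args.append(var)
            else:
                # Existential variable: stand in a Skolem term over v's variables.
                args.append(f"f_{v_name}_{var}({', '.join(v_vars)})")
        rules.append(f"{r_name}({', '.join(args)}) :- {body}.")
    return rules


# Hypothetical description: chairPage(P) => profOf(P, D) ^ chairOf(P, D)
rules = invert(("chairPage", ["P"]),
               [("profOf", ["P", "D"]), ("chairOf", ["P", "D"])])
```

Both generated rules share the Skolem term `f_chairPage_D(P)`, which records that the unknown department is the same in `profOf` and `chairOf`.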
Navigational Plan
• Logical plans do not explain how to populate the source relations from the data webs, so they cannot be executed by themselves.
• Logical plans are extended to navigational plans.
• Navigational plans are augmented Datalog programs.
• Navigational terms specify both the location and the logical content of a relation stored in the data web.
• A navigational term has the form P:v(X), where P is a path and v is a source relation.
• The path P starts at source(P) and ends at target(P).
• Trivial paths: if P = [N(X)], where N is a node and X is a variable or constant, then source(P) = target(P) = N(X).
• Compound paths: P′ = [P --e--> M(Y)] is a path if P is a path with target(P) = N(X) and e is an edge from node N(X) to node M(Y). Then source(P′) = source(P) and target(P′) = M(Y).
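The two path rules can be written down directly. In this minimal sketch a path is just the sequence of node atoms it visits; the names `Path`, `trivial`, and `extend`, and the representation of an edge as a `(from_node, to_node)` pair, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Path:
    nodes: Tuple[str, ...]  # node atoms in visiting order, e.g. ("Univ(u1)",)

    @property
    def source(self) -> str:
        return self.nodes[0]

    @property
    def target(self) -> str:
        return self.nodes[-1]


def trivial(node: str) -> Path:
    """Trivial path [N(X)]: source and target are both N(X)."""
    return Path((node,))


def extend(path: Path, edge: Tuple[str, str]) -> Path:
    """Compound path [P --e--> M(Y)]: allowed only if e leaves target(P)."""
    edge_from, edge_to = edge
    if path.target != edge_from:
        raise ValueError("edge must start at the target of the path")
    return Path(path.nodes + (edge_to,))


p0 = trivial("Univ(u1)")
p1 = extend(p0, ("Univ(u1)", "College(C)"))
```

Extending a path never changes its source, so `p1.source` is still `Univ(u1)` while `p1.target` is `College(C)`.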
• Given a logical plan ∆ and the web schemas, the navigational plan algorithm produces a navigational plan ∆′.
• The navigational plan ∆′ produced by the algorithm is sound and complete.
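The slides do not spell out the algorithm, so the following is only a rough sketch of one ingredient it needs: finding paths from entry points to the node that stores a source relation, and pairing each path with that relation as a navigational term. It uses a plain breadth-first search and keeps one shortest path per entry point. The function names, the dictionary-based schema, and the relation `facultyList` are all assumptions, and variable bindings along the path are ignored.

```python
from collections import deque
from typing import Dict, List, Set


def find_paths(edges: Dict[str, List[str]], entry_points: Set[str],
               target: str) -> List[List[str]]:
    """Return one shortest node path from each entry point that reaches target."""
    found = []
    for start in sorted(entry_points):
        queue = deque([[start]])
        seen = {start}
        while queue:
            path = queue.popleft()
            if path[-1] == target:
                found.append(path)
                break
            for nxt in edges.get(path[-1], []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
    return found


def navigational_terms(edges: Dict[str, List[str]], entry_points: Set[str],
                       stored_at: Dict[str, str], relation: str) -> List[str]:
    """Render each path to the node storing the relation as 'path:relation'."""
    target = stored_at[relation]
    return [" -> ".join(p) + ":" + relation
            for p in find_paths(edges, entry_points, target)]


# Hypothetical web schema: node -> nodes reachable by one link or form.
edges = {"Univ": ["College"], "College": ["Dept"], "Dept": ["Faculty"]}
terms = navigational_terms(edges, {"Univ", "Dept"},
                           {"facultyList": "Faculty"}, "facultyList")
```

With two entry points, the same source relation yields two navigational terms, one per route into the data web.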
Conclusions
• The work shows how to extend data integration systems to incorporate data webs.
• It presents a formalism for modeling data webs and a language for source descriptions.
• It focuses on an algorithm for answering queries using GLAV source descriptions.