Ontology-based Integration of XML Web resources
Irini Fundulaki, CNAM-Paris, INRIA-Futurs (France)
Bernd Amann, Michel Scholl, CNAM-Paris, INRIA-Futurs (France)
Catriel Beeri, The Hebrew University, Jerusalem
The World according to XML
• XML is the standard for the representation and exchange of Web data
• Success of XML: “semantic” tags → structured querying
• But:
  • “Semantic” tags are not always appropriate
  • Semantics is hidden in the document structure
  • XML DTDs can be very complex
• Solution: ontologies → semantic querying
Outline
• Problems for querying XML sources
• The STYX approach for querying and integrating XML Web sources
  • The Ontology
  • Publishing XML sources
  • Answering Queries
  • Semantic Keys
• Conclusions and Contributions
The XML World: A simple example
<!ELEMENT Film (Crew)>
<!ATTLIST Film Title CDATA #REQUIRED>
<!ELEMENT Crew (Member*)>
<!ELEMENT Member EMPTY>
<!ATTLIST Member Name CDATA #IMPLIED>
[Figure: tree of the example document. A Film element with Title ‘Intervention Divine’ and a Crew child; the Crew has three Member children with Name values ‘Suleiman’, ‘Yitzak’ and ‘Khader’.]
XML World: What about Semantics and Querying?
• What about querying?
  • Be aware of the XML query language supported by the source
  • Be aware of the structure and the semantics
• Where are the semantics?
  • Some in the DTD: element names and parent/child relationships
    • a Film element “contains” a Title attribute and a Crew element
  • Some in the XML document structure:
    • the first Member element of the Crew represents the film’s director
    • the second Member element represents the film’s assistant director
Ask God for the semantics! (i.e. the source administrator)
Querying the XML World
• Simple Query: «The director and assistant director of the film ‘Intervention Divine’»
• Simple (?) XQuery expression:
FOR $a IN document('URL')/Film,
    $b IN $a/Crew/Member[1],
    $c IN $b/following-sibling::*[1]
WHERE $a/@Title = 'Intervention Divine'
RETURN $b/@Name, $c/@Name
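To make the dependence on document order concrete, here is a minimal Python sketch of the same lookup using the standard library. The sample document is reconstructed from the example tree, so the order of the three names is an assumption, and `director_and_assistant` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample document reconstructed from the example tree (assumed member order).
DOC = """
<Film Title="Intervention Divine">
  <Crew>
    <Member Name="Suleiman"/>
    <Member Name="Yitzak"/>
    <Member Name="Khader"/>
  </Crew>
</Film>
"""

def director_and_assistant(xml_text, title):
    """Return (director, assistant director) for the film with this title."""
    film = ET.fromstring(xml_text)
    if film.get("Title") != title:
        return None
    members = film.findall("./Crew/Member")
    if len(members) < 2:
        return None
    # Member[1] is the director; its first following sibling is the assistant.
    # Nothing in the tags says so: the meaning lives in the position only.
    return members[0].get("Name"), members[1].get("Name")

result = director_and_assistant(DOC, "Intervention Divine")
print(result)
```

Anyone writing this code has to know in advance that position encodes the role, which is exactly the knowledge the slide says must come from the source administrator.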
From the XML World to the Semantic Web
• XML alone does not meet the needs of the Semantic Web
• Need for richer models that make the semantics of XML data precise: rich domain schemas (e.g. ontologies)
• Applications of the Semantic Web:
  • Querying
  • Data Integration
The STYX approach for integrating and querying XML Web resources
• Integrating XML resources:
  • Integration schema (Ontology): conceptual schema with semantic keys, symmetric relationships and inheritance
  • XML resources are described by mapping rules between paths in the XML tree (XPath location paths) and ontology paths
• Query Mediation:
  • User queries are defined in terms of the ontology
  • Query rewriting using mapping rules
  • Query evaluation over multiple sources
  • Joining the results using semantic keys
A ‘Simple’ World Assumption
• The domain of interest contains:
  • Entities, semantic relationships between entities, and properties of entities
• The STYX Ontology models the domain of interest and is composed of:
  • Concepts
  • Symmetric binary roles between concepts
  • Attributes of concepts
  • Inheritance relations to model commonality of structures and subset relationships between concepts
Example of a (simple) STYX Ontology
[Figure: ontology graph with a legend for Concepts, Inheritance Relations, Roles, Inverse Roles and Attributes.]
• Concepts: FILM, PERSON, EVENT, PLACE, and POLITICAL FILM (which inherits from FILM)
• Roles (inverse roles in parentheses): directed by (directed), assisted by (assisted), actor (played in), filmed (filming of), took place at (place of)
• Attributes: has title (String), has name (String), took place in (Integer)
Semantics? No need to ask God!
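As an illustration of the ontology model, here is a small Python sketch with concepts, roles that have inverses, attributes and inheritance. The class and method names are hypothetical, and the role endpoints (for example FILM to EVENT for "filmed") are read off the flattened figure and the later queries, so they are assumptions rather than the authors' definition.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    parents: dict = field(default_factory=dict)     # concept -> parent concept
    roles: dict = field(default_factory=dict)       # (concept, role) -> target concept
    attributes: dict = field(default_factory=dict)  # (concept, attribute) -> type name

    def add_role(self, source, role, target, inverse):
        # Roles are symmetric: registering one direction also registers its inverse.
        self.roles[(source, role)] = target
        self.roles[(target, inverse)] = source

    def ancestors(self, concept):
        chain = [concept]
        while chain[-1] in self.parents:
            chain.append(self.parents[chain[-1]])
        return chain

    def step(self, concept, label):
        # Follow one role or attribute, also looking at inherited ones.
        for c in self.ancestors(concept):
            if (c, label) in self.roles:
                return self.roles[(c, label)]
            if (c, label) in self.attributes:
                return self.attributes[(c, label)]
        raise KeyError(f"{label!r} is not defined for {concept!r}")

    def resolve(self, concept, path):
        # Walk an ontology path such as ["directed_by", "has_name"].
        for label in path:
            concept = self.step(concept, label)
        return concept

ONT = Ontology()
ONT.parents["POLITICAL FILM"] = "FILM"
ONT.add_role("FILM", "directed_by", "PERSON", "directed")
ONT.add_role("PERSON", "assisted_by", "PERSON", "assisted")
ONT.add_role("FILM", "filmed", "EVENT", "filming_of")   # assumed endpoints
ONT.attributes[("FILM", "has_title")] = "String"
ONT.attributes[("PERSON", "has_name")] = "String"
ONT.attributes[("EVENT", "took_place_in")] = "Integer"

print(ONT.resolve("POLITICAL FILM", ["directed_by", "assisted_by", "has_name"]))
```

Because POLITICAL FILM inherits from FILM, paths defined on FILM can be followed from a POLITICAL FILM instance without being declared again.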
Querying in STYX
• Simple Query: «The director and assistant director of the film ‘Intervention Divine’»
SELECT e, f                        (return the requested values)
FROM FILM a,                       (get the film)
     a.has_title b,                (get its title)
     a.directed_by c,              (get the director)
     c.assisted_by d,              (get the assistant director)
     c.has_name e, d.has_name f    (get their names)
WHERE b = 'Intervention Divine'    (check the title)
Publishing XML sources in STYX
[Figure: the example STYX ontology next to the XML source tree (Film with attribute Title and child Crew; Crew with Member children, each with attribute Name). The rules below are added one at a time.]
R1 : URL/Film as u1 ← POLITICAL FILM
R2 : u1/@Title as u2 ← has title
R3 : u1/Crew/Member[1] as u3 ← directed by
R4 : u3/following-sibling::*[1] as u4 ← assisted by
R5 : u3/@Name as u5 ← has name
R6 : u4/@Name as u6 ← has name
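The rules form a chain: each one is relative to the variable introduced by an earlier rule. The following Python sketch stores the six rules as plain data and expands a rule into its full location path. The `Rule` type and `absolute_path` are hypothetical names for illustration, not part of STYX.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    name: str    # rule identifier, e.g. "R3"
    root: str    # "URL" or the variable of the rule this one starts from
    xpath: str   # relative XPath location path
    var: str     # variable bound to the nodes reached
    label: str   # ontology concept, role or attribute

RULES = [
    Rule("R1", "URL", "Film", "u1", "POLITICAL FILM"),
    Rule("R2", "u1", "@Title", "u2", "has title"),
    Rule("R3", "u1", "Crew/Member[1]", "u3", "directed by"),
    Rule("R4", "u3", "following-sibling::*[1]", "u4", "assisted by"),
    Rule("R5", "u3", "@Name", "u5", "has name"),
    Rule("R6", "u4", "@Name", "u6", "has name"),
]

def absolute_path(rule_name, rules=RULES):
    """Concatenate the relative paths from the source root down to this rule."""
    by_name = {r.name: r for r in rules}
    by_var = {r.var: r for r in rules}
    rule = by_name[rule_name]
    steps = [rule.xpath]
    while rule.root in by_var:       # climb to the rule that defines the root variable
        rule = by_var[rule.root]
        steps.append(rule.xpath)
    steps.append(rule.root)          # the source root, "URL" in the slides
    return "/".join(reversed(steps))

print(absolute_path("R6"))
```

For R6 the expansion shows that "has name" of the assistant director is reached through the director's Member element, which is how the positional semantics of the source gets recorded once, in the rules, instead of in every query.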
Querying in STYX
• Queries are simple tree queries expressed in terms of the STYX ontology
  • No joins, restructuring or aggregation
• Query evaluation over multiple sources
  • A source returns only a subset of the possible answers to the query
  • To get additional answers, we must evaluate the query over all published sources
  • The partial results are finally processed by the mediator
Querying one source in STYX
• To evaluate a query over a source:
  • find the mapping rules that give answers to the query variables (binding variables to rules)
  • rewrite the query into an XML query expressed in the schema of the XML source
  • the XML query is evaluated by the source
  • and the answers are returned to the STYX mediator
Query Rewriting in STYX
«The director and assistant director of the film ‘Intervention Divine’»
[Figure: the query tree (FILM a; a has title b; a directed by c; c assisted by d; c has name e; d has name f) shown next to mapping rules R1 to R6.]
Variable to Rule Bindings, built one variable at a time:
[a → R1]
[a → R1, b → R2]
[a → R1, b → R2, c → R3]
Full Binding: [a → R1, b → R2, c → R3, d → R4, e → R5, f → R6]
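The binding step can be sketched in Python as a top-down walk of the query tree. This is a simplified illustration under stated assumptions, not the STYX algorithm: it takes the first matching rule, matches one edge label against one rule, and uses hypothetical names (`bind`, `SUBCONCEPTS`, tuple layouts).

```python
# Each rule: (name, root variable or "URL", bound variable, ontology label).
RULES = [
    ("R1", "URL", "u1", "POLITICAL FILM"),
    ("R2", "u1", "u2", "has title"),
    ("R3", "u1", "u3", "directed by"),
    ("R4", "u3", "u4", "assisted by"),
    ("R5", "u3", "u5", "has name"),
    ("R6", "u4", "u6", "has name"),
]

# A rule on a sub-concept also answers a query on the parent concept.
SUBCONCEPTS = {"FILM": {"FILM", "POLITICAL FILM"}}

# Query tree: variable -> (parent variable, edge label); parents come first.
# For the root, the "label" is the queried concept.
QUERY = {
    "a": (None, "FILM"),
    "b": ("a", "has title"),
    "c": ("a", "directed by"),
    "d": ("c", "assisted by"),
    "e": ("c", "has name"),
    "f": ("d", "has name"),
}

def bind(query, rules):
    """Return {query variable: rule name} for every variable that can be bound."""
    bound = {}
    for var, (parent, label) in query.items():
        if parent is None:
            wanted = SUBCONCEPTS.get(label, {label})
            matches = [r for r in rules if r[1] == "URL" and r[3] in wanted]
        elif parent in bound:
            parent_var = bound[parent][2]   # variable introduced by the parent's rule
            matches = [r for r in rules if r[1] == parent_var and r[3] == label]
        else:
            matches = []                    # parent unbound, so this one is too
        if matches:
            bound[var] = matches[0]
    return {var: rule[0] for var, rule in bound.items()}

full = bind(QUERY, RULES)
print(full)

# Adding the "year of creation" variable gives only a partial binding.
partial = bind({**QUERY, "g": ("a", "filmed.took place in")}, RULES)
```

With the six rules of this source every variable of the first query is bound, while the extended query leaves `g` unbound.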
Rewriting to XQuery expression
Each bound variable is replaced by the location path of its rule:
a → R1 (URL/Film)
b → R2 (a/@Title)
c → R3 (a/Crew/Member[1])
d → R4 (c/following-sibling::*[1])
e → R5 (c/@Name)
f → R6 (d/@Name)
FOR $a IN document('URL')/Film,
    $b IN $a/@Title,
    $c IN $a/Crew/Member[1],
    $d IN $c/following-sibling::*[1],
    $e IN $c/@Name,
    $f IN $d/@Name
WHERE $b = 'Intervention Divine'
RETURN $e, $f
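Once every variable has a location path, producing the query text is mechanical. The Python sketch below assembles the expression shown above from the instantiated rules; `to_xquery` and the dictionary layout are hypothetical, and the output follows the slide's draft XQuery syntax (`FOR ... IN`, `document(...)`) rather than any particular XQuery version.

```python
# Query variable -> (parent variable or None for the source root, relative path).
PATHS = {
    "a": (None, "Film"),
    "b": ("a", "@Title"),
    "c": ("a", "Crew/Member[1]"),
    "d": ("c", "following-sibling::*[1]"),
    "e": ("c", "@Name"),
    "f": ("d", "@Name"),
}

def to_xquery(paths, where, returns, url="URL"):
    """Build a FOR / WHERE / RETURN expression from instantiated rules."""
    clauses = []
    for var, (parent, path) in paths.items():
        start = f"document('{url}')" if parent is None else f"${parent}"
        clauses.append(f"${var} IN {start}/{path}")
    conditions = " AND ".join(f"${v} = '{value}'" for v, value in where.items())
    selected = ", ".join(f"${v}" for v in returns)
    return ("FOR " + ",\n    ".join(clauses)
            + f"\nWHERE {conditions}\nRETURN {selected}")

xquery = to_xquery(PATHS, where={"b": "Intervention Divine"}, returns=["e", "f"])
print(xquery)
```

The sketch does no escaping of quote characters in the condition values, which a real rewriter would need.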
What about queries that cannot be answered by a source?
«The director and assistant director of the film ‘Intervention Divine’ and its year of creation?»
[Figure: the previous query tree extended with one more branch from FILM a: filmed.took place in g.]
Variable to Rule Bindings
Partial Binding: [a → R1, b → R2, c → R3, d → R4, e → R5, f → R6]; no rule of this source binds g
Partial Bindings
• To get a full answer, we need to evaluate the sub-query that the source cannot answer over the other sources and then join the partial results
• To obtain this sub-query (or these sub-queries) we need to decompose the query into:
  • a prefix query that the source answers
  • and one or more suffix queries (sub-queries) that are possibly answered by the other sources
• To join, we need keys!
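The decomposition can be illustrated with a short Python sketch that splits a query tree by the set of bound variables. It assumes the root variable is bound and starts each suffix query at a copy of the bound parent; `decompose`, `CONCEPT_OF` and the tuple layout are hypothetical names, and the actual STYX algorithm may differ.

```python
# Query tree: variable -> (parent variable, edge label); parents come first.
QUERY = {
    "a": (None, "FILM"),
    "b": ("a", "has title"),
    "c": ("a", "directed by"),
    "d": ("c", "assisted by"),
    "e": ("c", "has name"),
    "f": ("d", "has name"),
    "g": ("a", "filmed.took place in"),
}
# Concept of each variable that may become the root of a suffix query.
CONCEPT_OF = {"a": "FILM", "c": "PERSON", "d": "PERSON"}

def decompose(query, concept_of, bound):
    """Split a query into the prefix the source answers and the suffix queries."""
    prefix = {v: edge for v, edge in query.items() if v in bound}
    suffixes = []
    placed = {}   # unbound variable -> the suffix query that contains it
    for var, (parent, label) in query.items():
        if var in bound:
            continue
        if parent in bound:
            # New suffix query, rooted at a copy of the bound parent variable.
            suffix = {parent: (None, concept_of[parent])}
            suffixes.append(suffix)
        else:
            suffix = placed[parent]   # descendant of an unbound variable
        suffix[var] = (parent, label)
        placed[var] = suffix
    return prefix, suffixes

prefix, suffixes = decompose(QUERY, CONCEPT_OF, bound={"a", "b", "c", "d", "e", "f"})
print(prefix)
print(suffixes)
```

The root variable `a` appears in both parts, which is why the two partial results need a shared key before they can be joined.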
Semantic Keys in STYX: Ontology Revisited
• XML keys
  • Local ID/IDREF attributes (internal pointers)
  • XML Schema keys are defined in terms of local element/attribute values
  • No formal agreement!
• Solution: define keys at the ontology level!
Semantic Keys in STYX: Ontology Revisited
• Semantic keys are defined in concepts of the ontology, independently of any keys defined at the XML sources
• A key for a concept is a set of attribute paths
  • Example: a film is identified by its title
• Instances of concepts are identified by the values of the keys obtained by the mapping rules
Decomposing the query
Variables to Rules Binding: [a → R1, b → R2, c → R3, d → R4, e → R5, f → R6]
[Figure: the query tree is split into two query trees, both rooted at FILM a.]
• PREFIX QUERY: FILM a; a has title b; a directed by c; c assisted by d; c has name e; d has name f
• SUFFIX QUERY: FILM a; a filmed.took place in g
After Decomposition: Add Keys
[Figure: the prefix and suffix query trees, each extended with the key path has title t on FILM a.]
• PREFIX QUERY: FILM a; a has title t; a has title b; a directed by c; c assisted by d; c has name e; d has name f
• SUFFIX QUERY: FILM a; a has title t; a filmed.took place in g
The join between the prefix and the suffix queries is the join between the values of variable t
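The final step at the mediator is an ordinary equi-join on the key variable. Here is a minimal Python sketch, assuming each source returns its partial result as a list of variable bindings; `join_on_key` is a hypothetical name and the rows, including the values for `g`, are invented purely for illustration.

```python
def join_on_key(prefix_rows, suffix_rows, key="t"):
    """Hash join of two partial results on the semantic key variable."""
    index = {}
    for row in suffix_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in prefix_rows:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined

# Answers of the prefix query from the source published above.
prefix_rows = [{"t": "Intervention Divine", "e": "Suleiman", "f": "Yitzak"}]

# Answers of the suffix query from some other source (made-up values).
suffix_rows = [
    {"t": "Intervention Divine", "g": 2002},
    {"t": "Another Film", "g": 1999},
]

answers = join_on_key(prefix_rows, suffix_rows)
print(answers)
```

Since the key is defined on the ontology concept FILM, the two sources do not need to agree on any XML-level identifier; they only need mapping rules that reach the title.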
Conclusions and Contributions
• Adding semantics to XML
  • Ontology = rich description of the domain of interest
  • Simple but powerful mapping language that associates XPath location paths with ontology paths
  • Semantic keys for XML data integration
• Integration system for XML: STYX prototype
  • Implementation of the query rewriting and query decomposition algorithms
  • Web application