Unify the Description of Web Services and RDF Data Sources towards Uniform Data Integration
Wenfeng Zhao (zhaowenfeng@gmail.com), Xiangwu Meng, Junliang Chen and Chuanchang Liu
State Key Lab of Networking and Switching Technology, Beijing University of Posts and Telecommunications (BUPT)
Outline • Purpose and Background • Proposed Approach • Implementation
Requirements for integration
• The theme of SOA: integration
  • Data integration
  • Process integration
• Web Services vs. Data Integration
  • Usually, WS are seen as the underlying interoperation facility of DI systems, among others.
  • In fact, some WS exist just to provide data to users, especially dynamic data such as weather forecasts and merchandise quotations.
So what if …
• this kind of WS* were integrated into the DI system as a data source, just like a database?
[Figure label: Data Queries]
* “Web Service (WS)” in the following always refers to this kind, i.e. an information-providing Web Service.
Traditional DI technologies
• Data Warehouse: copy (and transform) data in the background; answer queries immediately. (ETL: Extract, Transform, and Load)
• “Federated system”: submit queries to the data sources at runtime, given that either the local schemas or the global one is defined as a view over the other (i.e. LAV or GAV).
We will follow this style.
Current WS technologies aren’t adequate
To describe the data schema, i.e. the semantics of the WS:
• WSDL / BPEL / WS-CDL don’t work.
• Semantic Web Service technologies are also inadequate!
  • OWL-S / WSMO / SAWSDL, which treat Input/Output as concepts, are too coarse-grained, at least in the context of data queries.
How should a service that provides book information for a given topic be annotated?
  bookQuery {
    [Input]  topic: string;
    [Output] title: string; author: string; pub_year: gYear;
  }
Like this?
  bookQuery {
    [Input]  topic -> Book
    [Output] title -> ?
             author -> Person
             pub_year -> ?
  }
Or like this?
  bookQuery {
    [Input]  topic -> Book
    [Output] title, author, pub_year -> Book
  }
Given a typical Description Logic class:
  Class Book {
    topic: string; title: string; author: Person[]; pub_year: gYear; isbn: ISBN; …
  }
Both are inaccurate!
Nor are current Semantic Web technologies
• RDF is the well-accepted data format standard of the Semantic Web.
• SPARQL is the most promising query language for RDF.
• Considered as a view, a SPARQL query is nearly capable of expressing the semantics of an information-providing Web Service.
A data query/view (in SQL):
  SELECT title, author, publisher, pub_year
  FROM book
  WHERE topic = 'Semantic Web' and pub_year > 2005 and topic…
• But, like SQL, SPARQL can’t describe the meaning of the Input of a Web Service!
  • In the previous example WS, the input “topic” isn’t bound in advance but must be bound to a concrete value at invocation time.
  • How can SPARQL/SQL describe this type of “view”?
Outline • Purpose and Background • Proposed Approach • Implementation
Proposed Approach – Uniform Query
• (From the WS viewpoint) Annotate Input/Output at the property level:
  1. Introduce local individuals of certain classes, connected by Object-Properties;
  2. Annotate each I/O leaf element of each message part with a Datatype-Property of an individual.
On the previous example:
  bookQuery {
    [Input]  topic -> bk.topic
    [Output] title -> bk.title
             author -> ath.hasName
             pub_year -> bk.publishYear
             amazonRank -> bk.amazonRank
  }
[Figure: A Uniform Query, with the selection condition bk.publishYear>1900 …]
<UniformQuery>
  <Concepts>
    <Concept hasURI="http://examples.org/travel#Book" />
    <Concept hasURI="http://examples.org/travel#Person" />
  </Concepts>
  <Individuals>
    <Individual name="_:bk" typeIndex="0" />
    <Individual name="_:ath" typeIndex="1" />
  </Individuals>
  <Connections>
    <Link from="0" label="http://examples.org/travel#author" to="1" />
  </Connections>
  <SelectionConditions>
    <Selector indvIndex="0" property="http://examples.org/travel#publishYear">
      <ValueScope>
        <LowBound isInclusive="false">1900</LowBound>
      </ValueScope>
    </Selector>
  </SelectionConditions>
  <Outputs>
    <Field indvIndex="0" property="http://examples.org/travel#amazonRank" />
    <Field indvIndex="0" property="http://examples.org/travel#publishYear" />
    <Field indvIndex="0" property="http://examples.org/travel#title" />
    <Field indvIndex="1" property="http://examples.org/travel#hasName" />
  </Outputs>
  <Inputs>
    <Field indvIndex="0" property="http://examples.org/travel#hasTopic" />
  </Inputs>
</UniformQuery>
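To make the index-based references in the Uniform Query document concrete, here is a minimal Python sketch that reads a shortened copy of it and resolves each `indvIndex` and `typeIndex` to a name. The element and attribute names come from the slide; the function name `load_uniform_query` and the returned dictionary layout are assumptions made for illustration, not part of USDIS.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the Uniform Query from the slide (connections and
# selection conditions omitted to keep the sketch small).
UQ_XML = """
<UniformQuery>
  <Concepts>
    <Concept hasURI="http://examples.org/travel#Book" />
    <Concept hasURI="http://examples.org/travel#Person" />
  </Concepts>
  <Individuals>
    <Individual name="_:bk" typeIndex="0" />
    <Individual name="_:ath" typeIndex="1" />
  </Individuals>
  <Outputs>
    <Field indvIndex="0" property="http://examples.org/travel#title" />
    <Field indvIndex="1" property="http://examples.org/travel#hasName" />
  </Outputs>
  <Inputs>
    <Field indvIndex="0" property="http://examples.org/travel#hasTopic" />
  </Inputs>
</UniformQuery>
"""

def load_uniform_query(xml_text):
    """Hypothetical loader: resolve index references into (individual, property) pairs."""
    root = ET.fromstring(xml_text.strip())
    concepts = [c.get("hasURI") for c in root.find("Concepts")]
    # Each individual is (name, class URI), looked up through typeIndex.
    individuals = [(i.get("name"), concepts[int(i.get("typeIndex"))])
                   for i in root.find("Individuals")]

    def fields(section):
        # Each field names an individual (by index) and one of its datatype properties.
        return [(individuals[int(f.get("indvIndex"))][0], f.get("property"))
                for f in root.find(section)]

    return {"individuals": individuals,
            "outputs": fields("Outputs"),
            "inputs": fields("Inputs")}

uq = load_uniform_query(UQ_XML)
print(uq["inputs"])
```

The only input field resolves to the `hasTopic` property of `_:bk`, which is the property-level annotation of the service's "topic" parameter.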
Grounding to WS – e.g. Amazon ECS
<ServiceStubAnnotation stubName="com.amazon.webservices.AWSECommerceService._2005_10_13.AWSECommerceServicePortType">
  <Operation name="itemSearch">
    <Binding uniformQueryID="2">
      <InputMapping>
        <Path>body/request[0]/keywords</Path>
      </InputMapping>
      <OutputMapping>
        <Path>_return/items[0]/item[0]/salesRank</Path>
        <Path>_return/items[0]/item[0]/itemAttributes/publicationDate</Path>
        <Path>_return/items[0]/item[0]/itemAttributes/title</Path>
        <Path>_return/items[0]/item[0]/itemAttributes/author[..]</Path>
      </OutputMapping>
      <FixedInputs>
        <ValueGiven path="body/request[0]/searchIndex" value="Books" />
        <ValueGiven path="body/request[0]/responseGroup[0]" value="Medium" />
        <ValueGiven path="body/subscriptionId" value="1CXXQY……" />
      </FixedInputs>
    </Binding>
  </Operation>
</ServiceStubAnnotation>
Partial annotation (as in SAWSDL)!
Uniform Query (cont.)
• From the SW viewpoint, Uniform Query augments SPARQL with the concept of “access pattern limitation” [Hal01] for a view, which describes the counterpart of a WS “Input”,
  • i.e. which variables/fields/properties, if any, must be bound when querying a view.
• So a SPARQL query and a WS have a unified form.
• With a proper matching process between Uniform Queries, a WS can now be chosen to answer a SPARQL-like query.
  • Although the query has no explicit “Inputs”, they can be derived from its “SelectionConditions”: the fields that are limited to a single value.
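The last point, deriving a query's inputs from its selection conditions, can be sketched in a few lines of Python. This is an illustration under assumptions: the `Selector` class, its field names and `derive_inputs` are hypothetical and only loosely mirror the `<Selector>` / `<ValueScope>` elements of the XML form.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Selector:
    """Hypothetical model of one selection condition on a datatype property."""
    individual: str                 # e.g. "bk"
    prop: str                       # e.g. "hasTopic"
    low: Optional[object] = None    # None means no lower bound
    high: Optional[object] = None   # None means no upper bound
    low_inclusive: bool = True
    high_inclusive: bool = True

    def single_value(self):
        """Return the value if the field is limited to exactly one value, else None."""
        if (self.low is not None and self.low == self.high
                and self.low_inclusive and self.high_inclusive):
            return self.low
        return None

def derive_inputs(selectors):
    """Collect every field that a selection condition pins to a single value."""
    inputs = {}
    for s in selectors:
        value = s.single_value()
        if value is not None:
            inputs[(s.individual, s.prop)] = value
    return inputs

query_selectors = [
    Selector("bk", "hasTopic", low="Semantic Web", high="Semantic Web"),
    Selector("bk", "publishYear", low=2005, low_inclusive=False),
]
print(derive_inputs(query_selectors))
```

Here only `bk.hasTopic` qualifies as an input, because `bk.publishYear` is restricted to a range rather than to one value.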
Matching Criteria
A Web-Service-type data source WS matches a data query Q if there exists an injective mapping from the individuals of Q.Qps to those of WS.Qps under which:
1) Q.Qps is entailed by WS.Qps (i.e. as directed graphs, Q.Qps is a sub-graph of WS.Qps);
2) the value spaces of Q.Sel and WS.Sel have a non-empty intersection;
3) Q.O is (at least partially) entailed by WS.O; and
4) WS.I is completely entailed by Q.I.
The result of invoking a matched service might need to be filtered, according to the difference between WS.Sel and Q.Sel, before it is returned to the user.
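The four criteria can be illustrated with a brute-force Python sketch. It is not the USDIS matcher, and it rests on several assumptions: Qps is read as the pattern of typed individuals plus their links, value scopes are simplified to closed ranges `(low, high)` with `None` for "unbounded", Q.I is derived from single-value selections, and short names such as `Book` stand in for full URIs. The dictionary layout and the names `overlap` and `match` are hypothetical.

```python
from itertools import permutations

def overlap(a, b):
    """True if two closed ranges (low, high) intersect; None means unbounded."""
    lows = [x for x in (a[0], b[0]) if x is not None]
    highs = [x for x in (a[1], b[1]) if x is not None]
    return not lows or not highs or max(lows) <= min(highs)

def match(q, ws):
    """Return an injective mapping of Q individuals to WS individuals, or None."""
    q_names = list(q["individuals"])
    for image in permutations(ws["individuals"], len(q_names)):
        m = dict(zip(q_names, image))
        # 1) Pattern: same classes, and every link of Q also exists in WS.
        if any(q["individuals"][i] != ws["individuals"][m[i]] for i in q_names):
            continue
        if any((m[a], label, m[b]) not in ws["links"] for a, label, b in q["links"]):
            continue
        # 2) Selections on the same field must have a non-empty intersection.
        q_sel = {(m[i], p): rng for (i, p), rng in q["sel"].items()}
        if any(f in ws["sel"] and not overlap(rng, ws["sel"][f])
               for f, rng in q_sel.items()):
            continue
        # 3) At least part of the requested outputs is provided by the WS.
        if not {(m[i], p) for i, p in q["outputs"]} & ws["outputs"]:
            continue
        # 4) Every WS input is bound by a single-value selection of the query.
        q_inputs = {f for f, (lo, hi) in q_sel.items() if lo is not None and lo == hi}
        if ws["inputs"] <= q_inputs:
            return m
    return None

ws = {
    "individuals": {"bk": "Book", "ath": "Person"},
    "links": {("bk", "author", "ath")},
    "sel": {("bk", "publishYear"): (1900, None)},
    "outputs": {("bk", "amazonRank"), ("bk", "publishYear"),
                ("bk", "title"), ("ath", "hasName")},
    "inputs": {("bk", "hasTopic")},
}
q = {
    "individuals": {"v0": "Book", "v1": "Person"},
    "links": {("v0", "author", "v1")},
    "sel": {("v0", "hasTopic"): ("Semantic Web", "Semantic Web"),
            ("v0", "publishYear"): (2005, None)},
    "outputs": {("v0", "title"), ("v1", "hasName")},
}
print(match(q, ws))
```

Trying every injective assignment is exponential in the number of individuals, which is acceptable only for the handful of individuals in a typical query; exclusive bounds such as `isInclusive="false"` would need a finer range test than `overlap`.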
Uniform Query to SPARQL query
• The transformation is straightforward. For example, a data query in UQ that is similar to, and can be answered by, the previous WS is transformed into SPARQL as:
  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  SELECT ?o0 ?o1 ?o2 ?o3
  WHERE {
    ?v0 rdf:type <http://examples.org/travel#Book> .
    ?v1 rdf:type <http://examples.org/travel#Person> .
    ?v0 <http://examples.org/travel#author> ?v1 .
    OPTIONAL { ?v0 <http://examples.org/travel#amazonRank> ?o0 }
    OPTIONAL { ?v0 <http://examples.org/travel#publishYear> ?o1 }
    OPTIONAL { ?v0 <http://examples.org/travel#title> ?o2 }
    OPTIONAL { ?v1 <http://examples.org/travel#hasName> ?o3 }
    ?v0 <http://examples.org/travel#hasTopic> ?l1 .
    FILTER ( ?l1="Semantic Web" && ?o1 > "2005" )
  }
[Figure: the Uniform Query, with the selection condition bk.publishYear>2005 …]
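A transformation of this kind can be sketched as a small Python string generator. The dictionary layout of the Uniform Query, the function name `to_sparql` and the variable-naming scheme (`?vN` for individuals, `?oN` for outputs, `?lN` for selected-but-not-returned fields) are assumptions modelled on the query shown on the slide, not the USDIS code; selection values are passed as ready-made SPARQL literals to keep the sketch short.

```python
NS = "http://examples.org/travel#"

def to_sparql(uq):
    """Hypothetical Uniform Query -> SPARQL text generator."""
    out_var = {field: f"?o{k}" for k, field in enumerate(uq["outputs"])}
    lines = [
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
        "SELECT " + " ".join(out_var.values()),
        "WHERE {",
    ]
    # One typing triple per individual, one triple per object-property link.
    lines += [f"  ?v{i} rdf:type <{c}> ." for i, c in enumerate(uq["individuals"])]
    lines += [f"  ?v{a} <{label}> ?v{b} ." for a, label, b in uq["links"]]
    # Output fields are optional, as in the query on the slide.
    lines += [f"  OPTIONAL {{ ?v{i} <{p}> {var} }}" for (i, p), var in out_var.items()]
    conditions = []
    for k, (i, p, op, literal) in enumerate(uq["selections"]):
        var = out_var.get((i, p))
        if var is None:  # selected but not returned: bind a fresh variable
            var = f"?l{k}"
            lines.append(f"  ?v{i} <{p}> {var} .")
        conditions.append(f"{var} {op} {literal}")
    if conditions:
        lines.append("  FILTER ( " + " && ".join(conditions) + " )")
    lines.append("}")
    return "\n".join(lines)

uq = {
    "individuals": [NS + "Book", NS + "Person"],
    "links": [(0, NS + "author", 1)],
    "outputs": [(0, NS + "amazonRank"), (0, NS + "publishYear"),
                (0, NS + "title"), (1, NS + "hasName")],
    "selections": [(0, NS + "hasTopic", "=", '"Semantic Web"'),
                   (0, NS + "publishYear", ">", '"2005"')],
}
print(to_sparql(uq))
```

The generated text has the same shape as the slide's query, except that the fresh variable is numbered by the position of its selection (`?l0` here rather than `?l1`).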
Outline • Purpose and Background • Proposed Approach • Implementation
Proof-of-concept: USDIS* (Uniform Semantic Data Integration System)
[Figure label: Uniform Query]
* Source code is available at http://usdis.googlecode.com
Mechanism of Dynamic WS invocation
An invocation is driven by:
1) a set of <namePath, value> pairs for the Input, and
2) a set of <namePath> entries for the Output.
It is based on the Java reflection mechanism.
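The name-path idea can be illustrated with a short Python analogue. USDIS itself relies on Java reflection over generated service stubs; the sketch below swaps in Python's `getattr`/`setattr` to play the same role. The helper names, the `fake_item_search` stand-in for a real service stub and the sample values are all hypothetical, and the `[..]` wildcard seen in the grounding example is not handled.

```python
import re
from types import SimpleNamespace

# A path step is a field name with an optional list index, e.g. "request[0]".
STEP = re.compile(r"^(\w+)(?:\[(\d+)\])?$")

def _step(obj, step):
    name, index = STEP.match(step).groups()
    value = getattr(obj, name)  # look the field up by name at runtime
    return value if index is None else value[int(index)]

def get_path(root, path):
    """Follow a name path such as '_return/items[0]/item[0]/salesRank'."""
    for step in path.split("/"):
        root = _step(root, step)
    return root

def set_path(root, path, value):
    """Assign a value to the leaf addressed by a name path."""
    *parents, last = path.split("/")
    for step in parents:
        root = _step(root, step)
    name, index = STEP.match(last).groups()
    if index is None:
        setattr(root, name, value)
    else:
        getattr(root, name)[int(index)] = value

def fake_item_search(body):
    """Hypothetical stand-in for a service stub operation."""
    attributes = SimpleNamespace(title=f"A book about {body.request[0].keywords}")
    item = SimpleNamespace(salesRank=1234, itemAttributes=attributes)
    return SimpleNamespace(items=[SimpleNamespace(item=[item])])

# 1) <namePath, value> pairs fill in the request message.
request = SimpleNamespace(keywords=None, searchIndex=None)
call = SimpleNamespace(body=SimpleNamespace(request=[request]), _return=None)
input_pairs = {"body/request[0]/keywords": "Semantic Web",
               "body/request[0]/searchIndex": "Books"}
for path, value in input_pairs.items():
    set_path(call, path, value)

call._return = fake_item_search(call.body)

# 2) <namePath> entries pick the wanted values out of the response.
output_paths = ["_return/items[0]/item[0]/salesRank",
                "_return/items[0]/item[0]/itemAttributes/title"]
outputs = [get_path(call, p) for p in output_paths]
print(outputs)
```

Because the paths are plain data, the same two helpers can drive any operation whose request and response are ordinary nested objects, which is what makes the invocation dynamic.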
Note
• In the current version of USDIS, the expressivity of Uniform Query is lower than that of SPARQL, especially in the selection conditions.
• The major limitation is that only tree-like selection conditions are supported, i.e. attribute values can only be constrained individually rather than in relation to one another.
• This simplification makes the query-formulation GUI easy to design and use.
Query Result – Integration Realized!
• Data from multiple data sources are aggregated!
Reference
• [Hal01] Halevy, A.Y.: Answering queries using views: A survey. VLDB Journal 10(4), 2001, pp. 270-294.
Q&A Thanks! USDIS source-code: http://usdis.googlecode.com Wenfeng ZHAO (赵文峰) PhD Candidate zhaowenfeng@gmail.com