Applying Semantic Web Technologies to Web Services. Vadim Eisenberg, December 2008. Seminar in Databases (236826).
Outline • Introduction • Semantic Web • Web Services • Applying Semantic Web technologies to Web Services • “Composing Web Services on the Semantic Web”, B. Medjahed et al., VLDB Journal, 2003 • “Software Framework for Matchmaking Based on Semantic Web Technology”, L. Li et al., International Journal of Electronic Commerce, 2004
Semantic Web • Today, the data on the Web is “understandable” only by humans • or by custom-developed software applications • The vision behind the Semantic Web is that the data on the Web should be processed and “understood” by computers • “... the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines.” Tim Berners-Lee, 1999
Example - Analyzing Financial Data on the Web • The need: a software application that analyzes data about financial products from different investment companies • For example, data about ETFs • An ETF (Exchange-Traded Fund) is a low-cost investment fund that holds assets such as stocks or bonds and usually tracks an index.
Example: the Setting • Several Israeli investment companies provide ETF products • The differences • the tracked index/indices • the management fee • the dividend distribution policy • the base currency (NIS or other) • the ranking of the product
Example: the Goal • Our application must find the most suitable ETF • A more ambitious goal would be to enable the application to buy ETFs • Software applications that act on behalf of the user are called software agents
The Data: the Problems • The data is poorly structured • HTML • binary formats - PDF • The content and the layout are mixed • There is no common structure across the different investment companies • Different terminology is used for the same entities
The Problems • Writing an ETF-analyzing application is a hard and tedious task • Custom parsers must be written for every page • The parsers will break once the pages change • Custom rules and a taxonomy for ETFs must be developed • modeling information about stocks, bonds, currencies, states etc. • custom reasoning rules/query engines must be developed for the application
A Step to a More Computer-Friendly Web: XML • A universal format for structured data - XML • easy for parsing • strict rules • a tree representation • easy for automatic validation • XML schema, DTD • the tools automatically validate the data • easy for querying • several query languages exist - XPath, XQuery
A Step to a More Computer-Friendly Web: XML • easy to transform to other formats • XSLT • plenty of free tools/software packages/APIs exist for all of the above • APIs for programming languages: JAXP in Java, LINQ to XML (XLinq) in C# • easy application development
XML Examples
<ETF>
  <Title> Kesem Nikkei </Title>
  <BaseIndex> Nikkei 225 </BaseIndex>
  <BaseCurrency> NIS </BaseCurrency>
  <SecurityNumber> 1099464 </SecurityNumber>
  <DividendDistributionPolicy> coefficient T = 2.2% added to the value of the ETF </DividendDistributionPolicy>
  <AnualManagementFee> 0% </AnualManagementFee>
</ETF>
<ETF SecurityNumber="1095736">
  <Title> Tachlit Nikkei </Title>
  <TrackedIndex> Nikkei 225 </TrackedIndex>
  <BaseCurrency> NIS </BaseCurrency>
  <DividendPolicy units="%"> 100 </DividendPolicy>
  <ManagementFee period="year" units="%"> 0.65 </ManagementFee>
</ETF>
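A minimal sketch, using Python's standard `xml.etree.ElementTree`, of how an application might read both documents above. The function name `read_etf` and the output keys are assumptions made for illustration. Every difference between the two vendors (element vs. attribute, `BaseIndex` vs. `TrackedIndex`, fee with or without a `%` sign) needs its own fallback.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the two example documents from the slide.
KESEM = """
<ETF>
  <Title>Kesem Nikkei</Title>
  <BaseIndex>Nikkei 225</BaseIndex>
  <BaseCurrency>NIS</BaseCurrency>
  <SecurityNumber>1099464</SecurityNumber>
  <AnualManagementFee>0%</AnualManagementFee>
</ETF>
"""

TACHLIT = """
<ETF SecurityNumber="1095736">
  <Title>Tachlit Nikkei</Title>
  <TrackedIndex>Nikkei 225</TrackedIndex>
  <BaseCurrency>NIS</BaseCurrency>
  <ManagementFee period="year" units="%">0.65</ManagementFee>
</ETF>
"""


def read_etf(xml_text):
    """Normalize one ETF document into a plain dict (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # The security number is a child element in one document
    # and an attribute of the root element in the other.
    number = root.findtext("SecurityNumber") or root.get("SecurityNumber")
    # The same concept appears under two different element names.
    index = root.findtext("BaseIndex") or root.findtext("TrackedIndex")
    fee = root.findtext("AnualManagementFee") or root.findtext("ManagementFee")
    return {
        "title": root.findtext("Title").strip(),
        "number": number.strip(),
        "index": index.strip(),
        "fee_percent": float(fee.strip().rstrip("%")),
    }


if __name__ == "__main__":
    for doc in (KESEM, TACHLIT):
        print(read_etf(doc))
```

Parsing itself is easy, but the fallbacks have to be written by hand for each new vendor vocabulary.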
XML + XSLT: Separation of Content from Layout • XML documents can carry information about how to display them - references to XSL stylesheets • this separates the data proper from the way the data is displayed • the XML data can be processed conveniently by machines • and by humans • the XML data can be displayed to humans in a convenient way using XSL stylesheets
Example of XSLT: the Stylesheet
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="ETF">
    <html>
      <head><title><xsl:value-of select="Title"/></title></head>
      <body>
        <TABLE>
          <TBODY>
            <TR> <TD>Security Name: <SPAN STYLE="color:blue"><b><xsl:value-of select="Title"/></b></SPAN></TD></TR>
            <TR> <TD>Based on Index: <SPAN STYLE="color:red"><b><xsl:value-of select="BaseIndex"/></b></SPAN></TD></TR>
            <TR> <TD>Security Number: <SPAN STYLE="color:blue"><b><xsl:value-of select="SecurityNumber"/></b></SPAN></TD></TR>
            <TR> <TD>Dividend Distribution Policy: <SPAN STYLE="color:blue"><i><xsl:value-of select="DividendDistributionPolicy"/></i></SPAN></TD></TR>
            <TR> <TD>Annual Management Fee: <SPAN STYLE="color:blue"><b><i><xsl:value-of select="AnualManagementFee"/></i></b></SPAN></TD></TR>
          </TBODY>
        </TABLE>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
XML + XSL [Diagram: a web site serves XML and an XSL stylesheet to a web browser with XSLT support.]
XML Representation - the Problems • Lack of common, unique terms • Lack of common structure (e.g. attributes vs. elements) • Lack of semantics • it is not clear from the document that an ETF is a kind of security, like trust funds, stocks and bonds
The Next Step to a More Computer-Friendly Web: RDF • Adding semantics to documents: • RDF - Resource Description Framework • A resource is anything that has a URI • URI - Uniform Resource Identifier • Unique terms are expressed as URIs, which are not necessarily related to Web pages! • For example: http://www.isa.gov.il/ETF#BaseCurrency http://www.isa.gov.il/ETF#ManagementFee http://www.isa.gov.il/Securities#SecurityNumber
RDF • The RDF data model is based on triples: (subject, predicate, object) • subjects and predicates are URIs (a subject may also be a blank node) • the object can be either a URI or a literal value
RDF Examples
(http://www.xnes.co.il/ETF#KesemNikkei, http://www.isa.gov.il/Securities#SecurityNumber, 1099464)
(http://www.xnes.co.il/ETF#KesemNikkei, http://www.isa.gov.il/ETF#ManagementFee, 0)
(http://www.xnes.co.il/ETF#KesemNikkei, http://www.isa.gov.il/ETF#BaseCurrency, http://www.currencies.com/NIS)
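The triple model can be sketched in a few lines of plain Python: each triple is a tuple, and a lookup is a pattern in which `None` acts as a wildcard. This is only an illustration of the data model, not an RDF library; the helper name `match` is an assumption.

```python
# URIs taken from the examples on the slide.
KESEM = "http://www.xnes.co.il/ETF#KesemNikkei"
SECURITY_NUMBER = "http://www.isa.gov.il/Securities#SecurityNumber"
MANAGEMENT_FEE = "http://www.isa.gov.il/ETF#ManagementFee"
BASE_CURRENCY = "http://www.isa.gov.il/ETF#BaseCurrency"
NIS = "http://www.currencies.com/NIS"

# Each triple is (subject, predicate, object).
# Objects may be URIs (strings here) or literal values (numbers here).
TRIPLES = [
    (KESEM, SECURITY_NUMBER, 1099464),
    (KESEM, MANAGEMENT_FEE, 0),
    (KESEM, BASE_CURRENCY, NIS),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every non-None position."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in triples
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


if __name__ == "__main__":
    # Everything known about Kesem Nikkei.
    for triple in match(TRIPLES, s=KESEM):
        print(triple)
    # Which resources have NIS as their base currency?
    print([s for (s, _, _) in match(TRIPLES, p=BASE_CURRENCY, o=NIS)])
```

Because every term is a globally unique URI, triples from different sites can be appended to the same list without renaming anything.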
RDF Formats • RDF can be serialized as XML (RDF/XML) • XML is not the only format! Others include N-Triples and Turtle
RDF/XML Example
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:isa="http://www.isa.gov.il/ETF#"
         xmlns:cur="http://www.currencies.com/"
         xml:base="http://www.xnes.co.il/ETF">
  <rdf:Description rdf:about="#KesemNikkei">
    <isa:SecurityNumber>1099464</isa:SecurityNumber>
    <isa:ManagementFee>0</isa:ManagementFee>
    <isa:BaseCurrency rdf:resource="http://www.currencies.com/NIS" />
  </rdf:Description>
</rdf:RDF>
Integrating RDF with the Current Web • Embedding metadata in HTML • Embedding RDF directly (RDFa) • Embedding metadata (Microformats) which can be transformed to RDF (GRDDL) • Publishing RDF + XSL
Publishing HTML with RDF embedded [Diagram: a web site serves HTML with embedded RDF to a web browser with XSLT support.]
Embedding RDF in HTML
<html>
  <head><title>Kesem Nikkei</title></head>
  <body>
    <div xmlns:isa="http://www.isa.gov.il/ETF#" xmlns:cur="http://www.currencies.com/" about="http://www.xnes.co.il/ETF#KesemNikkei">
      <TABLE>
        <TBODY>
          <TR> <TD>Security Name: <SPAN STYLE="color:blue"><b>Kesem Nikkei</b></SPAN></TD></TR>
          <TR> <TD>Based on Index: <SPAN STYLE="color:red"><b>Nikkei 225</b></SPAN></TD></TR>
          <TR> <TD>Security Number: <SPAN STYLE="color:blue"><b><SPAN property="isa:SecurityNumber">1099464</SPAN></b></SPAN></TD></TR>
          <TR> <TD>Annual Management Fee: <SPAN STYLE="color:blue"><b><i><SPAN property="isa:ManagementFee">0%</SPAN></i></b></SPAN></TD></TR>
          <TR> <TD>Based on Currency: <SPAN STYLE="color:blue"><b><i><SPAN property="isa:BaseCurrency">NIS</SPAN></i></b></SPAN></TD></TR>
        </TBODY>
      </TABLE>
    </div>
  </body>
</html>
Publishing RDF + XSL [Diagram: a web site serves RDF/XML and an XSL stylesheet to a web browser with XSLT support.]
RDF: a Graph Representation [Figure: an RDF graph. Nodes: xnes:KesemNikkei, tih:TachlitNikkei, stocks:Nikkei225, cur:NIS, cur:Yen, geo:Israel, geo:Japan, geo:Europe, geo:Asia, and the literals 0 and 0.65. Edge labels: isa:BaseCurrency, isa:BaseIndex, isa:ManagementFee, cur:state, geo:continent.]
SPARQL - a Query Language for RDF • SPARQL - SPARQL Protocol and RDF Query Language (a recursive acronym) • Executes queries over RDF graphs • Basic query structure - SELECT ... WHERE ... • Uses URIs • Data integration is built into the language: • queries over multiple graphs • A query language with a web protocol • web service specification
SPARQL Example • Suppose we want to find all ETFs that are based on NIS and have a management fee of less than 0.5%. We want the query to return the security numbers of the ETFs. • SPARQL query:
PREFIX isa: <http://www.isa.gov.il/ETF#>
PREFIX cur: <http://www.currencies.com/>
SELECT ?etfNumber
WHERE {
  ?etf isa:SecurityNumber ?etfNumber ;
       isa:BaseCurrency cur:NIS ;
       isa:ManagementFee ?fee .
  FILTER (?fee < 0.5)
}
Additional SPARQL Example • Suppose we want to find all ETFs that are based on currencies of Asian states. We want the query to return the security numbers of the ETFs and the currencies they are based on. • SPARQL query:
PREFIX isa: <http://www.isa.gov.il/ETF#>
PREFIX cur: <http://www.currencies.com/>
PREFIX geo: <http://www.geoinfo.com/#>
SELECT ?etfNumber ?currency
WHERE {
  ?etf isa:SecurityNumber ?etfNumber ;
       isa:BaseCurrency ?currency .
  ?currency cur:state ?state .
  ?state geo:continent geo:Asia .
}
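To make the evaluation of the two queries concrete, here is a rough sketch in plain Python of how the triple patterns in a WHERE clause can be matched against a graph. It is not a SPARQL engine: terms are written as prefixed strings, the functions `unify` and `solve` are hypothetical names, and `ex:YenFund` with its number and fee is made-up sample data added so that both queries return something.

```python
# Sample graph; terms are kept as prefixed names for brevity.
# "ex:YenFund" and its values are invented for this example.
TRIPLES = [
    ("xnes:KesemNikkei", "isa:SecurityNumber", "1099464"),
    ("xnes:KesemNikkei", "isa:BaseCurrency", "cur:NIS"),
    ("xnes:KesemNikkei", "isa:ManagementFee", 0.0),
    ("tih:TachlitNikkei", "isa:SecurityNumber", "1095736"),
    ("tih:TachlitNikkei", "isa:BaseCurrency", "cur:NIS"),
    ("tih:TachlitNikkei", "isa:ManagementFee", 0.65),
    ("ex:YenFund", "isa:SecurityNumber", "9999999"),
    ("ex:YenFund", "isa:BaseCurrency", "cur:Yen"),
    ("ex:YenFund", "isa:ManagementFee", 0.3),
    # Only the currency/state/continent chain needed here is included.
    ("cur:Yen", "cur:state", "geo:Japan"),
    ("geo:Japan", "geo:continent", "geo:Asia"),
]


def unify(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # A variable: bind it, or check it against its earlier value.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def solve(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = unify(first, triple, binding)
        if extended is not None:
            yield from solve(rest, triples, extended)


# First query: NIS-based ETFs with a management fee below 0.5%.
NIS_QUERY = [
    ("?etf", "isa:SecurityNumber", "?etfNumber"),
    ("?etf", "isa:BaseCurrency", "cur:NIS"),
    ("?etf", "isa:ManagementFee", "?fee"),
]
cheap_nis = [b["?etfNumber"] for b in solve(NIS_QUERY, TRIPLES) if b["?fee"] < 0.5]

# Second query: ETFs based on currencies of Asian states.
ASIA_QUERY = [
    ("?etf", "isa:SecurityNumber", "?etfNumber"),
    ("?etf", "isa:BaseCurrency", "?currency"),
    ("?currency", "cur:state", "?state"),
    ("?state", "geo:continent", "geo:Asia"),
]
asian = [(b["?etfNumber"], b["?currency"]) for b in solve(ASIA_QUERY, TRIPLES)]

if __name__ == "__main__":
    print(cheap_nis)
    print(asian)
```

The second query joins three kinds of facts (securities, currencies, geography) simply by sharing variables across patterns, which is the sense in which data integration is built into the language.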
RDF: the Problem • Suppose we want to write a query about all the securities (ETFs, trust funds, stocks and bonds) that are based on the Japanese economy • We need some way for the query language to “understand” that an ETF is a Security
The Next Step of the Semantic Web - Ontologies • Ontology in computer science - a formal representation of a set of concepts within a domain and the relationships between those concepts. (From http://en.wikipedia.org/wiki/Ontology_(computer_science))
The Next Step of the Semantic Web - Ontologies • For example, an ontology can state that any entity that has a given property (the predicate in the subject-predicate-object triple) belongs to some class • the subject of isa:SecurityNumber is a member of isa:Security (the domain of the property) • the object of isa:BaseCurrency is a member of cur:Currency (the range of the property)
The Next Step of the Semantic Web - Ontologies • Stating subclass relationships • similar to inheritance in OOP • For example • isa:ETF and isa:MutualFund are subclasses of isa:Security • isa:BaseCurrency and isa:BaseIndex are subproperties of isa:BasedOn
The Semantic Web Ontology Languages • A number of ontology languages have been developed for the Semantic Web • they are built on RDF • The standard ontology languages (from more primitive to more expressive): • RDF Schema (RDFS) • OWL Lite • OWL DL • OWL Full • The more expressive the ontology language, the more computationally expensive it is to process
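The domain, range, subclass and subproperty statements above can be read as inference rules. The following is a simplified sketch of those four rules applied until nothing new is derived; it is not a complete RDFS reasoner, the function name `rdfs_closure` is an assumption, and terms are written as prefixed strings.

```python
FACTS = {
    # Ontology statements from the slides.
    ("isa:SecurityNumber", "rdfs:domain", "isa:Security"),
    ("isa:BaseCurrency", "rdfs:range", "cur:Currency"),
    ("isa:ETF", "rdfs:subClassOf", "isa:Security"),
    ("isa:BaseCurrency", "rdfs:subPropertyOf", "isa:BasedOn"),
    ("isa:BaseIndex", "rdfs:subPropertyOf", "isa:BasedOn"),
    # Instance data.
    ("xnes:KesemNikkei", "rdf:type", "isa:ETF"),
    ("xnes:KesemNikkei", "isa:BaseCurrency", "cur:NIS"),
    ("xnes:KesemNikkei", "isa:BaseIndex", "stocks:Nikkei225"),
    ("tih:TachlitNikkei", "isa:SecurityNumber", "1095736"),
}


def rdfs_closure(facts):
    """Apply the four rules repeatedly until a fixpoint is reached."""
    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            for s2, p2, o2 in known:
                if s2 == p and p2 == "rdfs:subPropertyOf":
                    new.add((s, o2, o))           # inherit the super-property
                if s2 == p and p2 == "rdfs:domain":
                    new.add((s, "rdf:type", o2))  # the subject gets the domain class
                if s2 == p and p2 == "rdfs:range":
                    new.add((o, "rdf:type", o2))  # the object gets the range class
                if p == "rdf:type" and s2 == o and p2 == "rdfs:subClassOf":
                    new.add((s, "rdf:type", o2))  # a member of the subclass is a member of the superclass
        if new <= known:
            return known
        known |= new


if __name__ == "__main__":
    for triple in sorted(rdfs_closure(FACTS) - FACTS):
        print(triple)
```

After the closure, a query for members of `isa:Security` finds both ETFs, and a query on `isa:BasedOn` finds the Nikkei 225 index without knowing that it was stated as `isa:BaseIndex`.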
OWL • In OWL it is possible to state in addition to range, domain, subClass and subProperty information: • boolean combination of classes • e.g. Person is either Male or Female • disjointness of classes • e.g. Male and Female are disjoint classes
OWL • cardinality of the objects • e.g. every person has one mother, two parents, four grandparents • characteristics of properties • e.g. isSiblingOf is a symmetric and transitive property • and some other... • An example ontology - http://www.owldl.com/ontologies/family.owl
OWL • OWL is based on Description Logics • a formalism for representing concepts (classes), roles (properties) and relationships between them • has formally defined semantics • several reasoning tools exist • less expressive than First Order Predicate Logic • a compromise between expressiveness and computational efficiency
OWL Reasoning Example • Suppose we have the following ontology: • the domain of the property X hasSister Y is Person • the range of the property X hasSister Y is Female • X hasSister Y is a subproperty of X hasSibling Y • X hasSibling Y is a symmetric and transitive property
OWL Reasoning Example • From the fact “John” hasSister “Mary” and the ontology, it follows that • “John” is a Person • “Mary” is a Female • “Mary” is a sibling of “John” • “John” is a sibling of “Mary”
OWL Reasoning Example • From the additional fact “Mary” hasSibling “Bob”, it follows that • “John” hasSibling “Bob” • “Bob” hasSibling “Mary” • “Bob” hasSibling “John”
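The inferences on the last three slides can be reproduced with a small forward-chaining loop. This is a hand-coded sketch of this one ontology, not a Description Logic reasoner, and the function name `entail` is an assumption.

```python
FACTS = {
    ("John", "hasSister", "Mary"),
    ("Mary", "hasSibling", "Bob"),
}


def entail(facts):
    """Apply the example ontology's axioms until no new fact appears."""
    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            if p == "hasSister":
                new.add((s, "type", "Person"))   # domain of hasSister
                new.add((o, "type", "Female"))   # range of hasSister
                new.add((s, "hasSibling", o))    # hasSister is a subproperty of hasSibling
            if p == "hasSibling":
                new.add((o, "hasSibling", s))    # hasSibling is symmetric
                for s2, p2, o2 in known:
                    if p2 == "hasSibling" and s2 == o:
                        new.add((s, "hasSibling", o2))  # hasSibling is transitive
        if new <= known:
            return known
        known |= new


if __name__ == "__main__":
    for fact in sorted(entail(FACTS) - FACTS):
        print(fact)
```

Note that symmetry together with transitivity also yields facts such as John hasSibling John, which the slides do not list; a real ontology would have to rule these out separately.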
Description Logic - Limitations • It is impossible to define a rule for an “uncle”, which is straightforward in First Order Predicate Logic • Parent(X,Y), Brother(X,Z) → Uncle(Z,Y)
Logical Reasoning in the Semantic Web • Logical reasoning is used in the context of the Semantic Web for: • checking the consistency of ontologies • checking the consistency of data with regard to the ontologies • automatic classification of instances of data • discovering implied relationships between instances of data
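For comparison, outside Description Logic the uncle rule is just a join over two relations on the shared variable X. A minimal sketch, reading Parent(X,Y) as "X is a parent of Y" and Brother(X,Z) as "Z is a brother of X"; the function name `uncles` and the people named are invented for illustration.

```python
def uncles(parent, brother):
    """Parent(X,Y), Brother(X,Z) -> Uncle(Z,Y), as a join on X."""
    return {(z, y) for (x, y) in parent for (x2, z) in brother if x == x2}


# Hypothetical facts: Mary is a parent of Tom, and Bob is Mary's brother.
PARENT = {("Mary", "Tom")}
BROTHER = {("Mary", "Bob")}

if __name__ == "__main__":
    print(uncles(PARENT, BROTHER))
```

The rule needs a variable (X) that links two different properties, and it is this kind of chaining across properties that the slide identifies as beyond the Description Logic used here.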
The Semantic Web: an Optimistic Scenario • All the data in the world is described by RDF or could be converted automatically to an RDF representation • semantic markup • converters to RDF
The Semantic Web: an Optimistic Scenario • All the data could be queried with one query language (e.g. SPARQL) • All the world's knowledge about any domain could be formalized in ontologies published on the Web • for example, ontologies about securities, currencies, geographic information etc.
The Semantic Web: an Optimistic Scenario* [Diagram: software applications query a SPARQL** engine that uses OWL ontologies; the engine works over RDF data, which is obtained from documents (XML, HTML, text...) through RDF extractors/converters and from databases through an SQL-SPARQL “bridge”.] * Influenced by http://www.w3.org/2007/Talks/0223-Bangalore-IH/Slides.html#(36) ** OWL-enabled SPARQL or another query language
The Semantic Web: Applications • Data Integration • Knowledge Management • Semantic Search • Management of multimedia data • Ontology-oriented software development • and others...
The Semantic Web: Applications • Corporate Semantic Web - a high-impact technology according to Gartner 2006 • mainstream adoption in 5-10 years