DataMining versus SemanticWeb Veljko Milutinovic, vm@etf.bg.ac.yu http://galeb.etf.bg.ac.yu/vm This material was developed with financial help of the WUSA fund of Austria.
DataMining versus SemanticWeb • Two different avenues leading to the same goal! • The goal: Efficient retrieval of knowledge, from large compact or distributed databases, or the Internet • What is the knowledge: Synergistic interaction of information (data) and their relationships (correlations). • The major difference: Placement of complexity
Essence of DataMining • Data and knowledge represented with simple mechanisms (typically, HTML) and without metadata (data about data). • Consequently, relatively complex algorithms have to be used (complexity migrated into the retrieval request time). • In return, low complexity at system design time!
Essence of SemanticWeb • Data and knowledge represented with complex mechanisms (typically XML) and with plenty of metadata (a byte of data may be accompanied by a megabyte of metadata). • Consequently, relatively simple algorithms can be used (low complexity at the retrieval request time). • However, large metadata design and maintenance complexity at system design time.
Major Knowledge Retrieval Algorithms (for DataMining) • Neural Networks • Decision Trees • Rule Induction • Memory-Based Reasoning • etc. • Consequently, the stress is on algorithms!
Major Metadata Handling Tools (SemanticWeb) • XML • RDF • Ontology Languages • Verification (Logic + Trust) Efforts in Progress • Consequently, the stress is on tools!
Semantic Web Tutorial Structure (Overview) • Introduction to the Semantic Web • XML Technologies for the Semantic Web • Defining vocabularies with RDF • Ontologies and ontology languages • Challenges for the Semantic Web • References
World Wide Web - Today [Figure labels] Information consumer preferences • Information request preferences • Search Engines (e.g. Google), Information Portals • Indexing, references, collections • Information and Service Providers
Semantic Web - Vision [Figure labels] User Preferences, Calendar • Request/Task Interpretation • Agents: Communication, Negotiation, Planning, Decisions, Proofs • "Trust" Services: Ratings, Signatures, Certificates • Semantically enriched information (S+) • Information and Service Provider
A Definition of the Semantic Web “Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation” Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001
Why? • To use the large amount of information on the Web more effectively • To enable more advanced automated processing on the Web - machines can “understand” the content • Intelligent browsers to help you find what you are looking for • To derive new information from existing information (reasoning) - Virtual global database • Advanced applications and services become possible, e.g. in - e-business - e-government - e-learning
Examples • Context-awareness -- linking based on the meaning of the information elements • Filtering -- you could rate the pages you visit, and the ratings are later used for automatic general recommendations • Annotations -- you could add comments to the information on the Web, and these comments can be shown to other visitors • Privatization -- you can create your own database of information from the Web
[Timeline figure, from human-machine to machine-machine communication] • 1985: SGML, HyTime (Document Exchange Format) • 1990: HTML, HTTP (Foundation of Web today) • 2000: XML, RDF (Self-Describing Documents) • 2010: DAML+OIL, OWL (Shared Terminology, Trusted Web Resources)
Building Blocks of the Semantic Web • Metadata: data about data, i.e. labeling and structuring information in a document • URI: Uniform Resource Identifier, a universal and unique name for any resource, e.g. http://www.something.com/one
Minimalist Design • Making it as simple as possible • Simplicity helps future evolution of Semantic Web
Inference • Deriving new data from existing data • Merging data repositories gives new information • Allows the creation of more powerful applications (intelligent agents) • Unfortunately, complete inference can be achieved only when the semantics is defined formally in a language (e.g. "First Order Predicate Logic" languages)
Tutorial Structure • Introduction to the Semantic Web • XML Technologies for the Semantic Web • Defining vocabularies with RDF • Ontologies and ontology languages • Challenges for the Semantic Web • References
XML Technologies for the Semantic Web • Overview • XML Instances • XML Document Type Definition • XML Linking • XML Schema • XML Query Language
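What is an XML Document?
File Format (Instance):
  <?xml version="1.0"?>
  <a>
    <b id="x1">
      <c>David</c>
      <c>Marie</c>
    </b>
    <d/>
    <b id="x2">
      <c>John</c>
    </b>
  </a>
Tree Structure (Instance): a contains b (id=x1), d and b (id=x2); the first b contains c "David" and c "Marie", the second b contains c "John".
Schema (Document Type Definition, DTD): a contains repeated (*) b and a d; b carries an id attribute and contains repeated (*) c.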
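The relationship between the file format and the tree structure can be checked with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `outline` is an assumption made for this illustration, and the attribute syntax is written as `<b id="x1">`.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<a>
  <b id="x1">
    <c>David</c>
    <c>Marie</c>
  </b>
  <d/>
  <b id="x2">
    <c>John</c>
  </b>
</a>"""

def outline(element, depth=0):
    """Return one indented line per element: tag, attributes, and text if any."""
    attrs = "".join(f" {key}={value}" for key, value in element.attrib.items())
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + attrs + (f": {text}" if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(DOC)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each nesting level of the printed outline corresponds to one level of the tree drawn on the slide.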
The XML Stack • Specific Applications • Standardized Applications: XHTML, SVG, SMIL, P3P, MathML • Layout: XSL, CSS • Hyperlinks: XLink, XPointer • Metadata: RDF, RDFS • API: DOM, SAX • Schemas: XSD, Namespaces • Queries: XPath, XQuery • XML 1.0, DTDs • Locators (URI), Unicode
Example of songs.xml • Example of describing a song in songs.xml using music.dtd
  <song>
    <title>Gipsy song</title>
    <artist>Vlatko Stefanovski</artist>
    <type class="ETHNO"/>
    <download class="YES"/>
    <comments/>
  </song>
• song: parent element defined in music.dtd • title, artist, type, download, comments: child elements defined in music.dtd
Music.dtd
  <!ELEMENT song (title, artist, album?, type, format?, download, comments?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT artist (#PCDATA)>
  <!ELEMENT type EMPTY>
  <!ATTLIST type
    class (CLASSICAL | ROCK | POP | RAP | JAZZ | TECHNO | ETHNO) #REQUIRED>
  <!ELEMENT download EMPTY>
  <!ATTLIST download
    class (YES | NO) "YES">
  <!ELEMENT comments (#PCDATA)>
• song: parent element • title, artist, type, download, comments: child elements • Attributes describe content • (YES | NO): list of values for download
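A validating parser would enforce music.dtd directly. Python's standard-library parser is non-validating, so the sketch below checks only one aspect by hand: whether the children of a song element appear in the order required by the content model `(title, artist, album?, type, format?, download, comments?)`. The function names are assumptions for this illustration, and the check handles only flat sequences with optional (`?`) items, not a full DTD.

```python
import re
import xml.etree.ElementTree as ET

# Content model of <song>, copied from music.dtd.
SONG_MODEL = "title, artist, album?, type, format?, download, comments?"

def model_to_regex(model):
    """Turn a flat, comma-separated DTD sequence into a regex over child tag names."""
    parts = []
    for item in model.split(","):
        name = item.strip()
        if name.endswith("?"):
            # Optional child: zero or one occurrence.
            parts.append("(?:" + re.escape(name[:-1]) + " )?")
        else:
            # Required child: exactly one occurrence.
            parts.append(re.escape(name) + " ")
    return re.compile("".join(parts))

def follows_model(element, model=SONG_MODEL):
    """True if the element's children match the sequence described by the model."""
    child_tags = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(child_tags) is not None

good = ET.fromstring(
    '<song><title>Gipsy song</title><artist>Vlatko Stefanovski</artist>'
    '<type class="ETHNO"/><download class="YES"/><comments/></song>'
)
# Same children, but title and artist are swapped and comments is omitted.
bad = ET.fromstring(
    '<song><artist>Vlatko Stefanovski</artist><title>Gipsy song</title>'
    '<type class="ETHNO"/><download class="YES"/></song>'
)
print(follows_model(good), follows_model(bad))
```

Attribute constraints such as the allowed values of `class` are not checked here; a real validator covers those as well.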
XML Linking • Simple Link • Extended Link • XPointer • Link Group
XPath • A language that enables us to address parts of an XML document (elements, attributes, …) • Select the title elements of the song elements of the catalog element, and all the artist elements in the document: /catalog/song/title | //artist • In this expression, / selects the child element, // selects any element in the document, and | combines several paths • Select the titles of all the song elements of the catalog element that have a download element with the value yes: /catalog/song[download='yes']/title
Also… • Use * to select unknown XML elements: /catalog/*/artist • Use @attribute_name to specify an attribute: //song[@type='classical'] • XPath expressions can be logical or arithmetical: /catalog/song[duration<5] • XPath functions: count(), id(), last(), name(), concat(), string(), translate(), sum(), round(), false(), not(), … For example: /catalog/song[last()] • To select nodes from the XML document (in Internet Explorer, via MSXML): xmlDoc.selectNodes("/catalog/song/title/text()"), where the argument is the path
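Several of these expressions can be tried with Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath: child and descendant steps, `*`, `[@attr='value']`, `[child='text']` and `[last()]`. The union operator `|` and comparisons such as `duration<5` are not part of that subset, so the union is emulated below by concatenating two result lists. The catalog document is a hypothetical one, shaped to match the expressions on the slides rather than music.dtd.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; paths are written relative to the <catalog> root element.
CATALOG = """<catalog>
  <song type="ethno">
    <title>Gipsy song</title>
    <artist>Vlatko Stefanovski</artist>
    <download>yes</download>
  </song>
  <song type="classical">
    <title>Sonata XY</title>
    <artist>Mozart</artist>
    <download>no</download>
  </song>
</catalog>"""

root = ET.fromstring(CATALOG)

# /catalog/song/title
titles = [t.text for t in root.findall("song/title")]

# //artist
artists = [a.text for a in root.findall(".//artist")]

# /catalog/song/title | //artist  (union emulated by list concatenation)
titles_and_artists = titles + artists

# /catalog/song[download='yes']/title
downloadable = [t.text for t in root.findall("song[download='yes']/title")]

# /catalog/*/artist
wildcard_artists = [a.text for a in root.findall("*/artist")]

# //song[@type='classical']
classical = [s.findtext("title") for s in root.findall(".//song[@type='classical']")]

# /catalog/song[last()]
last_song = [s.findtext("title") for s in root.findall("song[last()]")]

print(titles, artists, downloadable, classical, last_song)
```

A full XPath 1.0 engine would accept the slide expressions unchanged, including the union and the numeric comparison.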
XPointer • Locates portions of other XML documents (elements, attributes, …), without the need to place anchors inside those documents (as in HTML) • More robust to changes in the target document • URL + XPath: http://www.music.org/first.xml#xpointer(//song/title[1]) • http://www.music.org/first.xml is the URL of the document we point into • xpointer(//song/title[1]) is the XPointer expression (XPath language)
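The "URL + XPath" structure of such a pointer can be made explicit by splitting it at the fragment separator. This is only a sketch of the decomposition, not an XPointer processor; the function name `split_xpointer` is an assumption, and the URL is written without a slash before `#`.

```python
from urllib.parse import urldefrag

def split_xpointer(pointer):
    """Split 'URL#xpointer(EXPR)' into (document URL, XPath expression)."""
    document_url, fragment = urldefrag(pointer)
    prefix, suffix = "xpointer(", ")"
    if not (fragment.startswith(prefix) and fragment.endswith(suffix)):
        raise ValueError("fragment is not of the form xpointer(...)")
    return document_url, fragment[len(prefix):-len(suffix)]

document_url, expression = split_xpointer(
    "http://www.music.org/first.xml#xpointer(//song/title[1])"
)
print(document_url)
print(expression)
```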
XML Schema • XML Schema defines a class of XML documents • Defines (explains) the datatypes, elements, and attributes • Defines and catalogues vocabularies for classes of XML documents • The document described by an XML schema can be called an instance (parallel to OOP) • The schema language considerably extends the capabilities of XML 1.0 document type definitions (DTDs), most importantly with datatypes
Limitations of DTDs
  <!ELEMENT song (title, artist, album?, type, format?, download, comments?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT artist (#PCDATA)>
  <!ELEMENT type EMPTY>
  <!ATTLIST type
    class (CLASSICAL | ROCK | POP | RAP | JAZZ | TECHNO | ETHNO) #REQUIRED>
  <!ELEMENT download EMPTY>
  <!ATTLIST download
    class (YES | NO) "YES">
  <!ELEMENT comments (#PCDATA)>
• Syntax: not XML • Constructors: element set with content model • Practically no reuse of content models • Datatypes: essentially only "String"
XML Schema Components • An XML Schema is comprised of a set of schema components • There are three groups of components • Primary components - Simple type definitions, Complex type definitions, Attribute declarations, Element declarations • Secondary components - Attribute group definitions, Identity-constraint definitions, Model group definitions, Notation declarations • “Helper” components – Annotations, Model groups, Particles, Wildcards, Attribute Uses
Example: song type definition
  <xsd:complexType name="song">
    <xsd:sequence>
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="artist" type="xsd:string"/>
    </xsd:sequence>
    <xsd:attribute name="length" type="xsd:duration"/>
  </xsd:complexType>
• xsd: prefix used to denote the XML Schema namespace • Figure callouts: Complex type; Type declarations; Simple type; <xsd:choice> … </xsd:choice>
Reusability of schemas • xs:include: includes a schema from another document (copy-paste): <xs:include schemaLocation="collection.xsd"/> • xs:redefine: same, plus it lets you redefine components of the included schema • xs:import: reuses definitions from other namespaces (a system of libraries): <xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="myxml.xsd"/> Now we can reference an external element from the imported namespace in our schema
Tutorial Structure • Introduction to the Semantic Web • XML Technologies for the Semantic Web • Defining vocabularies with RDF • Ontologies and ontology languages • Challenges for the Semantic Web • References
Defining vocabularies with RDF • Motivation for RDF • RDF Instances • Basic concepts and building blocks • Syntax options • Reification • Collections • RDF Schema: Defining your own Vocabularies • Supporting Interoperability with RDF
What do we NOT get from XML? • Superimposing (meta) information: • XML combines metainformation and content • Datatypes that we can "reason" about: • Example: CLASSICAL | ROCK | POP | RAP | JAZZ | TECHNO | ETHNO is just a choice of allowed strings. We cannot represent that DIXIE is a subclass of JAZZ, BLUES overlaps with ROCK, ETHNO • Bottom up reuse of vocabularies • Independently evolved XML Schemas for one and the same thing • How do you model an "address"?
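To make the "datatypes we can reason about" point concrete, here is a minimal sketch of what an enumeration of strings cannot express: a subclass relation that software can follow transitively. Only "DIXIE is a subclass of JAZZ" comes from the slide; the top-level class MUSIC and the other edges are hypothetical additions for the illustration.

```python
# Direct subclass edges: class name -> set of direct superclasses.
SUBCLASS_OF = {
    "DIXIE": {"JAZZ"},   # from the slide
    "JAZZ": {"MUSIC"},   # hypothetical top-level class
    "ROCK": {"MUSIC"},   # hypothetical
}

def is_subclass(cls, ancestor, hierarchy=SUBCLASS_OF):
    """True if cls equals ancestor or reaches it by following subclass edges."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue  # guard against cycles in the hierarchy
        seen.add(current)
        stack.extend(hierarchy.get(current, ()))
    return False

print(is_subclass("DIXIE", "JAZZ"), is_subclass("DIXIE", "MUSIC"), is_subclass("ROCK", "JAZZ"))
```

With a DTD enumeration, a query for JAZZ songs would miss everything labeled DIXIE; with an explicit hierarchy, the relation can be derived.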
RDF: Defining Semantics on the Web • There is a need to describe resources on the Web in a form that can be interpreted by machines across the Web • Interpretation depends on the context of a resource, e.g. Jaguar (car vs. beast) • Using their experience and cognitive abilities, humans may infer the context of a resource in many ways, even if it is not made explicit • Software can interpret context only if it is described explicitly and formally • RDF and the ontology languages building upon RDF provide means to explicate (part of) this context
RDF-Resource Description Framework • Defines a framework for structuring and describing resources like documents in the Semantic Web • Enables the definition of vocabularies for the description of resources in an application domain; • Goals: • Extensibility, interoperability, and reuse of vocabularies; • Improved support for interpretation of data by machines
The RDF Data Model • Simple but powerful data model for the description of resources and the creation of metadata • Consists of three core concepts: • Resource • Property • Statement + Class (in RDF Schema) • Similar to other modeling approaches (e.g. object-oriented modeling), but property-centric, not class-centric
RDF Statement and Graph • Each triple (S, P, O), drawn as node - arc - node, represents an RDF statement • Example: "Gipsy song is performed by Vlatko Stefanovski." • subject (resource): http://www.music.org/songs/g/gipsySong, the song represented by an entry in a (fictitious) song directory • predicate (property): performed by • object (resource or literal): http://www.artist.org/stefanovski, the artist represented by his homepage
Arcs in the RDF Graph An Arc • represents the predicate of an RDF statement • is labeled with a URI referring to an RDF property • is directed, pointing from the subject of a statement to the object of a statement • Example: subject http://www.music.org/songs/g/gipsySong, predicate music:performedby, object http://www.artist.org/stefanovski
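The statement-and-arc model maps naturally onto a set of (subject, predicate, object) tuples. The following is a minimal sketch in plain Python rather than an RDF library; the namespace URI bound to the `music:` prefix and the helper names are assumptions for this illustration.

```python
from collections import namedtuple

# One RDF statement: a directed arc from subject to object, labeled by the predicate.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

MUSIC = "http://www.music.org/schema#"  # hypothetical namespace for the music: prefix
SONG = "http://www.music.org/songs/g/gipsySong"
ARTIST = "http://www.artist.org/stefanovski"

graph = {Triple(SONG, MUSIC + "performedby", ARTIST)}

def objects(graph, subject, predicate):
    """Follow arcs forward: all objects reachable from subject via predicate."""
    return sorted(t.object for t in graph
                  if t.subject == subject and t.predicate == predicate)

def subjects(graph, predicate, obj):
    """Follow arcs backward: all subjects pointing at obj via predicate."""
    return sorted(t.subject for t in graph
                  if t.predicate == predicate and t.object == obj)

print(objects(graph, SONG, MUSIC + "performedby"))
print(subjects(graph, MUSIC + "performedby", ARTIST))
```

Because the graph is just a set of triples, merging two graphs is a set union, which is one reason the model is described as property-centric rather than class-centric.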
RDF Resource • The Resource forms the central concept in RDF • Anything that can be described can act as a resource • Web page, part of web page, web site, book, photograph, persons, … • Resources are identified by a resource identifier - URI (plus optional anchor IDs) • Comparable to an entity (in the Entity Relationship model) or an object (in an object-oriented model)
RDF Property An RDF Property is used to express • A characteristic of a resource or • A binary relation between resources • A predicate in a statement • A property can be compared to a (binary) relationship among entities (in the Entity Relationship model)
Example • The individual whose name is Vlatko Stefanovski and whose email is V.Stefanovski@artists.org is the artist of http://www.music.org/songs/g/gipsySong • Graph: the URI reference node http://www.music.org/songs/g/gipsySong has a music:artist arc to a blank node; the blank node has a person:name arc to the literal "Vlatko Stefanovski" and a person:homepage arc to http://www.artists.org/stefanovski
XML Serialization • How to translate the RDF graph structure into XML's tree-oriented notation
  <rdf:Description rdf:about="http://www.music.org/songs/g/gipsySong">
    <music:performedby>
      <rdf:Description>
        <person:name>Vlatko Stefanovski</person:name>
        <person:homepage>
          <rdf:Description rdf:about="http://www.artists.org/stefanovski">
          </rdf:Description>
        </person:homepage>
      </rdf:Description>
    </music:performedby>
  </rdf:Description>
• Graph: http://www.music.org/songs/g/gipsySong has a music:performedby arc to a blank node; the blank node has a person:name arc to "Vlatko Stefanovski" and a person:homepage arc to http://www.artists.org/stefanovski
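The nesting in this serialization mirrors the arcs of the graph: each property element sits inside the description of its subject. The sketch below builds an equivalent RDF/XML tree with Python's standard `xml.etree.ElementTree`. The namespace URIs for `music:` and `person:` are hypothetical, and the homepage is written with the abbreviated `rdf:resource` attribute instead of a nested description.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MUSIC = "http://www.music.org/schema#"   # hypothetical namespace
PERSON = "http://www.music.org/person#"  # hypothetical namespace

for prefix, uri in (("rdf", RDF), ("music", MUSIC), ("person", PERSON)):
    ET.register_namespace(prefix, uri)

def q(namespace, local_name):
    """Qualified name in the {namespace}local form that ElementTree expects."""
    return "{" + namespace + "}" + local_name

root = ET.Element(q(RDF, "RDF"))

# Subject node: the song, identified by its URI.
song = ET.SubElement(root, q(RDF, "Description"),
                     {q(RDF, "about"): "http://www.music.org/songs/g/gipsySong"})

# Arc music:performedby pointing at a blank node (a description without rdf:about).
performed_by = ET.SubElement(song, q(MUSIC, "performedby"))
artist = ET.SubElement(performed_by, q(RDF, "Description"))

# Arcs leaving the blank node: a literal name and a resource-valued homepage.
ET.SubElement(artist, q(PERSON, "name")).text = "Vlatko Stefanovski"
ET.SubElement(artist, q(PERSON, "homepage"),
              {q(RDF, "resource"): "http://www.artists.org/stefanovski"})

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```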
Reification • Latin: Res ... Thing -> Reification ... "Thing Making" • Statements themselves can be considered as resources (things) in RDF. Thus, it is possible to make statements about statements (Reification). • Possible applications: • Definition of a context for a statement with respect to time, place, validity, … • Embed a statement into a discourse (claims, doubts, proofs of statements) • … Example: Statement A: <sonata XY> <composer> <Mozart> Statement B: <music expert A> <claims> <statement A> <music expert C> <doubts> <statement A>
Reification Syntax • The statement to be reified has to be modeled as an RDF resource; • The RDF vocabulary provides special constructs for this purpose: • The class rdf:Statement, which is the type of all RDF statements. • The property rdf:type, which is used to associate an RDF resource with a class. • The property rdf:subject refers to the subject of the modeled statement (i.e. to the described resource) • The property rdf:predicate refers to the property used as a predicate in the modeled statement • The property rdf:object refers to the object of the modeled statement (i.e. the property value)
How to create a reified statement? • Associate the subject, predicate and object of the statement with a resource of type rdf:Statement. This is done by using the rdf:subject, rdf:predicate and rdf:object properties. • Figure: the original statement (www.operas.org/Zauberflöte, music:composer, www.artists.org/Mozart) next to a new node with rdf:type rdf:Statement, rdf:subject www.operas.org/Zauberflöte, rdf:predicate music:composer, rdf:object www.artists.org/Mozart
How to create a reified statement? • Now the created node, which represents the statement, can be used as an object or subject of another RDF statement: the statement becomes a resource • Figure: a node with rdf:type rdf:Statement, rdf:subject www.operas.org/SonataXY, rdf:predicate music:composer, rdf:object www.artists.org/Mozart, and an additional music:claimedBy arc to www.musicExperts.org/ExpertA
XML Syntax for Reification
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:music="http://ipsi.fhg.de/music-schema#">
    <rdf:Description>
      <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"/>
      <rdf:subject rdf:resource="http://www.operas.org/SonataXY"/>
      <rdf:predicate rdf:resource="http://ipsi.fhg.de/music-schema#composer"/>
      <rdf:object rdf:resource="http://www.artists.org/Mozart"/>
      <music:claimedBy rdf:resource="http://www.musicExperts.org/ExpertA"/>
    </rdf:Description>
  </rdf:RDF>
• music:claimedBy is a property of the statement
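Seen as triples, reification replaces one statement with four statements about a new node, after which further statements can refer to that node. A minimal sketch in plain Python follows; the helper name `reify` and the blank-node label `_:stmtA` are assumptions for this illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MUSIC = "http://ipsi.fhg.de/music-schema#"

def reify(statement_node, subject, predicate, obj):
    """Describe the triple (subject, predicate, obj) as a resource of type rdf:Statement."""
    return [
        (statement_node, RDF + "type", RDF + "Statement"),
        (statement_node, RDF + "subject", subject),
        (statement_node, RDF + "predicate", predicate),
        (statement_node, RDF + "object", obj),
    ]

statement_a = "_:stmtA"  # hypothetical blank-node label for the reified statement
triples = reify(
    statement_a,
    "http://www.operas.org/SonataXY",
    MUSIC + "composer",
    "http://www.artists.org/Mozart",
)

# The reified statement is now a resource, so it can be the subject of another statement.
triples.append((statement_a, MUSIC + "claimedBy", "http://www.musicExperts.org/ExpertA"))

for triple in triples:
    print(triple)
```

Note that the four reification triples describe the statement without asserting it: the original triple itself is not part of the list.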
RDF Collections • An RDF Container models a collection of resources. • The RDF model supports three types of containers: • Bag (rdf:Bag) - an unordered list of resources or literals. • Sequence (rdf:Seq) - an ordered list of resources or literals. • Alternative (rdf:Alt) - a list of resources or literals that represent alternatives for the (single) value of a property. • Bag and Sequence can be used for multivalued properties
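In triple form, a container is a node typed as rdf:Bag, rdf:Seq or rdf:Alt whose members are attached with the numbered membership properties rdf:_1, rdf:_2, and so on. The sketch below generates those triples in plain Python; the helper name `container`, the node label and the member URIs are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def container(node, kind, members):
    """Triples for an RDF container of the given kind: 'Bag', 'Seq' or 'Alt'."""
    if kind not in ("Bag", "Seq", "Alt"):
        raise ValueError("kind must be 'Bag', 'Seq' or 'Alt'")
    triples = [(node, RDF + "type", RDF + kind)]
    # Membership properties are numbered from 1: rdf:_1, rdf:_2, ...
    for position, member in enumerate(members, start=1):
        triples.append((node, f"{RDF}_{position}", member))
    return triples

# Hypothetical example: an ordered track list for an album.
track_list = container(
    "_:tracks",
    "Seq",
    ["http://www.music.org/songs/g/gipsySong", "http://www.music.org/songs/s/sonataXY"],
)
for triple in track_list:
    print(triple)
```

The numbering is present for all three container types; whether the order carries meaning is a matter of the container's type, not of the triples themselves.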