COMS E6125 Web-enHanced Information Management (WHIM). Prof. Gail Kaiser, Spring 2011. Today’s Topic: Introduction to the Semantic Web, RDF, Ontologies.
COMS E6125 Web-enHanced Information Management (WHIM) Prof. Gail Kaiser Spring 2011 COMS 6125
Today’s Topic: • Introduction to the Semantic Web • RDF • Ontologies
Simplicity is Good • The World Wide Web contains huge amounts of information created by many different organizations, communities and individuals for many different reasons • Web users can easily access this information by specifying a known URL or using a search engine, and following links to find other related resources • This simplicity is a key aspect that made the Web so popular COMS 6125
Simplicity is Bad • The simplicity of the current Web has a price • It is very easy to get lost, or to discover irrelevant or unrelated information • For instance, if we search for courses taught by a person named “Gail Kaiser”, we might find all kinds of other information • http://www.google.com/search?hl=&q=course+taught+by+gail+kaiser&sourceid=navclient-ff&rlz=1B3GGGL_enUS253US253&ie=UTF-8 • The problem is that the search engine does not know what “courses” or “taught” means
Machine accessible meaning (What it’s like to be a machine) • [Figure: page fields labeled name, education, CV, work, private]
So what does this mean? • What’s a “CV”? • What’s a “name”? • Etc. • Need semantics COMS 6125
What to do? • Develop enabling standards and technologies • to help machines understand more information on the Web • so that they can support richer discovery, data integration, navigation and automation of tasks COMS 6125
Add Metadata • Associate semantically rich, descriptive information with any resource • For instance, add metadata about teaching, so we can search for documents that have metadata specifying “Gail Kaiser” as a “teacher” (or “instructor”) COMS 6125
The Semantic Web • Provides a common framework that allows data to be shared and reused across application, enterprise and community boundaries • Provides URLs not only for documents, but also for people, concepts and relationships • By giving unique identifiers to the person, the role “teacher” and the concept of “course”, we make very clear who the person is and how this person relates to a particular document
What’s the difference? • Most Web content today is designed for humans to read, not for computer programs to manipulate meaningfully • Computers can adeptly parse Web pages for layout and routine processing—here a header, there a link to another page—but in general, computers have no reliable way to process the semantics • The Semantic Web brings structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can carry out sophisticated tasks for users COMS 6125
What’s the difference? • The Semantic Web is not a separate web but an extension of the current web, in which information is given well-defined meaning, better enabling computers and people to work in co-operation. [Berners-Lee et al., 2001]
Wasn’t that what XML was supposed to do? • Yes and no • For the Semantic Web to function, computers must have access to structured collections of information and to sets of inference rules that they can use to conduct automated reasoning COMS 6125
Isn’t that just Knowledge Representation? • Traditional knowledge representation systems typically have been centralized, requiring everyone to share exactly the same definition of common concepts such as “parent” or “vehicle” • But central control is stifling, and doesn’t scale • Which is why centralized hypertext link servers were abandoned for WWW COMS 6125
What about Web Services? • Web services are computational programs accessed using Web technologies • They may or may not operate on Web pages as data • But when they do, the semantics are implied by WSDL descriptions but basically hidden inside the code • There is no way for an arbitrary Web service or other program to “understand” the semantics of Web pages COMS 6125
Start with XML, not HTML
HTML: <H1>WHIM</H1> <UL> <LI>Instructor: Gail Kaiser <LI>Students: Donald Duck </UL>
XML: <course date="Spring 2011"> <title>WHIM</title> <instructor>Gail Kaiser</instructor> <students>Donald Duck</students> </course>
XML document = labeled tree • node = label + attr/values + contents • <course date="..."> <title>...</title> <instructor>...</instructor> <name>...</name> <http>...</http> <students>...</students> </course> • [Figure: the document drawn as a tree with nodes course, title, instructor, students, name, http] • XML Schema: grammars for describing legal trees and datatypes
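The "labeled tree" view can be made concrete with a short sketch. It assumes Python's standard xml.etree.ElementTree parser and reuses the course record from the earlier slide; the helper name describe is invented here for illustration.

```python
import xml.etree.ElementTree as ET

# Course record from the earlier slide, written with plain ASCII quotes.
doc = """<course date="Spring 2011">
  <title>WHIM</title>
  <instructor>Gail Kaiser</instructor>
  <students>Donald Duck</students>
</course>"""

def describe(node, depth=0):
    """Return one line per node: label, attributes/values, and text contents."""
    text = (node.text or "").strip()
    lines = ["  " * depth + f"{node.tag} {node.attrib} {text!r}"]
    for child in node:
        lines.extend(describe(child, depth + 1))
    return lines

root = ET.fromstring(doc)
tree_lines = describe(root)
print("\n".join(tree_lines))
```

Each printed line is one tree node: its label, its attribute/value pairs, and its contents. The parser recovers the structure, but nothing in it says what "instructor" means.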
Why not use XML Tags to represent Semantics? • Syntax: the structure of your data • Semantics: the meaning of your data • Two conditions necessary for interoperability: • Adopt a common syntax: enables applications to parse the data • Adopt a means for understanding the semantics: enables applications to use the data COMS 6125
XML and Semantics? <title> … </title> • But what does “title” mean? • If we ask Google, we get (on the 1st page) • Boxing and martial arts equipment • Prefix or suffix added to person’s name • HTML tag • Women’s underwear • US Laws • Home purchase insurance • Library search
XML Limitations for Semantic Markup • XML makes no commitment on: • Domain-specific vocabulary • Modeling primitives • Requires pre-arranged agreement on both • Only feasible for closed collaboration • agents in a small & stable community • pages on a small & stable intranet • Not suited for sharing Web resources
XML machine accessible meaning • [Figure: fields name, education, CV, work and private marked up with the tags <name>, <education>, <CV>, <work>, <private>]
Beyond XML • XML lets everyone create their own tags • Scripts, or programs, can make use of these tags in sophisticated ways - but the programmer has to know what the page writer uses each tag for • XML allows users to add structure to their documents but says nothing about what the structures mean COMS 6125
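A small sketch of why the programmer has to know what each tag is used for. The two documents and the TEACHER_TAGS set are hypothetical, invented for illustration; the point is only that the mapping from tags to meaning lives in the program, not in the XML.

```python
import xml.etree.ElementTree as ET

# Two hypothetical sites describing the same course with different tags.
site_a = "<course><instructor>Gail Kaiser</instructor></course>"
site_b = "<class><teacher>Gail Kaiser</teacher></class>"

# The programmer hard-codes which tags mean "the person who teaches".
TEACHER_TAGS = {"instructor", "teacher"}

def find_teachers(xml_text):
    """Return the text of every element whose tag is a known teacher tag."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter() if el.tag in TEACHER_TAGS]

print(find_teachers(site_a))
print(find_teachers(site_b))
# A third site using <prof> is silently missed until the set is extended.
print(find_teachers("<cours><prof>Gail Kaiser</prof></cours>"))
```

Every new page writer with a new tag requires another change to the program, which is the pre-arranged agreement problem in miniature.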
Semantic Web Layers COMS 6125
Add RDF = Resource Description Framework • Encodes meaning in sets of triples - subject, predicate and object - analogous to the subject, verb and object of an elementary sentence • Makes assertions that particular things (people, Web pages or whatever) have properties (such as “is a sister of”, “is the author of”) with certain values (another person, another Web page) • This structure can describe much of the data processed by machines COMS 6125
Example • Imagine that we want to state the fact that someone named Gail Kaiser wrote a particular Web page • A straightforward way to state this in English would be in the form of a simple statement such as: http://www.cs.columbia.edu/~kaiser/index.html has an author whose value is Gail Kaiser
Making Statements about Resources • We need a way to identify the thing we want to describe (the Web page) • We need a way to identify a specific property (author) of the thing that we want to describe • We need a way to identify the thing we want to assign as the value of this property (who the author is), for the thing we want to describe COMS 6125
Making Statements about Resources • In the example, we used the Web page's URL (Uniform Resource Locator) to identify it - subject • We used the word “author” to identify the property we want to talk about - predicate • And the phrase “Gail Kaiser” to identify the thing (a person) we want to say is the value of this property - object COMS 6125
Many Statements can be made • We could state other properties of this Web page by writing additional English statements of the same general form http://www.cs.columbia.edu/~kaiser/index.html has a modification-date whose value is January 07, 2011 http://www.cs.columbia.edu/~kaiser/index.html has a size whose value is 18,985 bytes COMS 6125
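The three English statements can be written down as subject/predicate/object triples. This is a minimal sketch using plain Python tuples rather than an RDF library; the helper values_of is an assumption made for illustration.

```python
PAGE = "http://www.cs.columbia.edu/~kaiser/index.html"

# Each statement is a (subject, predicate, object) tuple.
statements = [
    (PAGE, "author", "Gail Kaiser"),
    (PAGE, "modification-date", "January 07, 2011"),
    (PAGE, "size", "18,985 bytes"),
]

def values_of(subject, predicate, triples):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(values_of(PAGE, "author", statements))
```

At this point the predicates are still bare words such as "author"; nothing ties them to a shared definition.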
But what do these Statements actually mean? • Subject and object can each be identified by a URL, just as used in a link on a Web page • The verbs – predicates – can also be identified by URLs, which enables anyone to define a new concept, a new predicate, just by defining a URL for it somewhere on the Web (a “Web resource”) • The URLs ensure that concepts are not just words in a document, but are tied to a unique definition that everyone can find on the Web COMS 6125
Web Resources • RDF is a language for representing information about resources on the World Wide Web • It is particularly intended for representing metadata about Web resources, such as the title, author, modification date and size of a Web page COMS 6125
Generalized Resources • By generalizing the concept of a “Web resource”, RDF can be used to represent information about things that can be identified on the Web, even when they can't be directly retrieved on the Web • Examples include the author of the web page COMS 6125
Reconsider Example • http://www.cs.columbia.edu/~kaiser/index.html has an author whose value is Gail Kaiser • Neither the notion of an “author” nor Gail Kaiser can be retrieved from the Web • Thus we need URIs in addition to URLs
Concept Graphs • RDF is based on the idea of identifying things using URIs • And describing resources (subjects) in terms of simple properties (verbs or predicates) and property values (objects) • This enables RDF to represent related concepts as a graph of nodes and arcs representing the resources, their properties and values COMS 6125
Concept Graph Example • XML syntax: <rdf:Description rdf:about="#Kaiser"> <email>kaiser+6125@...</email> </rdf:Description> • Chained triples form a graph • [Figure: graph with nodes http://bank.cs.columbia.edu/classes/cs6125/, Kaiser, kaiser+6125@..., W3C and http://www.w3.org/RDF, connected by arcs labeled site-owner, email, site-owner and describes]
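A sketch of how chained triples form a graph that can be walked. It assumes one reading of the figure, namely that the course site has site-owner Kaiser and Kaiser has the email shown; the address stays elided as in the slide, and the helper follow is invented for illustration.

```python
from collections import defaultdict

SITE = "http://bank.cs.columbia.edu/classes/cs6125/"

# Assumed reading of the figure: site --site-owner--> Kaiser --email--> address.
triples = [
    (SITE, "site-owner", "#Kaiser"),
    ("#Kaiser", "email", "kaiser+6125@..."),  # address is elided in the source
]

# Index the triples as a graph: node -> list of (arc label, target node).
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))

def follow(start, path):
    """Follow a sequence of arc labels from start; return the nodes reached."""
    frontier = [start]
    for predicate in path:
        frontier = [o for node in frontier
                    for p, o in graph.get(node, []) if p == predicate]
    return frontier

print(follow(SITE, ["site-owner", "email"]))
```

The object of the first triple is the subject of the second, which is what lets a query step from the site to its owner's email.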
Information Exchange • RDF provides a common framework for expressing this information so it can be exchanged between applications without loss of meaning • The ability to exchange information between different applications means that the information may be made available to applications other than those for which it was originally created • Application designers can leverage the availability of common RDF parsers and processing tools • RDF is written in XML format further leveraging XML tools and experience COMS 6125
What is RDF (again) ? • RDF is a data model • the model is domain-neutral and application-neutral • the model can be viewed as directed, labeled graphs or as an object-oriented model (object/attribute/value) • RDF data model is an abstract, conceptual layer independent of XML • consequently, XML is a transfer syntax for RDF, not a component of RDF • RDF data might never occur in XML form COMS 6125
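To illustrate that the data model is independent of XML, the sketch below writes a triple in a line-based form resembling the W3C N-Triples syntax instead. It is deliberately simplified: every object is treated as a plain string literal, only backslashes and double quotes are escaped, and the Dublin Core namespace URI and lowercase creator are assumptions not taken from the slides.

```python
# Assumed namespace for the Dublin Core element set.
DC = "http://purl.org/dc/elements/1.1/"

triples = [
    ("http://www.w3.org/TR/REC-rdf-syntax/", DC + "creator", "Dave Beckett"),
]

def to_ntriples(triples):
    """Serialize (subject URI, predicate URI, literal) tuples, one per line."""
    lines = []
    for s, p, o in triples:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)

print(to_ntriples(triples))
```

The same tuples could equally be emitted as RDF/XML; the model does not change with the transfer syntax.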
RDF Model • RDF “statements” consist of • resources (= nodes) = subject • which have properties = predicate • which have values (= nodes, strings) = object
RDF Model • “http://www.w3.org/TR/REC-rdf-syntax/ has the editor Dave Beckett” • resource: http://www.w3.org/TR/REC-rdf-syntax/ • property: editor • value: “Dave Beckett”
RDF Model Example • [Figure: the resource http://www.w3.org/TR/REC-rdf-syntax/ with three outgoing arcs] • dc:Publisher: “W3C” • dc:Creator: “Dave Beckett” • dc:Date: “2004-02-10”
Complex Values • So far, values of properties have been strings • A graph node (corresponding to a resource) also can be the value of a property • arbitrarily complex tree and graph structures are possible • syntactically, values can be embedded (i.e., lexically in-line) or referenced (linked) COMS 6125
Complex Values • [Figure: http://www.w3.org/TR/REC-rdf-syntax/ has dc:Creator pointing to an unnamed node, which has p:Name “Dave Beckett” and p:EMail “mailto:dave@dajobe.org”]
Complex Values • Corresponding triples • { “http://www.w3.org/TR/REC-rdf-syntax/”, dc:Creator, x } • { x, p:Name, “Dave Beckett” } • { x, p:EMail, “mailto:dave@dajobe.org” }
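The intermediate node x can be sketched as a placeholder label that appears as the object of one triple and the subject of two others. The label "_:x" and the helper names are assumptions for illustration.

```python
SPEC = "http://www.w3.org/TR/REC-rdf-syntax/"
x = "_:x"  # placeholder label for the unnamed intermediate node

triples = [
    (SPEC, "dc:Creator", x),
    (x, "p:Name", "Dave Beckett"),
    (x, "p:EMail", "mailto:dave@dajobe.org"),
]

def objects(subject, predicate):
    """Return every value of the given property on the given node."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def creator_emails(resource):
    """Go resource -> creator node -> email in two hops."""
    return [email
            for creator in objects(resource, "dc:Creator")
            for email in objects(creator, "p:EMail")]

print(creator_emails(SPEC))
```

Because the creator is a node rather than a string, further properties can be attached to it without touching the statement about the spec.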
Containers • Containers are collections - allow grouping of resources (or literal values) • It is possible to make statements about the container (as a whole) or about its members individually • Different types of containers • bag - unordered collection • seq - ordered collection (= “sequence”) • alt - represents alternatives • It is possible to create collections based on URI patterns – e.g., all files in a particular web site • Duplicate values are permitted - no mechanism to enforce unique value constraints
Containers • [Figure: http://www.w3.org/TR/REC-rdf-syntax has dc:Creator pointing to a node with rdf:Type rdf:Seq, whose members are rdf:_1 “Dave Beckett” and rdf:_2 “Brian McBride”]
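A sketch of how the ordered members of a sequence container could be read back from triples. The container label "_:c" and the helper seq_members are assumptions; the membership properties rdf:_1 and rdf:_2 are the ones in the figure, and the triples are deliberately listed out of order.

```python
SPEC = "http://www.w3.org/TR/REC-rdf-syntax"

triples = [
    (SPEC, "dc:Creator", "_:c"),         # "_:c" is a made-up container label
    ("_:c", "rdf:type", "rdf:Seq"),
    ("_:c", "rdf:_2", "Brian McBride"),  # stored out of order on purpose
    ("_:c", "rdf:_1", "Dave Beckett"),
]

def seq_members(container, triples):
    """Return the members of a container ordered by their rdf:_n index."""
    members = []
    for s, p, o in triples:
        index = p[len("rdf:_"):]
        if s == container and p.startswith("rdf:_") and index.isdigit():
            members.append((int(index), o))
    return [o for _, o in sorted(members)]

print(seq_members("_:c", triples))
```

The order lives in the property names, not in the order the triples happen to be stored.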
Higher-order Statements • One can make RDF statements about other RDF statements • Example: “The Library of Congress asserts that Dave Beckett is the author of the RDF Syntax spec” • Allow us to express beliefs (and other modalities) • Important for trust models, digital signatures, etc. • Constitute metadata about metadata • Represented by modeling RDF in RDF itself
Reification • [Figure: the statement “http://www.w3.org/TR/REC-rdf-syntax has dc:Creator Dave Beckett” drawn in a dotted box, with a dc:Creator arc from the box to “Library of Congress”] • The dotted box corresponds to the following statements • { x, rdf:predicate, dc:Creator } • { x, rdf:subject, “http://www.w3.org/TR/REC-rdf-syntax” } • { x, rdf:object, “Dave Beckett” } • { x, rdf:type, rdf:Statement }
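The four reification statements can be generated mechanically, after which the statement node x can itself be the subject of further statements. A minimal sketch: the label "_:x" and the helpers reify and unreify are assumptions for illustration.

```python
SPEC = "http://www.w3.org/TR/REC-rdf-syntax"

def reify(statement_id, subject, predicate, obj):
    """Describe one statement using the built-in reification vocabulary."""
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]

def unreify(statement_id, triples):
    """Recover the original (subject, predicate, object) from its description."""
    parts = {p: o for s, p, o in triples if s == statement_id}
    return (parts["rdf:subject"], parts["rdf:predicate"], parts["rdf:object"])

triples = reify("_:x", SPEC, "dc:Creator", "Dave Beckett")
# A statement about the statement: who is making the claim.
triples.append(("_:x", "dc:Creator", "Library of Congress"))

print(unreify("_:x", triples))
```

Note that the reified description does not assert the original statement; it only lets other statements refer to it.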
Reification • Reification allows a computer to process an abstraction as if it were any other datum • RDF is not really second-order • But it does provide a built-in predicate vocabulary for reification COMS 6125
Reification • Any statement can be an object (graphs can be nested) • <rdf:Description rdf:about="#NYT"> <claims> <rdf:Description rdf:about="#pers05"> <authorOf>ISBN...</authorOf> </rdf:Description> </claims> </rdf:Description> • [Figure: NYT claims the nested statement “pers05 Author-of ISBN...”]
RDF Schema • Defines small vocabulary for RDF: • Class, subClassOf, type • Property, subPropertyOf • domain, range • Organizes this vocabulary in a typed hierarchy • Vocabulary can be used to define other vocabularies for your application domain • [Figure: classes Person, Student and Researcher joined by subClassOf arcs; the property hasSuperVisor with domain and range arcs; instances Swap and Gail with type arcs, linked by hasSuperVisor]
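A sketch of what a program can conclude from this vocabulary. It assumes one reading of the figure: Student and Researcher are subclasses of Person, and hasSuperVisor has domain Student and range Researcher. The helper names are invented for illustration, and only the domain, range and subClassOf rules are applied.

```python
# Assumed reading of the figure.
schema = [
    ("Student", "subClassOf", "Person"),
    ("Researcher", "subClassOf", "Person"),
    ("hasSuperVisor", "domain", "Student"),
    ("hasSuperVisor", "range", "Researcher"),
]
data = [("Swap", "hasSuperVisor", "Gail")]

def superclasses(cls):
    """Return every class reachable from cls through subClassOf."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in schema:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                todo.append(o)
    return found

def inferred_types(individual):
    """Infer classes from property use (domain/range), then add superclasses."""
    types = set()
    for s, p, o in data:
        for prop, kind, cls in schema:
            if prop != p:
                continue
            if kind == "domain" and s == individual:
                types.add(cls)
            if kind == "range" and o == individual:
                types.add(cls)
    for cls in list(types):
        types |= superclasses(cls)
    return types

print(sorted(inferred_types("Swap")))
print(sorted(inferred_types("Gail")))
```

From the single data triple, the schema lets the program conclude that Swap is a Student, Gail is a Researcher, and both are Persons, without any of that being stated directly.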
RDF Schema syntax in XML
<rdf:Description rdf:ID="MotorVehicle">
  <rdf:type rdf:resource="http://www.w3.org/...#Class"/>
  <rdfs:subClassOf rdf:resource="http://www.w3.org/...#Resource"/>
</rdf:Description>
<rdf:Description rdf:ID="Truck">
  <rdf:type rdf:resource="http://www.w3.org/...#Class"/>
  <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
</rdf:Description>
<rdf:Description rdf:ID="registeredTo">
  <rdf:type rdf:resource="http://www.w3.org/...#Property"/>
  <rdfs:domain rdf:resource="#MotorVehicle"/>
  <rdfs:range rdf:resource="#Person"/>
</rdf:Description>
<rdf:Description rdf:ID="ownedBy">
  <rdf:type rdf:resource="http://www.w3.org/...#Property"/>
  <rdfs:subPropertyOf rdf:resource="#registeredTo"/>
</rdf:Description>
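The subPropertyOf declaration has a simple operational reading: anything stated with ownedBy also holds for registeredTo. A minimal sketch of that rule over plain tuples; the instances "#truck42" and "#alice" are hypothetical and do not appear in the slides.

```python
# Schema facts taken from the XML above, written as tuples.
schema = [
    ("Truck", "subClassOf", "MotorVehicle"),
    ("registeredTo", "domain", "MotorVehicle"),
    ("registeredTo", "range", "Person"),
    ("ownedBy", "subPropertyOf", "registeredTo"),
]

# Hypothetical instance data, invented for illustration.
data = [("#truck42", "ownedBy", "#alice")]

def with_superproperties(data, schema):
    """Add (s, super, o) for every (s, sub, o) where sub subPropertyOf super."""
    result = set(data)
    changed = True
    while changed:  # repeat so that chains of subproperties are followed
        changed = False
        for s, p, o in list(result):
            for sub, kind, sup in schema:
                if kind == "subPropertyOf" and sub == p and (s, sup, o) not in result:
                    result.add((s, sup, o))
                    changed = True
    return result

expanded = with_superproperties(data, schema)
print(sorted(expanded))
```

A query for registeredTo now also finds vehicles that were only described with ownedBy.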