
COMS E6125 Web-enHanced Information Management (WHIM)

COMS E6125 Web-enHanced Information Management (WHIM). Prof. Gail Kaiser, Spring 2011. Today’s topic: Introduction to the Semantic Web, RDF, Ontologies.


Presentation Transcript


  1. COMS E6125 Web-enHanced Information Management (WHIM) Prof. Gail Kaiser Spring 2011 COMS 6125

  2. Today’s Topic: • Introduction to the Semantic Web • RDF • Ontologies

  3. Simplicity is Good • The World Wide Web contains huge amounts of information created by many different organizations, communities and individuals for many different reasons • Web users can easily access this information by specifying a known URL or using a search engine, and following links to find other related resources • This simplicity is a key aspect that made the Web so popular

  4. Simplicity is Bad • The simplicity of the current Web has a price • It is very easy to get lost, or discover irrelevant or unrelated information • For instance, if we search for courses taught by a person named “Gail Kaiser”, we might find all kinds of other information • http://www.google.com/search?hl=&q=course+taught+by+gail+kaiser&sourceid=navclient-ff&rlz=1B3GGGL_enUS253US253&ie=UTF-8 • The problem is that the search engine does not know what “courses” or “taught” means

  5. Machine-accessible meaning (what it’s like to be a machine) [Figure: a page whose parts are labeled name, education, CV, work, private]

  6. So what does this mean? • What’s a “CV”? • What’s a “name”? • Etc. • Need semantics

  7. What to do? • Develop enabling standards and technologies • to help machines understand more information on the Web • so that they can support richer discovery, data integration, navigation and automation of tasks

  8. Add Metadata • Associate semantically rich, descriptive information with any resource • For instance, add metadata about teaching, so we can search for documents that have metadata specifying “Gail Kaiser” as a “teacher” (or “instructor”)
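A minimal sketch of the difference between keyword search and metadata search, in Python. The two documents, their URLs, and the metadata keys (`teacher`, `speaker`) are hypothetical and chosen only to illustrate the slide; they are not from the course material.

```python
# Hypothetical documents: both mention the same words, but only one
# carries metadata saying that Gail Kaiser is the teacher.
DOCUMENTS = [
    {
        "url": "http://example.org/whim",
        "text": "WHIM course page. Instructor: Gail Kaiser.",
        "metadata": {"teacher": "Gail Kaiser"},
    },
    {
        "url": "http://example.org/news",
        "text": "Gail Kaiser gave a talk about a course on testing.",
        "metadata": {"speaker": "Gail Kaiser"},
    },
]

def keyword_search(words):
    """Return URLs of documents whose text contains every word (case-insensitive)."""
    return [
        d["url"]
        for d in DOCUMENTS
        if all(w.lower() in d["text"].lower() for w in words)
    ]

def metadata_search(key, value):
    """Return URLs of documents whose metadata has exactly this key/value pair."""
    return [d["url"] for d in DOCUMENTS if d["metadata"].get(key) == value]

print(keyword_search(["course", "Gail Kaiser"]))   # matches both pages
print(metadata_search("teacher", "Gail Kaiser"))   # matches only the course page
```

The keyword query cannot tell a course page from a news item, while the metadata query can, provided publishers agree on what the key `teacher` means. That agreement is the problem the rest of the lecture addresses.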

  9. The Semantic Web • Provides a common framework that allows data to be shared and reused across application, enterprise and community boundaries • Provides URLs not only for documents, but also for people, concepts and relationships • By giving unique identifiers to the person, the role “teacher” and the concept of “course”, we make very clear who the person is and the corresponding relation between this person and a particular document

  10. What’s the difference? • Most Web content today is designed for humans to read, not for computer programs to manipulate meaningfully • Computers can adeptly parse Web pages for layout and routine processing (here a header, there a link to another page), but in general, computers have no reliable way to process the semantics • The Semantic Web brings structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can carry out sophisticated tasks for users

  11. What’s the difference? The Semantic Web is not a separate web but an extension of the current web, in which information is given well-defined meaning, better enabling computers and people to work in co-operation. [Berners-Lee et al., 2001]

  12. Wasn’t that what XML was supposed to do? • Yes and no • For the Semantic Web to function, computers must have access to structured collections of information and to sets of inference rules that they can use to conduct automated reasoning

  13. Isn’t that just Knowledge Representation? • Traditional knowledge representation systems typically have been centralized, requiring everyone to share exactly the same definition of common concepts such as “parent” or “vehicle” • But central control is stifling, and doesn’t scale • Which is why centralized hypertext link servers were abandoned for WWW

  14. What about Web Services? • Web services are computational programs accessed using Web technologies • They may or may not operate on Web pages as data • But when they do, the semantics are implied by WSDL descriptions but basically hidden inside the code • There is no way for an arbitrary Web service or other program to “understand” the semantics of Web pages

  15. Semantic Web Layers (T. Berners-Lee)

  16. Start with XML, not HTML • HTML: <H1>WHIM</H1> <UL> <LI>Instructor: Gail Kaiser <LI>Students: Donald Duck </UL> • XML: <course date="Spring 2011"> <title>WHIM</title> <instructor>Gail Kaiser</instructor> <students>Donald Duck</students> </course>

  17. XML document = labeled tree • node = label + attr/values + contents • <course date="..."> <title>...</title> <instructor>...</instructor> <name>...</name> <http>...</http> <students>...</students> </course> [Figure: the same document drawn as a tree with nodes course, title, instructor, students, name, http] • XML Schema: grammars for describing legal trees and datatypes
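The "labeled tree" view can be sketched with Python's standard `xml.etree.ElementTree` module. The document below is the course example from the earlier slide, written with straight quotes so that it parses; the helper name `describe` is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The course document from the slides, with straight quotes so it parses.
DOC = """<course date="Spring 2011">
  <title>WHIM</title>
  <instructor>Gail Kaiser</instructor>
  <students>Donald Duck</students>
</course>"""

def describe(node, depth=0):
    """Return one line per node: label, attribute/value pairs, and text content."""
    text = (node.text or "").strip()
    lines = [f"{'  ' * depth}{node.tag} {node.attrib} {text!r}"]
    for child in node:
        lines.extend(describe(child, depth + 1))
    return lines

root = ET.fromstring(DOC)
tree_lines = describe(root)
print("\n".join(tree_lines))
```

Each printed line is one node of the tree: its label (`node.tag`), its attributes (`node.attrib`) and its contents. The parser recovers the structure but attaches no meaning to labels such as `instructor`.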

  18. Why not use XML Tags to represent Semantics? • Syntax: the structure of your data • Semantics: the meaning of your data • Two conditions necessary for interoperability: • Adopt a common syntax: enables applications to parse the data • Adopt a means for understanding the semantics: enables applications to use the data

  19. XML and Semantics? <title> … </title> • But what does “title” mean? • If we ask Google, we get (on the 1st page) • Boxing and martial arts equipment • Prefix or suffix added to person’s name • HTML tag • Women’s underwear • US Laws • Home purchase insurance • Library search

  20. XML Limitations for Semantic Markup • XML makes no commitment on: (1) Domain-specific vocabulary (2) Modeling primitives • Requires pre-arranged agreement on (1) & (2) • Only feasible for closed collaboration • agents in a small & stable community • pages on a small & stable intranet • Not suited for sharing Web resources

  21. XML machine accessible meaning [Figure: the same page, with its parts wrapped in XML tags <name>, <education>, <CV>, <work>, <private>]

  22. Beyond XML • XML lets everyone create their own tags • Scripts, or programs, can make use of these tags in sophisticated ways - but the programmer has to know what the page writer uses each tag for • XML allows users to add structure to their documents but says nothing about what the structures mean
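A small sketch of why tags alone do not carry meaning. The two pages below are hypothetical: they state the same fact using different tag names, and a script written against the first vocabulary silently misses the second.

```python
import xml.etree.ElementTree as ET

# Two hypothetical pages describing the same fact with different tag names.
PAGE_A = "<course><title>WHIM</title><instructor>Gail Kaiser</instructor></course>"
PAGE_B = "<class><name>WHIM</name><teacher>Gail Kaiser</teacher></class>"

def find_instructor(xml_text):
    """Look up the instructor, assuming the page uses an <instructor> tag."""
    element = ET.fromstring(xml_text).find("instructor")
    return element.text if element is not None else None

print(find_instructor(PAGE_A))  # the script's assumption holds for this page
print(find_instructor(PAGE_B))  # same fact, but the script cannot see it
```

Both pages are well-formed XML, so the syntax condition for interoperability is met. What is missing is an agreed way to say that `teacher` and `instructor` name the same relationship.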

  23. Semantic Web Layers

  24. Add RDF = Resource Description Framework • Encodes meaning in sets of triples - subject, predicate and object - analogous to the subject, verb and object of an elementary sentence • Makes assertions that particular things (people, Web pages or whatever) have properties (such as “is a sister of”, “is the author of”) with certain values (another person, another Web page) • This structure can describe much of the data processed by machines

  25. Example • Imagine that we want to state the fact that someone named Gail Kaiser wrote a particular Web page • A straightforward way to state this in English would be in the form of a simple statement such as: http://www.cs.columbia.edu/~kaiser/index.html has an author whose value is Gail Kaiser

  26. Making Statements about Resources • We need a way to identify the thing we want to describe (the Web page) • We need a way to identify a specific property (author) of the thing that we want to describe • We need a way to identify the thing we want to assign as the value of this property (who the author is), for the thing we want to describe

  27. Making Statements about Resources • In the example, we used the Web page's URL (Uniform Resource Locator) to identify it - subject • We used the word “author” to identify the property we want to talk about - predicate • And the phrase “Gail Kaiser” to identify the thing (a person) we want to say is the value of this property - object

  28. Many Statements can be made • We could state other properties of this Web page by writing additional English statements of the same general form • http://www.cs.columbia.edu/~kaiser/index.html has a modification-date whose value is January 07, 2011 • http://www.cs.columbia.edu/~kaiser/index.html has a size whose value is 18,985 bytes
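The three English statements about the Web page can be written down as subject/predicate/object triples. This is only a sketch in plain Python: the `Triple` type and the `properties_of` helper are names invented here, and the predicates are still bare words rather than URIs.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # the thing being described
    predicate: str  # the property
    obj: str        # the value of the property

PAGE = "http://www.cs.columbia.edu/~kaiser/index.html"

# The three statements from the slides, one triple each.
statements = [
    Triple(PAGE, "author", "Gail Kaiser"),
    Triple(PAGE, "modification-date", "January 07, 2011"),
    Triple(PAGE, "size", "18,985 bytes"),
]

def properties_of(subject, triples):
    """Collect predicate -> object for every statement about one subject."""
    return {t.predicate: t.obj for t in triples if t.subject == subject}

print(properties_of(PAGE, statements))
```

Every statement has the same shape, so adding a fact never changes the structure of the data; it only appends another triple.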

  29. But what do these Statements actually mean? • Subject and object can each be identified by a URL, just as used in a link on a Web page • The verbs – predicates – can also be identified by URLs, which enables anyone to define a new concept, a new predicate, just by defining a URL for it somewhere on the Web (a “Web resource”) • The URLs ensure that concepts are not just words in a document, but are tied to a unique definition that everyone can find on the Web

  30. Web Resources • RDF is a language for representing information about resources on the World Wide Web • It is particularly intended for representing metadata about Web resources, such as the title, author, modification date and size of a Web page

  31. Generalized Resources • By generalizing the concept of a “Web resource”, RDF can be used to represent information about things that can be identified on the Web, even when they can't be directly retrieved on the Web • Examples include the author of the web page

  32. Reconsider Example • http://www.cs.columbia.edu/~kaiser/index.html has an author whose value is Gail Kaiser • Neither the notion of an “author” nor Gail Kaiser can be retrieved from the Web • Thus we need URIs in addition to URLs

  33. Concept Graphs • RDF is based on the idea of identifying things using URIs • And describing resources (subjects) in terms of simple properties (verbs or predicates) and property values (objects) • This enables RDF to represent related concepts as a graph of nodes and arcs representing the resources, their properties and values
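A sketch of triples forming a graph once subjects, predicates and objects are identified by URIs. The Dublin Core `creator` URI is the published one; the URI for the person and the `name` predicate under `example.org` are hypothetical, as is the helper `follow`.

```python
from collections import defaultdict

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"  # Dublin Core "creator"
PAGE = "http://www.cs.columbia.edu/~kaiser/index.html"
KAISER = "http://example.org/people#GailKaiser"  # hypothetical URI for a person
NAME = "http://example.org/terms#name"           # hypothetical predicate URI

triples = [
    (PAGE, DC_CREATOR, KAISER),       # the page's creator is a resource, not a string
    (KAISER, NAME, "Gail Kaiser"),    # that resource has a name
]

# Index the triples as a graph: node -> list of (arc label, target node).
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))

def follow(node, *predicates):
    """Walk from node along the given predicates, returning the values reached."""
    frontier = [node]
    for predicate in predicates:
        frontier = [
            o for n in frontier for (p, o) in graph.get(n, []) if p == predicate
        ]
    return frontier

print(follow(PAGE, DC_CREATOR, NAME))
```

The person's URI identifies her without being a retrievable document, which is the distinction between URIs and URLs drawn on the previous slide. Because the object of one triple is the subject of another, the two statements chain into a path in the graph.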

  34. Concept Graph Example • XML syntax: <rdf:Description rdf:about="#Kaiser"> <email>kaiser+6125@...</email> </rdf:Description> • Chained triples form a graph [Figure: graph with nodes http://bank.cs.columbia.edu/classes/cs6125/, Kaiser, kaiser+6125@..., W3C and http://www.w3.org/RDF, joined by arcs labeled site-owner, email, site-owner and describes]

  35. Information Exchange • RDF provides a common framework for expressing this information so it can be exchanged between applications without loss of meaning • The ability to exchange information between different applications means that the information may be made available to applications other than those for which it was originally created • Application designers can leverage the availability of common RDF parsers and processing tools • RDF can be written in XML, further leveraging XML tools and experience

  36. What is RDF (again) ? • RDF is a data model • the model is domain-neutral and application-neutral • the model can be viewed as directed, labeled graphs or as an object-oriented model (object/attribute/value) • RDF data model is an abstract, conceptual layer independent of XML • consequently, XML is a transfer syntax for RDF, not a component of RDF • RDF data might never occur in XML form
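To illustrate that the data model is independent of XML, the sketch below writes triples in the line-oriented N-Triples style (`<subject> <predicate> object .`) instead. It is a simplification, not a full serializer: the `term` helper guesses that strings starting with `http://`, `https://` or `mailto:` are URIs and treats everything else as a plain literal, which is an assumption made for this example. The Dublin Core element URIs are written in their published lowercase form, whereas the slides abbreviate them as dc:Creator and dc:Date.

```python
def term(value):
    """Render http(s)/mailto strings as <URI>, everything else as a quoted literal."""
    if value.startswith(("http://", "https://", "mailto:")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)

SPEC = "http://www.w3.org/TR/REC-rdf-syntax/"
DC = "http://purl.org/dc/elements/1.1/"

triples = [
    (SPEC, DC + "creator", "Dave Beckett"),
    (SPEC, DC + "date", "2004-02-10"),
]
print(to_ntriples(triples))
```

The same list of tuples could equally be written out as RDF/XML; the triples are the model, and each syntax is only a way of transferring them.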

  37. RDF Model • RDF “statements” consist of • resources (= nodes) = subject • which have properties = predicate • which have values (= nodes, strings) = object

  38. RDF Model • “http://www.w3.org/TR/REC-rdf-syntax/ has the editor Dave Beckett” • resource: http://www.w3.org/TR/REC-rdf-syntax/ • property: editor • value: “Dave Beckett”

  39. RDF Model Example [Figure: the resource http://www.w3.org/TR/REC-rdf-syntax/ with three property arcs: dc:Publisher “W3C”, dc:Creator “Dave Beckett”, dc:Date “2004-02-10”]

  40. Complex Values • So far, values of properties have been strings • A graph node (corresponding to a resource) also can be the value of a property • arbitrarily complex tree and graph structures are possible • syntactically, values can be embedded (i.e., lexically in-line) or referenced (linked)

  41. Complex Values [Figure: http://www.w3.org/TR/REC-rdf-syntax/ has a dc:Creator arc to an unnamed node, which has a p:Name arc to “Dave Beckett” and a p:EMail arc to “mailto:dave@dajobe.org”]

  42. Complex Values [Figure: http://www.w3.org/TR/REC-rdf-syntax/ has a dc:Creator arc to an unnamed node, which has a p:Name arc to “Dave Beckett” and a p:EMail arc to “mailto:dave@dajobe.org”] • Corresponding triples • { “http://www.w3.org/TR/REC-rdf-syntax/”, dc:Creator, x } • { x, p:Name, “Dave Beckett” } • { x, p:EMail, “dave@dajobe.org” }
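The intermediate node x in the triples above has no URI of its own. One way to sketch this in Python is to generate a fresh local label for it; the `_:b1` style of label and the helpers `blank_node` and `value` are choices made for this illustration, and the prefixed names (`dc:Creator`, `p:Name`, `p:EMail`) are kept as plain strings exactly as the slide abbreviates them.

```python
import itertools

_ids = itertools.count(1)

def blank_node():
    """Return a fresh local label for a node that has no URI of its own."""
    return f"_:b{next(_ids)}"

SPEC = "http://www.w3.org/TR/REC-rdf-syntax/"
x = blank_node()

# The structured value: the creator is a node with its own properties.
triples = [
    (SPEC, "dc:Creator", x),
    (x, "p:Name", "Dave Beckett"),
    (x, "p:EMail", "mailto:dave@dajobe.org"),
]

def value(subject, predicate):
    """Return the first object for (subject, predicate), or None if absent."""
    return next((o for s, p, o in triples if s == subject and p == predicate), None)

creator = value(SPEC, "dc:Creator")
print(value(creator, "p:Name"), value(creator, "p:EMail"))
```

Reading the creator's name now takes two hops: from the specification to the unnamed node, then from that node to the literal.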

  43. Containers • Containers are collections - allow grouping of resources (or literal values) • It is possible to make statements about the container (as a whole) or about its members individually • Different types of containers: • bag - unordered collection • seq - ordered collection (= “sequence”) • alt - represents alternatives • It is possible to create collections based on URI patterns – e.g., all files in a particular web site • Duplicate values are permitted - no mechanism to enforce unique value constraints

  44. Containers [Figure: http://www.w3.org/TR/REC-rdf-syntax has a dc:Creator arc to a container node with rdf:type rdf:Seq, rdf:_1 “Dave Beckett” and rdf:_2 “Brian McBride”]
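A sequence container is itself just more triples: one typing the container node and one numbered membership property per member. The sketch below assumes a local label `_:authors` for the container node; that label and the helper names `make_seq` and `seq_members` are hypothetical.

```python
SPEC = "http://www.w3.org/TR/REC-rdf-syntax"

def make_seq(container, members):
    """Emit triples typing `container` as rdf:Seq and numbering its members."""
    triples = [(container, "rdf:type", "rdf:Seq")]
    for index, member in enumerate(members, start=1):
        triples.append((container, f"rdf:_{index}", member))
    return triples

def seq_members(container, triples):
    """Recover members in order by sorting on the numeric part of rdf:_n."""
    numbered = [
        (int(p[len("rdf:_"):]), o)
        for s, p, o in triples
        if s == container and p.startswith("rdf:_")
    ]
    return [o for _, o in sorted(numbered)]

# The spec's creator is the container, not an individual person.
triples = [(SPEC, "dc:Creator", "_:authors")]
triples += make_seq("_:authors", ["Dave Beckett", "Brian McBride"])
print(seq_members("_:authors", triples))
```

Nothing in the triples prevents the same member from appearing under two different numbers, which matches the slide's remark that duplicates are permitted.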

  45. Higher-order Statements • One can make RDF statements about other RDF statements • Example: “The Library of Congress affiliates Dave Beckett as the author of the RDF Syntax spec” • Allow us to express beliefs (and other modalities) • Important for trust models, digital signatures, etc. • Constitute metadata about metadata • Represented by modeling RDF in RDF itself

  46. Reification [Figure: the statement (http://www.w3.org/TR/REC-rdf-syntax, dc:Creator, “Dave Beckett”) is enclosed in a dotted box; a second dc:Creator arc connects the box and “Library of Congress”] • The dotted box corresponds to the following statements • { x, rdf:predicate, “dc:Creator” } • { x, rdf:subject, “http://www.w3.org/TR/REC-rdf-syntax” } • { x, rdf:object, “Dave Beckett” } • { x, rdf:type, “rdf:Statement” }
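The four reification triples can be generated mechanically from any statement. In this sketch the statement node is labeled `_:x`, the class is spelled `rdf:Statement` as in the RDF vocabulary, and `ex:assertedBy` is a hypothetical predicate standing in for whatever property links the statement to the Library of Congress.

```python
def reify(statement_id, subject, predicate, obj):
    """Describe a statement with the rdf:subject/predicate/object vocabulary."""
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]

SPEC = "http://www.w3.org/TR/REC-rdf-syntax"
original = (SPEC, "dc:Creator", "Dave Beckett")

triples = reify("_:x", *original)
# A statement about the statement ("ex:assertedBy" is hypothetical).
triples.append(("_:x", "ex:assertedBy", "Library of Congress"))

for t in triples:
    print(t)
```

Note that the original triple is not in the list. Describing a statement is different from asserting it, which is what lets an application record who claims something without committing to the claim.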

  47. Reification • Reification allows a computer to process an abstraction as if it were any other datum • RDF is not really second-order • But it does provide a built-in predicate vocabulary for reification

  48. Reification • Any statement can be an object (graphs can be nested) • <rdf:Description rdf:about="#NYT"> <claims> <rdf:Description rdf:about="#pers05"> <authorOf>ISBN...</authorOf> </rdf:Description> </claims> </rdf:Description> [Figure: a claims arc from NYT to the nested statement pers05, Author-of, ISBN...]

  49. RDF Schema • Defines small vocabulary for RDF: • Class, subClassOf, type • Property, subPropertyOf • domain, range • Organizes this vocabulary in a typed hierarchy • Vocabulary can be used to define other vocabularies for your application domain [Figure: classes Person, Student and Researcher joined by subClassOf arcs; property hasSuperVisor with domain and range arcs; instances Swap and Gail with type arcs, linked by hasSuperVisor]

  50. RDF Schema syntax in XML • <rdf:Description ID="MotorVehicle"> <rdf:type resource="http://www.w3.org/...#Class"/> <rdfs:subClassOf rdf:resource="http://www.w3.org/...#Resource"/> </rdf:Description> • <rdf:Description ID="Truck"> <rdf:type resource="http://www.w3.org/...#Class"/> <rdfs:subClassOf rdf:resource="#MotorVehicle"/> </rdf:Description> • <rdf:Description ID="registeredTo"> <rdf:type resource="http://www.w3.org/...#Property"/> <rdfs:domain rdf:resource="#MotorVehicle"/> <rdfs:range rdf:resource="#Person"/> </rdf:Description> • <rdf:Description ID="ownedBy"> <rdf:type resource="http://www.w3.org/...#Property"/> <rdfs:subPropertyOf rdf:resource="#registeredTo"/> </rdf:Description>
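The schema vocabulary becomes useful once a program applies rules to it. The sketch below encodes the MotorVehicle schema as triples (using the conventional `rdfs:` prefix for the schema terms) and applies four simplified RDFS-style rules until nothing new is derived. The instances `truck42`, `car7` and `alice` are hypothetical, and the rule set is a small subset chosen for illustration, not the full RDFS entailment rules.

```python
SCHEMA = [
    ("Truck", "rdfs:subClassOf", "MotorVehicle"),
    ("registeredTo", "rdfs:domain", "MotorVehicle"),
    ("registeredTo", "rdfs:range", "Person"),
    ("ownedBy", "rdfs:subPropertyOf", "registeredTo"),
]
DATA = [
    ("truck42", "rdf:type", "Truck"),  # hypothetical instances
    ("car7", "ownedBy", "alice"),
]

def infer(triples):
    """Apply a few RDFS-style rules repeatedly until no new triples appear."""
    known = set(triples)
    while True:
        new = set()
        for s, p, o in known:
            for s2, p2, o2 in known:
                # subclass: (x type C), (C subClassOf D) => (x type D)
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
                # subproperty: (x P y), (P subPropertyOf Q) => (x Q y)
                if p2 == "rdfs:subPropertyOf" and p == s2:
                    new.add((s, o2, o))
                # domain: (x P y), (P domain C) => (x type C)
                if p2 == "rdfs:domain" and p == s2:
                    new.add((s, "rdf:type", o2))
                # range: (x P y), (P range C) => (y type C)
                if p2 == "rdfs:range" and p == s2:
                    new.add((o, "rdf:type", o2))
        if new <= known:
            return known
        known |= new

closure = infer(SCHEMA + DATA)
for triple in sorted(closure - set(SCHEMA + DATA)):
    print(triple)
```

From two asserted facts the rules derive that `truck42` is a MotorVehicle, that `car7` is registered to `alice`, and from that in turn that `car7` is a MotorVehicle and `alice` is a Person. This is a small instance of the "sets of inference rules" that the earlier slide says the Semantic Web needs on top of structured data.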
