The Elephant, the Blind Men, and the Semantic Web Stefan Decker stefan.decker@deri.org http://www.stefandecker.org/
Wikipedia… “The Semantic Web is an evolving extension of the World Wide Web in which the semantics of information and services on the web is defined, making it possible for the web to understand and satisfy the requests of people and machines to use the web content. It derives from World Wide Web Consortium director Sir Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange.”
“Ho! what have we here / So very round and smooth and sharp? / To me 'tis mighty clear / This wonder of an Elephant / Is very like a spear!” John Godfrey Saxe (1816-1887), “The Blind Men and the Elephant”
Three views of the Semantic Web: • Evolution of the Web • Knowledge Representation • Data Integration
Semantic Web as an Evolution of the Web: A Quick History of Collaboration and Personal Information Management Tools (and Visions)
Telephone (Bell 1876) “This 'telephone' has too many shortcomings to be seriously considered as a means of communication. The device is inherently of no value to us.” Western Union internal memo, 1876.
Phonograph (Edison 1877) “The end of books”.
Memex. Posited by Vannevar Bush in “As We May Think”, The Atlantic Monthly, July 1945. “A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility.” Supports: annotations, links between documents, and “trails” through the documents. “Yet if the user inserted 5000 pages of material a day it would take him hundreds of years to fill the repository, so that he can be profligate and enter material freely.”
Computer-Aided Collaboration and Personal Information Management Tools (and Visions)
oN-Line System (NLS), 1968 (Doug Engelbart, SRI) • The Mouse • Word Processing • Data Sharing • Hypertext
On the shoulders of giants… • Memex (Vannevar Bush): A memex is “a device in which an individual stores all his books, records, and communications.” • Augmenting Human Intellect (Doug Engelbart): “By "augmenting human intellect" we mean increasing the capability of a man to approach a complex problem situation, to gain comprehension to suit his particular needs, and to derive solutions to problems.” • WWW (Tim Berners-Lee): “There was a second part of the dream […] we could then use computers to help us analyse it, make sense of what we are doing, where we individually fit in, and how we can better work together.”
It wasn’t the time then… Where are we now?
A Network of Knowledge… • Interconnected • Universal • All encompassing • Enable global and local collaboration • The right information for the right people at the right time
Hypothesis • Collaborative access to networked knowledge assists with collective problem solving • enabling innovation and increased productivity • at individual, organisational and global levels. Inspired by Doug Engelbart’s original 1962 report, “Augmenting Human Intellect: A Conceptual Framework”
A Problem • Often people build databases in isolation, then want to share their data • Different systems within an enterprise • Different information brokers on the Web • Scientific collaborators • Researchers who want to publish their data for others to use • Even with normalization and the same needs, different people will arrive at different schemas • Goal of data integration: tie together different sources, controlled by many people, under a common schema
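The goal stated above can be illustrated with a minimal sketch. It assumes two hypothetical sources (`SOURCE_A`, `SOURCE_B`) that describe the same kind of entity under different schemas, and a hypothetical common schema with the fields `name` and `email`; none of these names come from the slides. Each source gets its own mapping function into the common schema, which is roughly what a mediator does at query time.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]

# Hypothetical source A: one flat record per person, its own column names.
SOURCE_A: List[Record] = [{"full_name": "Ada Lovelace", "mail": "ada@example.org"}]

# Hypothetical source B: split name, nested contact details.
SOURCE_B: List[Record] = [
    {"first": "Alan", "last": "Turing", "contact": {"email": "alan@example.org"}}
]


def from_a(row: Record) -> Record:
    """Map a source-A row into the assumed common schema (name, email)."""
    return {"name": row["full_name"], "email": row["mail"]}


def from_b(row: Record) -> Record:
    """Map a source-B row into the assumed common schema (name, email)."""
    return {
        "name": f'{row["first"]} {row["last"]}',
        "email": row["contact"]["email"],
    }


def integrate(
    sources: Iterable[Tuple[List[Record], Callable[[Record], Record]]]
) -> List[Record]:
    """Apply each source's mapping and return rows under the common schema."""
    unified: List[Record] = []
    for rows, mapping in sources:
        unified.extend(mapping(row) for row in rows)
    return unified


people = integrate([(SOURCE_A, from_a), (SOURCE_B, from_b)])
```

The sources stay autonomous: only the mapping functions know about the common schema, so adding a third source means writing one more mapping rather than changing the existing databases.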
Virtual Integration Architecture. Sources can be: relational, hierarchical (IMS), structured files, web sites.
Challenge: Sources Without a Well-Structured Schema. Characteristics: • semistructured • irregular • deeply nested • cross-referenced • incomplete schema knowledge • autonomous • dynamic. Examples: • HTML pages • SGML documents • genome data • chemical structures • bibliographic information • results of the integration process
The Semistructured Data Model (e.g. Object Exchange Model) [Figure: OEM graph of a bibliography. The root complex object Bib (&o1) has edges labelled paper, paper and book to the objects &o12, &o24 and &o29. These are cross-linked by references edges and carry edges labelled author, title, year, page, publisher and http to further objects (&o43, &25, &96, &243, &206), ending in atomic objects such as “Serge”, “Abiteboul”, “Victor”, “Vianu”, 1997, 122 and 133. Author objects use inconsistent labels (firstname/first, lastname/last).]
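A semistructured model of this kind can be sketched as a labelled graph: each object has an identifier and is either atomic (a plain value) or complex (a list of labelled edges to other objects). The sketch below is illustrative only. The identifiers `&o1`, `&o12`, `&o24` and `&o29` and the atomic values appear on the slide, but the remaining identifiers and the exact wiring of the edges are assumptions, as is the helper name `follow`.

```python
# Object store: oid -> atomic value, or list of (label, target oid) edges.
# The edge wiring and the ids &a1, &a2, &v1..&v4, &y1 are assumed.
OEM = {
    "&o1": [("paper", "&o12"), ("paper", "&o24"), ("book", "&o29")],
    "&o12": [("author", "&a1"), ("year", "&y1"), ("references", "&o24")],
    "&o24": [("author", "&a2")],
    "&o29": [("author", "&a1")],
    "&a1": [("firstname", "&v1"), ("lastname", "&v2")],
    # Irregular structure: this author uses "first" instead of "firstname".
    "&a2": [("first", "&v3"), ("lastname", "&v4")],
    "&v1": "Serge",
    "&v2": "Abiteboul",
    "&v3": "Victor",
    "&v4": "Vianu",
    "&y1": 1997,
}


def follow(oid, path):
    """Return the atomic values reached from oid along a path of edge labels."""
    frontier = [oid]
    for label in path:
        next_frontier = []
        for node in frontier:
            value = OEM[node]
            if isinstance(value, list):  # only complex objects have edges
                next_frontier.extend(t for (l, t) in value if l == label)
        frontier = next_frontier
    return [OEM[n] for n in frontier if not isinstance(OEM[n], list)]


last_names = follow("&o1", ["paper", "author", "lastname"])
first_names = follow("&o1", ["paper", "author", "firstname"])
```

No schema is declared anywhere: the query for `firstname` silently misses the author stored under `first`, which is the kind of irregularity the previous slide lists as a challenge.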
Information Integration on the Web (MA-1) Research Projects (mid/late 1990s to early 2000s) • Garlic (IBM) • Information Manifold (AT&T) • Tsimmis, InfoMaster (Stanford) • The Internet Softbot/Razor/Tukwila (UW) • Hermes (Maryland) • DISCO (INRIA, France) • SIMS/Ariadne (USC/ISI) • Emerac/Havasu (ASU) • BibFinder (ASU)
Information Integration on the Web (MA-1) Many Techniques not Used for the Semantic Web • Local as View/Global as View • Wrapper/Mediator generation
Origins. Tim Berners-Lee’s original 1989 WWW proposal described a web of relationships among named objects, unifying many information management tasks. http://www.w3.org/History/1989/proposal.html Capsule history • Guha’s MCF (~94) • XML+MCF=>RDF (~96) • RDF+OO=>RDFS (~99) • RDFS+KR=>DAML+OIL (00) • W3C’s SW activity (01) • W3C’s OWL (03)
What is an Ontology? “An ontology is a specification of a conceptualization.” Tom Gruber, 1993 • Ontologies are social contracts • Agreed, explicit semantics • Understandable to outsiders • (Often) derived in a community process • Vs. Database schema • Targeted towards physical data independence • Vs. XML Schema • Targeted towards document structure
RDF Schema (RDFS) • RDF Schema adds taxonomies for classes & properties • subClass and subProperty (rdfs:subClassOf and rdfs:subPropertyOf) • and some metadata • domain and range constraints on properties • Several widely used KB tools can import and export in RDFS • Stanford Protégé KB editor • Java, open sourced • extensible, lots of plug-ins • provides reasoning & server capabilities
RDFS supports simple inferences. “New and Improved! 100% Better than XML!!” • An RDF ontology plus some RDF statements may imply additional RDF statements. • This is not true of XML. • Note that this is part of the data model and not of the accessing or processing code.
Given (N3):
    @prefix rdfs: <http://www.....>.
    @prefix : <genesis.n3>.
    parent rdfs:domain person;
           rdfs:range person.
    mother rdfs:subProperty parent;
           rdfs:domain woman;
           rdfs:range person.
    eve mother cain.
Implied:
    parent a property.
    person a class.
    woman subClass person.
    mother a property.
    eve a person; a woman; parent cain.
    cain a person.
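How such implied statements arise can be sketched with a small forward-chaining loop over triples. This is a minimal illustration, not a full RDFS reasoner: it assumes only four rules (domain, range, subPropertyOf, subClassOf), uses the standard term `rdfs:subPropertyOf` where the slide abbreviates it as `subProperty`, and the function name `rdfs_closure` is hypothetical. With only these rules it derives the statements about eve and cain, but not the class and property typing statements or `woman subClass person` listed on the slide.

```python
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"
SUBPROP, SUBCLASS, TYPE = "rdfs:subPropertyOf", "rdfs:subClassOf", "rdf:type"


def rdfs_closure(triples):
    """Apply four RDFS-style rules until no new triple is produced."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                if p2 == DOMAIN and s2 == p:
                    new.add((s, TYPE, o2))      # subject is in the domain
                elif p2 == RANGE and s2 == p:
                    new.add((o, TYPE, o2))      # object is in the range
                elif p2 == SUBPROP and s2 == p:
                    new.add((s, o2, o))         # statement holds for super-property
                elif p2 == SUBCLASS and p == TYPE and s2 == o:
                    new.add((s, TYPE, o2))      # instance of the superclass
        if new <= facts:
            return facts
        facts |= new


genesis = [
    ("parent", DOMAIN, "person"),
    ("parent", RANGE, "person"),
    ("mother", SUBPROP, "parent"),
    ("mother", DOMAIN, "woman"),
    ("mother", RANGE, "person"),
    ("eve", "mother", "cain"),
]

closure = rdfs_closure(genesis)
inferred = closure - set(genesis)
```

The only instance-level input is `eve mother cain`; everything else in `inferred` follows from the schema statements, which is the sense in which inference is part of the data model rather than of the processing code.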
Problems with RDFS • RDFS is too weak to describe resources in sufficient detail, e.g.: • No localised range and domain constraints: can’t say that the range of hasChild is person when applied to persons and elephant when applied to elephants • No existence/cardinality constraints: can’t say that all instances of person have a mother that is also a person, or that persons have exactly 2 parents • No transitive, inverse or symmetrical properties: can’t say that isPartOf is a transitive property, that hasPart is the inverse of isPartOf, or that touches is symmetrical • We need RDF terms providing these and other features.
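To make the third limitation concrete, the sketch below shows which extra statements would follow if a language let us declare isPartOf transitive, hasPart its inverse, and touches symmetrical. It is plain Python over triples, not RDFS or DAML+OIL syntax; the function name `expand` and the individuals (finger, hand, arm, trunk, wall) are hypothetical.

```python
def expand(facts, transitive=(), symmetric=(), inverse_of=None):
    """Close a set of triples under the declared property characteristics."""
    inverses = dict(inverse_of or {})
    # Make the inverse mapping usable in both directions.
    inverses.update({v: k for k, v in list(inverses.items())})
    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverses:
                new.add((o, inverses[p], s))
            if p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:
            return facts
        facts |= new


stated = {
    ("finger", "isPartOf", "hand"),
    ("hand", "isPartOf", "arm"),
    ("trunk", "touches", "wall"),
}

expanded = expand(
    stated,
    transitive={"isPartOf"},
    symmetric={"touches"},
    inverse_of={"hasPart": "isPartOf"},
)
```

From three stated facts the closure contains eight: the transitive step adds finger isPartOf arm, the inverse declaration adds three hasPart statements, and symmetry adds wall touches trunk. RDFS alone offers no vocabulary for the three declarations passed to `expand`.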
DAML+OIL = RDF + KR • DAML = DARPA Agent Markup Language • DARPA program with 17 projects & an integrator developing language spec, tools, applications for SW. • OIL = Ontology Inference Layer • An EU effort aimed at developing a layered approach to representing knowledge on the web. • Process • Joint Committee: US DAML and EU Semantic Web Technologies participants • DAML+OIL specs released in 2001 • See http://www.daml.org/ • Includes model theoretic and axiomatic semantics