The Semantic Web

Presentation Transcript


  1. The Semantic Web Barry Smith http://ontologist.com

  2. The problem of ontology • human beings can integrate highly heterogeneous information

  3. Consider how the human mind • copes with complex phenomena in the social realm (e.g. speech acts of promising) • which involve: • experiences (speaking, perceiving), • intentions, • language, • action (and tendencies to action), • deontic powers, obligations, claims, authority … • background habits, • mental competences, • records and representations

  4. understanding how computers can effect the same sort of integration is a difficult problem

  5. A new silver bullet

  6. The Semantic Web • designed to integrate the vast amounts of heterogeneous online data and services • via dramatically better support at the level of metadata designed to yield the ability to query and integrate across different conceptual systems

  7. Tim Berners-Lee, inventor of the World Wide Web • ‘sees a more powerful Web emerging, one where documents and data will be annotated with special codes allowing computers to search and analyze the Web automatically. The codes … are designed to add meaning to the global network in ways that make sense to computers’

  8. hyperlinked vocabularies, called ‘ontologies’ will be used by Web authors • ‘to explicitly define their words and concepts as they post their stuff online. • ‘The idea is the codes would let software "agents" analyze the Web on our behalf, making smart inferences that go far beyond the simple linguistic analyses performed by today's search engines.’

  9. Exploiting tools such as: • XML • OWL (Web Ontology Language) • RDF (Resource Description Framework) • DAML+OIL (DARPA Agent Markup Language + Ontology Inference Layer) • (? confusing syntactic integration with semantic integration)

  10. University Ontology

  11. University Ontology Relations

  12. University Ontology Relations

  13. Defining ‘gene’ • GDB: a gene is a DNA fragment that can be transcribed and translated into a protein • Genbank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype

  14. Example: The Enterprise Ontology • A Sale is an agreement between two Legal-Entities for the exchange of a Product for a Sale-Price. • A Strategy is a Plan to Achieve a high-level Purpose. • A Market is all Sales and Potential Sales within a scope of interest.

  15. Example: Statements of Accounts • Company Financial statements may be prepared under either the (US) GAAP or the (European) IASC standards • These allocate cost items to different categories depending on the laws of the countries involved.

  16. Job: • to develop an algorithm for the automatic conversion of income statements and balance sheets between the two systems. • Not even this relatively simple problem has been satisfactorily resolved • … why not? • Because the very same terms mean different things • and are applied in different ways • in different cultures

  17. Verizon • The promise of Web Services, augmented with the Semantic Web, is to provide THE major solution for integration, the largest IT cost / sector, at $ 500 BN/year. • The Web Services and Semantic Web trends are heading for a major failure (i.e., the most recent Silver Bullet). • In reality, Web Services, as a technology, is in its infancy. ... There is no technical solution (i.e., no basis) other than fantasy for the rest of the Web Services story. • Analyst claims of maturity and adoption (...) are already false. ... • Verizon must understand it so as not to invest too heavily in technologies that will fail or that will not produce a reasonable ROI. • Dr. Michael L. Brodie, Chief Scientist, Verizon IT • OntoWeb Meeting, Innsbruck, December 16-18, 2002

  18. Assumptions • Communication / compatibility problems should be solved automatically • (by machine) • Hence ontologies must be applications running in real time

  19. Application ontology: • Ontologies are inside the computer • thus subject to severe constraints on expressive power • (effectively the expressive power of Description Logic)

  20. The Semantic Web Initiative • The Web is a vast edifice of heterogeneous data sources • Needs the ability to query and integrate across different conceptual systems

  21. How to resolve incompatibilities? • enforce terminological compatibility via standardized term hierarchies, with standardized definitions of terms, which • 1. satisfy the constraints of a description logic (DL) • 2. are applied as meta-tags to the content of websites

  22. Clay Shirky • The Semantic Web is a machine for creating syllogisms. • Humans are mortal • Greeks are human • Therefore, Greeks are mortal

  23. Lewis Carroll • - No interesting poems are unpopular among people of real taste - No modern poetry is free from affectation - All your poems are on the subject of soap-bubbles - No affected poetry is popular among people of real taste - No ancient poetry is on the subject of soap-bubbles • Therefore: All your poems are bad.

  24. the promise of the Semantic Web • it will improve all the areas of your life where you currently use syllogisms

  25. most of the data we use is not amenable to recombination in syllogistic form • because it is partial, inconclusive, context-sensitive • So we guess, extrapolate, intuit, we do what we did last time, we do what we think our friends would do … but we almost never use syllogistic logic.

  26. We Describe the World in Generalities • People who live in Brooklyn speak with a Brooklyn accent • People who live in France speak French

  27. Merging Databases • Merging databases simply becomes a matter of recording in RDF somewhere that "Person Name" in your database is equivalent to "Name" in my database, and then throwing all of the information together and getting a processor to think about it. [http://infomesh.net/2001/swintro/] • Is your "Person Name = John Smith" the same person as my "Name = John Q. Smith"? Who knows? Not the Semantic Web
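
The gap this slide points to can be made concrete. The sketch below assumes a hypothetical `FIELD_MAP` dictionary standing in for the RDF statement that "Person Name" is equivalent to "Name"; it is plain Python, not an RDF processor.

```python
# Hypothetical field mapping, standing in for the RDF equivalence statement.
FIELD_MAP = {"Person Name": "Name"}


def normalize(record, field_map):
    """Rename fields according to the mapping; values are left untouched."""
    return {field_map.get(key, key): value for key, value in record.items()}


yours = [{"Person Name": "John Smith"}]
mine = [{"Name": "John Q. Smith"}]

# "Throwing all of the information together" after aligning the field names.
merged = [normalize(r, FIELD_MAP) for r in yours] + mine
names = [r["Name"] for r in merged]
same_string = names[0] == names[1]

print(names)        # ['John Smith', 'John Q. Smith']
print(same_string)  # False
```

The mapping aligns the columns, but the merged table still holds two distinct strings. Whether they denote one person or two is a question the field equivalence cannot answer.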

  28. XML-syntax does not help • <BUSINESS-CARD>  <FIRSTNAME>Jules</FIRSTNAME>  <LASTNAME>Deryck</LASTNAME>  <COMPANY>Newco</COMPANY>  <MEMBEROF>XTC Group</MEMBEROF>  <JOBTITLE>Business Manager</JOBTITLE>  <TEL>+32(0)3.471.99.60</TEL>  <FAX>+32(0)3.891.99.65</FAX>  <GSM>+32(0)465.23.04.34</GSM>  <WEBSITE>www.newco.com</WEBSITE>  <ADDRESS>   <STREET>Dendersesteenweg 17</STREET>   <ZIP>2630</ZIP>   <CITY>Aartselaar</CITY>   <COUNTRY>Belgium</COUNTRY>  </ADDRESS> </BUSINESS-CARD>

  29. and with correct XML-syntax: • <BUSINESS-CARD>  <FIRSTNAME>Jules</FIRSTNAME>  <LASTNAME>Deryck</LASTNAME>  <COMPANY>Newco</COMPANY>  <MEMBEROF>XTC Group</MEMBEROF>  <JOBTITLE>Business Manager</JOBTITLE>  <TEL>+32(0)3.471.99.60</TEL>  <FAX>+32(0)3.891.99.65</FAX>  <GSM>+32(0)465.23.04.34</GSM>  <WEBSITE>www.newco.com</WEBSITE>  <ADDRESS>   <STREET>Dendersesteenweg 17</STREET>   <ZIP>2630</ZIP>   <CITY>Aartselaar</CITY>   <COUNTRY>Belgium</COUNTRY>  </ADDRESS> </BUSINESS-CARD>

  30. and with correct XML-syntax: Is "Jules" the first name of the person, or of the business-card? • <BUSINESS-CARD>  <FIRSTNAME>Jules</FIRSTNAME>  <LASTNAME>Deryck</LASTNAME>  <COMPANY>Newco</COMPANY>  <MEMBEROF>XTC Group</MEMBEROF>  <JOBTITLE>Business Manager</JOBTITLE>  <TEL>+32(0)3.471.99.60</TEL>  <FAX>+32(0)3.891.99.65</FAX>  <GSM>+32(0)465.23.04.34</GSM>  <WEBSITE>www.newco.com</WEBSITE>  <ADDRESS>   <STREET>Dendersesteenweg 17</STREET>   <ZIP>2630</ZIP>   <CITY>Aartselaar</CITY>   <COUNTRY>Belgium</COUNTRY>  </ADDRESS> </BUSINESS-CARD>

  31. and with correct XML-syntax: Is Jules or Newco the member of XTC Group? • <BUSINESS-CARD>  <FIRSTNAME>Jules</FIRSTNAME>  <LASTNAME>Deryck</LASTNAME>  <COMPANY>Newco</COMPANY>  <MEMBEROF>XTC Group</MEMBEROF>  <JOBTITLE>Business Manager</JOBTITLE>  <TEL>+32(0)3.471.99.60</TEL>  <FAX>+32(0)3.891.99.65</FAX>  <GSM>+32(0)465.23.04.34</GSM>  <WEBSITE>www.newco.com</WEBSITE>  <ADDRESS>   <STREET>Dendersesteenweg 17</STREET>   <ZIP>2630</ZIP>   <CITY>Aartselaar</CITY>   <COUNTRY>Belgium</COUNTRY>  </ADDRESS> </BUSINESS-CARD>

  32. and with correct XML-syntax: • <BUSINESS-CARD>  <FIRSTNAME>Jules</FIRSTNAME>  <LASTNAME>Deryck</LASTNAME>  <COMPANY>Newco</COMPANY>  <MEMBEROF>XTC Group</MEMBEROF>  <JOBTITLE>Business Manager</JOBTITLE>  <TEL>+32(0)3.471.99.60</TEL>  <FAX>+32(0)3.891.99.65</FAX>  <GSM>+32(0)465.23.04.34</GSM>  <WEBSITE>www.newco.com</WEBSITE>  <ADDRESS>   <STREET>Dendersesteenweg 17</STREET>   <ZIP>2630</ZIP>   <CITY>Aartselaar</CITY>   <COUNTRY>Belgium</COUNTRY>  </ADDRESS> </BUSINESS-CARD> Do the phone numbers and address belong to Jules or to the business?
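
The questions raised on the business-card slides can be checked against what a parser actually recovers. The sketch below uses Python's standard `xml.etree.ElementTree` on an abbreviated copy of the card (the shortened `CARD` string and the `parent_of` dictionary are illustrative choices, not part of the original slides).

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the business card from the slides.
CARD = """<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME>
  <COMPANY>Newco</COMPANY>
  <MEMBEROF>XTC Group</MEMBEROF>
  <ADDRESS>
    <CITY>Aartselaar</CITY>
  </ADDRESS>
</BUSINESS-CARD>"""

root = ET.fromstring(CARD)

# The only relation the parser recovers is containment: which tag encloses which.
parent_of = {child.tag: parent.tag for parent in root.iter() for child in parent}

for tag, parent in parent_of.items():
    print(f"{tag} is nested in {parent}")
```

FIRSTNAME, COMPANY and MEMBEROF all come back as children of BUSINESS-CARD. Nothing in the parsed tree says whether the member of XTC Group is Jules or Newco, or whose address it is; a human reader supplies that interpretation.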

  33. Metadata: the new Silver Bullet • agree on a metadata standard for washing machines as concerns size, price, etc. • create machine-readable databases and put them on the net • consumers can query multiple sites simultaneously • and search for highly specific, reliable, context-sensitive results

  34. Shirky: • The Semantic Web's philosophical argument -- the world should make more sense than it does -- is hard to argue with. The Semantic Web, with its neat ontologies and its syllogistic logic, is a nice vision. However, like many visions that project future benefits but ignore present costs, it requires too much coordination and too much energy to be effective in the real world …

  35. Shirky • Much of the proposed value of the Semantic Web is coming, but it is not coming because of the Semantic Web. The amount of meta-data we generate is increasing dramatically, and it is being exposed for consumption by machines as well as, or instead of, people. But it is being designed a bit at a time, out of self-interest and without regard for global ontology.

  36. Semantic Web effort • thus far devoted primarily to developing systems for standardized representation of web pages and web processes • (= ontology of web typography) • not to the harder task of developing ontologies (term hierarchies) for the content of such web pages

  37. Cory Doctorow • A world of exhaustive, reliable metadata would be a utopia.

  38. Problem 1: People lie • Meta-utopia is a world of reliable metadata. • But poisoning the well can confer benefits to the poisoners • Metadata exists in a competitive world. • Some people are crooks. • Some people are cranks. • Some people are French philosophers.

  39. Practical problems • of the semantic web: • who will police the coding?

  40. Problem 2: People are lazy • Half the pages on Geocities are called “Please title this page”

  41. Problem 3: People are stupid • The vast majority of the Internet's users • (even those who are native speakers of English) • cannot spell or punctuate • Will internet users learn to accurately tag their information with whatever DL-hierarchy they're supposed to be using?

  42. Problem 4: Multiple descriptions • “Requiring everyone to use the same vocabulary denudes the cognitive landscape, enforces homogeneity in ideas.” • (Cory Doctorow)

  43. Problem 5: Ontology Impedance • = semantic mismatch between ontologies being merged • This problem recognized in Semantic Web literature: • http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/ontoweb-del-7.6-swws1.pdf

  44. Solution 1: treat it as (inevitable) ‘impedance’ • and learn to find ways to cope with the disturbance which it brings • Suggested here: • http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/ontoweb-del-7.6-swws1.pdf

  45. Solution 2: resolve the impedance problem on a case-by-case basis • Suppose two databases are put on the web. • Someone notices that "where" in the friends table and "zip" in the places table mean the same thing. • http://www.w3.org/DesignIssues/Semantic.html

  46. We can use the Semantic Web to prove that Joe loves Mary • we found two documents on a trusted site, one of which said that ":Joe :loves :MJS", and another of which said that ":MJS daml:equivalentTo :Mary". We also got the checksums of the files in person from the maintainer of the site. To check this information, we can list the checksums in a local file, and then set up some FOPL rules that say "if file 'a' contains the information Joe loves mary and has the checksum md5:0qrhf8q3hfh, then record SuccessA", "if file 'b' contains the information MJS is equivalent to Mary, and has the checksum md5:0892t925h, then record SuccessB", and "if SuccessA and SuccessB, then Joe loves Mary". [http://infomesh.net/2001/swintro/]
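
The procedure quoted on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the method of the cited page: the document strings, the `trusted` table and the `verified` helper are hypothetical, and the checksums are computed locally only so the example is self-contained (the slide assumes they were obtained in person from the site maintainer).

```python
import hashlib


def md5_hex(text):
    """MD5 digest of a text, as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical document contents standing in for files 'a' and 'b'.
doc_a = ":Joe :loves :MJS ."
doc_b = ":MJS daml:equivalentTo :Mary ."

# Stand-in for the local file of checksums obtained from the maintainer.
trusted = {"a": md5_hex(doc_a), "b": md5_hex(doc_b)}


def verified(name, text, statement):
    """A document counts only if it contains the statement and its checksum matches."""
    return statement in text and md5_hex(text) == trusted[name]


success_a = verified("a", doc_a, ":Joe :loves :MJS")
success_b = verified("b", doc_b, ":MJS daml:equivalentTo :Mary")

# The final rule: if SuccessA and SuccessB, then Joe loves Mary.
joe_loves_mary = success_a and success_b

# A document altered after the checksum was recorded fails the check.
tampered = verified("a", doc_a.replace(":Joe", ":Jim"), ":Jim :loves :MJS")

print(joe_loves_mary, tampered)
```

The check establishes only that the files are the ones the maintainer vouched for and that they contain the two statements. Whether the statements are true is outside what the rules can verify.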

  47. Both solutions fail • 1. treating mismatches as ‘impedance’ ignores the problem of error propagation • (and is inappropriate in an area like medicine) • 2. resolving impedance on a case-by-case basis defeats the very purpose of the Semantic Web

  48. Clinicians • often do not use category systems at all – they use unstructured text • from which usable data has to be extracted in a further step • Why? • Because every case is different, much patient data is context-dependent

  49. Problem 5: Ontology Impedance • = semantic mismatch between ontologies • ‘gene’ used in websites issued by • biotech companies involved in gene patenting • medical researchers interested in role of genes in predisposition to smoking • insurance companies

  50. Other problems with DL-based ontologies • DL poor when dealing with context-dependent information/usages of terms, e.g. Severe Acute Respiratory Syndrome • and when it comes to dealing with time • and when it comes to dealing with information about instances (rather than concepts or classes)
