Ontology and Search on the Semantic WEB
SPINN PhD course, PISA, September 2004
Patrizia Paggio
Outline
• What is wrong with current search engines?
• The Semantic WEB Vision
• Semantic tagging
• Ontologies
• Practical assignment
Pisa Sep 2004
What is wrong with current search engines?
• The answer to a search is a list of texts, but:
  • Are they the right texts?
  • How about cross-lingual search?
  • Do we always need texts?
Are they the right texts?
• Query: Rom i dansk maleri fra 1800-tallet
  • (Rome in Danish painting from the 19th century)
• Answer:
  • Dyreetiske Råd... Knudsen, Johannes Døden i Rom. ... Dansk kunst fra 1800-tallet i ... kunst, islamisk kunst fra ... Symbolismen i dansk og europæisk maleri ...
  • (Committee for animal ethics... Knudsen, Johannes Death in Rome... Danish art from the 19th century in... art, Islamic art from... Symbolism in Danish and European painting...)
Are they the right texts? (2)
• Syntax is ignored:
  • there is no relation between Rom and dansk maleri in the text.
• Semantics is ignored:
  • from maleri (painting) one should be able to get to malere (painters), billeder (paintings), etc.
Cross-lingual search (1)
• Query: courses on 1800 century painting
• Answers:
  • University of Wisconsin-Madison
  • Center for European Studies (Univ. Wisconsin)
  • Rutgers, State University of New Jersey
  • University of Dublin
  • Enchanted Learning
• All answers are English-speaking, mostly American.
Cross-lingual search (2)
• Query: courses on 1800 century painting at European universities
• Answers:
  • University of Wisconsin-Madison
  • Elon University, North Carolina
  • University College London
  • Institute of Art History in Marburg
  • Cornell
• Still mostly English-speaking, especially American!
Do we always need texts?
• Hvad er email-adresserne på alle forskere på KUA?
  • (What are the email addresses of all researchers at KUA, i.e. Humanities?)
• We would like to see: [not preserved in the transcript]
• What can we do now? web-page-ku.htm
Summary: search engines
• Search needs more syntax and semantics
• Extraction of knowledge from texts and databases rather than just text retrieval
• Extraction of knowledge from texts in different languages
The Semantic WEB
• A more ‘intelligent’ WEB in which information (documents and other data) is tagged so that computer applications ‘understand’ it, or know how to use it.
• “The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”
  • (Berners-Lee, Hendler and Lassila 2001)
Semantic tagging (1)
• HTML documents are tagged so that browsers understand their syntax and show them to users, who interpret them.
    <h3> Patrizia Paggio </h3>
    <p> Employee at <a href="http://cst.dk"> CST </a> in Copenhagen</p>
    <p> Titel: Seniorforsker </p>
    <p> Email: patrizia@cst.dk </p>
Semantic tagging (2)
• Semantic metadata make the content of WEB pages explicit.
  • hasName(SeniorResearcher, Patrizia Paggio)
  • hasName(ResearchCentre, CST)
  • Affiliated(Patrizia Paggio, CST)
  • subClass(SeniorResearcher, employee)
  • hasProperty(Patrizia Paggio, email:patrizia@cst.dk)
  • hasProperty(CST, url:http://cst.dk)
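The metadata statements on this slide can be held as plain predicate-argument tuples and looked up directly. What follows is a minimal Python sketch, not the format used in the course or any W3C serialization; the tuple layout and the helper name `objects_of` are assumptions made for illustration.

```python
# Each statement from the slide, stored as a (predicate, arg1, arg2) tuple.
# The tuple layout is an illustrative assumption, not a W3C serialization.
FACTS = [
    ("hasName", "SeniorResearcher", "Patrizia Paggio"),
    ("hasName", "ResearchCentre", "CST"),
    ("Affiliated", "Patrizia Paggio", "CST"),
    ("subClass", "SeniorResearcher", "employee"),
    ("hasProperty", "Patrizia Paggio", "email:patrizia@cst.dk"),
    ("hasProperty", "CST", "url:http://cst.dk"),
]


def objects_of(predicate, arg1, facts=FACTS):
    """Return every arg2 stated for the given predicate and arg1 (hypothetical helper)."""
    return [o for (p, s, o) in facts if p == predicate and s == arg1]


print(objects_of("hasProperty", "Patrizia Paggio"))
print(objects_of("Affiliated", "Patrizia Paggio"))
```

Unlike the HTML on the previous slide, nothing here has to be read off the page layout: a program can ask for a person's properties by name.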
Defining semantic tags: Concept hierarchies
• Diagram nodes, in extraction order: Employee Administrative Faculty Researcher Assistent Professor Lecturer Senior Researcher
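A concept hierarchy lets an application conclude that a Senior Researcher is also an Employee without that being stated on every page. The Python sketch below illustrates this with child-to-parent links. The diagram's layout did not survive in this transcript, so the specific links in `PARENT` are assumed for illustration, and the function names are hypothetical.

```python
# Child -> parent links. The slide's diagram layout was lost in extraction,
# so these particular links are ASSUMED for illustration only.
PARENT = {
    "Administrative": "Employee",
    "Faculty": "Employee",
    "Researcher": "Employee",
    "Lecturer": "Faculty",
    "Senior Researcher": "Researcher",
}


def ancestors(concept, parent=PARENT):
    """Walk up the hierarchy and return all superclasses, nearest first."""
    chain = []
    while concept in parent:
        concept = parent[concept]
        chain.append(concept)
    return chain


def is_a(concept, candidate, parent=PARENT):
    """True if `concept` equals `candidate` or is a (transitive) subclass of it."""
    return concept == candidate or candidate in ancestors(concept, parent)


print(ancestors("Senior Researcher"))
print(is_a("Senior Researcher", "Employee"))
print(is_a("Lecturer", "Researcher"))
```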
Adding relations
• Affiliated: arg1 = SeniorResearcher: Patrizia Paggio; arg2 = ResearchCentre: CST
• SeniorResearcher: Patrizia Paggio has Email patrizia@cst.dk
• ResearchCentre: CST has URL http://cst.dk
Ontologies – What are they?
• A domain ontology defines the concepts of a specific domain and the relations between them.
• It provides the semantic vocabulary for semantic tagging on the semantic WEB.
• For domain ontologies to be useful, they must conform to standards (W3C).
• Examples: XML, RDF, OIL, DAML+OIL, OWL, Topic Maps.
Ontologies – Repositories
• DAML Ontology library:
  • 282 ontologies
  • total no. of classes: 67987
  • total no. of properties: 11149
  • total no. of instances: 43646
Ontology-based querying
• Hvad er email-adresserne på alle forskere på KUA? (What are the email addresses of all researchers at KUA?)
• will retrieve a list of emails and researchers provided that:
  • relevant pages are tagged semantically;
  • current standards are used;
  • the agent (search engine) knows the underlying ontology or can find it.
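The conditions on this slide can be illustrated with a toy query over tagged data. The Python sketch below rests on stated assumptions: only the Patrizia Paggio and CST facts come from the slides, "Example Admin" and its address are made-up placeholders that show filtering, the `Researcher` and `Administrative` class links are assumed, and the query is run against CST because no KUA data appears in the deck.

```python
# Toy knowledge base. Only the Patrizia Paggio / CST facts come from the slides;
# "Example Admin" and its address are made-up placeholders to show filtering.
SUBCLASS = {
    "SeniorResearcher": "Researcher",   # assumed link
    "Researcher": "Employee",           # assumed link
    "Administrative": "Employee",       # assumed link
}
INSTANCE_OF = {
    "Patrizia Paggio": "SeniorResearcher",
    "Example Admin": "Administrative",
}
AFFILIATED = {"Patrizia Paggio": "CST", "Example Admin": "CST"}
EMAIL = {
    "Patrizia Paggio": "patrizia@cst.dk",
    "Example Admin": "admin@example.org",
}


def is_a(cls, target):
    """Follow subclass links upward until `target` is found or the root is passed."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS.get(cls)
    return False


def emails_of(target_class, organisation):
    """Return (person, email) pairs for instances of `target_class` at `organisation`."""
    return sorted(
        (person, EMAIL[person])
        for person, cls in INSTANCE_OF.items()
        if is_a(cls, target_class)
        and AFFILIATED.get(person) == organisation
        and person in EMAIL
    )


print(emails_of("Researcher", "CST"))
```

The query asks for researchers, yet the only matching instance is tagged `SeniorResearcher`. It is found because the agent knows the ontology's subclass links, which is the third condition on the slide.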
Ontologies – Some issues
• Some crucial issues:
  • How are ontologies produced?
  • How is the semantic tagging produced?
  • Will knowledge referring to the same domain always be described by the same ontology? What if it is not?
  • How is knowledge expressed in different languages related to the same domain ontology?
Practical assignment (1)
• (1) Define ontologies for web pages in different languages. Find:
  • Classes: Organisation
  • Subclasses: Institute is-a Organisation
  • Instances: Person: Patrizia Paggio
  • Attributes: Email (Person)
  • Relations: Teach (Teacher, Course)
Practical assignment (2)
• (2) Define correspondences between the ontologies.
    Ontology I    Relation      Ontology II
    Person        Equivalent    Person
    Professor     Similar       Professor
• (3) Discuss the principles followed to establish the correspondences.
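One way to record the correspondences from step (2) is as a small table that an application can consult when it moves a query from one ontology to the other. The Python sketch below uses the two example rows from the slide; the triple layout, the `translate` helper, and the policy of accepting only `Equivalent` by default are assumptions for illustration, not part of the assignment.

```python
# Correspondence table between two ontologies, following the slide's example rows.
# The triple layout and the `translate` helper are illustrative assumptions.
CORRESPONDENCES = [
    ("Person", "Equivalent", "Person"),
    ("Professor", "Similar", "Professor"),
]


def translate(concept, allowed=("Equivalent",), table=CORRESPONDENCES):
    """Map a concept of Ontology I to Ontology II concepts via the allowed relations."""
    return [
        target
        for (source, relation, target) in table
        if source == concept and relation in allowed
    ]


print(translate("Person"))
print(translate("Professor"))
print(translate("Professor", allowed=("Equivalent", "Similar")))
```

Whether a `Similar` match should count when answering a query is the kind of principle step (3) asks you to discuss.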