Information Retrieval on the Semantic Web Using Ontology-based Visualization Larry Reeve INFO780 – XML and Databases Dr. Han - Spring 2004
Overview • Semantic Web and Ontologies • RDF and OWL • Visualization Uses • Cluster Map • Futures
Semantic Web • Machine-processable Web • How to model meaning? • Common framework that allows data to be shared and reused • Extension of current web • Funding • DARPA - $70 million • European Union - € 55 million
Existing Web • Resources: • identified by URIs • untyped • Links: • href, src, ... • limited, non-descriptive • User: • semantics of a resource gleaned from its content • Machine: • little information available; the significance of a link is evident only from the context around the anchor. Source: W3C
Semantic Web • Resources: • globally identified by URIs • or locally scoped (blank nodes) • extensible • relational • Links: • identified by URIs • extensible • relational • User: • richer user experience • exchange knowledge effectively • Machine: • more processable information is available Source: W3C
Semantic Web Architecture Source: W3C
Ontology • An explicit specification of a conceptualization (Gruber) • Provides a common definition of a domain • Documents are annotated with metadata so that their “meaning” can be determined
Ontology • Plays a central role in the Semantic Web • Used for: • Querying • Presentation • Navigation • Enables a move from keyword-based searching to logic-based searching
Ontology Types • Lightweight • Simple keyword hierarchies • (Yahoo, Open Directory Project) • Well-defined • Complex concept hierarchies, properties, value restrictions, axiomatised relationships
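Even a lightweight keyword hierarchy supports a simple form of logic-based searching: a query for a broad category can also match documents tagged with narrower categories. The sketch below is a minimal, hypothetical example. The category names in `PARENT` and the documents in `DOCS` are invented. Subsumption along a single is-a chain stands in for the richer reasoning that a well-defined ontology would allow.

```python
# Hypothetical lightweight keyword hierarchy, stored as child -> parent.
PARENT = {
    "Arts": "Top", "Computers": "Top",
    "Music": "Arts", "Painting": "Arts",
    "Jazz": "Music",
}
# Hypothetical documents, each tagged with a single category.
DOCS = {"d1": "Jazz", "d2": "Painting", "d3": "Computers"}

def ancestors(category):
    """Yield every broader category above the given one."""
    while category in PARENT:
        category = PARENT[category]
        yield category

def search(category):
    """Return documents tagged with the category or with any narrower one."""
    return sorted(
        doc for doc, tag in DOCS.items()
        if tag == category or category in ancestors(tag)
    )

print(search("Arts"))  # ['d1', 'd2']
```

A literal keyword match on "Arts" finds nothing, because no document carries that exact tag. Following the hierarchy instead returns `d1` (Jazz) and `d2` (Painting).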
Ontology • Many ontologies currently defined: • DAML – DARPA Agent Markup Language (www.daml.org) • DAML Ontology Library – 282 entries • Baseball Teams • GPS coordinate systems • Employment hierarchy for CMU • Stanford • OntoLingua Server (www-ksl-svc.stanford.edu) • Protégé Ontologies Library (protege.stanford.edu)
W3C Standards • RDF – Resource Description Framework • data model for representing resources and the relations between them • OWL – Web Ontology Language • provides a vocabulary for describing properties and classes and allows for greater expressive complexity than RDF alone • Both issued as W3C Recommendations in February 2004
RDF • Commonly serialized as XML (RDF/XML) • An RDF statement is a triple composed of a subject, a predicate, and an object • Each RDF statement is modeled as a graph: • subjects and objects are nodes • the predicate is an arc from subject to object • Example: • index.html has a creator whose value is John Smith • subject(“index.html”) predicate(“creator”) object(“John Smith”) • Helps IR by giving a search engine more detail than keywords alone
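To make the triple model concrete, here is a minimal sketch in plain Python that stores statements as (subject, predicate, object) tuples and answers pattern queries over them. It uses no RDF library. The `match` helper and the extra `language` statement are hypothetical. Real RDF identifies subjects and predicates with URIs, which are shortened here to plain strings.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # node
    predicate: str  # arc label
    object: str     # node

# The slide's example statement, plus one hypothetical extra statement.
# Real RDF would use full URIs here; short strings keep the sketch readable.
graph = {
    Triple("index.html", "creator", "John Smith"),
    Triple("index.html", "language", "en"),
}

def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.object == o)
    )

# Each triple is an arc (predicate) from a subject node to an object node.
arcs = [(t.subject, t.predicate, t.object) for t in match(graph)]

# A structured question that a keyword index cannot express directly:
# "who is the creator of index.html?"
creators = [t.object for t in match(graph, s="index.html", p="creator")]
print(creators)  # ['John Smith']
```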
RDF Fragment

    <RDF xmlns:r="http://www.w3.org/TR/RDF/"
         xmlns:d="http://purl.org/dc/elements/1.0/"
         xmlns="http://directory.mozilla.org/rdf">
      <Topic r:id="Top">
        <tag catid="1"/>
        <d:Title>Top</d:Title>
        <narrow r:resource="Top/Arts"/>
        <narrow r:resource="Top/Business"/>
        <narrow r:resource="Top/Computers"/>
        <narrow r:resource="Top/Games"/>
        <narrow r:resource="Top/Health"/>
      </Topic>
    </RDF>

Source: Open Directory Project (www.dmoz.org)
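The Open Directory fragment is ordinary namespaced XML, so a stock XML parser can already walk its topic hierarchy. The sketch below uses Python's standard `xml.etree.ElementTree` module. The helper name `topic_tree` and the `odp` prefix are our own. The sketch treats the markup as plain XML and does not interpret it with an RDF toolkit.

```python
import xml.etree.ElementTree as ET

# The Open Directory fragment from the slide, reassembled as well-formed XML.
RDF_FRAGMENT = """<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://directory.mozilla.org/rdf">
  <Topic r:id="Top">
    <tag catid="1"/>
    <d:Title>Top</d:Title>
    <narrow r:resource="Top/Arts"/>
    <narrow r:resource="Top/Business"/>
    <narrow r:resource="Top/Computers"/>
    <narrow r:resource="Top/Games"/>
    <narrow r:resource="Top/Health"/>
  </Topic>
</RDF>"""

# Prefixes for ElementTree's find* methods; "odp" is our own label for the default namespace.
NS = {
    "r": "http://www.w3.org/TR/RDF/",
    "d": "http://purl.org/dc/elements/1.0/",
    "odp": "http://directory.mozilla.org/rdf",
}
R = "{" + NS["r"] + "}"  # Clark-notation prefix for r:-qualified attributes

def topic_tree(xml_text):
    """Map each topic id to (title, list of narrower topic ids)."""
    root = ET.fromstring(xml_text)
    tree = {}
    for topic in root.findall("odp:Topic", NS):
        title = topic.findtext("d:Title", namespaces=NS)
        narrower = [n.get(R + "resource") for n in topic.findall("odp:narrow", NS)]
        tree[topic.get(R + "id")] = (title, narrower)
    return tree

print(topic_tree(RDF_FRAGMENT)["Top"])
```

ElementTree reports namespaced attributes in `{uri}local` form. That is why the sketch builds the `R` prefix explicitly instead of looking up `r:id` directly.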
OWL • A vocabulary extension of RDF • The vocabulary provided by OWL describes items such as: • relations between classes • cardinality • equality • richer typing of properties • characteristics of properties • enumerated classes • Consists of three sublanguages: • OWL Lite • OWL DL (Description Logic) • OWL Full
OWL • OWL Lite: • for building classification hierarchies and simple constraints • OWL DL (Description Logic): • includes all OWL constructs, but only under certain restrictions, so that it retains computational completeness (all conclusions are guaranteed to be computable) and decidability (all computations will finish in finite time) • OWL Full: • allows all OWL constructs with the full syntactic freedom of RDF, but with no computational guarantees
OWL Fragment

    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:owl="http://www.w3.org/2002/07/owl#"
        xmlns:first="http://www.w3.org/2002/03owlt/Ontology/premises001#"
        xml:base="http://www.w3.org/2002/03owlt/Ontology/premises001" >
      <owl:Ontology rdf:about="" />
      <owl:Class rdf:ID="Car">
        <owl:equivalentClass>
          <owl:Class rdf:ID="Automobile"/>
        </owl:equivalentClass>
      </owl:Class>
      <first:Car rdf:ID="car">
        <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Thing" />
      </first:Car>
      <first:Automobile rdf:ID="auto">
        <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Thing" />
      </first:Automobile>
    </rdf:RDF>

Source: W3C
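The `owl:equivalentClass` axiom matters because it licenses inference. Every Car is an Automobile and vice versa, so a reasoner may conclude that the individual `car` is also an `Automobile` and that `auto` is also a `Car`. The sketch below hand-codes only that one rule over Python sets. It is not a general OWL reasoner, and the data structures are our own assumptions.

```python
# Facts taken from the OWL fragment: one equivalence axiom and two typed individuals.
EQUIVALENT = [("Car", "Automobile")]
ASSERTED = {"car": {"Car", "Thing"}, "auto": {"Automobile", "Thing"}}

def equivalence_groups(pairs):
    """Map each class to the full set of classes equivalent to it (itself included)."""
    groups = {}
    for a, b in pairs:
        # Merge both classes' current groups, so equivalence is transitive.
        merged = groups.get(a, {a}) | groups.get(b, {b})
        for cls in merged:
            groups[cls] = merged
    return groups

def inferred_types(asserted, pairs):
    """Close each individual's asserted types under owl:equivalentClass."""
    groups = equivalence_groups(pairs)
    return {
        ind: set().union(*(groups.get(t, {t}) for t in types))
        for ind, types in asserted.items()
    }

inferred = inferred_types(ASSERTED, EQUIVALENT)
print(sorted(inferred["car"]))   # ['Automobile', 'Car', 'Thing']
print(sorted(inferred["auto"]))  # ['Automobile', 'Car', 'Thing']
```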
Ontology-based IViz (Information Visualization) • Ontology Life Cycle • Development • Tools: IsAViz, Protégé • Instantiation • Manual, semi-automatic • Deployment • Analyze, query, and navigate an ontology-based information space
Ontology-based IViz • Ontology Characteristics • Light-weight • (Taxonomies with few logical class relations) • Large number of instances • Instance overlaps between classes • Incomplete
IViz in Deployment Stage • Analysis Visualization • Overview; pattern detection • Requires: data set, ontology, classifier • Query Visualization • Use ontology in query construction • Query Navigation • Information spaces / result sets
Analysis Visualization • Requires: data set, ontology, classifier • Analysis within single domain • Same document set with different ‘perspectives’ • Comparison of different data sets • Information change over time
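Analysis visualization needs a classifier to map the documents in a data set onto ontology classes before anything can be drawn. The slides do not say which classifier is used, so the sketch below substitutes the simplest possible one: keyword matching. The class names, keyword lists, and the two "sites" are invented for illustration. The resulting per-class counts are the kind of summary an analysis view could compare across data sets or over time.

```python
from collections import Counter

# Hypothetical light-weight ontology: class name -> indicative keywords.
ONTOLOGY = {
    "Mortgages": {"mortgage", "loan", "rate"},
    "Savings": {"savings", "deposit", "interest"},
    "Investments": {"fund", "stock", "portfolio"},
}

def classify(text):
    """Multi-label keyword classifier: every class sharing a word with the text."""
    words = set(text.lower().split())
    return {cls for cls, keywords in ONTOLOGY.items() if words & keywords}

def class_profile(docs):
    """Count the documents falling into each class (a document may count in several)."""
    counts = Counter()
    for doc in docs:
        counts.update(classify(doc))
    return counts

# Two hypothetical data sets analyzed with the same ontology.
site_a = ["Low mortgage rate offers", "Open a savings account today"]
site_b = ["Stock fund performance", "Mutual fund portfolio review",
          "Mortgage loan calculator"]

profile_a, profile_b = class_profile(site_a), class_profile(site_b)
for cls in ONTOLOGY:
    print(cls, profile_a[cls], profile_b[cls])
```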
Analysis within Single Domain • Figure: the same document set viewed from two perspectives, Economic Sector and Geographic Region
Comparison of Different Data Sets • Figure: two banking web sites analyzed using the same ontology
Monitoring • Figure: three ontology classes changing over time
Query Visualization • Query Formulation; Review of Results; Query Refinement
Query Navigation • Visualization is not primary interface • Serves as a global map • Select ontology classes • Documents displayed in text list
Existing IViz Techniques • Hyperbolic Tree • “The Brain” • Self-Organizing Maps (SOMs)
Hyperbolic Tree (Source: http://www.inxight.com)
The Brain Source: http://www.thebrain.com
Kohonen SOM Source: http://websom.hut.fi/websom/
Cluster Map - Class Positioning • Spring embedder algorithm • Nodes repel one another • Edges act as springs that pull connected nodes together • Iterated until a stable state is reached • Semantic closeness • Two classes are close when they share many instances • Two instances are close when they belong to the same class
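A spring embedder fits in a few dozen lines. In the standard formulation, every pair of nodes repels while each edge pulls its endpoints together. The version below is a minimal Fruchterman-Reingold-style layout written for illustration, not Cluster Map's actual implementation. The force laws (k²/d repulsion, d²/k attraction), the cooling schedule, and the toy graph are all assumptions. The toy graph has two class nodes, each linked to one hypothetical instance cluster (`c1`, `c2`).

```python
import math

def spring_layout(nodes, edges, iterations=200, k=1.0):
    """Minimal Fruchterman-Reingold-style spring embedder; returns {node: [x, y]}."""
    n = len(nodes)
    # Deterministic start: nodes evenly spaced on the unit circle.
    pos = {v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i, v in enumerate(nodes)}
    for it in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Every pair of nodes repels with strength k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Each edge is a spring pulling its endpoints together with strength d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, capped by a linearly cooling temperature.
        temp = 0.1 * (1 - it / iterations)
        for v in nodes:
            fx, fy = disp[v]
            mag = math.hypot(fx, fy)
            if mag > 0:
                step = min(mag, temp) / mag
                pos[v][0] += fx * step
                pos[v][1] += fy * step
    return pos

def dist(pos, a, b):
    """Euclidean distance between two laid-out nodes."""
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])

# Toy Cluster Map-like graph: each class node linked to one instance cluster.
nodes = ["Finance", "c1", "Sports", "c2"]
edges = [("Finance", "c1"), ("Sports", "c2")]
layout = spring_layout(nodes, edges)
print(f"Finance-c1: {dist(layout, 'Finance', 'c1'):.2f}, "
      f"Finance-Sports: {dist(layout, 'Finance', 'Sports'):.2f}")
```

After layout, each class sits next to its own cluster while the two unconnected groups drift apart. The same effect pulls together classes that are linked to shared instance clusters.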
Cluster Map Advantages • All classes and their instances are displayed at one time • Non-tree hierarchies can be displayed (general graph structures, not just trees) • Overlap between classes is exploited • Good for categorizing IR query results using a light-weight ontology
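Class overlap becomes visible when instances that belong to exactly the same set of classes are grouped into one cluster. The number of instances two classes share then indicates how close those classes should be drawn. The sketch below shows that grouping step, assuming each search result is tagged with a set of class names. The documents and classes are hypothetical.

```python
from collections import defaultdict

# Hypothetical search results, each tagged with the ontology classes it belongs to.
memberships = {
    "doc1": {"Finance"},
    "doc2": {"Finance", "Technology"},
    "doc3": {"Finance", "Technology"},
    "doc4": {"Technology"},
    "doc5": {"Sports"},
}

def clusters(memberships):
    """Group instances that belong to exactly the same set of classes."""
    groups = defaultdict(list)
    for doc, classes in sorted(memberships.items()):
        groups[frozenset(classes)].append(doc)
    return dict(groups)

def overlap(memberships, a, b):
    """Number of instances shared by classes a and b."""
    return sum(1 for classes in memberships.values() if a in classes and b in classes)

for classes, docs in clusters(memberships).items():
    print(sorted(classes), docs)
print(overlap(memberships, "Finance", "Technology"))  # 2
```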
Cluster Map Weaknesses • Light-weight ontologies • Number of classes is small compared to the number of class instances • Some classes will be densely populated • Increasing specialization (more specific classes) will help • Scaling to a large number of instances • Doesn’t show document similarity • Can only view by class membership
Displaying Document Similarity • Document analysis is subordinate to navigation and querying • Can show document list with ranking • Seeling Proposal: • Document Map Visualization
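A ranked document list needs some measure of document similarity, and the slides leave that measure open. The sketch below assumes cosine similarity over raw term-frequency vectors, a common choice in IR. The documents and the whitespace tokenizer are placeholders.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query, docs):
    """Return document names ordered by decreasing similarity to the query text."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(text.lower().split())), name) for name, text in docs.items()]
    # Highest score first; ties broken alphabetically by name.
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1]))]

# Hypothetical documents from one ontology class.
CLASS_DOCS = {
    "a": "semantic web ontology",
    "b": "ontology visualization cluster map",
    "c": "stock market report",
}
print(rank("ontology visualization", CLASS_DOCS))  # ['b', 'a', 'c']
```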
Seeling Visualization • Basic idea: • Select an ontology class • See where all of its documents lie within the overall document space
Document Similarity – Volvox • Extends Cluster Map • Replaces document containers with volvox containers • Retains the global display • No separate “document space” display • Another benefit: unlimited nesting allows drill-down • Named by Dr. McCain / Henry Small after a similarly shaped microorganism Source: www.groxis.com
Non-Class-Membership Views • View data by combining classes • Use information (properties, subclasses) that relates classes to one another • Example: • Data about people, projects, and papers produced by an organization • Visualize people and papers together to show how they interact
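Combining classes through a property turns class membership into relationships that can be drawn. For example, following an authorship property from Person instances to Paper instances yields the edges of a people-and-papers view. The sketch below assumes the data is held as simple triples. The individuals and the property names `authorOf` and `memberOf` are hypothetical.

```python
# Hypothetical triples about people, projects, and papers.
TRIPLES = [
    ("alice", "type", "Person"), ("bob", "type", "Person"),
    ("p1", "type", "Paper"), ("p2", "type", "Paper"),
    ("proj1", "type", "Project"),
    ("alice", "authorOf", "p1"), ("bob", "authorOf", "p1"),
    ("bob", "authorOf", "p2"), ("alice", "memberOf", "proj1"),
]

def instances(cls):
    """All individuals typed with the given class."""
    return {s for s, p, o in TRIPLES if p == "type" and o == cls}

def relate(src_cls, prop, dst_cls):
    """Edges (source, target) linking instances of two classes through one property."""
    src, dst = instances(src_cls), instances(dst_cls)
    return sorted((s, o) for s, p, o in TRIPLES if p == prop and s in src and o in dst)

def coauthors(person):
    """People who share at least one paper with the given person."""
    edges = relate("Person", "authorOf", "Paper")
    papers = {paper for author, paper in edges if author == person}
    return sorted({author for author, paper in edges if paper in papers} - {person})

print(relate("Person", "authorOf", "Paper"))  # [('alice', 'p1'), ('bob', 'p1'), ('bob', 'p2')]
print(coauthors("alice"))                      # ['bob']
```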
Summary • Ontologies are useful in categorizing IR search results • Cluster Map visualizes small document spaces effectively • It can be adapted to handle larger document spaces • Alternate views and more complex ontologies will require other visualization methods • More research is needed to support the use of ontologies in visualization
Information Retrieval on the Semantic Web Using Ontology-based Visualization • Questions