100 likes | 114 Views
Explore the importance of ontologies in overcoming terminology barriers for data discovery and sharing in science domains. Learn how to create, integrate, and leverage ontologies for cross-domain information retrieval.
E N D
IPY and Semantics Siri Jodha S. Khalsa Paul Cooper Peter Pulsifer Paul Overduin Eugeny Vyazilov Heather lane
Statement of Problem • Each science domain or community develops its own terminology to describe concepts, resources (objects, data) and relationships • Data discovery and data sharing depend critically on being able to attach unambiguous meaning to the terms used to describe domain knowledge • Generalized metadata standards such as ISO 19115 lack domain-specific elements
Knowledge Organization Systems • Controlled vocabularies • Glossaries & Dictionaries • Thesauri (limited ability to express relationships between terms) • Gazetteers (place names, sometimes classified and categorized) • Classification Schemes (taxonomies) • Ontologies (can create complex model of reality including rules and axioms)
Ontologies • Expressed in a formal conceptual language (UML, ERD, RDF, OWL,...) where symbols, text and rules of grammar are used to express • classes (conceptualizations of objects) • instances of classes • properties of classes • relationships between classes
Why Create an Ontology? • To transfer domain knowledge to scientists and educators outside the domain • To enable reuse of domain knowledge • Ability integrate existing ontologies • To make domain assumptions explicit • Easier to modify models of reality as domain knowledge evolves • To enable domain-independent services, inductive reasoning and natural-language processing • Allow transmission of knowledge across languages • Translation not always one-to-one
The Big Question • Who is going to do it? • DIS subcommittee of domain experts • How will it be funded? • Corporate underwriting? • Prototype
Process • There is no one correct way to model a domain— there are always viable alternatives. • Ontology development is necessarily an iterative process.
Approaches – Top Down • Begin with survey of existing domain knowledge representations in each IPY discipline • Reuse • Many to choose from • Investigate tools for bringing these knowledge bases into a common system
Approaches – Bottom Up • System for assigning subject metadata (tagging) • High level terms from defined domain specification • Leave discovery and simple semantic relationships to web services such as Google • Mechanism for distilling subject metadata once assigned • Once data released, users should be able to assign new tags • Community review and editing (wiki?)
Outcome • Legacy of IPY will be a dynamic system for cross-domain information discovery and retrieval • Community based • Language neutral