Dictionaries, Vocabularies, Namespaces, Thesauri, Ontologies, and all that
Rob Raskin
NASA/Jet Propulsion Laboratory
Raskin@jpl.nasa.gov
June 21, 2011
Why care about data semantics?
• Current data may need to be archived for decades or centuries
• Global change analysis requires consistent comparisons across decades or centuries
• Synonyms
  • multiple words, same meaning
• Homonyms
  • same word, multiple meanings
• Measurement ambiguities
  • Sea “surface” temperature: at what “height”?
Semantic Understanding is Difficult!
• Time flies like an arrow. / Fruit flies like a pie.
• Let’s eat, Grandma. / Let’s eat Grandma.
• “Mission accomplished. Major combat operations in Iraq have ended” (Pres. Bush, 2003)
• Variable t: temperature / Variable t: time
• Data quality = 5 / Data quality = 3
• Surface wind: measured 3 m above surface / Surface wind: measured at surface
• [Image: LA Times headline]
Semantic Spectrum
[Diagram: semantics increase from a human-readable vocabulary to a machine-readable ontology]
• Catalog: list of controlled words
• Informal hierarchy: terms classified by categories (e.g., GCMD)
• Formal hierarchy: terms inherit properties/meaning of parent
• Formal hierarchy with relations: relations between children defined
Scope of Representation
• Parameter names
• Scientific units
• Spatial/temporal extent/resolution
• Data quality
• Data provenance
• Data type
• Data services
[Diagram label: CF]
What is an Ontology?
• An approach to storing knowledge
• Machine-readable and human-readable
• Provides definitions of words or phrases, expressed relative to other terms
• Offers shared understanding of concepts and knowledge reuse
• Provides semantics for machine-to-machine (or human-to-human) communication
Practically, an ontology is a…
• Framework for classifying knowledge
• Ensures there is a “place” to store components of knowledge
Ontology Languages: RDF and OWL
• W3C has adopted ontology languages that can be written in XML:
  • Resource Description Framework (RDF)
  • Web Ontology Language (OWL)
• Languages predefine specific tags
  • RDF (with RDF Schema): Class, subclass, property, subproperty
  • Class-property is similar to the entity-relationship model of DBMS theory
RDF Class and Subclass
• Class
  • The basic element, “thing”, or “noun”
• Subclass
  • Inherits all attributes of its parent class
  • Typically adds properties that distinguish the subclass from its parent
  • Can have multiple parent classes
[Diagram: Cat (is a) Animal; Cat (has Legs) 4]
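The inheritance rule can be sketched in a few lines of Python. This is a simplified, frame-style model rather than how an RDF store actually represents classes: each class lists its parents and the properties it adds, and a class's full description is gathered from all of its ancestors. `Cat`, `Animal`, and `hasLegs` come from the diagram; the `Pet` parent and the `isLiving` and `hasOwner` properties are hypothetical, added only to show a class with two parents.

```python
# Hypothetical class table: name -> (parent classes, properties the class adds).
# Simplified frame-style model; not an RDF/OWL API.
CLASSES = {
    "Animal": ([], {"isLiving": True}),
    "Pet": ([], {"hasOwner": True}),
    "Cat": (["Animal", "Pet"], {"hasLegs": 4}),  # two parents (multiple inheritance)
}


def all_properties(cls: str) -> dict:
    """Collect properties from every ancestor, then add the class's own."""
    parents, own = CLASSES[cls]
    merged: dict = {}
    for parent in parents:
        merged.update(all_properties(parent))  # inherit everything from each parent
    merged.update(own)  # distinguishing properties of this class
    return merged


print(all_properties("Cat"))  # {'isLiving': True, 'hasOwner': True, 'hasLegs': 4}
```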
RDF Property & Subproperty
• Property
  • A “verb”
  • Examples: measures, hasLocation, hasArea, northOf
  • Properties can have attributes: domain, range, transitive, …
• Subproperty
  • Inherits parent attributes
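How a subproperty inherits its parent's attributes can be sketched as a small forward-chaining loop over (subject, property, object) triples. The three rules mirror RDF Schema's sub-property, domain, and range entailments; the property `measuresTemperature`, the classes `Instrument` and `Quantity`, and the individual `Radiometer1` are hypothetical. Because `measuresTemperature` is declared a subproperty of `measures`, it picks up the domain and range declared only on its parent.

```python
# Hypothetical schema: sub-property links plus domain/range declarations.
SUBPROPERTY_OF = {"measuresTemperature": "measures"}
DOMAIN = {"measures": "Instrument"}
RANGE = {"measures": "Quantity"}

Triple = tuple[str, str, str]


def infer(facts: set[Triple]) -> set[Triple]:
    """Apply sub-property, domain, and range rules until nothing new is derived."""
    closed = set(facts)
    while True:
        new: set[Triple] = set()
        for s, p, o in closed:
            if p in SUBPROPERTY_OF:  # x p y and p subPropertyOf q  =>  x q y
                new.add((s, SUBPROPERTY_OF[p], o))
            if p in DOMAIN:  # x p y and p has domain C  =>  x type C
                new.add((s, "type", DOMAIN[p]))
            if p in RANGE:  # x p y and p has range C  =>  y type C
                new.add((o, "type", RANGE[p]))
        if new <= closed:  # fixed point reached
            return closed
        closed |= new


facts = {("Radiometer1", "measuresTemperature", "SeaSurfaceTemperature")}
for triple in sorted(infer(facts)):
    print(triple)
```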
OWL Language
• Extends RDF to predefine further tags:
  • cardinality
  • transitive relations
  • inverse relations
  • same as, different from
  • union, intersection
  • domain, range
  • import (from one ontology to another, enabling sharing and reuse of the work of others)
  • …
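Two of these OWL features, transitive and inverse relations, are easy to illustrate with a naive forward-chaining sketch. `northOf` comes from the property examples above; declaring it transitive and giving it the inverse `southOf` are assumptions made for this illustration, and a real OWL reasoner does far more than this loop.

```python
# Hypothetical OWL-style property characteristics.
TRANSITIVE = {"northOf"}
INVERSE_OF = {"northOf": "southOf", "southOf": "northOf"}


def close(facts: set) -> set:
    """Add triples implied by inverse and transitive properties until stable."""
    closed = set(facts)
    while True:
        new = set()
        for s, p, o in closed:
            if p in INVERSE_OF:  # x p y  =>  y inverse(p) x
                new.add((o, INVERSE_OF[p], s))
            if p in TRANSITIVE:  # x p y and y p z  =>  x p z
                for s2, p2, o2 in closed:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= closed:  # nothing new: closure complete
            return closed
        closed |= new


facts = {("Seattle", "northOf", "Portland"), ("Portland", "northOf", "Sacramento")}
for triple in sorted(close(facts)):
    print(triple)
```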
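OWL Ontology Example
WaterPollution is declared a subclass of Pollution whose hasSubstance values all come from the class Water. In RDF/XML (assuming the owl, rdf, and rdfs namespace prefixes are declared):
    <owl:Class rdf:about="#WaterPollution">
      <rdfs:subClassOf rdf:resource="#Pollution"/>
      <rdfs:subClassOf>
        <owl:Restriction>
          <owl:onProperty rdf:resource="#hasSubstance"/>
          <owl:allValuesFrom rdf:resource="#Water"/>
        </owl:Restriction>
      </rdfs:subClassOf>
    </owl:Class>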
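Under OWL's open-world semantics, an allValuesFrom restriction is not a validation rule that rejects data; it lets a reasoner conclude that whatever a WaterPollution instance hasSubstance must be Water. A single-pass Python sketch of that inference follows; the individuals `Spill42` and `SampleA` are hypothetical, and the axioms are copied from the example as plain data rather than parsed from OWL.

```python
# Axioms from the WaterPollution example, as plain Python data.
SUBCLASS_OF = {"WaterPollution": "Pollution"}
ALL_VALUES_FROM = {("WaterPollution", "hasSubstance"): "Water"}


def apply_axioms(facts: set) -> set:
    """One inference pass: propagate types to superclasses and apply
    allValuesFrom restrictions to property values."""
    inferred = set(facts)
    for s, p, o in facts:
        if p != "type":
            continue
        if o in SUBCLASS_OF:  # s is a WaterPollution => s is a Pollution
            inferred.add((s, "type", SUBCLASS_OF[o]))
        for s2, p2, o2 in facts:
            target = ALL_VALUES_FROM.get((o, p2))
            if s2 == s and target:  # each restricted value gets the filler class
                inferred.add((o2, "type", target))
    return inferred


facts = {("Spill42", "type", "WaterPollution"), ("Spill42", "hasSubstance", "SampleA")}
for triple in sorted(apply_axioms(facts)):
    print(triple)
```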
Statements about Statements
• OWL allows us to make statements about statements:
  • Degree of belief
  • Timestamps
  • Provenance / lineage
  • Probability / uncertainty
  • Security issues
  • Author / source / community
  • Community dialect
  • …
[Diagram: the statement “Corn Crop is a Observed Feature” has Source Landsat and has Probability 0.75]
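The diagram's idea, a statement that itself carries a source and a probability, can be sketched as a record that wraps the triple together with its annotations, similar in spirit to RDF reification. The `CornCrop`/`Landsat`/`0.75` values come from the diagram; the second statement, its source, and the `believed` filter are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Statement:
    """A plain subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


@dataclass
class AnnotatedStatement:
    """A statement plus statements about it (source, probability, ...)."""
    statement: Statement
    annotations: dict = field(default_factory=dict)


KB = [
    AnnotatedStatement(Statement("CornCrop", "isA", "ObservedFeature"),
                       {"hasSource": "Landsat", "hasProbability": 0.75}),
    # Hypothetical second statement with lower confidence.
    AnnotatedStatement(Statement("Field7", "isA", "Wetland"),
                       {"hasSource": "FieldSurvey", "hasProbability": 0.4}),
]


def believed(kb, min_probability):
    """Statements whose annotated probability meets the threshold.

    Unannotated statements are treated as certain (an assumption).
    """
    return [a.statement for a in kb
            if a.annotations.get("hasProbability", 1.0) >= min_probability]


print(believed(KB, 0.5))
```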
Why are Ontologies Useful? (1)
• Ontologies provide a common namespace
  • Documents, web pages, data, people, and other resources can be mapped or categorized to this namespace
  • Anybody can create or extend the namespace
Why are Ontologies Useful? (2)
• Dictionary
  • Concepts in the namespace are not just “listed” (a taxonomy) but “defined” in terms of other concepts
  • Concepts are defined as specializations of broader concepts, with properties that distinguish each child from its broader parent concept
  • Follows the reductionist approach of science
  • Arbitrary levels of specialization are possible, as with the Library of Congress and Dewey Decimal classification systems
Why are Ontologies Useful? (3)
• Disambiguation
  • Reduces semantic mismatch
  • Synonym support (multiple terms with the same meaning): a label indicates the preferred term for each community
  • Homonym support (multiple meanings of the same term): separate namespaces (President:Bush vs. Plant:Bush)
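Both mechanisms can be sketched with a small concept registry: each concept is keyed by namespace and local name (so the two Bushes stay distinct), carries several labels (synonyms), and records a preferred label per community. The registry layout, the function names, and the community names (including the CF preference) are hypothetical.

```python
# Hypothetical concept registry keyed by "Namespace:LocalName".
CONCEPTS = {
    "Plant:Bush": {
        "labels": ["bush", "shrub"],
        "preferred": {"default": "shrub"},
    },
    "President:Bush": {
        "labels": ["Bush", "George W. Bush"],
        "preferred": {"default": "President Bush"},
    },
    "Property:SeaSurfaceTemperature": {
        "labels": ["SST", "sea surface temperature", "sea_surface_temperature"],
        "preferred": {"default": "sea surface temperature",
                      "CF": "sea_surface_temperature"},
    },
}


def lookup(term: str) -> list[str]:
    """All concepts carrying this label; more than one result signals a homonym."""
    wanted = term.lower()
    return sorted(cid for cid, c in CONCEPTS.items()
                  if wanted in (label.lower() for label in c["labels"]))


def preferred_label(concept_id: str, community: str = "default") -> str:
    """Preferred term for a community, falling back to the default label."""
    prefs = CONCEPTS[concept_id]["preferred"]
    return prefs.get(community, prefs["default"])


print(lookup("Bush"))  # ['Plant:Bush', 'President:Bush']
print(lookup("SST"))   # ['Property:SeaSurfaceTemperature']
print(preferred_label("Property:SeaSurfaceTemperature", "CF"))
```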
Why are Ontologies Useful? (4)
• Machine readable
  • Ontologies are generally stored in a format (e.g., XML) that both humans and computers can read
  • Computer accessibility enables automated reasoning
Why are Ontologies Useful? (5)
• Knowledge retention
  • Corporations use knowledge management to preserve institutional memory over time, as personnel come and go
  • Climate disciplines can do the same!
  • Facts/data can be represented and related in a consistent manner
  • Common-sense knowledge is captured
  • Instrument characteristics
Ontology Representation (1): Knowledge Base of Triples
Noun-verb-noun representation
• Parent-child relations:
  • Flood is a Weather Phenomenon
  • GeoTIFF is a File Format
  • Soil Type is a Physical Property
  • Pacific Ocean is an Ocean
• Or create your own relations:
  • Ocean has substance Water
  • Sensor measures Temperature
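A knowledge base of triples can be queried by pattern matching. In this minimal Python sketch the triples are the ones listed above (with spaces removed from names), `None` acts as a wildcard, and `substances_of` chains two patterns to conclude that the Pacific Ocean contains water. The function names, and the rule of carrying a class's hasSubstance value down to its members, are assumptions for illustration rather than part of any standard.

```python
TRIPLES = [
    ("Flood", "isA", "WeatherPhenomenon"),
    ("GeoTIFF", "isA", "FileFormat"),
    ("SoilType", "isA", "PhysicalProperty"),
    ("PacificOcean", "isA", "Ocean"),
    ("Ocean", "hasSubstance", "Water"),
    ("Sensor", "measures", "Temperature"),
]


def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def substances_of(thing):
    """hasSubstance values stated for the thing itself or for any class it isA."""
    direct = [o for _, _, o in match(thing, "hasSubstance")]
    inherited = [o for _, _, cls in match(thing, "isA")
                 for _, _, o in match(cls, "hasSubstance")]
    return direct + inherited


print(len(match(p="isA")))            # 4
print(substances_of("PacificOcean"))  # ['Water']
```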
Ontology Representation (3): XML, RDF, and OWL
• W3C has adopted standard ontology languages that can be written in XML:
  • Resource Description Framework (RDF)
  • Web Ontology Language (OWL)
• Languages predefine specific tags
  • RDF: Class, subclass, property, subproperty, …
  • OWL: Extends RDF to predefine further tags, such as cardinality
  • Three flavors of OWL (Lite, DL, and Full)
• Use of standard languages makes it easy to extend (specialize) the work of others
Global Warming Query in the Semantic Web
Find data which demonstrates global warming at high latitudes during summertime and plot warming rate.
• Extract information from the use case: encode knowledge
• Translate this into a complete query for data: inference and integration of data from instruments, indices, and models
• “Global warming” = trend of increasing temperature over large spatial scales
• “High latitude” = |latitude| > 60 degrees
• “Summertime” = June-Aug (NH) and Jan-Mar (SH)
• “Find data” = locate datasets using catalogs, then access and read them
• “Plot warming rate” = display temperature vs. time
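Once the terms are pinned down, the query itself reduces to a filter plus a trend estimate. This Python sketch encodes the high-latitude and summertime definitions above; treating the warming rate as an ordinary least-squares slope, the record layout, and the data values are all assumptions (the records are synthetic), and plotting is left out.

```python
def is_high_latitude(lat: float) -> bool:
    return abs(lat) > 60  # "High latitude" = |latitude| > 60 degrees


def is_summer(lat: float, month: int) -> bool:
    # "Summertime" = June-Aug (NH) and Jan-Mar (SH), as defined above
    return month in (6, 7, 8) if lat >= 0 else month in (1, 2, 3)


def warming_rate(records):
    """Least-squares slope of temperature vs. year over the selected records.

    Each record is (latitude, year, month, temperature).
    """
    pts = [(year, temp) for lat, year, month, temp in records
           if is_high_latitude(lat) and is_summer(lat, month)]
    if len(pts) < 2:
        raise ValueError("need at least two matching records")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("records span only one year")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx  # temperature units per year


# Synthetic, made-up records: (latitude, year, month, temperature)
RECORDS = [
    (70, 2000, 7, 1.0), (70, 2001, 7, 1.5), (70, 2002, 7, 2.0),         # NH summer
    (-75, 2000, 2, -10.0), (-75, 2001, 2, -9.5), (-75, 2002, 2, -9.0),  # SH summer
    (45, 2001, 7, 25.0),   # excluded: not high latitude
    (70, 2001, 1, -30.0),  # excluded: NH winter
]
print(warming_rate(RECORDS))  # 0.5
```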
Semantic Web for Earth and Environmental Terminology (SWEET)
• Concept space written in OWL
• Initial focus: assist search for data resources
  • Funded by NASA
• Later focus: serve as a community standard (upper-level Earth system science ontology)
• Enables scalable classification of Earth system science and associated data concepts
• Specialists can further refine SWEET concepts
• SWEET 2.2 has 6600 concepts in 200 modular ontologies
• http://sweet.jpl.nasa.gov
CF vs SWEET Representation
• CF (traditional single-attribute parameter name):
  tendency_of_mole_concentration_of_dissolved_inorganic_phosphorus_in_sea_water_due_to_biological_processes
• SWEET (multi-attribute parameter name):
  • Quantity = mole_concentration
  • Transformation = tendency
  • State = dissolved, inorganic
  • Substance = phosphorus
  • Medium = sea_water
  • Process = biological_processes
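One practical payoff of the multi-attribute form is that the long single-attribute name can be generated from the facets, while each facet stays individually searchable. The Python sketch below rebuilds the CF name above from the SWEET facets; the assembly rule is an assumption that happens to fit this example and is not the official CF standard-name grammar.

```python
def cf_style_name(f: dict) -> str:
    """Assemble a CF-style parameter name from SWEET-style facets.

    The ordering rule here is a hypothetical illustration, not the CF grammar.
    """
    core = "_".join([*f.get("State", []), f["Substance"]])
    name = f"{f['Quantity']}_of_{core}_in_{f['Medium']}"
    if "Transformation" in f:  # optional facet, prefixed
        name = f"{f['Transformation']}_of_{name}"
    if "Process" in f:  # optional facet, appended
        name = f"{name}_due_to_{f['Process']}"
    return name


facets = {
    "Quantity": "mole_concentration",
    "Transformation": "tendency",
    "State": ["dissolved", "inorganic"],
    "Substance": "phosphorus",
    "Medium": "sea_water",
    "Process": "biological_processes",
}
print(cf_style_name(facets))
```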
SWEET Data Ontology
• Dataset characteristics
  • Format, data model, dimensions, …
• Provenance
  • Source, processing history, …
• Parameters
  • Scale factors, offsets, …
• Data services
  • Subsetting, reprojection, …
• Quality measures
• Special values
  • Missing, land, sea, ice, …
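The parameter and special-value entries are what a reader needs to turn stored numbers into physical values. This Python sketch applies the common CF-style unpacking rule (value = packed × scale_factor + add_offset) and sets special codes aside; the code values, their meanings, and the packing parameters are hypothetical.

```python
import math


def unpack(raw, scale_factor, add_offset, special_codes):
    """Convert packed integers to physical values, setting special codes aside.

    Returns (values, flags): special codes become NaN with a descriptive flag.
    """
    values, flags = [], []
    for code in raw:
        if code in special_codes:
            values.append(math.nan)
            flags.append(special_codes[code])
        else:
            values.append(code * scale_factor + add_offset)
            flags.append("valid")
    return values, flags


# Hypothetical packing parameters and special-value codes.
values, flags = unpack([150, -999, 200, -998], scale_factor=0.5, add_offset=10.0,
                       special_codes={-999: "missing", -998: "land"})
print(values)  # [85.0, nan, 110.0, nan]
print(flags)   # ['valid', 'missing', 'valid', 'land']
```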
Best Practices
• Keep ontologies small and modular
• Use higher-level ontologies where possible
• Identify a hierarchy of concept spaces
• Try to keep dependencies unidirectional
• Gain community buy-in
  • Involve respected leaders
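Keeping dependencies unidirectional means the graph of which ontology imports which should contain no cycles. A depth-first search can check this; the sketch below is a generic Python cycle finder, and the module names are hypothetical.

```python
from typing import Dict, List, Optional


def find_cycle(imports: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one import cycle (as a path) if any exists, else None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for dep in imports.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:  # dep is already on the path: cycle found
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in imports:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical module names.
layered = {"phenomena": ["process", "property"], "process": ["property"], "property": []}
tangled = {"property": ["process"], "process": ["property"]}
print(find_cycle(layered))  # None
print(find_cycle(tangled))  # ['property', 'process', 'property']
```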
Ontology Development Tools: CMAP
• Free, downloadable tool for knowledge representation and ontology development
• Visual language with import/export to OWL
• Supports a subset of the OWL language
• http://cmap.ihmc.us/coe
Resources
• ESIP Semantic Web Cluster
• Monthly telecons
• Tutorials
• Ontology development
• Datatypes
• Data services
• SWEET: http://sweet.jpl.nasa.gov
• Rob Raskin: raskin@jpl.nasa.gov