The OBO Foundry: A Gold Standard Approach to Ontology Evaluation • Barry Smith • http://ontology.buffalo.edu/smith
Two types of ontology • natural-science ontologies capture terminology-level knowledge underlying the best current science • contrasted with administrative ontologies (e.g. billing ontologies, bloodbank ontologies, lab workflow ontologies) prepared for specific, local purposes
scientific ontologies have special features • Every term in a scientific ontology must be such that the developers of the ontology believe it to refer to some entity on the basis of the best current evidence • scientific ontologies are realism-based
For scientific ontologies • reusability is crucial • compatibility with neighboring scientific ontologies • it is generalizations that are important • = universals, types, kinds
An ontology is a representation of universals • We learn about universals in reality from looking at the results of scientific experiments in the form of scientific theories • experiments relate to what is particular; science describes what is general
what is the difference between an ontology and a scientific theory? • an ontology is also a terminological standardization • WHAT DOES THIS MEAN?
1st aspect: additivity • cell = def. plant cell, consisting of protoplast and cell wall; ... [Plant Ontology] • what happens when the users of the Plant Ontology need to consider bacterial pathogens in plants?
2nd aspect: calibration with reality • gold standard kilogram • the same universal is defined by reference either to some artifact or to some universal physical constant (for realists there is no problem here)
VIM: the International Vocabulary of Metrology • (i) repeated measurements always give rise to some variation in values, • (ii) one can never be sure (fallibilism) that one has got the true value, • Hence: • (iii) there are no true values. • To keep happy those who dismiss the notion of the true value, the international community is agreeing to a set of terms which intentionally allow two possible interpretations • once again: bad philosophy leads to bad standards • Compare: http://ontology.buffalo.edu/medo/Wuesteria.pdf
from: The NIST Reference on Constants, Units and Uncertainty • The creation of the decimal Metric System at the time of the French Revolution and the subsequent deposition of two platinum standards representing the meter and the kilogram, on 22 June 1799, in the Archives de la République in Paris can be seen as the first step in the development of the present International System of Units.
from: The NIST Reference on Constants, Units and Uncertainty • In the 1860s Maxwell and Thomson ‘formulated the requirement for a coherent system of units with base units and derived units. • In 1874 the British Association for the Advancement of Science introduced the CGS system, a three-dimensional coherent unit system based on the three mechanical units centimeter, gram and second, using prefixes ranging from micro to mega to express decimal submultiples and multiples. • The following development of physics as an experimental science was largely based on this system.’
Base and Derived Units • Units based on undefined SI dimensions: meter, second, kilogram, ampere, candela, kelvin, mole. • Units based on defined SI dimensions: volume, area, velocity, acceleration, newton, joule, pascal, coulomb, farad, henry, hertz, lumen, lux, ohm, etc. • Dimensions can be multiplied and divided (meters/second).
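The last bullet can be illustrated with a short Python sketch that encodes each dimension as integer exponents over base dimensions, so that multiplying dimensions adds exponents and dividing subtracts them. This is the usual dimensional-analysis encoding, not a representation taken from the SI documents; the symbols L (length), M (mass) and T (time) and the function names are assumptions made for illustration.

```python
# A dimension is modelled as a mapping {base-dimension symbol: integer exponent}.
# The symbols "L", "M", "T" are illustrative choices, not official SI notation.
Dimension = dict[str, int]


def multiply(a: Dimension, b: Dimension) -> Dimension:
    """Multiply two dimensions by adding the exponents of each base dimension."""
    out = dict(a)
    for base, exp in b.items():
        out[base] = out.get(base, 0) + exp
    # Drop bases whose exponent cancelled out (e.g. L/L is dimensionless).
    return {base: exp for base, exp in out.items() if exp != 0}


def divide(a: Dimension, b: Dimension) -> Dimension:
    """Divide by multiplying with the inverse (negated exponents) of b."""
    return multiply(a, {base: -exp for base, exp in b.items()})


LENGTH: Dimension = {"L": 1}
MASS: Dimension = {"M": 1}
TIME: Dimension = {"T": 1}

AREA = multiply(LENGTH, LENGTH)         # L^2
VELOCITY = divide(LENGTH, TIME)         # L T^-1 (meters/second)
ACCELERATION = divide(VELOCITY, TIME)   # L T^-2
FORCE = multiply(MASS, ACCELERATION)    # M L T^-2, the dimension of the newton
PRESSURE = divide(FORCE, AREA)          # M L^-1 T^-2, the dimension of the pascal

if __name__ == "__main__":
    print(VELOCITY)  # {'L': 1, 'T': -1}
    print(FORCE)     # {'M': 1, 'L': 1, 'T': -2}
```

Keeping exponents rather than unit names is what lets derived dimensions such as pressure be computed from the base ones instead of being listed by hand.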
The SI System of Units • is a qualitative ontology: it captures qualitative dimensions of reality to which quantities can be applied (it captures measurable dimensions of reality) • there is a degree of conventionality in the choice of basic vs. derived units, and in the standard [e.g. the Paris meter] that is used to define the unit in each dimension
but the dimensions themselves exist independently of our conventions • so that an ontology of these dimensions is a true representation of an independently existing reality
Quantities are Universals • Ingvar Johansson: • Many different things can simultaneously have a mass of 5 kg (a length of 4 m, etc.). • Determinate quantities are universals, which means that they have many instances
Units Ontology • developed in conjunction with PATO, the Phenotypic qualities ontology • obo.sourceforge.net/cgi-bin/detail.cgi?quality
fiat subtypes of qualities • [diagram: is_a hierarchy running from quality through spatial quality, length, weight, temperature, … down to fiat subtypes such as 1 mm, 1 cm, 1 g, 1 kg]
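Representation of measurements • [diagram: a quality hierarchy (spatial quality, length, weight, temperature) and a unit hierarchy (mm, cm, g, kg), each organized by is_a, with units linked to qualities by measurement_of]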
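One way to read the measurement diagram is as two kinds of edges: is_a links among qualities, and measurement_of links from each unit to the quality it measures. The Python sketch below encodes a toy version; the exact placement of each node (for example, weight directly under quality) is an assumption, since the original diagram layout did not survive.

```python
# Toy graph loosely following the slide; node placement is assumed.
# is_a edges: child quality -> parent quality
IS_A = {
    "spatial quality": "quality",
    "length": "spatial quality",
    "weight": "quality",
    "temperature": "quality",
}

# measurement_of edges: unit -> the quality dimension it measures
MEASUREMENT_OF = {
    "mm": "length",
    "cm": "length",
    "g": "weight",
    "kg": "weight",
}


def lineage(quality: str) -> list[str]:
    """Return the quality followed by all of its is_a ancestors, nearest first."""
    chain = [quality]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def what_is_measured(unit: str) -> list[str]:
    """Follow measurement_of from a unit, then climb the is_a hierarchy."""
    return lineage(MEASUREMENT_OF[unit])


def same_dimension(unit_a: str, unit_b: str) -> bool:
    """Two units are comparable only if they measure the same quality."""
    return MEASUREMENT_OF[unit_a] == MEASUREMENT_OF[unit_b]


if __name__ == "__main__":
    print(what_is_measured("mm"))      # ['length', 'spatial quality', 'quality']
    print(same_dimension("cm", "kg"))  # False
```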
Ingvar Johansson: • (a) no object can possibly at one and the same time take two values of the same quantity dimension • (b) in case of additive quantities, only quantities of the same dimension can be added together to give rise to a sum: no material object can have two masses, and masses can only be added to other masses
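Johansson's two constraints can be expressed as checks in code. In this minimal Python sketch (the class and function names are hypothetical), (b) becomes a type check on addition and (a) becomes a rule that an object's record holds at most one value per dimension. Values are assumed to be stated in a single unit per dimension, so no unit conversion is attempted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    value: float
    dimension: str  # e.g. "mass" or "length"; labels are illustrative

    def __add__(self, other: "Quantity") -> "Quantity":
        # Constraint (b): only quantities of the same dimension can be summed.
        if self.dimension != other.dimension:
            raise TypeError(f"cannot add {other.dimension} to {self.dimension}")
        return Quantity(self.value + other.value, self.dimension)


def assign(qualities: dict[str, Quantity], q: Quantity) -> None:
    """Record q on an object, enforcing constraint (a): one value per dimension."""
    current = qualities.get(q.dimension)
    if current is not None and current != q:
        raise ValueError(f"object already has {q.dimension} = {current.value}")
    qualities[q.dimension] = q


if __name__ == "__main__":
    total = Quantity(2.0, "mass") + Quantity(3.0, "mass")
    print(total)  # Quantity(value=5.0, dimension='mass')
```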
Controlled vocabulary • Each SI unit is represented by a symbol, not an abbreviation. The use of unit symbols is regulated by precise rules. • These symbols are the same in every language of the world, even though the names of the units themselves vary in spelling according to national conventions.
The SI system of units gives you: • a gold standard controlled vocabulary for the expression of scientific results which makes these results comparable and integratable • my hypotheses can be checked against your data • my measuring equipment can be calibrated against your measuring equipment (because each can be calibrated against the same gold standard) • the SI system of units can serve as a gold standard because it is a true reflection of an independent reality
a system of units is a legend for measurement data • [image: measurement data labelled heart rate, cadence, speed, power, torque]
compare: legends for maps
Creating a system of units • is not easy; it has to match the way the measurable dimensions are interconnected in reality • it may need to be revised in light of new discoveries about how reality is structured
after Maxwell and Thomson • the subsequent development of physics as an experimental science was largely based on their system of standardized units.
analogous achievements also in chemistry • IUPAC • InChI • and in molecular biology, • for proteins, enzymes, genes, etc. • IUBMB • HUGO Gene Nomenclature Committee, • etc.
the goal of realist ontology • to generalize this achievement • specifically in biology • and in medicine (where forces are at work which tend to thwart standardization of vocabulary) • to move from standardizations of nouns to standardizations of sentences
realist ontologies are legends for data • [image: gene expression data]
where in the body? • in what kind of cell? • what kind of disease process? • need for semantic annotation of data
natural language labels organized in a graph-theoretic structure, designed to make the data • cognitively accessible to human beings • algorithmically accessible to machines • linked up to other data resources because the same labels have been used
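The claim that a graph-theoretic structure makes annotated data algorithmically accessible can be illustrated by a query that retrieves everything annotated to a term or to any of its is_a descendants. The Python sketch below uses a made-up four-term hierarchy and made-up sample names; it is a simplification and is not drawn from any OBO ontology.

```python
# Hypothetical is_a hierarchy: term -> list of parent terms
# (a DAG, so a term could have several parents).
IS_A = {
    "osteoblast": ["bone cell"],
    "osteocyte": ["bone cell"],
    "bone cell": ["cell"],
    "neuron": ["cell"],
}

# Hypothetical data annotations: record id -> ontology label
ANNOTATIONS = {
    "sample-1": "osteoblast",
    "sample-2": "neuron",
    "sample-3": "osteocyte",
}


def ancestors(term: str) -> set[str]:
    """Collect every term reachable from `term` by following is_a edges upward."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def records_about(term: str) -> list[str]:
    """Records annotated to `term` itself or to any of its is_a descendants."""
    return sorted(
        rec for rec, label in ANNOTATIONS.items()
        if label == term or term in ancestors(label)
    )


if __name__ == "__main__":
    print(records_about("bone cell"))  # ['sample-1', 'sample-3']
```

A query for a general term thus also finds data annotated with more specific terms, without the data curator having to tag each record with every ancestor.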
compare: legends for cartoons (for diagrams in scientific texts)
ontologies are legends for mathematical equations • x_i = vector of measurements of gene i • k = the state of the gene (as “on” or “off”) • θ_i = set of parameters of the Gaussian model • ... ...
or chemistry diagrams • Prasanna et al., Chemical Compound Navigator: A Web-Based Chem-BLAST, Chemical Taxonomy-Based Search Engine for Browsing Compounds. PROTEINS: Structure, Function, and Bioinformatics 63:907–917 (2006)
annotation using common ontologies yields integration of databases • [diagram: the databases GlyProt, MouseEcotope, DiabetInGene and GluChem, together with the term Holliday junction helicase complex]
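The integration effect comes from different databases reusing the same term identifiers. A minimal Python sketch, with two invented databases and placeholder identifiers (TERM:0001 and so on are not real ontology IDs), groups records from both sources under the term they share.

```python
from collections import defaultdict

# Two hypothetical databases, each mapping its own record ids to ontology term ids.
DB_A = {"gene-a1": "TERM:0001", "gene-a2": "TERM:0002"}
DB_B = {"protein-b7": "TERM:0001", "protein-b9": "TERM:0003"}


def integrate(*sources: tuple[str, dict[str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Index (source name, record id) pairs by the term they are annotated with."""
    by_term = defaultdict(list)
    for source_name, db in sources:
        for record, term in db.items():
            by_term[term].append((source_name, record))
    return dict(by_term)


if __name__ == "__main__":
    index = integrate(("A", DB_A), ("B", DB_B))
    print(index["TERM:0001"])  # [('A', 'gene-a1'), ('B', 'protein-b7')]
```

No mapping between the two databases is needed: the shared identifier does the joining.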
What is mapping (1) • “Given two ontologies A and B, mapping one ontology with another means that for each concept (node) in ontology A, we try to find a corresponding concept (node), which has the same or similar semantics, in ontology B and vice verse.” M. Ehrig and Y. Sure, Ontology mapping - an integrated approach. In Proceedings of the First European Semantic Web Symposium, ESWS 2004, volume 3053 of Lecture Notes in Computer Science, pages 76–91, Heraklion, Greece, May 2004. Springer Verlag.
What is mapping (2) • “the task of relating the vocabulary of two ontologies in such a way that the mathematical structure of ontological signatures and their intended interpretations, as specified by the ontological axioms, are respected”. • [ontological signature = a hierarchy of concept symbols together with a set of relation symbols whose arguments are defined over the concepts of the concept hierarchy] Y. Kalfoglou and M. Schorlemmer, Ontology mapping: the state of the art. Knowl. Eng. Rev., 18(1): 2003.
What is mapping (3) • “a formal expression that states the semantic relation between two entities belonging to different ontologies”, • “Simple examples are: • concept c1 in ontology O1 is equivalent to concept c2 in ontology O2; • concept c1 in ontology O1 is similar to concept c2 in ontology O2; • individual i1 in ontology O1 is the same as individual i2 in ontology O2” P. Bouquet et al. KnowledgeWeb deliverable D2.2.1. Specification of a common framework for characterizing alignment.
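Definition (3) treats a mapping as a set of correspondences, each relating an entity of O1 to an entity of O2. The Python sketch below encodes the three quoted examples as such triples; the class names, the entity identifiers and the three-valued relation type are assumptions for illustration, not the format specified in the KnowledgeWeb deliverable.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    EQUIVALENT = "equivalent"  # concept c1 is equivalent to concept c2
    SIMILAR = "similar"        # concept c1 is similar to concept c2
    SAME_AS = "same_as"        # individual i1 is the same as individual i2


@dataclass(frozen=True)
class Correspondence:
    source: str         # entity in ontology O1
    target: str         # entity in ontology O2
    relation: Relation


# An alignment is simply a set of correspondences (hypothetical identifiers).
ALIGNMENT = {
    Correspondence("O1:c1", "O2:c2", Relation.EQUIVALENT),
    Correspondence("O1:c3", "O2:c4", Relation.SIMILAR),
    Correspondence("O1:i1", "O2:i2", Relation.SAME_AS),
}


def counterparts(alignment: set[Correspondence], entity: str) -> set[tuple[str, Relation]]:
    """Look up an O1 entity's counterparts in O2, with the stated relation."""
    return {(c.target, c.relation) for c in alignment if c.source == entity}


if __name__ == "__main__":
    print(counterparts(ALIGNMENT, "O1:c1"))
```

Making the relation explicit in each correspondence keeps "equivalent" and merely "similar" apart, which the first definition's "same or similar semantics" runs together.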
One way to support ontology matching (and evaluation) • have experts manually prepare for each given matching problem a gold standard to which matching efforts could be compared. • M. Ehrig and J. Euzenat, Relaxed Precision and Recall for Ontology Matching, in: Proc. K-Cap 2005 workshop on Integrating ontology, Banff (CA), p. 25-32, 2005.
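Comparing a matcher's output with an expert-built gold standard is usually scored by precision and recall. The Python sketch below computes the standard (strict) measures over sets of (O1 entity, O2 entity) pairs; it does not implement the relaxed variants proposed by Ehrig and Euzenat, which give partial credit for near misses. The example pairs are invented.

```python
def precision_recall(found: set, reference: set) -> tuple[float, float]:
    """Strict precision and recall of a found alignment against a gold standard."""
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    gold = {("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")}  # expert-prepared reference
    found = {("a", "x"), ("b", "y"), ("c", "q")}             # matcher output
    p, r = precision_recall(found, gold)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Under strict scoring the near miss ("c", "q") counts as simply wrong, which is the limitation the relaxed measures address.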
Gold standard methodology for ontology evaluation • is very expensive • who are the experts? • sometimes cannot be done for political reasons • UMLS Metathesaurus • even a gold standard can contain errors
Solution: The OBO Foundry • some large pieces already exist (especially Gene Ontology, Foundational Model of Anatomy) • processes of unification and reform already in place • all participants aiming for additivity • procedures for constant update in light of scientific advance • http://obofoundry.org
The GO methodology of annotations • science basis of the GO: trained experts curating peer-reviewed literature • RESULT: a slowly growing computer-interpretable map of biological reality within which major databases are automatically integrated in semantically searchable form • Contrast: data-mining based approaches to ontology construction
Systematic annotation of references to gene products in literature • leads to improvements and extensions of the ontology • leads to better annotations • leads to a virtuous cycle of improvement in the quality and reach of both future annotations and the ontology itself
Five bangs for your GO buck • science base • cross-species database integration • cross-granularity database integration • through links to the entities in biological reality • semantic searchability links people to software
First step (2003) • a shared portal for (so far) 58 ontologies • (low regimentation) • http://obo.sourceforge.net NCBO BioPortal
Second step (2004) • reform efforts initiated, e.g. linking GO to other OBO ontologies to ensure orthogonality
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
GO + Cell type = New Definition • Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
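The osteoblast entry on the second-step slide follows the OBO flat-file convention of one tag: value pair per line, with tags such as relationship allowed to repeat. A simplified Python reader is sketched below. It ignores parts of the real OBO syntax (trailing ! comments, {…} qualifiers, escaped quotes, and the dbxref list that normally follows a def), so it is an illustration only, not a conformant parser.

```python
# The stanza as shown on the slide.
STANZA = """
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
"""


def parse_stanza(text: str) -> dict[str, list[str]]:
    """Collect 'tag: value' lines into tag -> list of values (tags may repeat)."""
    tags: dict[str, list[str]] = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("["):  # skip blanks and headers like [Term]
            continue
        tag, _, value = line.partition(": ")  # split at the first ": " only
        tags.setdefault(tag, []).append(value)
    return tags


def relationships(tags: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Split 'relationship' values into (relation type, target id) pairs."""
    return [tuple(v.split(" ", 1)) for v in tags.get("relationship", [])]


if __name__ == "__main__":
    term = parse_stanza(STANZA)
    print(term["is_a"])         # ['CL:0000055']
    print(relationships(term))  # [('develops_from', 'CL:0000008'), ('develops_from', 'CL:0000375')]
```

Splitting only at the first ": " keeps identifiers such as CL:0000062, which contain a colon but no following space, intact.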