Building and Using Ontologies

This article discusses the importance of knowledge and metadata in bioinformatics, the need for a shared understanding, and the process of building and using ontologies. It also explains the concepts of knowledge, metadata, syntax and semantics, and the Gene Ontology.


Presentation Transcript


  1. Building and Using Ontologies Dr. Robert Stevens Department of Computer Science University of Manchester Robert.stevens@cs.man.ac.uk

  2. Introduction • Knowledge & metadata • The nature of bioinformatics resources • A shared understanding • Terminologies and ontologies • Building an ontology • Using an ontology

  3. What is Knowledge? [Figure: a table record (Name: Michael Ashburner; Job: Professor; Institution: University of Cambridge; Country: UK), set in the contexts of the ISMB conference and biology, annotated with what a reader already knows about it: man; academic, senior; ancient university, 5 rated; European; important figure in biology.] • Knowledge – all information, together with an understanding that allows one to carry out tasks and to infer new information • Information – data equipped with meaning • Data – uninterpreted signals that reach our senses
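To make the data / information / knowledge distinction concrete, here is a minimal Python sketch built on the slide's example record. The field names follow the slide's table; the inference rules and the `EUROPEAN_COUNTRIES` set are hypothetical illustrations, not part of the original talk.

```python
# Data: uninterpreted signals, here just a tuple of strings.
data = ("Michael Ashburner", "Professor", "University of Cambridge", "UK")

# Information: the same data equipped with meaning (field names as on the slide).
FIELDS = ("name", "job", "institution", "country")
record = dict(zip(FIELDS, data))

# Knowledge: an understanding that lets us infer new information.
# Hypothetical rules for illustration only; the country set is deliberately tiny.
EUROPEAN_COUNTRIES = {"UK", "France", "Germany", "Italy"}

def infer(rec: dict) -> set:
    """Derive facts that are not stated explicitly in the record."""
    facts = set()
    if rec["job"] == "Professor":
        facts.add("senior academic")
    if rec["country"] in EUROPEAN_COUNTRIES:
        facts.add("European")
    return facts

print(record["institution"])  # University of Cambridge
print(sorted(infer(record)))  # ['European', 'senior academic']
```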

  4. What is Metadata? • Metadata is data about data (information about information) • A schema is a DB's metadata; so are the administrator's name, the creator, the date of creation and the documentation • The label on an Eppendorf tube in a freezer is metadata • A DB entry's annotation is metadata on the sequence data

  5. Syntax & Semantics • Infix 2 + 3 = 5 • Prefix = + 2 3 5 • Postfix 2 3 + 5 = • Binary 010 + 011 = 101 • Roman II + III = V • 7 + 3 = 42
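The same fact, 2 + 3 = 5, written in five syntaxes can be checked for one shared meaning. In this sketch each hypothetical parser handles only the fixed shape of the slide's example (it is not a general expression parser) and reduces its notation to a triple (a, b, c) for the claim a + b = c; `holds` supplies the semantics. The last line shows that 7 + 3 = 42 is syntactically well formed yet semantically false.

```python
# Roman numeral values needed for the slide's example.
ROMAN = {"I": 1, "V": 5, "X": 10}

def roman_to_int(s: str) -> int:
    """Convert a Roman numeral, honouring subtractive pairs such as IV."""
    total = 0
    for i, ch in enumerate(s):
        value = ROMAN[ch]
        if i + 1 < len(s) and ROMAN[s[i + 1]] > value:
            total -= value
        else:
            total += value
    return total

# One parser per syntax. Each returns (a, b, c) for the claim "a + b = c".
def parse_infix(s):    # "2 + 3 = 5"
    a, _, b, _, c = s.split()
    return int(a), int(b), int(c)

def parse_prefix(s):   # "= + 2 3 5"
    _, _, a, b, c = s.split()
    return int(a), int(b), int(c)

def parse_postfix(s):  # "2 3 + 5 ="
    a, b, _, c, _ = s.split()
    return int(a), int(b), int(c)

def parse_binary(s):   # "010 + 011 = 101"
    a, _, b, _, c = s.split()
    return int(a, 2), int(b, 2), int(c, 2)

def parse_roman(s):    # "II + III = V"
    a, _, b, _, c = s.split()
    return roman_to_int(a), roman_to_int(b), roman_to_int(c)

def holds(claim) -> bool:
    """Semantics: the claim is true iff a + b really equals c."""
    a, b, c = claim
    return a + b == c

claims = [
    parse_infix("2 + 3 = 5"),
    parse_prefix("= + 2 3 5"),
    parse_postfix("2 3 + 5 ="),
    parse_binary("010 + 011 = 101"),
    parse_roman("II + III = V"),
]
print(set(claims))                       # {(2, 3, 5)}
print(holds(parse_infix("7 + 3 = 42")))  # False: well formed, but untrue
```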

  6. Types of Semantics • An operational semantics for a language is defined by what a sentence in that language will do. • Denotational semantics is a precise mathematical definition of the objects and relations of a language, in which each sentence of the language names, or denotes, a mathematical object, such as a function. • Natural semantics is the loose, ordinary-language sense, in which the semantics of a statement is its "meaning". • The term logistic semantics refers to formal models that attempt to represent the natural semantics of some external domain.
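Operational and denotational semantics can be contrasted on a tiny expression language of integers and addition. This is an illustrative sketch, not taken from the presentation: `denote` maps each expression directly to the number it names, while `step` and `run` define meaning as a sequence of one-step reductions. Both agree on the final value; only the operational view exposes how many steps evaluation takes.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Add]

def denote(e: Expr) -> int:
    """Denotational: map each expression to the integer it names,
    built compositionally from the meanings of its parts."""
    if isinstance(e, Num):
        return e.value
    return denote(e.left) + denote(e.right)

def step(e: Add) -> Expr:
    """Operational (small-step): rewrite the expression by one reduction,
    reducing the leftmost nested sum first."""
    if isinstance(e.left, Add):
        return Add(step(e.left), e.right)
    if isinstance(e.right, Add):
        return Add(e.left, step(e.right))
    return Num(e.left.value + e.right.value)

def run(e: Expr):
    """Apply steps until a value is reached; return it with the step count."""
    steps = 0
    while isinstance(e, Add):
        e = step(e)
        steps += 1
    return e, steps

expr = Add(Num(2), Add(Num(3), Num(4)))
print(denote(expr))  # 9
print(run(expr))     # (Num(value=9), 2)
```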

  7. Knowledge in Bioinformatics

  8. A Shared Understanding • Synonyms and homonyms are rife • We need to know that a term in one resource means the same as that term in another resource • This makes comparisons much easier: one can ask questions over many resources • Structure enables discovery and query abstractions • Useful for both humans and computers • The Gene Ontology allows queries beyond a single model organism
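A rough sketch of why a shared vocabulary helps: three hypothetical resources label their entries differently, and a per-resource mapping onto shared concept identifiers lets a single query span all of them. The identifiers (`EX:0001`, `EX:0002`), resource names and entries are invented for illustration; "nucleus" is used because it is a genuine homonym (an organelle in cell biology, a cluster of neurons in neuroanatomy).

```python
# Hypothetical shared concept identifiers (not real ontology IDs).
CELL_NUCLEUS = "EX:0001"   # the organelle
BRAIN_NUCLEUS = "EX:0002"  # a cluster of neurons in the brain

# Each resource maps its own labels onto shared concepts. Mapping per resource
# handles synonyms ("nucleus" vs "cell nucleus") and homonyms ("nucleus"
# meaning different things in different resources).
label_maps = {
    "cell_db": {"nucleus": CELL_NUCLEUS},
    "protein_db": {"cell nucleus": CELL_NUCLEUS},
    "brain_atlas": {"nucleus": BRAIN_NUCLEUS},
}

# Hypothetical annotated entries: (resource, entry id, local label).
annotations = [
    ("cell_db", "entry1", "nucleus"),
    ("protein_db", "P42", "cell nucleus"),
    ("brain_atlas", "region7", "nucleus"),
]

def query_by_label(label):
    """Naive string matching: misses synonyms and is fooled by homonyms."""
    return [(res, entry) for res, entry, lab in annotations if lab == label]

def query_by_concept(concept_id):
    """Query across all resources via the shared concept."""
    return [(res, entry) for res, entry, lab in annotations
            if label_maps[res].get(lab) == concept_id]

print(query_by_label("nucleus"))
# [('cell_db', 'entry1'), ('brain_atlas', 'region7')]
print(query_by_concept(CELL_NUCLEUS))
# [('cell_db', 'entry1'), ('protein_db', 'P42')]
```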

  9. London Bills of Mortality

  10. Aggregated Stats

  11. What is an Ontology? [Figure: example concepts Nucleic acid, Ribosome, RNA, DNA, rRNA, tRNA.] • A means of capturing knowledge in a computationally amenable form • A shared understanding for humans and computers • A set of vocabulary terms that represents a community’s understanding of a domain • A set of definitions for those terms • The relations between those terms • A formal semantics • A conceptual model whose labels provide a vocabulary

  12. The art of ranking things in genera and species is of no small importance and very much assists our judgment as well as our memory. You know how much it matters in botany, not to mention animals and other substances, or again moral and notional entities as some call them. Order largely depends on it, and many good authors write in such a way that their whole account could be divided and subdivided according to a procedure related to genera and species. This helps one not merely to retain things, but also to find them. And those who have laid out all sorts of notions under certain headings or categories have done something very useful. Gottfried Wilhelm Leibniz, New Essays on Human Understanding

  13. Components of an Ontology: Concepts [Figure: example concepts Nucleic acid, Ribosome, RNA, DNA, rRNA, tRNA.] • Concepts: a unit of thought • AKA: Class, Set, Type, Predicate • Gene, Reaction, Macromolecule • Terms are labels of concepts • Taxonomy of concepts • Generalization ordering among concepts • Concept A is a parent of concept B iff every instance of B is also an instance of A • Superset / subset • “A kind of” vs. “a part of”
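The extensional reading of the generalization ordering (A is a parent of B iff every instance of B is also an instance of A) can be checked with set inclusion. The concepts below echo the slide's figure, but the instance names are hypothetical. Such a check only tests consistency with the instances you happen to have, and the "a part of" relation is kept separate because it links individuals rather than ordering concepts.

```python
# Extensions (sets of instances) for the figure's concepts.
# Instance names are hypothetical; "rna1" is an RNA that is neither rRNA nor tRNA.
extension = {
    "NucleicAcid": {"dna1", "rna1", "rrna1", "trna1"},
    "DNA": {"dna1"},
    "RNA": {"rna1", "rrna1", "trna1"},
    "rRNA": {"rrna1"},
    "tRNA": {"trna1"},
    "Ribosome": {"ribosome1"},
}

# "A part of" relates individuals; it is not a subset relation between concepts.
part_of = {("rrna1", "ribosome1")}

def subsumes(a: str, b: str) -> bool:
    """A generalises B iff every instance of B is also an instance of A."""
    return extension[b] <= extension[a]

def generalisations(b: str) -> set:
    """All concepts strictly above B in this small model."""
    return {a for a in extension if a != b and subsumes(a, b)}

def is_part_of(x: str, whole: str) -> bool:
    return (x, whole) in part_of

print(sorted(generalisations("rRNA")))   # ['NucleicAcid', 'RNA']
print(subsumes("Ribosome", "rRNA"))      # False: not "a kind of" ribosome
print(is_part_of("rrna1", "ribosome1"))  # True: but "a part of" one
```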

  14. Components of an Ontology: Relations • Relations and Attributes • AKA: Slots, properties, roles • Product of Gene, Map-Position of Gene • Reactants of Reaction, Keq of Reaction • Meta information about relations • Cardinality, optionality, type restrictions on fillers • Transitive, symmetric, functional role properties • Role hierarchies • Example slot: Expresses (Domain: Genes; Range: Polypeptide or RNA; Cardinality: at least 1) • General Axioms (constraints) • Nucleic acids with fewer than 20 residues are oligonucleotides
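A slot with domain, range and cardinality restrictions, plus the general axiom about oligonucleotides, might be checked as follows. This is a minimal sketch: the `Slot` and `Individual` classes are hypothetical, type checking is by exact type name (no subclass reasoning), and the `TrpA` product and `geneX` are illustrative individuals.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Slot:
    name: str
    domain: str           # type of the individual that carries the slot
    range: frozenset      # allowed types of the fillers
    min_cardinality: int  # "at least N" fillers

@dataclass
class Individual:
    name: str
    type: str
    fillers: dict = field(default_factory=dict)  # slot name -> list of Individuals

# The slide's example slot.
EXPRESSES = Slot("Expresses", domain="Gene",
                 range=frozenset({"Polypeptide", "RNA"}), min_cardinality=1)

def violations(ind: Individual, slot: Slot) -> list:
    """Check domain, range and cardinality restrictions (exact type match only)."""
    problems = []
    values = ind.fillers.get(slot.name, [])
    if values and ind.type != slot.domain:
        problems.append(f"{ind.name}: not in domain {slot.domain}")
    if ind.type == slot.domain and len(values) < slot.min_cardinality:
        problems.append(f"{ind.name}: needs at least {slot.min_cardinality} filler(s)")
    for v in values:
        if v.type not in slot.range:
            problems.append(f"{ind.name}: filler {v.name} ({v.type}) outside range")
    return problems

def is_oligonucleotide(residues: int) -> bool:
    """General axiom from the slide: nucleic acids < 20 residues are oligonucleotides."""
    return residues < 20

product = Individual("TrpA", "Polypeptide")  # hypothetical filler
trpA = Individual("trpA", "Gene", {"Expresses": [product]})
bad = Individual("geneX", "Gene")            # hypothetical, no filler
print(violations(trpA, EXPRESSES))  # []
print(violations(bad, EXPRESSES))   # ['geneX: needs at least 1 filler(s)']
print(is_oligonucleotide(15), is_oligonucleotide(300))  # True False
```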

  15. Gene Ontology http://www.geneontology.org • “a dynamic controlled vocabulary that can be applied to all eukaryotes” • Built by the community for the community • Three organising principles: • Molecular function, Biological process, Cellular component • Is-a and part-of hierarchy • ~15,000 concepts
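Because GO terms are linked by both is-a and part-of, an annotation to a specific term also implies annotation to its ancestors, which is what lets a query reach across organisms. The sketch below assumes both relations propagate annotations upward; the term graph is a hypothetical toy loosely modelled on the cellular component branch, not the real GO, and the gene products are invented.

```python
from collections import deque

# Hypothetical toy graph, NOT the real GO. child -> [(relation, parent)]
edges = {
    "nucleolus": [("part_of", "nucleus")],
    "nuclear membrane": [("is_a", "membrane"), ("part_of", "nucleus")],
    "nucleus": [("is_a", "organelle")],
    "mitochondrion": [("is_a", "organelle")],
    "organelle": [("part_of", "cell")],
}

def ancestors(term: str) -> set:
    """All terms reachable via is_a or part_of (breadth-first search over a DAG)."""
    seen, queue = set(), deque([term])
    while queue:
        for _relation, parent in edges.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

# Hypothetical annotations of gene products from different organisms.
annotations = {
    ("yeast", "geneA"): "nucleolus",
    ("fly", "geneB"): "nuclear membrane",
    ("mouse", "geneC"): "mitochondrion",
}

def annotated_to(term: str) -> list:
    """Gene products annotated to the term directly or to any of its descendants."""
    return sorted(gp for gp, t in annotations.items()
                  if t == term or term in ancestors(t))

print(annotated_to("nucleus"))    # [('fly', 'geneB'), ('yeast', 'geneA')]
print(annotated_to("organelle"))  # all three gene products
```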

  16. Components of an Ontology: Instances • Instances • AKA: objects, individuals, set members • trpA Gene, Reaction 1.1.2.4, Death-receptor-3 • Strictly speaking, an ontology with instances is a knowledge base • The distinction between an instance and a concept is difficult. • Lard-binding-proteins are all those that bind Death-receptor-3.
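The slide's instances (trpA, Reaction 1.1.2.4, Death-receptor-3) can be placed alongside concepts to form a small knowledge base. In this hypothetical sketch Death-receptor-3 is modelled as an instance, so the concept of Lard-binding proteins is defined by reference to an individual; modelling it as a class instead would be an equally defensible choice, which is why the distinction is hard. The proteins `protA` and `protB` and their binding facts are invented.

```python
# Concepts describe kinds of things; instances are particular things.
# Adding instances to an ontology turns it into a knowledge base.
concepts = {"Gene", "Reaction", "Protein"}

# Instances from the slide, plus two hypothetical proteins (protA, protB).
instance_of = {
    "trpA": "Gene",
    "Reaction 1.1.2.4": "Reaction",
    "Death-receptor-3": "Protein",
    "protA": "Protein",
    "protB": "Protein",
}

# Hypothetical binding facts: (binder, bound).
binds = {("protA", "Death-receptor-3"), ("protB", "protA")}

def lard_binding_proteins() -> set:
    """A concept whose definition mentions an individual: its members are
    exactly the proteins that bind the instance Death-receptor-3."""
    return {x for x, y in binds
            if y == "Death-receptor-3" and instance_of.get(x) == "Protein"}

# Sanity check: every instance belongs to a known concept.
assert all(t in concepts for t in instance_of.values())
print(lard_binding_proteins())  # {'protA'}
```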

  17. Components of an Ontology: Properties • Primitive: properties are necessary • A globular protein must have a hydrophobic core, but a protein with a hydrophobic core need not be a globular protein • Defined: properties are necessary + sufficient • Eukaryotic cells must have a nucleus. Every cell that contains a nucleus must be eukaryotic.
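The practical difference between primitive and defined classes shows up in classification: only necessary-and-sufficient conditions let a reasoner infer membership from properties. A minimal sketch, assuming each class definition reduces to a set of required properties (real description-logic reasoners, such as those used with OWL, handle far richer definitions):

```python
# Each class lists its necessary conditions and whether they are also sufficient.
# "primitive": necessary only.  "defined": necessary and sufficient.
CLASSES = {
    "GlobularProtein": ("primitive", frozenset({"hydrophobic core"})),
    "EukaryoticCell": ("defined", frozenset({"nucleus"})),
}

def consistent(cls: str, properties: set) -> bool:
    """Necessary conditions: every asserted member must have these properties."""
    _kind, required = CLASSES[cls]
    return required <= properties

def classify(properties: set) -> set:
    """Sufficient conditions: only defined classes can be inferred from properties."""
    return {cls for cls, (kind, required) in CLASSES.items()
            if kind == "defined" and required <= properties}

print(classify({"nucleus", "mitochondria"}))  # {'EukaryoticCell'}
print(classify({"hydrophobic core"}))         # set(): could still be non-globular
print(consistent("GlobularProtein", set()))   # False: violates a necessary condition
```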

  18. An Ontology Building Life-cycle • Identify purpose and scope • Consistency checking • Knowledge acquisition • Building • Language and representation • Conceptualisation • Integrating existing ontologies • Available development tools • Encoding • Ontology learning • Evaluation

  19. How to do it • Collect terms: MacroMolecule, Protein, Enzyme, HoloProtein, HoloEnzyme • Arrange them into a polyhierarchy (by hand) • Write a definition for each term • Encode in some representation • Carry on iterating • Test against scope, requirements and competency questions

  20. How to do it • Enzyme: is-a MacroMolecule; polymerOf AminoAcid; Catalyses Reaction • HoloEnzyme: is-a MacroMolecule; polymerOf AminoAcid; binds ProstheticGroup; Catalyses Reaction • HoloProtein: is-a MacroMolecule; polymerOf AminoAcid; binds ProstheticGroup • Protein: is-a MacroMolecule; polymerOf AminoAcid
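If the four frames are treated as definitions (necessary and sufficient, an assumption the slide does not state), the polyhierarchy can be computed rather than arranged by hand: one class subsumes another when the other carries all of its restrictions. This is a simple structural-subsumption sketch, not an OWL reasoner; `MacroMolecule` is treated as the unrestricted root. The last line answers a hypothetical competency question: which classes bind a prosthetic group?

```python
# The slide's frames as sets of (property, filler) restrictions.
# "is-a MacroMolecule" is modelled by making MacroMolecule the unrestricted root.
DEFINITIONS = {
    "MacroMolecule": frozenset(),
    "Protein": frozenset({("polymerOf", "AminoAcid")}),
    "Enzyme": frozenset({("polymerOf", "AminoAcid"), ("Catalyses", "Reaction")}),
    "HoloProtein": frozenset({("polymerOf", "AminoAcid"),
                              ("binds", "ProstheticGroup")}),
    "HoloEnzyme": frozenset({("polymerOf", "AminoAcid"),
                             ("binds", "ProstheticGroup"),
                             ("Catalyses", "Reaction")}),
}

def subsumes(a: str, b: str) -> bool:
    """A strictly subsumes B if B carries every restriction that A does."""
    return a != b and DEFINITIONS[a] <= DEFINITIONS[b]

def direct_parents(b: str) -> set:
    """Most specific subsumers of B: drop any that subsume another candidate."""
    ups = {a for a in DEFINITIONS if subsumes(a, b)}
    return {a for a in ups if not any(subsumes(a, c) for c in ups)}

def answer(restriction: tuple) -> list:
    """Competency question: which classes carry this restriction?"""
    return sorted(c for c, d in DEFINITIONS.items() if restriction in d)

for cls in DEFINITIONS:
    print(cls, "->", sorted(direct_parents(cls)))
# last line: HoloEnzyme -> ['Enzyme', 'HoloProtein']  (a polyhierarchy)
print(answer(("binds", "ProstheticGroup")))  # ['HoloEnzyme', 'HoloProtein']
```

The inferred result places HoloEnzyme under both Enzyme and HoloProtein, the multiple inheritance that slide 19 asks for, and places Protein between MacroMolecule and its more specific kinds even though every frame only states is-a MacroMolecule.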

  21. Tips for Building your Terminology • Choose a narrow, but useful, area • Build using domain experts • Regard computer scientists as a service • You’ll never be complete or correct: publish early • Be practical: truth and beauty are a bonus • Be open • It is a large commitment and a never-ending process • Start simple and migrate to expressivity and “correctness” as you develop • OWL supports this migration path
