GO, NCBO, Phenotypes, & the OBO Foundry • January 29th, 2007 • Ontologies for Biomedical Investigations, La Jolla Institute for Allergy and Immunology • Suzanna Lewis, GO Consortium & National Center for Biomedical Ontology • http://www.geneontology.org/ • http://www.bioontology.org/
Outline • Perspective on the challenge of our mutual investment of time and effort on standards, formalisms, and representation • GO case study retrospective • NCBO today • Phenotype case study • OBO-Foundry
The Scientific Method • A body of techniques for investigating phenomena and acquiring new knowledge, as well as for correcting and integrating previous knowledge. It is based on observable, empirical, measurable evidence, and subject to rules of reasoning. • Isaac Newton (1687, 1713, 1726). "Rules for the study of natural philosophy", Philosophiae Naturalis Principia Mathematica, Book 3, The System of the World. Third edition, the 4 rules as reprinted on pages 794-796 of I. Bernard Cohen and Anne Whitman's 1999 translation, University of California Press ISBN 0-520-08817-4, 974 pages.
Today’s data is in electronic form • Rules of reasoning are an intrinsic element of scientific investigation • In our current era, data reside in electronic form • Building and using computable ontologies will support rules of reasoning on our data • And thereby support research in the computer age.
Necessary character of a computational environment for biological research • Sustainable • There must be a mechanism for maintaining the environment (at a cost lower than the initial cost). • Adaptable • It must work for the complete spectrum of data types, from genomics to clinical trials • It must continually adapt to new knowledge and new technologies • Interoperable • We need the capability of easily integrating data from a variety of sources. • Evolvable • Mechanisms must be put in place to respond to the needs of the biomedical research community. These needs provide the primary selection pressure on the evolution of the technology.
10 Factors in achieving goals (credit to John Glaser) • Clarity of vision/goal • Political landscape needs to support the goal • Decision process needs to be responsive and efficient • Message has to be brought to the community • Accountability (i.e. no vaporware) • Feasible within available resources • Providing incentives for adoption • Tactics must change with the adoption curve • Sustaining effort over multiple years • Being satisfied with highly imperfect, but pragmatic solutions
Criteria for success, & signs of failure • Criteria for success: • Measurable evidence of improved productivity and efficiency (time saved vs. number of users for given output) • Evidence of learning from experience (e.g., sustained improvements in content volume and quality) • Evidence of discoverability - enables positive outputs that were unanticipated • Continual, iterative process that occurs at each stage of development and growth, fully integrated into the lifecycle • Evidence of community acceptance, that more data is being provided continuously • Contains sufficient information to support reproducibility of results • Negotiation of meaning has occurred • Signs of failure: • Stymied by barriers regarding IP and credit attribution • High relative cost for maintenance, support, and boosting interest • Problems that fall between the cracks • The criteria for success listed above are not being met
GO case study “For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled.” Richard Feynman (1986)
Three fundamental dichotomies • types vs. instances • continuants vs. occurrents • dependent vs. independent
For example, in the GO’s 3 ontologies: • molecular function: dependent continuant • biological process: occurrent • cellular component: independent continuant • Molecules, cell components, and organisms are independent continuants which have functions (these are dependent continuants), and these functions may be realized as an occurrent process when “functioning”
Specific Aims of the GO 2006 • We will maintain comprehensive, logically rigorous and biologically accurate ontologies. • We will comprehensively annotate 9 reference genomes in as complete detail as possible. • We will support annotation across all organisms. • We will provide our annotations and tools to the research community.
Weaving and untangling the GO • Missing relations • is_a completeness • Adding new relations within a single GO ontology • Adding “regulates” to BP • Distinguishing different part_of relations • Adding relations between GO axes • Linking between MF & BP & CC • Adding relations between GO & other ontologies • GO+Cell • GO+anatomy • GO+ChEBI
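One way to read "is_a completeness" is that every term should reach its ontology's root through is_a links alone. The sketch below checks that property on a toy graph. The term names, the parent links, and the helper names `reaches_root` and `is_a_incomplete` are all assumptions made for illustration, not GO content or GO tooling.

```python
from typing import Dict, List

# Hypothetical toy ontology: term -> list of is_a parents.
# The links are invented for this example and are not real GO data.
IS_A: Dict[str, List[str]] = {
    "biological process": [],  # root
    "cell differentiation": ["biological process"],
    "lymphocyte differentiation": ["cell differentiation"],
    "B-cell differentiation": ["lymphocyte differentiation"],
    "example orphan term": [],  # has no is_a parent at all
}


def reaches_root(term: str, root: str, is_a: Dict[str, List[str]]) -> bool:
    """Return True if `term` can reach `root` by following is_a links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current == root:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(is_a.get(current, []))
    return False


def is_a_incomplete(root: str, is_a: Dict[str, List[str]]) -> List[str]:
    """List the terms that have no is_a path to the root."""
    return sorted(t for t in is_a if not reaches_root(t, root, is_a))


missing = is_a_incomplete("biological process", IS_A)
print(missing)
```

A curator could run a check like this after each edit cycle and treat any reported term as a candidate for a missing is_a parent.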
Implicit ontologies within the GO: • cysteine biosynthesis (ChEBI) • myoblast fusion (Cell Type Ontology) • hydrogen ion transporter activity (ChEBI) • snoRNA catabolism (Sequence Ontology) • wing disc pattern formation (Drosophila anatomy) • epidermal cell differentiation (Cell Type Ontology) • regulation of flower development (Plant anatomy) • interleukin-18 receptor complex (not yet in OBO) • B-cell differentiation (Cell Type Ontology)
Relations to Other Ontologies [Diagram: CL hierarchy (blood cell, lymphocyte, B-cell) shown beside GO hierarchy (cell differentiation, lymphocyte differentiation, B-cell activation, B-cell differentiation), with terms linked by is_a]
CELL Ontology
[Term]
id: CL:0000236
name: B-cell
is_a: CL:0000542 ! lymphocyte
develops_from: CL:0000231 ! B-lymphoblast

Augmented GO
[Term]
id: GO:0030183
name: B-cell differentiation
is_a: GO:0042113 ! B-cell activation
is_a: GO:0030098 ! lymphocyte differentiation
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000236 ! B-cell
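The augmented GO stanza defines B-cell differentiation as the intersection of a GO class and a relation to a Cell Ontology class. The sketch below reads such a stanza, written exactly as on the slide, and pulls out that cross-product definition. The function names `parse_stanza` and `cross_product` are hypothetical, and the parser is a minimal illustration, not a full OBO format reader.

```python
from collections import defaultdict
from typing import Dict, List

# Stanza text copied from the slide.
STANZA = """\
[Term]
id: GO:0030183
name: B-cell differentiation
is_a: GO:0042113 ! B-cell activation
is_a: GO:0030098 ! lymphocyte differentiation
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000236 ! B-cell
"""


def parse_stanza(text: str) -> Dict[str, List[str]]:
    """Collect tag -> values, dropping stanza headers and '! comments'."""
    tags: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        line = line.split("!", 1)[0].strip()  # remove trailing comment
        if not line or line.startswith("["):
            continue
        tag, value = line.split(":", 1)  # split on the first colon only
        tags[tag.strip()].append(value.strip())
    return dict(tags)


def cross_product(tags: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each relation in the intersection_of lines to its target ID."""
    definition = {}
    for value in tags.get("intersection_of", []):
        relation, target = value.split()
        definition[relation] = target
    return definition


term = parse_stanza(STANZA)
definition = cross_product(term)
print(term["name"][0], definition)
```

Here `definition` pairs the genus (`is_a` cell differentiation) with the differentia (`has_participant` B-cell), which is the link from GO to the Cell Ontology that the slide illustrates.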
Correlation of mRNA decay rates with (GO) function • E. Yang, E. van Nimwegen, M. Zavolan, N. Rajewsky, M. Schroeder, M. Magnasco, and J.E. Darnell, Jr. Decay Rates of Human mRNAs: Correlation With Functional Characteristics and Sequence Attributes. Genome Research 13:1863-1872, 2003.
How GO measures up • Measurable evidence of improved productivity and efficiency • Researchers simply use the GO, as judged by publications • Evidence of learning from experience • Formalism of the GO continues to improve • Evidence of discoverability - enables positive outputs that were unanticipated • Primary use of GO is cluster analysis of microarray expression data • Continual, iterative process that occurs at each stage of development and growth, fully integrated into the lifecycle • Quarterly updates of software • Evidence of community acceptance, that more data is being provided continuously • Number of species continues to increase
The National Center for Biomedical Ontology BioPortal Phenotype Annotation
NCBO’s 7 Cores • Core 1: Computer science • Core 2: Bioinformatics • Core 3: Driving biological projects • Core 4: Infrastructure • Core 5: Education and Training • Core 6: Dissemination • Core 7: Administration
Who NCBO is • Stanford: Tools for ontology alignment, indexing, and management (Cores 1, 4–7: Mark Musen) • Lawrence Berkeley National Laboratory: Tools to use ontologies for data annotation (Cores 2, 5–7: Suzanna Lewis) • Mayo Clinic: Tools for access to large controlled terminologies (Core 1: Chris Chute) • Victoria: Tools for ontology and data visualization (Cores 1 and 2: Margaret-Anne Storey) • University at Buffalo: Dissemination of best practices for ontology engineering (Core 6: Barry Smith)
NCBO Driving Biological Projects • Trial Bank: UCSF, Ida Sim • FlyBase: Cambridge, Michael Ashburner • ZFIN: Oregon, Monte Westerfield
BioPortal • Indexes, searches, and visualizes terms in the ontologies in its library • Uses LexGrid (Mayo) • Contains ontologies that their editors have released to BioPortal
The BioPortal Needs You! • We need, and beg and plead for, your feedback • http://www.bioontology.org/ncbo/faces/index.xhtml • For example: Providing URIs for all ontologies and/or ontology content? • Tomorrow depends on you, no request is too mundane.
Animal disease models • Animal models: Mutant Gene → Mutant or missing Protein → Mutant Phenotype
Animal disease models • Humans: Mutant Gene → Mutant or missing Protein → Mutant Phenotype (disease) • Animal models: Mutant Gene → Mutant or missing Protein → Mutant Phenotype (disease model)
[Figure panels: SHH-/+ • SHH-/- • shh-/+ • shh-/-]
Phenotype (clinical sign) = entity + quality • P1 = eye + hypoteloric • P2 = midface + hypoplastic • P3 = kidney + hypertrophied • Entities (ZFIN): eye, midface, kidney • Qualities (PATO): hypoteloric, hypoplastic, hypertrophied
Phenotype (clinical sign) = entity + quality • Entity: Anatomical ontology, Cell & tissue ontology, Developmental ontology, Gene ontology (biological process, cellular component) • Quality: PATO (phenotype and trait ontology)
Phenotype (clinical sign) = entity + quality • P1 = eye + hypoteloric • P2 = midface + hypoplastic • P3 = kidney + hypertrophied • Syndrome (disease) = P1 + P2 + P3 = holoprosencephaly
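The entity + quality decomposition maps naturally onto a small record type, with a syndrome as a set of such records. The sketch below is one possible encoding, not the NCBO or ZFIN data model. The `Phenotype` class, the `jaccard` helper, and the animal-model profile are assumptions introduced here, and Jaccard overlap is a stand-in similarity measure that the slides do not prescribe.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Phenotype:
    """One clinical sign, written as entity + quality."""
    entity: str   # e.g. an anatomy term
    quality: str  # e.g. a PATO term


Syndrome = FrozenSet[Phenotype]

# The three phenotypes named on the slide.
P1 = Phenotype("eye", "hypoteloric")
P2 = Phenotype("midface", "hypoplastic")
P3 = Phenotype("kidney", "hypertrophied")

HOLOPROSENCEPHALY: Syndrome = frozenset({P1, P2, P3})


def jaccard(a: Syndrome, b: Syndrome) -> float:
    """Share of phenotypes common to both sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical animal-model profile, invented for this example only.
model_profile: Syndrome = frozenset({P1, P2})
score = jaccard(HOLOPROSENCEPHALY, model_profile)
print(round(score, 3))
```

Because both sides use the same entity and quality terms, a human disease and an animal model can be compared by simple set operations, which is the practical payoff of annotating with shared ontologies.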
[Figure panels: Human holoprosencephaly • Zebrafish shh • Zebrafish oep]
PATO upper level • Unifying goal: Integrating data • within and across domains (e.g. different taxa) • across levels of granularity • across different perspectives • Requires • Rigorous formal definitions in both ontologies and annotation schemas
Top level PATO division: spatial vs. temporal (note: some nodes omitted for brevity) • Quality • Quality of a continuant: a quality which inheres in a continuant • physical quality: color, density • cellular quality • morphology: shape, size, structure • Quality of an occurrent: a quality which inheres in a process or spatiotemporal region • duration • arrested • premature • delayed
Top level PATO division: granularity • Monadic quality of a continuant … • Physical quality: a quality that exists through action of continuants at the physical level of organisation • color: green, pink, yellow • temperature: hot, cold • mass: large mass, small mass • Cellular quality: a quality that exists at the cellular level of organisation • nucleate quality: anucleate, binucleate • ploidy: diploid, haploid, aneuploid • potency: multipotent, totipotent, oligopotent …
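A quality hierarchy like this one supports a basic kind of reasoning: asking whether a specific value falls under a more general quality. The sketch below encodes part of the tree as child-to-parent links and answers that question. The grouping of values under color, temperature, mass, nucleate quality, ploidy, and potency is an assumption read off the term names, and `ancestors` and `is_kind_of` are hypothetical helper names.

```python
from typing import Dict, List

ROOT = "monadic quality of a continuant"

# Child -> parent links; the grouping is assumed from the term names.
PARENT: Dict[str, str] = {
    "physical quality": ROOT,
    "cellular quality": ROOT,
    "color": "physical quality",
    "temperature": "physical quality",
    "mass": "physical quality",
    "nucleate quality": "cellular quality",
    "ploidy": "cellular quality",
    "potency": "cellular quality",
    "green": "color",
    "hot": "temperature",
    "large mass": "mass",
    "anucleate": "nucleate quality",
    "diploid": "ploidy",
    "totipotent": "potency",
}


def ancestors(term: str) -> List[str]:
    """Walk the parent links from `term` up to the root."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_kind_of(term: str, general: str) -> bool:
    """True if `general` is `term` itself or one of its ancestors."""
    return term == general or general in ancestors(term)


print(ancestors("diploid"))
print(is_kind_of("green", "cellular quality"))
```

With such links in place, a query for "any cellular quality" can retrieve annotations made with diploid or totipotent without listing each value by hand.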
Monadic vs. relational quality of a continuant (C) … • Monadic quality of a C: a quality of a C that inheres solely in the bearer and does not require another entity • physical quality • cellular quality • morphology: shape, size, structure • Relational quality of a C: a quality of a C that requires another entity apart from its bearer to exist • sensitivity (to) • displacement (with) • connectedness (to) …
Relational qualities involving the environment • “drought sensitivity” [TO:0000029] • Directed towards an additional entity type • Q = PATO:sensitivity • E2 = EO:drought • Def: a sensitivity which is directed towards drought [inheres_in organism] • OBO needs a good environment ontology
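A relational quality needs a second entity besides its bearer, so an annotation record needs an optional extra slot. The sketch below shows one way to hold Q, the bearer, and E2 together. The class name `QualityAnnotation`, the field name `towards`, and the identifier `PATO:green` are assumptions for illustration, and the rendered string is not an official OBO syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualityAnnotation:
    quality: str                   # Q, e.g. "PATO:sensitivity"
    bearer: str                    # entity the quality inheres in
    towards: Optional[str] = None  # E2, only set for relational qualities

    def is_relational(self) -> bool:
        """A quality is relational when it names a second entity."""
        return self.towards is not None

    def render(self) -> str:
        """Plain-text rendering (illustrative, not an OBO standard)."""
        text = f"{self.quality} inheres_in {self.bearer}"
        if self.towards is not None:
            text += f" towards {self.towards}"
        return text


# The drought sensitivity example from the slide.
drought_sensitivity = QualityAnnotation(
    quality="PATO:sensitivity", bearer="organism", towards="EO:drought"
)
# A monadic quality for contrast; "PATO:green" is a made-up identifier.
leaf_color = QualityAnnotation(quality="PATO:green", bearer="leaf")

print(drought_sensitivity.render())
print(leaf_color.is_relational())
```

Keeping E2 as a separate field means the same PATO quality can be reused for sensitivity to drought, salt, or any other environmental term, once an environment ontology supplies those terms.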
What is Phenote? • A tool for annotating phenotypes • Curator reads about a phenotype in the literature related to taxonomy or genotype • Curator enters genotype (or taxonomy) • Curator enters genetic context (optional) • Curator searches/enters Entity (e.g. Anatomy) • Curator searches/enters PATO attribute/value
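The curation steps above imply a record with three required fields and one optional field. The sketch below models that record and reports which required fields are still empty. It is not Phenote's actual data model: `CurationRecord`, its field names, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CurationRecord:
    subject: str = ""                      # genotype (or taxonomy)
    genetic_context: Optional[str] = None  # optional step
    entity: str = ""                       # e.g. an anatomy term
    quality: str = ""                      # PATO attribute/value

    def missing_fields(self) -> List[str]:
        """Names of required fields that are still empty."""
        required = {
            "subject": self.subject,
            "entity": self.entity,
            "quality": self.quality,
        }
        return [name for name, value in required.items() if not value.strip()]


# Illustrative values only; not taken from a curated database.
complete = CurationRecord(subject="shh-/-", entity="eye", quality="hypoteloric")
partial = CurationRecord(subject="shh-/-")

print(complete.missing_fields())
print(partial.missing_fields())
```

A check like this could run before a record is saved, so that an annotation never enters the database without both an entity and a quality.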
ZFIN integration Also Phenote
Anatomy • Cell • Chemical • Drug • Disease • Environmental context • . . . • Qualifier • Unit • GO - biological process • GO - molecular function • GO - cellular component • Other ontologies…