The Gene Ontology Project: Developing and Using Controlled Vocabularies for Sharing Biological Information
GO Project Goals • Compile structured vocabularies describing aspects of molecular biology • Describe gene products using vocabulary terms (annotation) • Develop tools: • to query and modify the vocabularies and annotations • annotation tools for curators
GO Data • GO provides two bodies of data: • Terms with definitions and cross-references • Gene product annotations with supporting data
The Three Ontologies • Molecular Function — elemental activity or task • nuclease, DNA binding, transcription factor • Biological Process — broad objective or goal • mitosis, signal transduction, metabolism • Cellular Component — location or complex • nucleus, ribosome, origin recognition complex
Parent-Child Relationships A child is a subset of its parent’s elements
DAG Structure Directed acyclic graph: each child may have one or more parents
Relationship Types • is-a • subclass; a is a type of b • part-of • physical part of (component) • subprocess of (process)
The True Path Rule Every path from a node back to the root must be biologically accurate
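The DAG structure, the two relationship types, and the true path rule can be sketched in a few lines of Python. This is a minimal illustration, not the GO software: the `PARENTS` table and the `paths_to_root` helper are hypothetical, and the term names and relationships form a toy fragment rather than the real ontology.

```python
# Toy ontology fragment (assumed for illustration only):
# child term -> list of (relationship, parent term).
PARENTS = {
    "cell cycle": [("is_a", "biological_process")],
    "mitosis": [("part_of", "cell cycle")],
    "chromosome condensation": [("is_a", "biological_process")],
    # A child with two parents: this is what makes the graph a DAG, not a tree.
    "mitotic chromosome condensation": [
        ("is_a", "chromosome condensation"),
        ("part_of", "mitosis"),
    ],
}


def paths_to_root(term, parents=PARENTS):
    """Return every path from `term` up to a root (a term with no parents)."""
    edges = parents.get(term, [])
    if not edges:
        return [[term]]
    paths = []
    for _relationship, parent in edges:
        for tail in paths_to_root(parent, parents):
            paths.append([term] + tail)
    return paths


for path in paths_to_root("mitotic chromosome condensation"):
    print(" -> ".join(path))
```

Under the true path rule, a curator would check that each printed path is biologically accurate for every gene product annotated to the starting term.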
GO Annotation • Association between gene product and applicable GO terms • Provided by member databases • Made by manual or automated methods
DAG Structure • Annotate to any level within DAG • mitosis: S.c. NNF1 • mitotic chromosome condensation: S.c. BRN1, D.m. barren
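Because annotations can sit at any level, a query on a parent term would normally also collect gene products annotated to its children. The sketch below assumes a two-term fragment taken from this slide; the `PARENTS` and `DIRECT` tables and both helper functions are hypothetical names, and relationship types are omitted for brevity.

```python
# Toy fragment from the slide: child term -> parent terms.
PARENTS = {
    "mitotic chromosome condensation": ["mitosis"],
    "mitosis": [],
}

# Direct annotations as shown on the slide: term -> gene products.
DIRECT = {
    "mitosis": {"S.c. NNF1"},
    "mitotic chromosome condensation": {"S.c. BRN1", "D.m. barren"},
}


def ancestors(term):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def products_for(term):
    """Gene products annotated to `term` itself or to any of its descendants."""
    result = set()
    for annotated_term, products in DIRECT.items():
        if annotated_term == term or term in ancestors(annotated_term):
            result |= products
    return result


print(sorted(products_for("mitosis")))
print(sorted(products_for("mitotic chromosome condensation")))
```

Querying the more general term returns the products of both levels, while querying the specific term returns only its own.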
GO Annotation: Data • Database object: gene or gene product • GO term ID • Reference • publication or computational method • Evidence supporting annotation
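The four pieces of annotation data map naturally onto a small record type. The following is a sketch only: the `Annotation` class and its field names are hypothetical, the example values are placeholders rather than a real annotation, and the only format assumed is that a GO term ID is `GO:` followed by seven digits.

```python
import re
from dataclasses import dataclass

# GO term IDs have the form "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


@dataclass(frozen=True)
class Annotation:
    db_object: str   # gene or gene product
    go_id: str       # GO term ID
    reference: str   # publication or computational method
    evidence: str    # evidence code supporting the annotation

    def __post_init__(self):
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"malformed GO term ID: {self.go_id!r}")
        if not self.reference:
            raise ValueError("an annotation needs a reference")


# Placeholder values, not a real annotation.
example = Annotation("S.c. BRN1", "GO:0000000", "PMID:0000000", "IMP")
print(example)
```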
GO Evidence Codes • IDA: Inferred from Direct Assay • IMP: Inferred from Mutant Phenotype • IGI: Inferred from Genetic Interaction • IPI: Inferred from Physical Interaction • IEP: Inferred from Expression Pattern • TAS: Traceable Author Statement • NAS: Non-traceable Author Statement • IC: Inferred by Curator • ISS: Inferred from Sequence or structural Similarity • IEA: Inferred from Electronic Annotation • ND: Not Determined
GO Evidence Codes: Sources • From primary literature: IDA, IMP, IGI, IPI, IEP • From reviews or introductions: TAS, NAS • Automated: IEA
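Evidence codes are often used to filter annotations, for example to set aside the automated ones. The sketch below takes the code table from the slide; the `manual_only` function, the tuple layout, and the sample gene products and terms are hypothetical.

```python
# Evidence codes as listed on the slide.
EVIDENCE_CODES = {
    "IDA": "Inferred from Direct Assay",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IEP": "Inferred from Expression Pattern",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ISS": "Inferred from Sequence or structural Similarity",
    "IEA": "Inferred from Electronic Annotation",
    "ND": "Not Determined",
}

# The slide marks IEA as the automated code.
AUTOMATED = {"IEA"}


def manual_only(annotations):
    """Keep (gene_product, term, code) tuples whose code is not automated."""
    kept = []
    for product, term, code in annotations:
        if code not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {code!r}")
        if code not in AUTOMATED:
            kept.append((product, term, code))
    return kept


# Hypothetical sample data for illustration.
sample = [
    ("geneA", "term1", "IDA"),
    ("geneB", "term1", "IEA"),
    ("geneC", "term2", "TAS"),
]
print(manual_only(sample))
```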
GO Annotation: Methods • Manual • Automated • sequence similarity • transitive annotation • nomenclature, other text matching
Automated Annotation: InterPro Example • InterPro2go links InterPro entries and GO terms • Run InterProScan to link YFP and InterPro entry • Infer GO term from the other two links • (diagram nodes: YFP, InterPro entry, GO entry)
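The inference on this slide is a join of two mappings: protein to InterPro entry, and InterPro entry to GO term. The sketch below assumes both mappings are already loaded into dictionaries. It does not call InterProScan or parse the InterPro2go file, and every identifier in it is a made-up placeholder.

```python
# Hypothetical InterProScan results: protein -> matched InterPro entries.
SCAN_HITS = {
    "YFP": ["IPR_EXAMPLE_1", "IPR_EXAMPLE_2"],
}

# Hypothetical excerpt of an InterPro2go-style mapping: entry -> GO terms.
INTERPRO2GO = {
    "IPR_EXAMPLE_1": ["GO_TERM_A"],
    "IPR_EXAMPLE_2": ["GO_TERM_A", "GO_TERM_B"],
}


def infer_go_terms(protein):
    """Infer GO terms for a protein by chaining the two mappings.

    Returns a sorted list of (go_term, interpro_entry, evidence_code)
    tuples. The evidence code is IEA because no curator reviews the link.
    """
    inferred = set()
    for entry in SCAN_HITS.get(protein, []):
        for go_term in INTERPRO2GO.get(entry, []):
            inferred.add((go_term, entry, "IEA"))
    return sorted(inferred)


for row in infer_go_terms("YFP"):
    print(row)
```

Keeping the InterPro entry in each result records which link produced the inferred term, so the annotation can be traced back to its source.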
GO Annotation: Contributors • FlyBase • WormBase • Saccharomyces Genome Database • DictyBase • Mouse Genome Informatics • Gramene • The Arabidopsis Information Resource • Compugen, Inc. • Swiss-Prot/TrEMBL/InterPro • Pathogen Sequencing Unit (Sanger Institute) • PomBase (Sanger Institute) • Rat Genome Database • The Institute for Genomic Research
GO Annotation: Organisms • Fruit fly (Drosophila melanogaster) • Budding yeast (Saccharomyces cerevisiae) • Fission yeast (Schizosaccharomyces pombe) • Human (Homo sapiens) • Mouse (Mus musculus) • Rice (Oryza sativa) • Rat (Rattus norvegicus) • Tsetse fly (G. morsitans) • Caenorhabditis elegans • Arabidopsis thaliana • Vibrio cholerae • Dictyostelium discoideum
GO Data Formats • Flat files: working version updated daily; archived monthly • XML RDF: updated monthly • MySQL database: updated monthly
GO Tools • Database (and schema) • Perl API • Browser: AmiGO • Editing tool: DAG-Edit
AmiGO Browser: detailed view of term
AmiGO Browser: gene products annotated to term
DAG-Edit: DAG view, tree view, editing
What GO is NOT: • Not a way to unify biological databases • Not a dictated standard • Does not define evolutionary relationships • Additional ontologies needed to model biology and experimentation
Terms outside the Scope of GO • Names of gene products • Protein domains • Protein sequence features • Phenotypes; diseases • Anatomical terms (except as part of terms generated by cross-products)
The GOBO Proposal • Global Open Biology Ontologies • Umbrella site for shared genomics and proteomics vocabularies • Present incarnation: subdirectory within GO repository: www.geneontology.org/doc/gobo.html
GOBO Criteria • Open source • Can be instantiated in DAML+OIL or GO syntax • Orthogonal • Shared ID space • Defined terms
Some GOBO Ontologies • gene • gene_attribute • gene_structure (SO) • gene_variation (ME) • gene_product • gene_product_attribute • molecular_function (GO) • protein_family (INTERPRO) • phenotype • mutant phenotype • anatomy • For complete current draft see www.geneontology.org/doc/gobo.html
www.geneontology.org • FlyBase & Berkeley Drosophila Genome Project • WormBase • Saccharomyces Genome Database • DictyBase • Mouse Genome Informatics • Gramene • The Arabidopsis Information Resource • Compugen, Inc. • Swiss-Prot/TrEMBL/InterPro • Pathogen Sequencing Unit (Sanger Institute) • PomBase (Sanger Institute) • Rat Genome Database • Genome Knowledge Base (CSHL) • The Institute for Genomic Research The Gene Ontology Consortium is supported by NHGRI grant HG02273 (R01). The Gene Ontology project thanks AstraZeneca for financial support. The Stanford group acknowledges a gift from Incyte Genomics.