
EECS 800 Research Seminar Mining Biological Data

This research seminar explores gene ontology, its construction, and its use in mining biological data. Topics include text mining, natural language processing, and information extraction. The seminar also covers ontology and its applications in bioinformatics.

aalbaugh

Presentation Transcript


  1. EECS 800 Research Seminar Mining Biological Data • Instructor: Luke Huan • Fall, 2006

  2. Administrative • Class presentation schedule is online • First class presentation is “kernel based classification” by Han Bin on Nov 6th • Project design is due Oct 30th

  3. Overview • Gene ontology • Challenges • What is gene ontology • construct gene ontology • Text mining, natural language processing and information extraction: An Introduction • Summary

  4. Ontology • <philosophy> A systematic account of Existence. • <artificial intelligence> (From philosophy) An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them. • <information science> The hierarchical structuring of knowledge about things by subcategorising them according to their essential (or at least relevant and/or cognitive) qualities. This is an extension of the previous senses of "ontology" (above) which has become common in discussions about the difficulty of maintaining subject indices. The philosophy of indexing everything in existence?

  5. Aristotle’s (384-322 BC) Ontology • Substance • plants, animals, ... • Quality • Quantity • Relation • Where • When • Position • Having • Action • Passion

  6. Ontology and -informatics • In information sciences, ontology is better defined as: “a domain of knowledge, represented by facts and their logical connections, that can be understood by a computer”. (J. Bard, BioEssays, 2003) • “Ontologies provide controlled, consistent vocabularies to describe concepts and relationships, thereby enabling knowledge sharing” (Gruber, 1993)

  7. Information Exchange in Bio-sciences • Basic challenges: • Definition, definition, definition • What is a name? • What is a function?

  8. Cell

  9. Cell

  10. Cell

  11. Cell

  12. Cell Image from http://microscopy.fsu.edu

  13. What’s in a name? • The same name can be used to describe different concepts

  14. What’s in a name? • Glucose synthesis • Glucose biosynthesis • Glucose formation • Glucose anabolism • Gluconeogenesis • All refer to the process of making glucose from simpler components
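A minimal sketch of how a controlled vocabulary resolves the naming problem on this slide: every synonym maps to one canonical term, so two records can be compared even when they use different names. The table below contains only the names listed on the slide. The choice of "gluconeogenesis" as the canonical label and the helper name `normalize` are assumptions made for illustration.

```python
# Hypothetical synonym table built from the names on the slide.
# Choosing "gluconeogenesis" as the canonical label is an assumption.
CANONICAL = "gluconeogenesis"

SYNONYMS = {
    "glucose synthesis": CANONICAL,
    "glucose biosynthesis": CANONICAL,
    "glucose formation": CANONICAL,
    "glucose anabolism": CANONICAL,
    "gluconeogenesis": CANONICAL,
}


def normalize(name: str) -> str:
    """Map a free-text process name to its canonical term.

    Names are lower-cased and whitespace-collapsed before lookup.
    Unknown names are returned in that cleaned form, unchanged otherwise.
    """
    key = " ".join(name.lower().split())
    return SYNONYMS.get(key, key)


if __name__ == "__main__":
    for raw in ["Glucose  Biosynthesis", "glucose anabolism", "mitosis"]:
        print(raw, "->", normalize(raw))
```

With such a table, two database entries annotated as "glucose formation" and "Glucose anabolism" compare as equal after normalization, which is the kind of cross-database comparison the next slide describes as difficult.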

  15. What’s in a name? • The same name can be used to describe different concepts • A concept can be described using different names • Consequence: comparison is difficult, in particular across species or across databases

  16. What is Function? The Hammer Example • Function (what): Process (why) • Drive nail (into wood): Carpentry • Drive stake (into soil): Gardening • Smash roach: Pest Control • Clown’s juggling object: Entertainment

  17. Information Explosion

  18. Entering the Genome Sequencing Era • Eukaryotic Genome Sequences (genome: year, size in Mb, number of genes) • Yeast (S. cerevisiae): 1996, 12 Mb, 6,000 genes • Worm (C. elegans): 1998, 97 Mb, 19,100 genes • Fly (D. melanogaster): 2000, 120 Mb, 13,600 genes • Plant (A. thaliana): 2001, 125 Mb, 25,500 genes • Human (H. sapiens, 1st Draft): 2001, ~3000 Mb, ~35,000 genes

  19. What is the Gene Ontology? A Common Language for Annotation of Genes from Yeast, Flies and Mice …and Plants and Worms …and Humans …and anything else!

  20. http://www.geneontology.org/

  21. What is the Gene Ontology? • Gene annotation system • Controlled vocabulary that can be applied to all organisms • Organism independent • Used to describe gene products (proteins and RNA) in any organism

  22. The 3 Gene Ontologies • Molecular Function = elemental activity/task • the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity • Biological Process = biological goal or objective • broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions • Cellular Component = location or complex • subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

  23. Cellular Component • where a gene product acts

  24. Cellular Component

  25. Cellular Component

  26. Cellular Component • Enzyme complexes in the component ontology refer to places, not activities.

  27. Molecular Function • insulin binding • insulin receptor activity

  28. Molecular Function • activities or “jobs” of a gene product • example: glucose-6-phosphate isomerase activity

  29. Molecular Function • A gene product may have several functions; a function term refers to a single reaction or activity, not a gene product. • Sets of functions make up a biological process.

  30. Biological Process • a commonly recognized series of events • example: cell division

  31. Biological Process transcription

  32. Biological Process • Metabolism: degradation or synthesis of biomolecules

  33. Biological Process • Development: how a group of cells becomes a tissue

  34. Biological Process courtship behavior

  35. Ontology applications • Can be used to: • Formalise the representation of biological knowledge • Standardise database submissions • Provide unified access to information through ontology-based querying of databases, both human and computational • Improve management and integration of data within databases. • Facilitate data mining

  36. Gene Ontology Structure • Ontologies can be represented as directed acyclic graphs (DAG), where the nodes are connected by edges • Nodes = terms in biology • Edges = relationships between the terms • is-a • part-of

  37. Parent-Child Relationships • Chromosome • Cytoplasmic chromosome • Mitochondrial chromosome • Nuclear chromosome • Plastid chromosome • A child is a subset or an instance of its parent’s elements

  38. Parent-Child Relationships • Diagram terms: cell, membrane, chloroplast, mitochondrial membrane, chloroplast membrane • Edge types: is-a, part-of
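A minimal sketch of the DAG structure described in slides 36 to 38, using the terms named on slide 38. The transcript does not preserve which edge connects which pair of terms, so the specific (child, relationship, parent) triples below are assumptions chosen to be biologically plausible. The helper names `build_parents` and `ancestors` are hypothetical.

```python
from collections import defaultdict

# (child, relationship, parent) triples. Terms come from the slide;
# the individual edges are ASSUMED, since the diagram is not preserved.
EDGES = [
    ("mitochondrial membrane", "is-a", "membrane"),
    ("chloroplast membrane", "is-a", "membrane"),
    ("chloroplast membrane", "part-of", "chloroplast"),
    ("chloroplast", "part-of", "cell"),
    ("membrane", "part-of", "cell"),
]


def build_parents(edges):
    """Index edges as child -> list of (relationship, parent)."""
    parents = defaultdict(list)
    for child, rel, parent in edges:
        parents[child].append((rel, parent))
    return parents


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent edges."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for _rel, parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


PARENTS = build_parents(EDGES)

if __name__ == "__main__":
    # "chloroplast membrane" has two parents, which a tree could not express.
    print(PARENTS["chloroplast membrane"])
    print(sorted(ancestors("chloroplast membrane", PARENTS)))
```

The point of using a DAG rather than a tree is visible in the data: a term such as "chloroplast membrane" can be a kind of one parent (is-a) and a component of another (part-of) at the same time.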

  39. Annotation in GO • A gene product is usually a protein but can be a functional RNA • An annotation is a piece of information associated with a gene product • A GO annotation is a Gene Ontology term associated with a gene product

  40. Terms, Definitions, IDs • Term: MAPKKK cascade (mating sensu Saccharomyces) • Goid: GO:0007244 • Definition: OBSOLETE. MAPKKK cascade involved in transduction of mating pheromone signal, as described in Saccharomyces. • Evidence code: how annotation is done • Definition_reference: PMID:9561267

  41. Annotation Example • Gene Product: nek2 • GO Term: centrosome (GO:0005813) • Evidence Code: IDA (Inferred from Direct Assay) • Reference: PMID: 11956323
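The annotation on this slide can be sketched as a small record type: a gene product linked to a GO term, with an evidence code and a supporting reference. The field values are taken from the slide. The class name `GOAnnotation`, its field names, and the `describe` helper are assumptions for illustration and do not reflect any official file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOAnnotation:
    """One GO annotation: a GO term associated with a gene product."""
    gene_product: str   # e.g. a protein or functional RNA
    go_id: str          # GO term identifier
    go_term: str        # human-readable term name
    evidence_code: str  # how the annotation is supported
    reference: str      # source that supports the annotation


# Values copied from the slide's example.
nek2 = GOAnnotation(
    gene_product="nek2",
    go_id="GO:0005813",
    go_term="centrosome",
    evidence_code="IDA",
    reference="PMID:11956323",
)


def describe(a: GOAnnotation) -> str:
    """Render an annotation as a single readable line."""
    return (
        f"{a.gene_product} -> {a.go_term} ({a.go_id}) "
        f"[{a.evidence_code}, {a.reference}]"
    )


if __name__ == "__main__":
    print(describe(nek2))
```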

  42. GO Annotation

  43. GO Annotation

  44. GO Annotation

  45. Evidence Code • Indicates the type of evidence in the cited source that supports the association between the gene product and the GO term • See http://www.geneontology.org/GO.evidence.html

  46. Types of evidence codes • Experimental codes - IDA, IMP, IGI, IPI, IEP • Computational codes - ISS, IEA, RCA, IGC • Author statement - TAS, NAS • Other codes - IC, ND • Two types of annotation • Manual Annotation • Electronic Annotation
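The grouping on this slide can be written as a lookup table, which is handy when filtering annotations by how well supported they are. The code lists are copied from the slide. The function names are hypothetical, and treating IEA (Inferred from Electronic Annotation) as the marker for electronic annotation is an assumption that the slide does not state explicitly.

```python
# Evidence code groups as listed on the slide.
EVIDENCE_CATEGORIES = {
    "experimental": {"IDA", "IMP", "IGI", "IPI", "IEP"},
    "computational": {"ISS", "IEA", "RCA", "IGC"},
    "author statement": {"TAS", "NAS"},
    "other": {"IC", "ND"},
}


def category_of(code: str) -> str:
    """Return the slide's category name for an evidence code."""
    code = code.upper()
    for category, codes in EVIDENCE_CATEGORIES.items():
        if code in codes:
            return category
    raise ValueError(f"unknown evidence code: {code!r}")


def is_electronic(code: str) -> bool:
    """ASSUMPTION: only IEA marks an electronic (non-curated) annotation."""
    return code.upper() == "IEA"


if __name__ == "__main__":
    for code in ["IDA", "IEA", "TAS", "ND"]:
        kind = "electronic" if is_electronic(code) else "manual"
        print(code, category_of(code), kind)
```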

  47. IDA: Inferred from Direct Assay • direct assay for the function, process, or component indicated by the GO term • Enzyme assays • In vitro reconstitution (e.g. transcription) • Immunofluorescence (for cellular component) • Cell fractionation (for cellular component)

  48. IMP: Inferred from Mutant Phenotype • variations or changes such as mutations or abnormal levels of a single gene product • Gene/protein mutation • Deletion mutant • RNAi experiments • Specific protein inhibitors • Allelic variation

  49. IGI: Inferred from Genetic Interaction • Any combination of alterations in the sequence or expression of more than one gene or gene product • Traditional genetic screens (suppressors, synthetic lethals) • Functional complementation • Rescue experiments • An entry in the ‘with’ column is recommended

  50. IPI: Inferred from Physical Interaction • Any physical interaction between a gene product and another molecule, ion, or complex • 2-hybrid interactions • Co-purification • Co-immunoprecipitation • Protein binding experiments • An entry in the ‘with’ column is recommended
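Slides 49 and 50 both recommend filling the ‘with’ column for interaction-based evidence, since an interaction involves a second gene or molecule. A minimal sketch of a check that flags annotations missing that entry is shown below. The function name `missing_with` and the placeholder identifier in the usage lines are hypothetical.

```python
# Evidence codes for which the slides recommend a 'with' entry.
WITH_RECOMMENDED = {"IGI", "IPI"}


def missing_with(evidence_code: str, with_value) -> bool:
    """Return True when a 'with' entry is recommended but absent.

    `with_value` may be None, an empty string, or the identifier of the
    interacting gene or molecule.
    """
    needs_with = evidence_code.upper() in WITH_RECOMMENDED
    has_with = bool((with_value or "").strip())
    return needs_with and not has_with


if __name__ == "__main__":
    # "partner-id" is a placeholder, not a real database identifier.
    print(missing_with("IPI", ""))            # recommended but empty
    print(missing_with("IGI", "partner-id"))  # recommended and present
    print(missing_with("IDA", None))          # not required for IDA
```

A check like this only flags records for a curator's attention. The slides call the entry recommended, not mandatory, so rejecting such annotations outright would go beyond what they say.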
