
Formalizations of Function & Literature Databases



  1. Formalizations of Function & Literature Databases

  2. Protein function prediction • What is function? • Various levels of description

  3. What is function? • Contextual / philosophical point • An operational dichotomy I often use: biochemical function vs. biological role • Biochemical function: enolase (2-phospho-D-glycerate hydro-lyase) catalyses the interconversion of 2-phosphoglycerate and phosphoenolpyruvate • Biological role: part of the glycolysis pathway

  4. But ... • α-enolase also functions as a structural lens protein, τ-crystallin, in ducks • Protein multi-functionality • Molecular function? Nice crystallization and refractive properties?

  5. Gene Ontology • Historically: nothing … except Swiss-Prot keywords and specific classification systems for metabolic enzymes • This is problematic for automated gene function prediction (e.g. BLAST and/or co-expression) and for the study of the evolution of gene function • And that despite everything we know, as written down in the public literature!? • One (example) solution: Gene Ontology

  6. Gene Ontology • In computer science, an ontology is a data model that represents a domain and is used to reason about the objects in that domain and the relations between them. • GO:0008150 : biological_process • GO:0005575 : cellular_component • GO:0003674 : molecular_function

  7. Gene Ontology: Molecular function • Molecular function describes activities, such as catalytic or binding activities, at the molecular level. GO molecular function terms represent activities rather than the entities (molecules or complexes) that perform the actions, and do not specify where or when, or in what context, the action takes place. Molecular functions generally correspond to activities that can be performed by individual gene products, but some activities are performed by assembled complexes of gene products. Examples of broad functional terms are catalytic activity, transporter activity, or binding; examples of narrower functional terms are adenylate cyclase activity or Toll receptor binding.

  8. DNA-directed DNA polymerase activity • Accession: GO:0003887 • Ontology: molecular_function • Synonyms: alt_id: GO:0003888 • Definition: • Catalysis of the reaction: deoxynucleoside triphosphate + DNA(n) = diphosphate + DNA(n+1); the synthesis of DNA from deoxyribonucleotide triphosphates in the presence of a DNA template or primer. • Comment: None • Term Lineage • Graphical View • all : all ( 228266 ) • GO:0003674 : molecular_function ( 172339 ) • GO:0003824 : catalytic activity ( 68591 ) • GO:0016740 : transferase activity ( 22363 ) • GO:0016772 : transferase activity, transferring phosphorus-containing groups ( 13535 ) • GO:0016779 : nucleotidyltransferase activity ( 3400 ) • GO:0003887 : DNA-directed DNA polymerase activity ( 519 )
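The term lineage on this slide can be read as a chain of is_a links from the specific term up to the molecular_function root. A minimal sketch of that idea follows, using only the accessions listed on the slide. The names `PARENT`, `lineage` and `is_a` are illustrative, and the single-parent map is a simplifying assumption: the real Gene Ontology is a directed acyclic graph in which a term may have several parents.

```python
# Child -> parent links copied from the term lineage on the slide.
# Assumption: one parent per term, which holds for this lineage only;
# real GO terms can have multiple parents.
PARENT = {
    "GO:0003887": "GO:0016779",  # DNA-directed DNA polymerase activity
    "GO:0016779": "GO:0016772",  # nucleotidyltransferase activity
    "GO:0016772": "GO:0016740",  # transferase activity, transferring phosphorus-containing groups
    "GO:0016740": "GO:0003824",  # transferase activity
    "GO:0003824": "GO:0003674",  # catalytic activity -> molecular_function (root)
}


def lineage(term, parent=PARENT):
    """Return the path from `term` up to the root, most specific term first."""
    path = [term]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def is_a(term, ancestor, parent=PARENT):
    """True if `ancestor` is `term` itself or lies on its path to the root."""
    return ancestor in lineage(term, parent)


if __name__ == "__main__":
    print(" -> ".join(lineage("GO:0003887")))
    print(is_a("GO:0003887", "GO:0003824"))
```

A gene product annotated with the specific term is implicitly annotated with every term on this path, which is why the counts in parentheses grow toward the root.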

  9. Gene Ontology: Biological Process • A biological process is a series of events accomplished by one or more ordered assemblies of molecular functions. Examples of broad biological process terms are cellular physiological process or signal transduction. Examples of more specific terms are pyrimidine metabolism or alpha-glucoside transport. It can be difficult to distinguish between a biological process and a molecular function, but the general rule is that a process must have more than one distinct step.

  10. DNA replication • Accession: GO:0006260 • Ontology: biological_process • Synonyms: • related: DNA biosynthesis • related: DNA synthesis • Definition: • The process whereby new strands of DNA are synthesized. The template for replication can either be DNA or RNA. • Comment: • See also the biological process terms 'DNA-dependent DNA replication ; GO:0006261' and 'RNA-dependent DNA replication ; GO:0006278'. • Term Lineage • Graphical View • all : all ( 228266 ) • GO:0008150 : biological_process ( 166476 ) • GO:0009987 : cellular process ( 111929 ) • GO:0050875 : cellular physiological process ( 103960 ) • GO:0044237 : cellular metabolism ( 71681 ) • GO:0006139 : nucleobase, nucleoside, nucleotide and nucleic acid metabolism ( 27559 ) • GO:0006259 : DNA metabolism ( 8807 ) • GO:0006260 : DNA replication ( 3202 )

  11. Gene Ontology: Cellular Component • A cellular component is just that, a component of a cell, but with the proviso that it is part of some larger object; this may be an anatomical structure (e.g. rough endoplasmic reticulum or nucleus) or a gene product group (e.g. ribosome, proteasome or a protein dimer).

  12. DNA-directed RNA polymerase II, core complex • Accession: GO:0005665 • Ontology: cellular_component • Synonyms: related: DNA-directed RNA polymerase II activity • Definition: • RNA polymerase II, one of three eukaryotic nuclear RNA polymerases, is a multisubunit complex; it produces mRNAs, snoRNAs, and some of the snRNAs. Two large subunits comprise the most conserved portion including the catalytic site and share similarity with other eukaryotic and bacterial multisubunit RNA polymerases. The largest subunit of RNA polymerase II contains an essential carboxyl-terminal domain (CTD) composed of a variable number of heptapeptide repeats (YSPTSPS). The remainder of the complex is composed of smaller subunits (generally ten or more), some of which are also found in RNA polymerases I and III. Although the core is competent to mediate ribonucleic acid synthesis, it requires additional factors to select the appropriate template. • Term Lineage • GO:0005575 : cellular_component ( 116994 ) • GO:0005623 : cell ( 86438 ) • GO:0044464 : cell part ( 86397 ) • GO:0005622 : intracellular ( 70018 ) • GO:0044424 : intracellular part ( 69369 ) • GO:0043229 : intracellular organelle ( 63194 ) • GO:0043231 : intracellular membrane-bound organelle ( 58868 ) • GO:0005634 : nucleus ( 12609 ) • GO:0044428 : nuclear part ( 5000 ) • GO:0031981 : nuclear lumen ( 3017 ) • GO:0005654 : nucleoplasm ( 1990 ) • GO:0044451 : nucleoplasm part ( 1791 ) • GO:0016591 : DNA-directed RNA polymerase II, holoenzyme ( 462 ) • GO:0005665 : DNA-directed RNA polymerase II, core complex ( 85 )

  13. GO or no go • Used frequently for questions such as: is there any functional pattern to my set of co-expressed genes? (overrepresentation of a particular process and/or complex) • Better than nothing. • How are the GO terms assigned? (e.g. TAS vs IEA) • GO slim … • A framework / starting point
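The overrepresentation question on this slide is commonly posed as a one-sided hypergeometric test: given how many genes in the genome carry a GO term, how surprising is the number seen in the co-expressed set? The sketch below is one way to compute that, not necessarily the method the presenter had in mind. The function name `enrichment_p_value` and the example counts are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of genes in the background (e.g. the genome)
    annotated:  background genes carrying the GO term
    sample:     size of the gene set of interest (e.g. a co-expression cluster)
    hits:       genes in the set that carry the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 6000 genes, 100 annotated with the term,
    # a cluster of 50 genes of which 5 carry the term.
    print(enrichment_p_value(6000, 100, 50, 5))
```

In practice the test is repeated for many GO terms, so the resulting p-values need a multiple-testing correction, and the outcome depends on how the annotations were assigned (for instance TAS versus IEA evidence).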

  14. Use for questions like: what proportion of its proteins does human devote to transcription regulation? • GO allows for such questions • Controlled vocabulary • Conceptual framework for thinking about our knowledge of cellular mechanisms

  15. E(nzyme) C(ommission) number: a hierarchical system to describe enzymatic function • EC 1 Oxidoreductases • EC 2 Transferases • EC 3 Hydrolases • EC 4 Lyases • EC 5 Isomerases • EC 6 Ligases • EC 2.7 Transferring phosphorus-containing groups • EC 2.7.7 Nucleotidyltransferases • EC 2.7.7.6 DNA-directed RNA polymerase
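Because each additional digit of an EC number narrows the class, the hierarchy can be recovered from the number itself by taking successive prefixes. A minimal sketch under that reading follows; the helper names `ec_levels` and `shared_depth` are illustrative, and `EC_NAMES` contains only the entries shown on the slide.

```python
# Names taken from the slide; this is not a complete EC table.
EC_NAMES = {
    "2": "Transferases",
    "2.7": "Transferring phosphorus-containing groups",
    "2.7.7": "Nucleotidyltransferases",
    "2.7.7.6": "DNA-directed RNA polymerase",
}


def ec_levels(ec_number):
    """Split 'EC 2.7.7.6' into its prefixes: ['2', '2.7', '2.7.7', '2.7.7.6']."""
    parts = ec_number.replace("EC", "").strip().split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def shared_depth(ec_a, ec_b):
    """Number of leading hierarchy levels two EC numbers have in common."""
    depth = 0
    for level_a, level_b in zip(ec_levels(ec_a), ec_levels(ec_b)):
        if level_a != level_b:
            break
        depth += 1
    return depth


if __name__ == "__main__":
    for level in ec_levels("EC 2.7.7.6"):
        print(level, EC_NAMES.get(level, "(name not on the slide)"))
    # EC 2.7.7.7 is DNA-directed DNA polymerase; it shares three levels.
    print(shared_depth("EC 2.7.7.6", "EC 2.7.7.7"))
```

The shared depth gives a rough measure of how similar two catalysed reactions are: enzymes that agree on the first three levels differ only in the specific substrate or product.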

  16. Homology ~ molecular function • In other words, for metabolic pathways: homologs are observed to catalyze similar reactions, but often in different pathways.

  17. Homology ~ molecular function

  18. So if we do function prediction using sequence (i.e. BLAST, trees, etc.), then? • If we think we see an ortholog, we can transfer a lot of aspects of function and role • If we see only a homolog, we can transfer only some aspects of molecular function, but not process / role
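The rule of thumb on this slide can be written down as a small decision function. This is a deliberately coarse sketch: it flattens "a lot of aspects" and "some aspects" into all-or-nothing sets, and the names `transferable_aspects` and `ASPECTS` are hypothetical.

```python
# The two levels of description used in the talk: what the protein does
# biochemically, and the process / role it takes part in.
ASPECTS = ("molecular_function", "biological_process")


def transferable_aspects(relationship):
    """Return which annotation aspects may be transferred from a sequence hit.

    relationship: 'ortholog' or 'homolog' (anything else transfers nothing).
    Simplification: the slide speaks of 'a lot of' and 'some' aspects;
    here that is reduced to whole categories.
    """
    if relationship == "ortholog":
        return set(ASPECTS)
    if relationship == "homolog":
        return {"molecular_function"}
    return set()


if __name__ == "__main__":
    for kind in ("ortholog", "homolog", "no detectable similarity"):
        print(kind, sorted(transferable_aspects(kind)))
```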

  19. Examples • Fringe as glycosyl transferase • ATPase family associated with various cellular activities (AAA) • AAA family proteins often perform chaperone-like functions that assist in the assembly, operation, or disassembly of protein complexes • … so how do we place query genes in a process / role then?
