Formalizations of Function & Literature Databases
Protein function prediction • What is function? • Various levels of description
What is function? • A contextual / philosophical point • An operational dichotomy I often use: biochemical function vs. biological role • Biochemical function: enolase (2-phospho-D-glycerate hydro-lyase) catalyses the interconversion of 2-phosphoglycerate and phosphoenolpyruvate • Biological role: part of the glycolysis pathway
But ... • α-Enolase also functions as a structural lens protein, τ-crystallin, in ducks • Protein multi-functionality • Molecular function? Nice crystallization and refractive properties?
Gene Ontology • Historically: nothing … except Swiss-Prot keywords and specific systems for metabolic enzymes • This is somewhat problematic for automated gene function prediction (e.g. BLAST and/or co-expression) and for the study of the evolution of gene function • And that despite everything we know being written down in the public literature!? • One (example) solution: Gene Ontology
Gene Ontology • In computer science, an ontology is a data model that represents a domain and is used to reason about the objects in that domain and the relations between them. • GO:0008150 : biological_process • GO:0005575 : cellular_component • GO:0003674 : molecular_function
Gene Ontology: Molecular function • Molecular function describes activities, such as catalytic or binding activities, at the molecular level. GO molecular function terms represent activities rather than the entities (molecules or complexes) that perform the actions, and do not specify where or when, or in what context, the action takes place. Molecular functions generally correspond to activities that can be performed by individual gene products, but some activities are performed by assembled complexes of gene products. Examples of broad functional terms are catalytic activity, transporter activity, or binding; examples of narrower functional terms are adenylate cyclase activity or Toll receptor binding.
DNA-directed DNA polymerase activity • Accession: GO:0003887 • Ontology: molecular_function • Synonyms: alt_id: GO:0003888 • Definition: • Catalysis of the reaction: deoxynucleoside triphosphate + DNA(n) = diphosphate + DNA(n+1); the synthesis of DNA from deoxyribonucleotide triphosphates in the presence of a DNA template or primer. • Comment: None • Term Lineage • Graphical View • all : all ( 228266 ) • GO:0003674 : molecular_function ( 172339 ) • GO:0003824 : catalytic activity ( 68591 ) • GO:0016740 : transferase activity ( 22363 ) • GO:0016772 : transferase activity, transferring phosphorus-containing groups ( 13535 ) • GO:0016779 : nucleotidyltransferase activity ( 3400 ) • GO:0003887 : DNA-directed DNA polymerase activity( 519 )
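The term lineage above can be read as a walk up the ontology graph. A minimal sketch in Python, assuming each step of the lineage shown on the slide is a direct is_a edge (the real GO is a directed acyclic graph in which a term may have several parents, so this excerpt is only one path through it). The names `IS_A` and `ancestors` are illustrative, not part of any GO tool.

```python
# Child -> parents, copied from the lineage on the slide.
# Assumption: each lineage step is a direct is_a edge.
IS_A = {
    "GO:0003887": ["GO:0016779"],  # DNA-directed DNA polymerase activity
    "GO:0016779": ["GO:0016772"],  # nucleotidyltransferase activity
    "GO:0016772": ["GO:0016740"],  # transferase activity, transferring phosphorus-containing groups
    "GO:0016740": ["GO:0003824"],  # transferase activity
    "GO:0003824": ["GO:0003674"],  # catalytic activity
    "GO:0003674": [],              # molecular_function (root)
}


def ancestors(term, is_a=IS_A):
    """Return every term reachable by following is_a edges upward (term itself excluded)."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


# A gene product annotated to GO:0003887 is implicitly annotated to all of these.
print(sorted(ancestors("GO:0003887")))
```

This implicit propagation is why the counts in the lineage grow toward the root: every annotation to a specific term is also counted for each of its ancestors.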
Gene Ontology: Biological Process • A biological process is a series of events accomplished by one or more ordered assemblies of molecular functions. Examples of broad biological process terms are cellular physiological process or signal transduction. Examples of more specific terms are pyrimidine metabolism or alpha-glucoside transport. It can be difficult to distinguish between a biological process and a molecular function, but the general rule is that a process must have more than one distinct step.
DNA replication • Accession: GO:0006260 • Ontology: biological_process • Synonyms: • related: DNA biosynthesis • related: DNA synthesis • Definition: • The process whereby new strands of DNA are synthesized. The template for replication can either be DNA or RNA. • Comment: • See also the biological process terms 'DNA-dependent DNA replication ; GO:0006261' and 'RNA-dependent DNA replication ; GO:0006278'. • Term Lineage • Graphical View • all : all ( 228266 ) • GO:0008150 : biological_process ( 166476 ) • GO:0009987 : cellular process ( 111929 ) • GO:0050875 : cellular physiological process ( 103960 ) • GO:0044237 : cellular metabolism ( 71681 ) • GO:0006139 : nucleobase, nucleoside, nucleotide and nucleic acid metabolism ( 27559 ) • GO:0006259 : DNA metabolism ( 8807 ) • GO:0006260 : DNA replication( 3202 )
Gene Ontology: Cellular Component • A cellular component is just that, a component of a cell, but with the proviso that it is part of some larger object; this may be an anatomical structure (e.g. rough endoplasmic reticulum or nucleus) or a gene product group (e.g. ribosome, proteasome or a protein dimer).
DNA-directed RNA polymerase II, core complex • Accession: GO:0005665 • Ontology: cellular_component • Synonyms: related: DNA-directed RNA polymerase II activity • Definition: • RNA polymerase II, one of three eukaryotic nuclear RNA polymerases, is a multisubunit complex; it produces mRNAs, snoRNAs, and some of the snRNAs. Two large subunits comprise the most conserved portion including the catalytic site and share similarity with other eukaryotic and bacterial multisubunit RNA polymerases. The largest subunit of RNA polymerase II contains an essential carboxyl-terminal domain (CTD) composed of a variable number of heptapeptide repeats (YSPTSPS). The remainder of the complex is composed of smaller subunits (generally ten or more), some of which are also found in RNA polymerases I and III. Although the core is competent to mediate ribonucleic acid synthesis, it requires additional factors to select the appropriate template. • Term Lineage • GO:0005575 : cellular_component ( 116994 ) • GO:0005623 : cell ( 86438 ) • GO:0044464 : cell part ( 86397 ) • GO:0005622 : intracellular ( 70018 ) • GO:0044424 : intracellular part ( 69369 ) • GO:0043229 : intracellular organelle ( 63194 ) • GO:0043231 : intracellular membrane-bound organelle ( 58868 ) • GO:0005634 : nucleus ( 12609 ) • GO:0044428 : nuclear part ( 5000 ) • GO:0031981 : nuclear lumen ( 3017 ) • GO:0005654 : nucleoplasm ( 1990 ) • GO:0044451 : nucleoplasm part ( 1791 ) • GO:0016591 : DNA-directed RNA polymerase II, holoenzyme ( 462 ) • GO:0005665 : DNA-directed RNA polymerase II, core complex ( 85 )
GO or no go • Used frequently for questions such as: is there any functional pattern to my set of co-expressed genes? (overrepresentation of a particular process and/or complex) • Better than nothing. • How are the GO terms assigned (e.g. TAS vs IEA)? • GO slim … • A framework / starting point
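The overrepresentation question above is commonly framed as a hypergeometric (one-sided Fisher) test: given how many genes in the genome carry a term, how surprising is the number seen in the co-expressed set? A minimal sketch, assuming that framing; the slide does not name a specific test, and the function name and all counts in the usage lines are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (e.g. the genome)
    K: background genes annotated to the GO term
    n: genes in the co-expressed set
    k: genes in the set annotated to the GO term
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts, for illustration only.
p = enrichment_p_value(k=8, n=50, K=120, N=6000)
print(f"p = {p:.3g}")
```

In practice one such test is run per GO term, so the resulting p-values need a multiple-testing correction, and the outcome depends on which evidence codes (e.g. TAS vs IEA) are admitted into the annotation counts.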
Allows for questions like: what proportion of its proteins does human devote to transcription regulation? • Controlled vocabulary • Conceptual framework for thinking about our knowledge of cellular mechanisms
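Answering a "what proportion" question with a controlled vocabulary means counting every protein annotated to the term of interest or to any more specific term beneath it. A minimal sketch with toy data; the term labels, gene names, and helper functions are all placeholders invented for illustration, not real GO identifiers or annotations.

```python
# Parent -> children. Placeholder labels, not real GO terms.
CHILDREN = {
    "transcription regulation": ["positive regulation", "negative regulation"],
    "positive regulation": [],
    "negative regulation": [],
    "DNA replication": [],
}

# Gene -> directly assigned terms. Hypothetical annotations.
ANNOTATIONS = {
    "geneA": {"positive regulation"},
    "geneB": {"transcription regulation"},
    "geneC": {"DNA replication"},
    "geneD": set(),
}


def descendants_or_self(term, children):
    """Collect the term plus everything below it in the hierarchy."""
    found = {term}
    stack = [term]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def fraction_annotated(term, annotations, children):
    """Fraction of genes annotated to the term or to any of its descendants."""
    targets = descendants_or_self(term, children)
    hits = sum(1 for terms in annotations.values() if terms & targets)
    return hits / len(annotations)


print(fraction_annotated("transcription regulation", ANNOTATIONS, CHILDREN))
```

Note that unannotated genes (like `geneD` here) stay in the denominator, so the fraction is a lower bound on the true proportion.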
E(nzyme) C(ommission) number: a hierarchical system to describe enzymatic function • EC 1 Oxidoreductases • EC 2 Transferases • EC 3 Hydrolases • EC 4 Lyases • EC 5 Isomerases • EC 6 Ligases • EC 2.7 Transferring phosphorus-containing groups • EC 2.7.7 Nucleotidyltransferases • EC 2.7.7.6 DNA-directed RNA polymerase
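Because each field of an EC number refines the previous one, the hierarchy can be recovered from the number itself. A minimal sketch; the lookup table holds only the classes listed on the slide, and the function names are illustrative.

```python
# Only the entries shown on the slide; a real table would be far larger.
EC_NAMES = {
    "2": "Transferases",
    "2.7": "Transferring phosphorus-containing groups",
    "2.7.7": "Nucleotidyltransferases",
    "2.7.7.6": "DNA-directed RNA polymerase",
}


def ec_fields(ec):
    """Split 'EC 2.7.7.6' or '2.7.7.6' into ['2', '7', '7', '6']."""
    return ec.replace("EC", "").strip().split(".")


def ec_lineage(ec):
    """Return (prefix, name) pairs from the top-level class down to the full number."""
    fields = ec_fields(ec)
    prefixes = [".".join(fields[:i]) for i in range(1, len(fields) + 1)]
    return [(prefix, EC_NAMES.get(prefix)) for prefix in prefixes]


def shared_ec_levels(a, b):
    """Count how many leading fields two EC numbers have in common (0 to 4)."""
    shared = 0
    for x, y in zip(ec_fields(a), ec_fields(b)):
        if x != y:
            break
        shared += 1
    return shared


for prefix, name in ec_lineage("EC 2.7.7.6"):
    print(prefix, name)

# 2.7.7.7 is DNA-directed DNA polymerase: same first three fields, different enzyme.
print(shared_ec_levels("EC 2.7.7.6", "EC 2.7.7.7"))
```

The number of shared leading fields gives a crude measure of how similar two enzymatic functions are, which is useful when comparing the activities of homologs.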
Homology ~ molecular function • In other words, in metabolic pathways homologs are observed to catalyze similar reactions, but often in different pathways.
So if we do function prediction using sequence (i.e. BLAST, trees etc.), then? • If we think we see an ortholog, we can transfer many aspects of function and role • If we see only a homolog, we can transfer only some aspects of molecular function, but not process / role
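The transfer rule above can be written down as a filter on the annotations of a sequence hit. A minimal sketch that simplifies the slide's wording: "many aspects" for orthologs is taken to mean molecular function plus biological process, and "some aspects of molecular function" for homologs is taken to mean molecular function terms only. The names `TRANSFERABLE` and `transfer_annotations` are hypothetical.

```python
# Which GO aspects to carry over, by inferred relationship.
# Assumption: a coarse encoding of the rule on the slide, not an established standard.
TRANSFERABLE = {
    "ortholog": {"molecular_function", "biological_process"},
    "homolog": {"molecular_function"},
}


def transfer_annotations(hit_annotations, relationship):
    """Keep only the (term, aspect) pairs whose aspect may be transferred."""
    allowed = TRANSFERABLE.get(relationship, set())
    return [(term, aspect) for term, aspect in hit_annotations if aspect in allowed]


# Annotations of a hypothetical database hit, using terms from earlier slides.
hit = [
    ("GO:0003887", "molecular_function"),  # DNA-directed DNA polymerase activity
    ("GO:0006260", "biological_process"),  # DNA replication
]

print(transfer_annotations(hit, "ortholog"))
print(transfer_annotations(hit, "homolog"))
```

Even for a homolog, the transferred molecular function term may be too specific; backing off to a broader ancestor term is a common way to stay conservative.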
Examples • Fringe as a glycosyltransferase • ATPase family associated with various cellular activities (AAA): AAA family proteins often perform chaperone-like functions that assist in the assembly, operation, or disassembly of protein complexes • … so how to place query genes in a process/role then?