
Rita Casadio

Prediction of protein function from sequence analysis. Rita Casadio, Biocomputing Group, University of Bologna, Italy.


Presentation Transcript


  1. Prediction of protein function from sequence analysis Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy

  2. The “omic” era. Genome Sequencing Projects: Archaea: 74 species (in progress: 52). Bacteria: 973 species (in progress: 2266 species). Eukaryotic: complete 23, draft assembly 318, in progress 359. http://www.ncbi.nlm.nih.gov/genomes/static/gpstat.html Update: January 2010

  3. The Data Bases of Biological Sequences and Structures
     GenBank: 108,431,692 sequences; 106,533,156,756 nucleotides
     NR(*): 10,381,779 sequences; 3,542,056,219 residues
     SwissProt: 514,212 sequences; 180,900,945 residues
     PDB: 60,654 structures; membrane proteins <2%
     >BGAL_SULSO BETA-GALACTOSIDASE Sulfolobus solfataricus.
     MYSFPNSFRFGWSQAGFQSEMGTPGSEDPNTDWYKWVHDPENMAAGLVSG
     DLPENGPGYWGNYKTFHDNAQKMGLKIARLNVEWSRIFPNPLPRPQNFDE
     SKQDVTEVEINENELKRLDEYANKDALNHYREIFKDLKSRGLYFILNMYH
     WPLPLWLHDPIRVRRGDFTGPSGWLSTRTVYEFARFSAYIAWKFDDLVDE
     YSTMNEPNVVGGLGYVGVKSGFPPGYLSFELSRRHMYNIIQAHARAYDGI
     KSVSKKPVGIIYANSSFQPLTDKDMEAVEMAENDNRWWFFDAIIRGEITR
     GNEKIVRDDLKGRLDWIGVNYYTRTVVKRTEKGYVSLGGYGHGCERNSVS
     LAGLPTSDFGWEFFPEGLYDVLTKYWNRYHLYMYVTENGIADDADYQRPY
     YLVSHVYQVHRAINSGADVRGYLHWSLADNYEWASGFSMRFGLLKVDYNT
     KRLYWRPSALVYREIATNGAITDEIEHLNSVPPVKPLRH
     35,5 HGE!
     Update: January 2009
     (*) CDS translations+PDB+SwissProt+PIR+PRF
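The `>BGAL_SULSO` entry on this slide is in FASTA format: a header line starting with `>` followed by the residues. A minimal sketch of reading such records in Python is given here; the function name `parse_fasta` and the shortened example input are illustrative choices, not part of the presentation.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Only the first sequence line of the slide's entry is used, to keep the example short.
example = """>BGAL_SULSO BETA-GALACTOSIDASE Sulfolobus solfataricus.
MYSFPNSFRFGWSQAGFQSEMGTPGSEDPNTDWYKWVHDPENMAAGLVSG
"""

records = parse_fasta(example)
for header, sequence in records:
    print(header.split()[0], len(sequence))
```

Counting residues per record in this way is how totals such as those quoted for NR or SwissProt could be tallied from a flat file.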

  4. From Genotype to Phenotype. Genes in DNA... (about 30,000 in the human genome) >protein kinase acctgttgatggcgacagggactgtatgctgatctatgctgatgcatgcatgctgactactgatgtgggggctattgacttgatgtctatc.... …code for proteins... …proteins correspond to functions... …in metabolic pathways. Proteins interact. From 5000 to 10000 proteins per tissue …when they are expressed. Over 20 million single mutations are known in genes …with different effects depending on variability.

  5. STRING 8: a global view on proteins and their functional interactions in 630 organisms. Jensen et al., 2009, Nucleic Acids Research, Vol. 37. The Human Interactome in STRING: 22,937 proteins and 1,482,533 interactions. http://string.embl.de

  6. One problem of the “omic era”: Protein functional annotation

  7. The Protein Data Bank http://www.rcsb.org/pdb/home/home.do Number of proteins with known structure: 57,529

  8. SCOP: Structural Classification of Proteins. Domains are hierarchically classified: • class • fold: proteins with secondary structures in the same arrangement and with the same topological connections • superfamily: structural and functional features suggest a common evolutionary origin • family: proteins with identities ≥30%, or with identities <30% but with similar structures and functions

  9. From the Protein Sequence to the Structure and Function space Lesk A., 2004

  10. From the Protein Sequence to the Structure space. Methods along the Sequence Identity (%) axis: 100% down to 30%: • Sequence comparison. 30% down to 0% (PDB): • Fold recognition • Machine-learning aided alignment • Threading. Around 0% (New Folds): • Ab initio and de novo modelling • Machine-learning prediction of structural features

  11. From the Protein Sequence to the Structure and Function space What is protein function?

  12. What is a function? For enzymes: function can be defined on the basis of the catalysed molecular reaction. e.g. aspartic aminotransferase (AST)

  13. In biochemistry, a transaminase or aminotransferase is an enzyme that catalyzes a type of reaction between an amino acid and an α-keto acid. Specifically, this reaction (transamination) removes the amino group from the amino acid, leaving behind an α-keto acid, and transfers it to the reactant α-keto acid, converting that into an amino acid. The enzymes are important in the production of various amino acids, and measuring the concentrations of various transaminases in the blood is important in diagnosing and tracking many diseases. Transaminases require the coenzyme pyridoxal phosphate, which is converted into pyridoxamine in the first phase of the reaction, when an amino acid is converted into a keto acid. Enzyme-bound pyridoxamine in turn reacts with pyruvate, oxaloacetate, or alpha-ketoglutarate, giving alanine, aspartic acid, or glutamic acid, respectively. The presence of elevated transaminases can be an indicator of liver damage.

  14. Enzyme Commission (E.C.) classification A hierarchical classification for enzymes

  15. EC 2.6: Transferring nitrogenous groups. EC 2.6.1: Transaminases. EC 2.6.1.1: Aspartate transaminase. Other name(s): glutamic-oxaloacetic transaminase; glutamic-aspartic transaminase; transaminase A; AAT; AspT; 2-oxoglutarate-glutamate aminotransferase; aspartate α-ketoglutarate transaminase; aspartate aminotransferase; aspartate-2-oxoglutarate transaminase; aspartic acid aminotransferase; aspartic aminotransferase; aspartyl aminotransferase; AST; glutamate-oxalacetate aminotransferase; glutamate-oxalate transaminase; glutamic-aspartic aminotransferase; glutamic-oxalacetic transaminase; glutamic oxalic transaminase; GOT (enzyme); L-aspartate transaminase; L-aspartate-α-ketoglutarate transaminase; L-aspartate-2-ketoglutarate aminotransferase; L-aspartate-2-oxoglutarate aminotransferase; L-aspartate-2-oxoglutarate-transaminase; L-aspartic aminotransferase; oxaloacetate-aspartate aminotransferase; oxaloacetate transferase; aspartate:2-oxoglutarate aminotransferase; glutamate oxaloacetate transaminase. Systematic name: L-aspartate:2-oxoglutarate aminotransferase
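Because the E.C. classification is hierarchical, each prefix of an EC number names a broader class. The sketch below expands an EC number into its lineage; the function name `ec_lineage` is illustrative, and the name table contains only the three levels quoted on the slide, so the top level (EC 2) is deliberately left unnamed.

```python
def ec_lineage(ec_number):
    """Return every level of the hierarchy, from the broadest class to the full number."""
    parts = ec_number.replace("EC", "").strip().split(".")
    return ["EC " + ".".join(parts[:i]) for i in range(1, len(parts) + 1)]


# Names taken from the slide; levels not shown there are left out on purpose.
EC_NAMES = {
    "EC 2.6": "Transferring nitrogenous groups",
    "EC 2.6.1": "Transaminases",
    "EC 2.6.1.1": "Aspartate transaminase",
}

lineage = ec_lineage("EC 2.6.1.1")
for level in lineage:
    print(level, "-", EC_NAMES.get(level, "(name not given on the slide)"))
```

Two enzymes sharing the first three levels, for example, belong to the same sub-subclass even when their substrates differ.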

  16. Problems: Isoforms, e.g. how to differentiate the function of the cytoplasmic aspartate aminotransferase from that of the mitochondrial isoform? Non-enzymatic proteins

  17. The Ontologies • Cellular component • Biological process • Molecular function GO function vocabulary: http://www.geneontology.org/

  18. Gene Ontology classification: The human cytoplasmic aspartate transaminase GO:0004069 GO:0005829 GO:0006533
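The three identifiers on this slide can be read as one term from each of the three ontologies. The sketch below groups them accordingly. The term names and namespace assignments are assumptions recalled from the GO vocabulary rather than taken from the slide, and should be checked against http://www.geneontology.org/ before being relied on.

```python
# ASSUMPTION: names and namespaces below are not given on the slide;
# they are recalled from the GO vocabulary and should be verified.
GO_ANNOTATIONS = {
    "GO:0004069": ("molecular_function",
                   "L-aspartate:2-oxoglutarate aminotransferase activity"),
    "GO:0005829": ("cellular_component", "cytosol"),
    "GO:0006533": ("biological_process", "aspartate catabolic process"),
}


def group_by_ontology(annotations):
    """Group GO identifiers by the ontology (namespace) they belong to."""
    grouped = {}
    for go_id, (namespace, name) in sorted(annotations.items()):
        grouped.setdefault(namespace, []).append((go_id, name))
    return grouped


grouped = group_by_ontology(GO_ANNOTATIONS)
for namespace, terms in grouped.items():
    for go_id, name in terms:
        print(f"{namespace:20s} {go_id}  {name}")
```

Read this way, a cytoplasmic and a mitochondrial isoform could share the molecular function term while differing in the cellular component term.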

  19. One BIG problem of the “omic era”: Protein functional annotation

  20. Functional annotation in silico by homology search
      ADH1_SULSO  ----------MRAVRLVEIGKP--LSLQEIGVPKPKGPQVLIKVEAAGVCHSDVHMRQGRFGNLRIVE
      ADH_CLOBE   ----------MKGFAMLGINKLG---WIEKERPVAGSYDAIVRPLAVSPCTSDIHTVFEGA-------
      ADH_THEBR   ----------MKGFAMLSIGKVG---WIEKEKPAPGPFDAIVRPLAVAPCTSDIHTVFEGA-------
      ADH1_SOLTU  MSTTVGQVIRCKAAVAWEAGKP--LVMEEVDVAPPQKMEVRLKILYTSLCHTDVYFWEAKG-------
      ADH2_LYCES  MSTTVGQVIRCKAAVAWEAGKP--LVMEEVDVAPPQKMEVRLKILYTSLCHTDVYFWEAKG-------
      ADH1_ASPFL  ----MSIPEMQWAQVAEQKGGP--LIYKQIPVPKPGPDEILVKVRYSGVCHTDLHALKGDW-------
      Sequence comparison is performed with alignment programs.
      Sequence identity 40 %. Similar structure and function (??)
      Methods for similarity searches:
      BLAST, Psi-BLAST (http://www.ncbi.nlm.nih.gov/BLAST/), sequence. Altschul et al. (1990) J Mol Biol 215:403-410; Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402
      Pfam (http://pfam.wustl.edu/hmmsearch.shtml), sequence/structure. Bateman et al. (2000) Nucleic Acids Research 28:263-266
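Sequence identity between two rows of an alignment like the one on this slide can be computed column by column. The sketch below uses one common convention, assumed here: columns where either row has a gap are skipped, and identity is the share of matching residues among the remaining columns. Other conventions divide by the alignment length or by the shorter sequence and give different percentages. The function name `percent_identity` is illustrative.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity over alignment columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Two rows copied from the slide's alignment.
ADH_CLOBE = "----------MKGFAMLGINKLG---WIEKERPVAGSYDAIVRPLAVSPCTSDIHTVFEGA-------"
ADH_THEBR = "----------MKGFAMLSIGKVG---WIEKEKPAPGPFDAIVRPLAVAPCTSDIHTVFEGA-------"

identity = percent_identity(ADH_CLOBE, ADH_THEBR)
print(f"ADH_CLOBE vs ADH_THEBR: {identity:.1f}% identity over the aligned fragment")
```

The value only covers the fragment shown on the slide, so it need not equal the identity of the full-length proteins.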

  21. Transfer by inheritance: Function annotation transfer from sequence through homology

  22. http://www.uniprot.org/

  23. PDB The annotation process at UniProt

  24. Open problems of “inheritance through homology” • Not all UniProt files are GO annotated • The optimal threshold value of sequence identity for function transfer is not known • Proteins contain multiple domains • Proteins can share common domains and not necessarily the same function • In proteins, different combinations of shared domains lead to different biological roles
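The first two open problems can be made concrete with a toy version of annotation transfer. In the sketch below, the hit identifiers, identity values and GO assignments are all hypothetical, and `transfer_annotations` is an illustrative name rather than any tool's actual procedure. It shows how the transferred term set depends on the chosen identity threshold and how unannotated hits contribute nothing.

```python
def transfer_annotations(hits, min_identity):
    """Collect GO terms from annotated hits at or above the identity threshold."""
    transferred = set()
    for hit in hits:
        if hit["identity"] >= min_identity and hit["go_terms"]:
            transferred.update(hit["go_terms"])
    return transferred


# HYPOTHETICAL search results for one query protein (not real data).
hits = [
    {"id": "hit_A", "identity": 82.0, "go_terms": {"GO:0004069", "GO:0005829"}},
    {"id": "hit_B", "identity": 45.0, "go_terms": {"GO:0004069", "GO:0006533"}},
    {"id": "hit_C", "identity": 31.0, "go_terms": set()},  # homolog without GO annotation
]

for threshold in (90, 60, 40):
    terms = sorted(transfer_annotations(hits, threshold))
    print(f"identity >= {threshold}%: {terms}")
```

Lowering the threshold adds terms, but nothing in the procedure says whether the extra terms are correct for the query, which is why the choice of threshold remains open.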
