Identification of specificity-determining positions in protein alignments

Mikhail Gelfand, Research and Training Center “Bioinformatics”, Institute for Information Transmission Problems, RAS. ECCB2005, Madrid.


Presentation Transcript


  1. Identification of specificity-determining positions in protein alignments. Mikhail Gelfand, Research and Training Center “Bioinformatics”, Institute for Information Transmission Problems, RAS. ECCB2005, Madrid

  2. Motivation • Large protein families whose general function is assigned by homology, with little specific functional information • Much less structural data: few structures with bound substrates, cofactors, etc. • Some specificity assignments come from comparative genomics => • Search for specificity-determining positions in alignments, for: • identification of functional sites • prediction of specificity • understanding and, eventually, re-design of function

  3. Specificity (of transporters) from comparative genomics: three examples. 1. New specificities in a little-studied family (malate/lactate). [Figure legend: S-box (rectangle frame), MetJ (circle frame), LYS-element (circles), Tyr-T-box (rectangles)]

  4. 2. Misleading homology: the PnuC family of transporters. [Figure labels: the THI elements, the RFN elements]

  5. 3. A nightmare. The NiCoT family of nickel-cobalt transporters

  6. SDP (specificity-determining position): an alignment position that is conserved within groups of proteins having the same specificity (specificity groups) but differs between them. An SDP is not equivalent to a functionally important position.

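  7. Measure of specificity: mutual information, $I_p = \sum_{i}\sum_{\alpha} f_p(\alpha,i)\,\log\frac{f_p(\alpha,i)}{f_p(\alpha)\,f(i)}$, where • $f_p(\alpha,i)$ = count of amino acid α in group i at position p divided by the total number of sequences • $f_p(\alpha)$ = frequency of amino acid α in position p • $f(i)$ = fraction of proteins in group i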
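A minimal Python sketch of this score for one alignment column, assuming the column is given as a string (one residue per sequence) and the specificity groups as a parallel list of labels. Gaps are treated as one more symbol and no smoothing is applied; both are simplifications made for illustration, not necessarily SDPpred's choices.

```python
import math
from collections import Counter


def mutual_information(column, groups):
    """I_p: sum over residues a and groups i of
    f(a,i) * log(f(a,i) / (f(a) * f(i))), natural log, no smoothing."""
    if len(column) != len(groups):
        raise ValueError("one group label is needed per sequence")
    n = len(column)
    joint = Counter(zip(column, groups))  # counts of residue a in group i
    residue = Counter(column)             # counts of residue a at this position
    group = Counter(groups)               # sizes of the groups
    mi = 0.0
    for (a, g), count in joint.items():
        f_ai = count / n        # f_p(a, i)
        f_a = residue[a] / n    # f_p(a)
        f_i = group[g] / n      # f(i)
        mi += f_ai * math.log(f_ai / (f_a * f_i))
    return mi


# Usage: 8 sequences in two specificity groups of 4.
groups = ["x"] * 4 + ["y"] * 4
print(mutual_information("AAAACCCC", groups))  # residue tracks the groups
print(mutual_information("AAAAAAAA", groups))  # invariant column
```

An invariant column scores 0, while a column whose residue perfectly separates two equal-sized groups reaches log 2.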

  8. Taking into account the structure of the phylogenetic tree: random shuffling and linear regression (Z-score; linear regression, min) => positions that are more specific than expected given the tree
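The shuffling half of this idea can be sketched as follows: the residues of one column are randomly permuted among the sequences many times, and the observed mutual information is expressed as a Z-score against the shuffled values. The tree-aware linear-regression correction on the slide is not reproduced, and the names `shuffled_z_score`, `n_shuffles` and `seed` are illustrative assumptions.

```python
import math
import random
from collections import Counter


def mutual_information(column, groups):
    """Mutual information between residue identity and group label (natural log)."""
    n = len(column)
    joint = Counter(zip(column, groups))
    residue = Counter(column)
    group = Counter(groups)
    return sum((c / n) * math.log((c / n) / ((residue[a] / n) * (group[g] / n)))
               for (a, g), c in joint.items())


def shuffled_z_score(column, groups, n_shuffles=1000, seed=0):
    """Z-score of the observed MI against MI after permuting the column.

    Permuting residues among sequences keeps the residue composition of the
    column but destroys any link between residues and groups."""
    if n_shuffles < 2:
        raise ValueError("need at least two shuffles to estimate a spread")
    rng = random.Random(seed)
    observed = mutual_information(column, groups)
    residues = list(column)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        null.append(mutual_information(residues, groups))
    mean = sum(null) / n_shuffles
    sd = math.sqrt(sum((x - mean) ** 2 for x in null) / (n_shuffles - 1))
    if sd == 0.0:
        return 0.0  # e.g. an invariant column: shuffling never changes the MI
    return (observed - mean) / sd


groups = ["x"] * 8 + ["y"] * 8
print(shuffled_z_score("A" * 8 + "C" * 8, groups))
```

Shuffling the column rather than the group labels gives the same null distribution here; the column is shuffled only because it reuses the column as the thing being tested.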

  9. Smoothing: pseudocounts and similarity between amino acid residues • m(a,b) = amino acid substitution matrix • n(a,i) = count of amino acid a at position i
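The slide does not give SDPpred's exact smoothing formula, so the following is only one common way to combine pseudocounts with residue similarity: each residue receives a pseudocount proportional to its similarity to the residues actually observed. The similarity values are assumed to be non-negative (raw log-odds substitution scores with negative entries would need transforming first), and `weight` and the toy matrix are hypothetical.

```python
def smoothed_frequencies(counts, similarity, weight=1.0):
    """Residue frequencies at one position after similarity-based pseudocounts.

    counts: {residue: observed count} at the position (or within one group).
    similarity: {(a, b): m(a, b)}, non-negative similarity of residues a and b.
    weight: hypothetical scale of the pseudocounts relative to the real counts.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues observed at this position")
    alphabet = sorted({a for a, _ in similarity} | set(counts))
    smoothed = {}
    for a in alphabet:
        # Pseudocount for a: similarity of a to each observed residue b,
        # weighted by the observed frequency of b.
        pseudo = sum(similarity.get((a, b), 0.0) * counts.get(b, 0) / total
                     for b in alphabet)
        smoothed[a] = counts.get(a, 0) + weight * pseudo
    norm = sum(smoothed.values())
    return {a: value / norm for a, value in smoothed.items()}


# Toy similarity values (illustrative, not a real substitution matrix):
# I and L are similar, D is unrelated to both.
TOY_SIMILARITY = {("I", "I"): 1.0, ("L", "L"): 1.0, ("D", "D"): 1.0,
                  ("I", "L"): 0.5, ("L", "I"): 0.5}
print(smoothed_frequencies({"I": 4}, TOY_SIMILARITY))
```

A column of four isoleucines thus gives leucine a small non-zero frequency, while the dissimilar aspartate stays at zero.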

  10. Automated threshold setting: the Bernoulli estimator. Are 5 SDPs with Z-score > 12 better than 10 SDPs with Z-score > 9?
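Assuming the Z-scores of positions that are not SDPs behave like standard normal noise, the slide's question can be read as: for each candidate number k of top positions, how likely is it that at least k of the N positions reach the k-th best Z-score by chance? The sketch below computes that binomial tail in log space and keeps the k with the smallest probability. This is an interpretation of a "Bernoulli estimator", not SDPpred's code; the published formula may use a single binomial term instead of the tail.

```python
import math


def normal_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def log_binom_pmf(n, j, p):
    """Natural log of C(n, j) * p**j * (1 - p)**(n - j), computed in log space."""
    return (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * math.log(p) + (n - j) * math.log1p(-p))


def log_prob_by_chance(z_scores, k):
    """Log-probability that at least k of the N positions reach Z >= z_k
    (the k-th largest observed Z-score) if all positions were N(0,1) noise."""
    n = len(z_scores)
    z_k = sorted(z_scores, reverse=True)[k - 1]
    # Clamp p away from 0 and 1 so that both logarithms stay finite.
    p = min(max(normal_tail(z_k), 1e-300), 1.0 - 1e-12)
    terms = [log_binom_pmf(n, j, p) for j in range(k, n + 1)]
    top = max(terms)  # log-sum-exp for a numerically stable tail sum
    return top + math.log(sum(math.exp(t - top) for t in terms))


def choose_number_of_sdps(z_scores):
    """Number k of top-scoring positions that is least likely to arise by chance."""
    return min(range(1, len(z_scores) + 1),
               key=lambda k: log_prob_by_chance(z_scores, k))
```

For the slide's question, one would pass the full list of per-position Z-scores and compare `log_prob_by_chance(z, 5)` with `log_prob_by_chance(z, 10)`; the more negative value marks the outcome less likely to be noise.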

  11. Other similar techniques • Evolutionary trace (Lichtarge et al. 1996, 1997): requires a structure; gradual construction of a group-specific consensus • Evolutionary rate shifts (DIVERGE, Gu et al. 2002): positions with group-specific evolutionary rates • Surface patches of slowly evolving residues (Rate4Site, Pupko et al. 2002): requires a structure • PCA in the sequence space (Casari et al., 1995) • Correlated mutations (Pazos and Valencia, 2002) • Prediction of functional sub-types (Hannenhalli and Russell, 2000): relative entropy of HMM profiles for groups

  12. SDPpred: Web interface. Input: multiple alignment of proteins divided into specificity groups:
      === AQP ===
      %sp|Q9L772|AQPZ_BRUME
      -------------------------------------mlnklsaeffgtfwlvfggcgsa
      ilaa--afp-------elgigflgvalafgltvltmayavggisg--ghfnpavslgltv
      iiilgsts------------------------------slap------------------
      qlwlfwvaplvgavigaiiwkgllgrd---------------------------------
      ------
      %sp|P48838|AQPZ_ECOLI
      -------------------------------------mfrklaaecfgtfwlvfggcgsa
      vlaa--gfp-------elgigfagvalafgltvltmafavghisg--ghfnpavtiglwa
      lvihgatd------------------------------kfap------------------
      qlwffwvvpivggiiggliyrtllekrd--------------------------------
      ------
      %tr|Q92ZW9
      -------------------------------------mfkklcaeflgtcwlvlggcgsa
      vlas--afp-------qvgigllgvsfafgltvltmaytvggisg--ghfnpavslglav
      iiilgsth------------------------------rrvp------------------
      qlwlfwiaplfgaaiagivwksvgeefrpvd-----------------------------
      ------
      === GLP ===
      %sp|P11244|GLPF_ECOLI
      ----------------------------msqt---stlkgqciaeflgtglliffgvgcv
      aalkvag---------a-sfgqweisviwglgvamaiyltagvsg--ahlnpavtialwl
      glilaltd------------------------------dgn--------------g-vpr
      -flvplfgpivgaivgafayrkligrhlpcdicvveek--etttpseqkasl--------
      ------
      %sp|P44826|GLPF_HAEIN
      ----------------------------mdks-----lkancigeflgtalliffgvgcv
      …
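The input format can be read off the slide's fragment: `=== NAME ===` starts a specificity group, a line beginning with `%` gives a sequence identifier, and the following lines hold that sequence's aligned residues. The parser below is a hypothetical reader written under that reading, not SDPpred's own code; it also checks that all aligned sequences have the same length.

```python
import re

GROUP_HEADER = re.compile(r"^===\s*(.+?)\s*===$")


def parse_grouped_alignment(text):
    """Parse '=== GROUP ===' / '%ID' / sequence-line input into
    {group: {sequence_id: aligned_sequence}}."""
    groups = {}
    current_group = None
    current_id = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        header = GROUP_HEADER.match(line)
        if header:
            current_group = header.group(1)
            groups.setdefault(current_group, {})
            current_id = None
        elif line.startswith("%"):
            if current_group is None:
                raise ValueError("sequence found before any group header")
            current_id = line[1:]
            groups[current_group][current_id] = ""
        else:
            if current_id is None:
                raise ValueError("sequence data found before any '%' identifier")
            # Alignment lines of one sequence are concatenated.
            groups[current_group][current_id] += line
    lengths = {len(s) for seqs in groups.values() for s in seqs.values()}
    if len(lengths) > 1:
        raise ValueError(f"aligned sequences differ in length: {sorted(lengths)}")
    return groups
```

The truncated slide example ends in `…`, which this parser would treat as residue data; only complete input files should be passed to it.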

  13. SDPpred: Output • Detailed description of each SDP (List of SDPs) • Plot of probabilities used by the Bernoulli estimator to set the cutoff (Probability plot view) • Alignment of the family with the SDPs highlighted (Alignment view)

  14. Transcription factors from the LacI family • Training set: 459 sequences, average length 338 amino acids, 85 specificity groups: 44 SDPs • 10 residues contact NPF (an analog of the effector) • 7 residues in the effector contact zone (5 Å < dmin < 10 Å) • 6 residues in the intersubunit contacts • 5 residues in the intersubunit contact zone (5 Å < dmin < 10 Å) • 7 residues contact the operator sequence • 6 residues in the operator contact zone (5 Å < dmin < 10 Å) • LacI from E. coli

  15. SDP clusters at the subunit contact region. LacI (lactose repressor) from E. coli (1jwl). [Figure labels: Cluster I, Cluster II, effector, DNA operator]

  16. Overall statistics (LacI of E. coli): 348 amino acids in total, 44 SDPs. Categories: contacting residues (distance to the DNA, effector, or the other subunit < 5 Å); contact zone (may be functional); non-contacting residues (distance to the DNA, effector, or the other subunit > 10 Å)
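The three distance classes can be sketched as below, assuming atom coordinates (in Å) have already been extracted from the structure. Each SDP is classified by its minimal atom-atom distance to the nearest partner (DNA, effector or other subunit). The slide leaves the exact 5 Å and 10 Å boundaries open; assigning them to the contact zone is a choice made here, and all function names are hypothetical.

```python
import math
from collections import Counter

CONTACT_CUTOFF = 5.0  # Å: d_min below this counts as a direct contact
ZONE_CUTOFF = 10.0    # Å: d_min up to this counts as the contact zone


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between any atom of A and any atom of B."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def classify(d_min):
    """Map a minimal distance (Å) to the categories used on the slide."""
    if d_min < CONTACT_CUTOFF:
        return "contacting"
    if d_min <= ZONE_CUTOFF:  # boundary values go to the zone (our choice)
        return "contact zone"
    return "non-contacting"


def tally_sdps(sdp_atoms, partners):
    """sdp_atoms: {residue label: [(x, y, z), ...]};
    partners: {partner name: [(x, y, z), ...]}, e.g. DNA, effector, other subunit.
    Counts SDPs per category using the distance to the nearest partner."""
    counts = Counter()
    for atoms in sdp_atoms.values():
        d = min(min_distance(atoms, p) for p in partners.values())
        counts[classify(d)] += 1
    return counts
```

With real data the coordinate lists would come from the PDB entry (1jwl for LacI), and the resulting counts can be set against the 348 residues of the whole chain.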

  17. Membrane channels of the MIP family • Training set: 17 sequences, average length 280 amino acids, 2 specificity groups (aquaporins and glyceroaquaporins): 21 SDPs • 8 residues contact glycerol, the substrate (dmin < 5 Å) • 8 residues oriented to the channel • 5 residues in the contacts with other subunits • GlpF from E. coli

  18. Two SDP clusters at the contact of subunits forming the tetramer: GlpF (glycerol facilitator) from E. coli (1fx8). Residues: 20Leu, 24Ile, 108Tyr of one subunit and 193Ser of another subunit; Glu43. [Figure labels: Cluster I, Cluster II, substrate (glycerol), subunit I]

  19. Overall statistics (GlpF from E. coli): 281 amino acids in total, 21 SDPs. Categories: contacting residues (distance to the substrate or another subunit < 5 Å); contact zone (may be functional); non-contacting residues (distance to the substrate or another subunit > 10 Å)

  20. Isocitrate/isopropylmalate dehydrogenases: combinations of specificities towards substrate and cofactor • IDH catalyzes the oxidation of isocitrate to α-ketoglutarate and CO2 (TCA cycle) using either NAD or NADP as a cofactor, in organisms from prokaryotes to higher eukaryotes • IMDH catalyzes the oxidative decarboxylation of 3-isopropylmalate into 2-oxo-4-methylvalerate (leucine biosynthesis) in prokaryotes and fungi; the cofactor is NAD • [Figure: phylogenetic tree with branches labeled Eukaryota, Archaea, Bacteria, Mitochondria]

  21. Selecting specificity groups: 1. By substrate: all IDHs vs. all IMDHs. 2. By cofactor: all NAD-dependent vs. all NADP-dependent. 3. Four groups: IDH (NADP) type I, IDH (NADP) type II, IDH (NAD), IMDH (NAD).

  22. Predicted SDPs. Panels: most SDPs near the substrate; SDPs near the substrate and the cofactor; SDPs near the substrate, the cofactor and the other subunit

  23. SDPs, the cofactor and the substrate (NADP-dependent IDH from E. coli, 1ai2). Substrate (isocitrate): 100Lys, 104Thr, 105Thr, 107Val, 337Ala, 341Thr are substrate-specific and four-group SDPs, functionally not characterized. Cofactor (NADP; nicotinamide nucleotide and adenine nucleotide parts): 344Lys, 345Tyr, 351Val are cofactor-specific SDPs and known determinants of cofactor specificity.

  24. SDPs predicted for different groupings: substrate-specific SDPs, cofactor-specific SDPs, and SDPs for the four groups. Residues shown: 208Arg, 337Ala, 100Lys, 300Ala, 105Thr, 341Thr, 229His, 154Glu, 103Leu, 233Ile, 97Val, 158Asp, 115Asn, 305Asn, 308Tyr, 98Ala, 155Asn, 231Gly, 327Asn, 344Lys, 287Gln, 164Glu, 351Val, 345Tyr, 241Phe, 38Gly, 40Asp, 104Thr, 107Val, 152Phe, 323Ala, 245Gly, 161Ala, 232Asn, 36Gly, 31Tyr, 162Gly, 45Met. Color code: contacts cofactor; contacts substrate and cofactor; contacts substrate; contacts substrate and the other subunit; contacts the other subunit.

  25. Overview • Transcription factors: contacts with the cofactor and the DNA • Transporters: contacts with the substrate • Enzymes: contacts with the substrate and the cofactor • In all cases: contacts between subunits

  26. Protein-DNA interactions: entropy at aligned sites (blue plots) and the number of contacts (red: heavy atoms in a base pair at a distance below the cutoff from a protein atom) for CRP, PurR, IHF and TrpR
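Positional entropy of aligned binding sites, the quantity in the blue plots, can be computed as below. Per-position contact counts would have to come from a structure at a chosen distance cutoff and are not given here, so the `pearson` helper is only a generic way to compare the two profiles; rerunning it at several cutoffs is one way to check the claim of slide 27. All names are illustrative.

```python
import math
from collections import Counter


def column_entropy(sites, position):
    """Shannon entropy (bits) of the nucleotides at one aligned position."""
    counts = Counter(site[position].upper() for site in sites)
    n = len(sites)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_profile(sites):
    """Entropy at every position of equal-length aligned binding sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("binding sites must be aligned to the same length")
    return [column_entropy(sites, p) for p in range(length)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Lower-case letters in consensus-style sites (as in TTGTgA) are folded to upper case before counting.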

  27. The observed correlation does not depend on the distance cutoff

  28. CRP/FNR family of regulators

  29. Correlation between contacting nucleotides and amino acid residues. Contacting residues: REnnnR. TG: 1st arginine; GA: glutamate and 2nd arginine. • CooA in Desulfovibrio spp. • CRP in Gamma-proteobacteria • HcpR in Desulfovibrio spp. • FNR in Gamma-proteobacteria
      DD COOA  ALTTEQLSLHMGATRQTVSTLLNNLVR
      DV COOA  ELTMEQLAGLVGTTRQTASTLLNDMIR
      EC CRP   KITRQEIGQIVGCSRETVGRILKMLED
      YP CRP   KXTRQEIGQIVGCSRETVGRILKMLED
      VC CRP   KITRQEIGQIVGCSRETVGRILKMLEE
      DD HCPR  DVSKSLLAGVLGTARETLSRALAKLVE
      DV HCPR  DVTKGLLAGLLGTARETLSRCLSRMVE
      EC FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
      YP FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
      VC FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
      Binding signals: CooA TGTCGGCnnGCCGACA; CRP TTGTGAnnnnnnTCACAA; HcpR TTGTgAnnnnnnTcACAA; FNR TTGATnnnnATCAA
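The residue side of this correspondence can be read directly from the slide's alignment. In the CRP rows, the first arginine, the glutamate and the second arginine of REnnnR fall at alignment positions 15, 16 and 20 (our reading of the alignment, 0-based columns 14, 15 and 19 below); the sketch prints what each protein carries at those columns. `MOTIF_COLUMNS` and `motif_residues` are illustrative names.

```python
# Aligned fragments copied from the slide (X marks an unknown residue).
ALIGNMENT = {
    "DD COOA": "ALTTEQLSLHMGATRQTVSTLLNNLVR",
    "DV COOA": "ELTMEQLAGLVGTTRQTASTLLNDMIR",
    "EC CRP": "KITRQEIGQIVGCSRETVGRILKMLED",
    "YP CRP": "KXTRQEIGQIVGCSRETVGRILKMLED",
    "VC CRP": "KITRQEIGQIVGCSRETVGRILKMLEE",
    "DD HCPR": "DVSKSLLAGVLGTARETLSRALAKLVE",
    "DV HCPR": "DVTKGLLAGLLGTARETLSRCLSRMVE",
    "EC FNR": "TMTRGDIGNYLGLTVETISRLLGRFQK",
    "YP FNR": "TMTRGDIGNYLGLTVETISRLLGRFQK",
    "VC FNR": "TMTRGDIGNYLGLTVETISRLLGRFQK",
}

# 0-based columns of R, E and the second R of "REnnnR" in the CRP rows
# (our reading of the alignment: alignment positions 15, 16 and 20).
MOTIF_COLUMNS = (14, 15, 19)


def motif_residues(sequence):
    """Residues a sequence carries at the three REnnnR columns."""
    return "".join(sequence[c] for c in MOTIF_COLUMNS)


for name, sequence in ALIGNMENT.items():
    print(f"{name:8s} {motif_residues(sequence)}")
```

The CRP and HcpR rows carry R, E, R at these columns, the FNR rows V, E, R, and the CooA rows R, Q, T; the slide relates these differences to the TG and GA positions of the corresponding binding signals.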

  30. The correlation holds for other factors in the family

  31. Plans and perspectives. Protein-DNA interactions LacI family of transcriptional regulators (each branch represents a subfamily)

  32. … and their signals: 1605 regulators from 189 genomes, forming 302 groups of orthologs and binding 2518 sites

  33. Plans and perspectives. Experimental verification • A new family of Ni/Co transporters • No structural data • Specificity predicted by comparative genomics • Predicted SDPs form several clusters in the alignment, are located on the same sides of alpha-helices • Mutational analysis

  34. Translation termination factors in prokaryotes / decoding of stop codons. Specificity of RF1 (UAG, UAA) and RF2 (UGA, UAA). Fragment of the alignment (117 pairs); SDPs are shown by black boxes above the alignment.

  35. “Interesting” positions: invariant, SDPs, variable rate.

  36. SDPs and invariant positions: two decoding sites?

  37. Plans and perspectives • Use of 3D structures, when available. Identification of functional sites as spatial clusters of SDPs and conserved positions • Automated identification of specificity groups based on the analysis of the phylogenetic tree • Protein-DNA interactions • Identification of protein-protein contact surfaces

  38. Publications • N.J.Oparina, O.V.Kalinina, M.S.Gelfand, L.L.Kisselev (2005) Common and specific amino acid residues in the prokaryotic polypeptide release factors RF1 and RF2: possible functional implications. Nucleic Acids Research 33 (in press). • O.V.Kalinina, A.A.Mironov, M.S.Gelfand, A.B.Rakhmaninova (2004) Automated selection of positions determining functional specificity of proteins by comparative analysis of orthologous groups in protein families. Protein Science 13: 443-456. • O.V.Kalinina, P.S.Novichkov, A.A.Mironov, M.S.Gelfand, A.B.Rakhmaninova (2004) SDPpred: a tool for prediction of amino acid residues that determine differences in functional specificity of homologous proteins. Nucleic Acids Research 32: W424-W428. • O.V.Kalinina, M.S.Gelfand, A.A.Mironov, A.B.Rakhmaninova (2003) Amino acid residues forming specific contacts between subunits in tetramers of the membrane channel GlpF. Biophysics (Moscow) 48: S141-S145. • L.A.Mirny, M.S.Gelfand (2002) Using orthologous and paralogous proteins to identify specificity determining residues in bacterial transcription factors. Journal of Molecular Biology 321: 7-20. • L.Mirny, M.S.Gelfand (2002) Structural analysis of conserved base-pairs in protein-DNA complexes. Nucleic Acids Research 30: 1704-1711. • http://math.belozersky.msu.ru/~psn/

  39. Acknowledgements: Leonid Mirny (Harvard, MIT), Olga Kalinina, Andrei A. Mironov, Alexandra B. Rakhmaninova, Dmitry Rodionov, Olga Laikova. Howard Hughes Medical Institute; Ludwig Institute of Cancer Research; Russian Fund of Basic Research; Russian Academy of Sciences, programs “Molecular and Cellular Biology” and “Origin and Evolution of the Biosphere”.
