
Nadia Léonard Unité de Recherche en Biologie Moléculaire F.U.N.D.P.


Presentation Transcript


  1. Nadia Léonard Unité de Recherche en Biologie Moléculaire F.U.N.D.P. Developing a reliable methodology to align a sequence of known structure and a sequence with low homology, to model it

  2. Introduction • 3D structure: information • to understand function • to plan directed mutagenesis • The number of known structures (8000) is smaller than the number of known sequences (500000). • Experimental techniques: long and expensive • Alternative: modeling • Homology modeling: two homologues adopt the same structure

  3. % identity scale: 100, 50, 40, 30, 25, 20, 0 • Homology modeling (reliable): pairwise alignment, most features well predicted; multiple alignment • Twilight zone: consensus of alignments, some features well predicted • Midnight zone: fold recognition (not very reliable) • Not homologous, BUT proteins of different sequences can adopt the same structure

  4. Sequence alignment is the critical step for homology modeling. Below 30% identity, no automatic method allows reliable protein modeling.

  5. Aim of our work: to propose a reliable alignment method for proteins sharing a small percentage of identity with their template (<30%).

  6. General strategy for homology modeling: Search databanks (PSI-BLAST) → PDB template → Multiple alignment of sequences → Target-template alignment (critical step) → Modeling → Theoretical model evaluation → Comparison of model to real structure

  7. Our methodology • 1. Target selection: PDB proteins whose template shares between 10 and 30% identity (ALIGN) • 2. Improvement of the sequence-structure alignment • Building of 3 alignments • 2 from our method (consensus 1 and 2) • PSI-BLAST pairwise alignment (best alignment method for Twilight Zone proteins) • 3. Homology modeling from each target-template alignment • 4. Evaluation: geometrical features of the models • 5. Comparison of each model to the real structure

  8. Our approach consists in building a consensus of several alignment programs • Multiple alignment • Multiple alignment • Target, template • Several programs

  9. Our approach consists in building a consensus of several alignment programs • Multiple alignment • Multiple alignment • Target, template • Several programs • Pairwise alignment • Several pairwise alignments • Pairwise alignment

  10. Our approach consists in building a consensus of several alignment programs • Multiple alignment • Multiple alignment • Target, template • Several programs • Pairwise alignment • Several pairwise alignments • Pairwise alignment • Consensus • Consensus building • Consensus
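The slides do not spell out how the consensus is computed, so the following is only a minimal sketch of one plausible reading: each program contributes a pairwise target-template alignment, and for every target residue the most frequently proposed template partner is kept, together with the number of programs that agree on it. The function names, the gapped-string input format and the toy sequences are assumptions made for illustration, not the author's implementation.

```python
from collections import Counter


def pairs_from_alignment(target_row, template_row):
    """Map each target residue index to the template residue index it is
    aligned with (None when the target residue faces a gap)."""
    if len(target_row) != len(template_row):
        raise ValueError("aligned rows must have the same length")
    mapping = {}
    t_idx = s_idx = 0  # running residue counters (target, template)
    for t_char, s_char in zip(target_row, template_row):
        if t_char != "-":
            mapping[t_idx] = s_idx if s_char != "-" else None
            t_idx += 1
        if s_char != "-":
            s_idx += 1
    return mapping


def consensus(alignments):
    """For every target residue, return (partner, score): the most frequent
    template partner and the number of alignments proposing it.
    Hypothetical voting rule, assumed for this sketch."""
    mappings = [pairs_from_alignment(t, s) for t, s in alignments]
    n_residues = len(mappings[0])
    result = []
    for i in range(n_residues):
        votes = Counter(m[i] for m in mappings)
        partner, score = votes.most_common(1)[0]
        result.append((partner, score))
    return result


# Toy input: three programs aligning target "ACDEF" to template "ACDF".
example = [
    ("ACDEF", "ACD-F"),
    ("ACDEF", "ACD-F"),
    ("ACDEF", "AC-DF"),
]
result = consensus(example)
print(result)
```

In this toy case the three programs agree on the first, second and last target residues (score 3) and split two to one on the two middle residues, which is the kind of per-position agreement a score row could report.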

  11. 1) Alignment and modeling Databank searching PSI-BLAST pairwise alignment PSI-BLAST multiple alignments (12 alignments) Multiple alignments (8 alignments) 13 pairwise alignments Consensus 2 8 pairwise alignments Consensus 1 Model PSI-BLAST Model 2 Model 1

  12. 2) Comparison of models to the real structure • Global RMSD between model and structure after superposition • Local RMSD: percentage of well-predicted residues • The lower the distance, the closer the model is to the real structure. • A single wrongly modeled region can dramatically increase the global RMSD.
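The two measures on this slide can be sketched as follows. This is a minimal illustration that assumes the model and the real structure are already superposed and given as one coordinate triple per residue (for example the C-alpha atom). The 2.0 Å cutoff used to call a residue "well predicted" is an assumption; the slides do not give the threshold actually used.

```python
import math


def global_rmsd(model, reference):
    """Root-mean-square deviation over paired residues. Both coordinate
    lists are assumed to be superposed and in the same residue order."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(model, reference))
    return math.sqrt(total / len(model))


def percent_well_predicted(model, reference, cutoff=2.0):
    """Percentage of residues lying within `cutoff` angstroms of their
    counterpart. The cutoff value is an assumption of this sketch."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    close = sum(1 for a, b in zip(model, reference) if math.dist(a, b) <= cutoff)
    return 100.0 * close / len(model)


# Toy data: three residues placed exactly, one residue 10 angstroms off.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 10.0)]

rmsd = global_rmsd(model, reference)
well = percent_well_predicted(model, reference)
print(rmsd, well)
```

The toy data illustrates the last bullet: a single misplaced residue pushes the global RMSD to 5 Å even though 75% of the residues are placed exactly, which is why a local measure is reported alongside the global one.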

  13. Figure: real structure, PSI-BLAST model and model 2 for 3pte (D-alanyl-D-alanine carboxypeptidase from Streptomyces sp. R61)

  14. Results • 9 proteins have been modelled. • We can distinguish: • 3 proteins of the midnight zone (<20% id.) • 6 proteins of the twilight zone (20-30%)
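As a rough illustration of how a target-template pair could be assigned to one of these zones, the sketch below computes the percentage of identity from two aligned rows and applies the thresholds quoted in the slides (below 20%: midnight zone; 20 to 30%: twilight zone). The denominator (columns where neither row has a gap) and the treatment of the exact 30% boundary are assumptions; the ALIGN program mentioned earlier may count identities differently.

```python
def percent_identity(row_a, row_b):
    """Percent identity over columns where neither row has a gap.
    The choice of denominator is an assumption of this sketch."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    aligned = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    matches = sum(1 for a, b in aligned if a == b)
    return 100.0 * matches / len(aligned)


def zone(pid):
    """Classify a percent identity using the thresholds from the slides."""
    if pid < 20.0:
        return "midnight zone"
    if pid < 30.0:  # boundary handling at exactly 30% is an assumption
        return "twilight zone"
    return "homology modeling"


# Toy alignment: 8 gap-free columns, 2 identical residues.
pid = percent_identity("ACDEFGHIK", "ACQQ-QQQQ")
print(pid, zone(pid))
```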

  15. Comparison of models to the real structure: Midnight Zone proteins (<20% id.) • For all methods (models 1, 2 and PSI-BLAST), the results are very bad: most of the residues have been badly modeled. • Currently, no reliable alignment method exists below 20%. • Our method (models 1 and 2) cannot lower this threshold. Modeling of these 3 proteins confirms the limits of alignment methods below 20%.

  16. Twilight Zone proteins (20-30% id.) • Global and local RMSD: the most accurate models (4/6 and 5/6) come from our method (consensus 1 and 2). • In general, model 2 gives better results than model 1 and the PSI-BLAST model. • It is better to use many alignment programs. Models built with our methodology seem to be better than PSI-BLAST models.

  17. Comparison to CASP (Critical Assessment of techniques for protein Structure Prediction) • Modeling of proteins whose structure is unknown to the entrants (revealed after the competition) • Comparison to the real structure (global RMSD) • The best CASP models are taken as reference.

  18. Conclusions • Our method reaches its limits below 20% identity. • Our alignment method appears to be better than PSI-BLAST (above 20% id.). • Our results are comparable to the best CASP performances (cf. graph). Consensus for sequence alignment has a future for homology modeling of Twilight Zone proteins.

  19. Perspectives (1) • Test our approach on a large set of proteins • improve our method: • giving more weight to better alignment programs • increasing the number of alignment programs • using several templates • using SSP and fold recognition
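The first improvement listed, giving more weight to better alignment programs, could look like the sketch below for a single target residue. Each program proposes a template partner, and the proposals are summed with per-program weights instead of being counted equally. The program names and weight values are hypothetical placeholders; the slides do not say how weights would be chosen.

```python
from collections import defaultdict


def weighted_vote(proposals, weights):
    """Pick the template partner with the largest summed weight.
    `proposals` maps program name -> proposed partner index;
    `weights` maps program name -> weight (default 1.0 when missing)."""
    totals = defaultdict(float)
    for program, partner in proposals.items():
        totals[partner] += weights.get(program, 1.0)
    best = max(totals, key=totals.get)
    return best, totals[best]


# Hypothetical programs: a plain majority vote would pick partner 42,
# but the more trusted program outweighs the other two.
proposals = {"prog_a": 41, "prog_b": 42, "prog_c": 42}
weights = {"prog_a": 3.0, "prog_b": 1.0, "prog_c": 1.0}

choice = weighted_vote(proposals, weights)
print(choice)
```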

  20. Perspectives (2) • Evaluate the confidence of regions predicted by many programs • Take part in the CASP competition • Automate: expert system (PhD thesis)

  21. 1d2f aligned with consensus 1 and consensus 2, with per-position scores

      61
      1d2f       MHGVFGYSRW KNDE-FLAAI AHWFSTQHYT AIDSQTVVYG PSVIYMVSEL IRQWSETGEG
      consensus1 AQGKTKYAPP AGIPELREAL AEKFRRENGL SVTEEETIVT VGGKQALFNL FQAILDPGDE
      score1     4444444464 4444466666 6666666666 4444446666 6666666666 6688888888
      consensus2 AQGKTKYAPP AGIPELREAL AEKFRRENGL SVTPEETIVT VGGKQALFNL FQAILDPGDE
      score2     5555555555 5434354455 4445444444 4443356666 6666666666 6688888888

      121
      1d2f       VVIHTPAYDA FYKAIEGNQR TVMPVALEKQ ADGWFCDMGK LEAVLAKPEC KIMLLCSPQN
      consensus1 VIVLSPYWVS YPEMVRFAGG VVVEVETL-- ---------- --R----R-T KALVVNSPNN
      score1     8888888888 8888888666 66686664-- ---------- --4----4-4 4888888888
      consensus2 VIVLSPYWVS YPEMVRFAGG VVVEVETL-P EEGFVPD-PE RVRRAITPRT KALVVNSPNN
      score2     8888888888 8888877777 77777553-2 1222222-33 3333444445 5888888888

      181
      1d2f       PTGKVWTCDE LEIMADLCER HGVRVISDEI HMDMVWGEQP HIPWSNVARG DWALLTSGSK
      consensus1 PTGAVYPKEV LEALARLAVE HDFYLVSDEI YEHLLYEG-E HFSPGRVAPE HTLTVNGAAK
      score1     8888888888 8888888888 8888888888 8888888824 4666444466 4446446668
      consensus2 PTGAVYPKEV LEALARLAVE HDFYLVSDEI YEHLLYEGEH FSPGRVA-PE HTLTVNGAAK
      score2     8888888888 8888888888 8888888888 8888888833 4444443-44 4445556668

  22. 1nec

  23. 1d2f model 2
