Comparative Modelling Threading

Universitat Pompeu Fabra. Baldomero Oliva Miguel. Comparative Modelling Threading. EVOLUTION. Differences (proteins). Genetic information (DNA). DNA sequence 3D. U.S. Genome Project (1990). Identify human genes.

Presentation Transcript


  1. Universitat Pompeu Fabra. Baldomero Oliva Miguel. Comparative Modelling Threading

  2. EVOLUTION

  3. Differences (proteins). Genetic information (DNA)

  4. DNA sequence 3D

  5. U.S. Genome Project (1990): identify human genes; store the information in data banks; development of tools for analysis. 1998: DOE, The Sanger Centre, Venter, Patrinos & Collins, NIH, NHGRI

  6. Increase of sequences: 10,500,000 GenBank (DNA); 378,152 TrEMBL; 92,211 Swiss-Prot

  7. Public Database Holdings:

  8. Sequence to 3D: 3D known (20%); Unknown function (20%); Homologs (30%); No homology (20%); Remotes (10%)

  9. Knowing the structure helps to: • KNOW the function: annotation • IMPROVE the function: genetic engineering • INHIBIT the function: drugs

  10. Structure Factory: express & purify, crystallize, X-ray analysis, structure

  11. No information (3D). Enough information (3D). Objective. Not accurate.

  12. How does modelling help? Structural Genomics X-ray

  13. Can we predict protein structures from genome sequences?

  14. Study of known structures. Assigning structure to sequences of unknown structure.

  15. Characterize the conformation: sequence (Met Val Leu Tyr Asp Ser Thr ...), secondary, super-secondary, domain, tertiary, quaternary, ligand docking

  16. Myoglobin, Carboxypeptidase, Triose Phosphate Isomerase, Ribonuclease, Glucanase

  17. The number of different protein folds is limited: Known Folds New Folds

  18. The number of different protein folds is limited: [ last update: Oct 2001 ]

  19. Nº sequences > Nº folds. Tertiary structure. Classification

  20. Protein Structure Classification. CATH - Protein Structure Classification [ http://www.biochem.ucl.ac.uk/bsm/cath_new/ ] • UCL, Janet Thornton & Christine Orengo • Class (C), Architecture (A), Topology (T), Homologous superfamily (H). SCOP - Structural Classification of Proteins [ http://scop.mrc-lmb.cam.ac.uk/scop/ ] • MRC Cambridge (UK), Alexey Murzin, Brenner S. E., Hubbard T., Chothia C. • created by manual inspection • comprehensive description of the structural and evolutionary relationships

  21. • Class (C), derived from secondary structure content, is assigned automatically • Architecture (A) describes the gross orientation of secondary structures, independent of connectivity • Topology (T) clusters structures according to their topological connections and numbers of secondary structures • Homologous superfamily (H)

  22. Enterotoxin compared with: Homolog: Cholera toxin (80% Id); Remote: TSS toxin (8.8% Id); Analog: Aminoacyl tRNA synthetase (4.4% Id). SCOP levels: • Family • Superfamily • Fold • Class

  23. Can we predict protein structures? MNIFEMLRID EGLRLKIYKD TEGYYTIGIG HLLTKSPSLN AAKSELDKAI GRNCNGVITK DEAEKLFNQD VDAAVRGILR NAKLKPVYDS LDAVRRCALI NMVFQMGETG VAGFTNSLRM LQQKRWDEAA VNLAKSRWYN QTPNRAKRVI TTFRTGTWDA YKNL

  24. Homology modeling. Idea: extrapolation of the structure for a new (target) sequence from the known 3D structures of related family members (templates).

  25. Nº sequences > Nº folds. Alignments. Step 1: selection of templates. Step 2: alignment with templates.

  26. [Plot: percentage sequence identity/similarity (0 to 100) versus number of residues aligned (0 to 250). Above the curve: "Sequence identity implies structural similarity". Below the curve: "Don't know" region. (B. Rost, Columbia, New York)] Sequence similarity implies structural similarity?
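The two axes of the plot on slide 26 (percentage identity and number of residues aligned) can both be read off a pairwise alignment. The following is a minimal sketch, assuming gaps are written as `-` and that gapped columns do not count as aligned; the function name `percent_identity` and the second toy sequence are invented for illustration and are not from the slides.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> tuple[float, int]:
    """Return (percent identity, number of aligned residue pairs).

    Columns where either sequence has a gap ('-') are skipped, so the
    percentage is taken over aligned residue pairs only (an assumption;
    other conventions divide by the shorter sequence length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        return 0.0, 0
    return 100.0 * identical / aligned, aligned


# Toy alignment: the second sequence is made up for this example.
pid, n_aligned = percent_identity("MNIFEMLRID-EGL", "MNLFEMLKIDAEGL")
print(f"{pid:.1f}% identity over {n_aligned} aligned residues")
```

Both numbers matter: the same percentage identity is much weaker evidence of structural similarity over a short alignment than over a long one, which is why the plot uses alignment length as its horizontal axis.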

  27. Nº sequences > Nº folds. 2D structure: hydropathy plot, 2D prediction

  28. 2D prediction (PHD)

  29. Nº sequences > Nº folds. 2D structure: hydropathy plot, 2D prediction. 3D structure: alignments; classification (homologs, remotes, analogs)

  30. Comparative Modelling (homology) • Fold is more conserved than sequence. • Secondary structure is the most conserved structure. • Loops have the highest structural variability.

  31. Nº sequences > Nº folds. 2D structure: hydropathy plot, 2D prediction. 3D structure: alignments; classification (homologs, remotes, analogs); comparative modelling (Composer, Modeller). SCR (structurally conserved regions) & VR (variable regions): SCR; VR (ring closure, DB search, classification)

  32. Nº sequences > Nº folds. 2D structure: hydropathy plot, 2D prediction. 3D structure: alignments; classification (homologs, remotes, analogs); comparative modelling (Composer, Modeller). SCR & VR: SCR; VR (ring closure, DB search, classification); MODEL. Step 3: model building

  33. Rigid-body Assembly; Spatial Restraints; Fragment Extraction; Satisfaction of Spatial Restraints; Loop Construction; Optimisation Methods

  34. Rigid-body assembly. a) Build the conserved core framework • average the core template backbone atoms (weighted by local sequence similarity to the target sequence) • leave non-conserved regions (loops) for later
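The weighted averaging of core backbone atoms described on slide 34 can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any particular program: the templates are assumed to be already superposed and restricted to equivalent core positions, and the per-position weights (standing in for local sequence similarity to the target) are supplied by the caller. The names `average_core`, `templates` and `weights` are hypothetical.

```python
Coord = tuple[float, float, float]


def average_core(templates: list[list[Coord]],
                 weights: list[list[float]]) -> list[Coord]:
    """Weighted average of equivalent atoms over superposed templates.

    templates[t][k] is the coordinate of core atom k in template t.
    weights[t][k] is the weight of template t at that position, e.g. a
    local sequence-similarity score with the target (assumed non-negative).
    """
    n_templates = len(templates)
    n_positions = len(templates[0])
    framework: list[Coord] = []
    for k in range(n_positions):
        total_w = sum(weights[t][k] for t in range(n_templates))
        if total_w <= 0:
            raise ValueError(f"no template has weight at position {k}")
        framework.append(tuple(
            sum(weights[t][k] * templates[t][k][axis]
                for t in range(n_templates)) / total_w
            for axis in range(3)
        ))
    return framework


# Two toy templates with two core atoms each (coordinates in angstroms).
template_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
template_2 = [(1.0, 0.0, 0.0), (3.8, 2.0, 0.0)]
core = average_core([template_1, template_2], [[3.0, 1.0], [1.0, 1.0]])
print(core)
```

At the first position template 1 is weighted three times as heavily as template 2, so the averaged atom lies closer to template 1; at the second position the two templates contribute equally.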

  35. Rigid-body assembly. b) Loop modeling • use the “spare part” algorithm to find compatible fragments in a loop database • “ab initio” rebuilding of loops (Monte Carlo, molecular dynamics, genetic algorithms, etc.)
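A rough sketch of the "spare part" idea from slide 35: search a loop database for fragments that have the required number of residues and whose end-to-end distance matches the gap between the two anchor residues of the core. The `Fragment` class, the toy database and the 1 Å tolerance are assumptions made for this example; real implementations also compare anchor geometry and sequence, which is omitted here.

```python
import math
from dataclasses import dataclass


@dataclass
class Fragment:
    name: str
    ca_coords: list  # C-alpha coordinates (x, y, z), anchors included


def end_to_end(coords) -> float:
    """Distance between the first and last C-alpha of a fragment."""
    return math.dist(coords[0], coords[-1])


def find_spare_parts(database, n_residues, anchor_distance, tolerance=1.0):
    """Names of fragments that could bridge the gap, best fit first."""
    hits = []
    for frag in database:
        if len(frag.ca_coords) != n_residues:
            continue
        deviation = abs(end_to_end(frag.ca_coords) - anchor_distance)
        if deviation <= tolerance:
            hits.append((deviation, frag.name))
    return [name for _, name in sorted(hits)]


# Hypothetical loop database with made-up coordinates (angstroms).
loop_db = [
    Fragment("loopA", [(0, 0, 0), (3, 2, 0), (6, 2, 0), (9, 0, 0)]),
    Fragment("loopB", [(0, 0, 0), (2, 3, 0), (4, 3, 0), (5, 0, 0)]),
    Fragment("loopC", [(0, 0, 0), (3, 2, 0), (5, 3, 0), (7, 2, 0), (8.5, 0, 0)]),
]
print(find_spare_parts(loop_db, n_residues=4, anchor_distance=8.5))
```

Only `loopA` survives both filters here: `loopB` has the right length but spans 5 Å, and `loopC` spans 8.5 Å but has five residues.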

  36. EF-hand, calcium binding. Conformation: aa{baalal}bb. Motif sequence: Xh{DXDpDG}Xh. Conservation: D/N 100%, D/N 100%, D 100%

  37. P-loop, GTP binding. Conformation: bb{eppgag}aa. Motif sequence: hh{GhXXpG}Kp

  38. NAD(P)/FAD binding. Conformation: bb{eab}aa. Motif sequence: hh{GhG}hX
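The motif sequences on slides 36 to 38 can be turned into regular expressions for scanning a sequence. The sketch below assumes that `h` means a hydrophobic residue, `p` a polar residue, `X` any residue, and that other uppercase letters are literal amino acids, with the braces marking the loop itself. The slides do not list which residues belong to each class, so the two residue sets and the test sequence are assumptions made for illustration.

```python
import re

# Assumed residue classes (not specified in the slides).
HYDROPHOBIC = "AVLIMFWC"
POLAR = "STNQDEKRHY"


def motif_to_regex(motif: str) -> str:
    """Translate a motif such as 'hh{GhXXpG}Kp' into a regex string."""
    parts = []
    for symbol in motif:
        if symbol in "{}":          # braces only delimit the loop
            continue
        if symbol == "h":
            parts.append(f"[{HYDROPHOBIC}]")
        elif symbol == "p":
            parts.append(f"[{POLAR}]")
        elif symbol == "X":
            parts.append(".")
        else:                       # literal amino acid, e.g. G, K, D
            parts.append(re.escape(symbol))
    return "".join(parts)


p_loop = re.compile(motif_to_regex("hh{GhXXpG}Kp"))
toy_sequence = "MKLVGAGGSGKTAL"      # invented sequence, not a real protein
match = p_loop.search(toy_sequence)
print(match.group(), match.start())
```

A sequence match of this kind only suggests where a loop of the given conformation might occur; the slides pair each sequence motif with a conformation string for that reason.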

  39. Ring closure search Distance restraints

  40. Building CA Superposition

  41. Building CA Superposition

  42. Modeling by Satisfaction of Spatial Restraints. A. Sali & T. Blundell (1993) JMB 234, 779-815. [Example sequence: Q M T S A F G T A E]

  43. Modeling by Satisfaction of Spatial Restraints • Feature properties can be associated with • a protein (e.g. X-ray resolution) • residues (e.g. solvent accessibility) • pairs of residues (e.g. Cα-Cα distance) • other features (e.g. main chain classes). How can we derive modeling restraints from this data? A restraint is defined as a probability density function (pdf) $p(x)$: $P(x_1 \le x < x_2) = \int_{x_1}^{x_2} p(x)\,dx$, with $p(x) \ge 0$ and $\int p(x)\,dx = 1$.

  44. Modeling by Satisfaction of Spatial Restraints: combine basis pdfs into molecular probability density functions

  45. Modeling by Satisfaction of Spatial Restraints • Satisfaction of spatial restraints • Find the protein model with the highest probability • Variable target function: • start with, e.g., a random conformation and use only local restraints • minimize for some steps using conjugate gradient optimization • repeat, introducing more and more long-range restraints, until all restraints are used
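The variable target function schedule on slide 45 can be illustrated with a deliberately simplified toy. The assumptions are substantial: points live on a line rather than in 3D, every restraint is a Gaussian pdf on the signed separation between two points, the restraints are treated as independent so that the most probable model minimizes the sum of their negative logarithms, and plain gradient descent stands in for conjugate gradients. All names and parameter values are hypothetical.

```python
import random


def optimise_with_variable_target(n_points, restraints, sigma=1.0,
                                  steps_per_stage=300, step_size=0.1, seed=0):
    """Minimise sum((x[j] - x[i] - d)**2) / (2 * sigma**2) in stages.

    restraints is a list of (i, j, d): a Gaussian restraint with mean d
    on the separation x[j] - x[i]. Stage m uses only restraints with
    |j - i| <= m, so long-range restraints are introduced gradually.
    """
    rng = random.Random(seed)
    x = [rng.uniform(-10.0, 10.0) for _ in range(n_points)]  # random start
    for max_range in range(1, n_points):
        active = [r for r in restraints if abs(r[1] - r[0]) <= max_range]
        for _ in range(steps_per_stage):
            grad = [0.0] * n_points
            for i, j, d in active:
                g = (x[j] - x[i] - d) / sigma ** 2
                grad[j] += g
                grad[i] -= g
            # Gradient descent step (a stand-in for conjugate gradients).
            x = [xi - step_size * gi for xi, gi in zip(x, grad)]
    return x


# Toy restraints: consecutive points should sit 3.8 units apart.
n = 6
toy_restraints = [(i, j, 3.8 * (j - i)) for i in range(n) for j in range(i + 1, n)]
model = optimise_with_variable_target(n, toy_restraints)
worst = max(abs(model[j] - model[i] - d) for i, j, d in toy_restraints)
print(f"largest restraint violation: {worst:.2e}")
```

Starting with local restraints only lets the chain settle its neighbour-to-neighbour geometry before long-range restraints are switched on. In this convex toy problem the staging is not needed for convergence; in a real 3D model, where the objective has many local minima, that is the motivation for the schedule.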

  46. Threading. Idea: find the optimal structure for a new (target) sequence in the set of known 3D structures (templates) by threading the target sequence through them.

  47. Nº sequences > Nº folds. 3D structure: classification (homologs, remotes, analogs). Step 1: selection of templates. FOLD ASSIGNMENT

  48. Nº sequences > Nº folds. 3D structure: classification (homologs, remotes, analogs). Pseudo-potential

  49. Pseudo-potential principles: Boltzmann law; potential of mean force, applied to proteins. Use frequencies instead of probabilities; they are taken from non-homologous proteins in the PDB.
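The idea on slide 49 is usually written as an inverse Boltzmann relation, E(s) = -kT ln(f_obs(s) / f_ref(s)), where the observed frequency comes from known structures and the reference frequency describes what would be expected without any preference. The sketch below applies that relation to made-up counts; the state names, the counts and the choice of kT = 1 (reduced units) are assumptions for illustration, and every state is assumed to have a non-zero count in both tables.

```python
import math


def pseudo_potential(observed_counts, reference_counts, kT=1.0):
    """Pseudo-energy per state from observed and reference counts.

    E(state) = -kT * ln(f_obs / f_ref), with frequencies obtained by
    normalising the counts. Counts must be non-zero (no pseudocounts
    are added in this sketch).
    """
    obs_total = sum(observed_counts.values())
    ref_total = sum(reference_counts.values())
    energies = {}
    for state, n_obs in observed_counts.items():
        f_obs = n_obs / obs_total
        f_ref = reference_counts[state] / ref_total
        energies[state] = -kT * math.log(f_obs / f_ref)
    return energies


# Invented contact counts, not taken from the PDB.
observed = {"Leu-Leu": 300, "Leu-Lys": 100, "Lys-Lys": 100}
reference = {"Leu-Leu": 200, "Leu-Lys": 200, "Lys-Lys": 100}
for pair, energy in pseudo_potential(observed, reference).items():
    print(f"{pair}: {energy:+.3f} kT")
```

States seen more often than the reference predicts get a negative (favourable) pseudo-energy, and states seen less often get a positive one. Taking the counts from non-homologous proteins, as the slide notes, keeps closely related structures from being counted many times over.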

  50. Fold recognition / Threading. Principle: find a compatible fold for a given sequence. >Protein XY MSTLYEKLGGTTAVDLAVDKFYERVLQDDRIKHFFADVDMAKQRAHQKAFLTYAFGGTDKYDGRYMREAHKELVENHGLNGEHFDAVAEDLLATLKEMGVPEDLIAEVAAVAGAPAHKRDVLNQ • Using: • 1D-3D profile matching, • secondary structure predictions, • position-specific scoring matrices (PSSM), • mean force potentials, • keyword statistics, • ...
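A very reduced sketch of 1D-3D profile matching, the first technique listed on slide 50: each template fold is described as a string of structural environments, the target sequence is slid along it without gaps, and the fold with the best-scoring placement is reported. Everything specific here is an assumption made for illustration: the two environment classes (`b` buried, `e` exposed), the +1/-1 scoring rule, the hydrophobic residue set and the two-fold library. Real threading methods use richer environments, gapped alignment and statistically derived scores.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed residue class


def residue_score(residue: str, environment: str) -> float:
    """+1 if hydrophobic-in-buried or polar-in-exposed, else -1."""
    buried = environment == "b"
    return 1.0 if buried == (residue in HYDROPHOBIC) else -1.0


def thread(sequence: str, environments: str):
    """Best gapless placement of sequence on a template: (score, offset)."""
    best = None
    for offset in range(len(environments) - len(sequence) + 1):
        score = sum(residue_score(res, environments[offset + k])
                    for k, res in enumerate(sequence))
        if best is None or score > best[0]:
            best = (score, offset)
    return best  # None if the template is shorter than the sequence


def recognise_fold(sequence: str, library: dict):
    """Return (score, fold name, offset) of the best-scoring template."""
    results = []
    for name, environments in library.items():
        placement = thread(sequence, environments)
        if placement is not None:
            results.append((placement[0], name, placement[1]))
    return max(results) if results else None


# Hypothetical fold library: one environment letter per template position.
fold_library = {"foldA": "eebebebee", "foldB": "bbbeeeebb"}
print(recognise_fold("LKVEL", fold_library))
```

The toy sequence alternates hydrophobic and polar residues, so it fits best where the template alternates buried and exposed positions.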
