
Protein Secondary Structure Prediction


omar

Presentation Transcript


  1. Protein secondary structure prediction • The problem: Seq:  RPLQGLVLDTQLYGFPGAFDDWERFMRE Pred: CCCCCHHHHHCCCCEEEECCHHHHHHCC • Why secondary structure prediction?

  2. Some historical landmarks • 1st generation – 1970s (~50-60% accuracy) • single-residue statistics, explicit rules • Chou & Fasman 1974, GOR1 1978 • 2nd generation – 1980s (~60-70% accuracy) • single-residue statistics, nearest neighbors, neural networks (more use of local interactions) • GOR3 1987, Levin et al. 1986, Qian & Sejnowski 1988, Holley & Karplus 1989 • 3rd generation – 1990s (~78% accuracy) • neural networks with homologous sequence information • PHD 1993, PSIPRED 1999, SSPRO 2000
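The slide does not name the accuracy metric. Figures like these are usually per-residue three-state accuracy (often called Q3), so that is assumed here. A minimal sketch of that measure follows; the helper name `q3_accuracy` and the `observed` string are made up for illustration, and only `predicted` comes from slide 1.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


predicted = "CCCCCHHHHHCCCCEEEECCHHHHHHCC"  # Pred line from slide 1
observed = "CCCCHHHHHHCCCCEEEECCCHHHHHCC"   # hypothetical reference, illustration only
accuracy = q3_accuracy(predicted, observed)
```

A value of 0.78 from this function would correspond to the "~78% accuracy" quoted for third-generation methods, under the Q3 assumption.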

  3. Chou-Fasman method • Straight statistical approach • Conformational propensity, e.g. helical propensity • Categorize each amino acid • e.g. helix former, helix breaker, helix indifferent • Find nucleation sites • short segments with a high concentration of one category • Extend the nucleation sites until a threshold is reached • Handle overlaps
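The nucleate-and-extend steps above can be sketched roughly as follows. This is a simplified illustration, not the published algorithm: the window sizes and threshold are assumed parameters, only helices are handled, overlaps between helix and strand calls are not resolved, and the `toy_propensity` values are placeholders that should be replaced by the published conformational parameters.

```python
def _mean(values):
    return sum(values) / len(values)


def find_helix_regions(seq, propensity, window=6, min_formers=4,
                       extend_window=4, threshold=1.0):
    """Return a string of 'H' (helix) and 'C' (other) for seq."""
    n = len(seq)
    # Unknown residues are treated as indifferent (propensity 1.0).
    scores = [propensity.get(aa, 1.0) for aa in seq]
    in_helix = [False] * n
    for start in range(n - window + 1):
        # Nucleation: enough helix formers inside a short window.
        formers = sum(1 for s in scores[start:start + window] if s > threshold)
        if formers < min_formers:
            continue
        left, right = start, start + window  # half-open interval [left, right)
        # Extend right while the average of the trailing window stays high.
        while right < n and _mean(scores[right - extend_window + 1:right + 1]) >= threshold:
            right += 1
        # Extend left in the same way.
        while left > 0 and _mean(scores[left - 1:left - 1 + extend_window]) >= threshold:
            left -= 1
        for i in range(left, right):
            in_helix[i] = True
    return "".join("H" if h else "C" for h in in_helix)


# Placeholder values for illustration only, NOT the published table.
toy_propensity = {"A": 1.4, "G": 0.5}
prediction = find_helix_regions("GGGGAAAAAAGGGG", toy_propensity)
```

Because the extension step averages over a window, a helix call can spill over onto a residue or two of low propensity at each end, which is one reason the method needs a separate overlap-handling step.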

  4. Chou-Fasman method Conformational parameters (Table from Krane and Raymer’s book) • What is the drawback of the method?

  5. Introduction to neural networks • A self-learning system that uses a training data set • A perceptron • An analogy: an apple and orange sorter • Threshold unit: classifies a vector of inputs • Weights: how do we get them?
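A threshold unit of the kind described above can be sketched in a few lines. The feature names, weights and threshold below are hand-picked, hypothetical values for the apple/orange analogy; how to learn them from data is the question the slide raises.

```python
def threshold_unit(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation >= threshold else 0


# Hypothetical features: (redness, surface roughness), both scaled to 0..1.
weights = [1.0, -1.0]   # hand-picked for illustration, not learned
threshold = 0.0
apple = (0.9, 0.1)
orange = (0.3, 0.8)
apple_label = threshold_unit(apple, weights, threshold)    # 1 means "apple"
orange_label = threshold_unit(orange, weights, threshold)  # 0 means "orange"
```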

  6. Basics of neural networks (1 unit only) • Modify the threshold unit slightly • Step function vs. continuous threshold function (a) • The problem of choosing weights • Do not fit the examples exactly; minimize an error function instead

  7. Basics in neural network (1 unit only) • Squared error function E(w) • Minimize error E(w) - using gradient descent method • Weight update in each step • Learning rate 
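The formulas on this slide are not in the transcript, so the sketch below assumes the standard choices: a sigmoid as the continuous threshold function, squared error E(w) = 1/2 Σ (y − t)², and the update w ← w − η ∂E/∂w with learning rate η. Function names and the AND training set are illustrative.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def unit_output(w, x):
    """Output of one continuous threshold unit; the last weight acts as a bias."""
    xb = list(x) + [1.0]
    return sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))


def squared_error(w, examples):
    return 0.5 * sum((unit_output(w, x) - t) ** 2 for x, t in examples)


def train_unit(examples, n_inputs, learning_rate=0.5, epochs=5000):
    w = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        for x, t in examples:
            xb = list(x) + [1.0]
            y = unit_output(w, x)
            # For E = 1/2 (y - t)^2 with a sigmoid, dE/dw_i = (y - t) * y * (1 - y) * x_i.
            delta = (y - t) * y * (1.0 - y)
            w = [wi - learning_rate * delta * xi for wi, xi in zip(w, xb)]
    return w


# Toy training set: logical AND of two inputs.
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
initial_error = squared_error([0.0, 0.0, 0.0], and_examples)
trained = train_unit(and_examples, n_inputs=2)
final_error = squared_error(trained, and_examples)
```

The unit never fits the targets exactly, since a sigmoid only approaches 0 and 1, which is the point of minimizing an error function rather than demanding an exact fit.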

  8. Basic neural network in secondary structure prediction (Figure from Kneller et al., JMB 1990) • Figure labels: inputs x1–x4, weights w11–w14, outputs y1–y3, errors E1–E3 • Formulas shown for activation a1, output y1 and error E1 (not preserved in the transcript)

  9. Multi-layer neural network • Complete neural network • a set of continuous threshold units interconnected in a topology • the output of some units is the input of other units • Figure layers, top to bottom: output units (z), hidden units (y), input units (x1–x4)
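A forward pass through the three layers named on this slide (inputs x, hidden units y, output units z) might look like the sketch below. The sigmoid activation, the layer sizes and the zero weights are assumptions chosen only to keep the example small.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def layer_forward(inputs, weights, biases):
    """Each row of weights feeds one continuous threshold unit."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def network_forward(x, hidden_w, hidden_b, output_w, output_b):
    y = layer_forward(x, hidden_w, hidden_b)   # hidden units
    z = layer_forward(y, output_w, output_b)   # output units, fed by hidden outputs
    return y, z


# 4 inputs, 2 hidden units, 3 outputs (e.g. helix, strand, coil); weights are placeholders.
x = [1.0, 0.0, 0.0, 1.0]
hidden_w = [[0.0] * 4, [0.0] * 4]
hidden_b = [0.0, 0.0]
output_w = [[0.0] * 2, [0.0] * 2, [0.0] * 2]
output_b = [0.0, 0.0, 0.0]
y, z = network_forward(x, hidden_w, hidden_b, output_w, output_b)
```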

  10. PHD method (Rost B. & Sander C, JMB 1993) • Use profile of multiple sequence alignment • Multiple layers • Accuracy >70%
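To illustrate what "profile of multiple sequence alignment" means as network input, here is a rough sketch that turns an alignment into per-column amino acid frequencies and then cuts a window of columns around one residue. The function names, the toy alignment and the window width are assumptions, and the real PHD input encoding has more detail than this.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment):
    """Per-column frequencies of the 20 amino acids; gaps and other symbols are ignored."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in alignment if row[col] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append([counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS])
    return profile


def window_input(profile, center, half_width=6):
    """Concatenate profile columns around `center`; positions off the ends become zeros."""
    zeros = [0.0] * len(AMINO_ACIDS)
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        features.extend(profile[pos] if 0 <= pos < len(profile) else zeros)
    return features


toy_alignment = ["ACD", "ACE", "GCD"]   # made-up alignment for illustration
profile = build_profile(toy_alignment)
features = window_input(profile, center=0, half_width=1)
```

Compared with a one-hot encoding of a single sequence, each column now carries information about which substitutions evolution tolerates at that position.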

  11. Protein Folding Problem • A protein folds into a unique 3D structure under physiological conditions • What is the protein folding problem? • 3D structure is key to understanding functional mechanisms • Rational drug design • 3D structure prediction

  12. Protein Folding Problem • Hard? • Can it be done? • Sampling conformational space • Secondary structures offer simplicity • Side chains fill the space • May not be a random search • Free energy (ΔG) = interaction energy – entropic energy
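A back-of-the-envelope calculation shows why sampling conformational space is hard and why folding is unlikely to be a random search. The numbers are assumptions, not from the slide: each residue is given a small fixed number of states, and the sampling rate is an arbitrary round figure.

```python
def conformation_count(n_residues, states_per_residue=3):
    """Size of the search space if every residue independently takes one of a few states."""
    return states_per_residue ** n_residues


def exhaustive_search_years(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Time to visit every conformation once at an assumed sampling rate."""
    seconds = conformation_count(n_residues, states_per_residue) / samples_per_second
    return seconds / (3600 * 24 * 365)


count_100 = conformation_count(100)       # space for a 100-residue chain
years_100 = exhaustive_search_years(100)  # far longer than any realistic timescale
```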

  13. Protein Folding Problem • Experimental findings • A protein does not start folding from the end • Secondary structures seem to form early • Hydrophobic amino acids in the core • Hydrophilic amino acids on the surface • Energy function approximation • Physics based (bond length, bond angle, pair interactions) • Statistics based
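The physics-based terms listed above (bond length, bond angle, pair interactions) are commonly approximated with simple functional forms. The sketch below assumes harmonic bond and angle terms and a Lennard-Jones 12-6 pair term; these forms and all parameter names are illustrative choices, not something the slide specifies.

```python
def bond_energy(length, ideal_length, k_bond):
    """Harmonic penalty for deviating from the ideal bond length."""
    return 0.5 * k_bond * (length - ideal_length) ** 2


def angle_energy(angle, ideal_angle, k_angle):
    """Harmonic penalty for deviating from the ideal bond angle (radians)."""
    return 0.5 * k_angle * (angle - ideal_angle) ** 2


def pair_energy(distance, epsilon, sigma):
    """Lennard-Jones 12-6 term: repulsive at short range, weakly attractive further out."""
    ratio6 = (sigma / distance) ** 6
    return 4.0 * epsilon * (ratio6 ** 2 - ratio6)


def total_energy(bonds, angles, pairs):
    """Each argument is a list of tuples matching the term functions above."""
    return (sum(bond_energy(*b) for b in bonds)
            + sum(angle_energy(*a) for a in angles)
            + sum(pair_energy(*p) for p in pairs))


# Hypothetical values for a tiny system, for illustration only.
energy = total_energy(
    bonds=[(1.6, 1.5, 100.0)],
    angles=[(2.0, 1.9, 50.0)],
    pairs=[(4.0, 0.2, 3.5)],
)
```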

  14. Scope of the problem • The majority of newly solved protein structures share some level of similarity with a known structure • Certain families of proteins have few or no solved structures • Human genes: ~20k • Structural genomics initiative

  15. Protein structure prediction • Comparative modeling • >30% sequence identity • Fold recognition – formerly known as threading • twilight zone, <25% sequence identity • Ab initio • new fold
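The identity thresholds above can be turned into a small decision helper. This is a sketch under assumptions: identity is computed over aligned, non-gap positions (other definitions exist), the 25-30% range that the slide leaves open is reported as "borderline", and the function cannot by itself detect the new-fold case that calls for ab initio methods.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def suggest_approach(identity):
    """Map identity to the best known template onto the categories from the slide."""
    if identity > 30.0:
        return "comparative modeling"
    if identity < 25.0:
        return "fold recognition"
    return "borderline"


identity = percent_identity("ACDE", "ACDF")   # toy alignment
approach = suggest_approach(identity)
```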

  16. CASP (Critical Assessment of protein Structure Prediction) • Figure: predicted structures are compared with the experimentally solved structure and ranked • e.g. Skolnick (2003) Proteins 53: 469-79 • Ginalski (2003) Proteins 53: 410-17 • Zhang, Y. (2007) “Template-based modeling and free modeling by I-TASSER in CASP7”, Proteins 69(S8): 108-17
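The compare-and-rank idea can be illustrated with root-mean-square deviation (RMSD) between matched atoms. This is a simplification chosen for brevity: the coordinates are assumed to be already superimposed, and the scoring measures actually used in CASP are more elaborate. All names and coordinates below are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points, already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def rank_predictions(experimental, predictions):
    """Return prediction names sorted from closest to farthest from the experiment."""
    return sorted(predictions, key=lambda name: rmsd(experimental, predictions[name]))


experimental = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
predictions = {
    "model_far": [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0), (2.0, 3.0, 0.0)],
    "model_near": [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)],
}
ranking = rank_predictions(experimental, predictions)
```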

  17. Comparative Modeling (http://www.salilab.org/~andras/watanabe/main.html) • Workflow: search for structures → select templates → align target sequence with structures → build model → evaluate model • Sequence identity vs. structure overlap (Fig.)

  18. Comparative Modeling • Search for structures: • pairwise sequence alignment against a database • multiple sequence alignment -> profile • fold assignment / threading – use structure information in the comparison • Select template: • sequence similarity, evolutionary relationship, environment, resolution • Sequence alignment (target and template) • standard methods, with tuning

  19. Ab Initio Prediction • Challenge: • Search space • Energy function • Reduction in search space • use a lattice • use simplified amino acids • use building blocks available in nature • Energy function: • physics • statistics – empirical

  20. Ab initio 3D structure prediction • An example: ROSETTA • Simons KT, Kooperberg C, Huang E, Baker D; J Mol Biol. (1997) 268, 209-225 • Schonbrun J, Wedemeyer W, Baker D; Current Opinion in Structural Biology (2002) 12:348-54 • ROSETTA: • narrows the search by using available local structure • statistics-based energy function • one of the top few ab initio methods in CASP4

  21. ROSETTA – segment matching • Observations: • analysis of 9-a.a. segments in the structure database • distribution of the conformations of 9-mers • Main idea of the method: • build a segment conformational library (fragment library for 3-mers and 9-mers) • put the pieces together better (energy function and search space)
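The fragment-library idea can be sketched as a lookup from short sequence segments to the local conformations observed for them. This is only an illustration and not ROSETTA's implementation: conformations are represented as hypothetical (phi, psi) pairs, and fragments are looked up by exact sequence match, whereas a real method would select fragments by a similarity score.

```python
from collections import defaultdict


def build_fragment_library(structures, size=9):
    """Map every `size`-residue segment to the torsion-angle runs observed for it.

    `structures` is a list of (sequence, torsions) pairs, where torsions holds
    one (phi, psi) tuple per residue.
    """
    library = defaultdict(list)
    for sequence, torsions in structures:
        for start in range(len(sequence) - size + 1):
            library[sequence[start:start + size]].append(torsions[start:start + size])
    return library


def insert_fragment(model_torsions, start, fragment):
    """Return a copy of the model with one fragment's torsions written in at `start`."""
    updated = list(model_torsions)
    updated[start:start + len(fragment)] = fragment
    return updated


# Made-up "database" with a single structure, for illustration only.
toy_torsions = [(-60.0 - i, -45.0 + i) for i in range(10)]
library = build_fragment_library([("ACDEFGHIKL", toy_torsions)], size=9)
extended_chain = [(180.0, 180.0)] * 12
model = insert_fragment(extended_chain, 2, library["ACDEFGHIK"][0])
```

Assembling a full model would then repeat such insertions, keeping or rejecting each one according to the energy function, which is where the "put the pieces together" step comes in.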

  22. Model Building • Assembly of rigid bodies • dissect the structure into core, loops and side chains • Satisfy spatial constraints (Fig.) • derive spatial constraints, then find a structure that optimizes all of them • spatial constraints are generated from: • the input alignment • general spatial preferences found in known structures • a molecular force field
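One way to picture "find a structure that optimizes all the constraints" is as minimizing a violation score over candidate structures. The sketch below covers only the scoring part, assumes every constraint is a target distance between two atoms with a weight, and uses made-up coordinates; real restraint sets include other kinds of terms.

```python
import math


def restraint_violation(coords, restraints):
    """Weighted sum of squared deviations from the target distances.

    `restraints` is a list of (i, j, target_distance, weight) tuples that index
    into `coords`, a list of (x, y, z) points.
    """
    total = 0.0
    for i, j, target, weight in restraints:
        distance = math.dist(coords[i], coords[j])
        total += weight * (distance - target) ** 2
    return total


# Two hypothetical candidate models scored against the same restraints.
restraints = [(0, 1, 5.0, 1.0), (1, 2, 3.0, 2.0)]
model_a = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 3.0)]
model_b = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 4.0)]
score_a = restraint_violation(model_a, restraints)
score_b = restraint_violation(model_b, restraints)
best = "model_a" if score_a <= score_b else "model_b"
```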
