BCB 444/544 Lecture 24 • Protein Tertiary Structure Prediction #24_Oct17 BCB 444/544 F07 ISU Terribilini #24 - RNA Secondary Structure Prediction
Required Reading (before lecture) • Mon Oct 15 - Lecture 23: Protein Tertiary Structure Prediction • Chp 15 - pp 214 - 230 • Wed Oct 17 & Thurs Oct 18 - Lecture 24 & Lab 8 (Terribilini): RNA Structure/Function & RNA Structure Prediction • Chp 16 - pp 231 - 242 • Fri Oct 19 - Lecture 25: Gene Prediction • Chp 8 - pp 97 - 112
New Reading & Homework Assignment ALL: Homework #4 (emailed & posted online Sat AM) Due: Mon Oct 22 by 5 PM (not Fri Oct 19) Read: Ginalski et al. (2005) Practical Lessons from Protein Structure Prediction, Nucleic Acids Res. 33:1874-91. http://nar.oxfordjournals.org/cgi/content/full/33/6/1874 (PDF posted on website) • Although somewhat dated, this paper provides a nice overview of protein structure prediction methods and evaluation of predicted structures. • Your assignment is to write a summary of this paper - for details see HW#4 posted online & sent by email on Sat Oct 13
Seminars this Week BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html • Oct 18 Thur - BBMB Seminar 4:10 in 1414 MBB • Sachdev Sidhu (Genentech): Phage peptide and antibody libraries in protein engineering and ligand selection • Oct 19 Fri - BCB Faculty Seminar 2:10 in 102 ScI • Lyric Bartholomay (Ent, ISU): TBA
Chp 15 - Tertiary Structure Prediction SECTION V STRUCTURAL BIOINFORMATICS Xiong: Chp 15 Protein Tertiary Structure Prediction • Methods • Homology Modeling • Threading and Fold Recognition • Ab Initio Protein Structural Prediction • CASP
Tertiary Structure Prediction Methods 2 (or 3) Major Methods: • Comparative Modeling: • Homology Modeling (easiest!) • Threading and Fold Recognition (harder) • Ab Initio Protein Structural Prediction (really hard)
Steps in Threading (inputs: target sequence ALKKGF…HFDTSE and structure templates) • Align target sequence with template structures in fold library (usually from the PDB) • Calculate energy score to evaluate "goodness of fit" between target sequence & template structure • Rank models based on energy scores
A Local Example: Rapid Threading Approach for Protein Structure Prediction • Kai-Ming Ho, Physics • Haibo Cao • Yungok Ihm • Zhong Gao • James Morris • Cai-zhuang Wang • Drena Dobbs, GDCB • Jae-Hyung Lee • Michael Terribilini • Jeff Sander Cao H, Ihm Y, Wang CZ, Morris JR, Su M, Dobbs D, Ho KM (2004) Three-dimensional threading approach to protein structure recognition. Polymer 45:687-697
Simplify: Template structure representation • Template structure (residues i, j from 1 to N) is represented as a contact matrix • $C_{ij} = 1$ if $r_{ij} < 6.5$ Å (contact) • $C_{ij} = 0$ otherwise, or when i and j are neighbors in sequence (non-contact) Yungok Ihm
Simplify: Energy Function • Interaction “counts” only if two hydrophobic amino acid residues are in contact • At residue level, pair-wise hydrophobic interaction is dominant: $E = \sum_{i,j} C_{ij} U_{ij}$ • $C_{ij}$: contact matrix • $U_{ij} = U(\text{residue } i, \text{residue } j)$ • MJ: $U = U_{ij}$ • LTW: $U = Q_i Q_j$ • HP: $U \in \{1, 0\}$ Yungok Ihm
Energy calculation: Contact energy • Miyazawa-Jernigan (MJ) matrix: statistical potential, 210 parameters [matrix excerpt for residues C, M, F, I, L, V, W; values as extracted, layout lost: 046 054 -020 049 -001 006 057 001 003 -008 052 018 010 -001 -004] • Li-Tang-Wingreen (LTW): 20 parameters ~ solubility ~ hydrophobicity • Contact energy is computed from the contact matrix [formula lost in extraction] Yungok Ihm
Summary of Ho Threading Procedure • Template structure (residues 1 to N) → contact matrix: $C_{ij} = 1$ if $r_{ij} < 6.5$ Å; $C_{ij} = 0$ otherwise, or when i and j are neighbors in sequence • Sequence (e.g., AVFMRIHNDIVYNDIANTTQ) → sequence vector • Contact energy → scoring function Yungok Ihm
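The contact-matrix and contact-energy steps above can be sketched in a few lines. This is a minimal illustration only, not the Ho group's code: the function names, the hydrophobic residue set, the use of one coordinate per residue, and counting each pair once (i < j) are all assumptions made for the example. It uses the simplest HP scoring from the slides (U is 1 for a hydrophobic pair in contact, 0 otherwise).

```python
import math

# Assumption: an illustrative hydrophobic residue set, not taken from the slides.
HYDROPHOBIC = set("AVLIMFWC")

def contact_matrix(coords, cutoff=6.5):
    """C[i][j] = 1 if residues i and j are closer than cutoff (in Angstroms)
    and are not neighbors in sequence; 0 otherwise."""
    n = len(coords)
    contacts = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 2, n):  # start at i + 2 to skip sequence neighbors
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts[i][j] = contacts[j][i] = 1
    return contacts

def hp_contact_score(sequence, contacts):
    """Sum of C_ij * U_ij over pairs i < j, with the HP choice U in {1, 0}."""
    score = 0
    n = len(sequence)
    for i in range(n):
        for j in range(i + 1, n):
            both_hydrophobic = sequence[i] in HYDROPHOBIC and sequence[j] in HYDROPHOBIC
            if contacts[i][j] and both_hydrophobic:
                score += 1
    return score

# Toy "structure": four residues on the corners of a 3.8 Angstrom square.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
contacts = contact_matrix(coords)
print(hp_contact_score("AKLV", contacts))
```

Swapping the HP rule for an MJ lookup table or an LTW product Q_i * Q_j would only change how U is computed inside the inner loop.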
Can complexity be further reduced? Consider simplifying structure representation, too • Sequence (ALKKGF…HFDTSE) – Structure (1D – 3D problem) • Sequence – Contact Matrix (1D – 2D problem) • Sequence – 1D Profile (1D – 1D problem) Haibo Cao
Represent contact matrix by its dominant eigenvector (1D profile) • First eigenvector (with highest eigenvalue) dominates the overlap between sequence and structure • Higher ranking (rank > 4) eigenvectors are “sequence blind” Haibo Cao
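As a rough illustration of how a contact matrix can be reduced to a 1D profile, the sketch below extracts the dominant eigenvector by power iteration in plain Python. The function name, the iteration count, and the identity shift are assumptions for this example; the slides do not say how the eigenvector was actually computed, and a linear algebra library would normally be used instead.

```python
import math

def dominant_eigenvector(matrix, iterations=500):
    """Power iteration on (C + I) for a symmetric 0/1 contact matrix C.
    Adding the identity leaves the eigenvectors unchanged but keeps the
    iteration from oscillating when C has eigenvalues of equal magnitude
    and opposite sign."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        w = [v[i] + sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy contact matrix: contacts between residue pairs (0,2), (0,3) and (1,3).
toy = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]
profile = dominant_eigenvector(toy)
print([round(x, 3) for x in profile])
```

Residues with more contacts (here residues 0 and 3) receive larger profile values, which is what lets a 1D sequence vector be aligned against the profile.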
Threading Alignment Step - now fast! • Align target sequence vector (1D) with eigenvector profile of template structure (1D) • Maximize the overlap between the sequence (S) and the 1D profile (P), allowing gaps • Calculate contact energy using the alignment: Ec Cao et al. Polymer 45 (2004)
Parameters for alignment? [Figure: target sequence ALKKGFG…HFDTSE with loop and helix regions marked] • Gap penalty: Insertions/deletions in helices or strands are strongly penalized; smaller penalties for indels in loops • Gap penalties apply to alignment score only, not to energy calculation • Size penalty: If a target residue and aligned template residue differ in radius by > 0.5 Å and the residue is involved in > 2 contacts, alignment is penalized • Size penalties apply to alignment score only, not to energy calculation Yungok Ihm
How to incorporate secondary structure? • Predict secondary structure of target sequence (PSIPRED, PROF, JPRED, SAM, GOR V) • N+ = total number of matches between predicted secondary structure of target & actual secondary structure of template • N- = total number of mismatches • Ns = total number of residues selected in alignment • “Global fitness”: $f = 1 + (N_+ - N_-) / N_s$ • $E_{mod} = f \cdot E_{threading}$ Yungok Ihm
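The global fitness factor is simple enough to check by hand. The sketch below assumes the two secondary structure strings are already restricted to the aligned residues and that every aligned position counts as either a match or a mismatch (so N+ plus N- equals Ns); the function names and the H/E/C string encoding are illustrative, not from the slides.

```python
def global_fitness(predicted_ss, template_ss):
    """f = 1 + (N+ - N-) / Ns over the aligned positions."""
    if len(predicted_ss) != len(template_ss) or not predicted_ss:
        raise ValueError("need two non-empty strings of equal length")
    n_selected = len(predicted_ss)
    n_match = sum(p == t for p, t in zip(predicted_ss, template_ss))
    n_mismatch = n_selected - n_match  # assumption: no third category
    return 1 + (n_match - n_mismatch) / n_selected

def modified_energy(e_threading, predicted_ss, template_ss):
    """E_mod = f * E_threading."""
    return global_fitness(predicted_ss, template_ss) * e_threading

# 6 matches and 2 mismatches over 8 aligned residues: f = 1 + 4/8 = 1.5
print(global_fitness("HHHHCCEE", "HHHHCCCC"))
print(modified_energy(-10.0, "HHHHCCEE", "HHHHCCCC"))
```

With this form f ranges from 0 (all mismatches) to 2 (all matches), so a favorable (negative) threading energy is scaled up when the secondary structures agree.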
How much better is this “fit” than random? • $E_{relative} = E_{mod} - E_{shuffled}$ • Emod = E score modified to reflect fit with predicted secondary (2°) structure • Eshuffled (shuffled sequence vs structure) = average E score for the same sequence shuffled (randomized) many times Yungok Ihm
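A shuffled-sequence baseline of this kind can be sketched as follows. The scoring function is passed in as a plain callable, which is an assumption made to keep the example self-contained; in the procedure described on the slides it would be the secondary-structure-modified threading energy against one template. The names, the default of 100 shuffles, and the fixed seed are likewise illustrative.

```python
import random

def relative_energy(sequence, energy_fn, n_shuffles=100, seed=0):
    """E_relative = E(sequence) - mean E over shuffled copies of the sequence.
    energy_fn is any callable mapping a sequence string to a float score."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    residues = list(sequence)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        total += energy_fn("".join(residues))
    return energy_fn(sequence) - total / n_shuffles

# A score that depends only on composition is unchanged by shuffling,
# so its relative energy is zero: the baseline removes composition effects.
composition_only = lambda seq: float(seq.count("A"))
print(relative_energy("AAKLV", composition_only))
```

Because shuffling preserves amino acid composition, whatever remains in the relative energy reflects the order of residues along the template rather than how hydrophobic the sequence is overall.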
Performance Evaluation? "Blind Test": CASP5 Competition (CASP7 is most recent) • CASP = Critical Assessment of Protein Structure Prediction • Given: Amino acid sequence • Goal: Predict 3-D structure (before experimental results published)
Typical Results (well, actually, our BEST Results): HO = #1-Ranked CASP5 Prediction for this Target • Target 174 • PDB ID = 1MG7 [Figure: Actual Structure T174_1; Predicted Structure T174_2] Cao, Ihm, Wang, Dobbs, Ho
Overall Performance in CASP5 Contest: ~8th out of 180 (M. Levitt, Stanford)
FR = Fold Recognition (targets manually assessed by Nick Grishin)
Rank  Z-Score  Ngood  Npred  NgNW  NpNW  Group-name
1     24.26    9.00   12.00  9     12    Ginalski
2     21.64    7.00   12.00  7     12    Skolnick Kolinski
3     19.55    8.00   12.50  9     14    Baker
4     16.88    6.00   10.00  6     10    BIOINFO.PL
5     15.25    7.00   7.00   7     7     Shortle
6     14.56    6.50   11.50  7     13    BAKER-ROBETTA
7     13.49    4.00   11.00  4     11    Brooks
8     11.34    3.00   6.00   3     6     Ho-Kai-Ming
9     10.45    3.00   5.50   3     6     Jones-NewFold
• FR NgNW - number of good predictions without weighting for multiple models • FR NpNW - number of total predictions without weighting for multiple models
CASP - Check it out! Critical Assessment of Protein Structure Prediction http://predictioncenter.gc.ucdavis.edu/ • CASP7 contest - 2006: • http://www.predictioncenter.org/casp7/Casp7.html • Provides assessment of automated servers for protein structure prediction (LiveBench, CAFASP, EVA) & URLs for them • Related contests & resources: • Protein Function Prediction (part of CASP) • CAPRI = Critical Assessment of Predicted Interactions • New: CASPM = CASP for M = Mutant proteins • Predict effects of small (point) mutations, e.g., SNPs
Another Convenient List of Links for Protein Prediction Servers http://en.wikipedia.org/wiki/List_of_protein_structure_prediction_software
Chp 13 - Protein Structure Visualization, Comparison & Classification SECTION V STRUCTURAL BIOINFORMATICS Xiong: Chp 13 Protein Structure Visualization, Comparison & Classification • Protein Structural Visualization • Protein Structure Comparison • Protein Structure Classification
Protein Structure Comparison Methods 3 Basic Approaches for Aligning Structures (see Xiong textbook for details) • Intermolecular • Intramolecular • Combined But, very active research area - many recent new methods 3 Popular Methods: • DALI = Distance Matrix Alignment of Structures (Holm) • FSSP Database • SSAP = Sequential Structure Alignment Program (Orengo) • CATH Database • CE = Combinatorial Extension (Bourne) • VAST at NCBI URLS: http://en.wikipedia.org/wiki/Structural_alignment_software
Chp 16 - RNA Structure Prediction SECTION V STRUCTURAL BIOINFORMATICS Xiong: Chp 16 RNA Structure Prediction (Terribilini) • RNA Function • Types of RNA Structures • RNA Secondary Structure Prediction Methods • Ab Initio Approach • Comparative Approach • Performance Evaluation
RNA Function • Storage/transfer of genetic information • Newly discovered regulatory functions - RNAi pathways especially • Catalytic
RNA types & functions
RNA Structure • RNA forms complex 3D structures • Mainly single stranded • The single RNA strand can self-hybridize to form base paired regions
Levels of RNA Structure Rob Knight Univ Colorado • Like proteins, RNA has primary, secondary, and tertiary structures • Primary structure - base sequence • Secondary structure - single stranded or base paired • Tertiary structure - 3D structure
RNA Structure Prediction • RNA tertiary structure is very difficult to predict • Focus on predicting RNA secondary structure • Given an RNA sequence, predict the secondary structure of the molecule • Almost all methods ignore higher order secondary structures like pseudoknots
Base Pairing in RNA G-C, A-U, G-U ("wobble") & variants See: IMB Image Library of Biological Molecules http://www.fli-leibniz.de/ImgLibDoc/nana/IMAGE_NANA.html#basepairs
Common structural motifs in RNA • Helices • Loops • Hairpin • Interior • Bulge • Multibranch • Pseudoknots Fig 6.2 Baxevanis & Ouellette 2005
RNA Secondary Structure Prediction Methods • Two main types of methods • Ab initio - based on calculating the most energetically favorable secondary structure • Comparative approach - based on evolutionary comparison of multiple related RNA sequences
Ab Initio Prediction • Only requires a single RNA sequence • Calculates minimum free energy structure • Base pairing lowers free energy of the structure, so methods attempt to find secondary structure with maximal base pairing
Ab Initio Prediction • Free energy is calculated based on parameters determined in the wet lab • Known energy associated with each type of base pair • Base pair formation is not independent - multiple base pairs adjacent to each other are more favorable than individual base pairs - cooperative • Bulges and loops adjacent to base pairs have a free energy penalty
Ab Initio Energy Calculation Method • Search for all possible base-pairing patterns • Calculate the total energy of the structure based on all stabilizing and destabilizing forces Fig 6.3 Baxevanis & Ouellette 2005
Dot Matrices • Can be used to find all possible base pair patterns • Compare the input sequence to itself and put a dot wherever two bases are complementary R Knight 2005
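A self-comparison dot matrix is easy to build directly. The sketch below is illustrative only: the function names are made up for the example, and it allows the G-C, A-U and G-U wobble pairs listed on the base pairing slide.

```python
# Watson-Crick pairs plus the G-U wobble pair, in both orientations.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def dot_matrix(seq):
    """matrix[i][j] = 1 if base i can pair with base j, else 0."""
    seq = seq.upper()
    return [[1 if (a, b) in PAIRS else 0 for b in seq] for a in seq]

def render(seq, matrix):
    """Plain-text picture: '*' marks a possible pair, '.' marks none."""
    lines = ["  " + " ".join(seq)]
    for base, row in zip(seq, matrix):
        lines.append(base + " " + " ".join("*" if cell else "." for cell in row))
    return "\n".join(lines)

seq = "GGGAAAUCC"
print(render(seq, dot_matrix(seq)))
```

Runs of dots along an anti-diagonal of this matrix correspond to stretches that could form a helix, which is what the folding algorithms then score.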
Dynamic Programming • Finding the best possible secondary structure is difficult - lots of possibilities • Compare RNA sequence with itself • Apply scoring scheme based on energy parameters for base pairs, cooperativity, and penalties for destabilizing forces • Find path that represents the most energetically favorable secondary structure
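To make the dynamic programming idea concrete, here is a minimal sketch of Nussinov-style base pair maximization. This is a deliberately simplified stand-in, not the energy-based recursion used by programs such as Mfold: every allowed pair scores 1, there are no stacking or loop energies, and the minimum hairpin loop of 3 unpaired bases is an assumption chosen for the example. Function names are illustrative.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def fill(seq, min_loop=3):
    """dp[i][j] = maximum number of nested base pairs within seq[i..j]."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp

def fold(seq, min_loop=3):
    """Return (dot-bracket string, number of pairs) for one optimal structure."""
    seq = seq.upper()
    n = len(seq)
    dp = fill(seq, min_loop)
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:  # traceback: replay the choices that produced dp[i][j]
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if left + 1 + dp[k + 1][j - 1] == dp[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure), (dp[0][n - 1] if n else 0)

print(fold("GGGAAAUCC"))
```

Replacing the "+ 1" pair score with measured stacking and loop free energies, and minimizing instead of maximizing, gives the general shape of the thermodynamic methods described on the slide.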
Problem • DP returns the SINGLE best structure • There may be many structures with similar energies • Also, your predicted secondary structure is only as good as the energy parameters used • Solution - return multiple structures with near optimal energies
Popular Ab Initio Prediction Programs • Mfold • Combines DP with thermodynamic calculations • Fairly accurate for short sequences, less accurate as sequence length increases • RNAfold • Returns multiple structures near the optimal structure • Computes a larger number of potential secondary structures than Mfold, so it uses a simplified energy function
Comparative Approach • Uses multiple sequence alignment • Assumes related sequences fold into the same secondary structure
Covariation • RNA functional motifs are conserved • To maintain RNA structure during evolution, a mutation in a base-paired residue must be compensated for by a mutation in the base that it pairs with • Comparative methods search for covariation patterns in a multiple sequence alignment (MSA)
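One common way to quantify covariation between two alignment columns is mutual information. The slides do not say which measure the comparative programs use, so the sketch below is only an illustration of the idea; the function name and the toy alignment are made up for the example.

```python
import math
from collections import Counter

def mutual_information(alignment, col_i, col_j):
    """Mutual information (in bits) between two columns of an alignment,
    given as a list of equal-length sequence strings."""
    pairs = [(seq[col_i], seq[col_j]) for seq in alignment]
    n = len(pairs)
    freq_i = Counter(a for a, _ in pairs)
    freq_j = Counter(b for _, b in pairs)
    freq_ij = Counter(pairs)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi

# Toy alignment: the first and last columns change together so that they
# can always pair (G-C, C-G, A-U, U-A); the middle columns are fully conserved.
alignment = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
print(mutual_information(alignment, 0, 4))  # covarying columns
print(mutual_information(alignment, 1, 2))  # conserved columns
```

Fully conserved columns score zero because they carry no information about each other, while columns that mutate in a compensating way score high, which is the signal comparative methods look for.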
Consensus Structures • Predict secondary structure of each individual sequence • Compare all structures and see if there is a most common structure
Popular Comparative Prediction Programs • Two types • Require user to provide MSA • No MSA required
RNAalifold • Requires user to provide the MSA • Creates a scoring matrix combining minimum free energy and covariation information • DP is used to select the minimum free energy structure
Foldalign • User provides a pair of unaligned RNA sequences • Foldalign constructs alignment then computes a commonly conserved structure • Suitable only for short sequences
Dynalign • User provides two input sequences • Dynalign calculates possible secondary structures using algorithm similar to Mfold • Dynalign compares multiple structures from both sequences to find a common structure
Performance Evaluation • Ab initio methods achieve correlation coefficient of 20-60% • Comparative approaches achieve correlation coefficient of 20-80% • Programs that require user to supply MSA are more accurate • Comparative programs are consistently more accurate than ab initio programs • Base-pairs predicted by comparative sequence analysis for large & small subunit rRNAs are 97% accurate when compared with high resolution crystal structures! - Gutell, Pace
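The slide does not state which correlation coefficient is meant. One measure often used when benchmarking RNA secondary structure predictions is an approximation to the Matthews correlation coefficient, computed as the geometric mean of sensitivity and positive predictive value over base pairs. The sketch below assumes that choice, and assumes both structures are given as balanced dot-bracket strings; the function names are illustrative.

```python
import math

def base_pairs(dot_bracket):
    """Set of (i, j) index pairs encoded by a balanced dot-bracket string."""
    stack, pairs = [], set()
    for index, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(index)
        elif char == ")":
            pairs.add((stack.pop(), index))
    return pairs

def pair_accuracy(predicted, reference):
    """Return (sensitivity, ppv, approximate MCC) for predicted base pairs."""
    pred, ref = base_pairs(predicted), base_pairs(reference)
    true_pos = len(pred & ref)
    sensitivity = true_pos / len(ref) if ref else 0.0
    ppv = true_pos / len(pred) if pred else 0.0
    return sensitivity, ppv, math.sqrt(sensitivity * ppv)

# Prediction recovers 2 of the 3 reference pairs and predicts no wrong pairs.
sens, ppv, mcc = pair_accuracy("((.....))", "(((...)))")
print(round(sens, 3), round(ppv, 3), round(mcc, 3))
```

Reporting sensitivity and positive predictive value alongside the combined score shows whether a method misses real pairs or predicts spurious ones.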