Three-Stage Prediction of Protein Beta-Sheets Using Neural Networks, Alignments, and Graph Algorithms. Jianlin Cheng and Pierre Baldi Institute for Genomics and Bioinformatics School of Information and Computer Sciences University of California Irvine.
Importance of Predicting Beta-Sheet Structure
• Ab-initio Structure Prediction
• Fold Recognition
• Model Refinement
• Protein Design
• Protein Folding
[Figure: protein structure with coil, beta-sheet, and helix labeled; rendered in Protein Explorer]
An Example of Beta-Sheet Architecture
[Figure: structure of protein 1VJG with strands numbered 1 to 7 (sheets drawn as 4 5 and 2 1 3 6 7). Labels by level: Level 1, strand and beta sheets; Level 2, strand pair, strand alignment, and pairing direction (antiparallel or parallel); Level 3, beta residue and residue pair (H-bond)]
Previous Work
• Statistical potential approach for strand alignment (Hubbard, 1994; Zhu and Braun, 1999)
• Statistical potentials to improve beta-sheet secondary structure prediction (Asogawa, 1997)
• Information theory approach for strand alignment (Steward and Thornton, 2000)
• Neural networks for beta-residue pairs (Baldi et al., 2000)
Three-Stage Prediction of Beta-Sheets
• Stage 1: Predict beta-residue pairing probabilities using 2D-Recursive Neural Networks (2D-RNN; Baldi and Pollastri, 2003)
• Stage 2: Use beta-residue pairing probabilities to align beta-strands
• Stage 3: Predict beta-strand pairs and beta-sheet architecture using graph algorithms
Dataset and Statistics
• Extract high-resolution proteins from the Protein Data Bank (Berman et al., 2000)
• Use DSSP (Kabsch and Sander, 1983) to assign intra-chain beta-sheet structure
• Use UniqueProt (Mika and Rost, 2003) to reduce redundancy
• Use PSI-BLAST (Altschul et al., 1997) to generate profiles
[Table: dataset statistics]
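Stage 1: Prediction of Beta-Residue Pairings Using 2D-RNN
• Input matrix I (m×m): entry Iij covers the residue windows Xi-2, Xi-1, Xi, Xi+1, Xi+2 and Xj-2, Xj-1, Xj, Xj+1, Xj+2, plus the separation |Xi - Xj|
• Each window position carries 20 profile values, 3 secondary structure (SS) values, and 2 solvent accessibility (SA) values
• 2D-RNN: O = f(I)
• Output matrix O (m×m): Oij is the predicted pairing probability of beta-residues i and j
• Target matrix T (m×m): Tij is 0 or 1
• Xi or Xj is the position of beta-residue i or j in the sequence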
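One way to read the input description is that each entry Iij concatenates the two five-residue windows and the separation term. The sketch below assembles such a vector. The function name `pair_features`, the zero padding at chain ends, the concatenation order, and the unnormalized separation are assumptions made for illustration; the slide does not specify them.

```python
from typing import List


def pair_features(residue_feats: List[List[float]], xi: int, xj: int,
                  half_window: int = 2) -> List[float]:
    """Build one input vector for the residue pair at sequence positions xi, xj.

    residue_feats[k] holds the per-residue features at position k
    (here 20 profile + 3 SS + 2 SA = 25 values).
    Assumption: positions outside the chain are padded with zeros.
    """
    n_feat = len(residue_feats[0])
    zero = [0.0] * n_feat

    def window(center: int) -> List[float]:
        out: List[float] = []
        for k in range(center - half_window, center + half_window + 1):
            inside = 0 <= k < len(residue_feats)
            out.extend(residue_feats[k] if inside else zero)
        return out

    # Two windows plus the sequence separation |Xi - Xj|.
    return window(xi) + window(xj) + [float(abs(xi - xj))]


# Toy input: 10 residues, 25 placeholder features each (not real profiles).
toy_feats = [[0.0] * 25 for _ in range(10)]
vec = pair_features(toy_feats, 0, 7)
print(len(vec))  # 2 windows * 5 positions * 25 features + 1 separation
```

Under these assumptions each pair is described by 2 × 5 × 25 + 1 = 251 values.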
An Example (Target)
[Figure: beta-residue pairing map (target matrix) of protein 1VJG, strands 1 to 7, with antiparallel and parallel pairings marked]
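Stage 2: Beta-Strand Alignment
• Use the output probability matrix as the scoring matrix
• Dynamic programming
• Disallow gaps and use the simplified search algorithm
• Total number of alignments = 2(m+n-1), where m and n are the lengths of the two strands: m+n-1 gapless offsets in each of the two directions
[Figure: antiparallel and parallel strand alignments]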
Strand Alignment and Pairing Matrix
• The alignment score is the sum of the pairing probabilities of the aligned residues
• The best alignment is the alignment with the maximum score
• Strand pairing matrix
[Figure: strand pairing matrix of 1VJG]
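The gapless search described for Stage 2 can be sketched as an exhaustive scan over offsets and directions. This is a minimal illustration, not the authors' code: the function name `best_gapless_alignment`, the data layout (a nested list of pairing probabilities indexed by residue), and the toy probabilities are all assumptions.

```python
from typing import List, Tuple


def best_gapless_alignment(P: List[List[float]], strand_a: List[int],
                           strand_b: List[int]):
    """Score every gapless alignment of two strands; return the best one.

    P[i][j] is the predicted pairing probability of residues i and j.
    strand_a and strand_b list the residue indices of each strand.
    """
    m, n = len(strand_a), len(strand_b)
    best: Tuple[float, str, List[Tuple[int, int]]] = (float("-inf"), "", [])
    n_alignments = 0
    for direction, b in (("parallel", strand_b),
                         ("antiparallel", strand_b[::-1])):
        # Slide one strand along the other: m + n - 1 offsets per direction.
        for offset in range(-(n - 1), m):
            pairs = [(strand_a[k], b[k - offset])
                     for k in range(m) if 0 <= k - offset < n]
            score = sum(P[i][j] for i, j in pairs)
            n_alignments += 1
            if score > best[0]:
                best = (score, direction, pairs)
    return best, n_alignments


# Toy example with made-up probabilities (8 residues, two short strands).
P = [[0.0] * 8 for _ in range(8)]
P[0][6] = 0.9
P[1][5] = 0.8
(best_score, best_dir, best_pairs), count = best_gapless_alignment(
    P, strand_a=[0, 1, 2], strand_b=[5, 6])
print(best_dir, best_pairs, count)
```

For strands of lengths 3 and 2 the scan visits 2(3+2-1) = 8 alignments. The best score for each strand pair would then fill one entry of a strand pairing matrix.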
Stage 3: Prediction of Beta-Strand Pairings and Beta-Sheet Architecture (Constraints)
[Figure: (a) seven strands of protein 1VJG in sequence order; (b) beta-sheet topology of protein 1VJG. Inset: a strand with 3 partners in protein 1B7G, rendered in Rasmol]
Minimum Spanning Tree Like Algorithm
[Figure: strand pairing matrix and the strand pairing graph (SPG): (a) complete SPG; (b) true weighted SPG]
• Goal: find a set of connected subgraphs that maximizes the sum of the alignment scores and satisfies the constraints
• Algorithm: minimum spanning tree like algorithm
An Example of MST Like Algorithm
[Figure: strand pairing matrix of 1VJG, strands 1 to 7, with the growing sheet topology; N and C mark the chain termini]
• Step 1: Pair strands 4 and 5 (sheets so far: 4 5)
• Step 2: Pair strands 1 and 2 (sheets so far: 4 5; 2 1)
• Step 3: Pair strands 1 and 3 (sheets so far: 4 5; 2 1 3)
• Step 4: Pair strands 3 and 6 (sheets so far: 4 5; 2 1 3 6)
• Step 5: Pair strands 6 and 7 (sheets so far: 4 5; 2 1 3 6 7)
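The stepwise selection above resembles Kruskal's algorithm run on maximum rather than minimum weights. The sketch below is one possible reading, not the authors' implementation. It assumes at most two partners per strand, no cycles, and stopping once every strand has a partner; the slides do not spell out these rules. The function name `greedy_strand_pairing` and the scores are made up and are not the 1VJG matrix.

```python
from typing import Dict, List, Tuple


def greedy_strand_pairing(scores: Dict[Tuple[int, int], float],
                          n_strands: int,
                          max_partners: int = 2) -> List[Tuple[int, int]]:
    """Pick strand pairs in descending score order under simple constraints."""
    parent = list(range(n_strands))  # union-find forest for cycle detection

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    partners = [0] * n_strands
    chosen: List[Tuple[int, int]] = []
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for (a, b), _score in ranked:
        if all(p > 0 for p in partners):
            break  # assumed stop rule: every strand already has a partner
        if partners[a] >= max_partners or partners[b] >= max_partners:
            continue  # assumed constraint: too many partners
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # assumed constraint: pairing would close a cycle
        parent[root_a] = root_b
        partners[a] += 1
        partners[b] += 1
        chosen.append((a, b))
    return chosen


# Hypothetical alignment scores for five strands (0 to 4).
toy_scores = {(0, 1): 3.2, (1, 2): 2.9, (0, 2): 2.6, (1, 3): 2.0, (3, 4): 1.5}
pairs = greedy_strand_pairing(toy_scores, n_strands=5)
print(pairs)
```

In this toy run, pair (0, 2) is skipped because it would close a cycle and pair (1, 3) is skipped because strand 1 already has two partners. The result is two separate sheets, which is why the output is a set of connected subgraphs rather than a single spanning tree.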
A New Fold Example (Last CASP)
True secondary structure of 1S12 (94 residues):
CEEEEECCCEEEEECCCCCHHHHHHHHHHHHHHHHHHHHCCCEEEEEECCEEEEEECCCCHHHHHHHHHHHHHHHHHHHHCCCCEEEEECCCCCC
Predicted secondary structure by SSpro (Pollastri et al., 2002):
CEEEEEECCEEEECCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHEHHCCCCEEEEHHHHHHHHHHHHHHHHHHHHHHHHHCCCCEEEEEEECCC
[Figures: strand pairing matrix; beta-sheet topology (strand order 5 1 2 4 3); 1S12 rendered in Rasmol]
True strand pairs: 1-2, 2-4, 3-4, 1-5
Predicted strand pairs: 1-2, 2-4, 3-4, 4-5
Beta-Residue Pairing Results
• The accuracy of a random algorithm is 2.3%.
[Figure: ROC plot]
Strand Pairing Results
• Naïve algorithm that pairs all adjacent strands
  • Specificity = 42%
  • Sensitivity = 50%
  • All strand pairs are local strand pairs.
• MST like algorithm
  • Specificity = 53%
  • Sensitivity = 59%
  • More than 20% of the correctly predicted strand pairs are non-local strand pairs.
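The two measures can be illustrated on the 1S12 strand pairs listed in the new fold example. The slides do not define the terms. The sketch assumes specificity is the fraction of predicted pairs that are correct and sensitivity is the fraction of true pairs that are recovered. The function name `pairing_metrics` is hypothetical.

```python
from typing import Iterable, Tuple


def pairing_metrics(true_pairs: Iterable[Tuple[int, int]],
                    predicted_pairs: Iterable[Tuple[int, int]]):
    """Return (specificity, sensitivity) under the assumed definitions.

    Pairs are unordered, so (1, 2) and (2, 1) count as the same pair.
    """
    true_set = {frozenset(p) for p in true_pairs}
    pred_set = {frozenset(p) for p in predicted_pairs}
    correct = len(true_set & pred_set)
    specificity = correct / len(pred_set) if pred_set else 0.0
    sensitivity = correct / len(true_set) if true_set else 0.0
    return specificity, sensitivity


# Strand pairs for 1S12 as listed in the new fold example.
true_1s12 = [(1, 2), (2, 4), (3, 4), (1, 5)]
pred_1s12 = [(1, 2), (2, 4), (3, 4), (4, 5)]
spec, sens = pairing_metrics(true_1s12, pred_1s12)
print(spec, sens)  # 0.75 0.75
```

Three of the four predicted pairs match the true pairs, so both values are 0.75 for this single protein under the assumed definitions.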
Strand Alignment Results
[Tables: results on the correctly predicted strand pairs; results on all native strand pairs]
• The accuracy of pairing direction is 15% higher than that of the baseline algorithm.
• The alignment accuracy is significantly higher than that of previous methods.
Future Work and Applications
• Allow a cycle to handle beta-barrels, allow gaps in alignment for beta bulges, and add more inputs (Punta and Rost, 2005) for beta-residue pairing prediction
• Applications
  • Contact map
  • Fold recognition
  • Ab-initio structure prediction
  • Model refinement
• Web server and dataset (SCRATCH suite): http://www.ics.uci.edu/~baldig/betasheet.html
Acknowledgement • Pierre Baldi, Arlo Randall, Michael Sweredoski • NIH grant (LM-07443-01) • NSF grant (EIA-0321390)