Finding the Beta Helix Motif

Finding the Beta Helix Motif. By Marcin Mejran. Papers. Predicting The β-Helix Fold From Protein Sequence Data by Phil Bradley, Lenore Cowen, Matthew Menke, Jonathan King, Bonnie Berger

Presentation Transcript


  1. Finding the Beta Helix Motif By Marcin Mejran

  2. Papers • Predicting The β-Helix Fold From Protein Sequence Data, by Phil Bradley, Lenore Cowen, Matthew Menke, Jonathan King, Bonnie Berger • Segmentation Conditional Random Fields (SCRFs): A New Approach for Protein Fold Recognition, by Yan Liu, Jaime Carbonell, Peter Weigele, and Vanathi Gopalakrishnan

  3. Secondary Structure • Beta Strand • Forms β-sheets • Alpha Helix • Stand alone • Can combine into more complex structures: • Beta sheets • Beta Helixes Images from: http://www.people.virginia.edu/~rjh9u/prot2ndstruct.html

  4. β sheet

  5. “Second and a half” Structure • beta helix • beta barrel • beta trefoil

  6. β-Helix

  7. β-Helix • Helix composed of three parallel β-sheets • Three β-strands per “rung” • Connecting “loops” • Not in Eukaryotes • Secreted by various bacteria • Right- and left-handed

  8. B3 T2 B2 -Helix • Few solved structures • 9 SCOP SuperFamilies • 14 RH solved structures in PDB • Solved structures differ widely B1

  9. β-Helix • T2 turn: unique two-residue loop • β-strands are 3 to 5 residues long • T1 and T3 vary in size, may contain secondary structures • β-strands interact between rungs

  10. β-Helix • “Nice” structure • Repeating • Parallel β-strands • Rungs have similar structure • Stacking is predictable • Well-conserved β-strands across superfamilies • Conclusion: good choice from a computational point of view

  11. β-Helix • Long-range interactions • Close in 3D but not in 1D (sequence) • “Non-unique” features • B2-T2-B3 segment • Unique features not clearly shown in sequence • Conclusion: usual methods don’t work Image from: http://www.cryst.bbk.ac.uk/PPS2/course/section10/all_beta.html

  12. BetaWrap • “Wraps” sequences around helix • Finds best “wrap” • Uses B2, B3 strands and T2 turn • Rest of rung varies greatly in size • Decomposes into sub-problems • Rungs • Find multiple rungs • Find B1 by local optimization

  13. Hydrophobic/charged • Hydrophobic • Dislikes water • Hydrophilic • Likes water • Charged • On outside [Figure: rung labeled B1, B2, T2, B3] Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

  14. BetaWrap: Rungs • Given a T2 turn, find the next T2 turn [Figure: candidate rung labeled B2, T2, B3] Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

  15. BetaWrap: Rungs • More weight given to inward pairs • Certain stacked amino acids preferred • Penalty for highly charged inward residues • Penalizes too few or too many residues [Figure: rung labeled B1, B2, T2, B3] Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

  16. BetaWrap: Multiple Rungs • Find multiple initial B2-T2-B3 segments • Match pattern based on hydrophobic residues (appear on the inside) • Legend: Φ = A, F, I, L, M, V, W, Y; [symbol lost in extraction] = D, E, R, K; X = any residue • Example sequences: AFDEMVRKYE, FIFDDEAK, EDEMVMVFD
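The pattern search on slide 16 can be sketched roughly as follows. The Φ residue set and the D, E, R, K set come from the slide; the template string itself did not survive in the transcript, and neither did the way the D, E, R, K set is used (required or excluded). The class codes `h`, `c`, `x`, the names `compile_template` and `find_candidates`, and `EXAMPLE_TEMPLATE` are therefore hypothetical placeholders, not the BetaWrap pattern.

```python
import re

HYDROPHOBIC = set("AFILMVWY")   # the Φ class listed on the slide
CHARGED = set("DERK")           # the D, E, R, K class listed on the slide

# Hypothetical template alphabet: 'h' = hydrophobic, 'c' = charged, 'x' = any residue.
CLASS_TO_REGEX = {
    "h": "[" + "".join(sorted(HYDROPHOBIC)) + "]",
    "c": "[" + "".join(sorted(CHARGED)) + "]",
    "x": "[A-Z]",
}

def compile_template(template):
    """Turn a template such as 'hxh' into a compiled regular expression."""
    return re.compile("".join(CLASS_TO_REGEX[symbol] for symbol in template))

def find_candidates(sequence, template):
    """Return the start offset of every (possibly overlapping) template match."""
    pattern = compile_template(template)
    width = len(template)
    return [
        start
        for start in range(len(sequence) - width + 1)
        if pattern.fullmatch(sequence, start, start + width)
    ]

# Placeholder template for illustration only; NOT the pattern used by BetaWrap.
EXAMPLE_TEMPLATE = "hxhc"
```

Each offset returned by `find_candidates` would serve as a candidate starting position for the rung search described on the next slide. Scanning every offset rather than using `re.finditer` keeps overlapping matches, which matters when candidate segments can overlap.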

  17. BetaWrap: Multiple Rungs • Dynamic programming (DP) is used to find 5 rungs in either direction from initial positions • α-helix filtering • Take average score of top 10 remaining wraps Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

  18. BetaWrap: Completing • Find B1 positions • Highest scoring parse • Does not affect wrap score. • Further filtering on hydrophobic residues in T1 and T2

  19. Training • Seven-fold cross-validation • Partitioned based on families • Scores calculated for • α-helix filtering threshold • B1-score threshold • Hydrophobic count threshold • Distribution of unmatched residues between rungs Image from: http://www.ornl.gov/info/ornlreview/v37_1_04/article_21.shtml
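A family-based partition like the one on slide 19 keeps every member of a protein family in the same fold, so a test fold never contains a close relative of a training protein. The sketch below is one way to build such folds; the greedy size balancing and the names `family_folds` and `train_test_splits` are assumptions for illustration, since the slide does not say how the seven partitions were formed.

```python
from collections import defaultdict

def family_folds(proteins, n_folds=7):
    """Assign whole families to folds so that no family is split across folds.

    `proteins` is a list of (protein_id, family_id) pairs.
    Returns a list of `n_folds` lists of protein ids.
    """
    by_family = defaultdict(list)
    for protein_id, family_id in proteins:
        by_family[family_id].append(protein_id)

    folds = [[] for _ in range(n_folds)]
    # Greedy balancing (an assumption): largest families first, each into the smallest fold.
    for family_id in sorted(by_family, key=lambda f: (-len(by_family[f]), f)):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_family[family_id])
    return folds

def train_test_splits(folds):
    """Yield (train_ids, test_ids) pairs, holding each fold out exactly once."""
    for held_out, test_ids in enumerate(folds):
        train_ids = [p for i, fold in enumerate(folds) if i != held_out for p in fold]
        yield train_ids, test_ids
```

With fewer families than folds, some folds stay empty; a real experiment would need at least as many families as folds.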

  20. BetaWrap: Results

  21. BetaWrap: Results • Correctly identifies β-helixes • Correctly separates helixes and non-helixes • Can predict β-helixes across families

  22. BetaWrap: Summary Pros: • Finds beta-helixes • Accurate Cons: • Still makes errors • Rung placement • Hard coded information • Over-fitting • Hard to generalize

  23. Conditional Random Fields (CRFs) [Figure: graphical models of an HMM and a CRF, each over hidden states y1 … y6 and observations x1 … x6]

  24. Hidden Markov Model • Set of States • Transition Probabilities • Emission Probabilities • Only given sequence of emitted residues • Find sequence of true states • Generative

  25. Hidden Markov Model • HMM: Maximize P(x,y|θ) = P(y|x,θ)P(x|θ) • x: emitted state/given sequence • y: “hidden”/true state • P(x,y|θ): Joint probability of x and y • P(y|x,θ): Probability of y given x • P(x|θ): Probability of x • Need to make assumptions about the distribution of x

  26. Viterbi Algorithm (HMM) • Find most likely path/most likely sequence of hidden states [Figure: trellis with one column per observation x1 … x4 and one row per state, with nodes e1(xi), e2(xi), e3(xi)]

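  27. Viterbi Algorithm (HMM) • $v(i,j) = \max_{1 \le m \le k} v(i-1,m)\, t_{m,j}\, e_j(x_i)$, where $t_{m,j}$ is the transition probability from state $m$ to state $j$, $e_j(x_i)$ is the emission probability of $x_i$ in state $j$, and $k$ is the number of states [Figure: trellis with one column per observation x1 … x4 and one row per state, with nodes e1(xi), e2(xi), e3(xi)]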
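The recurrence on this slide translates fairly directly into code. The sketch below is a generic textbook Viterbi decoder, not code from either paper; it works in log space to avoid underflow on long sequences, and the argument names (`start_p`, `trans_p`, `emit_p`) are assumptions.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, log_probability) for an HMM.

    start_p[s], trans_p[r][s] and emit_p[s][x] are plain probabilities.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # v[i][s]: best log-probability of any state path that ends in s at position i.
    v = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for i in range(1, len(observations)):
        v.append({})
        back.append({})
        for s in states:
            # The max over previous states from the slide's recurrence.
            prev = max(states, key=lambda r: v[i - 1][r] + log(trans_p[r][s]))
            v[i][s] = v[i - 1][prev] + log(trans_p[prev][s]) + log(emit_p[s][observations[i]])
            back[i][s] = prev

    # Trace back from the best final state.
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for i in range(len(observations) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, v[-1][last]
```

The table `v` has one column per observation and one row per state, matching the trellis in the slide; the cost is proportional to the sequence length times the square of the number of states.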

  28. HMM Disadvantages • There is a strong independence assumption • Long-range interactions are difficult to model • Overlapping features are difficult to model

  29. Conditional Random Fields (CRFs) • Replace transition and emission probabilities with a set of feature functions f(i,j,k) • Feature functions are based on the whole sequence x, not just one position • Not generative [Figure: trellis over x1 … x4 with nodes f(s,0,1) in the first column and f(s,i,2), f(s,i,3), f(s,i,4) in the later columns, for states s = 1, 2, 3]

  30. Conditional Random Fields (CRFs) • HMM: Maximize P(x,y|θ)=P(y|x,θ)P(x|θ) • CRF: Maximize P(y|x,θ) • Do not make assumptions about underlying distribution
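To make the contrast concrete, the sketch below computes P(y|x) for a tiny linear-chain CRF by brute force: every label sequence is scored with weighted feature functions and the scores are normalized over all label sequences, so no distribution over x is ever modeled. The state labels "B" and "T", the three features, and the weights are invented for illustration, and the feature signature `(prev, cur, x, i)` is an assumption that differs from the f(i,j,k) notation on the slides.

```python
import itertools
import math

def crf_score(x, y, feature_functions, weights):
    """Unnormalized log-score: weighted sum of all features over every position."""
    total = 0.0
    for i in range(len(x)):
        prev = y[i - 1] if i > 0 else None
        for f, w in zip(feature_functions, weights):
            total += w * f(prev, y[i], x, i)
    return total

def crf_conditional(x, y, states, feature_functions, weights):
    """P(y | x), enumerating every label sequence (feasible only for tiny inputs)."""
    z = sum(
        math.exp(crf_score(x, candidate, feature_functions, weights))
        for candidate in itertools.product(states, repeat=len(x))
    )
    return math.exp(crf_score(x, y, feature_functions, weights)) / z

# Hypothetical features; each one may inspect the whole sequence x, not just x[i].
HYDROPHOBIC = set("AFILMVWY")
FEATURES = [
    lambda prev, cur, x, i: 1.0 if cur == "B" and x[i] in HYDROPHOBIC else 0.0,
    lambda prev, cur, x, i: 1.0 if prev == cur else 0.0,
    lambda prev, cur, x, i: 1.0 if cur == "T" and i + 1 < len(x) and x[i + 1] in HYDROPHOBIC else 0.0,
]
WEIGHTS = [1.5, 0.5, 0.8]
```

The third feature looks ahead at `x[i + 1]`, which is the kind of overlapping, non-local evidence that is awkward to express as an HMM emission probability. In practice the normalizer is computed with dynamic programming rather than enumeration.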

  31. Viterbi for CRFs • Same method as for HMM [Figure: trellis over x1 … x4 with nodes f(s,0,1) in the first column and f(s,i,2), f(s,i,3), f(s,i,4) in the later columns, for states s = 1, 2, 3]

  32. Conditional Random Fields (CRFs) • States should form a chain • Likelihood function is convex for chain • Z0 = normalization factor (sum over all state sequences) • λk = feature weights

  33. Segmented CRFs • Each state corresponds to a structure • Represented as a graph G • Nodes (states) represent secondary structures • Edges represent interactions • Chains are nicer than graphs

  34. Segmented CRFs • G = <V, E1, E2> • E1: Edges between neighbors • E2: Edges for long-range interactions • E1 edges can be implied in model

  35. Only E2 needs to be explicitly considered. However: • Graph needs to be a chain for E2 • Deterministic state transitions

  36. Beta-Helix CRF

  37. Beta-Helix CRF • Combined states • B23: B2, T2, B3 • Size assumptions: • B23: 8 residues • B1: 3 residues • T1, T3: 1 to 80 residues
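The size assumptions on this slide act as hard constraints on which segmentations the model needs to consider. A minimal sketch of such a check is below; the limits are taken from the slide, while the half-open `(state, start, end)` representation and the function names are assumptions. The sketch covers only the four helix states named here and does not enforce the order of states.

```python
# Length limits taken from the slide (inclusive, in residues).
LENGTH_LIMITS = {
    "B23": (8, 8),
    "B1": (3, 3),
    "T1": (1, 80),
    "T3": (1, 80),
}

def segment_is_valid(state, start, end):
    """Check that the half-open span [start, end) has a legal length for `state`."""
    low, high = LENGTH_LIMITS[state]
    return low <= end - start <= high

def segmentation_is_valid(segments, sequence_length):
    """`segments` is a list of (state, start, end) tuples that should tile the sequence."""
    position = 0
    for state, start, end in segments:
        # Segments must be contiguous and each must respect its length limits.
        if start != position or not segment_is_valid(state, start, end):
            return False
        position = end
    return position == sequence_length
```

Because B23 and B1 have fixed lengths, only the T1 and T3 lengths vary, which keeps the number of candidate segmentations manageable.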

  38. Intra-Node Features • Regular expression template for B23 • Legend: Φ = A, F, I, L, M, V, W, Y; [symbol lost in extraction] = D, E, R, K; X = any residue • Example sequence: FIFDDEAK

  39. Intra-Node Features • Probabilistic motif profiles for B23 and B1 • Use HMMER to generate profiles from known B23 and B1 sequences

  40. Intra-Node Features • Secondary Structure Prediction • PSIPRED • Helps locate T1 and T3 • 76 to 78% accuracy for α-helixes and coils • Segment length for T1 and T3 • Estimated as density function

  41. Inter-Node Features • Side chain alignment scores • Alignment between B23 regions • More weight given to inward pairs [Figure: rung labeled B2, T2, B3]

  42. Inter-Node Features • Parallel Beta-sheet alignment scores • Distance between adjacent B23 segments

  43. SCRF: Results

  44. SCRF: Results

  45. Summary • Discovered new beta-helix protein • Sf6 gp14 • Detected beta-helixes in plants • None previously known • More robust than BetaWrap

  46. Questions
