PROTEIN SECONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee CS 882 Protein Folding Instructed by Professor Ming Li
0. OUTLINE • Introduction • Problem • Methods (4) • HMM Examples (3) • Segmentation HMM • Profile HMM • Conditional Random Field • Proposal
1. INTRODUCTION • Introduction * • Problem • Methods (4) • HMM Examples (3) • Segmentation HMM • Profile HMM • Conditional Random Field • Proposal
1. Genomics • Achievements in Genomics • BLAST (Basic Local Alignment Search Tool) • Most-cited paper published in the 1990s • Cited more than 15,000 times • Human Genome Project • Completed April 2003
1. Proteomics • Precedence to Proteomics • Protein Data Bank (PDB) • 40,132 structures • cited more than 6,000 times
1. Proteomics • [Figure: number of protein structures in the Protein Data Bank]
1. Secondary Structure • Importance • The known secondary structure may be used as an input for the tertiary structure predictions.
1. Protein Structure • Primary Structure
1. Protein Structure • Secondary Structure
1. Secondary Structure • α-helix • Hydrogen bond between the i-th and (i+4)-th residues
1. Secondary Structure • β-sheet/strand • Parallel or Anti-parallel
1. Secondary Structure • Coil (loop)
1. Protein Structure • Tertiary Structure
1. Protein Structure • Super-Secondary (2.5) Structure
1. Protein Structure • Quaternary Structure
2. PROBLEM • Introduction • Problem * • Methods (4) • HMM Examples (3) • Segmentation HMM • Profile HMM • Conditional Random Field • Proposal
2. Secondary Structure • Problem • Given: • A primary sequence of amino acids $a_1 a_2 \dots a_n$ • Find: • Secondary structure of each $a_i$ as • α-helix = H • β-strand = E • coil = C
2. Secondary Structure • Example • Given: • Primary Sequence • GHWIATRGQLIREAYEDYRHFSSECPFIP • Find: • Secondary Structure Element • CEEEEECHHHHHHHHHHHCCCHHCCCCCC • Note: segments
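The "Note: segments" remark can be made concrete by collapsing the per-residue labels into runs. The following is a minimal sketch; the function name `segments` and the 1-based inclusive coordinates are illustrative choices, not something the slides specify.

```python
from itertools import groupby

def segments(ss):
    """Collapse a per-residue label string into (label, start, end) runs.

    Positions are 1-based and inclusive (an assumption made for readability).
    """
    out = []
    pos = 1
    for label, run in groupby(ss):
        length = len(list(run))
        out.append((label, pos, pos + length - 1))
        pos += length
    return out

# Sequence and labels copied from the example slide.
primary = "GHWIATRGQLIREAYEDYRHFSSECPFIP"
secondary = "CEEEEECHHHHHHHHHHHCCCHHCCCCCC"

segs = segments(secondary)
for label, start, end in segs:
    print(label, start, end, primary[start - 1:end])
```

Each printed row gives a segment type, its span, and the residues it covers, which is the unit the segment-based models later in the deck work with.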
2. Prediction Quality • Three-state prediction accuracy • $Q_3 = \frac{\text{number of correctly predicted residues}}{\text{total number of residues}}$ • Per-state accuracies: $Q_\alpha$, $Q_\beta$, $Q_c$ • $Q_3$ for random prediction is 33% • Theoretical limit: $Q_3 = 90\%$
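The accuracy measures above can be sketched in a few lines. The true labels below are the ones from the example slide; the predicted string is invented purely for illustration, and the per-state measure is assumed to be "fraction of residues truly in a state that are predicted in that state".

```python
def q3(true_ss, pred_ss):
    """Fraction of residues whose predicted state matches the true state."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("label strings must have the same length")
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return correct / len(true_ss)

def q_state(true_ss, pred_ss, state):
    """Per-state accuracy; returns None if the state never occurs in true_ss."""
    total = sum(t == state for t in true_ss)
    if total == 0:
        return None
    hits = sum(t == state and p == state for t, p in zip(true_ss, pred_ss))
    return hits / total

true_ss = "CEEEEECHHHHHHHHHHHCCCHHCCCCCC"
# Hypothetical prediction: misses one strand residue and three helix residues.
pred_ss = "CC" + "E" * 4 + "C" + "H" * 10 + "C" * 12

print(q3(true_ss, pred_ss))
print({s: q_state(true_ss, pred_ss, s) for s in "HEC"})
```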
2. Prediction Quality • Segment Overlap (SOV) • Higher penalties for core segment regions • Matthews Correlation Coefficients (MCC) • Prediction errors made for each state
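For the per-state Matthews correlation coefficient, a minimal sketch using the standard confusion-matrix formula is shown here. Returning 0.0 when the denominator vanishes is a common convention and an assumption on my part; SOV is not covered by this sketch.

```python
import math

def mcc(true_ss, pred_ss, state):
    """Matthews correlation coefficient for one state, treated as a binary label."""
    tp = fp = tn = fn = 0
    for t, p in zip(true_ss, pred_ss):
        if p == state and t == state:
            tp += 1
        elif p == state:
            fp += 1
        elif t == state:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumed): report 0.0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(mcc("HHCCEE", "HHCCEE", "H"))  # perfect agreement for state H
print(mcc("HHCC", "CCHH", "H"))      # complete disagreement for state H
```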
2. True Structures • Three-dimensional PDB data • DSSP (Dictionary of Secondary Structure of Proteins) • 8 states, each reduced to one of 3 states: • H = alpha helix → H • G = 3_10 helix → H • I = 5-helix (pi helix) → H • E = extended strand (beta ladder) → E • B = residue in isolated beta-bridge → E • T = hydrogen-bonded turn → C • S = bend → C • C = coil → C • STRIDE
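The 8-to-3 state reduction listed on this slide is a simple lookup. The sketch below follows the slide's table exactly; note that it assumes coil is written as `C`, as on the slide, whereas raw DSSP output may encode coil differently.

```python
# Mapping taken from the slide: DSSP 8-state code -> 3-state code.
DSSP_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
    "T": "C", "S": "C", "C": "C",   # turns, bends, coil
}

def reduce_dssp(dssp):
    """Translate an 8-state string into the 3-state alphabet {H, E, C}."""
    return "".join(DSSP_TO_3[code] for code in dssp)

print(reduce_dssp("HGIEBTSC"))
```

Other reduction schemes exist (for example, mapping only H and E and sending everything else to coil), so reported accuracies are only comparable when the same mapping is used.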
3. METHODS • Introduction • Problem • Methods (4) * • HMM Examples (3) • Segmentation HMM • Profile HMM • Conditional Random Field • Proposal
3. Sliding Window • Sliding-Window
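The slide figures are not preserved in this transcript, so the following is only a sketch of the usual sliding-window idea: each residue is classified from a fixed-width window centred on it. The window width, the padding character, and the function name are all assumptions.

```python
def sliding_windows(seq, width=5, pad="-"):
    """Return one window per residue, centred on that residue.

    The sequence is padded at both ends so that terminal residues also
    get a full-width window (padding symbol is an arbitrary choice).
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so the window has a centre")
    half = width // 2
    padded = pad * half + seq + pad * half
    return [padded[i:i + width] for i in range(len(seq))]

for centre, window in zip("GHWIA", sliding_windows("GHWIA", width=3)):
    print(centre, window)
```

A window-based predictor would feed each window to a classifier and assign the output state to the central residue.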
3. Four Methods • Statistical Method • Neural Network • Support Vector Machine • Hidden Markov Model
3a. Statistical Method • Propensity • Ex. Chou-Fasman 50~53%
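As a rough illustration of what "propensity" means here, the sketch below computes the ratio P(amino acid | state) / P(amino acid) from a labelled sequence. This is a simplified definition chosen for illustration; it is not the full Chou-Fasman procedure, and the toy data are invented.

```python
from collections import Counter

def propensities(seq, ss, state):
    """Propensity of each amino acid for `state`: P(aa | state) / P(aa)."""
    all_counts = Counter(seq)
    in_state = Counter(a for a, s in zip(seq, ss) if s == state)
    n_all = len(seq)
    n_state = sum(in_state.values())
    if n_state == 0:
        return {}
    return {
        a: (in_state[a] / n_state) / (all_counts[a] / n_all)
        for a in all_counts
    }

# Toy data (hypothetical): A is enriched in helix, G is depleted.
helix_prop = propensities("AAGG", "HHHC", "H")
print(helix_prop)
```

A value above 1 means the residue occurs in that state more often than expected by chance, and a value below 1 means less often.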
3b. Neural Network • Ex. PHD 71%
3c. SVM • Ex. PSIPRED 76~78%
3d. HMM Definition • State set Q • Output alphabet Σ
3d. HMM Definition • Transition probabilities • Probability of entering state p from state q • $T_q(p)$ • $q \in Q$ • $p \in Q$
3d. HMM Definition • Emission probabilities • Probability that state q emits each letter of Σ • $E_q(a_i)$ • $a_i \in \Sigma$ • $q \in Q$
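The four components of the definition (Q, Σ, T, E) map naturally onto nested dictionaries. The sketch below uses a two-letter alphabet (`h` for hydrophobic, `p` for polar) and made-up probabilities, both purely hypothetical, and checks that every row is a probability distribution.

```python
import math

STATES = ["H", "E", "C"]   # state set Q
ALPHABET = ["h", "p"]      # output alphabet Sigma (hypothetical reduced alphabet)

# T[q][p]: probability of entering state p from state q (illustrative numbers).
T = {
    "H": {"H": 0.80, "E": 0.05, "C": 0.15},
    "E": {"H": 0.05, "E": 0.75, "C": 0.20},
    "C": {"H": 0.15, "E": 0.15, "C": 0.70},
}
# E[q][a]: probability that state q emits letter a (illustrative numbers).
E = {
    "H": {"h": 0.5, "p": 0.5},
    "E": {"h": 0.7, "p": 0.3},
    "C": {"h": 0.3, "p": 0.7},
}

def check_hmm(states, alphabet, T, E, tol=1e-9):
    """Raise ValueError unless every T[q] and E[q] is a distribution."""
    for q in states:
        if set(T[q]) != set(states):
            raise ValueError(f"T[{q!r}] must cover exactly the state set")
        if set(E[q]) != set(alphabet):
            raise ValueError(f"E[{q!r}] must cover exactly the alphabet")
        for name, row in (("T", T[q]), ("E", E[q])):
            total = sum(row.values())
            if min(row.values()) < 0 or not math.isclose(total, 1.0, abs_tol=tol):
                raise ValueError(f"{name}[{q!r}] is not a probability distribution")
    return True

print(check_hmm(STATES, ALPHABET, T, E))
```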
3d. HMM Decoding • Problem • Given: • HMM = (Q, Σ, E, T) and • Sequence $S = S_1, S_2, \dots, S_n$ • Find: • Most probable path of states that generates S • $X = X_1, X_2, \dots, X_n$ = state sequence
3d. HMM Decoding • Optimize • $\Pr[S, X]$ • $X = X_1, X_2, \dots, X_n$ = state sequence • $S = S_1, S_2, \dots, S_n$ • $\Pr[S \mid X]$
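3d. HMM Decoding • Dynamic programming • Memoryless: the next state depends only on the current state • $\Pr[S_1 \dots S_n, X_1 \dots X_n] = \Pr[S_1 \dots S_{n-1}, X_1 \dots X_{n-1}] \cdot T_{X_{n-1}}(X_n) \cdot E_{X_n}(S_n)$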
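The dynamic-programming decoder described here is commonly known as the Viterbi algorithm. The sketch below works in log space and assumes an explicit start distribution, which the (Q, Σ, E, T) definition on the slides does not include; it also assumes a non-empty sequence and strictly positive probabilities. The two-state model and its numbers are invented for illustration.

```python
import math

def viterbi(obs, states, start, T, E):
    """Return (most probable state path, log joint probability of that path)."""
    # V[q]: best log joint probability of any path that ends in state q.
    V = {q: math.log(start[q]) + math.log(E[q][obs[0]]) for q in states}
    back = []  # back[i][p]: best predecessor of state p at step i + 1
    for sym in obs[1:]:
        prev, V, ptr = V, {}, {}
        for p in states:
            best = max(states, key=lambda q: prev[q] + math.log(T[q][p]))
            V[p] = prev[best] + math.log(T[best][p]) + math.log(E[p][sym])
            ptr[p] = best
        back.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=V.get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, V[last]

# Hypothetical two-state model over a two-letter alphabet.
states = ["H", "C"]
start = {"H": 0.5, "C": 0.5}          # assumed uniform start distribution
T = {"H": {"H": 0.9, "C": 0.1}, "C": {"H": 0.1, "C": 0.9}}
E = {"H": {"a": 0.8, "b": 0.2}, "C": {"a": 0.2, "b": 0.8}}

best_path, best_logp = viterbi("aaabbb", states, start, T, E)
print("".join(best_path), math.exp(best_logp))
```

Each step extends the best path into every state by one transition and one emission, which is the factorization on the slide applied to the maximizing path.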
4. HMM EXAMPLES • Introduction • Problem • Methods (4) • HMM Examples (3) * • Segmentation HMM • Profile HMM • Conditional Random Field • Proposal
4a. SEMI-HMM • Introduction • Problem • Methods (4) • HMM Examples (3) • Semi-HMM * • Profile HMM • Conditional Random Field • Proposal
4a. Semi-HMM • Definition • Each state can emit a sequence • Move emission probabilities into states • Model secondary structure segments
4a. Segmentation • Sequence Segments • T = secondary structural type of each segment, from {H, E, L} • S = end position of each structural segment • R = known amino acid sequence
4a. Segmentation • Sequence Segments • $T_2$ = E = β-strand • $S_2 = 9$ • $R_2 = R_{[S_1 + 1 : S_2]}$
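The (S, T) representation can be sketched as two parallel lists. The slide's own figure is not preserved, so the example below reuses the sequence from the Problem section with coil written as L; its segment ends therefore differ from the $S_2 = 9$ shown on the slide. Function names are illustrative.

```python
def segment_residues(R, S):
    """Split R into segments R_j = R[S_{j-1}+1 : S_j] (1-based, inclusive), S_0 = 0."""
    starts = [0] + S[:-1]
    # Python slices are 0-based and end-exclusive, so R[a:b] covers
    # 1-based positions a+1 .. b.
    return [R[a:b] for a, b in zip(starts, S)]

def labels_from_segmentation(S, T):
    """Expand segment ends S and segment types T into one label per residue."""
    out = []
    prev = 0
    for end, seg_type in zip(S, T):
        out.append(seg_type * (end - prev))
        prev = end
    return "".join(out)

R = "GHWIATRGQLIREAYEDYRHFSSECPFIP"
S = [1, 6, 7, 18, 21, 23, 29]   # end position of each segment
T = list("LELHLHL")             # type of each segment

print(segment_residues(R, S))
print(labels_from_segmentation(S, T))
```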
4a. Bayesian • Bayesian Formulation • R = Sequence of ALL amino acid residues • S = End of the segments • T = Secondary structural type of the segments • {H, E, L}
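4a. Bayesian • Bayesian Formulation • $P(S, T \mid R) \propto P(R \mid S, T)\, P(S, T)$ • Likelihood: $P(R \mid S, T)$ • Prior probability: $P(S, T)$ • Normalizing constant $P(R)$ does not depend on $(S, T)$ and is dropped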
4a. Bayesian • Likelihood • m = total number of segments • $S_j$ = end of the j-th segment • $T_j$ = secondary structural type of the j-th segment
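The formula on this slide is not preserved in the transcript. One common way to write a segment-based likelihood with these symbols, assuming that residues in different segments are conditionally independent given the segmentation (an assumption, not something the surviving text states), is:

```latex
P(R \mid S, T) \;=\; \prod_{j=1}^{m}
  P\!\left( R_{[S_{j-1}+1 \,:\, S_j]} \;\middle|\; S_{j-1},\, S_j,\, T_j \right),
\qquad S_0 = 0
```

Each factor scores the residues of one segment given its boundaries and its structural type, so the total likelihood is a product over the m segments.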
4a. Bayesian • Likelihood • Positions within a segment: N-terminus, internal, C-terminus
4a. BSPSS • Bayesian Segmentation of Protein Secondary Structure