Transmembrane Protein Prediction
Project Presentation, CMPUT 606
Overview
• Transmembrane (TM) protein:
  • Associated with the plasma membrane
  • “A protein that has domains exposed on both sides of the membrane” [Genes VII]
• Some of the TM proteins that span the lipid layer several times form a hydrophilic channel that permits various ions and molecules to circulate through the plasma membrane.
Predictors
• ePST
• bPST
• TMHMM
• TMpred
• HMMTOP
• HMMer
• TMDET
TMHMM
• Short form prediction:
  sp_1xqe_A len=418 ExpAA=243.54 First60=39.67 PredHel=11 Topology=o10-32i45-67o98-120i127-149o159-181i193-215o225-247i259-281o285-302i315-337o352-374i
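The short-format line above packs the whole prediction into `key=value` fields plus a topology string, in which `o` and `i` mark outside and inside loops and each `start-end` pair is one predicted helix. A minimal sketch of reading it back, assuming whitespace-separated fields as shown on the slide; the function name `parse_tmhmm_short` and the keys of the returned dictionary are illustrative, not part of TMHMM:

```python
import re


def parse_tmhmm_short(line):
    """Parse one TMHMM short-format line into a dict (illustrative helper)."""
    fields = line.split()
    record = {"id": fields[0]}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        record[key] = value

    topology = record["Topology"]
    # Every "start-end" pair in the topology string is one predicted helix.
    helices = [(int(s), int(e)) for s, e in re.findall(r"(\d+)-(\d+)", topology)]

    return {
        "id": record["id"],
        "length": int(record["len"]),
        "exp_aa": float(record["ExpAA"]),
        "pred_hel": int(record["PredHel"]),
        # The first letter says where the N-terminus lies: 'o' outside, 'i' inside.
        "n_terminus": "outside" if topology.startswith("o") else "inside",
        "helices": helices,
    }


# The line from the slide, joined into a single string.
LINE = (
    "sp_1xqe_A len=418 ExpAA=243.54 First60=39.67 PredHel=11 "
    "Topology=o10-32i45-67o98-120i127-149o159-181i193-215o225-247"
    "i259-281o285-302i315-337o352-374i"
)

result = parse_tmhmm_short(LINE)
print(result["id"], result["pred_hel"], len(result["helices"]))
```

Comparing `pred_hel` with the number of parsed helices is a cheap sanity check that the topology string was not truncated when it was copied.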
HMMer
Scores for complete sequences (score includes all domains):
Sequence      Description          Score  E-value  N
--------      -----------          -----  -------  ---
nontm|1ALO._  OXIDOREDUCTASE       -20.6      4.7  1
nontm|1CDE._  TRANSFERASE(FORMYL)  -26.1      9.9  1
nontm|1AKO._  NUCLEASE             -27.4       10  1
nontm|1ARU._  PEROXIDASE           -37.1       10  1
sp|1pv7_A                          -41.7       10  1
sp|1pw4_A                          -46.0       10  1
sp|1pxs_A                          -48.9       10  1
sp|1xqe_A                          -49.0       10  1
sp|1r2c_L                          -53.2       10  1
nontm|1HSB.B  HISTOCOMPATIBILITY   -61.4       10  1

Parsed for domains:
Sequence      Domain  seq-f  seq-t     hmm-f  hmm-t     score  E-value
--------      ------  -----  -----     -----  -----     -----  -------
nontm|1ALO._  1/1       125    323 ..      1    199 []  -20.6      4.7
nontm|1CDE._  1/1         4    202 ..      1    199 []  -26.1      9.9
nontm|1AKO._  1/1         5    202 ..      1    199 []  -27.4       10
nontm|1ARU._  1/1       112    295 ..      1    199 []  -37.1       10
sp|1pv7_A     1/1       116    314 ..      1    199 []  -41.7       10
sp|1pw4_A     1/1       162    329 ..      1    199 []  -46.0       10
sp|1pxs_A     1/1        51    249 .]      1    199 []  -48.9       10
sp|1xqe_A     1/1        39    226 ..      1    199 []  -49.0       10
sp|1r2c_L     1/1        62    260 ..      1    199 []  -53.2       10
nontm|1HSB.B  1/1         2     99 .]      1    199 []  -61.4       10
HMMer
Total sequences searched: 10

Whole sequence top hits:
tophits_s report:
     Total hits:           10
     Satisfying E cutoff:  9
     Total memory:         16K

Domain top hits:
tophits_s report:
     Total hits:           10
     Satisfying E cutoff:  10
     Total memory:         22K
ePST Output
TM#  Start  End
  1     12   24
  2     50   61
  3    101  112
  4    130  142
  5    163  166
  6    168  175
  7    199  201
  8    203  211
  9    228  240
 10    260  271
 11    287  297
 12    315  333
 13    353  365
Total # ePST segments = 13
ePST Output
s# i  char    pos      neg    odds       tot       win    maxwin  region
s  0  A     -1.87  -708.40  706.52    706.52    706.52      0.00  -
s  1  P     -2.96  -708.40  705.44   1411.96   1411.96      0.00  -
s  2  A     -1.87  -708.40  706.52   2118.48   2118.48      0.00  -
s  3  V     -0.75  -708.40  707.64   2826.13   2826.13      0.00  -
s  4  A     -1.80  -708.40  706.60   3532.72   3532.72      0.00  -
s  5  D     -6.47  -708.40  701.92   4234.65   4234.65      0.00  -
s  6  K     -3.53  -708.40  704.87   4939.52   4939.52      0.00  -
s  7  A     -3.40  -708.40  705.00   5644.51   5644.51      0.00  -
s  8  D     -6.47  -708.40  701.92   6346.43   6346.43      0.00  -
s  9  N     -5.22  -708.40  703.18   7049.61   7049.61      0.00  -
s 10  A     -1.87  -708.40  706.52   7756.14   7756.14      0.00  -
s 11  F     -3.91  -708.40  704.49   8460.63   8460.63      0.00  -
s 12  M     -3.76  -708.40  704.63   9165.26   9165.26      0.00  -
s 13  M     -3.76  -708.40  704.63   9869.89   9869.89      0.00  -
s 14  I     -2.06  -708.40  706.34  10576.23  10576.23      0.00  -
s 15  C     -4.54  -708.40  703.86  11280.08  10573.56  10573.56  -
s 16  T     -2.71  -708.40  705.69  11985.77  10573.81  10573.81  -
s 17  A     -2.48  -708.40  705.91  12691.68  10573.20  10573.81  -
s 18  L     -4.01  -708.40  704.38  13396.07  10569.94  10573.81  -
s 19  V     -1.29  -708.40  707.11  14103.18  10570.45  10573.81  -
s 20  L     -0.59  -708.40  707.81  14810.99  10576.34  10576.34  -
s 21  F     -1.12  -708.40  707.28  15518.26  10578.75  10578.75  +
s 22  M     -3.76  -708.40  704.63  16222.90  10578.39  10578.75  +
s 23  T     -3.12  -708.40  705.27  16928.17  10581.74  10581.74  +
s 24  I     -0.87  -708.40  707.52  17635.69  10586.08  10586.08  +
s 25  P     -0.51  -708.40  707.89  18343.58  10587.44  10587.44  +
s 26  G     -2.25  -708.40  706.15  19049.73  10589.11  10589.11  +
s 27  I     -1.49  -708.40  706.91  19756.64  10591.38  10591.38  +
s 28  A     -1.54  -708.40  706.85  20463.50  10593.61  10593.61  +
s 29  L     -4.01  -708.40  704.38  21167.88  10591.65  10593.61  +
s 30  F     -1.92  -708.40  706.48  21874.36  10594.27  10594.27  +
s 31  Y     -6.07  -708.40  702.33  22576.69  10590.91  10594.27  +
s 32  G     -2.25  -708.40  706.15  23282.84  10591.15  10594.27  +
s 33  G     -4.38  -708.40  704.02  23986.86  10590.79  10594.27  +
s 34  L     -1.54  -708.40  706.85  24693.71  10590.53  10594.27  +
s 35  I     -2.06  -708.40  706.34  25400.05  10589.06  10594.27  +
s 36  R     -2.75  -708.40  705.65  26105.70  10587.43  10594.27  +
s 37  G     -2.25  -708.40  706.15  26811.85  10588.95  10594.27  +
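The numeric columns of the trace appear to be related in a simple way. This is inferred from the printed values, not taken from ePST's source code: `odds` matches `pos - neg`, `tot` is the running sum of `odds`, `win` is the sum of the last 15 `odds` values (for example, 11280.08 - 706.52 = 10573.56 at position 15), and `maxwin` is the running maximum of `win`, reported only from position 15 onward. A minimal sketch under those assumptions, with a hypothetical function name and the window length of 15 treated as a guess:

```python
def window_scores(odds, window=15):
    """Recompute the tot / win / maxwin columns from a list of odds values.

    Assumes the relationships inferred from the printed trace; the window
    length and the start of maxwin tracking are guesses, not ePST facts.
    """
    tot, win, maxwin = [], [], []
    running = 0.0
    best = 0.0
    for i, value in enumerate(odds):
        running += value
        tot.append(running)
        # Sum of the last `window` odds values, via a difference of running sums.
        w = running - (tot[i - window] if i >= window else 0.0)
        win.append(w)
        # In the trace, maxwin stays at 0.00 until position `window`.
        if i == window:
            best = w
        elif i > window:
            best = max(best, w)
        maxwin.append(best)
    return tot, win, maxwin


# The odds column for positions 0..16, copied from the trace.
ODDS = [706.52, 705.44, 706.52, 707.64, 706.60, 701.92, 704.87, 705.00,
        701.92, 703.18, 706.52, 704.49, 704.63, 704.63, 706.34, 703.86,
        705.69]

tot, win, maxwin = window_scores(ODDS)
for i in (14, 15, 16):
    print(i, round(tot[i], 2), round(win[i], 2), round(maxwin[i], 2))
```

Because the printed odds are rounded to two decimals, the recomputed sums can differ from the trace by a few hundredths. The rule that turns `win` into the `+`/`-` region flag is not recoverable from the slide, so the sketch leaves it out.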
ePST Execution Flow
• Inputs: Training Set, Testing Set
• ePST
• ePST Prediction
• Post-processing Scripts
• Output:
TM#  Start  End
  1     12   24
  2     50   61
  3    101  112
  4    130  142
  5    163  166
  6    168  175
  7    199  201
  8    203  211
  9    228  240
 10    260  271
 11    287  297
 12    315  333
 13    353  365
Total # segments predicted by ePST = 13
Scanning PDB
• Training: DMTMR40672
• Testing: PDB
• Threshold 705.37 → Nrtm = 1665 chains
• PDB_TM retrieves 1673 chains
• Validation is necessary: there is no ground truth
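The two chain counts are close (1665 vs. 1673), but the slide does not say how many chains the two sets share. One way to quantify the agreement is a plain set comparison, sketched below. The function name is hypothetical and the chain identifiers in the example are toy values, not the actual scan results:

```python
def compare_chain_sets(predicted, reference):
    """Summarise the overlap between two collections of chain identifiers."""
    predicted, reference = set(predicted), set(reference)
    shared = predicted & reference
    union = predicted | reference
    return {
        "shared": len(shared),
        "only_predicted": len(predicted - reference),
        "only_reference": len(reference - predicted),
        # Jaccard index: 1.0 means identical sets, 0.0 means disjoint sets.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Toy identifiers for illustration only.
epst_chains = {"1abc_A", "1abc_B", "2xyz_A", "3def_C"}
pdbtm_chains = {"1abc_A", "2xyz_A", "3def_C", "4ghi_A", "5jkl_B"}

summary = compare_chain_sets(epst_chains, pdbtm_chains)
print(summary)
```

Similar totals do not imply similar membership, so the `only_predicted` and `only_reference` counts are the ones worth inspecting by hand.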
TMH Benchmark
• tmeval.fasta: 2247 non-annotated sequences
• Script for converting ePST output to TMH submit format
• Comparison with other predictors
• 4 tables
• 8 evaluation parameters
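The conversion script itself is not shown, and the benchmark's submission format is not specified on the slide. As a rough sketch of what such a conversion involves, the code below reads an ePST segment table (`TM# Start End` rows) and expands it into a per-residue label string. The function names, the `H`/`-` labels, and the assumption that coordinates are 1-based and inclusive are all placeholders to adapt to the real format:

```python
def parse_segments(text):
    """Extract (start, end) pairs from an ePST 'TM# Start End' table."""
    segments = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows are exactly three integers; header and total lines are skipped.
        if len(parts) == 3 and all(p.isdigit() for p in parts):
            _, start, end = (int(p) for p in parts)
            segments.append((start, end))
    return segments


def to_label_string(segments, length, tm="H", other="-"):
    """Expand segments into one label per residue.

    Assumes 1-based, inclusive coordinates (an assumption, not an ePST fact).
    """
    labels = [other] * length
    for start, end in segments:
        for pos in range(start, min(end, length) + 1):
            labels[pos - 1] = tm
    return "".join(labels)


EPST_OUTPUT = """TM# Start End
1 12 24
2 50 61
3 101 112
Total # ePST segments = 3
"""

segments = parse_segments(EPST_OUTPUT)
labels = to_label_string(segments, length=120)
print(segments)
print(labels)
```

A per-residue string is a convenient intermediate form because per-residue and per-segment evaluation measures can both be computed from it.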
Conclusions
• ePST is a competitive predictor
• Fast training
• Scales well, in contrast to HMMs
• Unlike HMMs, ePST does not get stuck in poor local minima
• ePST does not require a multiple sequence alignment (MSA) of the sequences
• ePST accepts more than one test sequence at a time
Future Work
• More tuning; use pruning
• Apply ePST to other tasks, e.g. phosphorylation, which is involved in signal transduction pathways
• Search for a verified data set for training and testing (there is no consensus in the literature)
• Extract features from the sequence
• Analyze the false negatives with particular helix topologies (such as 1orq)