Progress in Transmembrane Protein Research
12 Month Report
Tim Nugent
Assignment of PROSITE motifs to topological regions
• We explored whether motifs from the PROSITE database could be used as constraints in subsequent topology prediction steps, by identifying a bias in their inside/outside frequency.
[Figure labels: Extracellular, Cytoplasm]
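A minimal sketch of how such a bias could be quantified, assuming motif matches have already been counted in inside and outside loops of proteins with known topology. The function names, the even-split null hypothesis, and the counts in the usage line are illustrative assumptions, not taken from the report.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties survive floating-point noise.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))

def motif_bias(inside, outside):
    """Summarise the inside/outside bias of one motif.
    Assumption: under no bias, a match is equally likely on either side."""
    n = inside + outside
    if n == 0:
        return None  # motif never seen in a loop region
    return {
        "inside_fraction": inside / n,
        "p_value": binomial_two_sided_p(inside, n),
    }

# Toy counts (hypothetical): a motif seen 0 times inside, 10 times outside.
print(motif_bias(0, 10))
```

A motif with a strongly skewed fraction and a small p-value would be a candidate constraint; in practice the null probability would probably need adjusting for the relative amount of inside and outside loop sequence in the dataset.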
CLN3 Topology Prediction
• Model is in agreement with all published experimental data.
• Potential amphipathic helix.
• Bias in hydrophobic/polar residue placement.
• Two arginine residues in close proximity: possible anion channel?
Using Support Vector Machines for Topology Prediction
• Earlier approaches relied on physicochemical properties such as hydrophobicity to identify transmembrane helices (e.g. Kyte-Doolittle).
• More recently, advanced methods using machine learning algorithms such as hidden Markov models (e.g. TMHMM, PHOBIUS) and neural networks (e.g. MEMSAT3) have been developed.
• They have achieved significant improvements in prediction accuracy (~80%).
• However, none of the top-scoring methods use SVMs.
• While hidden Markov models and neural networks may have multiple outputs, SVMs are binary classifiers.
• To deal with TM topology prediction, multiple SVMs will have to be combined, e.g.
  • TM helix / Loop
  • Inside Loop / Outside Loop
  • Signal Peptide / TM helix
  • Re-entrant Loop / TM helix
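For context, the hydrophobicity-based baseline mentioned above can be sketched as a sliding-window average over the Kyte-Doolittle hydropathy scale. The 19-residue window and the 1.6 cutoff are the commonly cited defaults for spotting membrane-spanning segments and are assumptions here; the function names are illustrative.

```python
# Kyte-Doolittle hydropathy scale (one-letter amino acid codes).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window, indexed by window start."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]

def candidate_tm_windows(seq, window=19, threshold=1.6):
    """Start positions of windows whose mean hydropathy exceeds the cutoff."""
    profile = hydropathy_profile(seq, window)
    return [i for i, h in enumerate(profile) if h > threshold]

# Toy sequence: a poly-leucine stretch flanked by aspartates.
print(candidate_tm_windows("D" * 10 + "L" * 19 + "D" * 10))
```

Sequences containing non-standard residue codes would raise a `KeyError` in this sketch; a real pipeline would need to decide how to score them.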
Helix / Loop SVM Prediction Accuracy
• TM helix / Loop SVM:
  • PSI-BLAST profiles
  • Normalised by Z-score
  • 29 residue sliding window
  • 3rd order polynomial kernel function
• Matthews Correlation Coefficient (MCC) = 0.75
• Precision = 0.86
• Recall = 0.32
• TP = 8384
• FP = 1355
• TN = 17773
• FN = 1969
• Kyte-Doolittle MCC: 0.64
• MEMSAT3 MCC: 0.76
• Overlap of at least 37 sequences between Moller dataset and novel training set.
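The feature construction listed on this slide can be sketched roughly as follows, assuming the profile is an L x 20 matrix of PSI-BLAST scores. Whether the Z-score was taken per column or over the whole profile, how termini were padded, and the kernel constants `gamma` and `coef0` are not stated in the report, so the choices below are assumptions.

```python
from statistics import mean, pstdev

def zscore_columns(profile):
    """Z-score each column of an L x C profile given as a list of rows.
    Assumption: normalisation is per column."""
    cols = list(zip(*profile))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant column: avoid /0
    return [
        [(v - m) / s for v, m, s in zip(row, mus, sds)]
        for row in profile
    ]

def window_features(profile, window=29):
    """One flattened vector per residue, centred on that residue.
    Assumption: positions beyond either terminus are zero-padded."""
    half = window // 2
    pad = [0.0] * len(profile[0])
    feats = []
    for i in range(len(profile)):
        vec = []
        for j in range(i - half, i + half + 1):
            vec.extend(profile[j] if 0 <= j < len(profile) else pad)
        feats.append(vec)
    return feats

def poly_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree
```

With a 29-residue window over 20 profile columns, each residue is described by 580 features, and a 3rd order kernel lets the SVM weigh products of up to three of them.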
Inside Loop / Outside Loop SVM Prediction Accuracy
• Inside Loop / Outside Loop SVM:
  • 27 residue sliding window
• Matthews Correlation Coefficient (MCC) = 0.60
• Precision = 0.78
• Recall = 0.50
• TP = 4060
• FP = 1028
• TN = 4081
• FN = 1007
• Signal Peptide / TM Helix and Re-entrant Loop / TM Helix SVMs are currently in training.
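For reference, the standard per-residue definitions of these metrics can be computed directly from the confusion-matrix counts. This is a sketch using the textbook formulas; the function name is illustrative.

```python
from math import sqrt

def binary_metrics(tp, fp, tn, fn):
    """Precision, recall and Matthews correlation coefficient
    from confusion-matrix counts (standard definitions)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return precision, recall, mcc

# Inside/outside loop counts from this slide.
precision, recall, mcc = binary_metrics(tp=4060, fp=1028, tn=4081, fn=1007)
print(f"precision={precision:.2f} recall={recall:.2f} mcc={mcc:.2f}")
```

With these counts the sketch gives an MCC of about 0.60, as reported, and precision and recall both close to 0.80, so the reported precision and recall may have been computed on a different basis than these standard formulas.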
Further work
• Expand training set: ~45 sequences to add.
• Additional sequences where the TM helices are known but the topology is not can be used to train the Helix/Loop classifier.
• Parameter optimisation:
  • Window size
  • Kernel type
• Signal peptide SVM.
• Re-entrant loop SVM.
• Combine SVM raw scores/probabilities into a topology.
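One simple way the last step could work is sketched below, purely as an illustration and not as the method planned in the report. It assumes per-residue helix and inside-loop probabilities are already available, thresholds the helix scores into segments, and then picks whichever alternating inside/outside labelling of the loops agrees best with the inside-loop scores. The threshold, the minimum helix length and all names are hypothetical.

```python
def find_helices(p_helix, threshold=0.5, min_len=15):
    """Half-open (start, end) spans of runs scoring >= threshold."""
    spans, start = [], None
    # A trailing 0.0 sentinel closes a run that reaches the C-terminus.
    for i, p in enumerate(list(p_helix) + [0.0]):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            if i - start >= min_len:
                spans.append((start, i))
            start = None
    return spans

def assign_topology(p_helix, p_inside, threshold=0.5, min_len=15):
    """Label residues 'M' (helix), 'i' (inside loop) or 'o' (outside loop)."""
    helices = find_helices(p_helix, threshold, min_len)
    # Loops are the gaps before, between and after the helices.
    bounds = [0] + [b for span in helices for b in span] + [len(p_helix)]
    loops = [(bounds[k], bounds[k + 1]) for k in range(0, len(bounds), 2)]

    def score(first_inside):
        # Sum of per-residue agreement when loops alternate sides.
        total, inside = 0.0, first_inside
        for s, e in loops:
            total += sum(p if inside else 1.0 - p for p in p_inside[s:e])
            inside = not inside
        return total

    inside = score(True) >= score(False)
    labels = ["M"] * len(p_helix)
    for s, e in loops:
        labels[s:e] = ("i" if inside else "o") * (e - s)
        inside = not inside
    return "".join(labels)

# Toy scores (hypothetical): one helix between an inside and an outside loop.
p_helix = [0.1] * 5 + [0.9] * 20 + [0.1] * 5
p_inside = [0.9] * 5 + [0.5] * 20 + [0.1] * 5
print(assign_topology(p_helix, p_inside))
```

Hard thresholding makes this sensitive to noisy scores; a dynamic-programming search over the raw scores would likely be more robust, and signal peptide and re-entrant loop scores would need additional states.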