
Using Support Vector Machines for transmembrane protein topology prediction Tim Nugent


Presentation Transcript


  1. Using Support Vector Machines for transmembrane protein topology prediction. Tim Nugent

  2. Alpha-helical Transmembrane Proteins • Transmembrane proteins fulfil many critical cellular functions. • They comprise about 30% of the human proteome. • They are composed of hydrophobic, membrane-spanning alpha-helices connected by loop regions. • They are poorly represented in structural databases. • Predicting their structure and topology is therefore an important challenge for bioinformatics.

  3. Transmembrane Protein Topology • The topology of a transmembrane protein describes which portions of the amino-acid sequence lie within the plane of the surrounding lipid bilayer and which portions protrude into the watery environment on either side. • It is defined by which regions of the polypeptide chain span the membrane and by the position of the N-terminus (inside or outside).

  4. Identification of Transmembrane Regions • Example sequence (aquaporin fragment): KGVWTQAFWKAVTAEFLAMLIFVLLSVGSTINWGGSEN • To generate data for a plot, the protein sequence is scanned with a moving window of 19-21 residues. • At each position, the mean hydrophobic index of the amino acids within the window is calculated, and that value is plotted at the midpoint of the window.
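The moving-window calculation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the talk: it assumes the Kyte-Doolittle hydropathy scale (the scale named later in the deck) and a window of 19 residues, and the function name `hydropathy_profile` is made up for this example.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return (midpoint_index, mean_hydropathy) for every full window.

    Indices are 0-based. Positions closer than window // 2 to either end
    have no full window and are skipped (an assumption of this sketch).
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    half = window // 2
    profile = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        profile.append((start + half, mean))
    return profile


# Aquaporin fragment shown on the slide.
fragment = "KGVWTQAFWKAVTAEFLAMLIFVLLSVGSTINWGGSEN"
profile = hydropathy_profile(fragment, window=19)
peak_pos, peak_score = max(profile, key=lambda p: p[1])
print(f"windows: {len(profile)}, peak at index {peak_pos}, mean {peak_score:.2f}")
```

Windows whose mean hydropathy is high are candidate membrane-spanning segments; where to set the cut-off is a separate choice that this sketch does not make.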

  5. Discriminating between Inside and Outside Loops • Hydrophobic: Val, Phe, Ile, Leu, Met. • Positive: Lys, Arg, His. • Cytoplasmic loops are enriched in positively charged residues: the 'positive-inside rule' of von Heijne.
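As a rough illustration of how the positive-inside rule can orient a protein once its helices are known, the sketch below counts positively charged residues in the loops on each side of the membrane and places the more positive side in the cytoplasm. The function names, the helix coordinate format, the default of counting only Lys and Arg, and the tie-breaking choice are all assumptions of this example rather than anything stated in the talk.

```python
def count_positive(segment, residues="KR"):
    """Count positively charged residues in a loop segment.

    Defaults to Lys and Arg; pass residues="KRH" to include His,
    which the slide also lists as positive.
    """
    return sum(1 for aa in segment if aa in residues)


def predict_n_terminus(sequence, helices, residues="KR"):
    """Guess whether the N-terminus is 'inside' (cytoplasmic) or 'outside'.

    helices: sorted, non-overlapping (start, end) pairs, 0-based, end exclusive.
    Loops alternate sides of the membrane, so loops 0, 2, 4, ... share the
    side of the N-terminus and loops 1, 3, 5, ... lie on the opposite side.
    """
    loops = []
    previous_end = 0
    for start, end in helices:
        loops.append(sequence[previous_end:start])
        previous_end = end
    loops.append(sequence[previous_end:])

    n_side = sum(count_positive(loop, residues) for loop in loops[0::2])
    other_side = sum(count_positive(loop, residues) for loop in loops[1::2])
    # Ties are resolved as 'inside' here; that choice is arbitrary.
    return "inside" if n_side >= other_side else "outside"


# Toy single-helix example (not a real protein).
toy = "MKKRR" + "L" * 20 + "DDEES"
print(predict_n_terminus(toy, [(5, 25)]))
```

A simple count like this ignores loop length and distance from the membrane, which is one reason the talk moves on to a trained inside/outside classifier.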

  6. Using Evolutionary Information • Example profile rows (20 scores each): -190 -486 -409 -225 -483 223 -414 -327 -229 -389 -83 738 -236 -56 -424 -478 -100 -370 -32768 -40 / -506 218 -282 -521 159 -410 410 155 -513 -225 -311 -354 -163 106 137 50 -100 -325 -32768 -403 • (1) PSI-BLAST takes a single protein sequence as input and compares it to a protein database. • (2) The program constructs a multiple alignment, and then a profile, from any significant local alignments found. • (3) The profile is compared to the protein database, again seeking local alignments. • (4) PSI-BLAST estimates the statistical significance of the local alignments found. • (5) Finally, PSI-BLAST iterates, returning to step (2), an arbitrary number of times or until convergence.

  7. Using Support Vector Machines for Topology prediction • Earlier approaches relied on physicochemical properties such as hydrophobicity to identify transmembrane helices (e.g. Kyte-Doolittle). • More recently, methods using machine learning algorithms such as hidden Markov models (e.g. TMHMM, Phobius) and neural networks (MEMSAT3) have been developed. • They have achieved significant improvements in prediction accuracy (~80%). • However, none of the top-scoring methods use SVMs. • While hidden Markov models and neural networks may have multiple outputs, SVMs are binary classifiers. • To deal with TM topology prediction, multiple SVMs will have to be combined, e.g. • TM helix / Loop • Inside Loop / Outside Loop • Signal Peptide / TM helix • Re-entrant Loop / TM helix

  8. Helix / Loop SVM Prediction Accuracy • TM helix / Loop SVM: • Database of 135 non-redundant protein sequences • Jackknife cross-validation • PSI-BLAST profiles • Normalised by Z-score • 33-residue sliding window • Radial Basis Function kernel: Gamma = 0.09, C = 0.8 • SVM Matthews Correlation Coefficient (MCC) = 0.82 • TP=9129 • FP=1351 • TN=22140 • FN=1320 • Kyte-Doolittle MCC: 0.66 • MEMSAT3 MCC: 0.76
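The Matthews Correlation Coefficient quoted on this slide can be recomputed from the four confusion-matrix counts. The sketch below uses the standard MCC formula; the function name is chosen for this example, and returning 0.0 when the denominator is zero is a common convention rather than something taken from the talk.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Per-residue counts reported for the TM helix / Loop SVM.
mcc = matthews_corrcoef(tp=9129, fp=1351, tn=22140, fn=1320)
print(round(mcc, 2))  # 0.82
```

The result agrees with the reported value of 0.82, so the counts and the coefficient on the slide are consistent with each other.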

  9. Inside Loop/Outside Loop SVM Prediction Accuracy • Inside Loop/Outside Loop SVM • 33-residue sliding window • Matthews Correlation Coefficient = 0.64 • Precision = 0.86 • Recall = 0.59 • Signal Peptide/TM Helix and Re-entrant Loop/TM Helix SVMs in training...
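Both classifiers take a 33-residue sliding window over a Z-score-normalised PSI-BLAST profile as input. The sketch below shows one plausible way to build such per-residue feature vectors in plain Python. The details are assumptions: the talk does not say whether normalisation was done per column or over the whole profile, nor how the sequence ends were handled, so this example normalises each column and pads the ends with zeros. The function names are hypothetical.

```python
import math


def zscore_columns(profile):
    """Z-score each column of a profile (list of equal-length score rows).

    Assumption: per-column normalisation with the population standard
    deviation; constant columns are mapped to 0.0.
    """
    n = len(profile)
    columns = list(zip(*profile))
    means = [sum(col) / n for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / n)
        for col, m in zip(columns, means)
    ]
    return [
        [(x - m) / s if s else 0.0 for x, m, s in zip(row, means, stds)]
        for row in profile
    ]


def window_features(profile, window=33):
    """Return one flattened window of profile rows per residue.

    Assumption: positions beyond the sequence ends are zero-padded, so
    every residue gets a vector of length window * len(profile[0]).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a central residue")
    half = window // 2
    width = len(profile[0])
    padding = [[0.0] * width for _ in range(half)]
    padded = padding + [list(row) for row in profile] + padding
    return [
        [value for row in padded[i:i + window] for value in row]
        for i in range(len(profile))
    ]


# Tiny made-up profile: 5 residues x 2 score columns (real profiles have 20).
toy_profile = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
features = window_features(zscore_columns(toy_profile), window=3)
print(len(features), len(features[0]))
```

With a real 20-column profile and a 33-residue window, each residue would be described by 660 values under these assumptions; each vector would then be labelled (helix or loop, inside or outside) and passed to the SVM.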

  10. SVM Results – Glycerol uptake facilitator

  11. SVM Results – Photosystem II subunit A

  12. SVM Results – Particulate Methane Monooxygenase subunit C

  13. SVM Results – Cytochrome b6f subunit A

  14. Further work • Expand training set. • Additional sequences where the TM helices are known but the topology is not can be used to train the Helix/Loop classifier. • Parameter optimisation. • Window size • Kernel type • Transduction. • Signal peptide SVM. • Re-entrant loop SVM. • Combine SVM raw scores/probabilities into a topology.
