Localization prediction of transmembrane proteins

Stefan Maetschke, Mikael Bodén and Marcus Gallagher, The University of Queensland

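Protein classes (figure: proteins are membrane or soluble; membrane proteins are integral or peripheral; integral proteins are anchored or transmembrane; transmembrane proteins are β-barrel or α-helical; α-helical proteins are multi-spanning or single-spanning)
Maetschke et al, The University of Queensland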

Transmembrane protein types (figure): Type I, Type II, Type III and Type IV (multi-spanning), shown with their N- and C-termini, a signal peptide, and the cytosol (inside).

Eukaryotic cell (figure): peroxisome, nucleus, mitochondrion, RNA, ribosome, endoplasmic reticulum, ER-Golgi intermediate compartment (ERGIC), lysosome, Golgi complex, endosome.

Secretory and endocytic pathway

Problem and hypothesis
• Sorting signals for transmembrane proteins serve multiple purposes (targeting, retention, retrieval, avoidance) and are largely unknown, so the problem is challenging and multi-faceted.
• Current localization prediction for eukaryotic transmembrane proteins is poor, because models built for soluble proteins are ill-suited; previous work is inadequate or incomplete.
• Localization prediction specifically for transmembrane proteins is virtually unexplored, owing to the paucity and variance of data; it is an open problem.
• Explicit modelling of protein topology should improve localization prediction accuracy, because it guides parameter tuning towards biologically sensible solutions; this is the approach taken here.

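Hidden Markov model (figure: three states S1 → S2 → S3 with self-transitions a11, a22, a33, forward transitions a12, a23, and per-state observation probabilities b1, b2, b3 over the 20 amino acids A, R, ..., V)
• Initial state probabilities
• State transition probabilities: a11, a12, a22, a23, a33
• Observation probabilities: b1, b2, b3, one value per amino acid (20 per state)
• State sequence: s1 s1 s1 s2 s2 s2 s2 s2 s2 s3
• Observation sequence: one amino acid emitted at each position of the state sequence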
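
A minimal sketch of how such a model scores a protein, assuming the standard forward algorithm: it sums over every possible state path (such as s1 s1 s1 s2 ... s3) to obtain P(sequence | model). The parameter values are hypothetical; the slides do not give them.

```python
import math

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # the 20 observation symbols A, R, ..., V

def log_forward(seq, init, trans, emit):
    """Return log P(seq | model) for a first-order HMM (forward algorithm).

    init[i]:     initial state probability of state i
    trans[i][j]: transition probability a_ij from state i to state j
    emit[i][c]:  observation probability b_i(c) of amino acid c in state i
    """
    n = len(init)
    # alpha[i] = P(x_1..x_t, s_t = i), rescaled at each step to avoid underflow
    alpha = [init[i] * emit[i][seq[0]] for i in range(n)]
    log_scale = 0.0
    for c in seq[1:]:
        total = sum(alpha)
        log_scale += math.log(total)
        alpha = [a / total for a in alpha]
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][c]
                 for j in range(n)]
    return log_scale + math.log(sum(alpha))

# Hypothetical left-to-right model S1 -> S2 -> S3 (only a11, a12, a22, a23, a33 non-zero)
init = [1.0, 0.0, 0.0]
trans = [[0.9, 0.1, 0.0],
         [0.0, 0.8, 0.2],
         [0.0, 0.0, 1.0]]
uniform = {aa: 1 / 20 for aa in AMINO_ACIDS}
# Hypothetical S2 emissions favouring a few hydrophobic residues (sums to 1)
hydrophobic = {aa: (0.1 if aa in "AILMFVWC" else 0.2 / 12) for aa in AMINO_ACIDS}
emit = [uniform, hydrophobic, uniform]  # b1, b2, b3
```

Calling `log_forward("MKLLVALLFAAG", init, trans, emit)` returns the log-likelihood of that sequence; comparing such scores across models is what a likelihood-based classifier builds on. Rescaling rather than working with raw products keeps long sequences from underflowing to zero.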

Second-order hidden Markov model (figure: the same three-state model S1 → S2 → S3 with transitions a11, a12, a22, a23, a33)
• Initial state probabilities
• State transition probabilities: a11, a12, a22, a23, a33
• Observation probabilities: b1, b2, b3, now indexed by dipeptides AA, AR, AN, AD, ..., VV (400 per state)
• State sequence: s1 s1 s1 s2 s2 s2 s2 s2 s2 s3
• Observation sequence: one amino acid per position, whose probability is conditioned on the preceding residue
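
When every residue carries a state label, the 400 dipeptide-indexed values per state can be estimated by counting. The sketch below assumes maximum-likelihood counting with pseudocounts, reading each entry as P(current residue | state, previous residue); whether the authors trained their model this way is not stated on the slides, and the input format and toy data are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def estimate_order2_emissions(labeled, states, pseudocount=1.0):
    """Estimate second-order observation probabilities P(x_t | s_t, x_{t-1}).

    labeled: (sequence, state_path) pairs of equal length (hypothetical format)
    Returns emit[s][(prev, cur)]: 400 dipeptide-indexed values per state, where
    the 20 values sharing the same `prev` sum to 1.
    """
    counts = Counter()
    for seq, path in labeled:
        for t in range(1, len(seq)):
            counts[(path[t], seq[t - 1], seq[t])] += 1
    emit = {}
    for s in states:
        emit[s] = {}
        for prev in AMINO_ACIDS:
            row_total = sum(counts[(s, prev, cur)] for cur in AMINO_ACIDS)
            denom = row_total + 20 * pseudocount
            for cur in AMINO_ACIDS:
                # Pseudocounts keep unseen dipeptides at a small non-zero probability
                emit[s][(prev, cur)] = (counts[(s, prev, cur)] + pseudocount) / denom
    return emit

# Hypothetical training example labeled with the states s1/s2/s3
training = [("MKTLLVALLAG", ["s1"] * 3 + ["s2"] * 6 + ["s3"] * 2)]
emit = estimate_order2_emissions(training, ["s1", "s2", "s3"])
```

The parameter count is the main cost of higher order: 20 values per state become 400, so pseudocounts (or a similar smoothing scheme) matter more as the order grows.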

Third-order hidden Markov model (figure: the same three-state model S1 → S2 → S3 with transitions a11, a12, a22, a23, a33)
• Initial state probabilities
• State transition probabilities: a11, a12, a22, a23, a33
• Observation probabilities: b1, b2, b3, now indexed by tripeptides AAA, AAR, AAN, AAD, AAC, AAQ, ..., VVV (8000 per state)
• State sequence: s1 s1 s1 s2 s2 s2 s2 s2 s2 s3
• Observation sequence: one amino acid per position, whose probability is conditioned on the two preceding residues
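
The numbering on the last two slides (AA=1, ..., VV=400; AAA=1, ..., VVV=8000) is what you get by reading a k-mer as a base-20 number in the amino acid order A, R, N, D, C, Q, .... The same indexing yields k-mer composition vectors; k=2 gives a 400-dimensional dipeptide composition. The slides do not spell out how the SVM's composition features were built, so this is a sketch of one standard construction.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # A, R, N, D, C, Q, ..., V: the order used on the slides
POSITION = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def kmer_index(kmer):
    """1-based index of a k-mer in the slides' enumeration (AA=1 ... VV=400, AAA=1 ... VVV=8000)."""
    idx = 0
    for aa in kmer:
        idx = idx * 20 + POSITION[aa]  # read the k-mer as a base-20 number
    return idx + 1

def kmer_composition(seq, k):
    """Relative frequencies of all 20**k k-mers in seq (k=1: amino acid, k=2: dipeptide composition).

    Assumes seq contains only the 20 standard residues.
    """
    vec = [0.0] * (20 ** k)
    n = len(seq) - k + 1
    for i in range(n):
        vec[kmer_index(seq[i:i + k]) - 1] += 1 / n
    return vec

print(kmer_index("AAQ"), kmer_index("VVV"))  # 6 8000
```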

Signal peptide (figure): N-terminal region, hydrophobic core and cleavage region, followed by the mature protein.

Transmembrane domain (figure): the transmembrane domain (TMD) flanked by an inside cap (icap) and an outside cap (ocap).

Protein topology model (figure): states for the signal peptide (SP), N-terminus (N-term), outside cap (ocap), TMD, inside cap (icap) and C-terminus (C-term), with the outside and inside of the membrane marked.
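
A topology model only admits biologically sensible labelings. The slide does not give the actual state graph, so the following is a simplified stand-in under stated assumptions: hypothetical one-letter labels, caps merged into the neighbouring loops, and a small grammar that rejects inconsistent labelings and reports the orientation of each TMD.

```python
from itertools import groupby

# Hypothetical one-letter labels (not from the slides), with caps folded into loops:
#   S = signal peptide, i = inside (cytosolic) loop, o = outside loop, M = TMD
def segments(topology):
    """Collapse a per-residue label string into (label, start, end) runs, end exclusive."""
    runs, pos = [], 0
    for label, run in groupby(topology):
        length = len(list(run))
        runs.append((label, pos, pos + length))
        pos += length
    return runs

def tmd_orientations(topology):
    """Return 'in->out' or 'out->in' for each TMD, read from N- to C-terminus.

    Raises ValueError if the labels break the simplified grammar assumed here:
    a signal peptide occurs only at the N-terminus and is followed by an
    outside loop, every TMD separates an inside loop from an outside loop,
    and the two loop types never touch without a TMD in between.
    """
    labels = [label for label, _, _ in segments(topology)]
    if not labels:
        return []
    if "S" in labels[1:]:
        raise ValueError("signal peptide must be N-terminal")
    if labels[0] == "S" and labels[1:2] != ["o"]:
        raise ValueError("signal peptide must be followed by an outside loop")
    for a, b in zip(labels, labels[1:]):
        if {a, b} == {"i", "o"}:
            raise ValueError("inside and outside loops must be separated by a TMD")
    orientations = []
    for k, label in enumerate(labels):
        if label != "M":
            continue
        before = labels[k - 1] if k > 0 else None
        after = labels[k + 1] if k + 1 < len(labels) else None
        if (before, after) == ("i", "o"):
            orientations.append("in->out")
        elif (before, after) == ("o", "i"):
            orientations.append("out->in")
        else:
            raise ValueError(f"TMD segment {k} is not flanked by an inside and an outside loop")
    return orientations

print(tmd_orientations("SSSSooooMMMMMMMMiiii"))  # ['out->in']
```

With these labels a Type I protein (cleaved signal peptide, N-terminus outside) reads S...o...M...i and gives `['out->in']`, while a Type II protein (N-terminus in the cytosol) reads i...M...o and gives `['in->out']`.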

Localization model: 5 × topology models, one per location (figure: eukaryotic cell with peroxisome, nucleus, mitochondrion, ERGIC, endoplasmic reticulum, lysosome, Golgi complex and endosome).
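
With one topology model per location, a natural decision rule, assumed here because the slides do not state how the five models are combined, is to score the protein under each model and pick the best, optionally weighting each location by how common it is. The counts are those of the LOCATE subset used in this work; the log-likelihood values are made up for illustration.

```python
import math

# Per-location protein counts in the LOCATE subset used on these slides
COUNTS = {"PM": 873, "ER": 261, "GO": 141, "LY": 45, "EN": 31}

def predict_location(log_likelihoods, counts=COUNTS, use_prior=True):
    """Return the location whose model best explains a sequence.

    log_likelihoods: location -> log P(sequence | topology model of that location),
                     e.g. computed by a forward pass through each model.
    use_prior=True adds log class frequencies (maximum a posteriori decision);
    use_prior=False is a plain maximum-likelihood decision.
    """
    total = sum(counts.values())

    def score(loc):
        prior = math.log(counts[loc] / total) if use_prior else 0.0
        return log_likelihoods[loc] + prior

    return max(log_likelihoods, key=score)

# Hypothetical log-likelihoods for one protein under the five location models
ll = {"PM": -100.0, "ER": -99.0, "GO": -105.0, "LY": -110.0, "EN": -110.0}
print(predict_location(ll, use_prior=False))  # ER
print(predict_location(ll))                   # PM
```

The example shows why the choice matters: a one-unit likelihood advantage for ER is overturned by the much larger prior for the plasma membrane.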

LOCATE dataset
Subset of the LOCATE database:
• FANTOM3, mouse proteome
• Filtered for transmembrane proteins
• No multi-targeted proteins
• Redundancy reduced (<25% sequence identity)
• TMDs and SPs labeled (predicted)
• High-quality localization annotation

Location                Proteins
Plasma membrane              873
Endoplasmic reticulum        261
Golgi complex                141
Lysosome                      45
Endosome                      31
Total                       1351
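
The class counts are strongly imbalanced. A quick calculation of each location's share, and of the accuracy a trivial predictor that always answers "plasma membrane" would reach, shows why a per-class measure such as MCC is more informative here than raw accuracy.

```python
# Counts taken from the LOCATE subset above
COUNTS = {
    "Plasma membrane": 873,
    "Endoplasmic reticulum": 261,
    "Golgi complex": 141,
    "Lysosome": 45,
    "Endosome": 31,
}

def class_shares(counts):
    """Fraction of proteins in each location."""
    total = sum(counts.values())
    return {loc: n / total for loc, n in counts.items()}

shares = class_shares(COUNTS)
for loc, share in shares.items():
    print(f"{loc:22s} {COUNTS[loc]:5d} {100 * share:5.1f}%")
# Accuracy of a trivial predictor that always answers "Plasma membrane"
print(f"Majority baseline: {100 * max(shares.values()):.1f}%")
```

The majority baseline comes out at about 64.6%, so an accuracy figure alone says little about how well the small lysosome and endosome classes are predicted.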

Prediction performance (MCC) (figure: confusion matrix of HMM-2)
• LOCATE dataset
• Mean Matthews correlation coefficient (MCC)
• 10-fold cross-validation, repeated 10 times
• Five locations (ER, PM, GO, EN, LY)
• SVM: linear kernel
• First-, second- and third-order HMMs
=> Dipeptide composition is superior to single amino acid composition
=> The topological model is superior to the non-topological model
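
Assuming, as is common, that the mean MCC is the average of one-vs-rest MCCs computed for each location from a confusion matrix (the slides do not specify the averaging, nor how the 10 × 10 cross-validation runs are pooled), each per-class value is MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

```python
import math

def mcc_per_class(confusion):
    """One-vs-rest Matthews correlation coefficient for each class.

    confusion[i][j]: number of proteins of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        tn = total - tp - fn - fp
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        # Convention: MCC = 0 when any marginal is empty (denominator 0)
        scores.append((tp * tn - fp * fn) / denom if denom else 0.0)
    return scores

def mean_mcc(confusion):
    """Average of the per-class MCCs."""
    scores = mcc_per_class(confusion)
    return sum(scores) / len(scores)
```

A predictor that puts every protein into the largest class scores an MCC of 0 for every location under this convention, however high its accuracy.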

Predictor comparison (figure: prediction accuracy in %)
• Test set: 20 PM, 20 ER and 20 Golgi proteins
• HMM: only three classes, but the test set is not part of its training set
• Other predictors: more classes, but the test set may be part of their training sets → difficult to compare!
CELLO 2.5: http://cello.life.nctu.edu.tw/
WolfPSort: http://wolfpsort.seq.cbrc.jp/
ProteomeAnalyst 2.5: http://www.cs.ualberta.ca/~bioinfo/PA/Sub/
HMM-2: http://pprowler.itee.uq.edu.au/TMPHMMLoc

Conclusion
• Novel predictor for the subcellular localization of transmembrane proteins along the secretory pathway: http://pprowler.itee.uq.edu.au/TMPHMMLoc
• The protein model has fewer states than topology predictors (TMHMM, HMMTOP, etc.) but is of second order
• The localization model is trained and tested on LOCATE, a recent, high-quality localization dataset
• Better overall performance than current localization predictors (for eukaryotic transmembrane proteins along the secretory pathway)
• Dipeptide composition is superior to single amino acid composition
• The "topological" model is superior to the "non-topological" baseline model
