Localization prediction of transmembrane proteins Stefan Maetschke, Mikael Bodén and Marcus Gallagher The University of Queensland
Protein classes
Protein
  Soluble
  Membrane
    Peripheral
    Anchored
    Integral
      Transmembrane
        β-barrel
        α-helical
          Single-spanning
          Multi-spanning
Maetschke et al, The University of Queensland
Transmembrane protein types
[Figure: Type-I, Type-II, Type-III and Type-IV (multi-spanning) transmembrane proteins, showing the position of the N and C termini relative to the cytosol (inside) and the signal peptide]
Eukaryotic cell
[Figure labels: Nucleus, RNA, Ribosome, Endoplasmic Reticulum, ERGIC, Golgi Complex, Endosome, Lysosome, Peroxisome, Mitochondrion]
Secretory and endocytic pathway
Problem and hypothesis
• Sorting signals for transmembrane proteins serve multiple purposes (targeting, retention, retrieval, avoidance) and are largely unknown (the problem is challenging/multi-faceted)
• Current localization prediction of eukaryotic transmembrane proteins is poor (models based on soluble proteins are ill-suited) (previous work is inadequate/incomplete)
• Localization prediction for transmembrane proteins is virtually unexplored (paucity/variance of data) (it is an open problem)
• Explicit modelling of protein topology should enhance localization prediction accuracy (parameter tuning receives explicit guidance to biologically sensible solutions) (the way to do it!)
Hidden Markov model
• States: S1, S2, S3
• Initial state probabilities
• State transition probabilities: a11, a12, a22, a23, a33
• Observation probabilities: b1, b2, b3, each defined over the 20 amino acids (A = 1, R = 2, ..., V = 20)
• Observation sequence
• State sequence: s1 s1 s1 s2 s2 s2 s2 s2 s2 s3
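To make the roles of the initial, transition and observation probabilities concrete, here is a minimal Python sketch of the forward algorithm, which computes the log-likelihood of an observation sequence under such a model. The three-state left-to-right layout follows the slide; every probability value and the three-letter toy alphabet are made-up assumptions, and the slides do not say which algorithm or implementation the authors used.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """log(p), with log(0) mapped to -infinity instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) via the forward algorithm.

    start[i]        = P(first state is i)
    trans[i][j]     = P(state j follows state i)
    emit[i][symbol] = P(symbol | state i)
    """
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i].get(obs[0], 0.0))
             for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j].get(symbol, 0.0))
            for j in range(n)
        ]
    return log_sum_exp(alpha)


# Toy left-to-right model S1 -> S2 -> S3 (all numbers are illustrative only).
START = [1.0, 0.0, 0.0]
TRANS = [[0.7, 0.3, 0.0],   # a11, a12
         [0.0, 0.8, 0.2],   # a22, a23
         [0.0, 0.0, 1.0]]   # a33
EMIT = [{"A": 0.5, "R": 0.3, "V": 0.2},   # b1
        {"A": 0.1, "R": 0.1, "V": 0.8},   # b2
        {"A": 0.3, "R": 0.6, "V": 0.1}]   # b3

if __name__ == "__main__":
    print(forward_log_likelihood(list("AAVVR"), START, TRANS, EMIT))
```

Working in log space avoids numerical underflow on sequences of realistic protein length.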
Second-order hidden Markov model
• States: S1, S2, S3
• Initial state probabilities
• State transition probabilities: a11, a12, a22, a23, a33
• Observation probabilities: b1, b2, b3, each defined over the 400 di-peptides (AA = 1, AR = 2, AN = 3, AD = 4, ..., VV = 400)
• Observation sequence
• State sequence: s1 s1 s1 s2 s2 s2 s2 s2 s2 s3
Third-order hidden Markov model
• States: S1, S2, S3
• Initial state probabilities
• State transition probabilities: a11, a12, a22, a23, a33
• Observation probabilities: b1, b2, b3, each defined over the 8000 tri-peptides (AAA = 1, AAR = 2, AAN = 3, AAD = 4, AAC = 5, AAQ = 6, ..., VVV = 8000)
• Observation sequence
• State sequence: s1 s1 s1 s2 s2 s2 s2 s2 s2 s3
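The symbol numbering on the higher-order slides (AR = 2, VV = 400, AAQ = 6, VVV = 8000) is consistent with treating each k-mer as a base-20 number. The sketch below reproduces that numbering in Python. Two things are assumptions: the full amino acid order (the slides only show A, R, N, D, C, Q, ..., V, which matches the common ordering by three-letter code), and the use of overlapping k-mers, which the slides do not specify.

```python
# Assumed full order; the slides show only the beginning (A, R, N, D, C, Q) and the end (V).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
AA_TO_DIGIT = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def kmer_index(kmer):
    """Return the 1-based index of a k-mer in the alphabet of 20**k symbols."""
    index = 0
    for aa in kmer:
        index = index * 20 + AA_TO_DIGIT[aa]
    return index + 1


def encode(sequence, k):
    """Encode a protein sequence as the indices of its overlapping k-mers."""
    return [kmer_index(sequence[i:i + k])
            for i in range(len(sequence) - k + 1)]


if __name__ == "__main__":
    print(kmer_index("AR"), kmer_index("VV"))     # di-peptide symbols
    print(kmer_index("AAQ"), kmer_index("VVV"))   # tri-peptide symbols
    print(encode("ARNDC", 2))
```

The alphabet grows as 20, 400, 8000 with the order, so each extra order needs far more training data to estimate the observation probabilities reliably.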
Signal peptide
[Figure: N-terminal region, hydrophobic core and cleavage region, followed by the mature protein]
Transmembrane domain
[Figure: icap, TMD, ocap]
Protein topology model
[Figure: model regions SP, N-term, ocap, TMD, icap, C-term, arranged between outside and inside]
Localization model (5 x topology models)
[Figure labels: Nucleus, Endoplasmic Reticulum, ERGIC, Golgi Complex, Endosome, Lysosome, Peroxisome, Mitochondrion]
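One common way to turn several class-specific models into a classifier is to score the sequence with each model and pick the highest-scoring class. The Python sketch below shows that decision rule only. It is an assumption that the five topology models are combined this way; the slides do not state the combination rule. The scorers are arbitrary toy stand-ins (hypothetical, not trained models), and the location codes PM, ER, GO, EN, LY are taken from the performance slide.

```python
def predict_location(sequence, scorers):
    """Score a sequence with one model per location and return the best one.

    scorers maps a location code to a function sequence -> score
    (for an HMM this would typically be a log-likelihood).
    """
    scores = {location: score(sequence) for location, score in scorers.items()}
    best = max(scores, key=scores.get)
    return best, scores


def make_toy_scorer(favoured):
    """Hypothetical stand-in for a trained model: +1 per favoured residue, -1 otherwise."""
    return lambda seq: sum(1.0 if aa in favoured else -1.0 for aa in seq)


# Arbitrary toy scorers, for illustration only; they carry no biological meaning.
TOY_SCORERS = {
    "PM": make_toy_scorer("L"),
    "ER": make_toy_scorer("K"),
    "GO": make_toy_scorer("G"),
    "EN": make_toy_scorer("E"),
    "LY": make_toy_scorer("Y"),
}

if __name__ == "__main__":
    location, scores = predict_location("LLLK", TOY_SCORERS)
    print(location, scores)
```

In practice each toy scorer would be replaced by the likelihood function of a model trained on proteins from that location.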
LOCATE dataset
Subset of the LOCATE database
• FANTOM3, mouse proteome
• Filtered for transmembrane proteins
• No multi-targeted proteins
• Redundancy reduced (<25%)
• TMDs and SPs are labeled (predicted)
• High-quality localization annotation
Proteins per location:
873 Plasma Membrane
261 Endoplasmic Reticulum
141 Golgi Complex
45 Lysosome
31 Endosome
1351 Total
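The class counts are strongly imbalanced, which matters when reading any accuracy figure. As a quick arithmetic check, the Python sketch below computes the share of each location and the accuracy of a trivial predictor that always answers with the largest class. The counts are the ones on the slide; the function names are illustrative only.

```python
LOCATE_COUNTS = {
    "Plasma Membrane": 873,
    "Endoplasmic Reticulum": 261,
    "Golgi Complex": 141,
    "Lysosome": 45,
    "Endosome": 31,
}


def class_fractions(counts):
    """Fraction of the dataset that belongs to each location."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def majority_baseline(counts):
    """Accuracy of always predicting the most frequent location."""
    return max(counts.values()) / sum(counts.values())


if __name__ == "__main__":
    for name, fraction in class_fractions(LOCATE_COUNTS).items():
        print(f"{name:22s} {fraction:6.1%}")
    print(f"majority baseline      {majority_baseline(LOCATE_COUNTS):6.1%}")
```

Always predicting Plasma Membrane would be right for roughly 65% of the proteins, so raw accuracy alone can look good for a trivial predictor on this dataset.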
Prediction performance
[Figures: prediction performance (MCC); confusion matrix for HMM-2]
• LOCATE dataset
• Mean correlation coefficient
• 10-fold cross-validation, repeated 10 times
• Five locations (ER, PM, GO, EN, LY)
• SVM: linear kernel
• First-, second- and third-order HMMs
=> Di-peptide composition superior to single amino acid composition
=> Topological model superior to non-topological model
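A correlation coefficient can be derived directly from a confusion matrix. The Python sketch below assumes that the reported measure is the Matthews correlation coefficient computed one-vs-rest per location and then averaged over locations; the slides do not spell out the exact definition, so treat this as one plausible reading. The example matrices are made-up toy values, not results from the slides.

```python
import math


def per_class_mcc(confusion, k):
    """One-vs-rest Matthews correlation coefficient for class k.

    confusion[i][j] = number of proteins of true class i predicted as class j.
    Returns 0.0 when the coefficient is undefined (zero denominator).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(confusion[i][k] for i in range(n)) - tp
    tn = total - tp - fn - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mean_mcc(confusion):
    """Average of the per-class coefficients over all classes."""
    n = len(confusion)
    return sum(per_class_mcc(confusion, k) for k in range(n)) / n


if __name__ == "__main__":
    toy = [[8, 1, 1],    # toy counts, for illustration only
           [2, 6, 2],
           [0, 3, 7]]
    print(mean_mcc(toy))
```

Unlike plain accuracy, this measure drops to zero for a predictor that assigns every protein to the same class.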
Predictor comparison
[Table: prediction accuracy in %]
• Test set (20 PM, 20 ER, 20 Golgi)
• HMM: only three classes but test set train set
• Other predictors: more classes but test set train set → difficult to compare!
CELLO 2.5: http://cello.life.nctu.edu.tw/
WolfPSort: http://wolfpsort.seq.cbrc.jp/
ProteomeAnalyst 2.5: http://www.cs.ualberta.ca/~bioinfo/PA/Sub/
HMM-2: http://pprowler.itee.uq.edu.au/TMPHMMLoc
Conclusion
• Novel predictor for subcellular localization of transmembrane proteins along the secretory pathway: http://pprowler.itee.uq.edu.au/TMPHMMLoc
• Protein model has fewer states than topology predictors (TMHMM, HMMTOP, etc.) but is of second order
• Localization model is trained and tested using LOCATE, a recent, high-quality localization dataset
• Overall better performance than current localization predictors (transmembrane proteins, eukaryotic, secretory pathway)
• Di-peptide composition superior to single amino acid composition
• "Topological" model superior to "non-topological" baseline model