Prediction of protein localization and membrane protein topology

Gunnar von Heijne, Department of Biochemistry and Biophysics, Stockholm Bioinformatics Center, Stockholm University

Stockholm Bioinformatics Center, www.sbc.su.se

Protein localization

Protein sorting in a eukaryotic cell

The 'canonical' signal peptide (SP)
- n-region: positively charged
- h-region: hydrophobic
- c-region: more polar, small residues in positions -1 and -3

Mitochondrial targeting peptides (mTPs) are rich in R & K and can form amphiphilic helices (Abe et al., Cell 100:551)
[Figure: mTP bound to Tom20]

Typical chloroplast transit peptide (cTP)

A simple artificial neural network (ANN)
[Figure: network diagram with input layer and output layer]

Artificial neural networks: a summary
- a high-quality dataset (positive and negative examples)
- an ANN architecture (can be optimized)
- all internal parameters in the ANN are systematically optimized during a training session
- evaluate the predictive performance using cross-validation
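The cross-validation step in the summary above can be sketched as follows. This is a minimal illustration of how a dataset might be split into k folds so that every example is used for testing exactly once; the function name `k_fold_indices` and the interleaved fold assignment are assumptions made for this example, not details taken from the slides.

```python
def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Items are assigned to folds in an interleaved way (0, k, 2k, ... go to
    fold 0, and so on). This assignment is an arbitrary choice for the sketch.
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for test_fold, test in enumerate(folds):
        # Training set: every index that is not in the current test fold
        train = [idx for f, fold in enumerate(folds) if f != test_fold for idx in fold]
        yield train, test


# Hypothetical dataset of 10 examples split into 5 folds
for train, test in k_fold_indices(10, 5):
    print("train:", train, "test:", test)
```

In each round the predictor would be trained on `train` and evaluated on `test`; the reported performance is then the average over the k rounds.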

ChloroP (Prot. Sci. 8:978)

TargetP - a four-state SP/mTP/cTP/other predictor (JMB 300:1105)

TargetP sensitivity/specificity
class   sens   spec
SP      .91    .96
mTP     .82    .90
cTP     .85    .69
other   .85    .78
sens = tp/(tp+fn)
spec = tp/(tp+fp)
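As a small worked example of the two formulas above, the sketch below computes both measures from raw counts. The counts are hypothetical and chosen only for illustration; they are not TargetP results. Note that tp/(tp+fp), which the slide calls specificity, is the quantity that is elsewhere often called precision.

```python
def sensitivity(tp, fn):
    """sens = tp / (tp + fn): fraction of true members of a class that are found."""
    return tp / (tp + fn)


def specificity(tp, fp):
    """spec = tp / (tp + fp), following the definition used on the slide."""
    return tp / (tp + fp)


# Hypothetical counts for one class (not taken from the TargetP paper)
tp, fn, fp = 90, 10, 30
print(f"sens = {sensitivity(tp, fn):.2f}")
print(f"spec = {specificity(tp, fp):.2f}")
```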

Other ways to predict localization
- amino acid composition
- sequence homology
- domain structure
- phylogenetic profiles
- expression profiles

Popular prediction programs
- SignalP (NN, HMM), ChloroP, TargetP, LipoP (www.cbs.dtu.dk)
- MitoProt, PSORT

Membrane protein topology

A simulated lipid bilayer (Grubmüller et al.)

Only two basic structures (Quart. Rev. Biophys. 32:285): helix bundle and β-barrel

Most membrane proteins (MPs) are synthesized at the ER

The basic model (courtesy Bill Skach)

Topology prediction

TM helix lengths are typically 20-30 residues (Bowie, JMB 272:780)

Trp & Tyr are enriched in the region near the lipid headgroups (Prot. Sci. 6:808; 7:2026)

Loops tend to be short (Tusnady & Simon, JMB 283:489)

The 'positive inside' rule (EMBO J. 5:3021; EJB 174:671, 205:1207; FEBS Lett. 282:41)
membrane             in (% KR)   out (% KR)
Bacterial IM         16          4
Eukaryotic PM        17          7
Thylakoid membrane   13          5
Mitochondrial IM     10          3
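The percentages above are fractions of Lys (K) and Arg (R) residues in loops on each side of the membrane. A minimal sketch of how such a fraction might be computed is shown below; the loop sequences and the function names `kr_fraction` and `kr_by_side` are hypothetical and serve only to illustrate the calculation.

```python
def kr_fraction(segment):
    """Fraction of residues in `segment` that are Lys (K) or Arg (R)."""
    if not segment:
        return 0.0
    return sum(1 for aa in segment.upper() if aa in "KR") / len(segment)


def kr_by_side(loops):
    """Pool loop sequences by side ('in' or 'out') and return the K+R fraction per side."""
    pooled = {"in": "", "out": ""}
    for side, seq in loops:
        pooled[side] += seq
    return {side: kr_fraction(seq) for side, seq in pooled.items()}


# Hypothetical loops with assumed locations (made up for this example)
loops = [("in", "MKRKS"), ("out", "GSDNT"), ("in", "RRKAG"), ("out", "SKEDG")]
result = kr_by_side(loops)
print(result)
```

A clear excess of K+R on one side, as in this toy input, is the signal that the 'positive inside' rule exploits.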

The positive-inside rule applies to all organisms (Nilsson, Persson & von Heijne, submitted)
[Figure axes: amino acid; number of genomes]

Topology can be manipulated (Nature 341:456)
[Figure: Lep constructs expressed in E. coli; labels 0+, 0+, 4+ and 2+, 2+, 10+; PK]

Topology prediction - a classical problem in bioinformatics

Three important characteristics
- ~20 hydrophobic residues
- Trp, Tyr
- 'Positive inside' rule
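The first characteristic, a stretch of about 20 hydrophobic residues, is what a hydrophobicity plot (h-plot) looks for. The sketch below is a minimal sliding-window version using the Kyte-Doolittle scale. The choice of scale, the window of 19 residues and the threshold of 1.6 are assumptions made for this illustration; the predictors named in this talk may use other scales and parameters.

```python
# Kyte-Doolittle hydropathy values
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every window of `window` residues along `seq`.

    Unknown residue codes are scored as 0.0 (an arbitrary choice for this sketch).
    """
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """Start positions (0-based) of windows whose mean hydropathy reaches the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, h in enumerate(profile) if h >= threshold]


# Hypothetical sequence: a poly-Leu stretch flanked by charged residues
seq = "KKDE" + "L" * 20 + "DEKK"
print(candidate_tm_windows(seq))
```

A real predictor would merge overlapping windows into helices and then apply the other two characteristics to decide on the orientation.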

Popular topology predictors
- TMHMM (HMM)
- HMMTOP (HMM)
- TopPred (h-plot + PI-rule)
- MEMSAT (dynamic programming)
- TMAP (h-plot, mult. alignment)
- PHD (NN, mult. alignment)

TopPred (JMB 225:487)
- construct all possible topologies
- rank based on Δ+
[Figure: E. coli LacY]
http://bioweb.pasteur.fr/seqanal/interfaces/toppred.html
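The two steps listed for TopPred can be sketched roughly as follows: enumerate every topology obtained by including or excluding each uncertain helix, then rank the topologies by the difference in K+R count between the two sides of the membrane. This is a simplified illustration and not the TopPred implementation; the split into 'certain' and 'putative' helices is taken as given input, all loops are counted in full, and the sequence and helix positions are hypothetical.

```python
from itertools import combinations


def count_kr(segment):
    """Number of Lys (K) and Arg (R) residues in a segment."""
    return sum(1 for aa in segment if aa in "KR")


def loops_between(seq, helices):
    """Return the N-terminal tail, the inter-helix loops and the C-terminal tail."""
    loops, prev_end = [], 0
    for start, end in helices:
        loops.append(seq[prev_end:start])
        prev_end = end
    loops.append(seq[prev_end:])
    return loops


def charge_difference(seq, helices):
    """K+R on the N-terminal side minus K+R on the opposite side.

    Loops alternate sides, so even-numbered loops share the N-terminal side.
    """
    loops = loops_between(seq, helices)
    same_side = sum(count_kr(loop) for loop in loops[0::2])
    other_side = sum(count_kr(loop) for loop in loops[1::2])
    return same_side - other_side


def rank_topologies(seq, certain, putative):
    """Enumerate all helix sets and sort by absolute charge difference, largest first."""
    results = []
    for k in range(len(putative) + 1):
        for subset in combinations(putative, k):
            helices = sorted(certain + list(subset))
            diff = charge_difference(seq, helices)
            n_term = "in" if diff >= 0 else "out"  # positive side is placed inside
            results.append((abs(diff), n_term, helices))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


# Hypothetical sequence with one certain and one putative helix (0-based, end-exclusive)
seq = "MKKRK" + "L" * 20 + "DSDTE" + "A" * 20 + "RRKKS"
ranked = rank_topologies(seq, certain=[(5, 25)], putative=[(30, 50)])
best = ranked[0]
print(best)
```

In this toy input the two-helix topology wins because both tails are rich in K and R and end up on the same side.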

TMHMM (Sonnhammer et al., ISMB 6:175; Krogh et al., JMB 305:567)
A hidden Markov model-based method
www.cbs.dtu.dk
www.sbc.su.se

HMMTOP (Tusnady & Simon, JMB 283:489)

Helix & loop models in TMHMM

TMHMM performance (Krogh et al., JMB 305:567; Melén et al., JMB 327:735)
- Discrimination globular/membrane: sens & spec > 98%
- Correct topology: 55-60%
- Single TM identification: sensitivity 96%, specificity 98%
- Training set: 160 membrane proteins, 650 globular proteins

Can performance be improved?
- Consensus predictions
- Multiple alignments
- Experimental constraints

'Consensus' predictions indicate reliability (FEBS Lett. 486:267)
- 60 E. coli proteins
- 5 prediction methods used
- 46% of 764 predicted E. coli IM proteins are in the 5/0 or 4/1 classes
[Figure: fraction correct and coverage vs. majority level]
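The majority classes mentioned above (5/0, 4/1 and so on) can be derived by counting how many methods agree on the same topology. The sketch below assumes that each method's prediction is summarized as a (number of TM helices, N-terminus location) pair; that representation and the example predictions are hypothetical.

```python
from collections import Counter


def consensus(predictions):
    """Return the most common prediction and its majority class, e.g. '4/1'.

    The class is written as agreeing/disagreeing method counts.
    """
    counts = Counter(predictions)
    top, n_top = counts.most_common(1)[0]
    return top, f"{n_top}/{len(predictions) - n_top}"


# Hypothetical output of five methods for one protein
predictions = [(12, "in"), (12, "in"), (12, "in"), (12, "in"), (11, "in")]
topology, majority = consensus(predictions)
print(topology, majority)
```

Proteins falling in the 5/0 or 4/1 classes would then be treated as the more reliable predictions.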

TMHMM reliability scores (Melén et al., JMB 327:735)
Sequence:  M     C     Y     G     K     C     I
p(i):      0.78  0.78  0.78  0.76  0.76  0.08  0.03
p(h):      0.00  0.00  0.02  0.02  0.15  0.85  0.93
p(o):      0.22  0.22  0.20  0.20  0.08  0.07  0.04
Label:     i     i     i     i     i     h     h
TMHMM output:
1. Mean probability pmean
2. Minimum probability pmin(label)
3. PbestPath/PallPaths
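The first two scores can be illustrated with the seven-residue example above. The sketch assumes that both scores are taken over the posterior probability of the predicted label at each position; that reading of pmean and pmin(label) is an interpretation of the slide, not a confirmed definition. The third score needs the full HMM and is not computed here.

```python
# Posterior probabilities and labels copied from the example on the slide
posteriors = {
    "i": [0.78, 0.78, 0.78, 0.76, 0.76, 0.08, 0.03],
    "h": [0.00, 0.00, 0.02, 0.02, 0.15, 0.85, 0.93],
    "o": [0.22, 0.22, 0.20, 0.20, 0.08, 0.07, 0.04],
}
labels = "iiiiihh"


def label_probabilities(posteriors, labels):
    """Posterior probability of the predicted label at each position."""
    return [posteriors[label][pos] for pos, label in enumerate(labels)]


def p_mean(posteriors, labels):
    """Mean posterior probability of the predicted labels (assumed definition)."""
    probs = label_probabilities(posteriors, labels)
    return sum(probs) / len(probs)


def p_min(posteriors, labels):
    """Lowest posterior probability of any predicted label (assumed definition)."""
    return min(label_probabilities(posteriors, labels))


print(f"pmean = {p_mean(posteriors, labels):.3f}")
print(f"pmin  = {p_min(posteriors, labels):.2f}")
```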

TMHMM (score 3): prediction accuracy vs. coverage, 92 bacterial proteins
[Figure: percent correct vs. coverage; marked values ~45% and ~70%]
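An accuracy-versus-coverage curve of this kind can be built by keeping only the predictions whose reliability score reaches a threshold and measuring how many of those are correct. The sketch below shows one point on such a curve; the scores and outcomes are toy values invented for the example and are not the data behind the slide.

```python
def accuracy_at_threshold(scored, threshold):
    """Return (coverage, accuracy) for predictions with score >= threshold.

    `scored` is a list of (score, is_correct) pairs.
    """
    kept = [is_correct for score, is_correct in scored if score >= threshold]
    coverage = len(kept) / len(scored)
    accuracy = sum(kept) / len(kept) if kept else float("nan")
    return coverage, accuracy


# Toy data: (reliability score, prediction was correct)
scored = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]

for threshold in (0.0, 0.5, 0.85):
    coverage, accuracy = accuracy_at_threshold(scored, threshold)
    print(f"threshold {threshold:.2f}: coverage {coverage:.2f}, accuracy {accuracy:.2f}")
```

Raising the threshold lowers coverage and, if the score is informative, raises the fraction of correct predictions.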

"Experimentally known topologies" is a biased sample
[Figure: percent per score interval (0-0.25, 0.25-0.5, 0.5-0.75, 0.75-1)]

Correlation between accuracy and TMHMM S3 score
[Figure: percent correct vs. mean score]

Expected TMHMM performance on proteomes
[Figure: percent correct vs. coverage for the test set, C. elegans, E. coli and S. cerevisiae]

Experimental information helps (JMB 327:735)
[Figure: original TMHMM prediction, one TM helix missing; TMHMM prediction with C-terminus fixed to inside]

When the location of the C-terminus is known, the correct topology is predicted for an estimated ~70% of all membrane proteins (~55% when not known)
