Prediction of protein localization and membrane protein topology

Gunnar von Heijne, Department of Biochemistry and Biophysics, Stockholm Bioinformatics Center, Stockholm University (www.sbc.su.se)


Presentation Transcript


  1. Prediction of protein localization and membrane protein topology. Gunnar von Heijne, Department of Biochemistry and Biophysics, Stockholm Bioinformatics Center, Stockholm University

  2. Stockholm Bioinformatics Center (www.sbc.su.se)

  3. Protein localization

  4. Protein sorting in a eukaryotic cell

  5. The ’canonical’ signal peptide
     - n-region: positively charged
     - h-region: hydrophobic
     - c-region: more polar, small residues in positions -1 and -3

  6. mTPs are rich in R & K and can form amphiphilic helices (Abe et al., Cell 100:551). Figure: mTP bound to Tom20

  7. Typical chloroplast transit peptide

  8. A simple artificial neural network (ANN). Figure labels: input layer, output layer

  9. Artificial neural networks: a summary
     - a high-quality dataset (positive and negative examples)
     - an ANN architecture (can be optimized)
     - all internal parameters in the ANN are systematically optimized during a training session
     - evaluate the predictive performance using cross-validation
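The last point of the summary above, evaluation by cross-validation, can be illustrated with a minimal sketch of a k-fold split. The function name `k_fold_splits` and the choice of 10 examples and 5 folds are assumptions made for illustration; the slides do not say how the folds were built for ChloroP or TargetP.

```python
def k_fold_splits(n_examples, k):
    """Yield (train_idx, test_idx) pairs so that each example is tested exactly once."""
    indices = list(range(n_examples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical dataset of 10 examples, split into 5 folds.
splits = list(k_fold_splits(10, 5))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

In practice the network would be retrained on each `train_idx` and scored on the matching `test_idx`, and the scores averaged over the folds.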

  10. ChloroP (Prot. Sci. 8:978)

  11. TargetP: a four-state SP/mTP/cTP/other predictor (JMB 300:1105)

  12. TargetP sensitivity/specificity

             sens   spec
      SP     .91    .96
      mTP    .82    .90
      cTP    .85    .69
      other  .85    .78

      sens = tp/(tp+fn), spec = tp/(tp+fp)
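The two formulas on slide 12 can be written out as a short sketch. Note that the slide defines specificity as tp/(tp+fp), and the code follows that definition. The counts below are hypothetical, chosen only so that the result rounds to the SP row of the table; they are not the actual TargetP test-set counts.

```python
def sensitivity(tp, fn):
    """Fraction of true members of a class that were predicted as that class."""
    return tp / (tp + fn)


def specificity(tp, fp):
    """Specificity as defined on the slide: tp / (tp + fp)."""
    return tp / (tp + fp)


# Hypothetical counts for one class (not taken from the TargetP paper).
tp, fn, fp = 91, 9, 4
sens = sensitivity(tp, fn)
spec = specificity(tp, fp)
print(f"sens = {sens:.2f}, spec = {spec:.2f}")
```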

  13. Other ways to predict localization
      - amino acid composition
      - sequence homology
      - domain structure
      - phylogenetic profiles
      - expression profiles

  14. Popular prediction programs (www.cbs.dtu.dk)
      - SignalP (NN, HMM)
      - ChloroP
      - TargetP
      - LipoP
      -------
      - MitoProt
      - PSORT

  15. Membrane protein topology

  16. A simulated lipid bilayer (Grubmüller et al.)

  17. Only two basic structures: helix bundle and β-barrel (Quart. Rev. Biophys. 32:285)

  18. Most MPs are synthesized at the ER

  19. The basic model (courtesy Bill Skach)

  20. Topology prediction

  21. TM helix lengths are typically 20-30 residues (Bowie, JMB 272:780)

  22. Trp & Tyr are enriched in the region near the lipid headgroups (Prot. Sci. 6:808; 7:2026)

  23. Loops tend to be short (Tusnady & Simon, JMB 283:489)

  24. The ’positive inside’ rule (EMBO J. 5:3021; EJB 174:671, 205:1207; FEBS Lett. 282:41)

      Membrane            in (% KR)   out (% KR)
      Bacterial IM        16          4
      Eukaryotic PM       17          7
      Thylakoid membrane  13          5
      Mitochondrial IM    10          3
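The K+R percentages on slide 24 suggest a simple calculation: measure the fraction of Lys and Arg in the loops on each side of the membrane and call the richer side "inside". The sketch below assumes that reading; the loop sequences and the names `side_a` and `side_b` are invented for illustration and do not come from any real protein.

```python
def kr_fraction(segments):
    """Fraction of residues that are Lys (K) or Arg (R) across a list of loop sequences."""
    residues = "".join(segments)
    if not residues:
        return 0.0
    return sum(residues.count(aa) for aa in "KR") / len(residues)


# Hypothetical loops, grouped by the side of the membrane they face.
side_a = ["MKRLS", "GRKAT"]
side_b = ["SDGTKA", "QPELSG"]

# Positive-inside rule: the side with the higher K+R fraction is predicted to be inside.
predicted_inside = "side_a" if kr_fraction(side_a) >= kr_fraction(side_b) else "side_b"
print(kr_fraction(side_a), kr_fraction(side_b), predicted_inside)
```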

  25. The positive-inside rule applies to all organisms (Nilsson, Persson & von Heijne, submitted). Figure axes: amino acid, number of genomes

  26. Topology can be manipulated (Nature 341:456). Lep constructs expressed in E. coli. Figure labels: 0+, 0+, 4+; PK; 2+, 2+, 10+

  27. Topology prediction: a classical problem in bioinformatics

  28. Three important characteristics
      - ~20 hydrophobic residues
      - Trp, Tyr
      - ’Positive inside’ rule

  29. Popular topology predictors
      - TMHMM (HMM)
      - HMMTOP (HMM)
      - TopPred (h-plot + PI-rule)
      - MEMSAT (dynamic programming)
      - TMAP (h-plot, mult. alignment)
      - PHD (NN, mult. alignment)

  30. TopPred (JMB 225:487)
      - construct all possible topologies
      - rank based on Δ+
      Example: E. coli LacY
      http://bioweb.pasteur.fr/seqanal/interfaces/toppred.html
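The two TopPred steps on slide 30 can be sketched as follows. This is a simplified illustration, not the published algorithm. It assumes that the ranking quantity is the difference in K+R count between the two sides of the membrane, that candidate helices are flagged as certain or putative, and that an excluded putative helix is merged into the surrounding loop. The input format and the example sequences are hypothetical.

```python
from itertools import product


def count_pos(seq):
    """Number of positively charged residues (K, R) in a sequence."""
    return sum(seq.count(aa) for aa in "KR")


def enumerate_topologies(loops, helices):
    """loops: n+1 loop sequences; helices: n (sequence, is_certain) tuples.

    Returns (abs_charge_bias, kept_helix_indices, n_terminus_side) tuples,
    sorted with the largest charge bias first.
    """
    putative = [i for i, (_, certain) in enumerate(helices) if not certain]
    results = []
    for choice in product([True, False], repeat=len(putative)):
        included = {i: True for i in range(len(helices))}
        included.update(dict(zip(putative, choice)))
        merged = [loops[0]]
        for i, (helix_seq, _) in enumerate(helices):
            if included[i]:
                merged.append(loops[i + 1])  # helix kept: next loop is on the other side
            else:
                merged[-1] += helix_seq + loops[i + 1]  # helix dropped: extend current loop
        charges = [count_pos(loop) for loop in merged]
        # Even-numbered loops share a side with the N-terminus; odd-numbered loops face the other side.
        bias = sum(charges[0::2]) - sum(charges[1::2])
        kept = [i for i in range(len(helices)) if included[i]]
        results.append((abs(bias), kept, "in" if bias >= 0 else "out"))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


# Hypothetical protein: three candidate helices, the middle one only putative.
loops = ["MKK", "SDG", "RRK", "TE"]
helices = [("L" * 20, True), ("A" * 20, False), ("I" * 20, True)]
results = enumerate_topologies(loops, helices)
for bias, kept, n_term in results:
    print(f"bias={bias} helices={kept} N-terminus {n_term}")
```

Here the three-helix topology with the N-terminus inside ranks first because it places all five positively charged loop residues on one side.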

  31. TMHMM (Sonnhammer et al., ISMB 6:175; Krogh et al., JMB 305:567). A hidden Markov model-based method. www.cbs.dtu.dk, www.sbc.su.se

  32. HMMTOP (Tusnady & Simon, JMB 283:489)

  33. Helix & loop models in TMHMM

  34. TMHMM performance (Krogh et al., JMB 305:567; Melén et al. JMB 327:735)
      - Discrimination globular/membrane: sens & spec > 98%
      - Correct topology: 55-60%
      - Single TM identification: sensitivity 96%, specificity 98%
      - Training set: 160 membrane proteins, 650 globular proteins

  35. Can performance be improved?
      - Consensus predictions
      - Multiple alignments
      - Experimental constraints

  36. ’Consensus’ predictions indicate reliability (FEBS Lett. 486:267)
      - 60 E. coli proteins
      - 5 prediction methods used
      - 46% of 764 predicted E. coli IM proteins are in the 5/0 or 4/1 classes
      Figure axes: fraction correct/coverage vs. majority level
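The majority levels on slide 36 (5/0, 4/1 and so on) can be computed with a simple vote count. The sketch below assumes each method's prediction has been reduced to one comparable topology label; the label strings are hypothetical, and the cited paper may compare topologies differently.

```python
from collections import Counter


def consensus_class(predictions):
    """Return the most common topology and the majority level as 'votes/others'."""
    counts = Counter(predictions)
    topology, votes = counts.most_common(1)[0]
    return topology, f"{votes}/{len(predictions) - votes}"


# Hypothetical output of five prediction methods for one protein.
predictions = ["6TM N-in", "6TM N-in", "6TM N-in", "6TM N-in", "5TM N-in"]
topology, level = consensus_class(predictions)
print(topology, level)
```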

  37. TMHMM reliability scores (Melén et al. JMB 327:735)

      Sequence:  M     C     Y     G     K     C     I
      p(i):      0.78  0.78  0.78  0.76  0.76  0.08  0.03
      p(h):      0.00  0.00  0.02  0.02  0.15  0.85  0.93
      p(o):      0.22  0.22  0.20  0.20  0.08  0.07  0.04
      Label:     i     i     i     i     i     h     h

      TMHMM output:
      1. Mean probability pmean
      2. Minimum probability pmin(label)
      3. PbestPath/PallPaths
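The first two scores on slide 37 can be worked through with the slide's own numbers. This sketch assumes that pmean and pmin are the mean and the minimum, over all positions, of the posterior probability of the predicted label. The third score needs path probabilities from the HMM itself and is not reproduced here.

```python
# Posterior probabilities per residue, copied from the slide.
posteriors = {
    "i": [0.78, 0.78, 0.78, 0.76, 0.76, 0.08, 0.03],
    "h": [0.00, 0.00, 0.02, 0.02, 0.15, 0.85, 0.93],
    "o": [0.22, 0.22, 0.20, 0.20, 0.08, 0.07, 0.04],
}
labels = "iiiiihh"  # predicted label for each residue, also from the slide

# Probability assigned to the predicted label at each position.
label_probs = [posteriors[label][pos] for pos, label in enumerate(labels)]

p_mean = sum(label_probs) / len(label_probs)  # score 1 (assumed definition)
p_min = min(label_probs)                      # score 2 (assumed definition)
print(f"pmean = {p_mean:.3f}, pmin = {p_min:.2f}")
```

For this seven-residue fragment the weakest position is in the inside loop (0.76), so pmin flags the loop rather than the helix as the least certain part of the prediction.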

  38. TMHMM (score 3): prediction accuracy vs. coverage, 92 bacterial proteins. Figure axes: percent correct vs. coverage; annotations: ~45%, ~70%

  39. ”Experimentally known topologies” is a biased sample. Figure axes: percent vs. score interval (0-0.25, 0.25-0.5, 0.5-0.75, 0.75-1)

  40. Correlation between accuracy and TMHMM S3 score. Figure axes: percent correct vs. mean score

  41. Expected TMHMM performance on proteomes. Figure axes: percent correct vs. coverage; series: test set, C. elegans, E. coli, S. cerevisiae

  42. Experimental information helps (JMB 327:735). Figure panels: original TMHMM prediction, one TM helix missing; TMHMM prediction with C-terminus fixed to inside

  43. Experimental information helps (JMB 327:735). When the location of the C-terminus is known, the correct topology is predicted for an estimated ~70% of all membrane proteins (~55% when not known)
