
TMpro: Transmembrane Helix Prediction using Amino Acid Properties and Latent Semantic Analysis

TMpro: Transmembrane Helix Prediction using Amino Acid Properties and Latent Semantic Analysis. Madhavi Ganapathiraju, N. Balakrishnan, Raj Reddy and Judith Klein-Seetharaman Carnegie Mellon University.


Presentation Transcript


  1. TMpro: Transmembrane Helix Prediction using Amino Acid Properties and Latent Semantic Analysis. Madhavi Ganapathiraju, N. Balakrishnan, Raj Reddy and Judith Klein-Seetharaman, Carnegie Mellon University. 6th International Conference on Bioinformatics, Hong Kong, PR China, August 29th, 2007

  2. Outline • Introduction • Membrane proteins • Transmembrane helix prediction • Previous methods • Drawbacks • Amino acid properties • Approach • Algorithm • Features and models • Evaluations • Web server Introduction Previous Methods Properties Approach Algorithm Web Server

  3. Membrane Proteins: embedded in the cell / organelle membrane • Important class of proteins • Many important functions carried out by them • Provide access to cell for drug targeting. Figure labels: Membrane Protein, Cell Membrane, Soluble Protein

  4. Transmembrane Segment Characteristics. Figure (side view): Cytoplasm (aqueous medium); transmembrane region with a 30 Å hydrophobic core; Extracellular (aqueous medium). A helix has to be 19 residues long to go from one side to the other. Questions to be addressed by the prediction algorithm: • How many transmembrane segments are there? • Where are the transmembrane locations in the primary sequence?

  5. Transmembrane Helix Prediction • Important for: protein family; structure and function; regions accessible from extracellular side • Challenges: little available training data; overtraining; difficulty in discovery of novel architectures

  6. Hydrophobicity scale. Scales: KD scale, GES scale, WW scale… Figure: Kyte-Doolittle hydrophobicity profile (9-residue window average hydrophobicity). Limitations: segment boundary unclear and low accuracy
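
A windowed hydrophobicity profile of the kind named on this slide can be sketched as follows. This is a minimal illustration, not the presenters' code: the function name `hydropathy_profile` is hypothetical, the values are the published Kyte-Doolittle scale, and the example input reuses the sequence shown on slide 14.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Average KD hydropathy over a sliding window (hypothetical helper).

    Returns one value per window position, so the result has
    len(seq) - window + 1 entries (empty if seq is shorter than window).
    """
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    example = "VQLAHHFSEPEITLIIFGVMAGVIGTILLISYGIRRLIKK"
    profile = hydropathy_profile(example)
    # Each value is assigned to the window's centre residue.
    peak = max(range(len(profile)), key=profile.__getitem__)
    print("most hydrophobic window is centred on residue", peak + 9 // 2 + 1)
```

The profile gives a score per position but no explicit start or end for a segment, which is consistent with the "segment boundary unclear" limitation listed on the slide.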

  7. Current best methods use HMMs: Hidden Markov Model methods (TMHMM). Figure: potassium channel, actual vs. predicted. Limitations: too many parameters and restrictive topology

  8. TMpro: property based algorithm for transmembrane helix prediction

  9. Opportunities for Improvement: amino acid properties. Previous methods: • Do not employ all possible property distributions • Find average occurrences of amino acids. Figure labels: Nonpolar Residues, Aromatic Residues, Charged Residues

  10. Properties We Studied

  11. Modified Representation of Primary Sequence: Amino Acid Property Sequences • Charge • Polarity • Aromaticity • Size • Electronic properties
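
Rewriting a primary sequence as a property sequence can be sketched as below. The slides do not list which residues fall in which class, so the groupings here are common textbook assignments used purely as assumptions, and the one-letter symbols (`p`, `n`, `R`, `a`, `-`) and the helper `property_sequence` are hypothetical.

```python
# Assumed groupings (not taken from the slides):
# charge: K/R positive, D/E negative, everything else neutral.
CHARGE = {"K": "p", "R": "p", "D": "n", "E": "n"}
# aromaticity: F/W/Y aromatic, A/V/L/I aliphatic, everything else neutral.
AROMATICITY = {"F": "R", "W": "R", "Y": "R",
               "A": "a", "V": "a", "L": "a", "I": "a"}


def property_sequence(seq, table, default="-"):
    """Map each residue to its property symbol; unlisted residues get `default`."""
    return "".join(table.get(aa, default) for aa in seq.upper())


if __name__ == "__main__":
    seq = "VQLAHHFSEPEITLIIFGVMAGVIGTILLISYGIRRLIKK"
    print(seq)
    print(property_sequence(seq, CHARGE))
    print(property_sequence(seq, AROMATICITY))
```

Each property yields its own sequence over a much smaller alphabet than the 20 amino acids, which is the representation the following slides build on.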

  12. Predictive Capability of Each Property • Adjust the parameters of TMHMM (v 1.0) to make it emit one of the property values • Properties considered: • Polarity: polar, non-polar • Aromaticity: aromatic, aliphatic, neutral • Electronic properties: strong donor, weak donor, neutral, weak acceptor, strong acceptor. Result: 3-valued property observations achieve 91% of the accuracy of 20-valued amino acid observations

  13. Approach: Biology-Language Analogy (Ganapathiraju, et al (2004) LNCS 3345). Biology: multiple genome sequences; expression, folding, structure, function and activity of proteins; understand complex biological systems. Language: raw text stored in databases, libraries, websites; meaning of words, sentences, phrases, paragraphs; knowledge about a topic. Process labels in the figure: Mapping, Extraction, Decoding, Retrieval, Summarization, Translation

  14. Text Domain Equivalent: Documents and Words. Words: property values. Documents: 15-residue windows. Example sequence: VQLAHHFSEPEITLIIFGVMAGVIGTILLISYGIRRLIKK. Example property sequences: ----ppn-n-n---- | -p--pp-p----p-- | -.-.RRR....-.-- | OOO.OOO.O.OOoOO. Words: • W1: positively charged • W2: polar • W3: nonpolar • W4: aromatic • W5: aliphatic • W6: strong electron acceptor • W7: strong electron donor • W8: weak electron acceptor • W9: weak electron donor • W10: medium sized
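
The "words in documents" idea can be sketched as a count matrix: one row per property word, one column per 15-residue window. This is an illustration under stated assumptions, not the authors' implementation: only five of the ten words are included, the residue groupings are textbook-style guesses rather than the presenters' table, windows are assumed to slide one residue at a time, and `word_document_matrix` is a hypothetical name.

```python
# Assumed residue sets for five of the ten words listed on the slide.
WORDS = {
    "positively charged": set("KR"),
    "polar": set("STNQYCHDEKR"),
    "nonpolar": set("AVLIMFWPG"),
    "aromatic": set("FWY"),
    "aliphatic": set("AVLI"),
}


def word_document_matrix(seq, size=15, step=1):
    """Count each property word in each window of `seq`.

    Returns (vocabulary, documents, matrix) where matrix[i][j] is the
    number of residues in document j that carry word i.
    """
    seq = seq.upper()
    vocab = list(WORDS)
    docs = [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]
    matrix = [[sum(aa in WORDS[w] for aa in doc) for doc in docs]
              for w in vocab]
    return vocab, docs, matrix


if __name__ == "__main__":
    vocab, docs, matrix = word_document_matrix(
        "VQLAHHFSEPEITLIIFGVMAGVIGTILLISYGIRRLIKK")
    for word, row in zip(vocab, matrix):
        print(f"{word:>18}: {row}")
```

A residue can contribute to several words at once (for example, a residue that is both nonpolar and aliphatic under these assumed sets), so rows are not mutually exclusive.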

  15. Latent Semantic Analysis • Build word-document matrix W (axes: Words, Documents) • Singular value decomposition: W = U S V^T • For classification, the feature vectors S V^T can be used • Reduced dimensions: 4. Figure (Dimension 1 vs. Dimension 2): distinct features of TM and non-TM achieved
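
The decomposition W = U S V^T and the reduced S V^T features can be sketched with NumPy as below. It assumes words are rows and documents are columns, uses a random stand-in matrix instead of real counts, and the function name `lsa_features` is hypothetical; only the reduced dimension of 4 comes from the slide.

```python
import numpy as np


def lsa_features(W, k=4):
    """Truncated SVD of a word-document matrix (words x documents).

    Returns (U_k, features) where features = S_k @ V_k^T has one
    k-dimensional column per document.
    """
    U, s, Vt = np.linalg.svd(np.asarray(W, dtype=float), full_matrices=False)
    k = min(k, len(s))
    U_k = U[:, :k]
    features = np.diag(s[:k]) @ Vt[:k, :]
    return U_k, features


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for a 10-word x 26-document count matrix.
    W = rng.integers(0, 16, size=(10, 26))
    U_k, feats = lsa_features(W, k=4)
    print(feats.shape)  # one 4-dimensional feature vector per document
    # A document not seen during the SVD can be projected with U_k^T.
    new_doc = rng.integers(0, 16, size=10)
    print(U_k.T @ new_doc)
```

Because the columns of U are orthonormal, U_k^T W equals S_k V_k^T, so projecting with U_k^T gives training and unseen windows features in the same reduced space.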

  16. Different Classifiers/Models: Support vector machines; Neural networks; Linear classifier; Hidden Markov modeling; Decision trees. The neural network with LSA features is called TMpro

  17. Evaluations. Benchmark server results: http://cubic.bioc.columbia.edu/services/tmh_benchmark/ Annotation: uses evolutionary information and many more model parameters. Evaluation on larger datasets

  18. TMpro Web Interface: http://linzer.blm.cs.cmu.edu/tmpro/ Novel features for manual annotation

  19. Acknowledgements. Co-authors: Judith Klein-Seetharaman, Raj Reddy, N. Balakrishnan. Web-site development: Christopher Jon Jursa, Hassan A. Karimi

  20. Thank you!

  21. Larger training data does not improve TMHMM. STMHMM is TMHMM trained with 145 recent TM proteins

  22. Performance on Recent Large Dataset
