TMpro: Transmembrane Helix Prediction using Amino Acid Properties and Latent Semantic Analysis
Madhavi Ganapathiraju, N. Balakrishnan, Raj Reddy and Judith Klein-Seetharaman
Carnegie Mellon University
6th International Conference on Bioinformatics, Hong Kong, PR China, August 29th, 2007
Outline
• Introduction
  • Membrane proteins
  • Transmembrane helix prediction
• Previous methods
  • Drawbacks
• Amino acid properties
• Approach
• Algorithm
  • Features and models
  • Evaluations
• Web server
Introduction Previous Methods Properties Approach Algorithm Web Server
Membrane Proteins
• Embedded in the cell or organelle membrane
• An important class of proteins
• Carry out many important functions
• Provide access to the cell for drug targeting
[Figure: a membrane protein embedded in the cell membrane, shown next to a soluble protein]
Transmembrane Segment Characteristics
[Figure, side view: a helix spanning the 30 Å hydrophobic core of the membrane, between the cytoplasm (aqueous medium) and the extracellular side (aqueous medium)]
• A helix has to be about 19 residues long to cross from one side to the other
• Questions to be addressed by a prediction algorithm:
  • How many transmembrane segments are there?
  • Where are the transmembrane segments located in the primary sequence?
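The 19-residue figure is consistent with the standard rise of an α-helix, about 1.5 Å per residue along the helix axis (a textbook value, not stated on the slide):

```latex
d \approx n \times 1.5\ \text{\AA}, \qquad 19 \times 1.5\ \text{\AA} = 28.5\ \text{\AA} \approx 30\ \text{\AA}
```

So roughly 19 to 20 residues are needed for a helix to span the 30 Å hydrophobic core.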
Transmembrane Helix Prediction
• Important for:
  • protein family
  • structure and function
  • regions accessible from the extracellular side
• Challenges:
  • Little available training data
  • Overtraining
  • Difficulty in discovering novel architectures
Hydrophobicity Scales
• Scales: Kyte-Doolittle (KD), Goldman-Engelman-Steitz (GES), Wimley-White (WW), …
• Average hydrophobicity over a 9-residue window
• Limitations: unclear segment boundaries and low accuracy
[Figure: Kyte-Doolittle hydrophobicity profile]
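A minimal sketch of the window-averaging idea, using the published Kyte-Doolittle values and the 9-residue window from the slide. The function name `hydrophobicity_profile` is hypothetical, and this is only the classical profile, not part of TMpro.

```python
# Kyte-Doolittle hydrophobicity scale (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(seq, window=9):
    """Mean KD hydrophobicity of every full window; entry i covers seq[i:i + window]."""
    values = [KD[aa] for aa in seq.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        # Slide the window one residue: add the new residue, drop the oldest one.
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


profile = hydrophobicity_profile("VQLAHHFSEPEITLIIFGVMAGVIGTILLISYGIRRLIKK")
```

Peaks in the profile suggest hydrophobic stretches, but turning them into segments still requires choosing a threshold and cut points, which is where the boundary errors noted above come from.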
Hidden Markov Model Methods (TMHMM)
• Current best methods use HMMs
• Limitations: too many parameters and restrictive topology
[Figure: actual vs. predicted transmembrane segments for a potassium channel]
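TMHMM uses a much richer state architecture than anything shown here. The toy two-state model below only illustrates how an HMM labels each residue through Viterbi decoding; the states, all probabilities and the hydrophobic residue set are hypothetical.

```python
import math

# Toy two-state HMM: "T" = transmembrane, "N" = non-transmembrane.
# All probabilities and the residue grouping are hypothetical, for illustration only.
STATES = ("T", "N")
START = {"T": 0.5, "N": 0.5}
TRANS = {"T": {"T": 0.95, "N": 0.05}, "N": {"T": 0.05, "N": 0.95}}
EMIT = {"T": {"h": 0.8, "p": 0.2}, "N": {"h": 0.3, "p": 0.7}}
HYDROPHOBIC = set("AILMFVWC")  # assumed "h" residues; all others emit "p"


def viterbi(seq):
    """Most probable T/N state path for seq under the toy model (log-space Viterbi)."""
    obs = ["h" if aa in HYDROPHOBIC else "p" for aa in seq.upper()]
    if not obs:
        return ""
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    backpointers = []
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s.
            r = max(STATES, key=lambda prev: score[prev] + math.log(TRANS[prev][s]))
            pointers[s] = r
            new_score[s] = score[r] + math.log(TRANS[r][s]) + math.log(EMIT[s][o])
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))
```

Even this toy has ten probabilities to set; a realistic topology model multiplies the states, which is the "too many parameters" concern above.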
TMpro: property-based algorithm for transmembrane helix prediction
Opportunities for Improvement: Amino Acid Properties
• Previous methods:
  • Do not employ all possible property distributions
  • Find average occurrences of amino acids
[Figure: distributions of nonpolar, aromatic and charged residues]
Properties We Studied
Modified Representation of Primary Sequence
• Amino acid property sequences:
  • Charge
  • Polarity
  • Aromaticity
  • Size
  • Electronic properties
Predictive Capability of Each Property
• Adjust the parameters of TMHMM (v1.0) so that it emits one of the property values instead of an amino acid
• Properties considered:
  • Polarity: polar, nonpolar
  • Aromaticity: aromatic, aliphatic, neutral
  • Electronic properties: strong donor, weak donor, neutral, weak acceptor, strong acceptor
• 3-valued property observations achieve 91% of the accuracy obtained with 20-valued amino acid observations
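Reducing the 20-letter amino acid alphabet to a few property values amounts to a per-residue lookup. In this sketch the residue groupings and single-character codes are assumptions, not TMpro's published tables, and `encode`, `AROMATICITY` and `POLARITY` are hypothetical names. With these groups, the first 15 residues of the example sequence on the Documents and Words slide encode to `-.-.RRR....-.--` for aromaticity, which matches one of the annotation strings shown there.

```python
# Assumed groupings: (residues, code) pairs checked in order; unmatched residues get the default.
AROMATICITY = [("FWYH", "R"), ("ILV", "-")]  # aromatic "R", aliphatic "-", neutral "."
POLARITY = [("GAVLIMFWP", "-")]              # nonpolar "-", everything else polar "p"


def encode(seq, groups, default):
    """Map each residue to the code of the first group containing it, else `default`."""
    out = []
    for aa in seq.upper():
        for members, code in groups:
            if aa in members:
                out.append(code)
                break
        else:
            out.append(default)
    return "".join(out)


window = "VQLAHHFSEPEITLI"
aromaticity = encode(window, AROMATICITY, ".")  # 3-valued observation sequence
polarity = encode(window, POLARITY, "p")        # 2-valued observation sequence
```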
Approach: Biology-Language Analogy
• Biology: multiple genome sequences → expression, folding, structure, function and activity of proteins → understanding complex biological systems
• Language: raw text stored in databases, libraries, websites → meaning of words, sentences, phrases, paragraphs → knowledge about a topic
• Processes shown: mapping, extraction, decoding, retrieval, summarization, translation
• Ganapathiraju, et al. (2004) LNCS 3345
Text Domain Equivalent: Documents and Words
• Words: property values
• Documents: 15-residue windows
• Example sequence and property annotation strings:
VQLAHHFSEPEITLIIFGVMAGVIGTILLISYGIRRLIKK
----ppn-n-n----
-p--pp-p----p--
-.-.RRR....-.--
OOO.OOO.O.OOoOO
• W1: positively charged
• W2: polar
• W3: nonpolar
• W4: aromatic
• W5: aliphatic
• W6: strong electron acceptor
• W7: strong electron donor
• W8: weak electron acceptor
• W9: weak electron donor
• W10: medium sized
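A minimal sketch of turning a sequence into documents and words, assuming only five of the ten words above with hypothetical residue groupings (the electronic-property and size words are left out because their groupings are not given here). Each document is a 15-residue window and each entry counts the residues in that window with the property; using raw counts is itself an assumption, since TMpro may weight or normalize them. `WORDS`, `windows` and `word_document_matrix` are hypothetical names.

```python
# Five of the ten property words, with assumed residue groupings.
WORDS = {
    "positively charged": set("KRH"),
    "polar": set("STNQCYHKRDE"),
    "nonpolar": set("GAVLIMFWP"),
    "aromatic": set("FWYH"),
    "aliphatic": set("ILV"),
}


def windows(seq, size=15, step=1):
    """All full-length windows (documents) of the sequence."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


def word_document_matrix(seq, size=15):
    """Rows = words, columns = documents; entry = residues in the window having that property."""
    docs = windows(seq.upper(), size)
    return [[sum(aa in members for aa in doc) for doc in docs] for members in WORDS.values()]


W = word_document_matrix("VQLAHHFSEPEITLIIFGVMAGVIGTILLISYGIRRLIKK")
```

The 40-residue example yields 26 overlapping documents, so `W` is a 5 × 26 count matrix.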
Latent Semantic Analysis
• Build the word-document matrix W (rows: words, columns: documents)
• Decompose it by singular value decomposition: W = U S V^T
• For classification, the columns of S V^T can be used as feature vectors
• Reduced dimensions: 4
• Distinct features of TM and non-TM segments are achieved
[Figure: documents plotted on dimension 1 vs. dimension 2]
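Because W = U S V^T implies U^T W = S V^T, the k-dimensional feature vector of a document is just its column of W projected onto the top-k left singular vectors. This pure-Python sketch finds those vectors as eigenvectors of W W^T by power iteration with deflation, a swapped-in method chosen only to avoid dependencies; a real implementation would call a library SVD such as `numpy.linalg.svd`. The helper names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Plain-list matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def top_eigenvectors(m, k, iters=500, seed=0):
    """Top-k eigenvectors of a symmetric positive semi-definite matrix,
    found by power iteration with deflation."""
    rng = random.Random(seed)
    n = len(m)
    m = [row[:] for row in m]  # work on a copy; it is deflated after each vector
    vecs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining matrix is (numerically) zero
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))
        vecs.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next eigenvector.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vecs


def lsa_features(w, k):
    """k-dimensional LSA feature vector for each document (column of w).
    U_k^T W equals the first k rows of S V^T; U's columns are eigenvectors of W W^T."""
    u_k = top_eigenvectors(matmul(w, transpose(w)), k)  # k x words
    return transpose(matmul(u_k, w))                    # documents x k
```

With a word-document count matrix and k = 4, as on the slide, `lsa_features(W, 4)` gives one 4-dimensional vector per window. Singular vectors are only defined up to sign, so a classifier trained on these features should see them computed consistently.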
Different Classifiers/Models
• Support vector machines
• Neural networks
• Linear classifier
• Hidden Markov modeling
• Decision trees
• The neural network with LSA features is called TMpro
Evaluations
• Benchmark server results: http://cubic.bioc.columbia.edu/services/tmh_benchmark/
  • Chart note: "Uses evolutionary information and many more model parameters"
• Evaluation on larger datasets
TMpro Web Interface
• http://linzer.blm.cs.cmu.edu/tmpro/
• Novel features for manual annotation
Acknowledgements
• Co-authors: Judith Klein-Seetharaman, Raj Reddy, N. Balakrishnan
• Website development: Christopher Jon Jursa, Hassan A. Karimi
Larger training data does not improve TMHMM
• STMHMM is TMHMM trained with 145 recent TM proteins
Performance on Recent Large Dataset