Protein sequence analysis

Protein sequence analysis. Xu Cheng. Knowing what you must about domains, HMMs, profiles, and the Pfam domain collection. Visiting the three most popular sites for finding domains in your protein. Predicting simple physical properties of your sequences.

Presentation Transcript


  1. Protein sequence analysis Xu Cheng

  2. Knowing what you must about domains, HMMs, profiles, and the Pfam domain collection • Visiting the three most popular sites for finding domains in your protein • Predicting simple physical properties of your sequences • Predicting protease digestion patterns • Predicting coiled-coil domains • Predicting post-translational modifications

  3. Predicting the main physico-chemical properties of a protein • ProtParam: physico-chemical parameters of a protein sequence (amino-acid and atomic compositions, isoelectric point, extinction coefficient, etc.)
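
Two of the parameters listed on this slide can be sketched in a few lines of Python. This is an illustration only, not ProtParam's own code: the function names are hypothetical, and the 280 nm contributions (Trp 5500, Tyr 1490, cystine 125 M^-1 cm^-1) are the commonly cited Pace et al. values, assumed here rather than taken from the slides.

```python
from collections import Counter

# Assumed molar extinction contributions at 280 nm (M^-1 cm^-1).
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125


def composition(seq):
    """Return {residue: percentage} for a one-letter protein sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in sorted(counts.items())}


def extinction_coefficient(seq, cystines_formed=True):
    """Estimate the extinction coefficient at 280 nm from Trp, Tyr and Cys counts.

    If cystines_formed is True, every pair of Cys is assumed to form a
    disulfide bridge (cystine); otherwise Cys contributes nothing.
    """
    seq = seq.upper()
    eps = seq.count("W") * EXT_TRP + seq.count("Y") * EXT_TYR
    if cystines_formed:
        eps += (seq.count("C") // 2) * EXT_CYSTINE
    return eps


if __name__ == "__main__":
    demo = "MKWVTFISLLLLFSSAYSRGVFRRDTHKSEIAHRFKDLGE"  # arbitrary demo string
    print(composition(demo))
    print(extinction_coefficient(demo))
```

The estimate ignores the protein's folded environment, so it is best treated as a starting value to compare against what ProtParam reports.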

  4. Digesting a protein in a computer, for example to: • Separate the domains in your protein • Identify potential post-translational modifications by mass spectrometry • Remove a tag protein when you express a fusion protein • Make sure that the protein you’re cloning isn’t sensitive to some endogenous proteases • PeptideCutter is available from the ExPASy Web site at www.expasy.org/tools/#proteome
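
To make the idea of an in silico digestion concrete, here is a minimal sketch for one protease, trypsin. It assumes the simple textbook rule (cut after K or R unless the next residue is P). PeptideCutter supports many enzymes and more refined rules; the function name `tryptic_digest` is hypothetical.

```python
def tryptic_digest(seq):
    """Split a protein sequence into peptides using a simplified trypsin rule.

    Assumed rule: cleave on the C-terminal side of K or R,
    except when the following residue is P.
    """
    seq = seq.upper()
    peptides, start = [], 0
    for i in range(len(seq) - 1):
        if seq[i] in "KR" and seq[i + 1] != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide
    return peptides


if __name__ == "__main__":
    for peptide in tryptic_digest("AAKGGRPLLKM"):
        print(peptide)
```

Joining the returned peptides in order always gives back the input sequence, which is a handy sanity check when adapting the rule to another enzyme.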

  5. Doing Primary Structure Analysis • Hydrophobic regions that could be membrane-spanning segments in proteins that anchor themselves into a membrane • Coiled-coil regions that indicate potential protein-protein interaction • Hydrophilic stretches that could be looping out at the surface of the protein

  6. Sliding windows • The “sliding windows” technique is one of the oldest ways of looking at sequences. The principle is simple: you need a chemical property and a table that assigns a value for that property to each of the 20 amino acids. The property can be any measurable physico-chemical parameter, such as size, polarity, hydrophobicity, or even the propensity of amino acids to be in a specific structural state. The values in the table are the amino acids’ scale values. A window of fixed length is then moved along the sequence one residue at a time, and the scale values inside the window are averaged at each position. Many such tables have been determined experimentally, for almost any characteristic you can think of.
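
The sliding-window principle can be sketched as follows. The scale used is the Kyte-Doolittle hydropathy scale, picked here only as a familiar example; the function name, the default window of 9 and the unweighted average are illustrative assumptions, not the settings of any particular server.

```python
# Kyte-Doolittle hydropathy values, one per amino acid (example scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_window_profile(seq, scale, window=9):
    """Return [(position, mean scale value)] for every full window.

    position is the 1-based index of the residue at the window centre,
    so the first and last window // 2 residues get no value.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    values = [scale[aa] for aa in seq.upper()]
    half = window // 2
    profile = []
    for center in range(half, len(values) - half):
        chunk = values[center - half:center + half + 1]
        profile.append((center + 1, sum(chunk) / window))
    return profile


if __name__ == "__main__":
    for pos, score in sliding_window_profile("MKTLLLTLVVVTIVCLDLGYT", KYTE_DOOLITTLE):
        print(pos, round(score, 2))
```

Swapping in another scale only means passing a different dictionary; a larger window gives a smoother profile at the cost of blurring short features.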

  7. Looking for transmembrane segments • ProtScale uses a sliding-window technique with one of many amino-acid scales. In this example, we use a hydrophobicity scale to identify the hydrophobic segments that characterize transmembrane proteins. ProtScale doesn’t predict anything for you; it returns a hydrophobicity profile and lets you do the interpretation. • TMHMM is a state-of-the-art program that predicts transmembrane segments in your protein. TMHMM also tells you about the portions of your protein that are probably inside the cell and those that are probably outside.
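
Interpreting a hydrophobicity profile by eye can be mimicked with a crude rule. The sketch below assumes the Kyte-Doolittle scale with the window of 19 residues and the cutoff of 1.6 that are often quoted for that scale. It is neither ProtScale's nor TMHMM's method, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values keyed by one-letter code.
KD = dict(zip(
    "ARNDCQEGHILKMFPSTWYV",
    [1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
     3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2],
))


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Return 1-based (start, end) spans covered by strongly hydrophobic windows.

    A window qualifies when its mean hydropathy exceeds the threshold;
    overlapping or adjacent qualifying windows are merged into one span.
    """
    values = [KD[aa] for aa in seq.upper()]
    segments = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        if mean > threshold:
            end = start + window  # exclusive, 0-based
            if segments and start <= segments[-1][1]:
                segments[-1][1] = end
            else:
                segments.append([start, end])
    return [(s + 1, e) for s, e in segments]


if __name__ == "__main__":
    toy = "D" * 20 + "L" * 21 + "D" * 20  # artificial sequence
    print(candidate_tm_segments(toy))
```

Because every qualifying window is reported whole, the spans come out wider than the hydrophobic core itself; they are candidates to inspect, not predictions of membrane topology.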

  8. Looking for coiled-coil regions • Coiled-coil regions are portions of a protein formed by the intertwining of two or three alpha-helices. One reason it’s considered interesting to find coiled-coil regions is that they’re often involved in protein-protein interactions. • Another (less glorious) reason is that these coiled-coil regions can give false matches when you do a database search, so it can be a good thing to filter them out. If you want to predict these regions in your protein of interest, you can use the conveniently named COILS server at EMBnet: www.ch.embnet.org/software/COILS_form.html

  9. Predicting Post-Translational Modifications in Your Protein • These modifications may involve adding sugars, modifying amino acids, or removing pieces of the newly synthesized protein. • They can matter a great deal if you want to clone and express a human protein in bacteria, because your protein may require post-translational modifications to be active, and the bacterium itself may be unable to make them. • http://www.expasy.org/tools/#ptm

  10. Looking for PROSITE patterns • ScanProsite: www.expasy.org/tools/scanprosite/
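
A PROSITE pattern is close to a regular expression, so a small converter shows what a pattern scan does. This sketch covers only the basic syntax (`x`, `[...]`, `{...}`, repeat counts, and `<`/`>` anchors outside brackets). The helper names are hypothetical and it is not how ScanProsite is implemented. The example pattern is the N-glycosylation site, N-{P}-[ST]-{P}.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # {ABC} means "any residue except A, B, C".
        element = element.replace("{", "[^").replace("}", "]")
        # x(2,4) means "2 to 4 residues"; regex uses braces for counts.
        element = element.replace("(", "{").replace(")", "}")
        # Lowercase x means "any residue".
        element = element.replace("x", ".")
        parts.append(prefix + element + suffix)
    return "".join(parts)


def scan(seq, pattern):
    """Return [(1-based start, matched text)], including overlapping hits."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(seq.upper())]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(scan("MNGSANPTQNVTA", "N-{P}-[ST]-{P}."))
```

The lookahead wrapper is there so that overlapping occurrences are all reported, which a plain `finditer` would skip.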

  11. Finding Known Domains in Your Protein • A domain is a portion of a protein that can keep its shape on its own • InterProScan • CD-Search • Motif-Scan

  12. Finding domains with InterProScan • www.ebi.ac.uk/InterProScan/

  13. Finding domains with the CD server • The main advantage of the CD server is that reported hits come with a score that helps you discriminate the good from the spurious matches. • www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi

  14. Finding domains with Motif Scan • Motif Scan includes some domains that have not yet been released officially via InterPro. • myhits.isb-sib.ch/cgi-bin/motif_scan

  15. Epitope prediction • Antibodies are produced by B lymphocytes (B cells) • Antibodies circulate in the blood • They are referred to as “the first line of defense” against infection • Antibodies play a central role in immunity by attaching to pathogens and recruiting effector systems that kill the invader

  16. What is a B cell epitope? • A B cell epitope is the part of an antigen that an antibody binds • Antibodies are developed to bind the epitope with high affinity by using the complementarity-determining regions (CDRs)

  17. Motivations for prediction of B cell epitopes • Prediction of B cell epitopes can potentially guide experimental epitope mapping • Predictions of antigenicity in proteins can be used for selecting subunits in rational vaccine design • Predictions of B cell epitopes may also be valuable for interpretation of results from experiments based on antibody affinity binding such as ELISA, RIA

  18. Computational Rational Vaccine Design

  19. B cell epitopes, linear or discontinuous? • Classified into linear (~10%) and discontinuous (~90%) epitopes • Databases: AntiJen, IEDB, BciPep, Los Alamos HIV database, Protein Data Bank • Large amount of data available for linear epitopes • Few data available for discontinuous epitopes • In general, B cell epitope prediction methods have relatively low performance

  20. Discontinuous B cell epitopes

  21. The binding interactions

  22. B-cell epitope databases • Databases: AntiJen, IEDB, BciPep, Los Alamos HIV database, Protein Data Bank • Large amount of data available for linear epitopes • Few data available for discontinuous epitopes

  23. Sequence-based methods for prediction of linear epitopes • Protein hydrophobicity/hydrophilicity algorithms: Parker; Fauchere; Janin; Kyte and Doolittle; Manavalan; Sweet and Eisenberg; Goldman, Engelman and Steitz (GES); von Heijne • Protein flexibility prediction algorithm: Karplus and Schulz • Protein secondary structure prediction algorithms: GOR II method (Garnier and Robson); Chou and Fasman; Pellequer • Protein “antigenicity” prediction: Hopp and Woods; Welling

  24. Propensity scales: The principle

  25. Propensity scales: The principle

  26. Evaluation of performance • A receiver operating characteristic (ROC) curve is useful for finding a good threshold and for ranking methods
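
As a rough illustration of how a ROC curve is built from per-residue prediction scores, the sketch below sweeps the threshold from high to low and records the false-positive and true-positive rates at each step. The function names and the toy scores are made up for the example; labels use 1 for epitope residues and 0 for non-epitope residues.

```python
def roc_curve(scores, labels):
    """Return ROC points [(false positive rate, true positive rate)].

    Residues with equal scores are added together, so ties produce a
    single diagonal step rather than an arbitrary ordering.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]  # made-up prediction scores
    toy_labels = [1, 1, 1, 0, 0, 0]               # made-up annotations
    curve = roc_curve(toy_scores, toy_labels)
    print(curve)
    print(auc(curve))
```

An area of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is what makes the area convenient for ranking methods against each other on the same data set.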

  27. Turn prediction and B-cell epitopes • Pellequer found that 50% of the epitopes in a data set of 11 proteins were located in turns • Turn propensity scales for each position in the turn were used for epitope prediction.

  28. BepiPred: CBS in-house tool • Parker hydrophilicity scale • Hidden Markov model • Markov model based on linear epitopes extracted from the AntiJen database • Combining the Parker prediction scores with the Markov model gives the final prediction score • Tested on the Pellequer dataset and epitopes in the HIV Los Alamos database • www.cbs.dtu.dk/services/BepiPred

  29. Protean • Several tools integrated • Easy to handle
