Protein sequence analysis • Xu Cheng
Knowing what you must about domains, HMMs, profiles, and the Pfam domain collection • Visiting the three most popular sites for finding domains in your protein • Predicting simple physical properties of your sequences • Predicting protease digestion patterns • Predicting coiled-coil domains • Predicting post-translational modifications
Predicting the main physico-chemical properties of a protein • ProtParam: physico-chemical parameters of a protein sequence (amino-acid and atomic compositions, isoelectric point, extinction coefficient, etc.)
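As an illustration of the kind of arithmetic behind these parameters, the extinction-coefficient estimate is a weighted count of Trp, Tyr and cystine residues. The sketch below is a minimal Python version, assuming the 280 nm values ProtParam documents (5500 M^-1 cm^-1 per Trp, 1490 per Tyr, 125 per cystine) and, by default, that every pair of Cys forms a cystine. It is not ProtParam's code, and the function names are hypothetical.

```python
from collections import Counter

# Molar extinction coefficients at 280 nm in water (M^-1 cm^-1),
# the values ProtParam documents for Trp, Tyr and cystine
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def composition(seq: str) -> dict[str, float]:
    """Percentage of each residue type in the sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in sorted(counts.items())}


def extinction_coefficient(seq: str, cys_paired: bool = True) -> int:
    """Estimate the molar extinction coefficient at 280 nm.

    If cys_paired is True, every two Cys residues are counted as one cystine;
    otherwise cysteines are assumed reduced and contribute nothing.
    """
    seq = seq.upper()
    n_cystine = seq.count("C") // 2 if cys_paired else 0
    return seq.count("W") * EXT_TRP + seq.count("Y") * EXT_TYR + n_cystine * EXT_CYSTINE


if __name__ == "__main__":
    seq = "MWYCC"
    print(composition(seq))
    print(extinction_coefficient(seq))         # 7115
    print(extinction_coefficient(seq, False))  # 6990
```

Reporting both values (all Cys paired, all Cys reduced) mirrors the fact that the true oxidation state of the cysteines is usually unknown from sequence alone.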
Digesting a protein in a computer • Separate the domains in your protein • Identify potential post-translational modifications by mass spectrometry • Remove a tag protein when you express a fusion protein • Make sure that the protein you’re cloning isn’t sensitive to some endogenous proteases • PeptideCutter, available from the ExPASy Web site: www.expasy.org/tools/#proteome
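PeptideCutter applies curated cleavage rules for many enzymes. As a rough, hypothetical stand-in, the sketch below digests a sequence with the classic simplified trypsin rule (cleave after Lys or Arg, but not before Pro). PeptideCutter's own trypsin rules are more detailed, so treat this output only as an approximation; the function name is an assumption.

```python
def trypsin_digest(seq: str) -> list[str]:
    """Cut after K or R unless the next residue is P (simplified trypsin rule)."""
    seq = seq.upper()
    peptides = []
    start = 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    # Whatever follows the last cleavage site is the C-terminal peptide
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


if __name__ == "__main__":
    print(trypsin_digest("AKPGRSTKLMR"))  # ['AKPGR', 'STK', 'LMR']
```

The Lys in "AKP" is skipped because it is followed by Pro, which is exactly the kind of exception that makes real cleavage rules worth looking up in PeptideCutter.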
Doing Primary Structure Analysis • Hydrophobic regions that could be membrane-spanning segments in proteins that anchor themselves into a membrane • Coiled-coil regions that may indicate protein-protein interactions • Hydrophilic stretches that could be loops exposed at the surface of the protein
Sliding windows • The “sliding windows” technique is one of the oldest ways of looking at sequences, and its principle is very simple. What you need is a property and a table of values for it, one for each of the 20 amino acids. This property can be any measurable physico-chemical parameter, such as size, polarity, hydrophobicity, or even the propensity of amino acids to be in a specific structural state. The values in this table are the amino acids’ scale values; many such tables have been determined experimentally, for almost any characteristic you can think of. A window of fixed length slides along the sequence one residue at a time; at each position, the scale values of the residues inside the window are averaged, and the average is assigned to the residue at the center of the window.
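A minimal sketch of the sliding-window computation, assuming the Kyte-Doolittle hydropathy scale as the table of scale values and a default window of 9 residues. Both are illustrative choices: any scale with one value per amino acid can be passed in. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy scale (one value per amino acid)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_window(seq: str, scale: dict[str, float], window: int = 9) -> list[tuple[int, float]]:
    """Return (center position, mean scale value) for every full window.

    Positions are 1-based, as in most profile plots.
    """
    values = [scale[aa] for aa in seq.upper()]
    profile = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        center = start + window // 2 + 1
        profile.append((center, mean))
    return profile


if __name__ == "__main__":
    for pos, value in sliding_window("IIIKKK", KYTE_DOOLITTLE, window=3):
        print(pos, round(value, 2))
```

Letters outside the table (X, B, Z) raise a `KeyError` here; a real tool needs a policy for them, such as skipping or averaging.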
Looking for transmembrane segments • ProtScale uses the sliding-window technique with one of many amino-acid scales. In this example, we use hydrophobicity to identify the clusters of hydrophobic segments that characterize transmembrane proteins. ProtScale doesn’t predict anything for you; it returns a hydrophobicity profile and lets you do the interpretation. • TMHMM is a state-of-the-art program that predicts transmembrane segments in your protein. TMHMM also tells you which portions of your protein are probably inside the cell and which are probably outside.
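When you interpret a hydrophobicity profile yourself, a common rule of thumb from Kyte and Doolittle is that a 19-residue window with an average hydropathy above about 1.6 suggests a membrane-spanning segment. The sketch below applies that heuristic. It is a crude approximation, not TMHMM (which uses a hidden Markov model and also predicts topology), and the window, threshold and function name are assumptions.

```python
# Kyte-Doolittle hydropathy scale
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(seq: str, window: int = 19, threshold: float = 1.6) -> list[tuple[int, int]]:
    """Return 0-based inclusive (start, end) spans covered by windows above threshold."""
    values = [KD[aa] for aa in seq.upper()]
    segments = []
    seg_start, seg_end = None, -1
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        if mean > threshold:
            if seg_start is None:
                seg_start = start
            seg_end = start + window - 1   # extend the current segment
        elif seg_start is not None:
            segments.append((seg_start, seg_end))
            seg_start = None
    if seg_start is not None:              # segment running to the last window
        segments.append((seg_start, seg_end))
    return segments


if __name__ == "__main__":
    print(hydrophobic_segments("K" * 10 + "L" * 25 + "K" * 10))  # [(5, 39)]
```

Note that the reported span spills a few residues into the flanking Lys on each side: window averaging blurs boundaries, which is one reason dedicated predictors such as TMHMM do better.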
Looking for coiled-coil regions • Coiled-coil regions are portions of a protein formed by the intertwining of two or more (most often two or three) alpha-helices. One reason it’s interesting to find coiled-coil regions is that they’re often involved in protein-protein interactions. • Another (less glorious) reason is that coiled-coil regions can give false matches when you do a database search, so it can be a good idea to filter them out. If you want to predict these regions in your protein of interest, you can use the conveniently named COILS server at EMBnet: www.ch.embnet.org/software/COILS_form.html
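Coiled coils are built from heptad repeats (positions a to g), in which positions a and d are usually hydrophobic. COILS scores windows against a position-specific scoring matrix derived from known coiled coils. The sketch below is only a much cruder illustration of the heptad idea: for each of the seven possible frames, it measures what fraction of a and d positions carry a hydrophobic residue. The set of residues counted as hydrophobic and the function name are assumptions.

```python
# Residues counted as hydrophobic core residues (an illustrative choice)
HYDROPHOBIC = set("LIVMF")


def best_heptad_frame(seq: str) -> tuple[int, float]:
    """Return (frame, fraction of a/d positions that are hydrophobic) for the best frame.

    In frame f, position a is every residue i with (i - f) % 7 == 0, and d is (i - f) % 7 == 3.
    """
    seq = seq.upper()
    best_frame, best_score = 0, 0.0
    for frame in range(7):
        core = [aa for i, aa in enumerate(seq) if (i - frame) % 7 in (0, 3)]
        if not core:
            continue
        score = sum(aa in HYDROPHOBIC for aa in core) / len(core)
        if score > best_score:
            best_frame, best_score = frame, score
    return best_frame, best_score


if __name__ == "__main__":
    print(best_heptad_frame("LKELEKK" * 4))  # (0, 1.0)
```

A high fraction over several consecutive heptads is suggestive, not conclusive; COILS also weighs the other heptad positions and reports a probability.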
Predicting Post-Translational Modifications in Your Protein • These modifications may involve adding sugars, modifying amino acids, or removing pieces of the newly synthesized protein. • This may be very important if you want to clone and express a human protein in bacteria: to be active, your protein may require post-translational modifications that the bacterium itself cannot make. • ExPASy post-translational modification tools: http://www.expasy.org/tools/#ptm
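One concrete case: N-linked glycosylation occurs on Asn residues within the sequon N-X-S/T, where X is any residue except Pro, and standard E. coli expression strains do not add these sugars. A minimal sketch that lists candidate sequons is shown below. The sequon is necessary but not sufficient for glycosylation, so hits are only candidates; the function name is hypothetical.

```python
import re

# N-X-S/T with X not Pro; the lookahead lets overlapping sites all be reported
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sites(seq: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


if __name__ == "__main__":
    print(n_glyc_sites("MNGSANPTKNAT"))  # [2, 10]
```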
Looking for PROSITE patterns • ScanProsite: www.expasy.org/tools/scanprosite/
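ScanProsite matches your sequence against PROSITE patterns written in a compact syntax: elements separated by "-", "x" for any residue, "[...]" for allowed residues, "{...}" for forbidden ones, "(n)" or "(n,m)" for repeats, and "<" and ">" for the N- and C-termini. The hypothetical sketch below translates the simple cases of this syntax into a Python regular expression and reports overlapping matches. It does not handle every PROSITE feature (for example, ">" inside brackets), and it is not ScanProsite's implementation.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4)
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":               # any residue
            core = "."
        elif core.startswith("{"):    # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        # Single residues and [...] sets are already valid regex syntax
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def scan(pattern: str, seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every match, overlapping ones included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(seq.upper())]


if __name__ == "__main__":
    # C2H2 zinc-finger pattern
    print(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"))
    # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
```

Wrapping the expression in a lookahead is what allows overlapping matches, which PROSITE-style scanning generally needs.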
Finding Known Domains in Your Protein • A domain is a portion of a protein that can fold and keep its shape on its own • InterProScan • CD-Search • Motif Scan
Finding domains with InterProScan • www.ebi.ac.uk/InterProScan/
Finding domains with the CD server • The main advantage of the CD server is that each reported hit comes with a score (an E-value) that helps you discriminate good matches from spurious ones. • www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi
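The E-value attached to each hit is the number of alignments with at least that score you would expect by chance. Under Karlin-Altschul statistics, a bit score S' corresponds roughly to E = m n 2^(-S'), where m is the query length and n the (effective) database size. The sketch below, with hypothetical function names and an illustrative cut-off of 0.01, computes this relation and filters a list of hits. Real tools apply length corrections, so in practice use the E-values they report rather than recomputing them.

```python
def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least bit_score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def keep_significant(hits: list[tuple[str, float]], cutoff: float = 0.01) -> list[tuple[str, float]]:
    """Keep (domain name, E-value) hits whose E-value is at or below the cut-off."""
    return [hit for hit in hits if hit[1] <= cutoff]


if __name__ == "__main__":
    print(evalue_from_bits(10, 2, 512))  # 1.0
    # Hypothetical hits, for illustration only
    hits = [("domain_A", 3e-25), ("domain_B", 0.8)]
    print(keep_significant(hits))  # [('domain_A', 3e-25)]
```

Each additional bit of score halves the expected number of chance hits, which is why small differences in bit score can separate a convincing domain hit from a spurious one.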
Finding domains with Motif Scan • Motif Scan includes some domains that have not yet been released officially via InterPro. • myhits.isb-sib.ch/cgi-bin/motif_scan
Epitope prediction • Antibodies are produced by B lymphocytes (B cells) • Antibodies circulate in the blood • They are referred to as “the first line of defense” against infection • Antibodies play a central role in immunity by attaching to pathogens and recruiting effector systems that kill the invader
What is a B cell epitope? • A B cell epitope is the part of an antigen that is recognized and bound by an antibody • Antibodies bind the epitope with high affinity through their complementarity-determining regions (CDRs)
Motivations for prediction of B cell epitopes • Prediction of B cell epitopes can potentially guide experimental epitope mapping • Predictions of antigenicity in proteins can be used for selecting subunits in rational vaccine design • Predictions of B cell epitopes may also be valuable for interpreting the results of experiments based on antibody affinity binding, such as ELISA and RIA
B cell epitopes, linear or discontinuous? • B cell epitopes are classified into linear (~10%) and discontinuous (~90%) epitopes • Databases: AntiJen, IEDB, BciPep, Los Alamos HIV database, Protein Data Bank • A large amount of data is available for linear epitopes • Few data are available for discontinuous epitopes • In general, B cell epitope prediction methods have relatively low performance
Sequence-based methods for prediction of linear epitopes • Protein hydrophobicity/hydrophilicity algorithms: Parker; Fauchère; Janin; Kyte and Doolittle; Manavalan; Sweet and Eisenberg; Goldman, Engelman and Steitz (GES); von Heijne • Protein flexibility prediction algorithm: Karplus and Schulz • Protein secondary structure prediction algorithms: GOR II method (Garnier and Robson); Chou and Fasman; Pellequer • Protein “antigenicity” prediction: Hopp and Woods; Welling
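To make one of these methods concrete: Hopp and Woods averaged their hydrophilicity scale over windows of six residues and took the highest point of the profile as lying in or near an antigenic determinant. The sketch below implements that idea, assuming the Hopp-Woods values as listed by ProtScale; the function name is hypothetical, and like the other sequence-based methods above it is only a weak predictor.

```python
# Hopp-Woods hydrophilicity scale
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
    "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
    "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
    "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}


def hopp_woods_peak(seq: str, window: int = 6) -> tuple[int, float]:
    """Return (0-based window start, mean hydrophilicity) of the most hydrophilic window.

    Ties go to the first window; the sequence must be at least `window` residues long.
    """
    values = [HOPP_WOODS[aa] for aa in seq.upper()]
    means = [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]
    best = max(range(len(means)), key=lambda i: means[i])
    return best, means[best]


if __name__ == "__main__":
    print(hopp_woods_peak("LLLLLLDDDEEEKKK"))  # (6, 3.0)
```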
Evaluation of performance • A receiver operating characteristic (ROC) curve is useful for finding a good threshold and for ranking methods
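A minimal sketch of how an ROC curve and its area (AUC) can be computed from per-residue prediction scores and known epitope labels. It sweeps the threshold from the highest score to the lowest, records (false positive rate, true positive rate) pairs, and integrates them with the trapezoidal rule. Function names are hypothetical, and both classes must be present in the labels.

```python
def roc_points(scores: list[float], labels: list[int]) -> list[tuple[float, float]]:
    """ROC points (FPR, TPR), sweeping the threshold from high to low.

    labels: 1 for an epitope residue, 0 for a non-epitope residue.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all residues sharing this score are counted
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    pts = roc_points([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
    print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
    print(auc(pts))  # 0.75
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, so methods can be compared by AUC; a working threshold can then be read off the curve at the point giving an acceptable trade-off between true and false positives.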
Turn prediction and B-cell epitopes • Pellequer found that 50% of the epitopes in a data set of 11 proteins were located in turns • Turn propensity scales for each position in the turn were used for epitope prediction.
BepiPred: CBS in-house tool • Parker hydrophilicity scale • Hidden Markov model • Markov model based on linear epitopes extracted from the AntiJen database • The Parker prediction scores and the Markov model scores are combined into a single prediction score • Tested on the Pellequer dataset and on epitopes in the HIV Los Alamos database • www.cbs.dtu.dk/services/BepiPred
Protean • Several tools integrated • Easy to use