Introduction to Bioinformatics - Tutorial no. 8 Predicting protein structure PSI-BLAST
PHDsec and PSIpred • PHDsec • Rost & Sander, 1993 • Based on sequence family alignments • PSIpred • Jones, 1999 • Based on PSI-BLAST profiles • Both consider long-range interactions
PSIpred Input • Input sequence • Type of analysis
PSIpred Input (2) • Filtering options • Email address • GO!
PSIpred Output
Conf: Confidence (0=low, 9=high)
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
AA: Target sequence

Conf: 988766667637889999877999871289878877049963202468899999997887
Pred: CCCCCCCCCCHHHHHHHHHHHHHHHHHCCCCCCHHHCCCCCHHHCHHHHHHHHHHHHHHH
  AA: MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVDSADNLSEKLEREWDRE
              10        20        30        40        50        60

Conf: 742888731467888768899999999999999987557888998875227887303678
Pred: HHCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCHHHH
  AA: LASKKNPKLINALRRCFFWRFMFYGIFLYLGEVTKAVQPLLLGRIIASYDPDNKEERSIA
              70        80        90       100       110       120

Slide callouts: Confidence level (Conf row), Predicted structure (Pred row)
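The Conf/Pred/AA rows above can also be read programmatically. The following is a minimal Python sketch, assuming the output is available as plain text with `Conf:`, `Pred:` and `AA:` row labels as on the slide. The function names `parse_psipred` and `confident_segments` and the cutoff `min_conf=7` are illustrative choices, not part of PSIpred.

```python
def parse_psipred(text):
    """Concatenate the Conf, Pred and AA rows of a PSIpred-style text output.

    Assumption: data rows contain no internal spaces, which lets us skip
    legend lines such as "Conf: Confidence (0=low, 9=high)".
    """
    rows = {"Conf": "", "Pred": "", "AA": ""}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        label, value = label.strip(), value.strip()
        if sep and label in rows and value and " " not in value:
            rows[label] += value
    return rows["Conf"], rows["Pred"], rows["AA"]


def confident_segments(conf, pred, min_conf=7):
    """Return (start, end, state) runs, 1-based and inclusive, in which every
    residue has the same predicted state and confidence >= min_conf."""
    segments = []
    start = None
    for i, (c, state) in enumerate(zip(conf, pred)):
        ok = int(c) >= min_conf
        # Close the open run if confidence drops or the state changes.
        if start is not None and (not ok or state != pred[start]):
            segments.append((start + 1, i, pred[start]))
            start = None
        if ok and start is None:
            start = i
    if start is not None:
        segments.append((start + 1, len(pred), pred[start]))
    return segments


# Example using the first 20 residues of the slide's output.
example = """Conf: 98876666763788999987
Pred: CCCCCCCCCCHHHHHHHHHH
  AA: MQRSPLEKASVVSKLFFSWT
"""
conf, pred, aa = parse_psipred(example)
for start, end, state in confident_segments(conf, pred):
    print(start, end, state, aa[start - 1:end])
```

Listing only the confident runs makes it easier to see which parts of a prediction deserve trust before comparing it with a solved structure.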
PHDsec Input (1) • Additional output • Output format • Reduce processing • Email address • Type of prediction
PHDsec Input (2) • Type (number) of input sequences • Upload file • Enter sequence • Wait for results?
PHDsec Output (1) • Protein classification • Structure proportions • Amino acid proportions
PHDsec Output (2) • Estimated structure • Confidence level • Structure with high confidence
PSI-BLAST • Position-Specific Iterated BLAST • Extension to BLASTP • Finds more distantly related sequences than a single BLASTP search • Distant relatives often receive insignificant E-values in an ordinary search • Even in distantly related sequences, important domains can be highly conserved • PSI-BLAST gives more weight to those conserved positions
PSI-BLAST Profile • Example alignment (positions 1-6): • 123456 • AMTYQR • CTTYQS • SMTYQA • Aligning closely related sequences reveals areas of conservation. • The scoring matrix becomes position specific. • Each column has its own set of amino acid (a.a.) frequencies. • The score is column specific, based on a.a. frequency. • More frequent a.a. -> higher score. • A new sequence is scored against the new scoring matrix.
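The idea of column-specific frequencies can be illustrated with the three aligned sequences from the slide. This is a simplified Python sketch: it scores a sequence by summing the raw column frequencies of its residues, which captures "more frequent a.a. -> higher score" but is not the scoring PSI-BLAST itself uses. The function names are illustrative.

```python
from collections import Counter

# The three aligned sequences from the slide (positions 1-6).
ALIGNMENT = ["AMTYQR", "CTTYQS", "SMTYQA"]


def column_frequencies(alignment):
    """Return one {amino acid: frequency} dict per alignment column."""
    n = len(alignment)
    return [
        {aa: count / n for aa, count in Counter(column).items()}
        for column in zip(*alignment)
    ]


def profile_score(sequence, frequencies):
    """Sum, over positions, the frequency of the sequence's residue in that column.

    Assumption: an ungapped sequence of the same length as the profile.
    """
    if len(sequence) != len(frequencies):
        raise ValueError("sequence and profile must have the same length")
    return sum(col.get(aa, 0.0) for aa, col in zip(sequence, frequencies))


freqs = column_frequencies(ALIGNMENT)
for position, column in enumerate(freqs, start=1):
    print(position, column)
for candidate in ["AMTYQR", "CTTYQS", "GGGGGG"]:
    print(candidate, round(profile_score(candidate, freqs), 3))
```

Positions 3 to 5 (T, Y, Q) are fully conserved, so a candidate that matches there scores well even if it differs at the variable positions 1, 2 and 6.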
A PSI-BLAST Iteration • Collect all database sequence segments that have been aligned with the query sequence with an E-value below a set threshold (default 0.01). • Construct a position-specific scoring matrix from the collected sequences. Rough idea: • Align all sequences to the query sequence, using it as the template. • Assign weights to the sequences. • Construct the position-specific scoring matrix. • Find sequences that match the profile.
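The iteration loop can be sketched as a toy example in Python. It is deliberately simplified and is not PSI-BLAST: sequences are ungapped and of equal length, all sequences get equal weight, the background frequency is uniform, and a raw score cutoff (`threshold`) stands in for the E-value threshold. All names and numbers are illustrative assumptions.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1 / len(ALPHABET)  # assumption: uniform background frequency


def build_pssm(members, pseudocount=1.0):
    """Log-odds score per column and amino acid, with a simple pseudocount."""
    n = len(members)
    pssm = []
    for column in zip(*members):
        counts = Counter(column)
        pssm.append({
            aa: math.log2(
                (counts[aa] + pseudocount * BACKGROUND) / (n + pseudocount) / BACKGROUND
            )
            for aa in ALPHABET
        })
    return pssm


def score(sequence, pssm):
    """Score an ungapped sequence against the profile."""
    return sum(column[aa] for aa, column in zip(sequence, pssm))


def iterate_search(query, database, threshold, max_rounds=5):
    """Return the newly included sequences of each round until no new hit appears."""
    members = [query]  # the query is the template of the profile
    rounds = []
    for _ in range(max_rounds):
        pssm = build_pssm(members)
        new_hits = [
            seq for seq in database
            if seq not in members and score(seq, pssm) >= threshold
        ]
        if not new_hits:
            break  # converged
        members.extend(new_hits)
        rounds.append(new_hits)
    return rounds


database = ["AMTYQS", "CTTYQS", "GGGGGG"]
print(iterate_search("AMTYQR", database, threshold=10.0))
```

In this toy database the more distant sequence falls below the cutoff when scored against the query alone and is picked up only after the closer hit has been added to the profile, which is the behaviour the iteration is meant to produce.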
Using PSI-BLAST (1) • Available from the main BLAST page, or switched on within BLASTP • E-value threshold for initial inclusion in the multiple alignment used to build the profile
Using PSI-BLAST (2) • Align selected sequences, generate profile, search again • Number of results to show in the next iteration • Select whether to include each hit in the next iteration • New result
Exercise 1 • There is a protein with an unknown structure: • >some protein MEAFLGTWKMEKSEGFDKIMERLGVDFVTRKMGNLVKPNLIVTDLGGGKYKMRSESTFKTTECSFKLGEKFKEVTRFTRGHFFMITVENGVMKHEQDDKTKVTYIERVVEGNELKATVKVDEVVCVRTYSKVA • 1. Can BLAST help us to predict its secondary structure (SS)? • 2. Use any secondary structure prediction method to predict the secondary structure of 1O8V and compare it to the solved structure. • NOTICE! The secondary structure definition in the PDB is given in a 7-letter code instead of the 3-letter code (H, E, C). For comparison purposes, consider G, H and I as H; E as E; and all the rest, including spaces, as C. • 3. What can you conclude about the secondary structure prediction in this case? • 4. Are the results consistent with the confidence values of the prediction? • 5. Can you explain the prediction results based on the real structure?
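The reduction rule in the NOTICE above, and the comparison between predicted and observed structure, can be sketched in Python. The per-residue agreement computed here is commonly called Q3 accuracy, a term not used on the slide. The function names are illustrative, and both strings are assumed to cover the same residues in the same order.

```python
def reduce_to_three_states(pdb_ss):
    """Apply the exercise's rule: G, H, I -> H; E -> E; everything else
    (including spaces) -> C."""
    mapping = {"G": "H", "H": "H", "I": "H", "E": "E"}
    return "".join(mapping.get(code, "C") for code in pdb_ss)


def q3(predicted, observed):
    """Fraction of residues whose predicted 3-state code equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not predicted:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Made-up strings for illustration only, not taken from 1O8V.
observed = reduce_to_three_states("GHI EBTS")
print(observed)
print(q3("HHHCCCCC", observed))
```

Restricting both strings to a region of interest before calling `q3` gives the agreement for that region alone.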
Exercise 2 • The prion protein is responsible for mad cow disease. Normally, the amino acids in a specific region are arranged in an α-helix (H1). In the abnormal form, this region changes into a β-strand conformation. • This conformational change is thought to be the origin of the disease, which leads to rapid degeneration of the nervous system and usually causes death. • It is assumed that prion molecules that have changed conformation accelerate the conformational change of additional molecules. • Check what conformation is predicted for this protein. • The PDB code of the prion protein is 1ag2. The helix is located at positions 21-30 of the sequence in this file. Does the predicted SS correlate with the real one in the region of interest?