Tutorial 4: Comparing Protein Sequences. Intro to Bioinformatics. Amino acids were not born equal.
Tutorial 4: Comparing Protein Sequences
Intro to Bioinformatics
Comparing Protein Sequences
• Substitution Matrices
  • PAM - Point Accepted Mutation
  • BLOSUM - Blocks Substitution Matrix
• Advanced comparison tools
  • PSI-BLAST
  • PHI-BLAST
Substitution Matrix
• Scoring matrix S
• 20x20 for protein alignment (one row and one column per amino acid)
• Si,j represents the gain/penalty due to substituting AAj by AAi (i: row, j: column)
• Based on how likely this substitution is to be found in nature
• Computed differently in PAM and BLOSUM
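To make the role of Si,j concrete, here is a minimal sketch of scoring an ungapped alignment by summing matrix entries column by column. The four-letter alphabet and the numbers in `TOY_SCORES` are illustrative placeholders in the style of a substitution matrix, not an authoritative PAM or BLOSUM table, and the function names are hypothetical.

```python
# Toy substitution scores for a 4-letter subset of amino acids.
# ASSUMPTION: illustrative values only; use a published matrix for real work.
TOY_SCORES = {
    ("A", "A"): 4, ("A", "L"): -1, ("A", "I"): -1, ("A", "W"): -3,
    ("L", "L"): 4, ("L", "I"): 2, ("L", "W"): -2,
    ("I", "I"): 4, ("I", "W"): -3,
    ("W", "W"): 11,
}


def substitution_score(a, b, scores=TOY_SCORES):
    """Return S[a][b]; the matrix is symmetric, so try both key orders."""
    if (a, b) in scores:
        return scores[(a, b)]
    return scores[(b, a)]


def score_ungapped(seq1, seq2, scores=TOY_SCORES):
    """Sum the substitution scores over the aligned columns (no gaps)."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(substitution_score(a, b, scores) for a, b in zip(seq1, seq2))


identical = score_ungapped("ALIW", "ALIW")     # every column is a match
conservative = score_ungapped("ALW", "AIW")    # L aligned with the similar I
dissimilar = score_ungapped("AW", "WA")        # A aligned with W, twice
```

Identical residues and chemically similar pairs (here L and I) add to the total, while dissimilar pairs subtract from it, which is the gain/penalty behaviour described above.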
Computing the Probability of Mutation (Mi,j)
• PAM - Point Accepted Mutation
  • Based on closely related proteins (X% divergence)
  • Matrices for comparing divergent proteins are computed from these
• BLOSUM - Blocks Substitution Matrix
  • Based on conserved blocks bounded in similarity (at least X% identical)
  • Matrices for divergent proteins are derived using the appropriate X%
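PAM-1
• Captures mutation rates between closely related proteins
• 1% divergence
• Mi,j = #(A→B) / #A: the number of observed substitutions of A by B, divided by the number of occurrences of A
• Problematic when comparing distantly related proteins
  • The 1% divergence does not capture more sporadic mutations
  • PAM250 is theoretical (based on extrapolation)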
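The extrapolation behind PAM250 can be sketched as repeated multiplication of the PAM-1 mutation probability matrix by itself. The example below assumes a made-up three-letter alphabet and a symmetric toy matrix (`TOY_PAM1`, hypothetical values) so that the row/column convention does not matter; the real PAM-1 matrix is 20x20 and comes from observed data.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power by squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = m
    while power > 0:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result


# ASSUMPTION: toy mutation probabilities for a 3-letter alphabet.
# Each row sums to 1 and about 1% of positions change per step.
TOY_PAM1 = [
    [0.990, 0.008, 0.002],
    [0.008, 0.990, 0.002],
    [0.002, 0.002, 0.996],
]

toy_pam250 = mat_pow(TOY_PAM1, 250)  # 250 successive PAM-1 steps
```

After 250 steps the diagonal entries are much smaller than in the toy PAM-1 matrix, since many positions have changed at least once. No alignment of truly distant proteins enters the calculation, which is why PAM250 is called theoretical.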
BLOSUM62
• Captures mutation rates between divergent proteins
• Why is BLOSUM62 called BLOSUM62? Within each block, sequences that share at least 62% identity with ANY other member of their group are clustered together, and each cluster is averaged and counted as one sequence.
BLOSUM62
The idea of BLOSUM matrices is to get a better measure of the differences between two proteins, specifically for more distantly related proteins.
• Similar amino acids receive high scores
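Why similar amino acids end up with high scores can be seen from the log-odds form that substitution scores generally take: the frequency with which a pair is observed in alignments is compared with the frequency expected by chance. The sketch below assumes half-bit units (a scale factor of 2 on a base-2 logarithm) and uses made-up frequencies; `log_odds_score` is a hypothetical helper.

```python
import math


def log_odds_score(observed, expected, scale=2.0):
    """Rounded log-odds score: scale * log2(observed / expected)."""
    return round(scale * math.log2(observed / expected))


# ASSUMPTION: the frequencies below are invented for illustration.
frequent_pair = log_odds_score(observed=0.02, expected=0.005)   # seen 4x more often than chance
rare_pair = log_odds_score(observed=0.001, expected=0.004)      # seen 4x less often than chance
neutral_pair = log_odds_score(observed=0.01, expected=0.01)     # seen exactly as often as chance
```

Pairs that replace each other more often than chance predicts get positive scores, pairs that are avoided get negative scores, and a pair observed at its chance frequency scores zero.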
Use Recommendations
PAM100 ~ BLOSUM90   (closely related)
PAM120 ~ BLOSUM80
PAM160 ~ BLOSUM60
PAM200 ~ BLOSUM52
PAM250 ~ BLOSUM45   (highly divergent)
Example
• Query: >ADRM1_HUMAN (Proteasomal ubiquitin receptor)
• Database: nr, restricted to human sequences
• BLAST program: BLASTP
• Matrices: PAM30, BLOSUM45
What difference do we observe?
• With BLOSUM45 we found related and divergent sequences.
• With PAM30 we found only related sequences.
[Figure: BLAST hit lists, panels "BLOSUM45" and "PAM30"]
With BLOSUM45 we can discover interesting relations between proteins
• Mucin-13: a glycosylated membrane protein that protects the cell by binding to pathogens
[Figure: BLAST hit lists, panels "PAM30" and "BLOSUM45"]
Using different scoring matrices can produce slightly different alignments:
[Figure: alignments, panels "With PAM30" and "With BLOSUM45"]
The same alignment problem can be solved in many ways, especially when using a matrix for highly divergent sequences (BLOSUM45):
PSI-BLAST
Position-Specific Iterative BLAST
We will analyze the following archaeal uncharacterized protein:
>gi|2501594|sp|Q57997|Y577_METJA PROTEIN MJ0577
MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVIDEREIKKRDIFSLLLGVAGLNKSVEEFENELKNKLTEEAKNKMENIKKELEDVGFKVKDIIVVGIPHEEIVKIAEDEGVDIIIMGSHGKTNLKEILLGSVTENVIKKSNKPVLVVKRKNS
• E-value threshold for the initial BLAST search (default: 10)
• E-value threshold for inclusion in PSI-BLAST iterations (default: 0.005)
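How the two thresholds interact can be sketched as two filters on E-values: the first decides which hits are reported at all, the second decides which of those feed the profile for the next PSI-BLAST iteration. The hit names and E-values below are invented, `split_hits` is a hypothetical helper, and treating the boundary as "at or below" is an assumption.

```python
REPORT_THRESHOLD = 10.0      # default for the initial BLAST search
INCLUSION_THRESHOLD = 0.005  # default for inclusion in PSI-BLAST iterations


def split_hits(hits, report=REPORT_THRESHOLD, inclusion=INCLUSION_THRESHOLD):
    """Return (reported, included) hits given (name, e_value) pairs."""
    reported = [h for h in hits if h[1] <= report]
    included = [h for h in reported if h[1] <= inclusion]
    return reported, included


# ASSUMPTION: made-up hits, not real search results.
toy_hits = [
    ("hit_A", 1e-40),  # strong match: reported and used in the next iteration
    ("hit_B", 0.003),  # below the inclusion threshold: also used
    ("hit_C", 0.8),    # reported, but too weak to enter the profile
    ("hit_D", 25.0),   # above the reporting threshold: not shown at all
]
reported, included = split_hits(toy_hits)
```

Keeping the inclusion threshold far stricter than the reporting threshold limits the chance that an unrelated sequence enters the profile and misleads later iterations.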
[Figure: PSI-BLAST result list, annotated as follows]
• The query itself
• Orthologous sequences in two other archaeal species
• Other homologous sequences
Is MJ0577 a filament protein? . . .
Is MJ0577 a cationic amino acid transporter? . . .
Is MJ0577 a universal stress protein? . . .