
Evolution and Scoring Rules


chinue




Presentation Transcript


  1. Evolution and Scoring Rules • Example (linear gap penalty): Score = 5 x (# matches) + (-4) x (# mismatches) + (-7) x (total length of all gaps) • Example (affine gap penalty): Score = 5 x (# matches) + (-4) x (# mismatches) + (-5) x (# gap openings) + (-2) x (total length of all gaps)
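
The two example rules above can be applied to a finished alignment with a few lines of Python. This is a minimal sketch, not code from the slides: the function name `score_alignment`, its parameters, and the sample alignment are illustrative assumptions. The alignment is assumed to be given as two equal-length strings with `-` marking gap positions.

```python
def score_alignment(a, b, match=5, mismatch=-4, gap_open=0, gap_extend=-7):
    """Score a pairwise alignment given as two equal-length gapped strings.

    gap_extend is charged for every gap position (total gap length);
    gap_open is charged once per run of consecutive gaps in one sequence.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    prev_gap = None  # which sequence held a gap in the previous column, if any
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if x == "-" or y == "-":
            cur_gap = "a" if x == "-" else "b"
            if cur_gap != prev_gap:      # a new gap starts here
                score += gap_open
            score += gap_extend          # every gap position adds to the length term
            prev_gap = cur_gap
        else:
            score += match if x == y else mismatch
            prev_gap = None
    return score


# Hypothetical alignment: 3 matches, 1 mismatch, one gap of length 2.
top = "ACGTTA"
bottom = "AC--TG"
linear = score_alignment(top, bottom)                              # first rule
affine = score_alignment(top, bottom, gap_open=-5, gap_extend=-2)  # second rule
print(linear, affine)
```

With the defaults the function implements the first rule (no opening cost, -7 per gap position); passing `gap_open=-5, gap_extend=-2` gives the second rule, which penalizes one long gap less than several short ones.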

  2. Scoring Matrices

  3. Scoring Rules vs. Scoring Matrices • Nucleotide vs. Amino Acid Sequence • The choice of a scoring rule can strongly influence the outcome of sequence analysis • Scoring matrices implicitly represent a particular theory of evolution • Elements of the matrices specify the similarity of one residue to another

  4. Translation - Protein Synthesis: Every 3 nucleotides (a codon) are translated into one amino acid • Replication: DNA (A, T, G, C) is copied • Transcription: DNA (A, T, G, C) to RNA (A, U, G, C), 1:1 • Translation: RNA to Protein (20 amino acids), 3:1

  5. Nucleotide sequence determines the amino acid sequence

  6. Translation - Protein Synthesis • RNA -> Protein • 5’ -> 3’ corresponds to N-term -> C-term

  7. Log Likelihoods used as Scoring Matrices • PAM (% Accepted Mutations): 1500 changes in 71 groups with > 85% similarity • BLOSUM (Blocks Substitution Matrix): 2000 “blocks” from 500 families

  8. Log Likelihoods used as Scoring Matrices: BLOSUM

  9. Likelihood Ratio for Aligning a Single Pair of Residues • Above (numerator): the probability that two residues are aligned by evolutionary descent • Below (denominator): the probability that they are aligned by chance • Pi, Pj are the frequencies (abundances) of residues i and j across all protein sequences

  10. Likelihood Ratio of Aligning Two Sequences

  11. The alignment score of aligning two sequences is the log likelihood ratio of the alignment under two models • Common ancestry • By chance
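
The formulas on slides 9-11 did not survive in this transcript. A standard way to write them, consistent with the descriptions above, is sketched below. The symbol q_{ij} (the probability that residues i and j are aligned by descent) and the column index k are notational assumptions; P_i and P_j are the residue frequencies named on slide 9. Columns are assumed to be independent, and gaps are not covered.

```latex
% Likelihood ratio for a single aligned pair of residues i and j
R_{ij} = \frac{q_{ij}}{P_i \, P_j}

% Likelihood ratio for an ungapped alignment of sequences a and b with n columns
R(a, b) = \prod_{k=1}^{n} \frac{q_{a_k b_k}}{P_{a_k} \, P_{b_k}}

% Alignment score: the log likelihood ratio, i.e. a sum of per-column scores
S(a, b) = \log R(a, b) = \sum_{k=1}^{n} \log \frac{q_{a_k b_k}}{P_{a_k} \, P_{b_k}}
```

Taking the logarithm turns the product over columns into a sum, which is why a scoring matrix of log likelihood ratios can be applied by adding up one entry per aligned pair.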

  12. PAM and BLOSUM matrices are all log likelihood matrices • More specifically: • An alignment that scores 6 means that alignment by common ancestry is 2^(6/2) = 8 times as likely as alignment by chance (for scores in half-bit units, as in BLOSUM62).
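
The conversion in slide 12 can be checked numerically. The sketch below assumes scores are in half-bit units (two score units per factor of 2), which is what the exponent 6/2 implies; the function names and the `units_per_bit` parameter are illustrative, not from the slides.

```python
from math import log2


def odds_from_score(score, units_per_bit=2):
    """Likelihood ratio (common ancestry vs. chance) implied by a score."""
    return 2 ** (score / units_per_bit)


def score_from_odds(odds, units_per_bit=2):
    """Inverse conversion: the score that corresponds to a likelihood ratio."""
    return units_per_bit * log2(odds)


print(odds_from_score(6))   # 8.0, matching the slide's 2^(6/2) = 8
print(score_from_odds(8))   # 6.0
```

A matrix scaled differently would need a different `units_per_bit`, so the factor of 8 holds only under the half-bit assumption.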

  13. BLOSUM matrices for Protein • S. Henikoff and J. Henikoff (1992). “Amino acid substitution matrices from protein blocks”. PNAS 89: 10915-10919 • Training Data: ~2000 conserved blocks from BLOCKS database. Ungapped, aligned protein segments. Each block represents a conserved region of a protein family

  14. Constructing BLOSUM Matrices of Specific Similarities • Sets of sequences have widely varying similarity. Sequences whose similarity exceeds a threshold are clustered. • If the clustering threshold is 62%, the final matrix is BLOSUM62

  15. A toy example of constructing a BLOSUM matrix from 4 training sequences

  16. Constructing a BLOSUM matrix: 1. Counting mutations

  17. Constructing a BLOSUM matrix: 2. Tallying mutation frequencies

  18. Constructing a BLOSUM matrix: 3. Matrix of mutation probabilities

  19. 4. Calculate abundance of each residue (Marginal prob)

  20. 5. Obtaining a BLOSUM matrix
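
The five steps named on slides 16-20 can be sketched in Python. The slides' own toy sequences are not in this transcript, so the four-sequence block below is made up for illustration, and the function name `toy_blosum` is hypothetical. The sketch follows the commonly described Henikoff procedure with scores in half-bit units, and it skips the clustering step from slide 14.

```python
from collections import Counter
from itertools import combinations
from math import log2


def toy_blosum(block):
    """Build a tiny BLOSUM-style matrix from an ungapped block of aligned sequences."""
    # Step 1: count every residue pair within each column (unordered pairs).
    pair_counts = Counter()
    for column in zip(*block):
        for x, y in combinations(column, 2):
            pair_counts[tuple(sorted((x, y)))] += 1

    # Steps 2-3: turn counts into observed pair probabilities q.
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Step 4: marginal probability (abundance) of each residue.
    p = Counter()
    for (x, y), freq in q.items():
        if x == y:
            p[x] += freq
        else:
            p[x] += freq / 2
            p[y] += freq / 2

    # Step 5: log likelihood ratio of observed vs. chance pairing, in half bits.
    scores = {}
    for (x, y), freq in q.items():
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(2 * log2(freq / expected))
    return q, dict(p), scores


# Hypothetical block of 4 aligned sequences over a 2-letter alphabet.
block = ["ACA", "ACA", "ACC", "CCA"]
q, p, scores = toy_blosum(block)
print(scores)
```

Pairs that never occur in the block get no entry here, since the log of a zero frequency is undefined; a real matrix is built from enough data that every pair is observed.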

  21. Constructing the real BLOSUM62 Matrix

  22. Steps 1-3: Mutation Frequency Table

  23. 4. Calculate Amino Acid Abundance

  24. 5. Obtaining BLOSUM62 Matrix

  25. PAM Matrices (Point Accepted Mutations) Mutations accepted by natural selection

  26. PAM Matrices • Accepted Point Mutation • M.O. Dayhoff, ed. (1978). Atlas of Protein Sequence and Structure, Suppl. 3. National Biomedical Research Foundation, 1 • Based on evolutionary principles

  27. Constructing PAM Matrix: Training Data

  28. PAM: Phylogenetic Tree

  29. PAM: Accepted Point Mutation

  30. Mutability

  31. Total Mutation Rate: the combined mutation rate summed over all amino acids

  32. Normalize Total Mutation Rate

  33. Mutation Probability Matrix Normalized Such that the Total Mutation Rate is 1%
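
The slide gives no formula for this normalization, so the following is only a sketch of one commonly described approach: scale every residue's relative mutability by a single constant so that the frequency-weighted chance of change is 1%. The function name `scale_to_one_pam`, the three-residue alphabet, and all numeric values are hypothetical.

```python
def scale_to_one_pam(freqs, mutabilities, target=0.01):
    """Scale relative mutabilities so the total mutation rate equals `target`.

    freqs: residue -> abundance (sums to 1).
    mutabilities: residue -> relative mutability (arbitrary units).
    Returns the scaling constant and the probability that each residue
    stays unchanged over one period (the diagonal of the matrix).
    """
    total_rate = sum(freqs[a] * mutabilities[a] for a in freqs)
    lam = target / total_rate
    unchanged = {a: 1.0 - lam * mutabilities[a] for a in freqs}
    return lam, unchanged


# Toy values for illustration only.
freqs = {"A": 0.5, "R": 0.3, "N": 0.2}
mutabilities = {"A": 100.0, "R": 65.0, "N": 134.0}
lam, unchanged = scale_to_one_pam(freqs, mutabilities)

# Frequency-weighted probability that a residue changes in one period.
changed = sum(freqs[a] * (1.0 - unchanged[a]) for a in freqs)
print(changed)
```

The off-diagonal entries would then share out each residue's probability of change among the possible replacements; that step is omitted here.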

  34. Mutation Probability Matrix (transposed) M*10000

  35. From the PAM1 mutation probability matrix to the PAM2 mutation probability matrix • PAM2: mutations that happen in twice the evolutionary period of a PAM1

  36. PAM Matrix: Assumptions

  37. In two PAM1 periods: • {AR} = {AA and AR} or {AN and NR} or {AD and DR} or … or {AV and VR}

  38. Entries in a PAM-2 Mutation Probability Matrix

  39. PAM-k Mutation Probability Matrix
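
Slide 37's rule (A becomes R over two periods by passing through any intermediate residue) is a matrix product, and repeating it k times gives a PAM-k mutation probability matrix. The sketch below uses a made-up 3-residue matrix and assumes the convention that row i, column j holds the probability that residue i is replaced by j in one period; the slides may use the transposed layout. The function names are illustrative.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][x] * b[x][j] for x in range(n)) for j in range(n)]
            for i in range(n)]


def pam_k(pam1, k):
    """Mutation probabilities after k PAM1 periods (pam1 multiplied k times)."""
    result = pam1
    for _ in range(k - 1):
        result = mat_mul(result, pam1)
    return result


# Hypothetical PAM1-like matrix over residues A, R, N; each row sums to 1.
pam1 = [
    [0.98, 0.01, 0.01],
    [0.02, 0.97, 0.01],
    [0.01, 0.01, 0.98],
]
pam2 = pam_k(pam1, 2)

# P(A -> R in two periods) = {AA and AR} or {AR and RR} or {AN and NR}
a_to_r = pam1[0][0] * pam1[0][1] + pam1[0][1] * pam1[1][1] + pam1[0][2] * pam1[2][1]
print(pam2[0][1], a_to_r)
```

Because each period is assumed independent of the previous ones, the rows of every PAM-k matrix still sum to 1.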

  40. PAM-1 log likelihood matrix

  41. PAM-k log likelihood matrix
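
A PAM-k log likelihood matrix compares each mutation probability with the chance of meeting the replacement residue at random. The sketch below assumes the scaling usually attributed to Dayhoff, 10 x log10 rounded to an integer, and the convention that row i, column j is the probability of i being replaced by j. The function name and the 2-residue numbers are hypothetical.

```python
from math import log10


def log_odds_matrix(mut_prob, freqs):
    """Convert a mutation probability matrix into rounded log likelihood scores.

    mut_prob[i][j]: probability that residue i is replaced by residue j.
    freqs[j]: background frequency (abundance) of residue j.
    """
    n = len(mut_prob)
    return [[round(10 * log10(mut_prob[i][j] / freqs[j])) for j in range(n)]
            for i in range(n)]


# Toy 2-residue example, values chosen only to make the arithmetic easy to follow.
mut_prob = [[0.5, 0.5],
            [0.5, 0.5]]
freqs = [0.25, 0.75]
scores = log_odds_matrix(mut_prob, freqs)
print(scores)
```

A positive entry means the replacement is seen more often than chance predicts, and a negative entry means it is seen less often.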

  42. PAM-250

  43. Matching PAM matrices to sequence similarity • PAM60: 60% • PAM80: 50% • PAM120: 40% • The PAM-250 matrix provides a better scoring alignment than lower-numbered PAM matrices for proteins of 14-27% similarity

  44. Sources of Error in PAM

  45. Comparing Scoring Matrices • PAM: based on extrapolation from a small evolutionary period; tracks evolutionary origins; homologous sequences during evolution • BLOSUM: based on a range of evolutionary periods; conserved blocks; finds conserved domains
