Protein Identification by Sequence Database Search

Protein Identification by Sequence Database Search. Nathan Edwards Department of Biochemistry and Mol. & Cell. Biology Georgetown University Medical Center. Outline. Proteomics Mass Spectrometry Protein Identification Peptide Mass Fingerprint Tandem Mass Spectrometry. Proteomics.


Presentation Transcript


  1. Protein Identification by Sequence Database Search Nathan Edwards Department of Biochemistry and Mol. & Cell. Biology Georgetown University Medical Center

  2. Outline • Proteomics • Mass Spectrometry • Protein Identification • Peptide Mass Fingerprint • Tandem Mass Spectrometry

  3. Proteomics • Proteins are the machines that drive much of biology • Genes are merely the recipe • The direct characterization of a sample’s proteins en masse. • What proteins are present? • How much of each protein is present?

  4. 2D Gel-Electrophoresis • Protein separation: molecular weight (MW), isoelectric point (pI) • Staining • Bird's-eye view of protein abundance

  5. 2D Gel-Electrophoresis Bécamel et al., Biol. Proced. Online 2002;4:94-104.

  6. Paradigm Shift • Traditional protein chemistry assay methods struggle to establish identity. • Identity requires: • Specificity of measurement (Precision) • Mass spectrometry • A reference for comparison (Measurement → Identity) • Protein sequence databases

  7. Mass Spectrometer: Sample → Ionizer → Mass Analyzer → Detector • Ionizer: MALDI, Electrospray Ionization (ESI) • Mass Analyzer: Time-Of-Flight (TOF), Quadrupole, Ion-Trap • Detector: Electron Multiplier (EM)

  8. Mass Spectrometer (MALDI-TOF) [Diagram labels: UV laser (337 nm); analyte/matrix on a grounded backing plate; pulse voltage; extraction grid (source voltage -Vs); source region, length s; field-free drift zone, length D, Ed = 0; detector grid (-Vs); microchannel plate detector]

  9. Mass Spectrum

  10. Mass is fundamental

  11. Peptide Mass Fingerprint • Cut out 2D-gel spot

  12. Peptide Mass Fingerprint Trypsin Digest

  13. Peptide Mass Fingerprint MS

  14. Peptide Mass Fingerprint

  15. Peptide Mass Fingerprint • Trypsin: digestion enzyme • Highly specific • Cuts after K & R except if followed by P • Protein sequence from sequence database • In silico digest • Mass computation • For each protein sequence in turn: • Compare computer generated masses with observed spectrum
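The in silico digest step can be sketched in a few lines of Python. This is a minimal illustration of the cleavage rule stated on the slide (cut after K or R unless the next residue is P), not the code used by any particular search engine. The function name `tryptic_digest` and the constant `MYOGLOBIN` are hypothetical; the sequence is the myoglobin sequence from slide 16.

```python
# Minimal in silico tryptic digest, assuming the simple rule from the slide:
# cleave after K or R, except when the following residue is P.
MYOGLOBIN = (
    "GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETLEKFDKFKHLK"
    "TEAEMKASEDLKKHGTVVLTALGGILKKKGHHEAELKPLAQSHATKHKIP"
    "IKYLEFISDAIIHVLHSKHPGDFGADAQGAMTKALELFRNDIAAKYKELG"
    "FQG"
)


def tryptic_digest(sequence):
    """Return the fully cleaved tryptic peptides of a protein sequence."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing peptide that does not end in K or R (the protein C-terminus).
        peptides.append(sequence[start:])
    return peptides


peptides = tryptic_digest(MYOGLOBIN)
```

Note that `GHHEAELKPLAQSHATK` survives as a single peptide because its internal K is followed by P.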

  16. Protein Sequence • Myoglobin GLSDGEWQQV LNVWGKVEAD IAGHGQEVLI RLFTGHPETL EKFDKFKHLK TEAEMKASED LKKHGTVVLT ALGGILKKKG HHEAELKPLA QSHATKHKIP IKYLEFISDA IIHVLHSKHP GDFGADAQGA MTKALELFRN DIAAKYKELG FQG

  18. Amino-Acid Masses

  19. Peptide Mass & m/z • Peptide Molecular Weight: N-terminal-mass (0.00) + Sum (AA residue masses) + C-terminal-mass (18.010560) • Observed Peptide m/z: (Peptide Molecular Weight + z * Proton-mass (1.007276)) / z • Monoisotopic mass values!
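The two formulas on this slide translate directly into code. The sketch below assumes standard monoisotopic residue masses (rounded to five decimals), a water mass of 18.010565 Da for the combined termini, and 1.007276 Da for the proton; the names `MONO`, `peptide_mass` and `peptide_mz` are illustrative only.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H on the N-terminus plus OH on the C-terminus
PROTON = 1.007276   # mass of the ionizing proton


def peptide_mass(peptide):
    """Neutral monoisotopic molecular weight of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def peptide_mz(peptide, z=1):
    """Observed m/z of the peptide carrying z protons."""
    return (peptide_mass(peptide) + z * PROTON) / z


mz_1 = peptide_mz("ALELFR", z=1)   # about 748.43
mz_2 = peptide_mz("ALELFR", z=2)   # about 374.72
```

For the myoglobin peptide ALELFR, the singly charged value agrees with the 748.43 listed on the next slide.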

  20. Peptide Masses • 1811.90 GLSDGEWQQVLNVWGK • 1606.85 VEADIAGHGQEVLIR • 1271.66 LFTGHPETLEK • 1378.83 HGTVVLTALGGILK • 1982.05 KGHHEAELKPLAQSHATK • 1853.95 GHHEAELKPLAQSHATK • 1884.01 YLEFISDAIIHVLHSK • 1502.66 HPGDFGADAQGAMTK • 748.43 ALELFR

  21. Peptide Mass Fingerprint [Annotated spectrum; peaks labeled with: YLEFISDAIIHVLHSK, GHHEAELKPLAQSHATK, GLSDGEWQQVLNVWGK, HPGDFGADAQGAMTK, HGTVVLTALGGILK, VEADIAGHGQEVLIR, KGHHEAELKPLAQSHATK, ALELFR, LFTGHPETLEK]
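Putting the digest and the mass computation together gives a rough peptide mass fingerprint score: for each database protein, count how many observed peaks lie within a tolerance of some theoretical peptide mass. The sketch below is only an illustration under several assumptions: the observed peak list is hypothetical (six of the masses from slide 20, read as singly protonated values, plus one noise peak at 1000.00), the 0.05 Da tolerance is arbitrary, missed cleavages are ignored, and real tools use more careful scoring than a simple match count. All names are illustrative.

```python
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276

# A two-entry toy "database": myoglobin and the albumin fragment from the slides.
PROTEINS = {
    "myoglobin": (
        "GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETLEKFDKFKHLK"
        "TEAEMKASEDLKKHGTVVLTALGGILKKKGHHEAELKPLAQSHATKHKIP"
        "IKYLEFISDAIIHVLHSKHPGDFGADAQGAMTKALELFRNDIAAKYKELG"
        "FQG"
    ),
    "albumin_fragment": (
        "MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFK"
        "ALVLIAFAQYLQQCPFEDHVKLVNEVTEFAK"
    ),
}

# Hypothetical observed peak list (singly charged m/z); 1000.00 is noise.
OBSERVED = [748.43, 1000.00, 1271.66, 1378.83, 1502.66, 1606.85, 1853.95]


def tryptic_digest(sequence):
    """Cleave after K or R unless followed by P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide):
    """Singly protonated monoisotopic mass of a peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def pmf_score(protein, observed, tol=0.05):
    """Number of observed peaks explained by some tryptic peptide."""
    theoretical = [mh_plus(p) for p in tryptic_digest(protein)]
    return sum(any(abs(t - o) <= tol for t in theoretical) for o in observed)


scores = {name: pmf_score(seq, OBSERVED) for name, seq in PROTEINS.items()}
best = max(scores, key=scores.get)
```

With this peak list, myoglobin explains every peak except the noise peak, while the albumin fragment explains none.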

  22. Sample Preparation for Tandem Mass Spectrometry • Enzymatic Digest and Fractionation

  23. Single Stage MS

  24. Tandem Mass Spectrometry (MS/MS)

  25. Peptide Fragmentation • Peptides consist of amino-acids arranged in a linear backbone. • N-terminus H…-HN-CH(Ri-1)-CO-NH-CH(Ri)-CO-NH-CH(Ri+1)-CO-…OH C-terminus • Side chains Ri-1, Ri, Ri+1 belong to AA residues i-1, i, i+1

  26. Peptide Fragmentation

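  27. Peptide Fragmentation [Diagram: backbone -HN-CH(Ri)-CO-NH-CH(Ri+1)-CO-NH-; cleavage at one peptide bond gives the fragment pair bi and yn-i, cleavage at the next peptide bond gives bi+1 and yn-i-1]

  28. Peptide Fragmentation [Diagram: the same backbone with side chains R’, Ri, Ri+1, R”; N-terminal fragment ions ai, bi, ci and bi+1; C-terminal fragment ions xn-i, yn-i, zn-i and yn-i-1]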

  29. Peptide Fragmentation Peptide: S-G-F-L-E-E-D-E-L-K

  30. Peptide Fragmentation • Peptide: S G F L E E D E L K • b ions (N-terminal, left to right): 88 145 292 405 534 663 778 907 1020 1166 • y ions (C-terminal, left to right): 1166 1080 1022 875 762 633 504 389 260 147 • [Spectrum axes: % Intensity (0 to 100) vs m/z (250 to 1000)]

  31. Peptide Fragmentation • Same peptide and ion table as slide 30 • [Spectrum with peaks labeled y2, y3, y4, y5, y6, y7, y8, y9]

  32. Peptide Fragmentation • Same peptide and ion table as slide 30 • [Spectrum with peaks labeled y2 to y9 and b3 to b9]
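The b and y ion values shown for S-G-F-L-E-E-D-E-L-K can be reproduced with a short sketch. It assumes singly charged fragments, standard monoisotopic residue masses, and the usual conventions that a b ion is the N-terminal residue sum plus a proton and a y ion is the C-terminal residue sum plus water plus a proton. The function name `by_ions` is hypothetical. The slide's final entry in each row (1166) is the intact singly protonated peptide, so the sketch stops at fragments of length n-1.

```python
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276


def by_ions(peptide):
    """Return ([b1..b(n-1)], [y1..y(n-1)]) singly charged m/z values."""
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        # b_i: first i residues plus a proton.
        b_ions.append(sum(MONO[aa] for aa in peptide[:i]) + PROTON)
        # y_i: last i residues plus water plus a proton.
        y_ions.append(sum(MONO[aa] for aa in peptide[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b_ions, y_ions = by_ions("SGFLEEDELK")
```

The computed values agree with the nominal masses on the slides to within about half a dalton; the slide values are rounded.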

  33. Peptide Identification Given: • The mass of the precursor ion, and • The MS/MS spectrum Output: • The amino-acid sequence of the peptide

  34. Peptide Identification Two paradigms: • De novo interpretation • Sequence database search

  35. De Novo Interpretation [Spectrum axes: % Intensity (0 to 100) vs m/z (250 to 1000)]

  36. De Novo Interpretation [Same spectrum, annotated with residue labels E and L]

  37. De Novo Interpretation [Same spectrum, annotated with residue labels: SGF G E E E D E KL E E D L L L F]

  38. De Novo Interpretation

  39. De Novo Interpretation …from Lu and Chen (2003), JCB 10:1

  40. De Novo Interpretation

  41. De Novo Interpretation …from Lu and Chen (2003), JCB 10:1

  42. De Novo Interpretation • Find good paths in spectrum graph • Can’t use same peak twice • Forbidden pairs: NP-hard • “Nested” forbidden pairs: solvable by dynamic programming • Simple peptide fragmentation model • Usually many apparently good solutions • Needs better fragmentation model • Needs better path scoring

  43. De Novo Interpretation • Amino-acids have duplicate masses! • Incomplete ladders create ambiguity. • Noise peaks and unmodeled fragments create ambiguity • “Best” de novo interpretation may have no biological relevance • Current algorithms cannot model many aspects of peptide fragmentation • Identifies relatively few peptides in high-throughput workflows
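A toy spectrum graph makes the de novo idea and its ambiguities concrete. In the sketch below, peaks are nodes, an edge joins two peaks whose mass difference matches a residue mass, and the longest path is read off as a sequence. It is heavily simplified and is not the algorithm from Lu and Chen: it assumes a clean b-ion ladder for SGFLEEDELK (b1 to b9, computed from standard monoisotopic masses), a virtual start node at the proton mass, one made-up noise peak at 600.0, and it ignores y ions, forbidden pairs and peak intensities. All names are illustrative.

```python
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276


def spectrum_graph(peaks, tol=0.02):
    """Edges (j, label) from peak i to any heavier peak one residue away."""
    peaks = sorted(peaks)
    edges = {i: [] for i in range(len(peaks))}
    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            delta = peaks[j] - peaks[i]
            residues = sorted(aa for aa, m in MONO.items() if abs(m - delta) <= tol)
            if residues:
                # Residues of equal mass (I and L) share one ambiguous label.
                edges[i].append((j, "/".join(residues)))
    return peaks, edges


def longest_path(peaks, edges):
    """Longest chain of residue labels, by dynamic programming over the DAG."""
    best = [[] for _ in peaks]
    for i in reversed(range(len(peaks))):
        for j, label in edges[i]:
            if len(best[j]) + 1 > len(best[i]):
                best[i] = [label] + best[j]
    return max(best, key=len)


# Ideal b1..b9 ladder for SGFLEEDELK, a start node, and one noise peak.
ladder, running = [PROTON], PROTON
for aa in "SGFLEEDEL":
    running += MONO[aa]
    ladder.append(running)
peaks, edges = spectrum_graph(ladder + [600.0])
path = longest_path(peaks, edges)
```

Even with a perfect ladder, the result cannot distinguish I from L, and the final K is not recovered because no peak closes the ladder; both effects are among the ambiguities listed on the slide.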

  44. Sequence Database Search • Compares peptides from a protein sequence database with spectra • Filter peptide candidates by • Precursor mass • Digest motif • Score each peptide against spectrum • Generate all possible peptide fragments • Match putative fragments with peaks • Score and rank
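The filter-then-score loop described above can be sketched as follows. This is a deliberately naive illustration: candidates are filtered by neutral precursor mass only, and the score is a shared peak count over singly charged b and y ions, which is much cruder than the scoring functions of real search engines. The peak list reuses the nominal masses of the labeled peaks from slide 32; the candidate list, the tolerances and all function names are assumptions made for the example.

```python
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def fragments(peptide):
    """Singly charged b and y ion m/z values for fragment lengths 1..n-1."""
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(MONO[aa] for aa in peptide[:i]) + PROTON)
        ions.append(sum(MONO[aa] for aa in peptide[-i:]) + WATER + PROTON)
    return ions


def score(peptide, peaks, tol):
    """Shared peak count: theoretical fragments with a peak within tol."""
    return sum(any(abs(f - p) <= tol for p in peaks) for f in fragments(peptide))


def search(peaks, precursor_mass, candidates, precursor_tol=1.0, frag_tol=0.6):
    """Filter candidates by precursor mass, then score and rank them."""
    hits = [
        (score(pep, peaks, frag_tol), pep)
        for pep in candidates
        if abs(peptide_mass(pep) - precursor_mass) <= precursor_tol
    ]
    return sorted(hits, reverse=True)


# Nominal masses of the labeled peaks (b3..b9 and y2..y9) from slide 32.
PEAKS = [292, 405, 534, 663, 778, 907, 1020,
         260, 389, 504, 633, 762, 875, 1022, 1080]
CANDIDATES = ["SGFLEEDELK", "SGFLEDEELK", "LFTGHPETLEK"]  # hypothetical database
ranked = search(PEAKS, precursor_mass=1165.55, candidates=CANDIDATES)
```

The permuted candidate SGFLEDEELK passes the precursor filter because it has the same mass, but it matches fewer fragment peaks, so it ranks second; LFTGHPETLEK is removed by the precursor filter before any scoring.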

  45. Sequence Database Search • Candidate peptide: S G F L E E D E L K • [Spectrum axes: % Intensity (0 to 100) vs m/z (250 to 1000)]

  46. Sequence Database Search • Candidate peptide: S G F L E E D E L K • b ions (left to right): 88 145 292 405 534 663 778 907 1020 1166 • y ions (left to right): 1166 1080 1022 875 762 633 504 389 260 147 • [Spectrum, peaks not yet labeled]

  47. Sequence Database Search • Same candidate peptide and ion table as slide 46 • [Spectrum with matched peaks labeled y2 to y9 and b3 to b9]

  48. Sequence Database Search • No need for complete ladders • Possible to model all known peptide fragments • Sequence permutations eliminated • All candidates have some biological relevance • Practical for high-throughput peptide identification • Correct peptide might be missing from database!

  49. Peptide Candidate Filtering • Digestion Enzyme: Trypsin • Cuts just after K or R unless followed by a P. • Basic residues (K & R) at the C-terminus attract the ionizing charge, leading to strong y-ions • “Average” peptide length is about 10-15 amino-acids • Must allow for “missed” cleavage sites

  50. Peptide Candidate Filtering • >ALBU_HUMAN MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAK… • No missed cleavage sites: MK | WVTFISLLFLFSSAYSR | GVFR | R | DAHK | SEVAHR | FK | DLGEENFK | ALVLIAFAQYLQQCPFEDHVK | LVNEVTEFAK | …
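Allowing for missed cleavage sites amounts to joining runs of adjacent fully cleaved peptides. The following sketch shows one way to enumerate candidates with up to a given number of missed cleavages; the function names and the `ALBU_PREFIX` constant are hypothetical, and only the portion of the ALBU_HUMAN sequence shown on the slide is used (the rest is elided in the source).

```python
# Portion of ALBU_HUMAN shown on the slide; the remainder is elided there.
ALBU_PREFIX = (
    "MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFK"
    "ALVLIAFAQYLQQCPFEDHVKLVNEVTEFAK"
)


def cleave(sequence):
    """Fully cleaved tryptic pieces: cut after K or R unless followed by P."""
    pieces, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])
    return pieces


def digest(sequence, missed_cleavages=0):
    """All peptides spanning 1 to missed_cleavages + 1 adjacent pieces."""
    pieces = cleave(sequence)
    peptides = []
    for i in range(len(pieces)):
        for extra in range(missed_cleavages + 1):
            if i + extra < len(pieces):
                peptides.append("".join(pieces[i:i + extra + 1]))
    return peptides


no_missed = digest(ALBU_PREFIX, missed_cleavages=0)
one_missed = digest(ALBU_PREFIX, missed_cleavages=1)
```

With one missed cleavage allowed, peptides such as GVFRR and RDAHK join the candidate list, so the number of candidates per protein roughly doubles.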
