
Protein Sequencing and Identification by Mass Spectrometry


ajaxe

Presentation Transcript


  1. Protein Sequencing and Identification by Mass Spectrometry

  2. Outline • Introduction to Protein Structure • Peptide Mass Spectra • De Novo Peptide Sequencing • Spectrum Graphs • Protein Identification via Database Search • Spectral Convolution • Spectral Alignment Problem TK modified for WMU CS 6030

  3. Why are proteins and their sequences important? • Necessary in order to understand how cells and their biochemical pathways function. • A unique set of proteins is involved in each function • Each organism has a unique set of expressed proteins • Each type of cell / tissue has a unique set of expressed proteins • A key to understanding how the brain or liver works is to determine what proteins they produce TK Added for WMU CS 6030

  4. Basic Summary of Protein Structure • A polypeptide is a polymer of amino acid residues, i.e. a linear chain built from amino acids • A polypeptide can be completely specified by giving its sequence of amino acids • A protein is a collection of one or more polypeptides bonded together

  5. The Amino Acids

  6. Amino Acids – the basic building block

  7. Formation of polypeptides • Two amino acids can combine to form a dipeptide • Structure in blue is a peptide link • If many amino acids are joined, we call it a polypeptide

  8. Ends of the polypeptide chain • Note that a polypeptide is a chain of amino acid residues (a water molecule is lost when each peptide link forms). • End of chain with –NH2 is the N-terminal • End of chain with –COOH is the C-terminal

  9. Polypeptides -> Proteins

  10. Proteins can be complex structures

  11. De Novo Sequencing Strategy • Break proteins into peptide chain fragments • Use enzymes to cut into pieces • Further break apart peptide chains using mass spectrometry • Peptide chains will break up in a manner predictable by molecular physics • The masses of the molecular fragments will collectively deliver a “mass spectrum” from which the sequence can be derived • First, we’ll explore the ways that mass spectrometry can break peptides apart TK added for WMU CS 6030

  12. Masses of Amino Acid Residues

  13. The Periodic Table (part of it!)

  14. Protein Backbone • H...-HN-CH(Ri-1)-CO-NH-CH(Ri)-CO-NH-CH(Ri+1)-CO-…OH • The chain runs from the N-terminus (H- end) to the C-terminus (-OH end) • Ri-1, Ri, Ri+1 are the side chains of amino acid residues i-1, i, i+1

  15. Peptide Fragmentation • Collision Induced Dissociation: the protonated (H+) backbone H...-HN-CH(Ri-1)-CO . . . NH-CH(Ri)-CO-NH-CH(Ri+1)-CO-…OH breaks into a prefix fragment and a suffix fragment • Peptides tend to fragment along the backbone. • Fragments can also lose neutral chemical groups like NH3 and H2O.

  16. Breaking Protein into Peptides and Peptides into Fragment Ions • Proteases, e.g. trypsin, break proteins into peptides. • A tandem mass spectrometer further breaks the peptides down into fragment ions and measures the mass of each piece. • The mass spectrometer accelerates the fragment ions; heavier ions accelerate more slowly than lighter ones. • The mass spectrometer measures the mass/charge ratio of an ion.

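  17. N- and C-terminal Peptides [Figure: the peptide G-P-F-N-A cut at each of its four backbone positions, giving the N-terminal peptides G, G-P, G-P-F, G-P-F-N and the complementary C-terminal peptides P-F-N-A, F-N-A, N-A, A]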

  18. Terminal peptides and ion types • Peptide G-P-F-N: mass (Da) 57 + 97 + 147 + 114 = 415 • Peptide G-P-F-N without H2O: mass (Da) 57 + 97 + 147 + 114 – 18 = 397

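  19. N- and C-terminal Peptides (masses in Da) • Full peptide G-P-F-N-A: 486 • N-terminal G (57) with C-terminal P-F-N-A (429) • N-terminal G-P (154) with C-terminal F-N-A (332) • N-terminal G-P-F (301) with C-terminal N-A (185) • N-terminal G-P-F-N (415) with C-terminal A (71)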

  20. N- and C-terminal Peptides (masses only, in Da) • Total mass: 486 • N-terminal peptides: 57, 154, 301, 415 • C-terminal peptides: 429, 332, 185, 71
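The N- and C-terminal masses on slides 18 to 20 can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: it assumes the example peptide is G-P-F-N-A (the order consistent with the masses shown) and uses the rounded integer residue masses from the slides. The function name `terminal_peptide_masses` is hypothetical.

```python
# Rounded residue masses (Da) as they appear on the slides.
RESIDUE_MASS = {"G": 57, "P": 97, "F": 147, "N": 114, "A": 71}


def terminal_peptide_masses(peptide):
    """Return (N-terminal masses, C-terminal masses), one pair per backbone cut."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    prefixes = []
    running = 0
    for m in masses[:-1]:  # one cut after every residue except the last
        running += m
        prefixes.append(running)
    # Each C-terminal peptide is the complement of an N-terminal peptide.
    suffixes = [total - p for p in prefixes]
    return prefixes, suffixes


prefixes, suffixes = terminal_peptide_masses("GPFNA")
for p, s in zip(prefixes, suffixes):
    print(f"N-terminal {p:3d}  C-terminal {s:3d}  sum {p + s}")
```

Every pair sums to the total mass, which is why the two ion series mirror each other in a spectrum.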

  21. Peptide Fragmentation [Figure: backbone of a four-residue peptide (side chains R1 to R4) annotated with the N-terminal fragment ions a2, b2, a3, b3, the C-terminal fragment ions y1, y2, y3, and the neutral-loss variants b2-H2O, b3-NH3, y2-NH3, y3-H2O]

  22. Mass Spectra • The peaks in the mass spectrum: • Prefix and suffix fragments • Fragments with neutral losses (-H2O, -NH3) • Noise and missing peaks. [Figure: spectrum of the peptide G-V-D-L-K on a mass axis starting at 0; gaps between peaks spell out residues, e.g. 57 Da = ‘G’, 99 Da = ‘V’]

  23. Protein Identification with MS/MS • Peptide Identification: [Figure: MS/MS spectrum (intensity vs. mass, starting at mass 0) of the peptide G-V-D-L-K]

  24. Tandem Mass-Spectrometry

  25. Breaking Proteins into Peptides [Figure: the protein MPSERGTDIMRPAKID...... is broken into the peptides MPSER, GTDIMR, PAKID, ……, which pass through HPLC and on to MS/MS]

  26. Tandem Mass Spectrometry [Figure: LC, ion source, MS-1, collision cell, MS-2; example output shows MS scan 1707 followed by MS/MS scan 1708]

  27. Tandem Mass Spectrum • Tandem Mass Spectrometry (MS/MS): mainly generates partial N- and C-terminal peptides • Spectrum consists of different ion types because peptides can be broken in several places. • Chemical noise often complicates the spectrum. • Represented in 2-D: mass/charge axis vs. intensity axis

  28. De Novo vs. Database Search • Database search: candidates come from a database of known peptides (MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN, ..) • De novo: candidates come from the database of all $20^n$ peptides (AAAAAAAA, AAAAAAAC, AAAAAAAD, AAAAAAAE, AAAAAAAG, AAAAAAAF, AAAAAAAH, AAAAAAI, ..., AVGELTI, AVGELTK, AVGELTL, AVGELTM, ..., YYYYYYYS, YYYYYYYT, YYYYYYYV, YYYYYYYY) [Figure: the same MS/MS spectrum (peaks labeled W R V A L T G E P L K C W D T) is matched by mass and score under both approaches, and both report AVGELTK]

  29. De Novo vs. Database Search: A Paradox • The database of all peptides is huge, ≈ $O(20^n)$ for peptides of length n. • The database of all known peptides is much smaller, ≈ $O(10^8)$. • However, de novo algorithms can be much faster, even though their search space is much larger! • A database search scans every peptide in the database of known peptides to find the best one. • De novo sequencing avoids scanning the database of all peptides by modeling the problem as a graph search.
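To get a feel for the two search-space sizes, the short calculation below compares 20^n with the 10^8 figure quoted on the slide. It is only back-of-the-envelope arithmetic; the names `KNOWN_PEPTIDES` and `all_peptides` are made up for this illustration.

```python
KNOWN_PEPTIDES = 10**8  # order of magnitude quoted on the slide


def all_peptides(n, alphabet=20):
    """Number of distinct peptides of length n over a 20-letter alphabet."""
    return alphabet**n


# Smallest peptide length at which "all peptides" outnumbers the known ones.
smallest = next(n for n in range(1, 50) if all_peptides(n) > KNOWN_PEPTIDES)
print(smallest, all_peptides(smallest))
print(all_peptides(10))
```

Already at modest peptide lengths the space of all peptides exceeds the database of known peptides by orders of magnitude, which is what makes the speed of de novo methods look paradoxical.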

  30. De novo Peptide Sequencing Sequence

  31. Theoretical Spectrum

  32. Theoretical Spectrum (cont’d)

  33. Theoretical Spectrum (cont’d)

  34. Two approaches to determining sequence from the mass spectra • Exhaustive Search • Involves analysis of $20^L$ sequences of length L • Branch and bound advisable, but success has been limited • Analysis of Spectrum Graph • Based on experimental spectrum rather than all possible matches to possible spectra TK added slide for WMU CS 6030

  35. Building Spectrum Graph • How to create vertices (from masses) • How to create edges (from mass differences) • How to score paths • How to find best path

  36. Ion Types • Some masses correspond to fragment ions, others are just random noise • Knowing the ion types Δ={δ1, δ2,…, δk} lets us distinguish fragment ions from noise • We can learn ion types δi and their probabilities qi by analyzing a large test sample of annotated spectra.

  37. Example of Ion Type • Δ={δ1, δ2,…, δk} • Ion types {b, b-NH3, b-H2O} correspond to Δ={0, 17, 18} • These are the masses of chemical groups frequently lost from peptide fragments when the peptide ions collide with inert gas molecules in the mass spectrometer collision cell *Note: In reality the δ value of ion type b is -1 but we will “hide” it for the sake of simplicity TK modified slide for WMU CS 6030

  38. Theoretical Spectra • The theoretical spectrum of peptide P is denoted T(P) • T(P) can be calculated by subtracting every ion type offset in Δ={δ1, δ2,…, δk} from the mass of every partial peptide of P • Every partial peptide will have k masses in T(P) TK added slide for WMU CS 6030
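A theoretical spectrum of this kind can be generated mechanically. The sketch below is a simplified illustration, not the course's implementation. It assumes rounded integer residue masses, considers only N-terminal partial peptides (matching the b-type ion set Δ = {0, 17, 18} from the earlier example), and reuses the peptide G-P-F-N-A. The function name `theoretical_spectrum` is hypothetical.

```python
# Rounded residue masses (Da); only the residues needed for the example.
RESIDUE_MASS = {"G": 57, "P": 97, "F": 147, "N": 114, "A": 71}


def theoretical_spectrum(peptide, deltas):
    """Return sorted masses m - delta for every N-terminal partial peptide mass m."""
    spectrum = []
    running = 0
    for aa in peptide:
        running += RESIDUE_MASS[aa]  # mass of the current N-terminal peptide
        for delta in deltas:
            spectrum.append(running - delta)  # one peak per ion type
    return sorted(spectrum)


T = theoretical_spectrum("GPFNA", deltas=(0, 17, 18))
print(T)
```

With k = 3 ion types and five N-terminal partial peptides (the full peptide included), the spectrum holds 15 masses, in line with "every partial peptide will have k masses in T(P)".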

  39. Match between Spectra and the Shared Peak Count • The match between two spectra is the number of masses (peaks) they share (Shared Peak Count or SPC) • In practice mass-spectrometrists use the weighted SPC that reflects intensities of the peaks • Match between experimental (S) and theoretical spectra (T) is defined similarly
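The unweighted shared peak count is simple to compute. The sketch below is one possible reading of it: real peaks rarely coincide exactly, so it assumes a mass tolerance (the `tolerance` parameter is an assumption, not part of the slide) and counts the peaks of the first spectrum that have a partner in the second. The example spectra are made-up numbers.

```python
from bisect import bisect_left


def shared_peak_count(spectrum_a, spectrum_b, tolerance=0.5):
    """Count peaks in spectrum_a that lie within `tolerance` of a peak in spectrum_b."""
    b = sorted(spectrum_b)
    count = 0
    for mass in spectrum_a:
        # First peak of b that is not below the lower edge of the window.
        i = bisect_left(b, mass - tolerance)
        if i < len(b) and b[i] <= mass + tolerance:
            count += 1
    return count


experimental = [57.1, 154.0, 200.3, 301.2, 415.0]  # hypothetical measured peaks
theoretical = [57, 154, 301, 415, 486]             # hypothetical predicted peaks
print(shared_peak_count(experimental, theoretical))
```

A weighted variant would add up peak intensities instead of incrementing a counter.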

  40. Peptide Sequencing Problem Goal: Find a peptide whose theoretical spectrum has the maximal match with an experimental spectrum. Input: • S: experimental spectrum • Δ: set of possible ion types • m: parent mass Output: • P: peptide with mass m whose theoretical spectrum T best matches the experimental spectrum S

  41. Vertices of Spectrum Graph • Masses of potential N-terminal peptides • Vertices are generated by reverse shifts corresponding to ion types Δ={δ1, δ2,…, δk} • Every N-terminal peptide of mass m can generate up to k ions m-δ1, m-δ2, …, m-δk • Every mass s in an MS/MS spectrum generates k vertices V(s) = {s+δ1, s+δ2, …, s+δk} corresponding to potential N-terminal peptides • Vertices of the spectrum graph: {initial vertex} ∪ V(s1) ∪ V(s2) ∪ ... ∪ V(sm) ∪ {terminal vertex}

  42. Edges of Spectrum Graph • Two vertices with mass difference corresponding to an amino acid A: • Connect with an edge labeled by A
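The vertex and edge rules can be put together in a small sketch. It is illustrative only and makes several assumptions: rounded integer masses (so mass differences are compared exactly, where real data would need a tolerance), a five-residue alphabet, the ion set Δ = {0, 17, 18}, and a made-up peak list. The function name `spectrum_graph` is hypothetical.

```python
# Rounded residue masses (Da); a real alphabet would list all 20 amino acids.
RESIDUE_MASS = {"G": 57, "P": 97, "F": 147, "N": 114, "A": 71}


def spectrum_graph(peaks, deltas, parent_mass):
    """Build (vertices, edges) of a spectrum graph over integer masses."""
    vertices = {0, parent_mass}  # initial and terminal vertex
    for s in peaks:
        for delta in deltas:
            v = s + delta  # reverse shift: candidate N-terminal peptide mass
            if 0 < v < parent_mass:
                vertices.add(v)
    edges = []
    for u in sorted(vertices):
        for aa, mass in RESIDUE_MASS.items():
            if u + mass in vertices:  # mass difference equals a residue mass
                edges.append((u, u + mass, aa))
    return sorted(vertices), edges


peaks = [57, 136, 154, 284, 301, 415]  # hypothetical MS/MS peak masses
vertices, edges = spectrum_graph(peaks, deltas=(0, 17, 18), parent_mass=486)
for u, v, aa in edges:
    print(f"{u:3d} -> {v:3d}  {aa}")
```

Edges always point from a smaller mass to a larger one, so the result is a directed acyclic graph. Reverse shifts also create spurious vertices and edges, which is why paths have to be scored afterwards.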

  43. Peptide Sequences • A possible peptide sequence is identified by finding a path from mass 0 to parent mass m in the resulting DAG • Many possible sequences (paths) will typically be identified in the DAG • The “best path” is identified by scoring using probabilities that are based on experimental data TK added slide for WMU CS 6030
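Reading peptides off the graph amounts to enumerating paths from mass 0 to the parent mass. The sketch below is a minimal illustration, assuming rounded integer residue masses and a made-up vertex list; `enumerate_peptides` is a hypothetical name. It lists every path without scoring any of them.

```python
from functools import lru_cache

# Rounded residue masses (Da); a real alphabet would list all 20 amino acids.
RESIDUE_MASS = {"G": 57, "P": 97, "F": 147, "N": 114, "A": 71}


def enumerate_peptides(vertices, parent_mass):
    """Return every residue string labeling a path from mass 0 to parent_mass."""
    vertex_set = set(vertices) | {0, parent_mass}

    @lru_cache(maxsize=None)
    def paths_from(u):
        if u == parent_mass:
            return ("",)  # reached the terminal vertex
        found = []
        for aa, mass in RESIDUE_MASS.items():
            v = u + mass
            if v in vertex_set:  # an edge labeled aa leads from u to v
                found.extend(aa + rest for rest in paths_from(v))
        return tuple(found)

    return sorted(paths_from(0))


# Hypothetical vertex masses; 97 and 171 are extra vertices not on the path G-P-F-N-A.
candidates = enumerate_peptides([57, 97, 154, 171, 301, 415], parent_mass=486)
print(candidates)
```

Here the extra vertex at 97 opens a second path (P then G instead of G then P), while 171 is a dead end. Telling such alternatives apart is the job of the path score.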

  44. Path Score • p(P,S) = probability that peptide P produces spectrum S = {s1, s2, …, sq} • p(P, s) = the probability that peptide P generates a peak (mass) s • Scoring = computing probabilities • $p(P,S) = \prod_{s \in S} p(P, s)$

  45. Peak Score • For a position t that represents ion type δj: $p(P, s_t) = \begin{cases} q_j, & \text{if a peak is generated at } t \\ 1 - q_j, & \text{otherwise} \end{cases}$

  46. Peak Score (cont’d) • For a position t that is not associated with an ion type: $p_R(P, s_t) = \begin{cases} q_R, & \text{if a peak is generated at } t \\ 1 - q_R, & \text{otherwise} \end{cases}$ • qR = the probability of a noisy peak that does not correspond to any ion type
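The peak scores combine into a path score by multiplication. The sketch below is one way to read slides 44 to 46, under stated assumptions: the mass axis is discretized into `n_positions` bins, `expected` maps each bin that corresponds to an ion type of the peptide to its probability qj, and every other bin uses the noise probability qR. All names and numbers are hypothetical. Logarithms are summed to avoid underflow from multiplying many small probabilities.

```python
import math


def log_probability(expected, observed, q_noise, n_positions):
    """Return log p(P, S) as a sum of per-position log peak scores.

    expected: dict mapping a position to q_j for positions tied to an ion type
    observed: set of positions where the spectrum has a peak
    q_noise:  probability q_R of a random peak at any other position
    """
    log_p = 0.0
    for t in range(n_positions):
        q = expected.get(t, q_noise)  # q_j if t represents an ion type, else q_R
        log_p += math.log(q if t in observed else 1.0 - q)
    return log_p


# Toy example: ion-type positions 1 and 3; peaks seen at positions 1 and 4.
lp = log_probability({1: 0.8, 3: 0.5}, observed={1, 4}, q_noise=0.1, n_positions=5)
print(math.exp(lp))
```

In the toy example the factors are 0.9, 0.8, 0.9, 0.5 and 0.1: an expected peak that was seen, an expected peak that is missing, one noise peak, and two empty noise positions.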

  47. Finding Optimal Paths in the Spectrum Graph • For a given MS/MS spectrum S, find a peptide P’ maximizing p(P,S) over all possible peptides P: $p(P', S) = \max_P p(P, S)$ • Peptides = paths in the spectrum graph • P’ = the optimal path in the spectrum graph

  48. Ions and Probabilities • Tandem mass spectrometry is characterized by a set of ion types {δ1,δ2,..,δk} and their probabilities {q1,...,qk} • δi-ions of a partial peptide are produced independently with probabilities qi

  49. Ions and Probabilities • A peptide has all k peaks with probability $\prod_{i=1}^{k} q_i$ • and no peaks with probability $\prod_{i=1}^{k} (1 - q_i)$ • A peptide also produces "random noise" with uniform probability qR in any position.

  50. Ratio Test Scoring for Partial Peptides • Incorporates premiums for observed ions and penalties for missing ions. • Example: for k=4, assume that for a partial peptide P’ we only see ions δ1,δ2,δ4. The score is calculated as:
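The formula for this example did not survive in the transcript. A common way to write such a ratio test, assumed here rather than taken from the slide, multiplies a premium qi/qR for each observed ion by a penalty (1−qi)/(1−qR) for each missing ion. For k=4 with δ3 missing that gives (q1/qR)·(q2/qR)·((1−q3)/(1−qR))·(q4/qR). The sketch below evaluates that expression; the function name and probability values are hypothetical.

```python
def ratio_test_score(q, observed, q_noise):
    """Likelihood-ratio score of a partial peptide against the noise model.

    q:        probabilities q_1..q_k of the k ion types
    observed: booleans, True where the ion of that type was seen
    q_noise:  probability q_R of a random noise peak
    """
    score = 1.0
    for q_i, seen in zip(q, observed):
        if seen:
            score *= q_i / q_noise                  # premium for an observed ion
        else:
            score *= (1.0 - q_i) / (1.0 - q_noise)  # penalty for a missing ion
    return score


# k = 4, ions delta_1, delta_2, delta_4 observed, delta_3 missing (made-up q values).
score = ratio_test_score([0.8, 0.6, 0.5, 0.4], [True, True, False, True], q_noise=0.1)
print(score)
```

A score above 1 means the ion model explains the peaks at these positions better than random noise does.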
