Protein Sequencing and Identification


  1. Protein Sequencing and Identification

  2. Motivation • Want to know which proteins are present in the cell • Protein identification: Given a protein sample, does it match some protein in a database? • Protein sequencing: No database. Directly find the sequence of the protein sample.

  3. History • First protein sequencing done by Nobel laureate Fred Sanger • broke the insulin protein into pieces (“peptides”) • sequenced each resulting fragment separately • reconstructed entire insulin sequence by fragment assembly

  4. Peptide Fragmentation • Collision Induced Dissociation • [Figure: protonated peptide backbone H-...-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-...-OH, split along the backbone into a prefix fragment and a suffix fragment] • Peptides tend to fragment along the backbone. • Fragments can also lose neutral chemical groups such as NH3 and H2O.

  5. Breaking Protein into Peptides and Peptides into Fragment Ions • Proteases, e.g. trypsin, break a protein into peptides. • A tandem mass spectrometer further breaks the peptides down into fragment ions and measures the mass of each piece. • The mass spectrometer accelerates the fragment ions; heavier ions accelerate more slowly than lighter ones. • The mass spectrometer measures the mass/charge ratio of an ion.

  6. N- and C-terminal Peptides • [Figure: a peptide with residues G, P, F, N, A cut at each backbone position into an N-terminal peptide and the complementary C-terminal peptide]

  7. Terminal peptides and ion types • Peptide G P F N: mass (D) 57 + 97 + 147 + 114 = 415 • Peptide G P F N without H2O: mass (D) 57 + 97 + 147 + 114 – 18 = 397
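
The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch that assumes the integer residue masses implied by the slides (G = 57, A = 71, P = 97, N = 114, F = 147) and a water mass of 18; the names `RESIDUE_MASS` and `peptide_mass` are illustrative only.

```python
# Integer residue masses (daltons) for the five amino acids used in the slides.
RESIDUE_MASS = {"G": 57, "A": 71, "P": 97, "N": 114, "F": 147}
WATER = 18  # integer mass of H2O


def peptide_mass(peptide, lose_water=False):
    """Sum the residue masses; optionally subtract one water."""
    total = sum(RESIDUE_MASS[aa] for aa in peptide)
    return total - WATER if lose_water else total


print(peptide_mass("GPFN"))                   # 415
print(peptide_mass("GPFN", lose_water=True))  # 397
```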

  8. N- and C-terminal Peptides • [Figure: the same peptide (total mass 486) with the mass of each terminal peptide; complementary pairs 71/415, 301/185, 332/154, 429/57]

  9. N- and C-terminal Peptides • [Figure: same diagram with the residue letters removed; masses 486, 71/415, 301/185, 332/154, 429/57]

  10. N- and C-terminal Peptides • [Figure: same diagram with the N-/C-terminal labels removed; masses 486, 71, 415, 301, 185, 332, 154, 429, 57]

  11. N- and C-terminal Peptides • Reconstruct peptide from the set of masses of fragment ions (mass-spectrum) • [Figure: masses 486, 71, 415, 301, 185, 332, 154, 429, 57]
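
The masses in these figures can be reproduced as running sums from either end of the peptide. The sketch below assumes the residue order G-P-F-N-A, which is consistent with the masses shown but is not spelled out in the transcript, and the same integer residue masses as above; `terminal_masses` is a hypothetical helper name.

```python
from itertools import accumulate

RESIDUE_MASS = {"G": 57, "A": 71, "P": 97, "N": 114, "F": 147}


def terminal_masses(peptide):
    """Return (N-terminal masses, C-terminal masses) as running sums."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n_terminal = list(accumulate(masses))            # prefixes from the N-terminus
    c_terminal = list(accumulate(reversed(masses)))  # suffixes from the C-terminus
    return n_terminal, c_terminal


n_term, c_term = terminal_masses("GPFNA")
print(n_term)  # [57, 154, 301, 415, 486]
print(c_term)  # [71, 185, 332, 429, 486]
```

Each proper prefix mass and its complementary suffix mass add up to the parent mass 486, which is why the figure pairs 71 with 415, 185 with 301, and so on.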

  12. Peptide sequencing problem • A = {a1, a2, … a20} : set of amino acids, each with mass m(ai) • Peptide P = p1…pn is a sequence of amino acids, with parental mass $m(P) = \sum_{i=1}^{n} m(p_i)$ • Partial N-terminal peptide Pi = p1…pi with mass mi • Mass spectrum has the masses of all partial N-terminal peptides, determined experimentally • Ignoring C-terminal peptides for simplicity

  13. Peptide sequencing problem • A peptide may lose one or more smaller parts of itself (such as a water or an ammonia) • The Mass spectrometer measures mass of fragments that may not be the entire fragment Pi. • Assume k different ion losses possible. • Possible losses of mass: ∆ = {∂1, … ∂k}

  14. Theoretical spectrum • The theoretical spectrum T(P) of a peptide P can be calculated by subtracting all possible mass losses ∂1…∂k from the masses of all partial peptides of P • Each partial peptide generates k masses in the theoretical spectrum
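
As a rough illustration of this definition, the sketch below subtracts every loss from every partial N-terminal peptide mass. The loss values (0 for no loss, 17 for NH3, 18 for H2O) and the integer residue masses are assumptions chosen for the example, and `theoretical_spectrum` is a hypothetical name.

```python
RESIDUE_MASS = {"G": 57, "A": 71, "P": 97, "N": 114, "F": 147}


def theoretical_spectrum(peptide, losses=(0, 17, 18)):
    """Masses of all partial N-terminal peptides minus each possible loss."""
    spectrum = set()
    running = 0
    for aa in peptide:
        running += RESIDUE_MASS[aa]    # mass m_i of partial peptide P_i
        for loss in losses:
            spectrum.add(running - loss)
    return sorted(spectrum)


print(theoretical_spectrum("GP"))  # [39, 40, 57, 136, 137, 154]
```

With k = 3 losses, each of the two partial peptides of "GP" contributes three masses, matching the slide's count of k masses per partial peptide.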

  15. Match between Spectra and the Shared Peak Count • The match between two spectra is the number of masses (peaks) they share (Shared Peak Count or SPC) • In practice mass-spectrometrists use the weighted SPC that reflects intensities of the peaks • Match between experimental and theoretical spectra is defined similarly
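
A minimal sketch of the unweighted shared peak count, assuming integer masses that must match exactly. Real spectra would need a mass tolerance and, for the weighted variant mentioned above, peak intensities; neither is modeled here. `shared_peak_count` is an illustrative name.

```python
def shared_peak_count(spectrum_a, spectrum_b):
    """Number of masses that appear in both spectra (exact integer match)."""
    return len(set(spectrum_a) & set(spectrum_b))


experimental = [57, 154, 301, 415]
theoretical = [57, 155, 301, 415, 486]
print(shared_peak_count(experimental, theoretical))  # 3
```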

  16. Peptide Sequencing Problem Goal: Find a peptide whose theoretical spectrum has the maximal match with an experimental spectrum. Input: • S: experimental spectrum • Δ: set of possible ion types • m: parent mass Output: • P: peptide with mass m whose theoretical spectrum best matches the experimental spectrum S
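
To make the problem statement concrete, here is a brute-force sketch: enumerate every peptide with the given parent mass over a toy five-letter alphabet and keep those with the highest shared peak count. This is not the approach the slides go on to develop (they use a spectrum graph); the alphabet, integer masses, and function names are assumptions for illustration, and exhaustive enumeration is only feasible for very small inputs.

```python
RESIDUE_MASS = {"G": 57, "A": 71, "P": 97, "N": 114, "F": 147}


def peptides_with_mass(target, prefix=""):
    """Yield every peptide over the toy alphabet whose residue masses sum to target."""
    if target == 0:
        yield prefix
        return
    for aa, mass in RESIDUE_MASS.items():
        if mass <= target:
            yield from peptides_with_mass(target - mass, prefix + aa)


def prefix_masses(peptide):
    """Masses of all partial N-terminal peptides."""
    total, out = 0, set()
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        out.add(total)
    return out


def best_peptides(spectrum, parent_mass, losses=(0,)):
    """Return (best score, list of peptides achieving it) by exhaustive search."""
    observed = set(spectrum)
    best_score, best = -1, []
    for peptide in peptides_with_mass(parent_mass):
        theoretical = {m - d for m in prefix_masses(peptide) for d in losses}
        score = len(theoretical & observed)
        if score > best_score:
            best_score, best = score, [peptide]
        elif score == best_score:
            best.append(peptide)
    return best_score, best


score, candidates = best_peptides([57, 154, 301, 415, 486], parent_mass=486)
print(score, "GPFNA" in candidates)
```

Several peptides can tie for the best score. For example, G + G has the same integer mass as N, so GPFGGA explains the same five peaks as GPFNA.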

  17. Vertices of Spectrum Graph • Masses of potential N-terminal peptides • Vertices are generated by reverse shifts corresponding to ion types Δ={δ1, δ2,…, δk} • Every mass s in an MS/MS spectrum generates k vertices V(s) = {s+δ1, s+δ2, …, s+δk} corresponding to potential N-terminal peptides • Vertices of the spectrum graph: {initial vertex} ∪ V(s1) ∪ V(s2) ∪ ... ∪ V(sm) ∪ {terminal vertex}

  18. Edges of Spectrum Graph • If two vertices have a mass difference equal to the mass of an amino acid A, connect them with an edge labeled by A • Gap edges for di- and tri-peptides
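
The vertex and edge rules can be sketched as follows. The example assumes integer masses, a five-letter alphabet, an initial vertex at mass 0, and a terminal vertex at the parent mass; gap edges for di- and tri-peptides are left out. `spectrum_graph` and its parameters are hypothetical names.

```python
RESIDUE_MASS = {"G": 57, "A": 71, "P": 97, "N": 114, "F": 147}


def spectrum_graph(peaks, shifts, parent_mass):
    """Build (sorted vertices, labeled edges) from peaks and reverse shifts."""
    vertices = {0, parent_mass}              # initial and terminal vertex
    for s in peaks:
        for delta in shifts:
            v = s + delta                    # candidate N-terminal peptide mass
            if 0 < v < parent_mass:
                vertices.add(v)
    mass_to_aa = {mass: aa for aa, mass in RESIDUE_MASS.items()}
    ordered = sorted(vertices)
    edges = []
    for i, u in enumerate(ordered):
        for v in ordered[i + 1:]:
            label = mass_to_aa.get(v - u)    # edge only if the gap is one residue
            if label is not None:
                edges.append((u, v, label))
    return ordered, edges


vertices, edges = spectrum_graph([57, 154, 301, 415], shifts=(0,), parent_mass=486)
print(vertices)  # [0, 57, 154, 301, 415, 486]
print(edges)
```

Edges always run from the lighter vertex to the heavier one, so the graph is acyclic by construction.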

  19. Paths • Paths in the labeled graph spell out amino acid sequences • There are many paths; how do we find the correct one? • We need scoring to evaluate paths
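
A small sketch of how paths spell sequences: a depth-first walk from the initial vertex to the terminal vertex, concatenating edge labels. The edge list is a made-up toy example in which two different paths reach the same terminal mass; `spell_paths` is an illustrative name, and the walk assumes every edge goes from a lighter to a heavier vertex so that it terminates.

```python
def spell_paths(edges, source, sink):
    """Return the label strings of all source-to-sink paths in an acyclic graph."""
    outgoing = {}
    for u, v, label in edges:
        outgoing.setdefault(u, []).append((v, label))

    paths = []

    def walk(node, spelled):
        if node == sink:
            paths.append(spelled)
            return
        for nxt, label in outgoing.get(node, []):
            walk(nxt, spelled + label)

    walk(source, "")
    return paths


toy_edges = [(0, 57, "G"), (57, 154, "P"), (0, 97, "P"), (97, 154, "G"), (154, 301, "F")]
print(spell_paths(toy_edges, 0, 301))  # ['GPF', 'PGF']
```

Both spelled peptides have the same total mass, which is the ambiguity that a path score is meant to resolve.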

  20. Path Score • p(P,S) = probability that peptide P produces spectrum S = {s1,s2,…sq} • p(P, s) = the probability that peptide P generates a peak s • Scoring = computing probabilities • $p(P,S) = \prod_{s \in S} p(P,s)$
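
The product over peaks can be computed directly, or as a sum of logarithms when many small probabilities would underflow. The per-peak probabilities below are invented numbers used only to exercise the formula, and both function names are hypothetical.

```python
import math


def spectrum_probability(peak_probabilities):
    """p(P, S) as the product of the per-peak probabilities p(P, s)."""
    return math.prod(peak_probabilities)


def log_spectrum_probability(peak_probabilities):
    """Same quantity in log space; requires every probability to be > 0."""
    return sum(math.log(p) for p in peak_probabilities)


peak_probs = [0.5, 0.5, 0.8]  # made-up values of p(P, s) for three peaks
print(spectrum_probability(peak_probs))
print(math.exp(log_spectrum_probability(peak_probs)))
```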

  21. p(P, s) • What is the probability that peptide P will produce a fragment mass s? • Each ion type ∂i has some probability of occurring, written as qi • Assuming ion types occur independently, a peptide has all k peaks with probability $\prod_{i=1}^{k} q_i$ • and no peaks with probability $\prod_{i=1}^{k} (1 - q_i)$ • Suppose that a partial peptide Pi produces ions ∂1…∂l and does not produce ions ∂l+1…∂k

  22. p(P, s) • Then p(P,s) = $\prod_{j=1}^{l} q_j \prod_{j=l+1}^{k} (1 - q_j)$ • A peptide also produces ``random noise'' with uniform probability qR in any position.
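
Under the assumption that the k ion types occur independently, the probability of any particular pattern of produced and missing ions is a product of q_i and (1 - q_i) terms. The sketch below uses made-up values for the q_i and does not model the random-noise term q_R; `ion_pattern_probability` is a hypothetical name.

```python
def ion_pattern_probability(q, produced):
    """Probability that exactly the flagged ion types are produced.

    q[i] is the probability of ion type i; produced[i] says whether it occurred.
    Assumes the ion types are independent of each other.
    """
    probability = 1.0
    for q_i, seen in zip(q, produced):
        probability *= q_i if seen else (1.0 - q_i)
    return probability


q = [0.7, 0.4, 0.2]  # invented probabilities for k = 3 ion types
print(ion_pattern_probability(q, [True, True, True]))     # all k peaks
print(ion_pattern_probability(q, [False, False, False]))  # no peaks
print(ion_pattern_probability(q, [True, True, False]))    # first l = 2 only
```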
