Protein Sequencing and Identification
Motivation
• We want to know which proteins are present in the cell.
• Protein identification: given a protein sample, does it match some protein in a database?
• Protein sequencing: no database is available; determine the sequence of the protein sample directly.
History
• The first protein sequencing was done by Nobel laureate Fred Sanger, who:
  • broke the insulin protein into pieces (“peptides”)
  • sequenced each resulting fragment separately
  • reconstructed the entire insulin sequence by fragment assembly
Peptide Fragmentation
[Figure: collision-induced dissociation. A protonated (H+) peptide backbone H...-HN-CH-CO...NH-CH-CO-NH-CH-CO-...OH with side chains Ri-1, Ri, Ri+1 breaks into a prefix fragment and a suffix fragment.]
• Peptides tend to fragment along the backbone.
• Fragments can also lose neutral chemical groups such as NH3 and H2O.
Breaking Proteins into Peptides and Peptides into Fragment Ions
• Proteases, e.g. trypsin, break a protein into peptides.
• A tandem mass spectrometer further breaks the peptides down into fragment ions and measures the mass of each piece.
• The mass spectrometer accelerates the fragment ions; heavier ions accelerate more slowly than lighter ones.
• The mass spectrometer measures the mass/charge ratio of an ion.
N- and C-terminal Peptides
[Figure: a five-residue peptide made of the residues G, P, F, N, and A, cut at each of its four backbone positions into an N-terminal peptide and the complementary C-terminal peptide.]
Terminal peptides and ion types
• Peptide (residues P, G, N, F): mass (Da) = 57 + 97 + 147 + 114 = 415
• Peptide without H2O: mass (Da) = 57 + 97 + 147 + 114 - 18 = 397
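The arithmetic above can be checked with a short sketch. The residue masses are the integer values used on the slide; the function name `fragment_mass` and the `lose_water` flag are illustrative assumptions, not part of the original material.

```python
# Integer residue masses in daltons, as used on the slide.
RESIDUE_MASS = {"G": 57, "A": 71, "P": 97, "N": 114, "F": 147}
WATER = 18  # nominal mass of H2O used on the slide


def fragment_mass(residues, lose_water=False):
    """Sum the residue masses; optionally subtract one water (hypothetical helper)."""
    mass = sum(RESIDUE_MASS[r] for r in residues)
    return mass - WATER if lose_water else mass


print(fragment_mass("PGNF"))                   # 415
print(fragment_mass("PGNF", lose_water=True))  # 397
```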
N- and C-terminal Peptides
[Figure: the five-residue peptide (total mass 486) annotated with fragment masses. Each cut yields a complementary pair of terminal peptides whose masses sum to 486: 71 and 415, 185 and 301, 332 and 154, 429 and 57.]
• Goal: reconstruct the peptide from the set of masses of its fragment ions (the mass spectrum).
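As a rough illustration of where these masses come from, the sketch below lists the complementary N-terminal/C-terminal mass pairs for one peptide. The residue order G-P-F-N-A is an assumption: it is consistent with the masses in the figure, but the reverse order would give the same set of masses. The helper name `terminal_pairs` is also hypothetical.

```python
from itertools import accumulate

# Integer residue masses in daltons, as used on the slides.
RESIDUE_MASS = {"G": 57, "A": 71, "P": 97, "N": 114, "F": 147}


def terminal_pairs(peptide):
    """Return (N-terminal mass, C-terminal mass) for every internal cut."""
    masses = [RESIDUE_MASS[r] for r in peptide]
    total = sum(masses)
    # Running sums give the N-terminal masses; drop the last one (the whole peptide).
    prefixes = list(accumulate(masses))[:-1]
    return [(p, total - p) for p in prefixes]


# Assumed residue order; each pair sums to the total mass 486.
print(terminal_pairs("GPFNA"))
# [(57, 429), (154, 332), (301, 185), (415, 71)]
```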
Peptide sequencing problem
• $A = \{a_1, a_2, \dots, a_{20}\}$: the set of amino acids, each with mass $m(a_i)$
• A peptide $P = p_1 \dots p_n$ is a sequence of amino acids with parent mass $m(P) = \sum_{i=1}^{n} m(p_i)$
• A partial N-terminal peptide $P_i = p_1 \dots p_i$ has mass $m_i$
• The mass spectrum contains the masses of all partial N-terminal peptides, determined experimentally
• C-terminal peptides are ignored for simplicity
Peptide sequencing problem
• A peptide may lose one or more small parts of itself (such as a water or an ammonia molecule).
• The mass spectrometer may therefore measure the mass of a fragment that is not the entire partial peptide $P_i$.
• Assume $k$ different ion losses are possible.
• Possible mass losses: $\Delta = \{\delta_1, \dots, \delta_k\}$
Theoretical spectrum
• The theoretical spectrum $T(P)$ of a peptide $P$ is calculated by subtracting every possible mass loss $\delta_1, \dots, \delta_k$ from the mass of every partial peptide of $P$.
• Each partial peptide generates $k$ masses in the theoretical spectrum.
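A minimal sketch of this definition follows. It assumes integer masses, considers only partial N-terminal peptides, and uses an illustrative loss set of 0 (no loss) and 18 (water); the function name and the example peptide order GPFNA are assumptions made for the example.

```python
from itertools import accumulate

# Integer residue masses in daltons, as used on the slides.
RESIDUE_MASS = {"G": 57, "A": 71, "P": 97, "N": 114, "F": 147}


def theoretical_spectrum(peptide, losses):
    """Subtract every loss from the mass of every partial N-terminal peptide."""
    prefix_masses = accumulate(RESIDUE_MASS[r] for r in peptide)
    # A set removes coincidental duplicates; sorting makes the output readable.
    return sorted({m - d for m in prefix_masses for d in losses})


# Assumed ion types: no loss (0) and loss of water (18).
print(theoretical_spectrum("GPFNA", losses=[0, 18]))
# [39, 57, 136, 154, 283, 301, 397, 415, 468, 486]
```

With 5 partial peptides and k = 2 losses, the spectrum has 10 masses, matching the statement that each partial peptide contributes k masses.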
Match between Spectra and the Shared Peak Count
• The match between two spectra is the number of masses (peaks) they share (the Shared Peak Count, or SPC).
• In practice, mass spectrometrists use a weighted SPC that reflects the intensities of the peaks.
• The match between experimental and theoretical spectra is defined similarly.
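The unweighted SPC can be sketched as a set intersection. This version assumes exact integer masses; real measurements would need a mass tolerance and, for the weighted variant, peak intensities, neither of which is modeled here. The example spectra are made up for illustration.

```python
def shared_peak_count(spectrum_a, spectrum_b):
    """Number of distinct masses that appear in both spectra (exact match)."""
    return len(set(spectrum_a) & set(spectrum_b))


# Hypothetical experimental spectrum with one noise peak (420).
experimental = [57, 154, 283, 301, 420, 486]
theoretical = [39, 57, 136, 154, 283, 301, 397, 415, 468, 486]
print(shared_peak_count(experimental, theoretical))  # 5
```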
Peptide Sequencing Problem
Goal: Find a peptide whose theoretical spectrum has the maximal match with an experimental spectrum.
Input:
• $S$: experimental spectrum
• $\Delta$: set of possible ion types
• $m$: parent mass
Output:
• $P$: a peptide with mass $m$ whose theoretical spectrum best matches the experimental spectrum $S$
Vertices of Spectrum Graph
• Vertices are the masses of potential N-terminal peptides.
• Vertices are generated by reverse shifts corresponding to the ion types $\Delta = \{\delta_1, \delta_2, \dots, \delta_k\}$.
• Every mass $s$ in an MS/MS spectrum generates $k$ vertices $V(s) = \{s + \delta_1, s + \delta_2, \dots, s + \delta_k\}$ corresponding to potential N-terminal peptides.
• Vertices of the spectrum graph: $\{\text{initial vertex}\} \cup V(s_1) \cup V(s_2) \cup \dots \cup V(s_m) \cup \{\text{terminal vertex}\}$
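The vertex construction can be sketched as follows. Two assumptions are made that the slide does not state: the initial vertex is placed at mass 0 and the terminal vertex at the parent mass, and shifted masses outside that range are discarded. The spectrum and shift values are illustrative.

```python
def spectrum_graph_vertices(spectrum, shifts, parent_mass):
    """Build the vertex set: initial vertex, all s + delta, terminal vertex."""
    # Assumption: initial vertex = 0, terminal vertex = parent mass.
    vertices = {0, parent_mass}
    for s in spectrum:
        for delta in shifts:
            v = s + delta
            # Assumption: keep only masses strictly inside the peptide.
            if 0 < v < parent_mass:
                vertices.add(v)
    return sorted(vertices)


# Hypothetical spectrum; shifts 0 (no loss) and 18 (water) undo the mass losses.
print(spectrum_graph_vertices([57, 136, 301, 397], shifts=[0, 18], parent_mass=486))
# [0, 57, 75, 136, 154, 301, 319, 397, 415, 486]
```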
Edges of Spectrum Graph
• Two vertices whose mass difference corresponds to an amino acid A are connected by an edge labeled A.
• Gap edges are added for di- and tri-peptides.
Paths
• Paths in the labeled graph spell out amino acid sequences.
• There are many paths. How do we find the correct one?
• We need a scoring scheme to evaluate paths.
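To make the edge and path ideas concrete, here is a small sketch that connects vertices whose mass difference equals a residue mass and then enumerates every path from the initial to the terminal vertex. It assumes a toy five-letter alphabet with the integer masses from the earlier slides, omits gap edges, and does no scoring; the vertex list and function names are illustrative only.

```python
# Toy alphabet with integer residue masses (daltons) from the earlier slides.
RESIDUE_MASS = {"G": 57, "A": 71, "P": 97, "N": 114, "F": 147}
MASS_TO_RESIDUE = {mass: res for res, mass in RESIDUE_MASS.items()}


def build_edges(vertices):
    """Map each vertex to its outgoing (next_vertex, residue_label) edges."""
    ordered = sorted(vertices)
    edges = {v: [] for v in ordered}
    for i, u in enumerate(ordered):
        for v in ordered[i + 1:]:
            label = MASS_TO_RESIDUE.get(v - u)
            if label is not None:
                edges[u].append((v, label))
    return edges


def spell_paths(edges, start, end):
    """Return the sequences spelled by all paths from start to end."""
    if start == end:
        return [""]
    return [label + rest
            for nxt, label in edges[start]
            for rest in spell_paths(edges, nxt, end)]


# Hypothetical vertex set for a spectrum with parent mass 486.
vertices = [0, 57, 75, 136, 154, 301, 319, 397, 415, 486]
print(spell_paths(build_edges(vertices), 0, 486))  # ['GPFNA']
```

In this toy case only one path survives, but with a full 20-letter alphabet and noisy peaks many paths typically exist, which is why a score is needed.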
Path Score
• $p(P, S)$ = the probability that peptide $P$ produces spectrum $S = \{s_1, s_2, \dots, s_q\}$
• $p(P, s)$ = the probability that peptide $P$ generates a peak $s$
• Scoring = computing probabilities
• $p(P, S) = \prod_{s \in S} p(P, s)$
p(P, s)
• What is the probability that peptide $P$ will produce a fragment mass $s$?
• Each ion type $\delta_i$ has some probability of occurring, written as $q_i$.
• Treating the ion types as independent, a peptide has all $k$ peaks with probability $\prod_{i=1}^{k} q_i$
• and no peaks with probability $\prod_{i=1}^{k} (1 - q_i)$.
• Suppose that a partial peptide $P_i$ produces ions $\delta_1, \dots, \delta_l$ and does not produce ions $\delta_{l+1}, \dots, \delta_k$.
p(P, s)
• Then $p(P, s) = \prod_{i=1}^{l} q_i \prod_{i=l+1}^{k} (1 - q_i)$.
• A peptide also produces “random noise” with uniform probability $q_R$ in any position.
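The probability for one partial peptide can be sketched numerically, under the assumption that ion types occur independently: each ion type contributes q_i if its peak is present and 1 - q_i if it is absent. The random-noise term q_R is not modeled here, and the probabilities in the example are invented for illustration.

```python
from math import prod


def partial_peptide_probability(q, observed):
    """Multiply q_i for observed ion types and (1 - q_i) for missing ones.

    q        -- occurrence probability of each ion type (assumed independent)
    observed -- booleans, True where the corresponding peak is present
    """
    return prod(qi if seen else 1 - qi for qi, seen in zip(q, observed))


q = [0.75, 0.125]  # hypothetical ion-type probabilities
print(partial_peptide_probability(q, [True, True]))    # 0.09375 (all k peaks)
print(partial_peptide_probability(q, [False, False]))  # 0.21875 (no peaks)
print(partial_peptide_probability(q, [True, False]))   # 0.65625
```

For long peptides the product of many such factors becomes very small, so summing logarithms instead of multiplying probabilities is a common design choice.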