Protein Sequencing and Identification by Mass Spectrometry

Outline • Introduction to Protein Structure • Peptide Mass Spectra • De Novo Peptide Sequencing • Spectrum Graphs • Protein Identification via Database Search • Spectral Convolution • Spectral Alignment Problem TK modified for WMU CS 6030

Why are proteins and their sequences important? • Necessary in order to understand how cells and their biochemical pathways function. • A unique set of proteins is involved in each function • Each organism has a unique set of expressed proteins • Each type of cell / tissue has a unique set of expressed proteins • A key to understanding how the brain or liver works is to determine what proteins they produce TK Added for WMU CS 6030

Basic Summary of Protein Structure • A polypeptide is a polymer of amino acid residues, a linear chain built from amino acids • A polypeptide can be completely specified by giving its sequence of amino acids • A protein is a collection of one or more polypeptides bonded together

The Amino Acids

Amino Acids – the basic building block

Formation of polypeptides • Two amino acids can combine to form a dipeptide • Structure in blue is a peptide link • If many amino acids are joined, we call it a polypeptide

Ends of the polypeptide chain • Note that a polypeptide is a chain of amino acid residues (a water molecule is lost when each peptide link forms). • End of chain with –NH2 is the N-terminal • End of chain with –COOH is the C-terminal

Polypeptides -> Proteins

Proteins can be complex structures

De Novo Sequencing Strategy • Break proteins into peptide chain fragments • Use enzymes to cut into pieces • Further break apart peptide chains using mass spectrometry • Peptide chains will break up in a manner predictable by molecular physics • The masses of the molecular fragments will collectively deliver a “mass spectrum” from which the sequence can be derived • First, we’ll explore the ways that mass spectrometry can break peptides apart TK added for WMU CS 6030

Masses of Amino Acid Residues

The Periodic Table (part of it!)

Protein Backbone • [Figure: backbone H...-HN-CH(Ri-1)-CO-NH-CH(Ri)-CO-NH-CH(Ri+1)-CO-…OH, drawn from the N-terminus (left) to the C-terminus (right); side chains Ri-1, Ri, Ri+1 belong to amino acid residues i-1, i, i+1]

Peptide Fragmentation • Collision Induced Dissociation • [Figure: protonated (H+) backbone H...-HN-CH(Ri-1)-CO . . . NH-CH(Ri)-CO-NH-CH(Ri+1)-CO-…OH, broken at the marked bond into a Prefix Fragment and a Suffix Fragment] • Peptides tend to fragment along the backbone. • Fragments can also lose neutral chemical groups like NH3 and H2O.

Breaking Protein into Peptides and Peptides into Fragment Ions • Proteases, e.g. trypsin, break a protein into peptides. • A tandem mass spectrometer further breaks the peptides down into fragment ions and measures the mass of each piece. • The mass spectrometer accelerates the fragment ions; heavier ions accelerate more slowly than lighter ones. • The mass spectrometer measures the mass/charge ratio of an ion.

N- and C-terminal Peptides • Peptide: G P F N A • N-terminal peptides: G, GP, GPF, GPFN • C-terminal peptides: A, NA, FNA, PFNA

Terminal peptides and ion types • Peptide G P F N: Mass (Da) 57 + 97 + 147 + 114 = 415 • Peptide G P F N without H2O: Mass (Da) 57 + 97 + 147 + 114 – 18 = 397

N- and C-terminal Peptides (masses in Da) • Peptide G P F N A: 486 • N-terminal peptides: G = 57, GP = 154, GPF = 301, GPFN = 415 • C-terminal peptides: A = 71, NA = 185, FNA = 332, PFNA = 429 • Each N-terminal mass and its complementary C-terminal mass sum to 486 (57 + 429, 154 + 332, 301 + 185, 415 + 71)
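The terminal-peptide masses on these slides can be reproduced with a few lines of Python. This is a minimal sketch, assuming the example peptide is G-P-F-N-A with integer residue masses G = 57, P = 97, F = 147, N = 114, A = 71; the names `RESIDUE_MASS`, `peptide_mass` and `terminal_peptide_masses` are illustrative and not part of the original deck.

```python
# Integer residue masses (Da) for the five residues of the example peptide.
RESIDUE_MASS = {"G": 57, "P": 97, "F": 147, "N": 114, "A": 71}

def peptide_mass(peptide):
    """Sum of the residue masses of a peptide, in Da."""
    return sum(RESIDUE_MASS[aa] for aa in peptide)

def terminal_peptide_masses(peptide):
    """Masses of all proper N-terminal (prefix) and C-terminal (suffix) peptides."""
    prefixes = [peptide_mass(peptide[:i]) for i in range(1, len(peptide))]
    suffixes = [peptide_mass(peptide[i:]) for i in range(1, len(peptide))]
    return prefixes, suffixes

prefixes, suffixes = terminal_peptide_masses("GPFNA")
print(prefixes)  # [57, 154, 301, 415]
print(suffixes)  # [429, 332, 185, 71]
```

Each prefix mass plus the matching suffix mass equals the full peptide mass of 486 Da, which is why the two series carry the same sequence information.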

Peptide Fragmentation • [Figure: backbone of a four-residue peptide with side chains R1 to R4, from the N-terminal NH3+ to the C-terminal COOH, with the cleavage sites labeled] • N-terminal ions shown: a2, b2, a3, b3, b2-H2O, b3-NH3 • C-terminal ions shown: y1, y2, y3, y2-NH3, y3-H2O

Mass Spectra • [Figure: peptide G V D L K read off a mass axis starting at 0; peak spacing 57 Da = ‘G’, 99 Da = ‘V’] • The peaks in the mass spectrum: • Prefix and Suffix Fragments. • Fragments with neutral losses (-H2O, -NH3) • Noise and missing peaks.

Protein Identification with MS/MS • Peptide Identification: [Figure: peptide G V D L K is fragmented by MS/MS into a spectrum plotted as intensity vs. mass]

Tandem Mass-Spectrometry

Breaking Proteins into Peptides • [Figure: protein MPSERGTDIMRPAKID...... is cut into peptides MPSER, GTDIMR, PAKID, ……, which are separated by HPLC and sent to MS/MS]

Tandem Mass Spectrometry • [Figure: LC, Ion Source, MS-1, collision cell, MS-2; an MS scan (Scan 1707) is followed by an MS/MS scan (Scan 1708)]

Tandem Mass Spectrum • Tandem Mass Spectrometry (MS/MS): mainly generates partial N- and C-terminal peptides • Spectrum consists of different ion types because peptides can be broken in several places. • Chemical noise often complicates the spectrum. • Represented in 2-D: mass/charge axis vs. intensity axis

De Novo vs. Database Search • [Figure: one MS/MS spectrum, with peaks annotated by residue letters, interpreted by both approaches; labels: Mass, Score; reported peptide: AVGELTK] • Database Search: database of known peptides: MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN.. • De Novo: database of all peptides = $20^n$: AAAAAAAA, AAAAAAAC, AAAAAAAD, AAAAAAAE, AAAAAAAG, AAAAAAAF, AAAAAAAH, AAAAAAAI, AVGELTI, AVGELTK, AVGELTL, AVGELTM, YYYYYYYS, YYYYYYYT, YYYYYYYV, YYYYYYYY

De Novo vs. Database Search: A Paradox • The database of all peptides is huge ≈ $O(20^n)$. • The database of all known peptides is much smaller ≈ $O(10^8)$. • However, de novo algorithms can be much faster, even though their search space is much larger! • A database search scans every peptide in the database of known peptides to find the best one. • De novo sequencing eliminates the need to scan the database of all peptides by modeling the problem as a graph search.
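To get a feel for the two search-space sizes, the short sketch below finds the smallest peptide length at which the space of all peptides (20 to the power n) outgrows a database of about 10 to the power 8 known peptides. The constant `KNOWN_PEPTIDES` and the helper `all_peptides` are illustrative names, and the figure of 10 to the power 8 is simply the order of magnitude quoted on the slide.

```python
KNOWN_PEPTIDES = 10**8  # order of magnitude quoted on the slide

def all_peptides(n, alphabet_size=20):
    """Number of distinct peptides of length n over a 20-letter alphabet."""
    return alphabet_size ** n

# Smallest length n for which the space of all peptides exceeds the database.
smallest_n = next(n for n in range(1, 50) if all_peptides(n) > KNOWN_PEPTIDES)
print(smallest_n, all_peptides(smallest_n))  # 7 1280000000
```

Already at length 7 the exhaustive space is larger than the database, which is why de novo methods must avoid enumerating it.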

De Novo Peptide Sequencing • [Figure: Sequence]

Theoretical Spectrum

Theoretical Spectrum (cont’d)

Two approaches to determining sequence from the mass spectra • Exhaustive Search • Involves analysis of $20^L$ sequences of length L • Branch and bound advisable, but success has been limited • Analysis of Spectrum Graph • Based on experimental spectrum rather than all possible matches to possible spectra TK added slide for WMU CS 6030

Building Spectrum Graph • How to create vertices (from masses) • How to create edges (from mass differences) • How to score paths • How to find best path

Ion Types • Some masses correspond to fragment ions, others are just random noise • Knowing ion types Δ={δ1, δ2,…, δk} lets us distinguish fragment ions from noise • We can learn ion types δi and their probabilities qi by analyzing a large test sample of annotated spectra.

Example of Ion Type • Δ={δ1, δ2,…, δk} • Ion types {b, b-NH3, b-H2O} correspond to Δ={0, 17, 18} • These are the masses of chemical groups frequently lost from peptide fragments during collision-induced dissociation in the mass spectrometer collision cell *Note: In reality the δ value of ion type b is -1 but we will “hide” it for the sake of simplicity TK modified slide for WMU CS 6030

Theoretical Spectra • The theoretical spectrum of peptide P is denoted T(P) • T(P) can be calculated by subtracting every ion type offset in Δ={δ1, δ2,…, δk} from the mass of every partial peptide of P • Every partial peptide thus contributes k masses to T(P) TK added slide for WMU CS 6030
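The construction of T(P) can be sketched as follows. This is only an illustration under stated assumptions: "partial peptides" is taken to mean the proper N-terminal prefixes (matching the b-type ions in the example), masses are integers, the peptide is the deck's G-P-F-N-A example, and the names `RESIDUE_MASS`, `ION_OFFSETS` and `theoretical_spectrum` are hypothetical.

```python
# Integer residue masses (Da); only the residues of the example peptide are listed.
RESIDUE_MASS = {"G": 57, "P": 97, "F": 147, "N": 114, "A": 71}

# Offsets for ion types b, b-NH3, b-H2O, i.e. Delta = {0, 17, 18}.
ION_OFFSETS = (0, 17, 18)

def theoretical_spectrum(peptide, offsets=ION_OFFSETS):
    """Mass of every proper N-terminal prefix minus every ion-type offset."""
    masses = set()
    prefix_mass = 0
    for residue in peptide[:-1]:  # proper prefixes only; the full peptide is excluded
        prefix_mass += RESIDUE_MASS[residue]
        masses.update(prefix_mass - delta for delta in offsets)
    return sorted(masses)

spectrum = theoretical_spectrum("GPFNA")
print(spectrum)
# [39, 40, 57, 136, 137, 154, 283, 284, 301, 397, 398, 415]
```

The four prefixes each contribute k = 3 masses, giving 12 peaks; the value 397 is the "peptide without H2O" mass from the earlier slide.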

Match between Spectra and the Shared Peak Count • The match between two spectra is the number of masses (peaks) they share (Shared Peak Count or SPC) • In practice mass-spectrometrists use the weighted SPC that reflects intensities of the peaks • Match between experimental (S) and theoretical spectra (T) is defined similarly
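A minimal sketch of the unweighted shared peak count, assuming integer masses and exact matching (real instruments need a mass tolerance, which is omitted here). The two example spectra are made up for illustration: the theoretical one uses b, b-NH3 and b-H2O ions of G-P-F-N-A, and the experimental one contains a single noise peak at 350.

```python
def shared_peak_count(spectrum_a, spectrum_b):
    """Number of masses present in both spectra (integer masses, exact match)."""
    return len(set(spectrum_a) & set(spectrum_b))

# Hypothetical theoretical spectrum T and experimental spectrum S.
theoretical = [39, 40, 57, 136, 137, 154, 283, 284, 301, 397, 398, 415]
experimental = [57, 154, 283, 301, 350, 415]

spc = shared_peak_count(experimental, theoretical)
print(spc)  # 5
```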

Peptide Sequencing Problem Goal: Find a peptide whose theoretical spectrum has the maximal match with an experimental spectrum. Input: • S: experimental spectrum • Δ: set of possible ion types • m: parent mass Output: • P: a peptide with mass m whose theoretical spectrum T(P) best matches the experimental spectrum S
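For very small inputs the problem can be solved by brute force, which makes the definition concrete. The sketch below is not an efficient algorithm: it assumes a five-letter alphabet with integer masses, N-terminal ions with offsets {0, 17, 18}, and the plain shared peak count as the match score. All function names are illustrative.

```python
RESIDUE_MASS = {"G": 57, "A": 71, "P": 97, "N": 114, "F": 147}
ION_OFFSETS = (0, 17, 18)  # b, b-NH3, b-H2O

def theoretical_spectrum(peptide):
    """Set of prefix masses minus ion offsets (proper prefixes only)."""
    masses, prefix = set(), 0
    for aa in peptide[:-1]:
        prefix += RESIDUE_MASS[aa]
        masses.update(prefix - d for d in ION_OFFSETS)
    return masses

def peptides_with_mass(target, prefix=""):
    """Yield every peptide whose residue masses sum exactly to target."""
    if target == 0:
        yield prefix
        return
    for aa, mass in RESIDUE_MASS.items():
        if mass <= target:  # bound: never overshoot the parent mass
            yield from peptides_with_mass(target - mass, prefix + aa)

def best_peptides(experimental, parent_mass):
    """Return (best_score, peptides) maximizing the shared peak count."""
    peaks = set(experimental)
    best_score, best = -1, []
    for peptide in peptides_with_mass(parent_mass):
        score = len(peaks & theoretical_spectrum(peptide))
        if score > best_score:
            best_score, best = score, [peptide]
        elif score == best_score:
            best.append(peptide)
    return best_score, best

best_score, best = best_peptides([57, 154, 283, 301, 350, 415], 486)
print(best_score, best)
```

Ties are expected: with integer masses N (114) weighs the same as G + G (57 + 57), so GPFNA and GPFGGA cannot be separated by this score alone.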

Vertices of Spectrum Graph • Masses of potential N-terminal peptides • Vertices are generated by reverse shifts corresponding to ion types Δ={δ1, δ2,…, δk} • Every N-terminal peptide can generate up to k ions m-δ1, m-δ2, …, m-δk • Every mass s in an MS/MS spectrum generates k vertices V(s) = {s+δ1, s+δ2, …, s+δk} corresponding to potential N-terminal peptides • Vertices of the spectrum graph: {initial vertex} ∪ V(s1) ∪ V(s2) ∪ ... ∪ V(sm) ∪ {terminal vertex}

Edges of Spectrum Graph • Two vertices with mass difference corresponding to an amino acid A: • Connect with an edge labeled by A
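The vertex and edge rules can be sketched together. This is an illustration under simplifying assumptions: integer masses, a five-residue alphabet (a full 20-residue table would contain equal-mass residues and need a list of labels per mass), offsets Δ = {0, 17, 18}, and a made-up four-peak spectrum. The function name `spectrum_graph` is hypothetical.

```python
RESIDUE_MASS = {"G": 57, "A": 71, "P": 97, "N": 114, "F": 147}
ION_OFFSETS = (0, 17, 18)  # b, b-NH3, b-H2O

def spectrum_graph(spectrum, parent_mass):
    """Build vertices (candidate prefix masses) and labeled edges."""
    vertices = {0, parent_mass}  # initial and terminal vertex
    for s in spectrum:
        vertices.update(s + delta for delta in ION_OFFSETS)  # reverse shifts V(s)
    vertices = sorted(v for v in vertices if 0 <= v <= parent_mass)

    mass_to_residue = {mass: aa for aa, mass in RESIDUE_MASS.items()}
    edges = [
        (u, v, mass_to_residue[v - u])
        for u in vertices
        for v in vertices
        if v - u in mass_to_residue  # mass difference equals a residue mass
    ]
    return vertices, edges

vertices, edges = spectrum_graph([57, 137, 283, 415], parent_mass=486)
for edge in edges:
    print(edge)
```

In this example the peaks 137 and 283 are b-NH3 and b-H2O ions, and the reverse shifts recover the prefix masses 154 and 301, so the edges spell G, P, F, N, A.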

Peptide Sequences • A possible peptide sequence is identified by finding a path from mass 0 to parent mass m in the resulting DAG • Many possible sequences (paths) will typically be identified in the DAG • The “best path” is identified by scoring using probabilities that are based on experimental data TK added slide for WMU CS 6030
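Enumerating candidate sequences amounts to walking every path from the initial to the terminal vertex. The sketch below assumes the graph is given as a list of `(from_mass, to_mass, residue)` edges; the edge list is a made-up example in which the step 301 to 415 can be explained either by N or by G followed by G. It enumerates paths only and does not score them.

```python
from collections import defaultdict

def peptides_from_paths(edges, source, sink):
    """Return the residue string spelled by every source-to-sink path."""
    outgoing = defaultdict(list)
    for u, v, label in edges:
        outgoing[u].append((v, label))

    def walk(node, prefix):
        if node == sink:
            yield prefix
            return
        for next_node, label in outgoing[node]:
            yield from walk(next_node, prefix + label)

    return list(walk(source, ""))

# Hypothetical edge list; every edge goes from a smaller to a larger mass (a DAG).
edges = [
    (0, 57, "G"), (57, 154, "P"), (154, 301, "F"),
    (301, 415, "N"), (301, 358, "G"), (358, 415, "G"),
    (415, 486, "A"),
]
candidates = peptides_from_paths(edges, source=0, sink=486)
print(candidates)  # ['GPFNA', 'GPFGGA']
```

Because edges always increase the mass, the recursion terminates; a scoring step is still needed to pick the best of the candidate paths.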

Path Score • $p(P,S)$ = probability that peptide P produces spectrum $S = \{s_1, s_2, \ldots, s_q\}$ • $p(P,s)$ = the probability that peptide P generates a peak (mass) s • Scoring = computing probabilities • $p(P,S) = \prod_{s \in S} p(P,s)$

Peak Score • For a position t that represents ion type $\delta_j$: $$p(P, s_t) = \begin{cases} q_j, & \text{if a peak is generated at } t \\ 1 - q_j, & \text{otherwise} \end{cases}$$

Peak Score (cont’d) • For a position t that is not associated with an ion type: $$p_R(P, s_t) = \begin{cases} q_R, & \text{if a peak is generated at } t \\ 1 - q_R, & \text{otherwise} \end{cases}$$ • $q_R$ = the probability of a noisy peak that does not correspond to any ion type
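Putting the two peak-score cases together gives the spectrum probability as a product over positions. The sketch below works in log space to avoid underflow and assumes a small discrete set of integer mass positions; the probabilities and peak positions are invented for illustration, and `log_spectrum_probability` is a hypothetical name.

```python
import math

def log_spectrum_probability(expected, observed, q_noise, positions):
    """Log of the product of per-position peak scores.

    expected  -- dict: position -> q_j, for positions tied to an ion type of P
    observed  -- set of positions where the spectrum has a peak
    q_noise   -- q_R, probability of a noise peak at any other position
    positions -- iterable of all mass positions considered
    """
    total = 0.0
    for t in positions:
        q = expected.get(t, q_noise)  # q_j if t is an ion position, else q_R
        total += math.log(q if t in observed else 1.0 - q)
    return total

# Ten positions; ions expected at 3 (q = 0.8) and 7 (q = 0.5); peaks seen at 3 and 9.
log_p = log_spectrum_probability(
    expected={3: 0.8, 7: 0.5}, observed={3, 9}, q_noise=0.1, positions=range(1, 11)
)
print(math.exp(log_p))
```

Here the probability factors as 0.8 (ion seen at 3), 0.5 (ion missing at 7), 0.1 (noise peak at 9), and 0.9 for each of the seven remaining empty positions.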

Finding Optimal Paths in the Spectrum Graph • For a given MS/MS spectrum S, find a peptide P’ maximizing p(P,S) over all possible peptides P: $p(P', S) = \max_{P} p(P, S)$ • Peptides = paths in the spectrum graph • P’ = the optimal path in the spectrum graph

Ions and Probabilities • Tandem mass spectrometry is characterized by a set of ion types {δ1,δ2,..,δk} and their probabilities {q1,...,qk} • δi-ions of a partial peptide are produced independently with probabilities qi

Ions and Probabilities • A peptide has all k peaks with probability $\prod_{i=1}^{k} q_i$ • and no peaks with probability $\prod_{i=1}^{k} (1 - q_i)$ • A peptide also produces “random noise” with uniform probability $q_R$ in any position.
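Because the ion types are produced independently, the probability of seeing all k peaks is the product of the q_i, and the probability of seeing none is the product of the (1 - q_i). A small numeric sketch, with made-up probabilities and illustrative function names:

```python
import math

def all_peaks_probability(q):
    """Probability that every one of the k ion types produces a peak."""
    return math.prod(q)

def no_peaks_probability(q):
    """Probability that none of the k ion types produces a peak."""
    return math.prod(1 - q_i for q_i in q)

q = [0.7, 0.2, 0.1]  # hypothetical q_1, q_2, q_3
print(all_peaks_probability(q))
print(no_peaks_probability(q))
```

For these values the results are approximately 0.014 and 0.216.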

Ratio Test Scoring for Partial Peptides • Incorporates premiums for observed ions and penalties for missing ions. • Example: for k=4, assume that for a partial peptide P’ we only see ions δ1,δ2,δ4. The score is calculated as: $\frac{q_1}{q_R} \cdot \frac{q_2}{q_R} \cdot \frac{1 - q_3}{1 - q_R} \cdot \frac{q_4}{q_R}$
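The ratio test can be sketched as a product of per-ion likelihood ratios, assuming the usual form in which an observed ion contributes q_i / q_R and a missing ion contributes (1 - q_i) / (1 - q_R). The probabilities below are invented for illustration and `ratio_test_score` is a hypothetical name.

```python
import math

def ratio_test_score(q, q_noise, observed):
    """Likelihood ratio of the ion model against the noise model.

    q        -- probabilities q_1..q_k of the k ion types
    q_noise  -- probability q_R of a random noise peak
    observed -- 1-based indices of the ion types that were seen
    """
    factors = []
    for i, q_i in enumerate(q, start=1):
        if i in observed:
            factors.append(q_i / q_noise)              # premium for an observed ion
        else:
            factors.append((1 - q_i) / (1 - q_noise))  # penalty for a missing ion
    return math.prod(factors)

q = [0.6, 0.3, 0.2, 0.1]  # hypothetical q_1..q_4
score = ratio_test_score(q, q_noise=0.05, observed={1, 2, 4})
print(score)
```

With k = 4 and ions 1, 2 and 4 observed, three factors are premiums and the factor for the third ion type is a penalty, mirroring the example on the slide.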
