MS Identification. Dr. Juan Antonio VIZCAINO, PRIDE Group coordinator. PRIDE team, Proteomics Services Group, PANDA group, European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom.
Overview … • Search engines: peptide identification • Protein inference • De novo and spectral searches • Choosing the right protein sequence DB • You need to learn many things…
It should not be a black box… From: Lilley et al., Proteomics, 2011
MS proteomics: Shot-gun/bottom-up approaches [Figure: protocol diagram with the labels proteins, peptides, MS analysis, fragmentation, MS/MS analysis and sequence database]
Peptide Mass Fingerprinting (MS) [Figure: MS analysis; Peptide Mass Fingerprinting (PMF); MW] - Each peak in the spectrum represents a peptide (or a mixture of peptides) - Provides information about mass and charge. Not widely used at present, except in gel-based approaches (where the molecular weight of the protein is known)
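The PMF idea (digest a candidate protein in silico, compute the peptide masses, count how many observed peaks they explain) can be sketched as follows. This is a minimal illustration, assuming trypsin cleaves after K or R except before P, no missed cleavages, singly charged [M+H]+ peaks and a fixed tolerance in Da; the function names, tolerance and toy sequence are hypothetical, and the PMF tools listed on the next slide use more elaborate, probability-based scoring.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def peptide_mh(peptide):
    """Singly protonated monoisotopic peptide mass [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON

def pmf_matches(sequence, observed_mz, tol=0.1):
    """Return the tryptic peptides whose [M+H]+ lies within tol of an observed peak."""
    hits = []
    for pep in tryptic_digest(sequence):
        mh = peptide_mh(pep)
        if any(abs(mh - mz) <= tol for mz in observed_mz):
            hits.append(pep)
    return hits

protein = "AGKPLMELIERDESTNVDMSLAQR"  # hypothetical toy protein
print(tryptic_digest(protein))
print(pmf_matches(protein, [peptide_mh("DESTNVDMSLAQR") + 0.02, 500.0]))
```

A real PMF search repeats this for every protein in the database and ranks proteins by how many, and which, of the observed masses they explain.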
Peptide Mass Fingerprinting (MS) on the web • Aldente (Phenyx): http://www.expasy.org/tools/aldente/ • ASCQ_ME: https://www.genopole-lille.fr/logiciel/ascq_me/ • Bupid: http://zlab.bu.edu/Amemee/ • Mascot: http://www.matrixscience.com/search_form_select.html • MassSearch: http://www.cbrg.ethz.ch/services/MassSearch • MS-Fit (Protein Prospector): http://prospector.ucsf.edu/prospector/mshome.htm • PepMAPPER: http://www.nwsr.manchester.ac.uk/mapper/ • Profound (Prowl): http://prowl.rockefeller.edu/prowl-cgi/profound.exe • XProteo: http://xproteo.com:2698/
MS/MS [Figure: MS analysis (Peptide Mass Fingerprinting, PMF), fragmentation, MS/MS analysis] Fragmentation provides peptide sequence information (on top of mass and charge)
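The "peptide sequence information" comes from backbone fragments. A minimal sketch that computes the singly charged b- and y-ion ladder expected from a peptide, assuming monoisotopic masses, charge 1+, no modifications and no neutral losses; the function name is hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def by_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_i = sum of the first i residues + proton
    y_i = sum of the last i residues + water + proton
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b, y = [], []
    for i in range(1, len(masses)):
        b.append(sum(masses[:i]) + PROTON)
        y.append(sum(masses[-i:]) + WATER + PROTON)
    return b, y

b, y = by_ions("LFDQAFGLPR")
print([round(m, 3) for m in b])
print([round(m, 3) for m in y])
```

Consecutive b (or y) ions differ by exactly one residue mass, which is why an MS/MS spectrum carries sequence information that a single PMF peak does not.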
Three types of MS/MS identification • Protein database based comparison: compare the experimental spectrum with a theoretical spectrum derived from a database sequence • Sequential comparison (de novo approaches): compare a de novo sequence, derived from the experimental spectrum, with database sequences • Spectral comparison: compare the experimental spectrum with experimental spectra in a spectral library. Modified from: Eidhammer, Flikka, Martens, Mikalsen – Wiley 2007
MS proteomics: peptide IDs and protein IDs [Figure: MS/MS spectra, proteins]
MS proteomics: peptide IDs and protein IDs [Figure: MS/MS spectra are matched by a search engine against a sequence database (UniProt, IPI, RefSeq), giving peptides (TDMDNQIVVSDYAQMDR, LFDQAFGLPR, AKPLMELIER, DESTNVDMSLAQR, DIVVQETMEDIDK, NGMFFSTYDR, GTAGNALMDGASQL) that point to proteins]
Search engines: sequence database matching [Figure: proteins from a sequence database (UniProt, IPI, RefSeq) are turned into peptides (TDMDNQIVVSDYAQMDR, LFDQAFGLPR, AKPLMELIER, DESTNVDMSLAQR, DIVVQETMEDIDK, NGMFFSTYDR, GTAGNALMDGASQL, VDMSLAQR, DIVVQETMEDIDK, …), whose theoretical spectra are compared with the experimental spectra]
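Before any spectrum comparison, a search engine turns the sequence database into candidate peptides and keeps only those whose mass fits the measured precursor. A minimal sketch, assuming a toy in-memory "database" (a dict of hypothetical protein IDs to sequences), tryptic cleavage without missed cleavages, singly charged precursors and a tolerance in Da; real engines also handle missed cleavages, modifications and ppm tolerances.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def digest(sequence):
    """Tryptic peptides: cleave after K/R unless followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def candidates(database, precursor_mh, tol=0.5):
    """Yield (protein_id, peptide) pairs whose [M+H]+ is within tol Da of the precursor."""
    for prot_id, seq in database.items():
        for pep in digest(seq):
            mh = sum(RESIDUE_MASS[aa] for aa in pep) + WATER + PROTON
            if abs(mh - precursor_mh) <= tol:
                yield prot_id, pep

# Hypothetical two-protein database.
db = {"P1": "LFDQAFGLPRAKPLMELIER", "P2": "NGMFFSTYDRLFDQAFGLPR"}
```

Each surviving candidate then gets a theoretical spectrum to compare with the experimental one. The same peptide can come from two proteins, which already hints at the protein inference problem discussed later.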
Search engines Experimental Spectra Theoretical Spectra • How good is the correlation? • Scores are generated by search engines • Usually the best match is kept
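The bullets above (score each experimental-vs-theoretical comparison, keep the best match) can be sketched with the simplest possible score: the number of theoretical fragment m/z values that find an experimental peak within a tolerance. The shared-peak count and the candidate names are illustrative assumptions only; SEQUEST, Mascot, X!Tandem and OMSSA each use their own, more discriminating scores.

```python
def shared_peaks(theoretical, experimental, tol=0.5):
    """Count theoretical m/z values matched by at least one experimental peak."""
    return sum(1 for t in theoretical if any(abs(t - e) <= tol for e in experimental))

def best_match(experimental, candidates, tol=0.5):
    """Score every candidate and return (peptide, score) of the top-scoring one.

    candidates maps a peptide name to its list of theoretical m/z values.
    Ties are resolved in favour of the first candidate.
    """
    scored = [(pep, shared_peaks(theo, experimental, tol)) for pep, theo in candidates.items()]
    return max(scored, key=lambda item: item[1])

experimental = [100.0, 200.1, 300.0]
cands = {"PEPA": [100.0, 200.0, 400.0], "PEPB": [100.2, 500.0]}  # hypothetical
print(best_match(experimental, cands))
```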
Search engines Taken from Nesvizhskii, J Proteomics, 2010
The most popular algorithms • MASCOT (Matrix Science) • http://www.matrixscience.com • SEQUEST (Scripps, Thermo Fisher Scientific) • http://fields.scripps.edu/sequest • X!Tandem (The Global Proteome Machine Organization) • http://www.thegpm.org/TANDEM • OMSSA (NCBI) • http://pubchem.ncbi.nlm.nih.gov/omssa/
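Overall concept of scores and cut-offs [Figure: score distributions of incorrect and correct identifications separated by a threshold score; correct identifications scoring below the threshold are false negatives, incorrect identifications scoring above it are false positives] Adapted from: www.proteomesoftware.com – Wiki pages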
Playing with probabilistic cut-off scores: higher stringency yields fewer identifications, but also fewer false positives
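The threshold trade-off can be made concrete by counting, at a given cut-off, how many correct and incorrect identifications pass. The scores and correct/incorrect labels below are hypothetical; in a real experiment the labels are unknown, which is why decoy searches (see the Mascot slide) are used to estimate the false positives.

```python
def confusion_at(threshold, scored_ids):
    """Count outcomes for (score, is_correct) pairs at a score threshold.

    A match is accepted when score >= threshold.
    Returns (true_positives, false_positives, false_negatives).
    """
    tp = sum(1 for s, ok in scored_ids if s >= threshold and ok)
    fp = sum(1 for s, ok in scored_ids if s >= threshold and not ok)
    fn = sum(1 for s, ok in scored_ids if s < threshold and ok)
    return tp, fp, fn

# Hypothetical search results: (score, whether the match is actually correct).
results = [(12, False), (18, False), (25, True), (27, False), (33, True), (40, True), (52, True)]

for t in (15, 30, 45):
    tp, fp, fn = confusion_at(t, results)
    print(f"threshold={t}: accepted={tp + fp}, false positives={fp}, false negatives={fn}")
```

Raising the threshold from 15 to 45 removes all false positives here, at the cost of turning correct matches into false negatives.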
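SEQUEST • Very well established search engine • Can be used for MS/MS (PFF) identifications • Based on a cross-correlation score (includes experimental peak height) • Published core algorithm (patented, licensed to Thermo Fisher Scientific) • Provides a preliminary score (Sp), rank, cross-correlation score (XCorr) and the normalized score difference between the top two ranks (deltaCn, ΔCn) • Thresholding is up to the user, and is commonly done per charge state • Many extensions exist to perform a more automatic validation of results. $\mathrm{XCorr} = R_0 - \frac{1}{151}\sum_{\tau=-75}^{+75} R_\tau$, where $R_\tau = \sum_i x_i\, y_{i+\tau}$ is the correlation between the theoretical spectrum $x$ and the processed experimental spectrum $y$ at offset $\tau$; $\Delta C_n = \frac{\mathrm{XCorr}_1 - \mathrm{XCorr}_2}{\mathrm{XCorr}_1}$, where $\mathrm{XCorr}_1$ and $\mathrm{XCorr}_2$ belong to the top two ranked matches.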
Search engines: SEQUEST. deltaCn measures how good the XCorr is relative to the next-best match. The XCorr is high if the direct comparison (zero offset) is significantly greater than the background (the average correlation over shifted offsets).
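A simplified sketch of the two SEQUEST quantities, assuming both spectra have already been binned into intensity vectors (SEQUEST's own preprocessing, binning and normalization are not reproduced). XCorr is computed as the zero-offset correlation minus the mean correlation over offsets −75 to +75 (151 offsets), and deltaCn as the relative gap between the best and second-best XCorr. Function names and toy vectors are hypothetical.

```python
def correlation(x, y, tau):
    """R_tau = sum_i x[i] * y[i + tau], skipping indices that fall outside y."""
    n = len(y)
    return sum(x[i] * y[i + tau] for i in range(len(x)) if 0 <= i + tau < n)

def xcorr(theoretical, experimental, max_offset=75):
    """R_0 minus the average R_tau over tau in [-max_offset, +max_offset]."""
    offsets = range(-max_offset, max_offset + 1)  # 151 offsets for max_offset=75
    background = sum(correlation(theoretical, experimental, t) for t in offsets) / len(offsets)
    return correlation(theoretical, experimental, 0) - background

def delta_cn(xcorr_best, xcorr_second):
    """deltaCn = (XCorr_1 - XCorr_2) / XCorr_1."""
    return (xcorr_best - xcorr_second) / xcorr_best

# Hypothetical binned spectra: unit peaks in bins 50 and 100.
theo = [0.0] * 200
theo[50] = theo[100] = 1.0
shifted = [0.0] * 200
shifted[53] = shifted[103] = 1.0
print(xcorr(theo, theo), xcorr(theo, shifted))
```

A spectrum whose peaks line up with the theoretical ones at zero offset scores well above the background, whereas the shifted spectrum scores below it.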
Search engines: Mascot • Very well established search engine • Can do MS (PMF) and MS/MS (PFF) identifications • Based on the MOWSE score • Unpublished core algorithm (trade secret) • Predicts an a priori threshold score that identifications need to pass • From version 2.2, Mascot allows integrated decoy searches • Provides rank, score, threshold and expectation value per identification • Customizable confidence level for the threshold score
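Mascot's core algorithm is unpublished, so the following is not Mascot's implementation. It only illustrates the probability-based convention Matrix Science uses for reporting Mascot scores, S = −10·log10(P), with P the probability that the match is a random event, and how an a priori threshold can follow from a chosen significance level when N candidate peptides are tested. Treating the expectation value as P × N is a simplifying assumption made here; the function names are hypothetical.

```python
import math

def score_from_p(p):
    """Probability-based score S = -10 * log10(P)."""
    return -10.0 * math.log10(p)

def threshold_score(alpha, n_candidates):
    """Score a match must reach so that P * N <= alpha (simplified assumption)."""
    return score_from_p(alpha / n_candidates)

def expectation(score, n_candidates):
    """Expected number of random matches scoring at least this high: P * N."""
    return n_candidates * 10 ** (-score / 10.0)

print(round(threshold_score(0.05, 1000), 2))  # threshold at 95% confidence, 1000 candidates
print(expectation(50, 1000))
```

A stricter confidence level (smaller alpha) raises the threshold score, in the spirit of the "customizable confidence level" bullet.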
Search engines: Mascot www.matrixscience.com
Search engines: X!Tandem • Open source search engine • Can be used for MS/MS experiments • Based on a hyperscore that only takes into account b and y ions • Published core algorithm and it is freely available • Fast and able to handle PTMs in an iterative fashion • Used as an auxiliary search engine. by-Score $= \sum_i I_i$, the sum of the intensities of the peaks matching b-type or y-type ions; $\mathrm{HyperScore} = \left(\sum_i I_i\right) \times N_b! \times N_y!$, where $N_b$ and $N_y$ are the numbers of matched b and y ions.
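A sketch of the hyperscore formula, assuming the theoretical b- and y-ion m/z values are already computed, a fixed matching tolerance, and that each matched theoretical ion contributes the intensity of the most intense experimental peak within tolerance. X!Tandem's own spectrum preprocessing and its conversion of hyperscores into expectation values are not reproduced; names and numbers are hypothetical.

```python
import math

def hyperscore(b_ions, y_ions, peaks, tol=0.5):
    """HyperScore = (sum of matched b/y peak intensities) * N_b! * N_y!.

    peaks is a list of (mz, intensity) tuples.
    """
    def match(ions):
        count, intensity = 0, 0.0
        for ion in ions:
            hits = [inten for mz, inten in peaks if abs(mz - ion) <= tol]
            if hits:
                count += 1
                intensity += max(hits)  # most intense peak within tolerance
        return count, intensity

    n_b, i_b = match(b_ions)
    n_y, i_y = match(y_ions)
    by_score = i_b + i_y
    return by_score * math.factorial(n_b) * math.factorial(n_y)

peaks = [(100.01, 10.0), (150.0, 5.0), (250.02, 3.0), (300.0, 100.0)]
print(hyperscore([100.0, 200.0], [150.0, 250.0], peaks, tol=0.05))
```

The factorials reward candidates that explain many ions of each series, not just a few intense peaks.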
Search engines: OMSSA • Open source search engine • Can be used for MS/MS experiments • Relies on a Poisson distribution • Published core algorithm and it is freely available • Provides an expectation value (E-value), similar to the BLAST E-value • Very good performance in comparison with the other search engines • Used as an auxiliary search engine
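OMSSA models the number of peaks that an unrelated peptide matches by chance with a Poisson distribution. A simplified sketch in that spirit, assuming the expected number of chance matches μ is already known: the p-value is the Poisson probability of matching at least m peaks, and the E-value multiplies it by the number of candidate peptides searched, analogous to BLAST. How OMSSA estimates μ, and its exact formulas, are not reproduced here; names and numbers are hypothetical.

```python
import math

def poisson_tail(m, mu):
    """P(X >= m) for X ~ Poisson(mu), computed as 1 - P(X < m)."""
    below = sum(math.exp(-mu) * mu ** k / math.factorial(k) for k in range(m))
    return 1.0 - below

def e_value(matched_peaks, mu, n_candidates):
    """Expected number of random candidates matching at least this many peaks."""
    return n_candidates * poisson_tail(matched_peaks, mu)

# Hypothetical: 8 matched peaks, 1.5 chance matches expected, 5000 candidates.
print(e_value(8, 1.5, 5000))
```

The smaller the E-value, the less likely the match arose by chance. For very small tails, 1 − P(X < m) loses precision, so a production implementation would sum the upper tail directly.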
MS proteomics: peptide IDs and protein IDs [Figure: MS/MS spectra are matched by a search engine against a sequence database (UniProt, IPI, RefSeq), giving peptides (TDMDNQIVVSDYAQMDR, LFDQAFGLPR, AKPLMELIER, DESTNVDMSLAQR, DIVVQETMEDIDK, NGMFFSTYDR, GTAGNALMDGASQL) that point to proteins] So far, we have actually identified peptides, not proteins.
MS proteomics: peptide IDs and protein IDs [Figure: peptides (TDMDNQIVVSDYAQMDRTW, LFDQAFGLPR, AKPLMELIER, DESTNVDMSLAQR, DIVVQETMEDIDK, NGMFFSTYDR, GTAGNALMDGASQL) mapped to proteins (IPI00302927, IPI00025512, IPI00002478, IPI00185600, IPI00014537, IPI00298497, IPI00329236, IPI00002232)] Protein inference is complex!
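The many-to-many relationship on this slide appears as soon as each identified peptide is mapped back to every database protein that contains it. A minimal sketch, assuming exact substring matching (no I/L equivalence, no check of enzyme specificity) and a toy database; the protein identifiers and sequences are hypothetical, not the IPI entries above.

```python
from collections import defaultdict

def map_peptides(peptides, database):
    """Return {peptide: sorted list of protein ids whose sequence contains it}."""
    mapping = defaultdict(list)
    for pep in peptides:
        for prot_id, seq in database.items():
            if pep in seq:
                mapping[pep].append(prot_id)
        mapping[pep].sort()  # also creates an empty list for unmatched peptides
    return dict(mapping)

db = {  # hypothetical toy database
    "PROT1": "MKLFDQAFGLPRAKPLMELIER",
    "PROT2": "GGLFDQAFGLPRNGMFFSTYDR",
    "PROT3": "DESTNVDMSLAQRK",
}
peptides = ["LFDQAFGLPR", "AKPLMELIER", "NGMFFSTYDR", "DESTNVDMSLAQR"]
print(map_peptides(peptides, db))
```

Peptides that map to more than one protein (here LFDQAFGLPR) are what make protein inference ambiguous.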
Intermezzo: Protein inference. The minimal and maximal explanatory sets [Figure: two copies of a peptide (a, b, c, d) vs. protein (prot X, prot Y, prot Z) match table, labelled "Minimal set (Occam)" and "Maximal set (anti-Occam)", with "The Truth" shown between them]
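Finding the minimal (Occam) set is a set-cover problem: the smallest group of proteins that explains every identified peptide. Exact minimum set cover is NP-hard, so a common shortcut is the greedy heuristic sketched below, which repeatedly picks the protein explaining the most still-unexplained peptides; it usually, but not always, finds the true minimum. The maximal (anti-Occam) set simply keeps every protein with at least one matching peptide. The peptide-to-protein assignments are hypothetical and do not reproduce the slide's table.

```python
def maximal_set(protein_peptides):
    """Anti-Occam: every protein with at least one identified peptide."""
    return {prot for prot, peps in protein_peptides.items() if peps}

def minimal_set_greedy(protein_peptides):
    """Occam (approximate): greedily cover all peptides with few proteins."""
    unexplained = set().union(*protein_peptides.values())
    chosen = []
    while unexplained:
        # Protein explaining the most still-unexplained peptides (first one wins ties).
        best = max(protein_peptides, key=lambda p: len(protein_peptides[p] & unexplained))
        chosen.append(best)
        unexplained -= protein_peptides[best]
    return chosen

# Hypothetical evidence: protein -> set of identified peptides.
evidence = {"X": {"a", "b"}, "Y": {"c"}, "Z": {"b", "c", "d"}}
print(minimal_set_greedy(evidence), sorted(maximal_set(evidence)))
```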
Intermezzo: Protein inference Slide from J. Cottrell, Matrix Science
Protein inference A B C D
Protein inference A B C D Unambiguous peptide
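An unambiguous peptide maps to a single protein, so on its own it is evidence for that protein. A minimal sketch that flags such peptides and the proteins they support, assuming a peptide-to-protein mapping is already available; the identifiers are hypothetical and are not the A to D labels of the slide.

```python
def unique_peptides(peptide_proteins):
    """Peptides that map to exactly one protein: {peptide: protein}."""
    return {pep: prots[0] for pep, prots in peptide_proteins.items() if len(prots) == 1}

def proteins_with_unique_evidence(peptide_proteins):
    """Proteins supported by at least one unambiguous peptide."""
    return set(unique_peptides(peptide_proteins).values())

mapping = {  # hypothetical peptide -> proteins
    "pep1": ["protA"],
    "pep2": ["protA", "protB"],
    "pep3": ["protC"],
    "pep4": ["protB", "protC"],
}
print(unique_peptides(mapping), proteins_with_unique_evidence(mapping))
```

Here protB is only supported by shared peptides, so these data alone cannot confirm its presence.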
De novo approaches [Figure: example of a manual de novo interpretation of an MS/MS spectrum] No database is needed to extract a sequence! • Algorithms: Lutefisk, Sherenga, PEAKS, PepNovo, … • References: Dancik 1999, Taylor 2000; Fernandez-de-Cossio 2000; Ma 2003, Zhang 2004; Frank 2005, Grossmann 2005; …
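A toy version of manual de novo reading: in a complete, noise-free ladder of one ion series (e.g. b ions), the mass difference between consecutive peaks equals the mass of one residue. The sketch assumes exactly such an ideal ladder and a tolerance in Da; the algorithms listed above must also cope with missing peaks, noise and mixed ion series. Leucine and isoleucine have identical masses and cannot be told apart, so both are reported as "L". Names and data are hypothetical.

```python
# Monoisotopic residue masses (Da); "L" stands for leucine or isoleucine.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "N": 114.04293, "D": 115.02694,
    "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def read_ladder(ladder, tol=0.02):
    """Translate consecutive peak mass differences into residues ('?' if no match)."""
    peaks = sorted(ladder)
    sequence = []
    for lo, hi in zip(peaks, peaks[1:]):
        gap = hi - lo
        aa, mass = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - gap))
        sequence.append(aa if abs(mass - gap) <= tol else "?")
    return "".join(sequence)

print(read_ladder([100.0, 157.02146, 228.05857, 315.0906]))  # hypothetical ladder
```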
Spectral searching • Concept: to compare experimental spectra with other experimental spectra • There are many publicly available spectral libraries (for instance, from NIST) • Custom ‘search engines’ have been developed: • SpectraST (TPP) • X!Hunter (GPM) • It has been claimed that these searches are more sensitive than sequence database approaches
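Spectral library searching boils down to a similarity score between an experimental spectrum and each library spectrum. A minimal sketch using the normalized dot product (cosine similarity) of binned spectra, a common choice for this comparison; the 1-Da binning, square-root intensity scaling and library entries are assumptions for illustration, not the exact scoring of SpectraST or X!Hunter.

```python
import math

def binned(peaks, bin_width=1.0):
    """Sum square-root-scaled intensities into m/z bins: {bin_index: intensity}."""
    bins = {}
    for mz, intensity in peaks:
        idx = int(mz // bin_width)
        bins[idx] = bins.get(idx, 0.0) + math.sqrt(intensity)
    return bins

def cosine(a, b):
    """Normalized dot product of two binned spectra (0 = no overlap, 1 = same shape)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def library_search(query_peaks, library):
    """Return (library_id, similarity) of the most similar library spectrum."""
    q = binned(query_peaks)
    return max(((lib_id, cosine(q, binned(peaks))) for lib_id, peaks in library.items()),
               key=lambda t: t[1])

query = [(100.2, 4.0), (200.5, 9.0)]
library = {"LIB1": [(100.4, 4.0), (200.1, 9.0)], "LIB2": [(300.0, 1.0)]}  # hypothetical
print(library_search(query, library))
```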
Spectral searching (2) http://peptide.nist.gov/
Multi-stage peptide identification strategy Goal: “Squeeze” your good quality experimental spectra Taken from Nesvizhskii, J Proteomics, 2010