Improving Sensitivity by Combining Results from Multiple Search Methodologies
Brian C. Searle, Proteome Software Inc., Portland, OR
Brian.Searle@ProteomeSoftware.com
MBI workshop on Computational Proteomics and Mass Spectrometry (January 11-14, 2005)
The Analytical Challenge
[Diagram labels: Biological Samples, Control Experiments; Q-TOF, IonTrap; Unknown Spectra]
The Analytical Challenge
• Why can you only interpret half as much MS/MS data in experiments you actually care about?
• What is going on with the remaining 90% unidentified spectra?
The OpenSea Approach
De Novo Sequence: YD[Cc]DD[220]GADHFTY[200]R
OpenSea Alignment: Crystallin, S (CRBS_HUMAN)
GRRYD(Cc)D(Cc)(D)(Cc)AD(FH)TY(LS)RCNS
|| | | X X X || | || | |
YD(Cc)D(D)([220])(G)AD(HF)TY([200])R
de novo Sequence: … YD[Cc]DD[220]GADHFTY[200]R
Residue masses: 163-115-160-115-115-220-57-71-…
Database Sequence: … G-57 R-156 R-156 Y-163 D-115 C-160 D-115 C-160 D-115 C-160 A-71 …
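The mass ladder on this slide can be reproduced with a short sketch. It assumes nominal (integer) residue masses and treats any bracketed number as an unexplained mass gap; the function name `mass_ladder` and the token pattern are illustrative and are not taken from OpenSea.

```python
import re

# Nominal (integer) residue masses in AMU.
# "Cc" denotes carbamidomethylated cysteine (103 + 57).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
    "Cc": 160,
}

# A token is either a bracketed item ([Cc], [220]) or a single residue letter.
TOKEN = re.compile(r"\[([^\]]+)\]|([A-Z])")


def mass_ladder(sequence):
    """Turn a de novo sequence string into a list of nominal masses.

    Bracketed numbers (e.g. [220]) are kept as unexplained mass gaps;
    bracketed names (e.g. [Cc]) are looked up like ordinary residues.
    """
    masses = []
    for bracketed, letter in TOKEN.findall(sequence):
        token = bracketed or letter
        masses.append(int(token) if token.isdigit() else RESIDUE_MASS[token])
    return masses


ladder = mass_ladder("YD[Cc]DD[220]GADHFTY[200]R")
print("-".join(str(m) for m in ladder[:8]))  # 163-115-160-115-115-220-57-71
```

Working in masses rather than letters is what lets a gap such as [220] be compared against a run of database residues later on.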
Auto-Interpretation of OpenSea Results
OpenSea Alignment:
GRRYD(Cc)D(Cc)(D)(Cc)AD(FH)TY(LS)RCNS
|| | | X X X || | || | |
YD(Cc)D(D)([220])(G)AD(HF)TY([200])R
+14 AMU on either cysteine or -43 AMU on aspartic acid…
Modification lookup table suggests methylation of cysteine!
Auto-Interpretation:
GRRYD(Cc)D(CmDCc)AD(FH)TY(LS)RCNS
|| | | : || | || | |
YD(Cc)D(D[220]G)AD(HF)TY([200])R
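The arithmetic behind the +14 / -43 AMU statement can be checked with a small sketch. The mismatched database span (Cc)(D)(Cc) weighs 435 AMU, while the de novo span D + [220] + G weighs 392 AMU. The lookup table below is a hypothetical, heavily abridged stand-in for the modification table mentioned on the slide, and `explain_gap` is an illustrative name rather than an OpenSea function.

```python
# Nominal masses (AMU). "Cc" is carbamidomethylated cysteine (103 + 57).
MASS = {"C": 103, "D": 115, "G": 57, "Cc": 160}

# Hypothetical, abridged lookup table: nominal mass shift -> modification.
MOD_TABLE = {14: "methylation", 16: "oxidation", 80: "phosphorylation"}


def explain_gap(db_residues, observed_mass):
    """For each database residue in a mismatched span, compute the shift on
    the unmodified residue that would explain the observed span mass, and
    look that shift up in the modification table."""
    total = sum(MASS[r] for r in db_residues)
    hypotheses = []
    for position, residue in enumerate(db_residues):
        base = "C" if residue == "Cc" else residue  # drop the alkylation
        shift = observed_mass - (total - MASS[residue] + MASS[base])
        hypotheses.append((position, base, shift, MOD_TABLE.get(shift)))
    return hypotheses


# Database span (Cc)(D)(Cc) versus de novo span D + [220] + G = 392 AMU.
observed = MASS["D"] + 220 + MASS["G"]
for hypothesis in explain_gap(["Cc", "D", "Cc"], observed):
    print(hypothesis)
# (0, 'C', 14, 'methylation')
# (1, 'D', -43, None)
# (2, 'C', 14, 'methylation')
```

Only the cysteine hypotheses hit an entry in the table, which matches the slide's conclusion that one cysteine is methylated (Cm = 103 + 14 = 117) instead of carbamidomethylated.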
Spectrum Identification Overlap Between Search Methods
[Venn diagram: SEQUEST, X!Tandem, OpenSea. Region percentages: 6%, 17%, 7%, 41%, 10%, 10%, 9%. Callouts: PTMs, polymorphisms; neutral losses; semi-tryptic; no ladder]
Scaffold Data Compiler
• Combine SEQUEST, Mascot, X!Tandem, and OpenSea results
• Utilize spectrum clustering and noise filters to remove uninteresting spectra
• Export interesting, unidentified spectra for further analysis
[Diagram labels: Search Wider, Drill Deeper, Remove Junk, Focus Efforts. Steps for all spectra: Combine Database Searching IDs; Cluster Spectra to Previous IDs; Report Interesting, Unidentified Spectra; Filter Electronic Noise]
Combining SEQUEST and X!Tandem Scores
[Plot axes: X!Tandem –log(E-Value) Score; SEQUEST Discriminant Score (Peptide Prophet, ISB)]
Peptide Prophet (ISB)
[Plot labels: Incorrect IDs, Correct IDs, p=50%]
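The p=50% marker sits where the weighted score distributions of correct and incorrect IDs cross. A minimal sketch of that idea via Bayes' rule follows; both components are modeled as Gaussians purely for illustration, and the means, standard deviations and prior are hypothetical values, not fitted Peptide Prophet parameters.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior_correct(score, prior_correct, correct=(3.0, 1.0), incorrect=(0.0, 1.0)):
    """Probability that an ID is correct given its score, using Bayes' rule
    over a two-component mixture (hypothetical Gaussian components)."""
    pos = prior_correct * normal_pdf(score, *correct)
    neg = (1.0 - prior_correct) * normal_pdf(score, *incorrect)
    return pos / (pos + neg)


# With equal priors and equal spreads, the crossover lies midway between the means.
for score in (0.0, 1.5, 3.0):
    print(score, round(posterior_correct(score, prior_correct=0.5), 3))
```

In practice the mixture parameters and the prior are estimated from the data set itself, so the crossover point moves from one experiment to the next.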
Protein Prophet (ISB)
[Diagram labels: Peptide 1 through Peptide 7; Protein 1 through Protein 8]
[Plot: Normalized Distribution vs. NSP Bin Number. Curves: Incorrect IDs, p(NSP|-); Correct IDs, p(NSP|+); Log p(NSP|+)/p(NSP|-)]
For each spectrum… IDs with: high NSP--p, low NSP--p
Correct IDs have higher NSP values
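The two NSP distributions act as likelihoods that update an identification's probability. A minimal sketch of that update via Bayes' rule follows; the per-bin likelihood values are hypothetical and chosen only so that correct IDs favor the high-NSP bin, as the slide states.

```python
def adjust_for_nsp(p, p_nsp_given_correct, p_nsp_given_incorrect):
    """Update probability p using the likelihood of the observed NSP bin
    among correct IDs, p(NSP|+), and among incorrect IDs, p(NSP|-)."""
    pos = p * p_nsp_given_correct
    neg = (1.0 - p) * p_nsp_given_incorrect
    return pos / (pos + neg)


# Hypothetical per-bin likelihoods: (p(NSP|+), p(NSP|-)).
NSP_BINS = {"low": (0.10, 0.60), "mid": (0.30, 0.30), "high": (0.60, 0.10)}

for name, (likelihood_pos, likelihood_neg) in NSP_BINS.items():
    print(name, round(adjust_for_nsp(0.5, likelihood_pos, likelihood_neg), 3))
```

A bin where both likelihoods are equal leaves p unchanged; a bin where log p(NSP|+)/p(NSP|-) is positive raises p, and a negative log ratio lowers it.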
For Each Spectrum:
Peptide Prophet: Get SEQUEST IDs, Calculate SEQUEST Probability
Peptide Prophet: Get Mascot IDs, Calculate Mascot Probability
Peptide Prophet: Get X!Tandem IDs, Calculate X!Tandem Probability
Peptide Prophet: Get OpenSea IDs, Calculate OpenSea Probability
…
Scaffold Merge Prophet: Calculate Combined Peptide Probability
Protein Prophet: Calculate Protein Probabilities
[Diagram, for each spectrum: Get SEQUEST, Mascot, X!Tandem, and OpenSea Identification. Peptides 1 to 3; p=85%, p=76%, p=54%]
[Diagram, for each spectrum: Get SEQUEST, Mascot, X!Tandem, and OpenSea Identification. Peptides 1 to 5; p=27%, p=81%, p=35%]
[Diagram, for each spectrum: Get SEQUEST, Mascot, X!Tandem, and OpenSea Identification. Peptides 1 to 8]
Protein Prophet’s NSP value (number of sibling peptides) becomes… Merge Prophet’s number of sibling programs
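One way to picture the "number of sibling programs" is to count, for each identification of a spectrum, how many other search programs proposed the same peptide. The sketch below assumes exactly that definition, which the slides do not spell out; the peptide strings and probabilities are hypothetical, and `sibling_programs` is an illustrative name rather than a Scaffold function.

```python
from collections import defaultdict


def sibling_programs(identifications):
    """Map each (program, peptide) call for one spectrum to the number of
    OTHER programs that assigned the same peptide to that spectrum."""
    programs_by_peptide = defaultdict(set)
    for program, peptide, _probability in identifications:
        programs_by_peptide[peptide].add(program)
    return {
        (program, peptide): len(programs_by_peptide[peptide]) - 1
        for program, peptide, _probability in identifications
    }


# Hypothetical identifications for a single spectrum:
# (program, peptide, peptide probability).
spectrum_ids = [
    ("SEQUEST", "PEPTIDEA", 0.85),
    ("Mascot", "PEPTIDEA", 0.76),
    ("X!Tandem", "PEPTIDEB", 0.54),
]
print(sibling_programs(spectrum_ids))
```

The resulting count could then play the role that NSP plays in Protein Prophet: identifications backed by more programs would receive a higher adjusted probability.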
Accuracy of the Probability Combining Model
[Plot axes: Calculated Probability, Actual Probability. Series: SEQUEST, Mascot, X!Tandem, Combination]
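A calculated-versus-actual plot of this kind can be built by binning identifications on their calculated probability and measuring the fraction in each bin that is truly correct (known here because the samples are control proteins). The sketch below is a generic calibration check with hypothetical data, not the procedure used to produce the slide.

```python
def calibration_table(predictions, n_bins=5):
    """Bin (calculated_probability, is_correct) pairs and return, per
    non-empty bin, (mean calculated probability, actual fraction correct,
    number of identifications)."""
    bins = [[] for _ in range(n_bins)]
    for probability, is_correct in predictions:
        # A probability of exactly 1.0 is placed in the top bin.
        index = min(int(probability * n_bins), n_bins - 1)
        bins[index].append((probability, is_correct))

    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            actual = sum(1 for _, ok in members if ok) / len(members)
            table.append((mean_p, actual, len(members)))
    return table


# Hypothetical identifications: (calculated probability, truly correct?).
data = [(0.95, True), (0.90, True), (0.85, False),
        (0.15, False), (0.10, False), (0.05, True)]
for row in calibration_table(data, n_bins=2):
    print(row)
```

A well-calibrated model produces rows whose first two values are close to each other, which corresponds to points lying on the diagonal of the plot.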
Percentage of QTOF Spectra Correctly Identified as Control Proteins: Identified by SEQUEST (40%), Unknown Spectra (60%)
Percentage of QTOF Spectra Correctly Identified as Control Proteins: Identified by Scaffold (60%), Unknown Spectra (40%)
Percentage of QTOF Spectra Correctly Identified as Control Proteins: Identified by Scaffold (73%), Unknown Spectra (27%)
Scaffold Merge Prophet: Calculate Combined Probability
Protein Prophet: Calculate Protein Probabilities
Scaffold Cluster Prophet: Find Spectra Similar to Previously Identified; Filter Electronic Noise; Report Interesting, Unidentified Spectra
Cluster Prophet Principle
If an unidentified spectrum is 95% similar to a correctly identified spectrum, it is also considered to be identified.
Rank-Based Cluster Similarity Score
[Plot labels: Incorrect IDs, Correct IDs, p=50%]
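The slides name a rank-based similarity score without defining it. The sketch below is a stand-in, not Scaffold's score: it keeps the most intense peaks, replaces each intensity with its rank, and takes the cosine similarity of the two rank vectors. The bin width (rounded m/z), `top_n`, and the way the 0.95 threshold is applied are all assumptions.

```python
import math


def rank_vector(peaks, top_n=20):
    """Keep the top_n most intense (m/z, intensity) peaks, bin them by
    rounded m/z, and replace each intensity with its rank
    (1 = weakest kept peak)."""
    kept = sorted(peaks, key=lambda peak: peak[1], reverse=True)[:top_n]
    kept.sort(key=lambda peak: peak[1])
    return {round(mz): rank for rank, (mz, _intensity) in enumerate(kept, start=1)}


def rank_similarity(peaks_a, peaks_b, top_n=20):
    """Cosine similarity between the rank vectors of two spectra."""
    a, b = rank_vector(peaks_a, top_n), rank_vector(peaks_b, top_n)
    dot = sum(rank * b.get(mz, 0) for mz, rank in a.items())
    norm = (math.sqrt(sum(r * r for r in a.values()))
            * math.sqrt(sum(r * r for r in b.values())))
    return dot / norm if norm else 0.0


def inherits_identification(unknown, identified, threshold=0.95):
    """Treat the unknown spectrum as identified if it is similar enough."""
    return rank_similarity(unknown, identified) >= threshold


# Hypothetical spectra: the second is the first at twice the intensity.
identified = [(100.1, 10.0), (200.2, 50.0), (300.3, 30.0)]
unknown = [(mz, 2.0 * intensity) for mz, intensity in identified]
print(inherits_identification(unknown, identified))  # True
```

Because ranks ignore absolute intensity, a spectrum of the same peptide acquired at a different signal level still scores as highly similar.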
MS/MS Spectrum Filter
• Dynamic range filter removes spectra from peptides with poor/no fragmentation
• Signal-to-noise filter removes electronic noise
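The two filters above can be sketched as simple intensity tests. The definitions used here are assumptions, since the slides do not give them: dynamic range is taken as the ratio of the strongest to the weakest peak, signal to noise as the ratio of the strongest peak to the median peak, and both thresholds are hypothetical.

```python
import statistics


def dynamic_range(intensities):
    """Ratio of the strongest to the weakest non-zero peak (assumed definition)."""
    positive = [i for i in intensities if i > 0]
    return max(positive) / min(positive) if positive else 0.0


def signal_to_noise(intensities):
    """Strongest peak over the median peak, using the median as a crude
    noise estimate (assumed definition)."""
    noise = statistics.median(intensities)
    return max(intensities) / noise if noise > 0 else float("inf")


def is_interesting(intensities, min_dynamic_range=10.0, min_snr=3.0):
    """Keep a spectrum only if it passes both (hypothetical) thresholds."""
    if not intensities:
        return False
    return (dynamic_range(intensities) >= min_dynamic_range
            and signal_to_noise(intensities) >= min_snr)


flat_noise = [5, 6, 5, 6, 5]                      # no peak stands out
fragmented = [5, 6, 5, 200, 150, 6, 5, 90]        # clear fragment peaks
print(is_interesting(flat_noise), is_interesting(fragmented))  # False True
```

Spectra that fail either test would fall into the "Not Interesting" category instead of being counted as unknown.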
Percentage of QTOF Spectra Correctly Identified as Control Proteins: Identified by Scaffold (73%), Unknown Spectra (27%)
Percentage of QTOF Spectra Correctly Identified as Control Proteins: Identified by Scaffold (74%), Unknown Spectra (5%), Not Interesting (21%)
Percentage of 2D-LC QTOF Spectra Correctly Identified as Lens Proteins: Identified by Scaffold (48%), Unknown Spectra (21%), Not Interesting (31%)
The Analytical Challenge
[Diagram labels: Biological Samples, Control Experiments; Q-TOF, IonTrap. Each panel: IDed by SEQUEST, Unknown Spectra]
The Analytical Challenge
[Diagram labels: Biological Samples, Control Experiments; Q-TOF, IonTrap. Each panel: IDed by Scaffold, Unknown Spectra]
Q-TOF panels: 85% more IDs, 95% comprehension; 336% more IDs, 79% comprehension
IonTrap panels: 48% more IDs, 65% comprehension; 227% more IDs, 75% comprehension
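Two of the Q-TOF figures can be cross-checked against the pie charts shown earlier. The sketch assumes that "more IDs" is the relative increase over the SEQUEST-only identification rate and that "comprehension" means identified plus not-interesting spectra. Neither definition is stated in the slides, but both reproduce the quoted numbers.

```python
def percent_more_ids(before_pct, after_pct):
    """Relative increase in identified spectra, in percent."""
    return (after_pct / before_pct - 1.0) * 100.0


def comprehension(identified_pct, not_interesting_pct):
    """Share of spectra that are accounted for (assumed definition)."""
    return identified_pct + not_interesting_pct


# QTOF control proteins: SEQUEST 40% identified; Scaffold 74% identified, 21% not interesting.
print(round(percent_more_ids(40, 74)))  # 85
print(comprehension(74, 21))            # 95

# 2D-LC QTOF lens proteins: Scaffold 48% identified, 31% not interesting.
print(comprehension(48, 31))            # 79
```

The 336% figure cannot be checked the same way, because the SEQUEST-only identification rate for the lens sample is not given.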
Conclusions
• Using Scaffold technologies, you can drill deeper and search wider using multiple database searching approaches and MS/MS spectrum clustering
• Scaffold and implementations of Peptide/Protein Prophet were written in platform-independent Java
• Scaffold will be available at ASMS 2005
Acknowledgements
OpenSea Team (OHSU): Srinivasa Nagalla, Surendra Dasari, Ashok Reddy, Larry David, Phil Wilmarth, Ashley McCormack. Contact: nagallas@ohsu.edu
Scaffold Team (Proteome Software Inc.): Mark Turner, James Brundege. Contact: Brian.Searle@ProteomeSoftware.com