Novel Algorithms for the Quantification Confidence in Quantitative Proteomics with Stable Isotope Labeling*
Chongle Pan¹,²; David L. Tabb¹; Dale Pelletier¹; W. Hayes McDonald¹; Greg Hurst¹; Nagiza F. Samatova¹; Robert L. Hettich¹
¹ Oak Ridge National Laboratory, Oak Ridge, TN
² Genome Science and Technology, UT-ORNL
* Research support provided by the U.S. Department of Energy, Office of Biological and Environmental Research.

RelEx¹, ASAPratio², XPRESS³, MSQuant⁴
¹ Anal Chem, 2003. 75: p. 6912-21
² Anal Chem, 2003. 75: p. 6648-57
³ Nat Biotechnol, 2001. 19: p. 946-51
⁴ Nat Biotechnol, 2004. 22: p. 1139-45

Uncertainty in the Measurements
• Mass spectrometric measurement of a protein: Mr = 23,564 Da ± 10 Da (95% confidence)
• Relative quantification of a protein in quantitative proteomics: abundance ratio = 1:1, 95% confidence interval = [2:1, 1:2]
The principal aim

Experimental
• Metabolic labeling of Rhodopseudomonas palustris with the stable isotope ¹⁵N
• Standard mixtures of natural and ¹⁵N-labeled proteomes at pre-determined mixing ratios
• Shotgun proteomics analysis
  • MS instrument: linear ion trap (LTQ, Finnigan)
  • 2D-LC method: 24-hour MudPIT technique⁵
• Protein identification
  • Database searching: DBDigger⁶
  • Identification filtering: DTASelect⁷
⁵ Int. J. of Mass Spec. 2002. 219: p. 245-251.
⁶ Anal Chem, 2003. 75: p. 6912-21
⁷ J. Proteome Res. 2002. 1: p. 21-26.

Benchmark Data
• Peptide I.D. filtering: 95% true positive rate
• Protein I.D. filtering: minimum of 2 peptides
• Data quality
• Reproducibility

Block Diagram
MS1 or mzXML format mass spectral data
  → SIC reconstruction → selected ion chromatogram
  → peak detection (parallel paired covariance) → chromatographic peak
  → peptide quantification (principal component analysis) → peptide abundance ratio, confidence score
  → protein quantification (maximum likelihood estimation) → protein abundance ratio, confidence interval

Peak Detection
Parallel paired covariance chromatogram (PPC)
[Figure: light isotopologue SIC and heavy isotopologue SIC (ion intensity vs. scan number; S/N = 3 and S/N = 13) and the PPC (covariance vs. scan number; S/N = 42), with the peak boundaries marked.]
Peak boundaries are defined as the local minima in the PPC that enclose all MS/MS scans matching the peptide.
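
The peak-detection idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not give the PPC formula, so the sketch takes the PPC to be the per-scan product of the two baseline-subtracted SICs (the terms that sum to their covariance), smoothed by a moving average. The function names `ppc` and `peak_bounds`, the median baseline, and the window width are hypothetical.

```python
from statistics import median

def ppc(light, heavy, half_window=1):
    """Covariance chromatogram of two equally long SICs (assumed definition)."""
    # Assumption: the median intensity approximates the chromatogram baseline.
    base_l, base_h = median(light), median(heavy)
    prod = [(a - base_l) * (b - base_h) for a, b in zip(light, heavy)]
    n = len(prod)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        smoothed.append(sum(prod[lo:hi]) / (hi - lo))
    return smoothed

def peak_bounds(chrom, msms_scans):
    """Return (left, right) scan indices of the local minima around a peak."""
    # Hill-climb from the first MS/MS scan to the nearest local maximum.
    apex = msms_scans[0]
    while True:
        neighbours = [j for j in (apex, apex - 1, apex + 1) if 0 <= j < len(chrom)]
        best = max(neighbours, key=lambda j: chrom[j])
        if best == apex:
            break
        apex = best
    # Start outside every MS/MS scan so the window encloses all of them.
    left = min(apex, min(msms_scans))
    right = max(apex, max(msms_scans))
    # Walk downhill on each side until the chromatogram stops decreasing.
    while left > 0 and chrom[left - 1] < chrom[left]:
        left -= 1
    while right < len(chrom) - 1 and chrom[right + 1] < chrom[right]:
        right += 1
    return left, right

if __name__ == "__main__":
    light = [1, 1, 2, 5, 9, 5, 2, 1, 1, 1]
    heavy = [2, 2, 4, 10, 18, 10, 4, 2, 2, 2]
    chrom = ppc(light, heavy)
    print(peak_bounds(chrom, msms_scans=[5]))
```

Multiplying the two chromatograms rewards scans where both isotopologues rise together and suppresses noise that appears in only one of them, which is consistent with the higher S/N reported for the PPC than for either SIC.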

Peptide Quantification
Peptide abundance ratios can be estimated by
• Peak height ratio
• Peak area ratio (ASAPratio, MSQuant, XPRESS)
[Figure: ion intensity vs. scan number.]
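
The two simplest estimators listed above can be written down directly. The sketch below assumes the ratio is reported as light over heavy, that both SICs have already been cut to the same peak boundaries, and that scans are evenly spaced so a plain sum stands in for the peak area; the function names are hypothetical.

```python
import math

def peak_height_ratio(light, heavy):
    """Ratio of the maximum intensities inside the peak window."""
    return max(light) / max(heavy)

def peak_area_ratio(light, heavy):
    """Ratio of the summed intensities inside the peak window."""
    return sum(light) / sum(heavy)

if __name__ == "__main__":
    # Toy peak with a spike in the light channel at the apex.
    light = [0, 2, 8, 2, 0]
    heavy = [0, 4, 10, 4, 0]
    print(math.log2(peak_height_ratio(light, heavy)))
    print(math.log2(peak_area_ratio(light, heavy)))
```

The height ratio depends on a single scan per channel, so one noisy scan shifts it directly, while the area ratio averages over the whole window.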

Peptide Quantification
• Linear regression (RelEx)
• Principal component analysis (PCA)
  • ratio = tan(θ), where θ is the angle of PC1
  • signal-to-noise ratio = PCA-SNR
[Figure: light and heavy isotopologue ion intensity vs. scan number, and the scatter of the two ion intensities with PC1, PC2, and θ.]
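
A two-variable PCA has a closed form, so the ratio = tan(θ) idea can be illustrated without a linear algebra library. The sketch below makes several assumptions that the slide does not state: the intensities are mean-centered before PCA, and PCA-SNR is taken as the square root of the ratio of the PC1 variance to the PC2 variance. The function name `pca_ratio_snr` is hypothetical.

```python
import math

def pca_ratio_snr(x, y):
    """Return (tan(theta) of PC1, assumed PCA-SNR) for paired intensities."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Entries of the 2x2 covariance matrix [[a, b], [b, c]].
    a = sum((u - mx) ** 2 for u in x) / n
    c = sum((v - my) ** 2 for v in y) / n
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / n
    # Angle of the first principal axis relative to the x axis.
    theta = 0.5 * math.atan2(2 * b, a - c)
    # Eigenvalues of the covariance matrix: variance along PC1 and PC2.
    half_trace = (a + c) / 2
    spread = math.hypot((a - c) / 2, b)
    lam1, lam2 = half_trace + spread, half_trace - spread
    # Assumption: SNR is the ratio of the PC1 to PC2 standard deviations.
    snr = math.sqrt(lam1 / lam2) if lam2 > 0 else math.inf
    return math.tan(theta), snr

if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    y = [2.1, 3.9, 6.1, 7.9, 10.1, 11.9]
    ratio, snr = pca_ratio_snr(x, y)
    print(ratio, snr)
```

Unlike ordinary linear regression, PCA treats noise in both channels symmetrically, and the variance left along PC2 gives a per-peptide measure of how well the two chromatograms agree.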

Quantification Accuracy
[Figure: histograms of peptide counts vs. log2(ratio) at the 1:5 mixing ratio for peak height ratio, peak area ratio, and PCA/linear regression, with the expected log2(ratio) marked.]

Quantification Accuracy
[Figure: histograms of peptide counts vs. log2(ratio) at mixing ratios 1:10, 1:5, 1:1, 5:1, and 10:1.]

Quantification Confidence
2D histogram of peptide log2(ratio) and log2(PCA-SNR)
[Figure: peptide counts as a function of log2(ratio) and log2(PCA-SNR) at the 5:1 mixing ratio.]

Quantification Confidence
• Bin the peptides by their log2(PCA-SNR) value
• Bias: the deviation of the average estimated log2(ratio) from the expected log2(ratio)
• Bias increases as PCA-SNR decreases below a threshold
[Figure: log2(ratio) vs. log2(PCA-SNR) at the 5:1 mixing ratio.]

Quantification Confidence
• Bin the peptides by their log2(PCA-SNR) value
• Variance: the variability of the estimated log2(ratio)
• Variance increases as PCA-SNR decreases
[Figure: log2(ratio) vs. log2(PCA-SNR) at the 5:1 mixing ratio.]
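
The binning analysis behind the bias and variance observations can be sketched as follows. The input layout (a list of `(log2_ratio, log2_snr)` pairs), the bin width of one log2 unit, and the function name are assumptions made for illustration; the standard deviation is used as the measure of variability.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

def bias_and_sd_by_snr(peptides, expected_log2_ratio, bin_width=1.0):
    """Map each log2(PCA-SNR) bin start to (bias, standard deviation)."""
    bins = defaultdict(list)
    for log2_ratio, log2_snr in peptides:
        bins[math.floor(log2_snr / bin_width)].append(log2_ratio)
    table = {}
    for k in sorted(bins):
        values = bins[k]
        # Bias: average estimate minus the known mixing ratio (in log2).
        bias = mean(values) - expected_log2_ratio
        table[k * bin_width] = (bias, pstdev(values))
    return table

if __name__ == "__main__":
    # Hypothetical peptides from a 5:1 standard mixture.
    peptides = [(2.0, 4.5), (2.4, 4.2), (1.0, 1.5), (3.0, 1.1), (0.5, 1.9)]
    for start, (bias, sd) in bias_and_sd_by_snr(peptides, math.log2(5)).items():
        print(start, round(bias, 3), round(sd, 3))
```

Because the mixing ratio of a standard mixture is known in advance, the per-bin bias and spread can be measured directly and then used to calibrate a confidence model.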

Quantification Confidence
Comet-like two-dimensional distribution. As log2(SNR) decreases,
• the spread of log2(ratio) estimates increases
• the average of log2(ratio) estimates regresses to zero
[Figure: log2(PCA-SNR) vs. log2(ratio) at mixing ratios 1:10, 1:5, 1:1, 5:1, and 10:1.]

Quantification Confidence
[Figure: standard deviation{log2(ratio)} and |mean{log2(ratio)}| vs. log2(PCA-SNR) for the 1:1, 5:1 & 1:5, and 10:1 & 1:10 mixing ratios.]
The quantification bias and variance for peptides are linear functions of PCA-SNR

Protein Quantification
• A series of theoretical probability distributions of peptide abundance ratio estimates, one at each PCA-SNR level
• The maximum likelihood point estimate of a protein's abundance ratio is the ratio that best explains its measured peptides' estimated log2(ratio) at the calculated log2(PCA-SNR)
[Figure: measured peptides plotted as log2(ratio) vs. log2(PCA-SNR), with the mean and 2 sd of the distributions.]
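
A rough sketch of the maximum likelihood step is given below. Everything numeric in it is hypothetical: the slides state only that bias and variance depend linearly on PCA-SNR, so the coefficients in `sd_model` and `shrink_model`, the normal error model, the grid search, and the likelihood-ratio cutoff of 1.92 log-likelihood units used for the 95% interval are assumptions chosen for illustration, not the authors' calibrated values or method.

```python
import math

def sd_model(log2_snr):
    # Hypothetical: sd of log2(ratio) falls linearly with log2(PCA-SNR).
    return max(0.1, 1.5 - 0.25 * log2_snr)

def shrink_model(log2_snr):
    # Hypothetical: fraction of the true log2(ratio) recovered on average,
    # so low-SNR estimates regress toward zero.
    return min(1.0, max(0.0, 0.2 + 0.2 * log2_snr))

def log_likelihood(r, peptides):
    """Log-likelihood (up to a constant) of protein log2(ratio) r."""
    total = 0.0
    for log2_ratio, log2_snr in peptides:
        sd = sd_model(log2_snr)
        mu = r * shrink_model(log2_snr)
        total += -math.log(sd) - 0.5 * ((log2_ratio - mu) / sd) ** 2
    return total

def estimate_protein_ratio(peptides, lo=-7.0, hi=7.0, step=0.01):
    """Grid-search the MLE and a likelihood-ratio 95% interval."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    ll = [log_likelihood(r, peptides) for r in grid]
    best = max(ll)
    r_hat = grid[ll.index(best)]
    # Assumption: keep ratios within 1.92 log-likelihood units of the maximum
    # (half the 95% chi-square critical value with one degree of freedom).
    inside = [r for r, v in zip(grid, ll) if best - v <= 1.92]
    return r_hat, (min(inside), max(inside))

if __name__ == "__main__":
    # Hypothetical peptides of one protein: (log2 ratio, log2 PCA-SNR).
    peptides = [(2.3, 5.0), (2.2, 4.5), (2.5, 6.0), (1.0, 1.0)]
    print(estimate_protein_ratio(peptides))
```

In this toy model a low-SNR peptide contributes little to the estimate because its distribution is wide, which is the behaviour the slide describes: each peptide is weighed according to the distribution expected at its own PCA-SNR level.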

Quantification Accuracy
MSE: Mean Square Error
PRATIO filtering:
• PCA-SNR > 2
• peptides > 2
• 95% confidence interval width for log2(ratio) < 4
RelEx filtering:
• correlation > 0.7 at 1
• correlation > 0.4 at 10
• signal-to-noise > 3
• peptides > 2
[Figure: protein counts vs. log2(ratio).]

Quantification Accuracy
[Figure: histograms of protein counts vs. log2(ratio) at mixing ratios 1:10, 1:5, 1:1, 5:1, and 10:1. RelEx: red; PRATIO: blue.]

Confidence Interval Estimation
Display of the point estimates (+) and the 95% confidence interval estimates (-----------) for protein abundance ratios
[Figure: proteins vs. log2(ratio) at the 1:5 mixing ratio.]

Confidence Interval Estimation
Point estimates and confidence interval estimates of protein abundance ratios
[Figure: log2(ratio) panels at mixing ratios 1:10, 1:5, 1:1, 5:1, and 10:1.]

Conclusions
Three novel algorithms
• Parallel paired covariance for peak detection
• Principal component analysis for peptide quantification
• Maximum likelihood estimation for protein quantification
Improved protein quantification accuracy
Rigorous confidence interval estimation
The fully automated program with a graphical user interface is freely available for testing by contacting C. Pan (email: panc@ornl.gov)