
Advancing Statistical Analysis of Multiplexed MS/MS Quantitative Data with Scaffold Q+


Presentation Transcript


  1. Advancing Statistical Analysis of Multiplexed MS/MS Quantitative Data with Scaffold Q+. Brian C. Searle and Mark Turner, Proteome Software Inc. ASMS 2012, Vancouver, Canada. Creative Commons Attribution

  2. Reference [figure: reporter-ion channels 114, 115, 116, 117]

  3. [figure: reference channel compared against channels 114, 115, 116, 117] ANOVA: Oberg et al. 2008 (doi:10.1021/pr700734f)

  4. “High Quality” Data • Virtually no missing data • Symmetric distribution • High Kurtosis

  5. “Normal Quality” Data • High Skew due to truncation • >20% of intensities are missing in this channel! • Either ignore channels with any missing data (0.8^4 ≈ 41%) …
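The 41% figure follows from the per-channel missingness rate. A worked version of the arithmetic, assuming roughly 20% missing intensities in each of the four reporter channels (independently), as the slide suggests:

```latex
% Probability that a spectrum has all four reporter channels observed,
% assuming ~20% missing intensities per channel (independently):
P(\text{all 4 channels present}) = (1 - 0.2)^{4} = 0.8^{4} \approx 0.41
```

So requiring complete channels keeps only about 41% of the data, which motivates handling the missing values rather than discarding them.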

  6. “Normal Quality” Data …Or deal with very non-Gaussian data!

  7. Contents • A Simple, Non-parametric Normalization Model • Refinement 1: Intelligent Intensity Weighting • Refinement 2: Standard Deviation Estimation • Refinement 3: Kernel Density Estimation • Refinement 4: Permutation Testing

  8. Simple, Non-parametric Normalization Model

  9. Additive Effects on Log Scale • Experiment: sample-handling effects across MS acquisitions (LC and MS variation, calibration, etc.) • Sample: sample-handling effects between channels (pipetting errors, etc.) • Peptide: ionization effects • Error: variation due to imprecise measurements. Oberg et al. 2008 (doi:10.1021/pr700734f)
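One way to write the additive log-scale model the slide describes (a sketch; the exact parameterization and additional terms in Oberg et al. 2008 may differ):

```latex
% Additive effects on the log scale (sketch):
%   i = MS acquisition (experiment), j = channel (sample), k = peptide/spectrum
\log_2(y_{ijk}) = \mu + \mathrm{Experiment}_i + \mathrm{Sample}_{ij} + \mathrm{Peptide}_k + \varepsilon_{ijk}
```

Because the effects are additive on the log scale, each one can be estimated and subtracted in turn, which is what the median polish on the next slide does.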

  10. Additive Effects on Log Scale

  11. Median Polish (“Non-Parametric ANOVA”) [schematic, repeated 3x: remove inter-experiment effects → remove intra-sample effects → remove peptide effects]
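A minimal sketch of the median-polish loop in the schematic, assuming a spectra-by-channels matrix of log2 intensities and a vector mapping channels to MS experiments; the grouping of effects and the handling of missing values in Scaffold Q+ may differ:

```python
import numpy as np

def median_polish(log_intensities, experiment_of_channel, n_iter=3):
    """Toy median-polish ("non-parametric ANOVA") sketch.

    log_intensities       : 2-D array, rows = spectra (peptides), cols = channels
    experiment_of_channel : 1-D array mapping each channel to an MS experiment id
    Missing values are expected as NaN and are ignored by the medians.
    """
    x = np.array(log_intensities, dtype=float)
    experiment_of_channel = np.asarray(experiment_of_channel)
    for _ in range(n_iter):
        # Remove inter-experiment effects: subtract the median of all values
        # measured in each experiment (group of channels).
        for exp in np.unique(experiment_of_channel):
            cols = experiment_of_channel == exp
            x[:, cols] -= np.nanmedian(x[:, cols])
        # Remove intra-sample (per-channel) effects.
        x -= np.nanmedian(x, axis=0, keepdims=True)
        # Remove peptide (per-spectrum) effects.
        x -= np.nanmedian(x, axis=1, keepdims=True)
    return x  # residuals: relative abundances after normalization

# Example (hypothetical data): two experiments of two channels each.
# residuals = median_polish(np.log2(intensities), experiment_of_channel=[0, 0, 1, 1])
```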

  12. Refinement 1: Intensity Weighting

  13. Linear Intensity Weighting [figure annotations: low intensity, low weight; high intensity, high weight]

  14. Desired Intensity Weighting [figure annotations: most data, high weight; saturated data, decreased weight; low intensity, low weight]

  15. Variance At Different Intensities

  16. Estimate Confidence from Protein Deviation

  17. Estimate Confidence from Protein Deviation • P_ij = 2 · (cumulative t-distribution of t_ij), where i = raw intensity bin, j = each spectrum in bin i, and the protein median for spectrum j is the reference value • t_ij = … • P_i = …
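The t_ij and P_i formulas are only legible in the slide graphics, so the following is a hedged sketch: it assumes t_ij is the deviation of spectrum j from its protein median, studentized by the bin's standard deviation, and that P_i summarizes the P_ij within bin i by their mean. Both assumptions are mine, not statements of the Scaffold Q+ formulas.

```python
import numpy as np
from scipy import stats

def bin_confidence(deviations, stdev, df):
    """Hedged sketch of a per-bin confidence P_i.

    deviations : deviations of each spectrum in intensity bin i from its
                 protein median (studentizing by `stdev` below is an assumption;
                 the exact t_ij formula is not in the transcript)
    stdev      : standard deviation estimate for this intensity bin
    df         : degrees of freedom for the t-distribution
    """
    t_ij = np.asarray(deviations, dtype=float) / stdev
    # Two-sided tail probability, following the slide's
    # "P_ij = 2 * cumulative t-distribution(t_ij)".
    p_ij = 2 * stats.t.cdf(-np.abs(t_ij), df)
    # Summarizing the P_ij into one bin weight P_i by the mean is an assumption
    # (the slide's P_i formula is not legible in the transcript).
    return p_ij.mean()
```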

  18. Data Dependent Intensity Weighting [figure annotations: most data, high weight; saturated data, decreased weight; low intensity, low weight]

  19. Desired Intensity Weighting [figure annotations: most data, high weight; saturated data, decreased weight; low intensity, low weight]

  20. Data Dependent Intensity Weighting [figure annotations: most data, high weight; low intensity, low weight]

  21. Algorithm Schematic [repeated 3x: remove inter-experiment effects → remove intra-sample effects → remove peptide effects, with data-dependent intensity weighting]

  22. Refinement 2: Standard Deviation Estimation

  23. Standard Deviation Estimation • where i = intensity bin, j = each spectrum in bin i, and the protein median for spectrum j is the reference value
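A hedged sketch of what a per-intensity-bin standard deviation estimate could look like, assuming spectra are binned by raw intensity and the spread of their deviations from the protein median is summarized per bin; the binning scheme and estimator here are assumptions, not the Scaffold Q+ procedure:

```python
import numpy as np

def binned_stdev(raw_intensity, deviation_from_protein_median, n_bins=20):
    """Estimate a standard deviation per raw-intensity bin (sketch).

    raw_intensity                 : 1-D array of reporter-ion intensities
    deviation_from_protein_median : deviation of each spectrum's value from
                                    its protein median (same length)
    Returns quantile bin edges and one standard deviation per bin.
    """
    raw_intensity = np.asarray(raw_intensity, dtype=float)
    deviation = np.asarray(deviation_from_protein_median, dtype=float)
    # Quantile bins so every bin holds roughly the same number of spectra.
    edges = np.quantile(raw_intensity, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(raw_intensity, edges[1:-1]), 0, n_bins - 1)
    stdevs = np.array([np.nanstd(deviation[idx == b]) for b in range(n_bins)])
    return edges, stdevs
```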

  24. Data Dependent Standard Deviation Estimation

  25. Data Dependent Standard Deviation Estimation

  26. Algorithm Schematic [repeated 3x: remove inter-experiment effects → remove intra-sample effects → remove peptide effects, with data-dependent intensity weighting and data-dependent standard deviation estimation]

  27. Refinement 3: Kernel Density Estimation

  28. Protein Variance Estimation

  29. Protein Variance Estimation

  30. Kernels

  31. Kernels

  32. Kernels

  33. Kernel Density Estimation

  34. Kernel Density Estimation

  35. Kernel Density Estimation [figure annotations: 0.3 shift on log2 scale; deviation that shifts the distribution]
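For reference, converting the annotated log2 shift into a fold change:

```latex
% A 0.3 shift on the log2 scale corresponds to roughly a 1.2-fold change:
2^{0.3} \approx 1.23
```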

  36. Improved Kernels • We have a better estimate for P_i: the intensity-based weight! • We have a better estimate for Stdev_i: the intensity-based standard deviation!
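A minimal sketch of a kernel density estimate built from these quantities, assuming each spectrum contributes a Gaussian kernel centered at its log2 ratio, scaled by its intensity-based weight P_i and widened by its intensity-based standard deviation Stdev_i; the kernel shape and normalization are assumptions, not the exact Scaffold Q+ formula:

```python
import numpy as np

def weighted_kde(log_ratios, weights, stdevs, grid):
    """Weighted, variable-bandwidth Gaussian KDE (sketch).

    log_ratios : per-spectrum log2 ratios for one protein
    weights    : intensity-based weights (the P_i of the slide)
    stdevs     : intensity-based standard deviations (the Stdev_i of the slide)
    grid       : log2-ratio values at which to evaluate the density
    """
    log_ratios = np.asarray(log_ratios, dtype=float)
    weights = np.asarray(weights, dtype=float)
    stdevs = np.asarray(stdevs, dtype=float)
    density = np.zeros_like(np.asarray(grid, dtype=float))
    for x, w, s in zip(log_ratios, weights, stdevs):
        # Each spectrum adds one Gaussian bump, scaled by its weight.
        density += w * np.exp(-0.5 * ((grid - x) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    return density / weights.sum()
```

Evaluated on a grid of log2 ratios, the result is the per-protein density whose shift away from zero is read off on the following slides.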

  37. Improved Kernels

  38. Improved Kernel Density Estimation

  39. Improved Kernel Density Estimation

  40. Improved Kernel Density Estimation [figure annotations: significant deviation worth investigating; unimportant deviation]

  41. Improved Kernel Density Estimation [figure annotation: a 1.0 shift on the log2 scale = 2-fold change]

  42. Refinement 4: Permutation Testing

  43. Why Use Permutation Testing? • Why go through all this work just to use a t-test or ANOVA? • Rank-based Mann-Whitney and Kruskal-Wallis tests “work”, but lack power

  44. Basic Permutation Test T=4.84

  45. Basic Permutation Test T=4.84 T=1.49

  46. Basic Permutation Test x1000 T=4.84 T=1.49 T=1.34 T=1.14

  47. Basic Permutation Test [950 permuted statistics below the observed T, 50 above: p ≈ 50/1000 = 0.05]
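A minimal sketch of the basic permutation test illustrated on slides 44 to 47: compute the observed t-statistic, repeatedly shuffle the pooled values between the two groups, and estimate the p-value as the fraction of permuted statistics at least as extreme. The two-group setup and the use of scipy's t-statistic are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def permutation_p_value(group_a, group_b, n_perm=1000, rng=None):
    """Basic two-group permutation test (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    observed = abs(stats.ttest_ind(group_a, group_b).statistic)
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)
    count_at_or_above = 0
    for _ in range(n_perm):
        # Shuffle the pooled values and re-split them into two groups.
        rng.shuffle(pooled)
        t = abs(stats.ttest_ind(pooled[:n_a], pooled[n_a:]).statistic)
        count_at_or_above += t >= observed
    # e.g. 50 of 1000 permutations at or above the observed T gives p ~= 0.05.
    return count_at_or_above / n_perm
```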

  48. Improvements… • N is frequently very small • Instead of randomizing N points, randomly select N points from Kernel Densities • Expensive! What if you want more precision?
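One way to realize the "select N points from the kernel densities" idea when N is very small (a sketch; the exact sampling scheme is not given in the slides): draw the permutation samples from the weighted Gaussian-kernel densities instead of reshuffling the few observed points.

```python
import numpy as np

def sample_from_kde(centers, weights, stdevs, n, rng=None):
    """Draw n points from a weighted Gaussian-kernel density (sketch).

    Pick a kernel in proportion to its weight, then add Gaussian noise with
    that kernel's standard deviation.
    """
    rng = np.random.default_rng() if rng is None else rng
    weights = np.asarray(weights, dtype=float)
    idx = rng.choice(len(centers), size=n, p=weights / weights.sum())
    return np.asarray(centers)[idx] + rng.normal(0.0, np.asarray(stdevs)[idx])
```

Each sampled replicate can then be fed through the same t-statistic as in the basic permutation test above.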

  49. Extrapolating Precision [figure annotations: 1000 permutations below, 0 above; actual T-statistic of 6.6?; last usable permutation]

  50. Extrapolating Precision [figure annotation: actual T-statistic of 6.6?] Knijnenburg et al. 2011 (doi:10.1186/1471-2105-12-411)
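When the observed statistic exceeds every permuted one (0 of 1000 above), the empirical p-value bottoms out at less than 1/1000, so the cited work extrapolates instead. One common way to extrapolate permutation p-values is to fit a generalized Pareto distribution to the tail of the permuted statistics and read smaller p-values from its survival function; whether this matches the exact procedure in Scaffold Q+ or in the cited paper is an assumption here, not a claim.

```python
import numpy as np
from scipy import stats

def extrapolated_p_value(perm_stats, observed_t, tail_fraction=0.1):
    """Tail-extrapolated permutation p-value (hedged sketch).

    Fit a generalized Pareto distribution to the largest permuted statistics
    (exceedances over a high threshold) and estimate the p-value as
    P(tail) * P(exceedance beyond the observed statistic | tail).
    """
    perm_stats = np.sort(np.asarray(perm_stats, dtype=float))
    threshold = np.quantile(perm_stats, 1 - tail_fraction)
    exceedances = perm_stats[perm_stats > threshold] - threshold
    shape, loc, scale = stats.genpareto.fit(exceedances, floc=0.0)
    tail_prob = stats.genpareto.sf(observed_t - threshold, shape, loc=loc, scale=scale)
    return tail_fraction * tail_prob
```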
