Advancing Statistical Analysis of Multiplexed MS/MS Quantitative Data with Scaffold Q+

Brian C. Searle and Mark Turner • Proteome Software Inc. • Vancouver Canada, ASMS 2012 • Creative Commons Attribution

[Figure: channels Reference, 114, 115, 116, 117]

[Figure: ANOVA across channels Ref, 114, 115, 116, 117] Oberg et al 2008 (doi:10.1021/pr700734f)

“High Quality” Data • Virtually no missing data • Symmetric distribution • High Kurtosis

“Normal Quality” Data • High Skew due to truncation • >20% of intensities are missing in this channel! • Either ignore channels with any missing data (0.8^4 ≈ 41%) …

“Normal Quality” Data …Or deal with very non-Gaussian data!
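As a quick check of the 41% figure: if each of the four channels independently keeps about 80% of its intensities, the fraction of spectra with all four channels present is 0.8 raised to the fourth power. The independence assumption and the variable names below are illustrative, not taken from the slides.

```python
# Assumption: each channel is present with probability 0.8, independently
# of the other channels.
p_present = 0.8
n_channels = 4

# Probability that a spectrum has a usable intensity in every channel.
p_complete = p_present ** n_channels

print(f"{p_complete:.1%} of spectra keep all {n_channels} channels")
```

Under that reading, discarding every spectrum with a missing channel would throw away more than half of the data.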

Contents • A Simple, Non-parametric Normalization Model • Refinement 1: Intelligent Intensity Weighting • Refinement 2: Standard Deviation Estimation • Refinement 3: Kernel Density Estimation • Refinement 4: Permutation Testing

Simple, Non-parametric Normalization Model

Additive Effects on Log Scale • Experiment: sample handling effects across MS acquisitions (LC and MS variation, calibration, etc.) • Sample: sample handling effects between channels (pipetting errors, etc.) • Peptide: ionization effects • Error: variation due to imprecise measurements • Oberg et al 2008 (doi:10.1021/pr700734f)

Additive Effects on Log Scale
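Written out, the four effects named above (experiment, sample, peptide, error) combine additively for each log-scale intensity. The symbols and indices below are assumed for illustration; only the four terms come from the slides.

```latex
% e = experiment (MS acquisition), s = sample (channel), p = peptide
\log_2 y_{e,s,p} = E_e + S_s + P_p + \varepsilon_{e,s,p}
```

Because the effects are additive on the log scale, each one can be estimated and subtracted in turn.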

Median Polish “Non-Parametric ANOVA” • Remove Inter-Experiment Effects • Remove Intra-Sample Effects • 3x • Remove Peptide Effects
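A minimal sketch of median polish on a two-way table may help make the idea concrete. It is a simplification: the slides describe three effects (experiment, sample, peptide), while this example only alternates between rows (assumed here to be peptides) and columns (assumed to be channels). The function name, the use of `None` for missing intensities, and the three sweeps mirroring the "3x" label are all assumptions.

```python
from statistics import median


def median_polish(table, n_iter=3):
    """Two-way median polish on a rows x columns table of log intensities.

    None marks a missing value and is skipped when taking medians.
    Returns (row_effects, column_effects, residuals).
    """
    resid = [row[:] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep out the median of each row (e.g. a peptide effect).
        for r in range(n_rows):
            vals = [v for v in resid[r] if v is not None]
            if not vals:
                continue
            m = median(vals)
            row_eff[r] += m
            resid[r] = [None if v is None else v - m for v in resid[r]]
        # Sweep out the median of each column (e.g. a channel effect).
        for c in range(n_cols):
            vals = [resid[r][c] for r in range(n_rows) if resid[r][c] is not None]
            if not vals:
                continue
            m = median(vals)
            col_eff[c] += m
            for r in range(n_rows):
                if resid[r][c] is not None:
                    resid[r][c] -= m
    return row_eff, col_eff, resid


# A purely additive table: every residual should polish down to zero.
peptide = [10.0, 12.0, 11.0]
channel = [0.0, 0.5, -0.5, 1.0]
demo = [[p + c for c in channel] for p in peptide]
row_eff, col_eff, resid = median_polish(demo)
```

Medians are used instead of means so that a few outlying spectra do not drag the estimated effects, which is what makes the procedure behave like a non-parametric ANOVA.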

Refinement 1: Intensity Weighting

Linear Intensity Weighting • Low Intensity, Low Weight • High Intensity, High Weight

Desired Intensity Weighting • Most Data, High Weight • Saturated Data, Decreased Weight • Low Intensity, Low Weight

Variance At Different Intensities

Estimate Confidence from Protein Deviation

Estimate Confidence from Protein Deviation • P_ij = 2 * cumulative t-distribution(t_ij), where i = raw intensity bin; j = each spectrum in bin i; […] = protein median for spectrum j; t_ij = […] • P_i = […]
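The "2 * cumulative t-distribution" step is a two-sided tail probability, which can be sketched with the standard library by integrating the Student t density numerically. The definition of the t statistic and the degrees of freedom are not recoverable from the slide, so `df`, `steps`, and the function names here are assumptions.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_coef = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * (
        (df * pi) and __import__("math").log(df * pi)
    )
    return exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 2 * CDF(-|t|), via Simpson's rule on [0, |t|].

    steps must be even for Simpson's rule.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_area = 2 * total * h / 3  # probability mass between -|t| and |t|
    return max(0.0, 1.0 - central_area)


p_cauchy = two_sided_p(1.0, df=1)   # df=1 is the Cauchy distribution
p_large_df = two_sided_p(2.0, df=1000)
```

A small p-value means the spectrum deviates from its protein median more than expected, which is the kind of evidence an intensity bin could use to lower its weight.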

Data Dependent Intensity Weighting • Most Data, High Weight • Saturated Data, Decreased Weight • Low Intensity, Low Weight

Desired Intensity Weighting • Most Data, High Weight • Saturated Data, Decreased Weight • Low Intensity, Low Weight

Data Dependent Intensity Weighting • Most Data, High Weight • Low Intensity, Low Weight

Algorithm Schematic • Remove Inter-Experiment Effects • Remove Intra-Sample Effects • Data Dependent Intensity Weighting • 3x • Remove Peptide Effects

Refinement 2: Standard Deviation Estimation

Standard Deviation Estimation • i = intensity bin; j = each spectrum in bin i; […] = protein median for spectrum j

Data Dependent Standard Deviation Estimation
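One way to read the data-dependent estimate is as a root-mean-square deviation of each spectrum from its protein median, computed separately per intensity bin. The exact formula is not recoverable from the slides, so the sketch below is an assumption throughout: the tuple layout, the `bin_width` parameter, and the RMS form are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def binned_stdev(spectra, bin_width=1.0):
    """Estimate a standard deviation for each intensity bin.

    spectra: iterable of (log2_intensity, log2_value, protein_median).
    Returns {bin_index: RMS deviation of value from protein median}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # bin -> [sum of squares, count]
    for intensity, value, protein_median in spectra:
        b = int(intensity // bin_width)
        sums[b][0] += (value - protein_median) ** 2
        sums[b][1] += 1
    return {b: sqrt(ss / n) for b, (ss, n) in sums.items()}


demo = [
    (10.2, 0.5, 0.0), (10.7, -0.5, 0.0),   # low-intensity bin, noisy
    (15.1, 0.1, 0.0), (15.9, -0.1, 0.0),   # high-intensity bin, tight
]
stdev_by_bin = binned_stdev(demo)
```

In this toy input the low-intensity bin comes out five times noisier than the high-intensity bin, which is the pattern the variance-versus-intensity slides describe.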

Algorithm Schematic • Remove Inter-Experiment Effects • Remove Intra-Sample Effects • Data Dependent Intensity Weighting • 3x • Remove Peptide Effects • Data Dependent Standard Dev Estimation

Refinement 3: Kernel Density Estimation

Protein Variance Estimation

Kernels

Kernel Density Estimation

Kernel Density Estimation • 0.3 shift on Log2 Scale • Deviation that shifts distribution

Improved Kernels • We have a better estimate for P_i: the intensity-based weight! • We have a better estimate for Stdev_i: the intensity-based standard deviation!
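The improved kernels can be sketched as a Gaussian mixture in which every spectrum contributes a kernel with its own weight and its own width. The choice of Gaussian kernels, the function name, and the argument layout are assumptions made for illustration.

```python
from math import exp, pi, sqrt


def weighted_kde(x, centers, weights, stdevs):
    """Density at x from Gaussian kernels with per-point weight and width.

    centers: one log-ratio per spectrum
    weights: intensity-based weight for each spectrum
    stdevs:  intensity-based standard deviation for each spectrum
    """
    total_weight = sum(weights)
    density = 0.0
    for c, w, s in zip(centers, weights, stdevs):
        z = (x - c) / s
        density += w * exp(-0.5 * z * z) / (s * sqrt(2 * pi))
    return density / total_weight


# A noisy, low-weight spectrum barely moves the estimate away from the
# tight, high-weight one.
centers = [-1.0, 0.5]
weights = [0.2, 0.8]
stdevs = [0.3, 0.6]
grid = [i * 0.01 - 10 for i in range(2001)]
area = sum(weighted_kde(x, centers, weights, stdevs) for x in grid) * 0.01
```

Dividing by the total weight keeps the result a proper density, so curves from proteins with different numbers of spectra stay comparable.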

Improved Kernels

Improved Kernel Density Estimation

Improved Kernel Density Estimation • Significant Deviation Worth Investigating • Unimportant Deviation

Improved Kernel Density Estimation • 1.0 shift on Log2 Scale = 2 Fold Change
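The conversion between a log2 shift and a fold change is a single exponentiation, shown here for the 0.3 and 1.0 shifts mentioned in the kernel density slides. The helper name is hypothetical.

```python
def fold_change(log2_shift):
    """Convert a shift on the log2 scale into a linear fold change."""
    return 2.0 ** log2_shift


for shift in (0.3, 1.0):
    print(f"log2 shift of {shift} -> {fold_change(shift):.2f}-fold")
```

A 0.3 shift therefore corresponds to roughly a 1.23-fold change, well short of the 2-fold change implied by a full 1.0 shift.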

Refinement 4: Permutation Testing

Why Use Permutation Testing? • Why go through all this work just to use a t-test or ANOVA? • Rank-based Mann-Whitney and Kruskal-Wallis tests “work”, but lack power

Basic Permutation Test T=4.84

Basic Permutation Test T=4.84 T=1.49

Basic Permutation Test x1000 T=4.84 T=1.49 T=1.34 T=1.14

Basic Permutation Test • 950 below • 50 above
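The basic procedure can be sketched as follows: compute the statistic on the real grouping, shuffle the group labels many times, and count how often a shuffled statistic is at least as extreme. With 950 shuffled statistics below the observed one and 50 at or above it, the empirical p-value would be 50/1000 = 0.05. The slides do not say which T statistic is used or whether the test is one- or two-sided, so the Welch statistic and the absolute value below are assumptions, as are the function names.

```python
import random
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two groups (each needs at least 2 values)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def permutation_p_value(a, b, n_perm=1000, seed=0):
    """Fraction of label shuffles whose |T| is at least the observed |T|."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    above = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        t = abs(welch_t(pooled[:len(a)], pooled[len(a):]))
        if t >= observed:
            above += 1
    return above / n_perm


# Hypothetical log2 values for two clearly separated groups.
p_separated = permutation_p_value([5.1, 5.3, 4.9, 5.2, 5.0],
                                  [1.0, 1.2, 0.9, 1.1, 1.3])
# Two interleaved groups with no real difference.
p_overlap = permutation_p_value([1.0, 2.0, 3.0, 4.0],
                                [1.5, 2.5, 3.5, 4.5])
```

The resolution of the p-value is limited by the number of shuffles: with 1000 permutations the smallest non-zero value that can be reported is 0.001.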

Improvements… • N is frequently very small • Instead of randomizing N points, randomly select N points from Kernel Densities • Expensive! What if you want more precision?
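Drawing points from kernel densities instead of reshuffling the N observed values can be sketched as sampling from a weighted Gaussian mixture: pick a kernel in proportion to its weight, then draw from that kernel. Gaussian kernels, the function name, and the argument layout are assumptions; how the drawn points are assigned to groups under the null hypothesis is not specified on the slide and is left out.

```python
import random


def sample_from_kernels(centers, weights, stdevs, n, seed=0):
    """Draw n values from a weighted mixture of Gaussian kernels."""
    rng = random.Random(seed)
    # Choose which kernel each draw comes from, in proportion to its weight.
    picks = rng.choices(range(len(centers)), weights=weights, k=n)
    # Draw from the chosen kernel using its own centre and width.
    return [rng.gauss(centers[i], stdevs[i]) for i in picks]


draws = sample_from_kernels(
    centers=[0.0, 1.0], weights=[1.0, 1.0], stdevs=[0.1, 0.1], n=2000
)
draw_mean = sum(draws) / len(draws)
```

Because each draw is a fresh value rather than a rearrangement of a handful of points, the number of distinct permuted statistics is no longer capped by a very small N.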

Extrapolating Precision • 1000 below • 0 above • Actual T-Statistic of 6.6? • Last Usable Permutation

Extrapolating Precision • Actual T-Statistic of 6.6? • Knijnenburg, et al 2011 (doi:10.1186/1471-2105-12-411)
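When the observed statistic lies beyond every permuted value, counting gives a p-value of zero, so the tail has to be modeled instead. One common approach to this kind of extrapolation is to fit a generalized Pareto distribution to the largest permuted statistics. The sketch below uses simple method-of-moments estimates and a fixed tail fraction; both are simplifications chosen for illustration, and none of the names or settings are taken from the slides.

```python
from math import exp, log
from statistics import mean, variance


def gpd_tail_p(null_stats, observed, tail_fraction=0.25):
    """Approximate P(T >= observed) using a generalized Pareto tail fit."""
    ordered = sorted(null_stats)
    n = len(ordered)
    cut = int(n * (1 - tail_fraction))
    threshold = ordered[cut]
    if observed <= threshold:
        # Inside the bulk of the permutations: plain counting is enough.
        return sum(v >= observed for v in ordered) / n
    # Exceedances over the threshold are modeled as generalized Pareto.
    excess = [v - threshold for v in ordered[cut + 1:]]
    m, v = mean(excess), variance(excess)
    shape = 0.5 * (1 - m * m / v)        # method-of-moments estimates
    scale = 0.5 * m * (m * m / v + 1)
    z = (observed - threshold) / scale
    if abs(shape) < 1e-9:
        tail = exp(-z)                   # exponential limit of the GPD
    elif 1 + shape * z <= 0:
        tail = 0.0                       # beyond the fitted upper endpoint
    else:
        tail = (1 + shape * z) ** (-1 / shape)
    return len(excess) / n * tail


# Deterministic stand-in for 1000 permuted statistics: quantiles of an
# exponential distribution, whose largest value is about 7.6.
null_stats = [-log(1 - (k + 0.5) / 1000) for k in range(1000)]
p_beyond = gpd_tail_p(null_stats, observed=8.0)
```

For this stand-in data the true tail probability at 8.0 is exp(-8), about 3.4e-4, so the extrapolated value can be checked against a known answer even though no permuted statistic reaches that far.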
