Non-specific filtering and control of false positives: an update

Richard Bourgon, 16 March 2009, bourgon@ebi.ac.uk

Experiment-wide type I error rates
Notation: of the m hypotheses tested, m0 are true nulls; R is the number of rejections (discoveries) and V is the number of false positives among them.
• Family-wise error rate (FWER): P(V > 0), i.e., the probability of one or more false positives. For large m0, this is very difficult to keep small.
• False discovery rate (FDR): let Q = V/R, or 0 if R is 0. The FDR is E(Q), i.e., the expected fraction of false positives among all discoveries.
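The two error rates can be compared in a small Monte Carlo sketch. Everything numeric here (m, m0, the effect size, the number of simulated experiments, the use of independent one-sided z-tests) is an illustrative assumption, not a setting from the talk.

```r
# Monte Carlo sketch: FWER and FDR when every test uses the same cutoff alpha.
# All sizes below are illustrative assumptions.
set.seed(1)
m      <- 1000   # total number of hypotheses
m0     <- 800    # number of true nulls
alpha  <- 0.05   # per-test cutoff
effect <- 3      # mean shift of the z-statistic for the false nulls
n_sim  <- 500    # number of simulated experiments

one_experiment <- function() {
  z <- c(rnorm(m0), rnorm(m - m0, mean = effect))  # the first m0 tests are true nulls
  p <- pnorm(z, lower.tail = FALSE)                # one-sided p-values
  reject <- p <= alpha
  V <- sum(reject[seq_len(m0)])                    # false positives
  R <- sum(reject)                                 # all discoveries
  c(V = V, R = R, Q = if (R > 0) V / R else 0)
}

sims <- t(replicate(n_sim, one_experiment()))      # one row per experiment
fwer <- mean(sims[, "V"] > 0)                      # Monte Carlo estimate of P(V > 0)
fdr  <- mean(sims[, "Q"])                          # Monte Carlo estimate of E(Q)
```

With m0 this large, at least one false positive occurs in essentially every simulated experiment, while the average fraction of false positives among discoveries stays well below 1.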

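A nice property of CDFs for continuous RVs
> X = rnorm(100000)          # draws from the standard normal
> F = pnorm                  # its CDF (note: this masks R's shorthand F for FALSE)
> hist(X, breaks = 50)       # bell-shaped
> hist(F(X), breaks = 50)    # flat: F(X) is uniform on [0, 1]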

A “nice” property?
• To compute a p-value for testing a null hypothesis H0, we typically…
  • Define a test statistic T, and compute its value t for the observed data.
  • Assume we know the distribution of T when H0 is true: F0.
  • Compute p = 1 – F0(t), i.e., define p = P(T > t | H0 is true).
  • Compare p to some α.
• Now define the random variable P = 1 – F0(T). If H0 is true, then…
  • F0(T) is uniformly distributed on [0,1].
  • By symmetry, P is uniformly distributed on [0,1] as well.
• Suppose 20% of genes are differentially expressed, so that …

Observed p-values: a mixture
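A minimal sketch of such a mixture, assuming 20% of genes are differentially expressed as on the previous slide. The number of genes, the one-sided z-test, and the effect size of 2.5 are assumptions made only for illustration.

```r
# Sketch: observed p-values as a mixture of uniform (true nulls) and
# small-p-enriched (differentially expressed) components.
set.seed(2)
m     <- 10000
n_de  <- 2000                                     # 20% differentially expressed
is_de <- rep(c(TRUE, FALSE), times = c(n_de, m - n_de))

z <- rnorm(m, mean = ifelse(is_de, 2.5, 0))       # assumed effect size for DE genes
p <- pnorm(z, lower.tail = FALSE)                 # one-sided p-values

# Histogram counts in 50 equal-width bins; plot(h) draws the histogram.
h <- hist(p, breaks = seq(0, 1, length.out = 51), plot = FALSE)

# Height of the flat component contributed by the true nulls alone.
null_level <- sum(!is_de) / length(h$counts)
```

The bins near 1 sit close to `null_level`, while the excess above that level near 0 comes from the differentially expressed genes.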

Non-specific filtering
• For a given gene, write the data as ((c1,Y1),…,(cp,Yp)).
  • First group (c = 1): i = 1, …, p1.
  • Second group (c = 2): i = p1 + 1, …, p1 + p2.
• Conditions under which we expect little variation in Y:
  • Genes which are absent in both samples. (Probes will still report noise and cross-hybridization, typically at the same level in both groups.)
  • Probe sets which do not respond to target.
  • Genes which are not differentially expressed.
• A “non-specific” filter:
  • Ignores c1, …, cp, i.e., it is a function f(Y) of the expression values alone.
  • Helps identify any of these three classes, based on our a priori understanding of array behavior.
• Apply standard testing to genes passing the filter, using some g(c,Y).

Increased detection rate
• Stage 1: compute a non-specific filter statistic for every gene and remove the fraction θ of genes with the smallest values.
• Stage 2: apply the standard two-sample t-test to the genes passing stage 1.
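The two stages can be sketched on simulated data as follows. The data dimensions, the effect size, θ = 0.5, the choice of overall variance as the filter statistic, and the BH cutoff of 0.1 are all assumptions for illustration; a real analysis would start from a normalized expression matrix.

```r
# Sketch of the two-stage procedure on simulated data (all settings assumed).
set.seed(3)
m  <- 2000; p1 <- 6; p2 <- 6
cls <- rep(c(1, 2), times = c(p1, p2))            # class labels c
Y   <- matrix(rnorm(m * (p1 + p2)), nrow = m)     # genes in rows, samples in columns
de  <- seq_len(200)                               # first 200 genes are DE
Y[de, cls == 2] <- Y[de, cls == 2] + 2            # shift in the second group
theta <- 0.5                                      # fraction of genes to filter out

# Stage 1: overall variance, computed without looking at the class labels.
f    <- apply(Y, 1, var)
keep <- f > quantile(f, theta)

# Stage 2: pooled-variance two-sample t-test, one gene at a time.
t_pval <- function(y) t.test(y ~ cls, var.equal = TRUE)$p.value
p_all  <- apply(Y, 1, t_pval)
p_filt <- p_all[keep]                             # p-values of genes passing stage 1

# Discoveries at a BH-adjusted cutoff of 0.1, without and with filtering.
n_unfiltered <- sum(p.adjust(p_all,  method = "BH") <= 0.1)
n_filtered   <- sum(p.adjust(p_filt, method = "BH") <= 0.1)
```

Comparing `n_filtered` with `n_unfiltered` gives the detection rates of the two approaches; whether the larger count also means more power depends on the type I error control discussed next.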

Increased power? • An increased detection rate implies increased power only if we are still controlling type I errors at the nominal level.

Result: independence of stage 1 and stage 2 test statistics
• For genes for which the null hypothesis is true, f(Y) and g(c, Y) are statistically independent in both of the following cases:
  • For normally distributed data:
    • Stage 1: the overall variance S2, computed across all samples without regard to class.
    • Stage 2: the standard two-sample t-statistic.
  • Non-parametrically:
    • Stage 1: any function of the data which doesn’t depend on the order of the arguments. S2 above, or the IQR, are both candidates.
    • Stage 2: the Wilcoxon rank sum test statistic.
• Both can be extended to the multi-class context: ANOVA and Kruskal-Wallis.
• Bonferroni and Holm go through easily, in expectation.
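The independence claim for the normal case can be checked empirically. The sketch below simulates only true nulls and compares the stage 2 type I error rate after a variance filter with the rate after a filter that uses the class labels. The second filter (absolute difference in group means) is a deliberately bad, hypothetical contrast and is not part of the talk; sample sizes and the number of genes are also assumptions.

```r
# Sketch: under the null, filtering on overall variance leaves the t-test
# p-values uniform; filtering on a label-dependent statistic does not.
set.seed(4)
m <- 20000; p1 <- 5; p2 <- 5
grp1 <- 1:p1; grp2 <- (p1 + 1):(p1 + p2)
Y <- matrix(rnorm(m * (p1 + p2)), nrow = m)       # every gene is a true null

# Pooled-variance two-sample t-statistic, vectorised over rows.
m1  <- rowMeans(Y[, grp1]); m2 <- rowMeans(Y[, grp2])
ss1 <- rowSums((Y[, grp1] - m1)^2)
ss2 <- rowSums((Y[, grp2] - m2)^2)
sp2 <- (ss1 + ss2) / (p1 + p2 - 2)
tstat <- (m1 - m2) / sqrt(sp2 * (1 / p1 + 1 / p2))
pval  <- 2 * pt(abs(tstat), df = p1 + p2 - 2, lower.tail = FALSE)

# Non-specific filter: overall variance, ignores the labels.
s2       <- apply(Y, 1, var)
keep_var <- s2 > median(s2)

# Hypothetical label-dependent filter, for contrast only.
abs_diff  <- abs(m1 - m2)
keep_diff <- abs_diff > median(abs_diff)

rate_all  <- mean(pval <= 0.05)                   # close to 0.05
rate_var  <- mean(pval[keep_var] <= 0.05)         # still close to 0.05
rate_diff <- mean(pval[keep_diff] <= 0.05)        # inflated
```

The same check can be repeated with `wilcox.test` p-values and an order-invariant filter such as `IQR` for the non-parametric case.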

Independence: Benjamini & Hochberg and Storey FDR adjustments
• What is the FDR associated with use of cutoff α? Naive estimator: the ratio E(V(α)) / E(R(α)).
• V is not observable, but E(V) is m0α, bounded by mα.
• E(R) cannot be computed, but R can be used as an estimator.
• Evaluating at each p(i), using m or an estimate of m0, gives the BH95 or Storey adjustments, respectively.
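The recipe above can be written out directly. In this sketch, `fdr_adjust` and `storey_m0` are hypothetical helper names, and the m0 estimate uses a single fixed λ = 0.5, which is a simplification assumed here rather than the tuning used in Storey's published method. The simulated p-values are also only illustrative.

```r
# Sketch: plug-in FDR adjustment, evaluated at each ordered p-value p_(i).
# With m_est = m this reproduces BH95; with an estimate of m0 it gives a
# Storey-type adjustment.
fdr_adjust <- function(p, m_est = length(p)) {
  m   <- length(p)
  o   <- order(p)
  raw <- m_est * p[o] / seq_len(m)      # estimated E(V) divided by R = i at cutoff p_(i)
  adj <- rev(cummin(rev(raw)))          # enforce monotonicity from the largest p down
  out <- numeric(m)
  out[o] <- pmin(1, adj)
  out
}

# Simplified estimate of m0: p-values above lambda are assumed to be mostly nulls.
storey_m0 <- function(p, lambda = 0.5) {
  min(length(p), sum(p > lambda) / (1 - lambda))
}

set.seed(5)
p  <- c(runif(800), rbeta(200, 0.2, 5))  # 800 nulls plus 200 small-p alternatives
bh <- fdr_adjust(p)                      # uses m
q  <- fdr_adjust(p, m_est = storey_m0(p))  # uses the estimate of m0
```

The BH version agrees with `p.adjust(p, method = "BH")`, and because the m0 estimate is never larger than m, the Storey-type values are never larger than the BH values.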

Independence: Benjamini & Hochberg and Storey FDR adjustments (continued)
The foregoing motivation for the BH95 and Storey procedures uses E(V(α)) = m0α. Marginal independence of f(Y) and g(c,Y) for true nulls means that this still applies at stage 2, in expectation. Define M0 to be the random number of true nulls passing stage 1. Then …
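One way to write out the expectation argument is sketched below. The notation is introduced here for illustration: \(\mathcal{H}_0\) is the index set of true nulls, \(A_i\) is the event that gene \(i\) passes stage 1, and \(P_i\) is its stage 2 p-value. The sketch assumes that \(A_i\) and \(P_i\) are independent for each true null, which is slightly stronger than independence of \(f(Y_i)\) and \(P_i\) because passing the filter also depends on the other genes.

```latex
\begin{align*}
E\bigl[V(\alpha)\bigr]
  &= E\Bigl[\sum_{i \in \mathcal{H}_0} \mathbf{1}\{A_i\}\,\mathbf{1}\{P_i \le \alpha\}\Bigr] \\
  &= \sum_{i \in \mathcal{H}_0} P(A_i)\,P(P_i \le \alpha) \\
  &= \alpha \sum_{i \in \mathcal{H}_0} P(A_i)
   \;=\; \alpha\,E(M_0).
\end{align*}
```

Under these assumptions, E(M0)α takes the place of m0α in the motivation for BH95 and Storey, which is the sense in which the argument holds in expectation.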

For true nulls, we show independence between P and f(Y) over repeated data realizations. The P within a single realization may be correlated.
• FDR control is on average only: no guarantees for a single realization.
[Figure: stage I and stage II statistics for each gene, across repeated data realizations]

Correlation and a single data instance
• Given pervasive correlation (here, all pairs at +ρ), the empirical distribution of p-values for a single data instance can vary widely.
[Figure: most extreme distributions in 1000 trials]
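This instance-to-instance variation can be reproduced with a shared-factor sketch. The values of m and ρ, the use of two-sided z-tests, and the summary by the fraction of p-values below 0.05 are assumptions made here; the original figure may have been built differently.

```r
# Sketch: all genes are true nulls, all pairs of z-statistics correlated at +rho.
# Each trial is one "data instance"; we record the fraction of p-values <= 0.05.
set.seed(6)
m        <- 2000
rho      <- 0.5
n_trials <- 1000

frac_small <- replicate(n_trials, {
  shared <- rnorm(1)                                   # factor common to all genes
  z <- sqrt(rho) * shared + sqrt(1 - rho) * rnorm(m)   # marginally N(0, 1)
  p <- 2 * pnorm(abs(z), lower.tail = FALSE)           # two-sided p-values
  mean(p <= 0.05)
})

avg_frac   <- mean(frac_small)    # close to 0.05 on average
range_frac <- range(frac_small)   # but individual instances vary widely
```

Averaged over trials the fraction is close to the nominal 0.05, yet single instances range from almost no small p-values to a large excess of them.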

FWER: Westfall and Young
• Westfall and Young (1993) controls FWER with more power, but depends on the joint distribution of all p-values.
• WY93 is valid under subset pivotality. If this holds for the one-stage procedure, it holds for the two-stage non-specific filtering approach as well.
• The distribution of min Pj under the complete null hypothesis is typically estimated by permutation. If filtering changes the correlation structure, the permutation uses the new structure.
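A permutation sketch of this idea follows. For brevity it uses the single-step maxT variant (the maximum |t| across genes) rather than the min Pj form named on the slide. The simulated data, the variance filter with θ = 0.5, the number of permutations, and the helper name `row_t` are assumptions. Because the filter ignores the class labels, it is computed once and the labels are permuted only on the filtered genes.

```r
# Sketch: single-step maxT permutation adjustment applied after filtering.
set.seed(7)
m  <- 200; p1 <- 8; p2 <- 8
cls   <- rep(c(1, 2), times = c(p1, p2))
is_de <- seq_len(m) <= 10
Y <- matrix(rnorm(m * (p1 + p2)), nrow = m)
Y[is_de, cls == 2] <- Y[is_de, cls == 2] + 3      # assumed effect size

# Pooled-variance two-sample t-statistics for every row (hypothetical helper).
row_t <- function(Y, cls) {
  g1 <- Y[, cls == 1, drop = FALSE]; g2 <- Y[, cls == 2, drop = FALSE]
  n1 <- ncol(g1); n2 <- ncol(g2)
  m1 <- rowMeans(g1); m2 <- rowMeans(g2)
  sp2 <- (rowSums((g1 - m1)^2) + rowSums((g2 - m2)^2)) / (n1 + n2 - 2)
  (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
}

# Stage 1: non-specific variance filter, unaffected by permuting the labels.
s2   <- apply(Y, 1, var)
keep <- s2 > quantile(s2, 0.5)
Yf   <- Y[keep, ]

# Stage 2: observed statistics and the permutation distribution of the maximum,
# computed on the filtered genes so that their correlation structure is used.
t_obs <- abs(row_t(Yf, cls))
B     <- 1000
max_t <- replicate(B, max(abs(row_t(Yf, sample(cls)))))

# Adjusted p-value: share of permutations whose maximum reaches |t_i|
# (with the usual +1 correction so that no p-value is exactly zero).
p_adj <- sapply(t_obs, function(t) (1 + sum(max_t >= t)) / (B + 1))
```

Permuting whole columns keeps the gene-gene correlation within each sample intact, which is how the permutation picks up whatever structure remains after filtering.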

Correlation and FDR control
• Storey et al. q-values. Correlation: all pairs at +ρ.
• Some anti-conservative bias in FDR estimation.
• oFDR substantially greater than nominal for a small fraction of data instances.
• BH more conservative, since its estimate of the proportion of true nulls is fixed at 1.

Conclusions
• In actual examples, non-specific filtering leads to (biologically) significant increases in the number of genes identified.
• Commonly used stage 1/stage 2 test statistic pairs are statistically independent for genes which are not differentially expressed.
• Given this independence, Bonferroni and Holm FWER control is valid in the two-stage procedure.
• Correlation structure may change under filtering.
  • Permutation-based Westfall and Young correction accounts for this. FDR control, however, may suffer.
  • The effect of filtering on correlation can be checked, and its impact assessed.
