Revisiting an Old Topic: Probability of Replication

Revisiting an Old Topic: Probability of Replication • D. Lizotte, E. Laber & S. Murphy • Johns Hopkins Biostatistics • September 23, 2009

Outline • Scientific Background • Our Estimand: Probability of Selection • Estimators • STAR*D • Where to go from here?

Scientific Background • First experiment results in […] or […]; what is the chance that we will replicate this result in a subsequent experiment? • Prob. of Concurrence or Prob. of Replication • Killeen (2005), followed by great controversy in psychology (Cumming (2005, 2006, 2008); MacDonald (2005); Doros & Geier (2005); Iverson (2008); Iverson, Wagenmakers & Lee (2008); Ashby & O’Brien (2008); Iverson, Lee & Wagenmakers (2009); ...)

Scientific Background • Similar problem but discredited: post-hoc power / observed power. • Assuming the observed standardized effect size is the truth, calculate the probability of rejecting the null hypothesis. • Hoenig & Heisey (2001)
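To make the objection concrete, here is a minimal sketch of observed power for a two-sided z-test. The function name `observed_power` and the z-test setting are assumptions for illustration, not taken from the slides. Because the output depends on the data only through the p-value, it carries no information beyond the p-value itself.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def observed_power(p_two_sided: float, alpha: float = 0.05) -> float:
    """Power of a two-sided z-test if the observed z were the true effect.

    Hypothetical helper: assumes a z-test with known standard error.
    """
    z_obs = STD_NORMAL.inv_cdf(1 - p_two_sided / 2)  # |z| implied by the p-value
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)       # rejection threshold
    # A new z-statistic is N(z_obs, 1); add up both rejection regions.
    upper = 1 - STD_NORMAL.cdf(z_crit - z_obs)
    lower = STD_NORMAL.cdf(-z_crit - z_obs)
    return upper + lower


for p in (0.50, 0.20, 0.05, 0.01):
    print(f"p = {p:.2f} -> observed power = {observed_power(p):.3f}")
```

A result that sits exactly at the significance threshold always has observed power of about one half, whatever the study.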

Scientific Background • First experiment results in […] or […]; what is the chance that we will replicate this result in a subsequent experiment? • Why is this question so attractive? • Scientists (including statisticians!) often want to answer this question with 1 - p-value.

Scientific Background • First experiment results in […] or […]; what is the chance that we will replicate this result in a subsequent experiment? • 1 - p-value does not address this question. Goodman (1992), Cumming (2008) • 1 - p-value is not an estimator.

Scientific Background • Much confusion about the estimand: […], what is the chance that we will replicate this result in a subsequent experiment? • Do we want to “estimate” […] or […] or […] or […]? • Good frequentist properties are desired.

Our Estimand • Probabilities of Selection • The probability of selection is a composite measure of signal, noise, and sample size.

Our Estimand • Advantages (the hope) over the concept of the p-value: • Close to what many scientists want. • The intuitive interpretation is correct. • Does not rely on the correctness of a data-generating model for meaning. • Less ambitious than 3) • Disadvantages: • We changed the question. • Some may think that there is no need for a confidence interval; this is wrong. • The problem is non-regular.

Estimators • Why is this a hard problem? • The desire for good frequentist properties. • The fact that effect sizes tend to be small relative to the noise. • This is a non-regular problem: bias is of the same order as variance. • Back-of-the-envelope calculations:

Estimators • Use the plug-in estimator. • The plug-in estimator is 1 - p-value (Goodman, 1992)! • Non-regular: near a uniform distribution if […]; if n is large, close to 0 or 1 otherwise. • We can expect […] to be small.
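The behaviour described on this slide can be checked with a small simulation. This is a sketch under assumptions not stated in the slides: a one-sample setting with normal data, and a plug-in estimator of the form Φ(√n · mean / sd), i.e. 1 minus a one-sided z-test p-value. All function names are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def plug_in_estimate(sample):
    """Phi(sqrt(n) * mean / sd): 1 minus the one-sided z-test p-value."""
    n = len(sample)
    z = mean(sample) * n ** 0.5 / stdev(sample)
    return NormalDist().cdf(z)


def simulate(true_mean, n=25, reps=2000, seed=0):
    """Plug-in estimates from `reps` independent experiments of size n."""
    rng = random.Random(seed)
    return [
        plug_in_estimate([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(reps)
    ]


def decile_counts(values):
    """Count how many values fall in [0, 0.1), [0.1, 0.2), ..., [0.9, 1]."""
    counts = [0] * 10
    for v in values:
        counts[min(int(v * 10), 9)] += 1
    return counts


print("no effect:   ", decile_counts(simulate(0.0)))
print("large effect:", decile_counts(simulate(1.0)))
```

With no true effect the estimates spread roughly evenly over (0, 1) no matter how large n is; with a clear effect they pile up near 1.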

Estimators • Try a Bayesian approach. • Random sample from a […], flat prior on […], […] known. • Use […] as an estimator of […]. • Bayesian methods do not eliminate non-regularity.

Estimators • Focus on MSE in formulating estimators for […]. • Assume […] is approximately normal with mean […] and variance […]. • Flat prior (e.g. Killeen’s prep): […] • Normal prior: […] • Prior is a mixture of N(0,1) with probability w and a point mass on […] with probability 1-w.
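For the flat-prior case, a minimal sketch of Killeen's prep in its usual z-statistic form, Φ(z/√2). The setup is an assumption: a normal estimate with known standard error, and a replicate of the same size. Under a flat prior the posterior of the effect is centred at the estimate with variance se², so the replicate estimate has predictive variance 2·se². The helper names `p_rep` and `plug_in` are hypothetical.

```python
from statistics import NormalDist


def p_rep(z_obs: float) -> float:
    """Flat-prior predictive probability that a same-size replicate agrees in sign."""
    return NormalDist().cdf(z_obs / 2 ** 0.5)


def plug_in(z_obs: float) -> float:
    """Plug-in counterpart: 1 minus the one-sided p-value."""
    return NormalDist().cdf(z_obs)


for z in (0.0, 1.0, 1.96, 3.0):
    print(f"z = {z:.2f}: prep = {p_rep(z):.3f}, plug-in = {plug_in(z):.3f}")
```

The extra factor √2 pulls prep toward 1/2 relative to the plug-in value, which is one way to see it as a shrunken version of 1 - p-value.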

Estimators • Focus on MSE in formulating estimators for […]. • Single bootstrap (Efron & Tibshirani, 1989): […]. This is 1 - p-value. No assumption of approximate normality. If […] is approximately normal then this is approximately the plug-in estimator: […] • Double bootstrap: […]. This is a bagged plug-in estimator: it bags the 1 - bootstrap p-value. No assumption of approximate normality.
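A minimal sketch of the single bootstrap estimator for a two-group comparison. It assumes that "selection" means the first group has the larger sample mean. The slides' exact formula is not available here, so the function names and the choice of statistic are illustrative only.

```python
import random
from statistics import mean


def resample(data, rng):
    """Draw len(data) observations from data with replacement."""
    return [rng.choice(data) for _ in data]


def single_bootstrap(group_a, group_b, n_boot=500, rng=None):
    """Fraction of bootstrap resamples in which group A has the larger mean."""
    if rng is None:
        rng = random.Random(0)
    votes = sum(
        mean(resample(group_a, rng)) > mean(resample(group_b, rng))
        for _ in range(n_boot)
    )
    return votes / n_boot


data_rng = random.Random(42)
group_a = [data_rng.gauss(0.3, 1.0) for _ in range(25)]
group_b = [data_rng.gauss(0.0, 1.0) for _ in range(25)]
print(single_bootstrap(group_a, group_b))
```

No normality is assumed: the resampling distribution of the difference stands in for the sampling distribution.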

Why a double bootstrap? • Double bootstrap estimator for […]. • Bagging is used to trade variance for bias when estimators are unstable (Bühlmann & Yu, 2002). • The bootstrap estimator of […] is unstable; if […] it does not converge as the sample size increases. • Under local alternatives such as […] the bootstrap estimator is inconsistent as well.

Double Bootstrap • Double bootstrap estimator for […]. • If […] has an approximate normal distribution then the double bootstrap estimator is […]. • That is, the double bootstrap reduces to prep in this case.

MSE Plots • Two groups, each of size 25 • Two distributions (normal, bimodal) • Two definitions of […] • Compare prep, pnorm, pmix, single bootstrap, double bootstrap

Estimators • Instead of a point estimator, consider a confidence interval for […]. • Assume […] has an approximate normal distribution; then […]. • In this case a confidence interval for […] can be found from a confidence interval for the standardized effect size: […]
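One way such an interval could be computed, as a sketch only. It assumes the selection probability equals Φ(θ/se), where θ is the effect and se the known standard error of its normal estimate. The slide's own formula is not available here and may differ. Since Φ is monotone, an interval for θ/se maps directly to an interval for Φ(θ/se). The function name is hypothetical.

```python
from statistics import NormalDist


def selection_probability_interval(estimate, std_error, level=0.95):
    """Map a CI for the standardized effect through Phi (assumed estimand)."""
    std_normal = NormalDist()
    z = estimate / std_error                           # estimated standardized effect
    half_width = std_normal.inv_cdf(0.5 + level / 2)   # about 1.96 for level = 0.95
    lower, upper = z - half_width, z + half_width      # CI for theta / se
    return std_normal.cdf(lower), std_normal.cdf(upper)


for z in (0.0, 1.0, 2.0, 4.0):
    low, high = selection_probability_interval(z, 1.0)
    print(f"z = {z:.1f}: [{low:.3f}, {high:.3f}]")
```

When the estimated effect is zero the 95% interval is about [0.025, 0.975], so a single small study says very little about the selection probability.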

STAR*D • Sequenced Treatment Alternatives to Relieve Depression • Large multi-site study focused on individuals whose depression did not remit with citalopram. • In this trial each individual can proceed through up to 4 stages of treatment. • The individual moves to the next stage if the individual is not responding to the present treatment. • Each stage involves a randomization.

STAR*D • These are data from 683 individuals who did not respond to citalopram and preferred a switch in treatment. • These individuals were randomized among venlafaxine, bupropion, and sertraline. • Outcome: time until remission. • We model the area under the survival curve from entry into this stage of treatment until 30 months (i.e., min(T, 30)).
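The link between min(T, 30) and the area under the survival curve can be checked numerically. This sketch assumes fully observed remission times with no censoring or dropout, which is a simplification relative to the study. The function names and the simulated exponential times are illustrative only.

```python
import random


def restricted_mean(times, horizon=30.0):
    """Average of min(T, horizon) over the sample."""
    return sum(min(t, horizon) for t in times) / len(times)


def area_under_survival_curve(times, horizon=30.0):
    """Integrate the empirical survival step function from 0 to horizon."""
    n = len(times)
    area, previous, at_risk = 0.0, 0.0, n
    for t in sorted(times):
        if t >= horizon:
            break
        area += (at_risk / n) * (t - previous)  # survival is flat between events
        previous, at_risk = t, at_risk - 1
    return area + (at_risk / n) * (horizon - previous)


rng = random.Random(1)
times = [rng.expovariate(1 / 20) for _ in range(200)]  # simulated, not STAR*D data
print(restricted_mean(times), area_under_survival_curve(times))
```

The two numbers agree up to rounding, which is why the truncated time can serve as a regression outcome for the area under the curve.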

STAR*D Regression formula at level 2:

STAR*D • For each s, […] • Double bootstrap: the inner-most bootstrap counts the proportion of “votes” in which […]; the outer-most bootstrap averages this proportion across the bootstrap samples.
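The two nested loops described above can be sketched for a plain two-group comparison of means. This is an assumption-laden simplification: the STAR*D analysis uses a regression model and a condition per value of s that are not reproduced here, and all names below are hypothetical.

```python
import random
from statistics import mean


def resample(data, rng):
    """Draw len(data) observations from data with replacement."""
    return [rng.choice(data) for _ in data]


def vote_share(group_a, group_b, n_boot, rng):
    """Inner bootstrap: share of resamples in which group A has the larger mean."""
    votes = sum(
        mean(resample(group_a, rng)) > mean(resample(group_b, rng))
        for _ in range(n_boot)
    )
    return votes / n_boot


def double_bootstrap(group_a, group_b, n_outer=100, n_inner=100, seed=0):
    """Outer bootstrap: average the inner vote share over resampled data sets."""
    rng = random.Random(seed)
    shares = [
        vote_share(resample(group_a, rng), resample(group_b, rng), n_inner, rng)
        for _ in range(n_outer)
    ]
    return mean(shares)


data_rng = random.Random(42)
group_a = [data_rng.gauss(0.3, 1.0) for _ in range(25)]
group_b = [data_rng.gauss(0.0, 1.0) for _ in range(25)]
print(double_bootstrap(group_a, group_b))
```

Averaging over the outer resamples is the bagging step: it smooths the inner vote share, which on its own can swing widely from one data set to the next.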

Discussion • Definition of the probability of selection when there are more than two treatments. • Confidence intervals for comparisons between more than two treatments. • Is there a minimax estimator of the selection probability? • Is there hope for the replication probability?

Truth in Advertising: STAR*D Missing Data + Study Drop-Out • 1200 subjects begin level 2 (i.e., stage 1) • 42% study dropout during level 2 • 62% study dropout by 30 weeks • Approximately 13% item missingness for important variables observed after the start of the study but prior to dropout

This seminar can be found at: http://www.stat.lsa.umich.edu/~samurphy/seminars/HopkinsBiostat09.23.09.ppt Email me with questions or if you would like a copy! samurphy@umich.edu

Our Estimand • The probability of selection is a composite measure of signal, noise, and sample size. • The p-value is a composite measure of estimated signal, estimated noise, and sample size.
