1. Percentile-Finding and the Sorcerer's Stone? Combining Up-and-Down and Bayesian Designs. Assaf Oron and Peter Hoff
Statistics Dept., University of Washington, Seattle
assaf@u.washington.edu
3. Percentile Finding: The Problem Binary Response Experiments (yes or no)
Positive response probability increases with increasing treatment (x)
Sensory experiments, toxicity studies, material stress failure studies, etc.
Thresholds assumed to have a (sub-)CDF F(x)
4. Percentile Finding: The Problem Goal: find the treatment that would give a fixed probability p of positive response
i.e., a percentile of F: Q_p = F^{-1}(p), a.k.a. the target
(In this talk we use p=0.3, encountered in Phase I clinical trials)
Constraints:
A fixed discrete set of treatments
Small to moderate sample size (n ~ 10 to n ~ 100)
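As a concrete illustration of the target (my sketch, not from the talk): for a logistic threshold CDF, Q_p = F^{-1}(p) has a closed form. The location mu and scale parameters below are illustrative assumptions.

```python
import math

def logistic_cdf(x, mu=0.0, scale=1.0):
    """Assumed threshold CDF F(x) for illustration."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / scale))

def target_percentile(p, mu=0.0, scale=1.0):
    """Invert F analytically: Q_p = mu + scale * log(p / (1 - p))."""
    return mu + scale * math.log(p / (1.0 - p))

# With p = 0.3 (the talk's target), F(Q_p) recovers p exactly:
q30 = target_percentile(0.3)
assert abs(logistic_cdf(q30) - 0.3) < 1e-12
```

Since p = 0.3 is below 0.5, the target lies below the logistic midpoint mu.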
5. Two Sequential Sampling Approaches
6. Method Basics Up-and-Down (Dixon and Mood, 1948)
Ubiquitous in Applied Research (psychophysics, engineering, life sciences, medicine, ...)
Generates a Markov chain whose stationary distribution is peaked around the target (Tsutakawa, 1967)
Bayesian (QUEST, Watson and Pelli, 1983; CRM, O'Quigley et al., 1990)
CRM is increasingly popular in Phase I clinical trials
Aims to zoom in perfectly on the level closest to the target
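The up-and-down idea above can be sketched as follows. This is a biased-coin U&D rule in the style of the designs the talk cites (one standard variant, not necessarily the exact design used); the logistic `true_f` is an assumption for simulation only.

```python
import math
import random

def bcd_next_level(level, response, p, rng, n_levels):
    """One transition of the biased-coin up-and-down Markov chain."""
    if response:                       # positive response: always step down
        return max(level - 1, 0)
    if rng.random() < p / (1 - p):     # no response: step up w.p. p/(1-p)
        return min(level + 1, n_levels - 1)
    return level                       # otherwise stay at the same level

def run_chain(true_f, levels, p=0.3, n=100, seed=1):
    """Simulate n trials; true_f is the (in practice unknown) threshold CDF."""
    rng = random.Random(seed)
    level, visits = 0, [0] * len(levels)
    for _ in range(n):
        visits[level] += 1
        response = rng.random() < true_f(levels[level])
        level = bcd_next_level(level, response, p, rng, len(levels))
    return visits

# Demo on a 10-level grid with an assumed logistic threshold CDF:
levels = list(range(10))
visits = run_chain(lambda x: 1.0 / (1.0 + math.exp(-(x - 5.0))), levels)
```

The chain's allocations pile up near the level whose response probability is closest to p, which is the stationary-distribution property the slide attributes to Tsutakawa (1967).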
7. Convergence Comparison
8. Convergence Comparison
9. U&D Convergence Limitations
10. Robustness Comparison
11. Robustness Comparison
12. Sorcerer's Stone: Bayesian Quick Gambling Now we reach the magic connection of this talk.
Bayesian designs tend to create the impression that they have some magical knowledge of where the target is. Very typically, not just in simulation but also in real experiments, the design locks onto a single level and gambles on it as the correct one.
The chart shows 7 distribution scenarios (never mind their names for now) and, for each of them, how often the Bayesian design allocated 12 or more of the first 18 trials to the same single level. Blue shows how often it got it right; red, how often it got it wrong. And what happens when it gets it wrong? A good part of the experiment is wasted gathering information in the wrong place, and the design then has to dig itself out of the hole.
Why does this happen? Bayesian designs essentially gamble that any unlucky sequence, if it occurs, will occur late in the experiment. If the gamble fails and an excursion is observed early, the experiment starts with very poor point estimates of F; these feed into the model and throw it off target, and it then takes quite a while for the estimates to correct themselves.
13. Bayesian Up-and-Down (BUD) We run a U&D chain, but compute the Bayesian model at each step:
If the Bayesian allocation is closer to the target with 100(1-β)% posterior credibility, we allow the Bayesian allocation to override U&D
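The override rule can be sketched as below. This is my reading of the slide, not the authors' code: `posterior_qp` stands for draws of Q_p from the current posterior, and how those draws are produced is outside this sketch.

```python
def bud_allocation(ud_level, bayes_level, posterior_qp, levels, beta=0.2):
    """Next treatment level under the BUD compromise (illustrative sketch)."""
    if bayes_level == ud_level:
        return ud_level
    # Posterior probability that the Bayesian candidate level is closer
    # to the target Q_p than the U&D candidate level.
    closer = sum(
        abs(levels[bayes_level] - q) < abs(levels[ud_level] - q)
        for q in posterior_qp
    )
    credibility = closer / len(posterior_qp)
    # Override U&D only when the posterior is confident enough.
    return bayes_level if credibility >= 1 - beta else ud_level
```

With a concentrated posterior the Bayesian candidate wins; with a diffuse or conflicted posterior the design falls back on the U&D move.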
14. BUD Credibility: How it Works (1)
15. BUD Credibility: How it Works (2)
16. BUD: More about β β=0.5 is pure Bayesian
(Bayesian override guaranteed, when using median-based posterior allocation)
β=0 is pure U&D
(override never happens)
β=0.15 to 0.25 seems to work reasonably well
Notes:
In toxicity-averse applications, one can use different β values for up and down moves in order to limit toxic responses (the target remains unchanged)
β is roughly analogous to the frequentist Type II error risk
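A tiny self-contained sketch (illustrative, not the authors' code) of the β extremes described above, where the Bayesian candidate wins iff its posterior credibility reaches 1 - β:

```python
def override(credibility, beta):
    """Bayesian candidate overrides U&D iff credibility >= 1 - beta."""
    return credibility >= 1 - beta

# beta = 0.5: a median-based candidate (credibility > 0.5) always clears the bar
assert override(0.51, 0.5)
# beta = 0: the bar is 1.0, so in practice (credibility < 1) it never clears
assert not override(0.99, 0.0)
```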
17. BUD Estimation Performance
18. Conclusions: There's No Sorcerer's Stone Either way, this is small-n, discrete-level, censored sampling of thresholds
There's a limit on how well we can expect to do, and we are quite at risk of meltdown
Unlucky sequences are common and can be devastating
Current long-memory designs do not address this risk
BUD offers a way to reduce our exposure, while still sharpening allocation over time
19. Acknowledgements Original Motivation for Studying U&D
Michael J. Souter, M.D., Harborview, Seattle
Ph.D. Committee at UW
Peter Hoff, Margaret Pepe, Paul Sampson, Barry Storer, Jon Wellner
Discussion and information
Malachy Columb, Nancy Flournoy, Mauro Gasparini, Miguel A. García-Pérez, Misrak Gezmu, Mario Stylianou
Help with this Talk
Veronica Berrocal, Qunhua Li, Gail Potter
20. References (chronologically ordered) Dixon and Mood: JASA 43 (1948), 109-126
Wetherill et al.: Biometrika 53 (1966), 439-454
Tsutakawa: JASA 62 (1967), 842-856
Watson and Pelli: Percept. Psychophys. 33 (1983), 113-120
O'Quigley et al.: Biometrics 46 (1990), 33-48
Goodman et al.: Stat. Med. 14 (1995), 1149-1161
Shen and O'Quigley: Biometrika 83 (1996), 395-405
Babb et al.: Stat. Med. 17 (1998), 1103-1120
Stylianou and Flournoy: Biometrics 58 (2002), 171-177
Cheung and Chappell: Biometrics 58 (2002), 671-674
Oron: Ph.D. Dissertation, forthcoming (Fall 2007)