What’s optimal about N choices?



  1. What’s optimal about N choices? Tyler McMillen & Phil Holmes, PACM/CSBMB/Conte Center, Princeton University. Banbury Center, Cold Spring Harbor (CSH), May 2005. Thanks to NSF & NIMH.

  2. Neuro-inspired decision-making models*
  1. The two-alternative forced-choice task (2-AFC). Optimal decisions: SPRT, LAM and DDM**.
  2. Optimal performance curves.
  3. MSPRT: an asymptotically optimal scheme for n > 2 choices (Dragalin et al., 1990-2000).
  4. LAM realizations of n-AFC; mean RT vs ER; Hick’s law.
  5. Summary (the maximal order statistics).
  * Optimality viewpoint: maybe animals can’t do it, but they can’t do better.
  ** Sequential probability ratio test, leaky accumulator model, drift-diffusion model.

  3. 2-AFC, SPRT, LAM & DDM. Choosing between 2 alternatives with noisy incoming data drawn from one of two densities, p1(x) or p2(x). Set thresholds -Z, +Z and form the running sum of log likelihood ratios, R_n = sum_{i=1..n} log[p2(x_i)/p1(x_i)]. Decide 1 (resp. 2) when R_n first falls below -Z (resp. exceeds +Z). Theorem (Wald, 1947; Barnard, 1946): the SPRT is optimal among fixed- or variable-sample-size tests in the sense that, for a given error rate (ER), the expected number of samples to decide is minimal. (Or, for a given number of samples, the ER is minimal.)
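
  For concreteness, here is a minimal simulation sketch of the SPRT as stated above. The Gaussian equal-variance observation model, the parameter values, and the function name sprt_trial are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def sprt_trial(mu1=0.1, mu2=-0.1, sigma=1.0, Z=2.0, true_alt=1):
    """One SPRT trial with Gaussian observations of equal variance.
    R is the running sum of log[p2(x_i)/p1(x_i)]: decide 1 when R
    first falls below -Z, decide 2 when it exceeds +Z."""
    mu = mu1 if true_alt == 1 else mu2
    R, n = 0.0, 0
    while -Z < R < Z:
        x = rng.normal(mu, sigma)
        # log p2(x)/p1(x) for N(mu2, sigma) vs. N(mu1, sigma)
        R += ((x - mu1) ** 2 - (x - mu2) ** 2) / (2 * sigma ** 2)
        n += 1
    return (1 if R <= -Z else 2), n

# Estimate the error rate and mean sample count when alternative 1 is true.
trials = [sprt_trial(true_alt=1) for _ in range(5000)]
print("ER =", np.mean([d != 1 for d, _ in trials]),
      "mean # samples =", np.mean([n for _, n in trials]))
```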

  4. The DDM is the continuum limit of the SPRT. Letting the sampling interval shrink, the log likelihood ratio x evolves as the drift-diffusion process dx = a dt + c dW, with drift a, noise strength c, and thresholds -Z, +Z. Extensive modeling of behavioral data (Stone, Laming, Ratcliff et al., ~1960-2005).
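
  A minimal Euler-Maruyama sketch of this DDM (step size and parameter values are illustrative assumptions); the closed-form ER used for comparison is the standard result for the unbiased DDM:

```python
import numpy as np

rng = np.random.default_rng(1)

def ddm_trial(a=0.1, c=1.0, Z=1.0, dt=0.001):
    """Integrate dx = a dt + c dW from x = 0 until x crosses -Z or +Z.
    Returns (choice, decision time)."""
    x, t = 0.0, 0.0
    while -Z < x < Z:
        x += a * dt + c * np.sqrt(dt) * rng.normal()
        t += dt
    return (1 if x >= Z else 2), t

trials = [ddm_trial() for _ in range(2000)]
er_sim = np.mean([ch == 2 for ch, _ in trials])        # drift favors choice 1
er_theory = 1.0 / (1.0 + np.exp(2 * 0.1 * 1.0 / 1.0 ** 2))
print(f"simulated ER = {er_sim:.3f}, predicted ER = {er_theory:.3f}")
```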

  5. There’s also increasing neural evidence for DDM: FEF: Schall, Stuphorn & Brown, Neuron, 2002. LIP: Gold & Shadlen, Neuron, 2002.

  6. The balanced LAM reduces to the DDM on an invariant line. Linearized LAM: dx1 = (-a x1 - b x2 + I1) dt + c dW1, dx2 = (-b x1 - a x2 + I2) dt + c dW2 (a race model if a = b = 0). Uncouple via y1 = (x1 + x2)/sqrt(2), y2 = (x2 - x1)/sqrt(2): a stable OU flow in y1 if a, b are large, and pure drift-diffusion in y2 if a = b (balanced). Absolute thresholds -Z, +Z in (x1, x2) become relative thresholds on the difference x2 - x1!
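
  A sketch of the linearized two-unit LAM in the form written above; it checks numerically that the sum y1 relaxes (OU) while the difference y2 drifts like a DDM when a = b. Parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def lam2(a=1.0, b=1.0, I1=1.0, I2=1.2, c=0.3, dt=0.001, T=2.0):
    """Linearized 2-unit LAM with leak a and mutual inhibition b:
    dx1 = (-a*x1 - b*x2 + I1) dt + c dW1, and symmetrically for x2."""
    n = int(T / dt)
    x = np.zeros((n, 2))
    for k in range(1, n):
        x1, x2 = x[k - 1]
        dW = np.sqrt(dt) * rng.normal(size=2)
        x[k, 0] = x1 + (-a * x1 - b * x2 + I1) * dt + c * dW[0]
        x[k, 1] = x2 + (-a * x2 - b * x1 + I2) * dt + c * dW[1]
    return x

x = lam2()
y1 = (x[:, 0] + x[:, 1]) / np.sqrt(2)  # OU: relaxes to a fixed point at rate a + b
y2 = (x[:, 1] - x[:, 0]) / np.sqrt(2)  # pure drift-diffusion when a = b
print(f"y1 settles near {y1[-1]:.2f}; y2 has drifted to {y2[-1]:.2f}")
```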

  7. LAM sample paths collapse towards an attracting invariant manifold (cf. C. Brody: Machens et al., Science, 2005). [Figure: sample paths vs. time t.] First passage across threshold determines choice.

  8. Simple expressions for the first passage times and ERs: ER = 1/(1 + e^{2 a~ z~}) and DT = z~ tanh(a~ z~), after reduction to 2 parameters a~ = (a/c)^2, z~ = Z/a. One can compute thresholds that maximize the reward rate RR = (1 - ER)/(DT + D), (1) where D is the total non-decision delay per trial (Gold-Shadlen, 2002; Bogacz et al., 2004-5). This leads to …
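
  As a sketch, the threshold maximizing (1) can be found numerically from the expressions above; the values chosen for a~ and for the lumped delay D are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def reward_rate(z, a=1.0, D=2.0):
    """RR = (1 - ER)/(DT + D), using the reduced-parameter expressions
    ER = 1/(1 + exp(2*a*z)) and DT = z*tanh(a*z); a and z stand for the
    reduced parameters a~, z~, and D is the non-decision delay."""
    er = 1.0 / (1.0 + np.exp(2 * a * z))
    dt = z * np.tanh(a * z)
    return (1.0 - er) / (dt + D)

res = minimize_scalar(lambda z: -reward_rate(z), bounds=(1e-6, 10.0),
                      method="bounded")
print(f"optimal threshold z~ = {res.x:.3f}, RR = {-res.fun:.3f}")
```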

  9. Optimal performance curves (OPCs). Human behavioral data: the best are optimal, but what about the rest? Bad objective function, or bad learners? [Figure: left, the RR defined previously; right, a family of RRs weighted for accuracy, with increasing accuracy weight.] Learning not considered here. (Bogacz et al., 2004; Simen, 2005.)

  10. N-AFC: MSPRT & LAM. MSPRT chooses among n alternatives by a max vs. next test: accumulate a log likelihood L_i for each alternative and decide i when L_i - max_{j != i} L_j first exceeds a threshold Z. MSPRT is asymptotically optimal in the sense that the expected number of samples is minimal in the limit of low ERs (Dragalin et al., IEEE Trans., 1999-2000). A LAM realization of MSPRT (Usher-McClelland, 2001) asymptotically predicts mean RT growing as log(n - 1) (cf. Usher et al., 2002).
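
  A simulation sketch of the max vs. next rule; the Gaussian channels (one of which carries the signal) are an assumed observation model and the parameters are illustrative. Sweeping n also previews the log(n - 1) growth discussed on the next slide:

```python
import numpy as np

rng = np.random.default_rng(3)

def msprt_trial(n=4, mu=0.2, sigma=1.0, Z=3.0):
    """Max vs. next MSPRT: alternative 0 is correct (its channel has
    mean mu, the rest mean 0). Under hypothesis i only channel i has
    mean mu, so up to terms common to all hypotheses the accumulated
    log-likelihood L_i grows by mu*x_i/sigma^2 per sample. Stop when
    the leader exceeds the runner-up by Z."""
    means = np.zeros(n)
    means[0] = mu
    L = np.zeros(n)
    steps = 0
    while True:
        x = rng.normal(means, sigma)   # one noisy sample per channel
        L += mu * x / sigma ** 2
        steps += 1
        top = np.sort(L)
        if top[-1] - top[-2] >= Z:     # max vs. next test
            return int(np.argmax(L)), steps

for n in (2, 4, 8):
    out = [msprt_trial(n=n) for _ in range(500)]
    print(f"n={n}: ER = {np.mean([c != 0 for c, _ in out]):.3f}, "
          f"mean # samples = {np.mean([s for _, s in out]):.1f}")
```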

  11. The log(n - 1) dependence is similar to Hick’s law: RT = A + B log n, or RT = B log(n + 1) (W.E. Hick, Q. J. Exp. Psych., 1952). We can provide a theoretical basis and predict explicit SNR and ER dependence in the coefficients A, B.

  12. The multiplicative constants blow up logarithmically as ER -> 0. Behavior for small and larger ERs is captured by an empirical formula (2), which generalizes (1).

  13. But a running max vs. next test is computationally costly (?). The LAM can approximately execute a max vs. average test via absolute thresholds. The n-unit LAM is decoupled as before: the summed coordinate y1 is attracted to the hyperplane y1 = A, on which the remaining coordinates undergo drift-diffusion, so max vs. average becomes an absolute test! Attraction is faster for larger n: the stable eigenvalue lambda_1 ~ n. A sketch follows below.
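
  A sketch of the n-unit LAM with absolute thresholds, in the decoupling described above; the leak/inhibition values and the threshold are illustrative assumptions (the balanced case a = b is taken so that, on the hyperplane, each unit performs pure drift-diffusion):

```python
import numpy as np

rng = np.random.default_rng(4)

def lam_n(n=6, a=1.0, b=1.0, extra=0.2, c=0.3, dt=0.001, theta=0.6):
    """n-unit linearized LAM, dx_i = (-a*x_i - b*sum_{j!=i} x_j + I_i) dt
    + c dW_i, with an absolute threshold theta on each unit. The summed
    activity relaxes at rate a + (n-1)*b (~ n for large n), after which
    crossing theta implements a max vs. average test.
    Returns (choice, time)."""
    I = np.full(n, 1.0)
    I[0] += extra                       # unit 0 receives the signal
    x = np.zeros(n)
    t = 0.0
    while x.max() < theta:
        inhib = b * (x.sum() - x)       # inhibition from all other units
        x += (-a * x - inhib + I) * dt + c * np.sqrt(dt) * rng.normal(size=n)
        t += dt
    return int(np.argmax(x)), t

out = [lam_n() for _ in range(200)]
print(f"ER = {np.mean([ch != 0 for ch, _ in out]):.3f}, "
      f"mean RT = {np.mean([t for _, t in out]):.2f}")
```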

  14. Max vs. average is not optimal, but it’s not so bad. [Figure: unbalanced LAMs (OU processes); panels compare absolute, max vs. average, and max vs. next tests.] Max vs. next and max vs. average coincide for n = 2. As n increases, max vs. average deteriorates, approaching absolute-test performance. But it’s still better than the absolute test for n < 8-10!

  15. The simple LAM/DD predicts log(n - 1) scaling, not log n or log(n + 1) as in Hick’s law; but a distribution of starting points gives approximately log n scaling for 2 < n < 8, and ER and SNR effects may also enter.

  16. The effect of nonlinear activation functions, bounded below, is to shift the scaling toward linear in n. The limited dynamic range degrades performance, but this can be offset by a suitable bias (recentering). [Figure: nonlinear LAMs vs. the linearized LAM.]
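
  A sketch of one way to bound the activation from below and recenter it, as described above; the threshold-linear form of f and the bias value are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

def f(u, bias=0.5):
    """Threshold-linear activation, bounded below at 0; the bias shifts
    operation away from the cut-off (recentering)."""
    return np.maximum(u + bias, 0.0)

def nonlinear_lam_step(x, I, a=1.0, b=1.0, c=0.3, dt=0.001):
    """One Euler step of an n-unit LAM whose mutual inhibition passes
    through the bounded activation f, limiting the dynamic range."""
    fx = f(x)
    inhib = b * (fx.sum() - fx)
    noise = c * np.sqrt(dt) * rng.normal(size=x.size)
    return x + (-a * x - inhib + I) * dt + noise

x = np.zeros(4)
I = np.array([1.2, 1.0, 1.0, 1.0])    # unit 0 receives the signal
for _ in range(2000):
    x = nonlinear_lam_step(x, I)
print("activities after 2 time units:", np.round(x, 2))
```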

  17. Summary: N-AFC
  • MSPRT max vs. next test is asymptotically optimal in the low-ER limit.
  • LAM (& race model) can perform the max vs. next test.
  • Hick’s law emerges for max vs. next, max vs. average & absolute tests. A, B smallest for max vs. next, OK for max vs. average.
  • LAM executes a max vs. average test on its attracting hyperplane using absolute thresholds.
  • Variable starting points give log n scaling for ‘small’ n.
  • Nonlinear LAMs degrade performance: RT ~ n for sufficiently small dynamic range.
  More info: http://mae.princeton.edu/people/e21/holmes/profile.html
