
An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem

Matthew Streeter & Stephen Smith, Carnegie Mellon University. NESCAI, April 29, 2006.


Presentation Transcript


  1. An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem Matthew Streeter & Stephen Smith Carnegie Mellon University NESCAI, April 29 2006

  2. Outline • Problem statement & motivations • Modeling payoff distributions • An asymptotically optimal algorithm

  3. The k-Armed Bandit [Figure: three slot machines (Machine 1, Machine 2, Machine 3) with payoff distributions D1, D2, D3] • You are in a room with k slot machines • Pulling the arm of machine i returns a payoff drawn (independently at random) from an unknown distribution Di • You are allowed n total pulls • Goal: maximize total payoff • More than 50 years of papers

  4. The Max k-Armed Bandit [Figure: three slot machines (Machine 1, Machine 2, Machine 3) with payoff distributions D1, D2, D3] • You are in a room with k slot machines • Pulling the arm of machine i returns a payoff drawn (independently at random) from an unknown distribution Di • You are allowed n total pulls • Goal: maximize the highest payoff • Introduced ~2003

  5. The Max k-Armed Bandit: Motivations [Figure: three randomized heuristics (Tabu Search, Hill Climbing, Simulated Annealing) as arms with solution-quality distributions D1, D2, D3] • Given: some optimization problem and k randomized heuristics • Each run of a heuristic yields a solution of some quality • Allowed n runs • Goal: maximize the quality of the best solution • Cicirello & Smith (2005) show competitive performance on the RCPSP (resource-constrained project scheduling problem) • Assumption: each run has the same computational cost

  6. The Max k-Armed Bandit: Example • Given n pulls, what strategy maximizes the (expected) maximum payoff? • If n=1, should pull arm 1 (higher mean) • If n=1000, should pull arm 2 (higher variance)
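The plot behind this example is not part of the transcript. Below is a minimal sketch of the same effect, assuming two hypothetical normal arms (not the talk's distributions): arm 1 has the higher mean, arm 2 the higher variance. A Monte Carlo estimate of the expected maximum payoff over n pulls then picks the better arm for each n. The names `expected_max`, `ARMS`, and `best_arm` are illustrative only.

```python
import random
import statistics


def expected_max(sample, n, trials, rng):
    """Monte Carlo estimate of E[max of n independent draws from `sample`]."""
    return statistics.fmean(
        max(sample(rng) for _ in range(n)) for _ in range(trials)
    )


# Hypothetical arms (assumed for illustration, not taken from the talk):
# arm 1 has the higher mean, arm 2 has the higher variance.
ARMS = {
    1: lambda rng: rng.gauss(1.0, 0.1),  # mean 1, std. dev. 0.1
    2: lambda rng: rng.gauss(0.0, 1.0),  # mean 0, std. dev. 1
}


def best_arm(n, trials=200, seed=0):
    """Arm whose estimated expected maximum payoff over n pulls is largest."""
    rng = random.Random(seed)
    estimates = {i: expected_max(f, n, trials, rng) for i, f in ARMS.items()}
    return max(estimates, key=estimates.get)


print(best_arm(1))     # 1: for a single pull, the higher mean wins
print(best_arm(1000))  # 2: over many pulls, the higher variance wins
```

For n = 1 the expected maximum is just the mean (about 1 vs. 0); for n = 1000 the maximum of the wide arm sits roughly 3.2 standard deviations above its mean, which beats the narrow arm's roughly 1.3.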

  7. Modeling Payoff Distributions

  8. Can’t Handle Arbitrary Payoff Distributions • Needle in the haystack: the arms can’t be distinguished until some pull returns a payoff > 0, at which point the highest payoff can’t be improved

  9. Why? • Extremal Types Theorem: the max. of n independent draws from some fixed distribution, suitably rescaled, converges in distribution to a GEV • Compare to the Central Limit Theorem: the sum of n draws, suitably rescaled, converges in distribution to a Gaussian • Assumption: we will assume each machine returns payoffs from a generalized extreme value (GEV) distribution

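  10. The GEV distribution • Z has a GEV distribution if $\Pr[Z \le z] = \exp\left(-\left(1 + s\,\frac{z-\mu}{\sigma}\right)^{-1/s}\right)$ for constants s, μ, and σ > 0 (on the support $1 + s(z-\mu)/\sigma > 0$; for s = 0, take the limit $\exp(-e^{-(z-\mu)/\sigma})$) • μ determines the mean • σ determines the standard deviation • s determines the shape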
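The CDF above can be evaluated directly and inverted to draw samples. A minimal sketch, assuming the standard parameterization shown on the slide (location μ, scale σ > 0, shape s); `gev_cdf` and `gev_sample` are hypothetical helper names, not code from the talk.

```python
import math
import random


def gev_cdf(z, mu, sigma, s):
    """P[Z <= z] for a GEV with location mu, scale sigma > 0, shape s."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    x = (z - mu) / sigma
    if s == 0:
        return math.exp(-math.exp(-x))  # Gumbel limit
    t = 1 + s * x
    if t <= 0:
        # Outside the support: below the lower endpoint when s > 0,
        # above the upper endpoint when s < 0.
        return 0.0 if s > 0 else 1.0
    return math.exp(-t ** (-1 / s))


def gev_sample(mu, sigma, s, rng):
    """Draw one GEV variate by inverting the CDF at a uniform u in (0, 1)."""
    u = rng.random()
    while u == 0.0:  # rng.random() can return 0.0; log(0) is undefined
        u = rng.random()
    w = -math.log(u)  # solve exp(-w) = u, so w = (1 + s*x)**(-1/s)
    if s == 0:
        return mu - sigma * math.log(w)
    return mu + sigma * (w ** (-s) - 1) / s


rng = random.Random(0)
draws = [gev_sample(0.0, 1.0, 0.2, rng) for _ in range(20_000)]
share = sum(d <= 1.0 for d in draws) / len(draws)
print(share, gev_cdf(1.0, 0.0, 1.0, 0.2))  # the two numbers should be close
```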

  11. Example payoff distribution: Job Shop Scheduling • Job shop scheduling: assign start times to operations, subject to constraints. • Length of schedule = latest completion time of any operation • Goal: find a schedule with minimum length • Many heuristics (branch and bound, simulated annealing...)

  12. Example payoff distribution: Job Shop Scheduling • “ft10” is a notorious instance of the job shop scheduling problem • Heuristic h: do hill-climbing 500 times • Ran h 1000 times on ft10; fit GEV to payoff data
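The talk does not say how the GEV was fitted. As a minimal sketch of one simple option, the code below fits only the s = 0 (Gumbel) case by the method of moments, using synthetic data rather than the ft10 runs. Note that the ft10 fit is truncated (payoffs are bounded above), which corresponds to s < 0 and cannot be captured by this s = 0 sketch; a full three-parameter fit (for example by maximum likelihood) would be needed for that. `fit_gumbel_moments` and `gumbel_draw` are hypothetical names.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(payoffs):
    """Method-of-moments fit of a Gumbel distribution (GEV with s = 0).

    A Gumbel(mu, sigma) has mean mu + EULER_GAMMA * sigma and standard
    deviation pi * sigma / sqrt(6); solve these two equations for mu, sigma.
    """
    sigma = statistics.stdev(payoffs) * math.sqrt(6) / math.pi
    mu = statistics.fmean(payoffs) - EULER_GAMMA * sigma
    return mu, sigma


def gumbel_draw(mu, sigma, rng):
    """One Gumbel draw: mu - sigma * ln(E) with E ~ Exponential(1)."""
    return mu - sigma * math.log(rng.expovariate(1.0))


# Synthetic payoffs standing in for -(schedule length); not the ft10 data.
rng = random.Random(1)
payoffs = [gumbel_draw(-1100.0, 15.0, rng) for _ in range(20_000)]
print(fit_gumbel_moments(payoffs))  # should be close to (-1100.0, 15.0)
```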

  13. Example payoff distribution: Job shop scheduling [Figure: fitted payoff distribution (probability vs. -(schedule length)) and E[max. payoff] vs. num. runs] • Distribution truncated at 931; optimal schedule length = 930 (Carlier & Pinson, 1986) • Best of 50,000 sampled schedules has length 1014

  14. An Asymptotically Optimal Algorithm

  15. Notation • $m_i(t)$ = expected maximum payoff you get from pulling the ith arm t times • $m^*(t) = \max_{1 \le i \le k} m_i(t)$ • $S(t)$ = expected maximum payoff you get by following strategy S for t pulls

  16. The Algorithm • Strategy S* (δ and ε to be determined): • For i from 1 to k: • Using D pulls, estimate $m_i(n)$. Pick D so that, with probability 1 - δ, the estimate is within ε of the true $m_i(n)$. • For the remaining n - kD pulls: • Pull the arm with max. estimated $m_i(n)$ • Guarantee: $S^*(n) = m^*(n) - o(1)$.

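  17. The GEV distribution • Z has a GEV distribution if $\Pr[Z \le z] = \exp\left(-\left(1 + s\,\frac{z-\mu}{\sigma}\right)^{-1/s}\right)$ for constants s, μ, and σ > 0 (on the support $1 + s(z-\mu)/\sigma > 0$; for s = 0, take the limit $\exp(-e^{-(z-\mu)/\sigma})$) • μ determines the mean • σ determines the standard deviation • s determines the shape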

  18. Behavior of the GEV [Figure: the three shape regimes s > 0 ("Lots of algebra"), s = 0 ("Not so bad"), and s < 0]
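A short sketch of why the three regimes behave differently, assuming the standard GEV parameterization from slide 10. For s ≠ 0 the maximum of t draws is again GEV, with location μ + σ(t^s - 1)/s and scale σt^s, so the expected maximum m(t) has a closed form (finite only for s < 1). For s = 0 it is exactly linear in log t, which is the property the prediction step relies on. `gev_expected_max` is a hypothetical helper, not code from the talk.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gev_expected_max(t, mu, sigma, s):
    """m(t) = E[max of t i.i.d. GEV(mu, sigma, s) draws], for s < 1.

    For s != 0 the max of t draws is GEV with location
    mu + sigma * (t**s - 1) / s and scale sigma * t**s; plugging these
    into the GEV mean mu + sigma * (Gamma(1 - s) - 1) / s gives the formula.
    """
    if s >= 1:
        raise ValueError("the mean is infinite for s >= 1")
    if s == 0:
        # Gumbel case: the expected maximum grows linearly in log t.
        return mu + sigma * (EULER_GAMMA + math.log(t))
    # s > 0: grows like t**s.  s < 0: increases toward mu - sigma / s.
    return mu + sigma * (t ** s * math.gamma(1 - s) - 1) / s


for s in (0.5, 0.0, -0.5):
    print(s, [round(gev_expected_max(t, 0.0, 1.0, s), 3) for t in (1, 10, 1000)])
```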

  19. Predicting $m_i(n)$ [Figure: empirical $m_i(1)$ and $m_i(2)$, with the predicted $m_i(n)$ on the line through them] • Estimation procedure: linear interpolation (in log t)! • Estimate $m_i(1)$ and $m_i(2)$, then extrapolate along the line through them to get $m_i(n)$

  20. Predicting $m_i(n)$: Lemma • Let X be a random variable with (unknown) mean μ and standard deviation σ ≤ σ_max. Then $O(\epsilon^{-2} \log \delta^{-1})$ samples of X (for fixed σ_max) suffice to obtain an estimate $\hat{\mu}$ such that, with probability at least 1 - δ, $\hat{\mu}$ is within ε of μ. • Proof idea: use "median of means" (take the median of $O(\log \delta^{-1})$ independent sample means)
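A minimal sketch of the median-of-means estimator, using one standard choice of constants (Chebyshev for each group, Hoeffding for the median); these are not necessarily the constants used in the paper. `median_of_means` is a hypothetical name, and `draw` is an assumed callable that returns one sample of X.

```python
import math
import random
import statistics


def median_of_means(draw, eps, delta, sigma_max, rng):
    """Estimate E[X] to within eps with probability >= 1 - delta.

    Each group mean is within eps with probability >= 3/4 by Chebyshev
    (group size >= 4 * sigma_max**2 / eps**2). The median fails only if at
    least half the groups fail, which by Hoeffding has probability
    <= exp(-groups / 8) <= delta. Returns (estimate, samples used).
    """
    group_size = math.ceil(4 * sigma_max ** 2 / eps ** 2)
    groups = math.ceil(8 * math.log(1 / delta))
    means = [
        statistics.fmean(draw(rng) for _ in range(group_size))
        for _ in range(groups)
    ]
    return statistics.median(means), group_size * groups


# Example: a standard Gumbel payoff has mean ~0.5772 and std. dev. ~1.28.
gumbel = lambda r: -math.log(r.expovariate(1.0))
print(median_of_means(gumbel, eps=0.1, delta=0.01, sigma_max=1.3, rng=random.Random(2)))
```

The sample count is O(σ_max² ε⁻² log δ⁻¹), matching the lemma once σ_max is treated as a constant.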

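  21. Predicting $m_i(n)$ [Figure: empirical $m_i(1)$ and $m_i(2)$, with the predicted $m_i(n)$] • Equation for line: $m_i(n) = m_i(1) + [m_i(2) - m_i(1)] \log_2 n$ • Errors in the estimates of $m_i(1)$ and $m_i(2)$ are amplified by a factor of O(log n), so each must be estimated to within $O(\epsilon / \log n)$; by the lemma, estimating $m_i(n)$ to within ε requires $O((\log n)^2 \epsilon^{-2} \log \delta^{-1})$ pulls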
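A sketch of the prediction step for the s = 0 case. For a Gumbel arm, m(t) = μ + σ(γ + ln t) exactly, so the line through m(1) and m(2) recovers m(n) with no bias. The sketch also checks the O(log n) error amplification that drives the (log n)² factor. The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def predict_max(m1, m2, n):
    """Extrapolate m(n) from m(1) and m(2) along a straight line in log t."""
    return m1 + (m2 - m1) * math.log2(n)


def gumbel_expected_max(t, mu, sigma):
    """Exact m(t) for a Gumbel(mu, sigma) arm: mu + sigma * (gamma + ln t)."""
    return mu + sigma * (EULER_GAMMA + math.log(t))


def worst_case_prediction_error(eps, n):
    """Largest |error| in predict_max when m(1), m(2) are each off by <= eps.

    The error is e1 * (1 - L) + e2 * L with L = log2(n) >= 1; it is largest
    at e1 = -e2 = -eps (or the reverse), giving eps * (2L - 1) = O(eps log n).
    """
    return eps * (2 * math.log2(n) - 1)


mu, sigma, n = 2.0, 3.0, 1000
m1, m2 = gumbel_expected_max(1, mu, sigma), gumbel_expected_max(2, mu, sigma)
print(predict_max(m1, m2, n), gumbel_expected_max(n, mu, sigma))  # equal
print(worst_case_prediction_error(0.01, n))  # about 0.19 for eps = 0.01
```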

  22. The Algorithm • Strategy S* (δ and ε to be determined): • For i from 1 to k: • Using D pulls, estimate $m_i(n)$. Pick D so that, with probability 1 - δ, the estimate is within ε of the true $m_i(n)$. • For the remaining n - kD pulls: • Pull the arm with max. predicted $m_i(n)$ • Guarantee: $S^*(n) = m^*(n) - o(1)$ • Three things make S* less than optimal: • δ (some estimate may be off by more than ε) • ε (estimation error) • $m^*(n) - m^*(n-kD)$ (only n - kD pulls remain for the chosen arm)
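A minimal sketch of S* under stated assumptions, not the authors' implementation. Arms are modeled as hypothetical callables returning one payoff. Each arm's D exploration pulls are grouped into pairs, so that the first draw of a pair samples the max of 1 pull and the pair's max samples the max of 2. Plain sample means stand in for the median-of-means estimator, and m_i(n) is predicted with the log₂ line from slide 21 (the s = 0 case).

```python
import math
import random
import statistics


def strategy_s_star(arms, n, D, rng):
    """Sketch of S*: D exploration pulls per arm, then exploit the best arm.

    arms: list of callables rng -> payoff (hypothetical interface).
    Returns (highest payoff seen, index of the arm chosen for exploitation).
    """
    k = len(arms)
    if D < 2 or k * D > n:
        raise ValueError("need D >= 2 and k * D <= n")
    best = -math.inf
    predicted = []
    for arm in arms:
        singles, pair_maxes = [], []
        for _ in range(D // 2):
            a, b = arm(rng), arm(rng)
            singles.append(a)
            pair_maxes.append(max(a, b))
            best = max(best, a, b)
        m1 = statistics.fmean(singles)     # estimate of m_i(1)
        m2 = statistics.fmean(pair_maxes)  # estimate of m_i(2)
        predicted.append(m1 + (m2 - m1) * math.log2(n))  # predicted m_i(n)
    choice = max(range(k), key=lambda i: predicted[i])
    for _ in range(n - k * 2 * (D // 2)):  # all remaining pulls
        best = max(best, arms[choice](rng))
    return best, choice


def gumbel_arm(mu, sigma):
    """Hypothetical arm with Gumbel(mu, sigma) payoffs."""
    return lambda rng: mu - sigma * math.log(rng.expovariate(1.0))


arms = [gumbel_arm(1.0, 0.1), gumbel_arm(0.0, 1.0)]  # high mean vs. high spread
print(strategy_s_star(arms, n=10_000, D=2_000, rng=random.Random(0)))
```

With n = 10,000 the wide arm's predicted m(n) is about 9.8 versus about 2.0 for the narrow one, so exploitation goes to arm index 1.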

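  23. Analysis • Three things make S* less than optimal: • δ • ε • $m^*(n) - m^*(n-kD)$ • Setting $\delta = n^{-2}$, $\epsilon = n^{-1/3}$ takes care of the first two. Then, using $\epsilon^{-2} = n^{2/3}$ and $\log \delta^{-1} = 2 \log n$: • $m^*(n) - m^*(n-kD) = O(\log n - \log(n-kD)) = O(kD/n) = O(k(\log n)^2 \epsilon^{-2} (\log \delta^{-1})/n) = O(k(\log n)^3 n^{-1/3}) = o(1)$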
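A quick numerical check of the last steps, assuming D = c(ln n)² ε⁻² ln δ⁻¹ with a hypothetical constant c = 1 (the slide gives only the O(·) rate). Substituting δ = n⁻² and ε = n⁻¹ᐟ³ gives kD/n = 2k(ln n)³ n⁻¹ᐟ³. The function names are illustrative.

```python
import math


def kD_over_n(n, k=1, c=1.0):
    """k*D/n with D = c * (ln n)^2 * eps^-2 * ln(1/delta), eps = n^(-1/3), delta = n^-2."""
    eps, delta = n ** (-1 / 3), n ** -2.0
    D = c * math.log(n) ** 2 * eps ** -2 * math.log(1 / delta)
    return k * D / n


def simplified_rate(n, k=1):
    """The slide's final form k * (ln n)^3 * n^(-1/3), constants dropped."""
    return k * math.log(n) ** 3 * n ** (-1 / 3)


for n in (1e2, math.exp(9), 1e6, 1e12, 1e18):
    print(f"n = {n:.3g}: k(ln n)^3 n^(-1/3) = {simplified_rate(n):.3g}")
```

With k = 1 the rate rises until n = e⁹ ≈ 8,100 (where d/d(ln n) of (ln n)³e^(-ln n/3) vanishes) and is still about 2.1 at n = 10¹², so the o(1) guarantee is asymptotic.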

  24. Summary & Future Work • Defined the max k-armed bandit problem and discussed applications to heuristic search • Presented an asymptotically optimal algorithm for GEV payoff distributions (the analysis covers the special case s = 0) • Working on applications to scheduling problems

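  25. The Extremal Types Theorem • Define $M_n$ = max. of n draws, and suppose $\Pr[r_n(M_n) \le z] \to G(z)$ as $n \to \infty$, where each $r_n$ is a linear “rescaling function”. Then G is either a point mass or a “generalized extreme value distribution”: $G(z) = \exp\left(-\left(1 + s\,\frac{z-\mu}{\sigma}\right)^{-1/s}\right)$ for constants s, μ, and σ > 0.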
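A small illustration of the theorem, assuming Exponential(1) draws as the example distribution. With the linear rescaling r_n(x) = x - ln n, the exact CDF of r_n(M_n) is (1 - e^(-z)/n)^n, which converges to the GEV with s = 0, μ = 0, σ = 1 (the Gumbel). The function names are hypothetical.

```python
import math


def rescaled_max_cdf(z, n):
    """Exact P[M_n - ln n <= z], M_n = max of n i.i.d. Exponential(1) draws.

    P[M_n <= x] = (1 - e^-x)^n for x >= 0; the rescaling is r_n(x) = x - ln n.
    """
    x = z + math.log(n)
    if x <= 0:
        return 0.0
    return math.exp(n * math.log1p(-math.exp(-x)))  # log1p for accuracy


def gumbel_cdf(z):
    """The limit G: a GEV with s = 0, mu = 0, sigma = 1."""
    return math.exp(-math.exp(-z))


for n in (10, 1_000, 1_000_000):
    gap = max(abs(rescaled_max_cdf(z, n) - gumbel_cdf(z)) for z in (-1, 0, 1, 2))
    print(f"n = {n}: max CDF gap = {gap:.2e}")
```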
