An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem
Matthew Streeter & Stephen Smith, Carnegie Mellon University
NESCAI, April 29, 2006
Outline • Problem statement & motivations • Modeling payoff distributions • An asymptotically optimal algorithm
The k-Armed Bandit [figure: three slot machines with payoff distributions D1, D2, D3] • You are in a room with k slot machines • Pulling the arm of machine i returns a payoff drawn (independently at random) from an unknown distribution Di • You are allowed n total pulls • Goal: maximize the total payoff • > 50 years of papers
The Max k-Armed Bandit [figure: the same three slot machines with payoff distributions D1, D2, D3] • You are in a room with k slot machines • Pulling the arm of machine i returns a payoff drawn (independently at random) from an unknown distribution Di • You are allowed n total pulls • Goal: maximize the highest payoff • Introduced ~2003
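To make the setting concrete, here is a minimal Python sketch of the max k-armed bandit loop. The arm distributions, the `round_robin` strategy, and all function names are illustrative assumptions, not from the talk:

```python
import random

def max_k_armed(pull_fns, strategy, n):
    """Run `strategy` for n pulls and return the single highest payoff seen.
    pull_fns[i]() draws one payoff from arm i's unknown distribution D_i."""
    best, history = float("-inf"), []
    for _ in range(n):
        i = strategy(history, len(pull_fns))   # strategy picks an arm
        payoff = pull_fns[i]()                 # one pull of that arm
        history.append((i, payoff))
        best = max(best, payoff)               # objective: the max, not the sum
    return best

# Illustrative arms and a naive strategy that cycles through the arms.
arms = [lambda: random.gauss(1.0, 0.1), lambda: random.gauss(0.0, 1.0)]
round_robin = lambda history, k: len(history) % k
print(max_k_armed(arms, round_robin, 1000))
```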
The Max k-Armed Bandit: Motivations [figure: randomized heuristics (tabu search, hill climbing, simulated annealing) with solution-quality distributions D1, D2, D3] • Given: some optimization problem and k randomized heuristics • Each run of a heuristic yields a solution of a certain quality • Allowed n runs (assumption: each run has the same computational cost) • Goal: maximize the quality of the best solution • Cicirello & Smith (2005) show competitive performance on the RCPSP
The Max k-Armed Bandit: Example • Given n pulls, what strategy maximizes the (expected) maximum payoff? • If n=1, should pull arm 1 (higher mean) • If n=1000, should pull arm 2 (higher variance)
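The slide's payoff distributions are not recoverable from the export; assuming, for illustration, that arm 1 pays N(1, 0.1) and arm 2 pays N(0, 1), a quick Monte Carlo confirms the crossover:

```python
import random

def expected_max(mu, sigma, n, trials=2000):
    """Monte Carlo estimate of E[max of n draws from N(mu, sigma)]."""
    return sum(max(random.gauss(mu, sigma) for _ in range(n))
               for _ in range(trials)) / trials

for n in (1, 1000):
    print(n, expected_max(1.0, 0.1, n), expected_max(0.0, 1.0, n))
# n=1:    arm 1 wins (~1.00 vs ~0.00)
# n=1000: arm 2 wins (~1.32 vs ~3.24)
```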
Can’t Handle Arbitrary Payoff Distributions • Needle in the haystack: suppose one arm occasionally pays 1 and every other arm always pays 0; you can’t distinguish the arms until you get a payoff > 0, at which point the highest payoff can’t be improved
Assumption • We will assume each machine returns payoffs drawn from a generalized extreme value (GEV) distribution

Why? • Extremal Types Theorem: the maximum of n independent draws from some fixed distribution converges in distribution to a GEV • Compare to the Central Limit Theorem: the sum of n draws converges in distribution to a Gaussian
The GEV distribution • Z has a GEV distribution if $\Pr[Z \le z] = \exp\!\left(-\left(1 + s\,\tfrac{z-\mu}{\sigma}\right)^{-1/s}\right)$ for constants s, μ, and σ > 0 • μ determines the mean • σ determines the standard deviation • s determines the shape
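One way to see the three parameters in action is scipy's GEV implementation. Note that `scipy.stats.genextreme` uses shape c = −s relative to the convention above; the parameter values below are arbitrary:

```python
from scipy.stats import genextreme

mu, sigma = 0.0, 1.0
for s in (-0.2, 0.0, 0.2):
    # scipy's shape parameter c is the negative of the slide's s
    dist = genextreme(c=-s, loc=mu, scale=sigma)
    print(f"s={s:+.1f}  mean={dist.mean():.3f}  std={dist.std():.3f}")
```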
Example payoff distribution: Job Shop Scheduling • Job shop scheduling: assign start times to operations, subject to constraints. • Length of schedule = latest completion time of any operation • Goal: find a schedule with minimum length • Many heuristics (branch and bound, simulated annealing...)
Example payoff distribution: Job Shop Scheduling • “ft10” is a notorious instance of the job shop scheduling problem • Heuristic h: do hill-climbing 500 times • Ran h 1000 times on ft10; fit a GEV to the payoff data, where payoff = −(schedule length), so maximizing payoff minimizes length
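The fit itself is a one-liner with scipy. The data below is a synthetic stand-in, since the actual ft10 payoffs are of course not in the deck:

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
payoffs = -rng.normal(990, 15, size=1000)  # synthetic stand-in for -(length)

c, loc, scale = genextreme.fit(payoffs)    # maximum-likelihood GEV fit
print(f"shape={c:.3f}  location={loc:.1f}  scale={scale:.2f}")
```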
Example payoff distribution: Job Shop Scheduling [figure: fitted GEV density (probability vs. −(schedule length)) and E[max payoff] vs. number of runs] • The fitted distribution is truncated at 931; the optimal schedule length is 930 (Carlier & Pinson, 1986) • The best of 50,000 sampled schedules has length 1014
Notation • $m_i(t)$ = expected maximum payoff from pulling the i-th arm t times • $m^*(t) = \max_{1 \le i \le k} m_i(t)$ • $m_S(t)$ = expected maximum payoff obtained by following strategy S for t pulls
The Algorithm • Strategy S* (δ and ε to be determined): • For i from 1 to k: using D pulls, estimate $m_i(n)$; pick D so that with probability 1 − δ, the estimate is within ε of the true $m_i(n)$ • For the remaining n − kD pulls: pull the arm with the maximum estimated $m_i(n)$ • Guarantee: $m_{S^*}(n) = m^*(n) - o(1)$
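Here is a hedged Python sketch of S* for the s = 0 case. It simplifies in two ways: it uses plain sample means rather than the median-of-means estimator the analysis requires, and D is a placeholder budget rather than the exact constant from the proof:

```python
import math

def s_star(pull_fns, n, epsilon, delta):
    k = len(pull_fns)
    # Placeholder exploration budget per arm (the proof pins down the constant).
    D = max(2, min(n // (2 * k),
                   int((math.log(n) ** 2 / epsilon ** 2) * math.log(1 / delta))))
    best_seen, predicted = float("-inf"), []
    for pull in pull_fns:
        xs = [pull() for _ in range(D)]
        best_seen = max(best_seen, max(xs))
        m1 = sum(xs) / D                                  # estimate of m_i(1)
        m2 = sum(max(xs[j], xs[j + 1])                    # estimate of m_i(2)
                 for j in range(0, D - 1, 2)) / (D // 2)
        predicted.append(m1 + (m2 - m1) * math.log2(n))   # extrapolated m_i(n)
    best_arm = max(range(k), key=lambda i: predicted[i])
    for _ in range(n - k * D):                            # exploit the rest
        best_seen = max(best_seen, pull_fns[best_arm]())
    return best_seen
```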
The GEV distribution • Z has a GEV distribution if $\Pr[Z \le z] = \exp\!\left(-\left(1 + s\,\tfrac{z-\mu}{\sigma}\right)^{-1/s}\right)$ for constants s, μ, and σ > 0 • μ determines the mean • σ determines the standard deviation • s determines the shape
Behavior of the GEV [figure: the expected maximum in the three shape regimes: s > 0 (“lots of algebra”), s = 0 (“not so bad”), s < 0]
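For the s = 0 (Gumbel) case, the key piece of algebra (a standard fact, stated here for context) is that the maximum $M_t$ of t independent Gumbel(μ, σ) draws is itself Gumbel, with the location shifted by σ ln t:

```latex
\Pr[M_t \le z] = \exp\!\left(-e^{-(z-\mu)/\sigma}\right)^{t}
             = \exp\!\left(-e^{-(z-\mu-\sigma\ln t)/\sigma}\right),
\qquad
\mathbb{E}[M_t] = \mu + \sigma\gamma + \sigma\ln t
```

Here γ is the Euler–Mascheroni constant, so $m_i(t)$ is linear in ln t, which is exactly what the interpolation on the next slides exploits.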
Predicting $m_i(n)$ [figure: line through the empirical $m_i(1)$ and $m_i(2)$, extended to the predicted $m_i(n)$] • Estimation procedure: linear interpolation! • Estimate $m_i(1)$ and $m_i(2)$, then interpolate to get $m_i(n)$
Predicting $m_i(n)$: Lemma • Let X be a random variable with (unknown) mean μ and standard deviation at most $\sigma_{\max}$. Then $O(\epsilon^{-2} \log \delta^{-1})$ samples of X suffice to obtain an estimate that, with probability at least 1 − δ, is within ε of the true mean (the hidden constant depends on $\sigma_{\max}$) • Proof idea: use “median of means”
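A generic median-of-means estimator, as a sketch of the proof idea (the function and parameter names are mine):

```python
import random, statistics

def median_of_means(draw, num_groups, group_size):
    """Average within each group, then take the median of the group means;
    the median is far less sensitive to heavy-tailed outliers than one big mean."""
    means = [statistics.fmean(draw() for _ in range(group_size))
             for _ in range(num_groups)]
    return statistics.median(means)

# Heavy-tailed example: Pareto(3) has true mean 1.5 but occasional huge draws.
print(median_of_means(lambda: random.paretovariate(3.0), 9, 100))
```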
Predicting $m_i(n)$ [figure: empirical $m_i(1)$ and $m_i(2)$ with the fitted line extended to the predicted $m_i(n)$] • Equation for the line: $m_i(n) = m_i(1) + [m_i(2) - m_i(1)] \log_2 n$ • The extrapolation multiplies estimation error by a factor of O(log n), so $m_i(1)$ and $m_i(2)$ must be estimated to within ε/O(log n); estimating $m_i(n)$ to within ε therefore requires $O((\log n)^2\, \epsilon^{-2} \log \delta^{-1})$ pulls
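As a sanity check of the two-point extrapolation: a Gumbel(0, 1) arm has m(t) = γ + ln t exactly, and the formula reproduces it:

```python
import math

def predict_m(m1, m2, n):
    """Extrapolate m_i(n) from m_i(1) and m_i(2); exact when s = 0,
    since m_i(t) is then linear in log t."""
    return m1 + (m2 - m1) * math.log2(n)

gamma = 0.5772156649                  # Euler-Mascheroni constant
m1, m2 = gamma, gamma + math.log(2)   # exact m(1), m(2) for Gumbel(0, 1)
print(predict_m(m1, m2, 1024))        # = gamma + ln(1024), about 7.51
```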
The Algorithm • Strategy S* (δ and ε to be determined): • For i from 1 to k: using D pulls, estimate $m_i(n)$; pick D so that with probability 1 − δ, the estimate is within ε of the true $m_i(n)$ • For the remaining n − kD pulls: pull the arm with the maximum predicted $m_i(n)$ • Guarantee: $m_{S^*}(n) = m^*(n) - o(1)$ • Three things make S* less than optimal: • with probability up to kδ, some estimate is bad • even a good estimate can be off by ε • the kD estimation pulls forfeit $m^*(n) - m^*(n - kD)$
Analysis • Three things make S* less than optimal: the kδ chance of a bad estimate, the ε estimation error, and $m^*(n) - m^*(n - kD)$ • Setting δ = n⁻² and ε = n⁻¹ᐟ³ takes care of the first two. For the third, since $m^*(t)$ grows like O(log t) when s = 0:
$m^*(n) - m^*(n - kD) = O(\log n - \log(n - kD)) = O(kD/n) = O(k (\log n)^2\, \epsilon^{-2} (\log \delta^{-1}) / n) = O(k (\log n)^3\, n^{-1/3}) = o(1)$
Summary & Future Work • Defined the max k-armed bandit problem and discussed applications to heuristic search • Presented an asymptotically optimal algorithm for GEV payoff distributions (we analyzed the special case s = 0) • Working on applications to scheduling problems
The Extremal Types Theorem • Define $M_n$ = the maximum of n independent draws, and suppose $\lim_{n \to \infty} \Pr[r_n(M_n) \le z] = G(z)$, where each $r_n$ is a linear “rescaling function”. Then G is either a point mass or a “generalized extreme value distribution”: $G(z) = \exp\!\left(-\left(1 + s\,\tfrac{z-\mu}{\sigma}\right)^{-1/s}\right)$ for constants s, μ, and σ > 0.
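An empirical illustration of the theorem (my example, not the talk's): for Exp(1) draws, the linear rescaling $r_n(M_n) = M_n - \ln n$ converges to a standard Gumbel, whose mean is γ ≈ 0.577 and whose variance is π²/6 ≈ 1.645:

```python
import math, random, statistics

n, trials = 2000, 1000
rescaled = [max(random.expovariate(1.0) for _ in range(n)) - math.log(n)
            for _ in range(trials)]
print(statistics.fmean(rescaled), statistics.pvariance(rescaled))
# ~0.58 and ~1.64, matching the Gumbel (s = 0) member of the GEV family
```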