This talk discusses the knowledge gradient policy for sequential off-line learning and presents theoretical and numerical results. It also introduces several sampling problems and compares different exploration and exploitation techniques.
Sequential Off-line Learning with Knowledge Gradients Peter Frazier Warren Powell Savas Dayanik Department of Operations Research and Financial Engineering Princeton University
Overview • Problem Formulation • Knowledge Gradient Policy • Theoretical Results • Numerical Results
Measurement Phase • [Figure: alternatives 1, 2, 3, ..., M, each producing noisy experimental outcomes] • N opportunities to do experiments
Implementation Phase • [Figure: choose one of alternatives 1, ..., M and collect its reward]
Taxonomy of Sampling Problems • [Figure: taxonomy tree of sampling problems]
Example 1 • One common experimental design is to spread measurements equally across the alternatives. • [Figure: quality vs. alternative]
Round Robin Exploration Example 1
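A minimal sketch, in Python, of what round-robin exploration looks like (my own code, not from the slides): measure the alternatives in a fixed cyclic order, ignoring everything learned so far.

def round_robin_policy(n, M):
    # Measure alternative n mod M at time n, cycling through all M alternatives.
    return n % M

# Example: with M = 5 alternatives and N = 12 measurements,
# each alternative is measured 2 or 3 times.
print([round_robin_policy(n, 5) for n in range(12)])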
Example 2 • How might we improve round-robin exploration for use with this prior?
Largest Variance Exploration Example 2
Example 3 • Exploitation: measure the alternative that currently appears best, i.e. the one with the largest estimated mean.
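A minimal sketch of this pure-exploitation rule (hypothetical helper, not from the slides): always measure the alternative with the largest current estimate, ignoring uncertainty.

import numpy as np

def exploitation_policy(mu):
    # mu[x] = current estimate (posterior mean) of the value of alternative x
    return int(np.argmax(mu))

print(exploitation_policy(np.array([-0.2, 1.3, 0.4])))  # -> 1, the index of the largest estimate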
Model • x^n is the alternative tested at time n. • Measuring alternative x^n returns the noisy observation ŷ^{n+1} = Y_{x^n} + ε^{n+1}. • At time n, Y_x ~ N(μ^n_x, (σ^n_x)²), independent across alternatives. • The error ε^{n+1} is independent N(0, (σ_ε)²). • At time N, choose an alternative. • Maximize E[max_x μ^N_x].
State Transition • At time n we measure alternative x^n and update our estimate of Y_{x^n} based on the measurement: μ^{n+1}_{x^n} = (β^n_{x^n} μ^n_{x^n} + β_ε ŷ^{n+1}) / (β^n_{x^n} + β_ε), where β = 1/σ² denotes precision. • Estimates of the other Y_x do not change.
• Let σ̃^n_x be the standard deviation of the change in our best estimate of Y_x due to the measurement: (σ̃^n_x)² = (σ^n_x)² − (σ^{n+1}_x)², i.e. the uncertainty about Y_x before the measurement minus the uncertainty about Y_x after the measurement. • The value of the optimal policy satisfies Bellman's equation, V^n(S^n) = max_x E[V^{n+1}(S^{n+1}) | x^n = x]. • At time n, μ^{n+1}_x is a normal random variable with mean μ^n_x and variance (σ̃^n_x)².
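A short sketch of the belief update and of σ̃^n_x under the normal model above; the function names and array layout are my own.

import numpy as np

def update_belief(mu, sigma2, x, y, noise_var):
    # Posterior update after observing y = Y_x + noise for the measured alternative x.
    # mu, sigma2: arrays of posterior means and variances at time n.
    mu, sigma2 = mu.copy(), sigma2.copy()
    beta_prior, beta_eps = 1.0 / sigma2[x], 1.0 / noise_var   # precisions
    beta_post = beta_prior + beta_eps
    mu[x] = (beta_prior * mu[x] + beta_eps * y) / beta_post   # precision-weighted average
    sigma2[x] = 1.0 / beta_post                               # uncertainty after the measurement
    return mu, sigma2

def sigma_tilde(sigma2_x, noise_var):
    # (sigma_tilde)^2 = variance before the measurement minus variance after it.
    return np.sqrt(sigma2_x - 1.0 / (1.0 / sigma2_x + 1.0 / noise_var))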
Utility of Information • Consider our "utility of information" U^n = max_x μ^n_x, and consider the random change in utility, U^{n+1} − U^n, due to a measurement at time n.
Knowledge Gradient Definition • The knowledge gradient policy chooses the measurement that maximizes this expected increase in utility: x^{KG,n} = argmax_x E[U^{n+1} − U^n | x^n = x].
Knowledge Gradient • We may compute the knowledge gradient policy via ν^{KG,n}_x = E[max(μ^{n+1}_x, max_{x'≠x} μ^n_{x'}) | x^n = x] − max_{x'} μ^n_{x'}; the first term is the expectation of the maximum of a normal and a constant.
Knowledge Gradient • The computation becomes ν^{KG,n}_x = σ̃^n_x f(ζ^n_x), where Φ is the normal cdf, φ is the normal pdf, f(z) = z Φ(z) + φ(z), and ζ^n_x = −|μ^n_x − max_{x'≠x} μ^n_{x'}| / σ̃^n_x.
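The formula above translates directly into code. A hedged sketch (names are mine; scipy is assumed for the normal cdf and pdf):

import numpy as np
from scipy.stats import norm

def knowledge_gradient_factors(mu, sigma2, noise_var):
    # nu[x] = sigma_tilde_x * f(zeta_x), with f(z) = z*Phi(z) + phi(z)
    s_tilde = np.sqrt(sigma2 - 1.0 / (1.0 / sigma2 + 1.0 / noise_var))
    nu = np.empty(len(mu))
    for x in range(len(mu)):
        best_other = np.max(np.delete(mu, x))                 # max over x' != x
        zeta = -abs(mu[x] - best_other) / s_tilde[x]
        nu[x] = s_tilde[x] * (zeta * norm.cdf(zeta) + norm.pdf(zeta))
    return nu

# The KG policy measures the alternative with the largest factor:
# x_kg = int(np.argmax(knowledge_gradient_factors(mu, sigma2, noise_var)))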
Optimality Results • If our measurement budget allows only one measurement (N=1), the knowledge gradient policy is optimal.
Optimality Results • The knowledge gradient policy is optimal in the limit as the measurement budget N grows to infinity. • This is really a convergence result: given enough measurements, the policy eventually identifies the best alternative.
Optimality Results • The sub-optimality of the knowledge gradient policy, V^n − V^{KG,n}, is bounded, where V^{KG,n} gives the value of the knowledge gradient policy and V^n the value of the optimal policy.
Optimality Results • If there are exactly 2 alternatives (M=2), the knowledge gradient policy is optimal. • In this case, the optimal policy reduces to measuring the alternative whose current belief has the larger variance.
Optimality Results • If there is no measurement noise and the alternatives can be reordered so that their prior means and variances satisfy a certain monotone ordering, then the knowledge gradient policy is optimal.
Numerical Experiments • 100 randomly generated problems • M ~ Uniform{1, ..., 100} • N ~ Uniform{M, 3M, 10M} • μ^0_x ~ Uniform[−1, 1] • (σ^0_x)² = 1 with probability 0.9, 10⁻³ with probability 0.1 • σ_ε = 1
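A small sketch of how one such random test problem might be drawn, using the distributions listed above (the function name and structure are my own):

import numpy as np

def random_problem(rng):
    M = int(rng.integers(1, 101))                          # M ~ Uniform{1, ..., 100}
    N = int(M * rng.choice([1, 3, 10]))                    # N ~ Uniform{M, 3M, 10M}
    mu0 = rng.uniform(-1.0, 1.0, size=M)                   # prior means mu^0_x
    sigma0_sq = np.where(rng.random(M) < 0.9, 1.0, 1e-3)   # prior variances (sigma^0_x)^2
    noise_std = 1.0                                        # measurement noise sigma_eps
    return M, N, mu0, sigma0_sq, noise_std

M, N, mu0, sigma0_sq, noise_std = random_problem(np.random.default_rng(0))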
Interval Estimation • Compare alternatives via a linear combination of mean and standard deviation, μ^n_x + z_{α/2} σ^n_x, and measure the one with the largest value. • The parameter z_{α/2} controls the tradeoff between exploration and exploitation.
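A one-function sketch of the IE rule above (my own naming; z_alpha plays the role of z_{α/2}):

import numpy as np

def interval_estimation_policy(mu, sigma2, z_alpha=2.0):
    # Score each alternative by mean + z_alpha standard deviations; measure the best score.
    return int(np.argmax(mu + z_alpha * np.sqrt(sigma2)))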
KG / IE Comparison • [Figure: Value of KG − Value of IE]
IE and "Sticking" • [Figure: example in which alternative 1 is known perfectly]
Thank You Any Questions?
Boltzmann Exploration • Parameterized by a declining sequence of temperatures T^0, ..., T^{N−1}.
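A hedged sketch of Boltzmann (softmax) exploration in its standard form, which the slide does not spell out: at time n, measure alternative x with probability proportional to exp(μ^n_x / T^n).

import numpy as np

def boltzmann_policy(mu, T_n, rng):
    # Sample an alternative with probability proportional to exp(mu_x / T_n).
    logits = mu / T_n
    p = np.exp(logits - logits.max())   # subtract the max for numerical stability
    p /= p.sum()
    return int(rng.choice(len(mu), p=p))

# Example: a geometrically declining temperature schedule (my own choice):
# T = [1.0 * 0.9 ** n for n in range(N)]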