Hidden Markov Model Multiarm Bandits: A Methodology for Beam Scheduling in Multitarget Tracking Authors: Vikram Krishnamurthy & Robin Evans Presented by Shihao Ji Duke University Machine Learning Group June 10, 2005
Outline • Motivation • Overview • Multiarmed Bandits • HMM Multiarmed Bandits • Experimental Results
Motivation • An electronically scanned array (ESA) has only one steerable beam. • The coordinates of each target evolve according to a finite-state Markov chain. • Question: which single target should the tracker choose to observe at each time instant in order to optimize a specified cost function?
Multiarmed Bandits • The Model: One has N parallel projects, indexed i = 1, 2, …, N, and at each instant of discrete time one can work on only a single project. Let the state of project i at time k be denoted $x^i_k$. If one works on project i at time k, one pays an immediate expected cost of $c(x^i_k, i)$. The state changes to $x^i_{k+1}$ by a Markov transition rule (which may depend on i, but not on k), while the states of the projects one has not touched remain unchanged: $x^j_{k+1} = x^j_k$ for $j \neq i$. The problem is how to allocate one's effort over the projects sequentially in time so as to minimize the expected total discounted cost.
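To make the model concrete, here is a minimal simulation sketch of the bandit dynamics described above. The number of projects, the transition matrices, the costs, and the random engagement rule are all illustrative placeholders, not from the paper.

```python
# Minimal sketch of the classical (fully observed) multiarmed bandit dynamics,
# assuming N projects, each a finite-state Markov chain with transition matrix
# P[i] and per-state cost c[i]. All values below are toy placeholders.
import numpy as np

rng = np.random.default_rng(0)
N, S = 3, 4                                         # projects, states per project
P = [np.full((S, S), 1.0 / S) for _ in range(N)]    # placeholder transition matrices
c = [rng.uniform(0.0, 1.0, S) for _ in range(N)]    # placeholder per-state costs
beta = 0.9                                          # discount factor

x = [rng.integers(S) for _ in range(N)]             # current state of each project
total_cost = 0.0
for k in range(50):
    i = rng.integers(N)                             # project worked on (here: random)
    total_cost += (beta ** k) * c[i][x[i]]          # pay the engaged project's cost
    # only the engaged project moves; all other projects stay frozen
    x[i] = rng.choice(S, p=P[i][x[i]])

print("discounted cost of this sample path:", total_cost)
```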
Gittins Index • The simplest non-trivial problem, and a classic one. • It had no essential solution until the work of Gittins and his co-workers. • They proved that to each project i one can attach an index $\gamma^i(x^i_k)$ such that the optimal action at time k is to work on the project whose current index is smallest. The index is calculated by solving the problem of allocating one's effort optimally between project i and a standard project that yields a constant cost. • Gittins' result thus reduces the case of general N to the case N = 2.
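The "project i versus a standard project with constant cost" characterization can be illustrated directly: for a single fully observed project, the Gittins index of a state is the constant per-period cost lambda at which one is indifferent, in that state, between engaging the project and retiring to the standard project. The sketch below (function name, toy numbers, and tolerances are assumptions, not from the paper) finds that lambda by value iteration plus bisection.

```python
# Hedged sketch of the "calibration" view of the Gittins index for one fully
# observed project with transition matrix P, cost vector c, discount beta.
import numpy as np

def gittins_index(P, c, s0, beta=0.9, tol=1e-8):
    S = len(c)

    def values(lam):
        # V(x) = min{ lam / (1 - beta),  c(x) + beta * sum_x' P[x, x'] V(x') }
        V = np.zeros(S)
        for _ in range(2000):
            V_new = np.minimum(lam / (1 - beta), c + beta * P @ V)
            if np.max(np.abs(V_new - V)) < tol:
                break
            V = V_new
        return V

    lo, hi = float(c.min()), float(c.max())   # the index lies between the extreme costs
    while hi - lo > 1e-6:
        lam = 0.5 * (lo + hi)
        V = values(lam)
        cont = c[s0] + beta * (P[s0] @ V)     # cost of engaging the project at s0
        if cont < lam / (1 - beta):           # engaging beats retiring: index is lower
            hi = lam
        else:
            lo = lam
    return 0.5 * (lo + hi)

# toy 3-state example (illustrative numbers only)
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
c = np.array([0.2, 0.5, 0.9])
print(gittins_index(P, c, s0=0))
```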
HMM Multiarmed Bandits • The "standard" multiarmed bandit problem involves a fully observed finite-state Markov chain and is simply an MDP with a rich structure. • In multitarget tracking, due to measurement noise at the sensor, the states are only partially observed. The multitarget tracking problem therefore needs to be formulated as a multiarmed bandit problem involving HMMs (with the HMM filter estimating the information state). • It can be solved by brute force as a POMDP, but that involves a much higher-dimensional (enormous) Markov chain. • The bandit assumption decouples the problem.
Bandit Assumption • The information state of the currently observed target p is updated by the HMM filter: $\pi^p_{k+1} = \dfrac{B^p(y^p_{k+1})\,(A^p)'\,\pi^p_k}{\mathbf{1}'\,B^p(y^p_{k+1})\,(A^p)'\,\pi^p_k}$, where $A^p$ is the transition matrix and $B^p(y)$ holds the observation likelihoods. • For the other P−1 unobserved targets, the information states are kept frozen: $\pi^q_{k+1} = \pi^q_k$ if target q is not observed.
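A minimal sketch of this update rule, assuming per-target transition matrices A[u], observation-likelihood matrices B[u] (entry [x, y] = P(observe y | state x)), and beliefs stored as probability vectors. The names are illustrative, not the paper's code.

```python
# Sketch of the bandit information-state update under the assumptions above.
import numpy as np

def hmm_filter(pi, A, B, y):
    """One HMM filter step for the target currently illuminated by the beam."""
    unnormalized = B[:, y] * (A.T @ pi)   # predict with A, correct with observation y
    return unnormalized / unnormalized.sum()

def update_beliefs(beliefs, u, A, B, y):
    """Bandit assumption: only target u is updated; all other beliefs stay frozen."""
    new_beliefs = list(beliefs)
    new_beliefs[u] = hmm_filter(beliefs[u], A[u], B[u], y)
    return new_beliefs
```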
Why Is It Valid? • Slow dynamics: slowly moving targets have an (approximate) bandit structure, i.e., transition matrices of the form $A^p = I + \epsilon^p Q^p$, where $\epsilon^p$ is small. • Decoupling approximation: without the bandit assumption, the optimal solution is intractable; the bandit model is perhaps the only reasonable approximation that leads to a computationally tractable solution. • Reinitialization: a compromise. Reinitialize the HMM multiarmed bandit at regular intervals with updated estimates from all targets.
Some Details • Finite-state Markov assumption: $x^p_k$ denotes the quantized distance of the pth target from the base station, and this distance evolves according to a finite-state Markov chain. • Cost structure: the cost typically depends on the distance of the pth target to the base station, i.e., targets close to the base station pose a greater threat and are given higher priority by the tracking algorithm. • Objective function: minimize the expected total discounted cost $J_\mu = E\big\{\sum_{k=0}^{\infty} \beta^k\, c(x^{u_k}_k, u_k)\big\}$ over scheduling policies, where $u_k$ is the target observed at time k and $0 \le \beta < 1$ is the discount factor.
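As a small illustration of the cost structure (shapes and numbers are assumptions, not from the paper): with the state quantized into distance bins, the instantaneous cost of observing target p is the expected per-bin cost under its information state.

```python
# Illustrative expected instantaneous cost for one target under its belief.
import numpy as np

c_p = np.array([1.0, 0.6, 0.3, 0.1])     # per-distance-bin cost encoding threat/priority
pi_p = np.array([0.1, 0.2, 0.3, 0.4])    # information state over the four distance bins
expected_cost = c_p @ pi_p               # c(pi_p, p) = c_p' pi_p
print(expected_cost)
```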
Optimal Solution • Under the bandit assumption, the optimal solution has an indexable (decoupled) rule; that is, the optimization can be decoupled into P independent optimization problems. • For each target p, there is a function (the Gittins index) $\gamma^p(\pi^p)$, computed by POMDP algorithms; see the next slides. • The optimal scheduling policy at time k is to steer the beam toward the target with the smallest Gittins index: $u_k = \arg\min_p \gamma^p(\pi^p_k)$.
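Once the per-target Gittins index functions have been computed offline, the online scheduling rule is just an argmin over targets. A sketch follows; gittins and beliefs are assumed containers, not the paper's interface.

```python
# Sketch of the decoupled scheduling rule, assuming gittins[p] maps the
# information state of target p to its Gittins index (computed offline).
import numpy as np

def schedule_beam(beliefs, gittins):
    """Steer the beam toward the target with the smallest Gittins index."""
    indices = [gittins[p](pi) for p, pi in enumerate(beliefs)]
    return int(np.argmin(indices))
```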
Gittins Index • For an arbitrary multiarmed bandit problem, the Gittins index can be calculated by solving an associated infinite-horizon discounted control problem called the "return-to-state" problem. • For target p, given the information state $\pi^p_k$ at time k, there are two actions: 1) Continue, which incurs the cost $c(\pi^p_k, p)$ and lets the information state evolve according to the HMM filter; 2) Restart, which moves the information state to a fixed state $\pi^0$, incurs the cost $c(\pi^0, p)$, and then evolves according to the HMM filter.
The Gittins index of information state $\pi$ of target p is obtained from the value $V^p(\pi, \pi)$ of the return-to-state problem, where $V^p(\cdot, \pi^0)$ satisfies the Bellman equation: $V^p(\pi, \pi^0) = \min\big\{\, c(\pi, p) + \beta \sum_y \sigma(\pi, y)\, V^p(T(\pi, y), \pi^0),\;\; c(\pi^0, p) + \beta \sum_y \sigma(\pi^0, y)\, V^p(T(\pi^0, y), \pi^0) \,\big\}$, with $T(\pi, y)$ the HMM filter update and $\sigma(\pi, y)$ the probability of observing y given information state $\pi$.
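As a rough illustration of this Bellman equation (not the paper's algorithm), the sketch below approximates the Gittins index of a two-state target by value iteration on a coarse belief grid; exact POMDP methods, such as those on the next slide, would replace the grid with alpha-vector representations. All matrices, grid sizes, and tolerances are toy assumptions.

```python
# Hedged sketch: approximate return-to-state value iteration on a belief grid.
import numpy as np

def hmm_step(pi, A, B, y):
    """One HMM filter update; returns the new belief and P(y | pi)."""
    un = B[:, y] * (A.T @ pi)      # assumes strictly positive likelihoods
    return un / un.sum(), un.sum()

def gittins_index_grid(pi0, A, B, c, beta=0.9, n_grid=51, n_iter=200):
    """Approximate Gittins index (up to scaling) of belief pi0 for a 2-state target."""
    _, M = B.shape
    # crude belief grid for a 2-state chain; higher dimensions need a simplex grid
    grid = np.array([[q, 1.0 - q] for q in np.linspace(0.0, 1.0, n_grid)])

    def nearest(pi):
        return int(np.argmin(np.abs(grid[:, 0] - pi[0])))

    V = np.zeros(n_grid)
    for _ in range(n_iter):
        def cost_to_go(b):
            # "continue" from belief b: c'b + beta * E_y[ V(T(b, y)) ]
            total = c @ b
            for y in range(M):
                nb, py = hmm_step(b, A, B, y)
                total += beta * py * V[nearest(nb)]
            return total
        restart = cost_to_go(pi0)                        # value of the restart action
        V = np.array([min(cost_to_go(pi), restart) for pi in grid])
    return V[nearest(pi0)]                               # value at the restart state

# toy 2-state example (illustrative numbers only)
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])   # B[x, y] = P(observe y | state x)
c = np.array([0.2, 1.0])
print(gittins_index_grid(np.array([0.5, 0.5]), A, B, c))
```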
POMDP Solver • By defining new parameters (see Eq. 15 of the paper), the return-to-state problem can be written as a standard POMDP. • It can then be solved by any standard POMDP solver, such as Sondik's algorithm, the Witness algorithm, Incremental Pruning, or suboptimal (approximate) algorithms.