
Hidden Markov Model Multiarm Bandits: A Methodology for Beam Scheduling in Multitarget Tracking






Presentation Transcript


  1. Hidden Markov Model Multiarm Bandits: A Methodology for Beam Scheduling in Multitarget Tracking. Authors: Vikram Krishnamurthy & Robin Evans. Presented by Shihao Ji, Duke University Machine Learning Group, June 10, 2005.

  2. Outline • Motivation • Overview • Multiarmed Bandits • HMM Multiarmed Bandits • Experimental Results

  3. Motivation
  • The electronically scanned array (ESA) has only one steerable beam.
  • The coordinates of each target evolve according to a finite-state Markov chain.
  • Question: which single target should the tracker choose to observe at each time instant in order to optimize a specified cost function?

  4. Overview: How It Works

  5. Multiarmed Bandits
  • The Model: One has N parallel projects, indexed i = 1, 2, …, N, and at each instant of discrete time can work on only a single project. Let the state of project i at time k be denoted x_i(k). If one works on project i at time k, then one pays an immediate expected cost c(x_i(k), i). The state changes to x_i(k+1) by a Markov transition rule (which may depend upon i, but not upon k), while the states of the projects one has not touched remain unchanged: x_j(k+1) = x_j(k) for j ≠ i. The problem is how to allocate one's effort over projects sequentially in time so as to minimize the expected total discounted cost.
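A minimal simulation sketch of these frozen-state dynamics (my own toy example, not from the slides): the transition matrices, costs, and the random placeholder policy below are all hypothetical, and only the project currently worked on changes state and incurs cost.

    import numpy as np

    rng = np.random.default_rng(0)

    N, S = 3, 4                                        # hypothetical: 3 projects, 4 states each
    beta = 0.9                                         # discount factor
    A = [np.full((S, S), 1.0 / S) for _ in range(N)]   # toy transition matrices P(x' | x)
    c = [rng.uniform(0.0, 1.0, S) for _ in range(N)]   # toy per-state costs c(x, i)

    x = [rng.integers(S) for _ in range(N)]            # initial project states
    total_cost = 0.0
    for k in range(100):
        i = rng.integers(N)                            # placeholder policy: pick a project at random
        total_cost += (beta ** k) * c[i][x[i]]         # pay the cost of the project worked on
        x[i] = rng.choice(S, p=A[i][x[i]])             # only project i makes a Markov transition
        # states x[j] for j != i remain unchanged (the "frozen" bandit dynamics)
    print(total_cost)

The interesting question, addressed on the next slide, is what to put in place of the random choice of project.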

  6. Gittins Index
  • The simplest non-trivial problem of this kind, and a classic one.
  • No essentially complete solution existed until the work of Gittins and his co-workers.
  • They proved that to each project i one can attach an index γ_i(x_i(k)) such that the optimal action at time k is to work on the project whose current index is smallest. The index is calculated by solving the problem of allocating one's effort optimally between project i and a standard project which yields a constant cost.
  • Gittins' result thus reduces the case of general N to the case N = 2.
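For reference, the standard stopping-time characterization of this index in the cost-minimization setting (the notation below is mine, chosen only to match the description above):

    \gamma_i(x) \;=\; \inf_{\tau > 0}\;
      \frac{\mathbb{E}\!\left[\sum_{k=0}^{\tau-1} \beta^{k}\, c\bigl(x_i(k), i\bigr) \,\middle|\, x_i(0)=x\right]}
           {\mathbb{E}\!\left[\sum_{k=0}^{\tau-1} \beta^{k} \,\middle|\, x_i(0)=x\right]}

Equivalently, γ_i(x) is the constant cost of the calibrating "standard project" at which one is indifferent between the two projects, and the optimal rule always works on the project whose current index is smallest.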

  7. HMM Multiarmed Bandits
  • The "standard" multiarmed bandit problem involves a fully observed finite-state Markov chain and is simply an MDP with a rich structure.
  • In multitarget tracking, due to measurement noise at the sensor, the states are only partially observable. Thus the multitarget tracking problem must be formulated as a multiarmed bandit involving HMMs, with an HMM filter used to estimate the information state (belief) of each target.
  • It could be solved by brute force as a single POMDP, but that involves a Markov chain of much higher (enormous) dimension.
  • The bandit assumption decouples the problem.

  8. Bandit Assumption
  • The information state (belief) of the currently observed target is updated by the HMM filter, i.e., the usual Bayesian prediction-correction recursion (a sketch follows below).
  • For the other P-1 unobserved targets, the information states are kept frozen: the belief of target q is left unchanged at any time it is not observed.
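A sketch of that update in code, under my own conventions (A[i, j] = P(x_{k+1} = j | x_k = i), B[j, y] = P(y | x = j)); neither the function names nor the matrix layout come from the paper.

    import numpy as np

    def hmm_filter_update(pi, y, A, B):
        """One HMM filter step: predict through A, then correct with the likelihood of observation y."""
        unnormalized = B[:, y] * (A.T @ pi)      # elementwise likelihood times predicted belief
        return unnormalized / unnormalized.sum()

    def bandit_step(beliefs, observed, y, A, B):
        """Bandit-assumption step over P targets: only the observed target's belief changes."""
        new_beliefs = list(beliefs)              # beliefs of unobserved targets stay frozen
        new_beliefs[observed] = hmm_filter_update(beliefs[observed], y, A[observed], B[observed])
        return new_beliefs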

  9. Why Is the Bandit Assumption Valid?
  • Slow dynamics: slowly moving targets (transition matrices close to the identity) have an approximate bandit structure, since the belief of an unobserved target changes little between looks.
  • Decoupling approximation: without the bandit assumption, the optimal solution is intractable. The bandit model is perhaps the only reasonable approximation that leads to a computationally tractable solution.
  • Reinitialization: a compromise. Reinitialize the HMM multiarmed bandit at regular intervals with updated estimates from all targets.

  10. Some Details
  • Finite-state Markov assumption: the state of the pth target denotes its quantized distance from the base station, and this distance evolves according to a finite-state Markov chain.
  • Cost structure: the cost typically depends on the distance of the pth target to the base station; targets that get close to the base station pose a greater threat and are given higher priority by the tracking algorithm.
  • Objective function: minimize the expected total discounted cost of the scheduling policy (a standard form is sketched below).
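A standard infinite-horizon discounted-cost form of this objective, consistent with the "expected total discounted cost" of slide 5 (the symbols u_k for the target scheduled at time k and x^{u_k}_k for that target's state are my own; the paper's exact notation may differ):

    J(u) \;=\; \mathbb{E}\left\{ \sum_{k=0}^{\infty} \beta^{k}\,
        c\!\left(x^{u_k}_{k},\, u_k\right) \right\},
    \qquad 0 \le \beta < 1

The minimization is over admissible scheduling policies u = (u_0, u_1, ...), each u_k choosing which single target the beam observes at time k.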

  11. Optimal Solution
  • Under the bandit assumption, the optimal solution has an indexable (decoupling) structure: the optimization decouples into P independent optimization problems, one per target.
  • For each target p there is a Gittins index function of its information state, which can be computed by POMDP algorithms (see the next slides).
  • The optimal scheduling policy at time k is to steer the beam toward the target with the smallest current Gittins index.
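A sketch of the resulting index rule, reusing hmm_filter_update from the slide-8 sketch and assuming a hypothetical per-target function gittins_index(belief, p) and a hypothetical sensor interface observe(p); none of these names appear in the paper.

    def schedule_beam(beliefs, gittins_index, observe, A, B, horizon):
        """Index rule: at each time, point the beam at the target with the smallest Gittins index."""
        for k in range(horizon):
            indices = [gittins_index(pi, p) for p, pi in enumerate(beliefs)]
            p_star = min(range(len(beliefs)), key=lambda p: indices[p])   # smallest index wins
            y = observe(p_star)                                           # measurement of the chosen target
            beliefs[p_star] = hmm_filter_update(beliefs[p_star], y, A[p_star], B[p_star])
            # all other beliefs stay frozen (bandit assumption)
        return beliefs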

  12. Gittins Index
  • For an arbitrary multiarmed bandit problem, the Gittins index can be calculated by solving an associated infinite-horizon discounted control problem called the "return-to-state" (restart) problem.
  • For target p, given its information state at time k, there are two actions: 1) Continue, which incurs the current expected cost and evolves the information state according to the HMM filter; 2) Restart, which moves the information state to a fixed restart state, incurs the corresponding cost, and then evolves according to the HMM filter.

  13. The Gittins index of the information state of target p is obtained from the value function of this return-to-state problem, where the value function satisfies the Bellman equation for the two actions (continue and restart); a sketch of the standard form is given below.
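A hedged sketch of that Bellman equation in the standard restart-problem form, in my own notation (π is target p's information state, π^0 the fixed restart state, T(π, y) the HMM filter update, σ(π, y) its normalizing constant, C(π, p) the expected instantaneous cost); the paper's own equation may differ in details such as normalization.

    V\bigl(\pi, \pi^{0}\bigr) \;=\; \min\Biggl\{
      \underbrace{C(\pi, p) + \beta \sum_{y} \sigma(\pi, y)\, V\bigl(T(\pi, y), \pi^{0}\bigr)}_{\text{continue}}\,,\;
      \underbrace{C\bigl(\pi^{0}, p\bigr) + \beta \sum_{y} \sigma\bigl(\pi^{0}, y\bigr)\, V\bigl(T(\pi^{0}, y), \pi^{0}\bigr)}_{\text{restart}}
    \Biggr\}

Under this convention, the Gittins index of information state π is read off as γ(π) = V(π, π), the value of the restart problem whose restart state is π itself.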

  14. POMDP Solver
  • Defining new parameters (see eq. 15), the return-to-state problem can be cast as a standard POMDP.
  • It can then be solved by any standard POMDP solver, such as Sondik's algorithm, the Witness algorithm, Incremental Pruning, or suboptimal (approximate) algorithms.
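As one deliberately crude illustration of the approximate route (not the algorithm used in the paper), the sketch below estimates the Gittins index of a single two-state target by value iteration for the restart problem on a discretized belief grid; every matrix, cost, and grid size is a made-up toy value.

    import numpy as np

    A = np.array([[0.9, 0.1], [0.2, 0.8]])    # toy transition matrix P(x' | x)
    B = np.array([[0.8, 0.2], [0.3, 0.7]])    # toy observation likelihoods P(y | x)
    c = np.array([1.0, 0.2])                  # toy costs: state 0 ("near") is costlier
    beta = 0.9

    grid = np.linspace(0.0, 1.0, 201)         # belief grid over P(x = 0)

    def filter_step(p0, y):
        """HMM filter on the 1-D belief p0 = P(x = 0); returns (updated p0, P(y | belief))."""
        pi = np.array([p0, 1.0 - p0])
        unnorm = B[:, y] * (A.T @ pi)
        sigma = unnorm.sum()
        return unnorm[0] / sigma, sigma

    def gittins_index(p0_restart, iters=300):
        """Approximate Gittins index of belief p0_restart via value iteration for the restart problem."""
        V = np.zeros_like(grid)
        for _ in range(iters):
            def future(p0):
                total = 0.0
                for y in (0, 1):
                    p_next, sigma = filter_step(p0, y)
                    total += sigma * np.interp(p_next, grid, V)   # linear interpolation on the grid
                return total
            cost = c[0] * grid + c[1] * (1.0 - grid)              # expected instantaneous cost C(pi)
            cont = cost + beta * np.array([future(p0) for p0 in grid])
            restart_cost = c[0] * p0_restart + c[1] * (1.0 - p0_restart)
            rest = restart_cost + beta * future(p0_restart)       # restart branch is the same everywhere
            V = np.minimum(cont, rest)
        return np.interp(p0_restart, grid, V)                     # index ~ V(pi, pi) under this convention

    print(gittins_index(0.5))

Exact solvers such as Incremental Pruning instead exploit the piecewise-linear, concave structure of the POMDP value function; the grid approximation here is only meant to make the restart construction concrete.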

  15. Experimental Results
