
A Stochastic Pursuit-Evasion Game with no Information Sharing

A Stochastic Pursuit-Evasion Game with no Information Sharing. Ashitosh Swarup Jason Speyer Johnathan Wolfe School of Engineering and Applied Science UCLA. Introduction. The game considered here is the LQG stochastic pursuit-evasion game.


Presentation Transcript


  1. A Stochastic Pursuit-Evasion Game with no Information Sharing • Ashitosh Swarup, Jason Speyer, Johnathan Wolfe • School of Engineering and Applied Science, UCLA

  2. Introduction • The game considered here is the LQG stochastic pursuit-evasion game. • The deterministic version of this game was studied by Ho, Bryson and Baron. • The case in which each player processes his own noisy measurements was studied by Willman. • We continue investigating this class of games.

  3. Willman’s Approach • Attempted to find strategies in which each player’s control is an assumed linear function of his entire observation history. • Optimizing the cost function resulted in a set of implicit equations for the control gains. • No closed form solution shown for implicit equations; results were obtained numerically for up to 3 stages.

  4. Our Objective • Examine conditions under which closed form linear and/or nonlinear optimal solutions exist. • Willman sets up an LQG problem and states an optimality result without proof. We use dynamic programming to derive conditions for optimal controllers. • If possible, eliminate the need to smooth over each player’s entire observation sequence (dimensionality constraint).

  5. Problem Setup • System dynamics: $x(i+1) = x(i) + G_p u(i) - G_e v(i) + q(i)$ • Subscripts p and e refer to the pursuer and the evader respectively. • The pursuer’s and evader’s controls are u and v respectively. • q is zero-mean Gaussian white noise with covariance Q; x(0) is Gaussian with mean $x_0$ and covariance $P_0$. The statistics of q and x(0) are known a priori to both players.

  6. Problem Setup (contd.) • The players receive noisy measurements: $z_p(i) = H_p x(i) + w_p(i)$ and $z_e(i) = H_e x(i) + w_e(i)$ • Each player has no information about his opponent’s observations, but knows his opponent’s noise statistics. • $w_p$ is zero-mean Gaussian white noise with covariance $R_p$. • $w_e$ is zero-mean Gaussian white noise with covariance $R_e$. • Both players start with a common a priori estimate of the initial state x(0).
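The setup on slides 5 and 6 can be sketched as a small simulation. This is a minimal scalar sketch, not the authors' code: the function name `simulate`, the default unit values for the gains, and the callable strategies `pursuer` and `evader` are all assumptions made for illustration. Each strategy receives only that player's own measurement history, which is the "no information sharing" restriction.

```python
import random


def simulate(n, x0_mean, p0, q_var, rp, re, gp=1.0, ge=1.0, hp=1.0, he=1.0,
             pursuer=lambda zs: 0.0, evader=lambda zs: 0.0, seed=0):
    """Simulate n stages of a scalar version of the game (illustrative only).

    Returns (xs, zps, zes, us, vs): states x(0..n), both measurement
    histories, and both control sequences.
    """
    rng = random.Random(seed)
    x = rng.gauss(x0_mean, p0 ** 0.5)          # x(0) ~ N(x0_mean, p0)
    xs, zps, zes, us, vs = [x], [], [], [], []
    for _ in range(n):
        # Each player gets a private noisy measurement of the same state.
        zps.append(hp * x + rng.gauss(0.0, rp ** 0.5))
        zes.append(he * x + rng.gauss(0.0, re ** 0.5))
        # Controls depend only on the player's own observation history.
        u = pursuer(zps)
        v = evader(zes)
        # x(i+1) = x(i) + Gp u(i) - Ge v(i) + q(i)
        x = x + gp * u - ge * v + rng.gauss(0.0, q_var ** 0.5)
        us.append(u)
        vs.append(v)
        xs.append(x)
    return xs, zps, zes, us, vs
```

Passing, for example, `pursuer=lambda zs: -0.5 * zs[-1]` gives a pursuer that reacts only to his latest measurement; a strategy that smooths over the whole history would use all of `zs`.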

  7. Problem Setup (contd.) • Observation histories: $Z_p(i) = \{ z_p(j),\ j = 0, \dots, i \}$ and $Z_e(i) = \{ z_e(j),\ j = 0, \dots, i \}$ • Cost function: $J(u,v) = E\left[ [S_f x(n), x(n)] + \sum_{i=0}^{n-1} \left( [B u(i), u(i)] - [C v(i), v(i)] \right) \right]$ • The pursuer minimizes the cost function while the evader maximizes it.
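For scalar quantities, the expression inside the expectation in the cost function can be evaluated per trajectory and then averaged. The sketch below assumes scalar weights `sf`, `b` and `c` in place of the matrices Sf, B and C, so each bracket reduces to a weighted square; the function names are hypothetical.

```python
def realized_cost(x_final, us, vs, sf, b, c):
    """Cost of one sample path: Sf x(n)^2 + sum_i (B u(i)^2 - C v(i)^2)."""
    running = sum(b * u * u - c * v * v for u, v in zip(us, vs))
    return sf * x_final * x_final + running


def average_cost(paths, sf, b, c):
    """Monte Carlo estimate of J from a list of (x_final, us, vs) tuples."""
    costs = [realized_cost(x, us, vs, sf, b, c) for x, us, vs in paths]
    return sum(costs) / len(costs)
```

The pursuer's control effort is penalized (the `b` term raises the cost he wants to minimize), and the evader's effort is penalized through the minus sign on the `c` term, which lowers the cost he wants to maximize.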

  8. Saddle Point Condition • Finding optimal controls involves solving the following saddle-point inequality: $J(u, v^o) \ge J(u^o, v^o) \ge J(u^o, v)$ • Optimize person-by-person by solving the following inequalities: $J(u^o, v^o) \ge J(u^o, v)$ and $J(u, v^o) \ge J(u^o, v^o)$
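The saddle-point inequality can be checked numerically on a toy case. The sketch below uses a deterministic, scalar, one-stage cost with a known state, which is a simplification chosen for illustration and not the stochastic game of the slides; the function names and default parameter values are assumptions. The saddle point comes from setting both partial derivatives to zero, and it exists here only when the cost is concave in v, that is, when c > s·ge².

```python
def cost(u, v, x=1.0, s=1.0, b=1.0, c=2.0, gp=1.0, ge=1.0):
    """Deterministic scalar cost: s*(x + gp*u - ge*v)^2 + b*u^2 - c*v^2."""
    y = x + gp * u - ge * v
    return s * y * y + b * u * u - c * v * v


def saddle_point(x=1.0, s=1.0, b=1.0, c=2.0, gp=1.0, ge=1.0):
    """Solve the two first-order conditions for (u_o, v_o)."""
    if b <= 0 or c <= s * ge * ge:
        raise ValueError("need b > 0 and c > s*ge^2 for a saddle point")
    # With y = x + gp*u - ge*v, stationarity gives u = -s*gp*y/b, v = -s*ge*y/c.
    y = x / (1.0 + s * gp * gp / b - s * ge * ge / c)
    return -s * gp * y / b, -s * ge * y / c


def is_saddle(uo, vo, deltas=(-1.0, -0.1, 0.1, 1.0), tol=1e-12):
    """Check J(u, vo) >= J(uo, vo) >= J(uo, v) on a few unilateral deviations."""
    j0 = cost(uo, vo)
    return all(cost(uo + d, vo) >= j0 - tol and j0 >= cost(uo, vo + d) - tol
               for d in deltas)
```

Checking each player's deviation separately, with the opponent held at his saddle-point control, mirrors the person-by-person optimization described on the slide.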

  9. The One-Stage Game • Cost function: $J(u,v) = E\left[ [S_f x(1), x(1)] + [B u(0), u(0)] - [C v(0), v(0)] \right]$ • Optimize to get expressions for $u^o(0)$ and $v^o(0)$. • Assume a linear functional form of the controls: $u^o(0) = \alpha_u + \beta_u x_0 + \gamma_u z_p(0)$ and $v^o(0) = \alpha_v + \beta_v x_0 + \gamma_v z_e(0)$ • Solving for the coefficients using the equations derived previously gives $\alpha_u = \alpha_v = 0$ and nonzero values for the other matrix gains. • An assumed nonlinear form of the optimal controls degenerates into the above linear controllers.
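For the scalar one-stage game, the person-by-person conditions can be solved in closed form. The sketch below is my own derivation under stated assumptions, not the authors' equations: all quantities are scalar, each control is a gain on the prior mean x0 plus a gain on the player's own measurement (constant terms set to zero, as the slide reports), and each player's conditional mean of x(0) is the usual Gaussian update with gain `kp` or `ke`. The names `one_stage_gains`, `beta_u`, `gamma_u`, `beta_v` and `gamma_v` are hypothetical.

```python
def one_stage_gains(s, b, c, p0, rp, re, gp=1.0, ge=1.0, hp=1.0, he=1.0):
    """Return (beta_u, gamma_u, beta_v, gamma_v) for the scalar one-stage game.

    Assumed strategies: u = beta_u*x0 + gamma_u*zp, v = beta_v*x0 + gamma_v*ze.
    """
    if b + s * gp * gp <= 0 or c - s * ge * ge <= 0:
        raise ValueError("need convexity in u and concavity in v")
    # Gains of each player's conditional-mean estimate of x(0).
    kp = p0 * hp / (hp * hp * p0 + rp)
    ke = p0 * he / (he * he * p0 + re)
    ap = s * gp / (b + s * gp * gp)
    ae = s * ge / (c - s * ge * ge)
    # Measurement gains: two coupled linear equations
    #   gamma_u = -ap*kp*(1 - ge*he*gamma_v)
    #   gamma_v = -ae*ke*(1 + gp*hp*gamma_u)
    m = ap * kp * ae * ke * gp * ge * hp * he
    gamma_u = -ap * kp * (1.0 + ae * ke * ge * he) / (1.0 + m)
    gamma_v = -ae * ke * (1.0 + gp * hp * gamma_u)
    # Prior-mean gains: a second pair of coupled linear equations
    #   beta_u = -ap*(cu - ge*beta_v),  beta_v = -ae*(cv + gp*beta_u)
    cu = (1.0 - ge * he * gamma_v) * (1.0 - kp * hp)
    cv = (1.0 + gp * hp * gamma_u) * (1.0 - ke * he)
    beta_u = -ap * (cu + ae * ge * cv) / (1.0 + ap * ae * gp * ge)
    beta_v = -ae * (cv + gp * beta_u)
    return beta_u, gamma_u, beta_v, gamma_v
```

In this sketch the measurement gains decouple from the prior-mean gains and are solved first. With noise-free measurements (`rp = re = 0`) the prior-mean gains vanish and the measurement gains reduce to the deterministic saddle-point gains.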

  10. The Two Stage Game • The cost function in this case is $J_1(u,v) = E\left[ [S_f x(2), x(2)] + \sum_{i=0}^{1} \left( [B_i u(i), u(i)] - [C_i v(i), v(i)] \right) \right]$ • Assume a linear form of the controls: $u^o(0) = k_0 + k_{00} x_0 + k_{00} z_p(0)$; $v^o(0) = l_0 + l_{00} x_0 + l_{00} z_e(0)$; $u^o(1) = k_1 + k_{01} x_0 + k_{10} z_p(0) + k_{11} z_p(1)$; $v^o(1) = l_1 + l_{01} x_0 + l_{10} z_e(0) + l_{11} z_e(1)$ • Optimize the cost function using dynamic programming to get expressions for $u^o(0)$, $v^o(0)$, $u^o(1)$ and $v^o(1)$. • Use the expressions derived for the optimal controls to get 14 equations for the 14 unknown control-coefficient matrices.

  11. The Two Stage Problem: Analytical Constraint • Solving the equations for the control gains involves inverting a matrix with unknown elements. • This results in polynomial equations in the unknowns. • Consider the scalar case first to extract properties of the system.

  12. The Two Stage Game: Properties of the Scalar Equations • $k_{00}$, $l_{00}$, $k_{01}$, $l_{01}$, $k_{11}$ and $l_{11}$ are mutually dependent and do not depend on the other variables. • This reduces the number of equations we have to solve simultaneously from 14 to 6. • The other variables $k_0$, $l_0$, $k_{00}$, $l_{00}$, $k_1$, $l_1$, $k_{01}$ and $l_{01}$ depend on the above 6 variables, and can be solved for after solving the above 6 equations.

  13. The Two Stage Game: Solving the Scalar Equations • $k_{00}$ and $l_{00}$ can be eliminated by solving: $k_{00} = p\,(k_{p1} + k_{p2} l_{00})$ and $l_{00} = e\,(k_{e1} + k_{e2} k_{00})$ • p, e, $k_{p1}$, $k_{p2}$, $k_{e1}$, $k_{e2}$ and $l_{e2}$ are functions of $k_{01}$, $l_{01}$, $k_{11}$ and $l_{11}$. • We thus need to solve 4 equations for the 4 variables from the final stage.
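The elimination step can be written out explicitly. In the scalar case the two equations for k00 and l00 are linear in those two unknowns once p, e and the four coefficients are treated as given, so substituting one into the other yields the expressions below. This is a worked sketch that assumes the denominator is nonzero; the symbols follow the slide.

```latex
\begin{aligned}
k_{00} &= p\,(k_{p1} + k_{p2}\, l_{00}), \qquad
l_{00} = e\,(k_{e1} + k_{e2}\, k_{00}) \\[4pt]
k_{00} &= p\,k_{p1} + p\,e\,k_{p2}\,(k_{e1} + k_{e2}\, k_{00})
\;\Longrightarrow\;
k_{00} = \frac{p\,(k_{p1} + e\,k_{p2}\,k_{e1})}{1 - p\,e\,k_{p2}\,k_{e2}} \\[4pt]
l_{00} &= e\,k_{e1} + p\,e\,k_{e2}\,(k_{p1} + k_{p2}\, l_{00})
\;\Longrightarrow\;
l_{00} = \frac{e\,(k_{e1} + p\,k_{e2}\,k_{p1})}{1 - p\,e\,k_{p2}\,k_{e2}}
\end{aligned}
```

Because every quantity on the right-hand sides depends only on the final-stage gains, both first-stage gains are determined once the four final-stage equations are solved.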

  14. The Two Stage Game: Solving the Scalar Equations (contd.) • As we go on to the final stage, we encounter polynomial equations of the form: $k_{01} = f_p(l_{01}, k_{11}, l_{11})$ and $l_{01} = f_e(k_{01}, l_{11}, k_{11})$ • Eliminate $k_{01}$ and $l_{01}$ from these equations and go on to solve the pair of equations for $k_{11}$ and $l_{11}$. • Back-substitute the values of $k_{11}$ and $l_{11}$ into the previous equations to solve for the remaining 4 variables. • We thus have a dynamic-programming-like approach for these 6 variables: solve for the variables from the final stage first, then for those from the earlier stage.

  15. Conclusion and Future Work • Even seemingly simple linear structures result in complex polynomial equations. • If analytical linear solutions exist in the scalar case, do nonlinear solutions exist? • Is it possible to find analytical closed form solutions for the vector case? • Can the need to smooth over the entire observation sequence be eliminated?
