A Stochastic Pursuit-Evasion Game with no Information Sharing. Ashitosh Swarup, Jason Speyer, Johnathan Wolfe. School of Engineering and Applied Science, UCLA.
Introduction • The game considered here is the linear-quadratic-Gaussian (LQG) stochastic pursuit-evasion game. • The deterministic version of this game was studied by Ho, Bryson and Baron. • The case in which each player processes his own noisy measurements was studied by Willman. • We continue the investigation of this class of games.
Willman’s Approach • Willman sought strategies in which each player’s control is assumed to be a linear function of his entire observation history. • Optimizing the cost function resulted in a set of implicit equations for the control gains. • No closed-form solution of the implicit equations was given; results were obtained numerically for games of up to 3 stages.
Our Objective • Examine conditions under which closed form linear and/or nonlinear optimal solutions exist. • Willman sets up an LQG problem and states an optimality result without proof. We use dynamic programming to derive conditions for optimal controllers. • If possible, eliminate the need to smooth over each player’s entire observation sequence (dimensionality constraint).
Problem Setup • System dynamics are given by: $x(i+1) = x(i) + G_p u(i) - G_e v(i) + q(i)$ • Subscripts p and e refer to the pursuer and the evader respectively. • The pursuer’s and evader’s controls are u and v respectively. • q is zero-mean white Gaussian noise with covariance Q, written $(0, Q)$; x(0) is Gaussian with mean $x_0$ and covariance $P_0$, written $(x_0, P_0)$. The statistics of q and x(0) are known a priori to both players.
Problem Setup (contd.) • The players receive noisy measurements: $z_p(i) = H_p x(i) + w_p(i)$, $z_e(i) = H_e x(i) + w_e(i)$ • Each player has no information about his opponent’s observations, but knows his opponent’s noise statistics. • $w_p$ is Gaussian white, $(0, R_p)$. • $w_e$ is Gaussian white, $(0, R_e)$. • Both players start off with a common a priori estimate of the initial state x(0).
Problem Setup (contd.) • Observation histories: $Z_p(i) = \{ z_p(j),\ j = 0, \dots, i \}$, $Z_e(i) = \{ z_e(j),\ j = 0, \dots, i \}$ • Cost function: $J(u,v) = E\Big[ [S_f x(n), x(n)] + \sum_{i=0}^{n-1} \big( [B u(i), u(i)] - [C v(i), v(i)] \big) \Big]$ • The pursuer minimizes the cost function while the evader maximizes it.
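To make the setup concrete, the following is a minimal simulation sketch of the scalar version of the game, not code from the presentation. The function names `simulate_game` and `estimate_cost`, the `params` dictionary layout, and the policy signature `policy(i, history)` are all hypothetical choices made for illustration. Each policy is handed only its own observation history, which mirrors the no-information-sharing assumption.

```python
import random


def simulate_game(params, pursuer, evader, rng):
    """Play one n-stage scalar game; return (realized cost, Zp, Ze)."""
    x = rng.gauss(params["x0"], params["P0"] ** 0.5)   # draw x(0) ~ (x0, P0)
    zp_hist, ze_hist = [], []                          # private histories
    running = 0.0
    for i in range(params["n"]):
        # Each player receives only his own noisy measurement of x(i).
        zp_hist.append(params["Hp"] * x + rng.gauss(0.0, params["Rp"] ** 0.5))
        ze_hist.append(params["He"] * x + rng.gauss(0.0, params["Re"] ** 0.5))
        u = pursuer(i, zp_hist)    # pursuer control from Zp(i)
        v = evader(i, ze_hist)     # evader control from Ze(i)
        running += params["B"] * u * u - params["C"] * v * v
        # x(i+1) = x(i) + Gp u(i) - Ge v(i) + q(i)
        x = x + params["Gp"] * u - params["Ge"] * v \
            + rng.gauss(0.0, params["Q"] ** 0.5)
    return params["Sf"] * x * x + running, zp_hist, ze_hist


def estimate_cost(params, pursuer, evader, runs=2000, seed=0):
    """Sample-average estimate of J(u, v) for the given pair of policies."""
    rng = random.Random(seed)
    total = sum(simulate_game(params, pursuer, evader, rng)[0]
                for _ in range(runs))
    return total / runs
```

A policy can be as simple as `lambda i, hist: -0.5 * hist[-1]`; a policy that smooths over the whole history would use every entry of `hist`.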
Saddle Point Condition • Finding optimal controls involves solving the following saddle-point inequality: $J(u, v^o) \ge J(u^o, v^o) \ge J(u^o, v)$ • Optimize person-by-person by solving the following inequalities: $J(u^o, v^o) \ge J(u^o, v)$ and $J(u, v^o) \ge J(u^o, v^o)$
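As an illustration of the saddle-point inequality, the sketch below uses the simplest possible instance: a deterministic, scalar, one-stage game with perfect state information, which is an assumption made here and not the stochastic game of the slides. The function names are hypothetical. The pair returned by `one_stage_saddle` comes from setting both partial derivatives of the cost to zero; it is a saddle point only when the cost is convex in u and concave in v, which in this scalar case requires b > 0 and c > s·ge².

```python
def one_stage_cost(u, v, x0, s, gp, ge, b, c):
    """Deterministic scalar cost s*x(1)^2 + b*u^2 - c*v^2."""
    x1 = x0 + gp * u - ge * v
    return s * x1 * x1 + b * u * u - c * v * v


def one_stage_saddle(x0, s, gp, ge, b, c):
    """Stationary pair (u, v) of the deterministic one-stage game."""
    if b <= 0 or c <= s * ge * ge:
        raise ValueError("need b > 0 and c > s*ge**2 for a saddle point")
    # y is the terminal state x(1) at the stationary pair.
    y = x0 / (1.0 + s * gp * gp / b - s * ge * ge / c)
    u = -s * gp * y / b      # from dJ/du = 0
    v = -s * ge * y / c      # from dJ/dv = 0
    return u, v
```

Perturbing u away from the returned value should never decrease the cost, and perturbing v should never increase it, which is exactly the pair of person-by-person inequalities.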
The One-Stage Game • Cost function: $J(u,v) = E\big[ [S_f x(1), x(1)] + [B u(0), u(0)] - [C v(0), v(0)] \big]$ • Optimize to get expressions for $u^o(0)$ and $v^o(0)$. • Assume a linear functional form of the controls, with constant terms $a_u$, $a_v$ and matrix gains $b_u$, $c_u$, $b_v$, $c_v$: $u^o(0) = a_u + b_u x_0 + c_u z_p(0)$, $v^o(0) = a_v + b_v x_0 + c_v z_e(0)$ • Solving for the coefficients using the equations derived previously gives $a_u = a_v = 0$, and nonzero values for the other matrix gains. • An assumed nonlinear form of the optimal controls degenerates into the above linear controllers.
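The one-stage result can be checked numerically in the scalar case. The sketch below is an independent illustration under stated assumptions, not the derivation from the slides: all quantities are scalars, the a priori mean of x(0) is taken to be zero (so only the gain on each player's own measurement matters), and each control is restricted to the form gain × own measurement. Under those assumptions the expected cost is an explicit quadratic in the two gains, and setting its partial derivatives to zero gives a 2×2 linear system. The function names and the symbols `cu`, `cv` are hypothetical.

```python
def linear_measurement_gains(s, gp, ge, b, c, hp, he, p0, rp, re):
    """Gains (cu, cv) for u = cu*zp(0), v = cv*ze(0), zero prior mean."""
    if b <= 0 or c <= s * ge * ge:
        raise ValueError("need b > 0 and c > s*ge**2 for a saddle point")
    kp = p0 * hp / (hp * hp * p0 + rp)   # pursuer's estimator gain
    ke = p0 * he / (he * he * p0 + re)   # evader's estimator gain
    ap = s * gp / (b + s * gp * gp) * kp
    ae = s * ge / (c - s * ge * ge) * ke
    # Stationarity conditions: cu = -ap*(1 - ge*he*cv),
    #                          cv = -ae*(1 + gp*hp*cu).
    cu = -ap * (1.0 + ae * ge * he) / (1.0 + ap * ae * gp * hp * ge * he)
    cv = -ae * (1.0 + gp * hp * cu)
    return cu, cv


def expected_cost(cu, cv, s, gp, ge, b, c, hp, he, p0, rp, re, q):
    """Exact E[s*x(1)^2 + b*u^2 - c*v^2] for the linear strategies."""
    a = 1.0 + gp * hp * cu - ge * he * cv          # coefficient of x(0)
    miss = a * a * p0 + (gp * cu) ** 2 * rp + (ge * cv) ** 2 * re + q
    return (s * miss
            + b * cu * cu * (hp * hp * p0 + rp)
            - c * cv * cv * (he * he * p0 + re))
```

Because the expected cost is convex in `cu` and concave in `cv` under the stated condition, the returned pair satisfies the saddle-point inequalities within this linear class of strategies; it says nothing about nonlinear strategies.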
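The Two-Stage Game • The cost function in this case is $J_1(u,v) = E\Big[ [S_f x(2), x(2)] + \sum_{i=0}^{1} \big( [B_i u(i), u(i)] - [C_i v(i), v(i)] \big) \Big]$ • Assume a linear form of the controls, where barred gains multiply the a priori mean $x_0$ and unbarred double-index gains multiply the measurements: $u^o(0) = k_0 + \bar{k}_{00} x_0 + k_{00} z_p(0)$; $v^o(0) = l_0 + \bar{l}_{00} x_0 + l_{00} z_e(0)$; $u^o(1) = k_1 + \bar{k}_{01} x_0 + k_{01} z_p(0) + k_{11} z_p(1)$; $v^o(1) = l_1 + \bar{l}_{01} x_0 + l_{01} z_e(0) + l_{11} z_e(1)$ • Optimize the cost function using dynamic programming to get expressions for $u^o(0)$, $v^o(0)$, $u^o(1)$ and $v^o(1)$. • Use the expressions derived for the optimal controls to get 14 equations for the 14 unknown control-coefficient matrices.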
The Two-Stage Problem: Analytical Constraint • Solving the equations for the control gains involves inverting a matrix with unknown elements. • This results in polynomial equations in the unknowns. • Consider the scalar case first to extract properties of the system.
The Two-Stage Game: Properties of the Scalar Equations • The six gains on the measurements, $k_{00}$, $l_{00}$, $k_{01}$, $l_{01}$, $k_{11}$ and $l_{11}$, are mutually dependent and do not depend on the other variables. • This reduces the number of equations we have to solve simultaneously from 14 to 6. • The other eight variables, namely the constant terms $k_0$, $l_0$, $k_1$, $l_1$ and the gains on $x_0$ at stages 0 and 1, depend on the above 6 variables, and can be solved for after solving the above 6 equations.
The Two-Stage Game: Solving the Scalar Equations • $k_{00}$ and $l_{00}$ can be eliminated by solving the pair: $k_{00} = p\,(k_{p1} + k_{p2}\, l_{00})$, $l_{00} = e\,(k_{e1} + k_{e2}\, k_{00})$ • The coefficients $p$, $e$, $k_{p1}$, $k_{p2}$, $k_{e1}$ and $k_{e2}$ are functions of $k_{01}$, $l_{01}$, $k_{11}$ and $l_{11}$. • We thus need to solve 4 equations for the 4 variables from the final stage.
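Once the final-stage gains are fixed, the pair of first-stage equations above is linear in $k_{00}$ and $l_{00}$, so the elimination can be written in closed form. The helper below is a small sketch of that step only; the function name and argument names are hypothetical, and the six coefficients are treated as given numbers since their expressions in terms of the final-stage gains are not shown in the slides.

```python
def solve_first_stage_pair(p, e, kp1, kp2, ke1, ke2, tol=1e-12):
    """Solve k00 = p*(kp1 + kp2*l00), l00 = e*(ke1 + ke2*k00)."""
    det = 1.0 - p * kp2 * e * ke2
    if abs(det) < tol:
        raise ValueError("pair is singular: no unique (k00, l00)")
    # Substitute the second equation into the first and collect k00.
    k00 = p * (kp1 + kp2 * e * ke1) / det
    l00 = e * (ke1 + ke2 * k00)
    return k00, l00
```

The singular case `det == 0` is flagged instead of being solved, because the pair then has either no solution or infinitely many.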
The Two-Stage Game: Solving the Scalar Equations (contd.) • As we go on to the final stage, we encounter polynomial equations of the form: $k_{01} = f_p(l_{01}, k_{11}, l_{11})$, $l_{01} = f_e(k_{01}, l_{11}, k_{11})$ • Eliminate $k_{01}$ and $l_{01}$ from these equations and go on to solve the pair of equations for $k_{11}$ and $l_{11}$. • Back-substitute the values of $k_{11}$ and $l_{11}$ into the previous equations to solve for the remaining 4 variables. • We thus have a dynamic-programming-like approach for these 6 variables, i.e. solve for the variables from the final stage first and then for those from the first stage.
Conclusion and Future Work • Even seemingly simple linear structures result in complex polynomial equations. • If analytical linear solutions exist in the scalar case, do nonlinear solutions exist? • Is it possible to find analytical closed form solutions for the vector case? • Can the need to smooth over the entire observation sequence be eliminated?