Ch 17. Optimal control theory and the linear Bellman equation HJ Kappen BTSM Seminar 12.07.19 (Thu) Summarized by Joon Shik Kim
Introduction • Optimising a sequence of actions to attain some future goal is the general topic of control theory. • In the example of a human throwing a spear to kill an animal, a sequence of actions can be assigned a cost consisting of two terms. • The first is a path cost that specifies the energy consumption needed to contract the muscles. • The second is an end cost that specifies whether the spear kills the animal, merely hurts it, or misses it. • The optimal control solution is a sequence of motor commands that results in killing the animal by throwing the spear with minimal physical effort.
Discrete Time Control (1/3) • Consider a discrete-time dynamical system x_{t+1} = x_t + f(t, x_t, u_t), t = 0, ..., T−1, where x_t is an n-dimensional vector describing the state of the system and u_t is an m-dimensional vector that specifies the control or action at time t. • A cost function assigns a cost to each sequence of controls: C(x_0, u_{0:T−1}) = φ(x_T) + Σ_{t=0}^{T−1} R(t, x_t, u_t), where R(t, x, u) is the cost associated with taking action u at time t in state x, and φ(x_T) is the cost associated with ending up in state x_T at time T.
Discrete Time Control (2/3) • The problem of optimal control is to find the sequence u_{0:T−1} that minimises C(x_0, u_{0:T−1}). • The optimal cost-to-go is defined as J(t, x_t) = min_{u_{t:T−1}} ( φ(x_T) + Σ_{s=t}^{T−1} R(s, x_s, u_s) ), and it satisfies the Bellman recursion J(t, x) = min_u ( R(t, x, u) + J(t+1, x + f(t, x, u)) ), with J(T, x) = φ(x).
Discrete Time Control (3/3) • The algorithm to compute the optimal control, trajectory, and cost is given by • 1. Initialization: J(T, x) = φ(x). • 2. Backwards: for t = T−1, ..., 0 and for all x compute u*_t(x) = argmin_u ( R(t, x, u) + J(t+1, x + f(t, x, u)) ) and J(t, x) = R(t, x, u*_t(x)) + J(t+1, x + f(t, x, u*_t(x))). • 3. Forwards: for t = 0, ..., T−1 compute x_{t+1} = x_t + f(t, x_t, u*_t(x_t)).
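The backward/forward recursion on this slide can be illustrated with a small, self-contained example. All choices here (integer state grid, controls u ∈ {−1, 0, +1}, path cost u², end cost x²) are hypothetical, picked only to make the three steps concrete:

```python
import numpy as np

# Hypothetical 1-D problem: dynamics x_{t+1} = x_t + u, path cost R = u^2,
# end cost phi(x) = x^2. A sketch of the backward/forward recursion above.
T = 10
states = np.arange(-10, 11)          # grid of integer states
controls = np.array([-1, 0, 1])

# 1. Initialization: J(T, x) = phi(x)
J = {T: {x: float(x**2) for x in states}}
policy = {}

# 2. Backwards: J(t, x) = min_u [ R(t, x, u) + J(t+1, x + u) ]
for t in range(T - 1, -1, -1):
    J[t], policy[t] = {}, {}
    for x in states:
        costs = [u**2 + J[t + 1].get(x + u, np.inf) for u in controls]
        best = int(np.argmin(costs))
        J[t][x] = costs[best]
        policy[t][x] = int(controls[best])

# 3. Forwards: follow the optimal controls from x_0 = 5
x = 5
trajectory = [x]
for t in range(T):
    x = x + policy[t][x]
    trajectory.append(x)
print(trajectory, J[0][5])
```

Starting from x_0 = 5, the optimal trade-off between control cost and end cost drives the state to the origin with total cost J(0, 5) = 5.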
The HJB Equation (1/2) • In the continuous-time limit the Bellman recursion becomes the Hamilton-Jacobi-Bellman (HJB) equation: −∂_t J(t, x) = min_u ( R(t, x, u) + f(x, u, t)ᵀ ∂_x J(t, x) ). • The optimal control at the current x, t is given by u(x, t) = argmin_u ( R(t, x, u) + f(x, u, t)ᵀ ∂_x J(t, x) ). • The boundary condition is J(x, T) = φ(x).
The HJB Equation (2/2) • Example: optimal control of a mass on a spring.
Stochastic Differential Equations (1/2) • Consider the random walk on the line, x_{t+1} = x_t + ξ_t, with x_0 = 0 and independent zero-mean increments with ⟨ξ_t²⟩ = ν. • In closed form, x_t = Σ_{s=1}^{t} ξ_s, so that ⟨x_t⟩ = 0 and ⟨x_t²⟩ = νt: the variance grows linearly with time. • In the continuous-time limit we define dx = dξ, with ⟨dξ⟩ = 0 and ⟨dξ²⟩ = ν dt. • The conditional probability distribution is Gaussian (Wiener process): ρ(y, s | x, t) = (2πν(s−t))^{−1/2} exp( −(y−x)² / (2ν(s−t)) ).
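The linear growth of the variance can be checked numerically. This sketch simulates many sample paths of the random walk above (all parameter values are illustrative) and compares the empirical variance at time T with the Wiener-process prediction ⟨x(T)²⟩ = νT:

```python
import numpy as np

# Simulate the random walk x_{t+1} = x_t + xi_t with <xi^2> = nu*dt and check
# that the variance grows linearly in time, as the Wiener process predicts.
rng = np.random.default_rng(0)
nu, dt, T, n_paths = 2.0, 0.01, 1.0, 20000
n_steps = int(T / dt)

# Gaussian increments dxi with variance nu*dt
dxi = rng.normal(0.0, np.sqrt(nu * dt), size=(n_paths, n_steps))
x = np.cumsum(dxi, axis=1)          # x(t), starting from x_0 = 0

var_end = x[:, -1].var()            # empirical variance at t = T
print(var_end)                      # should be close to nu * T = 2.0
```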
Stochastic Optimal Control Theory (2/2) • Consider the stochastic dynamics dx = f(x, u, t)dt + dξ, where dξ is a Wiener process with ⟨dξ²⟩ = ν dt; f is the drift (the deterministic dynamics) and ν the diffusion (the noise variance). • Since ⟨dx²⟩ is of order dt, we must make a Taylor expansion of the cost-to-go up to order dx². • Averaging over the noise yields the stochastic Hamilton-Jacobi-Bellman equation −∂_t J = min_u ( R(t, x, u) + f(x, u, t) ∂_x J + (ν/2) ∂_x² J ).
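The origin of the second-order term can be made explicit with the Taylor expansion mentioned on this slide. A one-dimensional sketch, using the slide's notation (f drift, ν diffusion):

```latex
\begin{aligned}
J(t,x) &= \min_u \Big( R(t,x,u)\,dt + \big\langle J(t+dt,\; x+dx) \big\rangle \Big),\\
\big\langle J(t+dt,\; x+dx) \big\rangle
  &= J(t,x) + \partial_t J\, dt + \partial_x J\,\langle dx\rangle
   + \tfrac{1}{2}\,\partial_x^2 J\,\langle dx^2\rangle + \dots,\\
\langle dx \rangle &= f(x,u,t)\,dt, \qquad \langle dx^2 \rangle = \nu\, dt,\\
\Rightarrow\quad -\partial_t J &= \min_u \Big( R(t,x,u) + f(x,u,t)\,\partial_x J
   + \tfrac{\nu}{2}\,\partial_x^2 J \Big).
\end{aligned}
```

The first-order term ∂_x J ⟨dx⟩ produces the drift contribution, while the ⟨dx²⟩ = ν dt term survives at order dt and produces the diffusion contribution (ν/2) ∂_x² J.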
Path Integral Control (1/2) • In the problem of linear control (dx = (f(x, t) + u)dt + dξ) and quadratic control cost (R(t, x, u) = V(x, t) + u²/(2c)), the nonlinear HJB equation can be transformed into a linear equation by a log transformation of the cost-to-go, J(x, t) = −λ log ψ(x, t). The HJB becomes −∂_t ψ = ( −V(x, t)/λ + f(x, t) ∂_x + (ν/2) ∂_x² ) ψ, with ψ(x, T) = exp(−φ(x)/λ).
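The log transformation can be sketched as follows. This is the standard one-dimensional path-integral-control derivation (dynamics dx = (f + u)dt + dξ, cost R = V(x, t) + u²/(2c), noise ⟨dξ²⟩ = ν dt), so the notation may differ slightly from the chapter's:

```latex
\begin{aligned}
&\text{Minimising over } u \text{ in the stochastic HJB gives } u = -c\,\partial_x J,\ \text{hence}\\
&-\partial_t J = V - \tfrac{c}{2}\,(\partial_x J)^2 + f\,\partial_x J + \tfrac{\nu}{2}\,\partial_x^2 J.\\[4pt]
&\text{Substituting } J = -\lambda \log\psi \text{ with } \lambda = \nu/c
  \text{ cancels the quadratic term } (\partial_x J)^2,\ \text{leaving}\\
&-\partial_t \psi = \Big( -\tfrac{V(x,t)}{\lambda} + f(x,t)\,\partial_x
  + \tfrac{\nu}{2}\,\partial_x^2 \Big)\,\psi,
\qquad \psi(x,T) = e^{-\phi(x)/\lambda}.
\end{aligned}
```

The cancellation of the nonlinear term is exactly what requires the relation λ = ν/c between the noise level and the control cost: cheap control must come with proportionally large noise.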
Path Integral Control (2/2) • Let ρ(y, τ | x, t) describe a diffusion process for τ > t, defined by the Fokker-Planck equation ∂_τ ρ = −(V/λ) ρ − ∂_y (f ρ) + (ν/2) ∂_y² ρ. (1) • Then ψ(x, t) can be obtained by propagating ρ forward to the end time and weighting by the end cost: ψ(x, t) = ∫ dy ρ(y, T | x, t) exp(−φ(y)/λ).
The Diffusion Process as a Path Integral (1/2) • Look at the first term in equation (1) on the previous slide: it describes a process that kills a sample trajectory at a rate V(x, t)dt/λ. • This yields a sampling (Monte Carlo) procedure: with probability 1 − V(x, t)dt/λ the trajectory diffuses one step further; with probability V(x, t)dt/λ the path is killed.
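The sampling procedure above can be sketched in a few lines. The potential V, end cost φ, and all parameters here are hypothetical placeholders, and the drift f is set to zero for simplicity; ψ(x_0, 0) is estimated as the average of exp(−φ(x_T)/λ) over paths, with killed paths contributing zero:

```python
import numpy as np

# Monte Carlo sketch of the killing/diffusion sampler: each step, kill a path
# with probability V(x)dt/lambda, otherwise let it diffuse (drift f = 0 here).
rng = np.random.default_rng(1)
lam, nu, dt, T, n_paths = 1.0, 1.0, 0.01, 1.0, 5000
n_steps = int(T / dt)

V = lambda x: 0.5 * x**2        # hypothetical potential (state cost)
phi = lambda x: x**2            # hypothetical end cost

x = np.zeros(n_paths)           # all paths start at x_0 = 0
alive = np.ones(n_paths, dtype=bool)
for _ in range(n_steps):
    # kill with probability V(x)dt/lambda
    kill = rng.random(n_paths) < V(x) * dt / lam
    alive &= ~kill
    # diffuse: dx = dxi with <dxi^2> = nu*dt
    x = x + rng.normal(0.0, np.sqrt(nu * dt), n_paths)

contrib = np.where(alive, np.exp(-phi(x) / lam), 0.0)
psi_hat = contrib.mean()        # Monte Carlo estimate of psi(x_0, 0)
print(psi_hat)
```

Because killed paths contribute zero, regions of high cost V are automatically sampled less, which is what makes this estimator of ψ practical.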
The Diffusion Process as a Path Integral (2/2) • The cost-to-go can thus be written as a path integral, ψ(x, t) = ∫ [dx] exp(−S(path)/λ), which has the form of a partition function in statistical physics: ψ is a partition function, J = −λ log ψ is a free energy, S is the energy of a path, and λ is the temperature.
Discussion • One can extend the path integral control formalism to multiple agents that jointly solve a task. In this case the agents need to coordinate their actions not only through time but also among each other to maximise a common reward function. • The path integral method has great potential for applications in robotics.