Presentation Transcript


  1. ADP for Feedback Control. F.L. Lewis, Moncrief-O’Donnell Endowed Chair, Head, Controls & Sensors Group, Automation & Robotics Research Institute (ARRI), The University of Texas at Arlington. Supported by: NSF (Paul Werbos) and ARO (Jim Overholt). Talk available online at http://ARRI.uta.edu/acs

  2. 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning. David Fogel, General Chair; Derong Liu, Program Chair; Remi Munos, Program Co-Chair; Jennie Si, Program Co-Chair; Donald C. Wunsch, Program Co-Chair.

  3. Automation & Robotics Research Institute (ARRI): relevance to machine feedback control. High-speed precision motion control with unmodeled dynamics, vibration suppression, disturbance rejection, friction compensation, and deadzone/backlash control. Application areas: industrial machines, military land systems, vehicle suspension, aerospace.

  4. INTELLIGENT CONTROL TOOLS: Fuzzy Associative Memory (FAM) and Neural Network (NN), which includes adaptive control. [Diagram: a FAM maps input x through input membership functions, a fuzzy logic rule base, and output membership functions to output u; an NN maps input x directly to output u.] Both FAM and NN define a function u = f(x) from inputs to outputs, and both can be used for (1) classification and decision-making and (2) control. NN includes adaptive control: an adaptive controller is a 1-layer NN.

  5. Neural Network Properties • Learning • Recall • Function approximation • Generalization • Classification • Association • Pattern recognition • Clustering • Robustness to single node failure • Repair and reconfiguration Nervous system cell. http://www.sirinet.net/~jgjohnso/index.html

  6. [Diagram: two-layer feedforward static neural network (NN) with inputs x1…xn, a hidden layer of L sigmoid units σ(.), outputs y1…ym, first-layer weights V^T, second-layer weights W^T, and a constant input 1 supplying the thresholds.] Summation and matrix equations: y = W^T σ(V^T x). Two-layer NNs have the universal approximation property and overcome Barron’s fundamental accuracy limitation of 1-layer NNs.
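To make the matrix equation concrete, here is a minimal sketch of the forward pass; the dimensions, the sigmoid choice, and the random weights are illustrative, not from the talk:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_layer_nn(x, V, W):
    """Forward pass of the static two-layer NN, y = W^T sigma(V^T x).
    Thresholds can be absorbed by augmenting x with a constant 1.
    Shapes: x (n,), V (n, L), W (L, m)."""
    return W.T @ sigmoid(V.T @ x)

# Illustrative use: n = 2 inputs, L = 10 hidden sigmoid units, m = 1 output.
rng = np.random.default_rng(0)
y = two_layer_nn(np.array([0.5, -1.0]),
                 rng.standard_normal((2, 10)),
                 rng.standard_normal((10, 1)))
```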

  7. Neural Network Robot Controller. [Diagram: PD tracking loop with gain Kv and robust control term v(t) around the robot system, plus a nonlinear inner loop feeding forward the NN estimate f̂(x); inputs are the desired trajectory qd, tracking error e, and filtered error r.] Based on feedback linearization and the NN universal approximation property. Problem: the controller is nonlinear in the NN weights, so standard proof techniques do not work. Advantages: easy to implement with a few more lines of code; the learning feature allows on-line updates to NN memory as the dynamics change; handles unmodeled dynamics, disturbances, and actuator problems such as friction; the NN universal basis property means no regression matrix is needed; the nonlinear controller allows faster and more precise motion.

  8. Extension of adaptive control to nonlinear-in-parameters (NLIP) systems; no regression matrix needed. The weight tuning law contains backprop terms (Werbos), a forward-propagation term, and extra robustifying terms: Narendra’s e-mod extended to NLIP systems. One can also use simplified Hebbian tuning, but the tracking error is larger.
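To illustrate the flavor of such laws, here is a discrete-time Euler sketch of a simplified one-tunable-layer update with an e-mod robustifying term; the gains, step size, and the one-layer simplification are assumptions for illustration, not Lewis’s exact tuning law:

```python
import numpy as np

def emod_weight_step(W_hat, sigma_x, r, F=10.0, kappa=0.1, dt=1e-3):
    """One Euler step of a simplified robust NN tuning law,
        W_hat_dot = F sigma(x) r^T  -  kappa F ||r|| W_hat,
    i.e. a gradient-style learning term plus an e-mod robustifying term
    that keeps the weight estimates bounded under disturbances.
    Shapes: W_hat (L, m), sigma_x (L,) activations, r (m,) filtered tracking error."""
    W_dot = F * np.outer(sigma_x, r) - kappa * F * np.linalg.norm(r) * W_hat
    return W_hat + dt * W_dot
```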

  9. More complex systems: force control, flexible pointing systems, vehicle active suspension. Outcomes: SBIR contracts won, 1996 SBA Tibbetts Award, 4 US patents, NSF technology transfer to industry.

  10. Flexible & Vibratory Systems: neural network backstepping controller for a flexible-joint robot arm. [Diagram: nonlinear feedback linearization loop with NN#1 estimating F̂1(x), a backstepping loop with NN#2 estimating F̂2(x), the robot system, a tracking loop, and a robust control term v(t).] Add an extra feedback loop; two NNs are needed; use passivity to show stability. Advantage over traditional backstepping: no regression functions needed.

  11. Actuator Nonlinearities: deadzone, saturation, backlash. NN in the feedforward loop for deadzone compensation, with an actor NN and a little critic network; acts like a 2-layer NN with enhanced backprop tuning.

  12. NN Observers: needed when all states are not measured, i.e. output feedback. A recurrent NN observer is used; tune the NN observer and the action NN.

  13. Can also use CMAC NNs and fuzzy logic systems: a fuzzy logic system is an NN with VECTOR thresholds. Separable Gaussian activation functions for RBF NNs; separable triangular activation functions for CMAC NNs. Tune the first-layer weights, e.g. centroids and spreads, so the activation functions move around the input space: dynamic focusing of awareness.
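A sketch of what separable Gaussian activations mean computationally; shapes and parameter names are illustrative:

```python
import numpy as np

def separable_gaussian(x, centers, spreads):
    """Separable Gaussian RBF activations: each unit is a product of 1-D
    Gaussians, exp(-sum_i ((x_i - c_i) / a_i)^2). Tuning the first layer
    means adjusting the centroids and spreads, so the activation functions
    move around the input space (dynamic focusing of awareness).
    Shapes: x (n,), centers (L, n), spreads (L, n); returns (L,)."""
    z = (x[None, :] - centers) / spreads
    return np.exp(-np.sum(z ** 2, axis=1))
```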

  14. Elastic Fuzzy Logic (cf. P. Werbos): weights the importance of factors in the rules. [Plots: effect of changing the membership-function elasticities "c"; effect of changing the membership-function spread "a".]

  15. Elastic Fuzzy Logic Control: tune the membership functions and tune the control representative values.

  16. Start with a 5x5 uniform grid of membership functions. After tuning, the system builds its own basis set (dynamic focusing of awareness), giving better performance.

  17. Optimality in Biological Systems: cell homeostasis and cellular metabolism. The individual cell is a complex feedback control system. It pumps ions across the cell membrane to maintain homeostasis, and has only limited energy to do so. Permeability control of the cell membrane. http://www.accessexcellence.org/RC/VL/GG/index.html

  18. Optimality in Control Systems Design (R. Kalman, 1960). Rocket orbit injection: given the dynamics, the objectives are to get to orbit in minimum time and to use minimum fuel. http://microsat.sm.bmstu.ru/e-library/Launch/Dnepr_GEO.pdf

  19. 1. Neural Networks for Feedback Control: based on the feedback control approach; unknown system dynamics; on-line tuning; extended adaptive control to NLIP systems; no regression matrix. 2. Neural Network Solution of Optimal Design Equations: nearly optimal control based on Hamilton-Jacobi (HJ) optimal design equations; known system dynamics; preliminary off-line tuning.

  20. H-Infinity Control Using Neural Networks (Murad Abu Khalaf). [Diagram: plant with control input u, disturbance d, performance output z, and measured output y.] The L2-gain problem: find the control u(t) so that ∫‖z‖² dt ≤ γ² ∫‖d‖² dt for all L2 disturbances and a prescribed gain γ². This is a zero-sum differential Nash game.

  21. CT Policy Iteration for H-Infinity Control (Murad Abu Khalaf). The HJI equation cannot be solved directly, so iterate on the value equation, a consistency equation for the value function. Successive Solution, Algorithm 1: let γ be prescribed and fixed, and start from a stabilizing control with a region of asymptotic stability (RAS) and an initial disturbance. 1. Outer loop: update the control. 2. Inner loop: update the disturbance by solving the value equation; go to 2 and iterate i until the disturbance converges, with RAS. Then update the control action in the outer loop; go to 1 and iterate j until the control converges, with RAS.

  22. Neural Network Approximation for the Computational Technique (Murad Abu Khalaf). Problem: the value equation cannot be solved analytically. Use a neural network to approximate V(i)(x) (a 2-layer NN can be used): V(x) ≈ w^T φ(x), so the value-function gradient approximation is ∇V ≈ ∇φ(x)^T w. Substituting into the value equation, one may solve for the NN weights at iteration (i, j), as sketched below. Value function approximation (VFA) converts the partial differential equation into an algebraic equation in terms of the NN weights.
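A minimal sketch of that computational step, assuming a linear-in-weights approximator and sampled states; `phi` and `target` are hypothetical callables standing in for the basis and the value-equation right-hand side at the current (i, j) iteration:

```python
import numpy as np

def vfa_weights(phi, target, states):
    """Fit V(x) ~ w^T phi(x) by least squares over sample states.
    Because V is linear in w, substituting the approximation into the value
    equation turns the PDE into the linear algebraic system Phi w ~ b."""
    Phi = np.array([phi(x) for x in states])     # (N, L) basis matrix
    b = np.array([target(x) for x in states])    # (N,) value-equation RHS
    w, *_ = np.linalg.lstsq(Phi, b, rcond=None)
    return w         # the gradient approximation is then grad_phi(x).T @ w
```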

  23. Neural Network Optimal Feedback Controller (Murad Abu Khalaf). The optimal solution is approximated by a NN feedback controller with nearly optimal weights.

  24. Fixed-Final-Time HJB Optimal Control (Cheng Tao). Writing the optimal cost and the optimal control for the finite-horizon problem yields the time-varying Hamilton-Jacobi-Bellman (HJB) equation.
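For reference, a hedged reconstruction of that equation in a standard affine-in-control setting; the specific dynamics and cost below are assumptions used for illustration. With dynamics ẋ = f(x) + g(x)u and cost V(x, t) = ∫ₜᵀ (Q(x) + uᵀRu) dτ + φ(x(T)):

```latex
-\frac{\partial V^*}{\partial t}
  = Q(x) + \nabla V^{*\top} f(x)
  - \tfrac{1}{4}\,\nabla V^{*\top} g(x)\, R^{-1} g^{\top}(x)\,\nabla V^{*},
\qquad
u^*(t) = -\tfrac{1}{2} R^{-1} g^{\top}(x)\,\nabla V^{*},
\qquad
V^*(x, T) = \phi(x(T)).
```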

  25. HJB Solution by NN Value Function Approximation (Cheng Tao; Irwin Sandberg). Approximate the value in the HJB equation with an NN having time-varying weights, V(x, t) ≈ w(t)^T φ(x), where the value gradient involves the Jacobian ∇φ(x). This gives an ODE in the NN weights. Solve by least-squares: simply integrate backwards from the terminal condition to find the NN weights. Policy iteration is not needed!

  26. ARRI Research Roadmap in Neural Networks.
1. Neural Networks for Feedback Control (1995-2002): based on the feedback control approach; unknown system dynamics; on-line tuning; extended adaptive control to NLIP systems; no regression matrix. NN applications: feedback linearization, singular perturbations, backstepping, force control, dynamic inversion, etc.
2. Neural Network Solution of Optimal Design Equations (2002-2006): nearly optimal solution of controls design equations, based on HJ optimal design equations; known system dynamics; preliminary off-line tuning; no canonical form needed.
3. Approximate Dynamic Programming (2006-): nearly optimal control based on a recursive equation for the optimal value; system dynamics usually known (except Q learning); on-line tuning. The goal: unknown dynamics, i.e. optimal adaptive control. Extend adaptive control to yield OPTIMAL controllers; no canonical form needed.

  27. Four ADP Methods proposed by Werbos. The critic NN approximates:
- Heuristic Dynamic Programming (HDP): the value.
- Action-Dependent HDP (Watkins’ Q learning): the Q function.
- Dual Heuristic Programming (DHP): the gradient of the value.
- Action-Dependent DHP (ADDHP): the gradients of the Q function.
An action NN approximates the control. Related work: Bertsekas, neuro-dynamic programming; Barto & Bradtke, Q-learning proof (imposed a settling time).

  28. Discrete-Time Optimal Control. With h(x_k) the prescribed control input function (policy), the cost is V_h(x_k) = Σ_{i≥k} r(x_i, u_i) with u_i = h(x_i), giving the value function recursion V_h(x_k) = r(x_k, h(x_k)) + V_h(x_{k+1}) and an associated Hamiltonian. By Bellman’s principle, the optimal cost satisfies V*(x_k) = min_u [ r(x_k, u) + V*(x_{k+1}) ], and the optimal control is the minimizing u. The system dynamics do not appear explicitly in this recursion, which can be evaluated along measured trajectories; solutions in this form have come from the computational intelligence community.

  29. Use the System Dynamics. Substituting the system dynamics x_{k+1} = f(x_k) + g(x_k) u_k into the value recursion gives the DT HJB equation, which is difficult to solve; there have been few practical solutions from the control systems community.

  30. ADP Method 1: Heuristic Dynamic Programming (HDP) (Paul Werbos). Greedy value function update (greedy cost update): a simple recursion, V_{j+1}(x_k) = r(x_k, h_j(x_k)) + V_j(x_{k+1}); each step is a Lyapunov equation. For LQR the underlying recursion is the Riccati equation, and Lancaster & Rodman proved convergence; an initial stabilizing control is NOT needed. Contrast policy iteration: for LQR the underlying Riccati iteration is Hewer’s (1971), and an initial stabilizing control IS needed.
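For the LQR case, a minimal sketch of the greedy (HDP) update as iteration on the underlying Riccati difference equation, starting from P = 0 so no stabilizing initial gain is required; the matrices are placeholders:

```python
import numpy as np

def hdp_lqr(A, B, Q, R, iters=200):
    """Greedy value update for LQR: iterate the Riccati difference equation
        P <- Q + A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A
    from P = 0. Under standard conditions P converges to the optimal
    Riccati solution (Lancaster & Rodman), with no initial stabilizing gain."""
    P = np.zeros_like(Q)
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # greedy feedback gain
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
    return P, K
```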

  31. DT HDP vs. Receding Horizon Optimal Control. HDP proceeds forward in time; receding horizon control (RHC) is a backward-in-time optimization, involving a control Lyapunov function.

  32. Q Learning: Action-Dependent ADP. Define the Q function Q_h(x_k, u_k) = r(x_k, u_k) + V_h(x_{k+1}), where u_k is arbitrary and the policy h(.) is used after time k. Note that Q_h(x_k, h(x_k)) = V_h(x_k), which gives the recursion for Q: Q_h(x_k, u_k) = r(x_k, u_k) + Q_h(x_{k+1}, h(x_{k+1})). This is a simple expression of Bellman’s principle.

  33. Q learning does not need to know f(x_k) or g(x_k). For LQR, V is quadratic in x and Q is quadratic in x and u. The control update is found by setting ∂Q/∂u_k = 0, so the control is found only from the Q function: A and B are not needed.
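A sketch of the LQR Q-function kernel and the resulting gain, showing why A and B drop out once the kernel is known; here H is built from the model only to display its structure:

```python
import numpy as np

def lqr_q_kernel(A, B, Q, R, P):
    """Q(x, u) = [x; u]^T H [x; u], with
        H = [[Q + A^T P A,  A^T P B    ],
             [B^T P A,      R + B^T P B]].
    Setting dQ/du = 0 gives u = -(H_uu)^{-1} H_ux x: the greedy control
    is computed from the kernel H alone, so A and B are not needed."""
    H = np.block([[Q + A.T @ P @ A, A.T @ P @ B],
                  [B.T @ P @ A, R + B.T @ P @ B]])
    n = A.shape[0]
    L = -np.linalg.solve(H[n:, n:], H[n:, :n])   # gain from H alone
    return H, L
```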

  34. ADP Method 3: Q Learning, i.e. Action-Dependent Heuristic Dynamic Programming (ADHDP) (Paul Werbos). Greedy Q update: model-free ADP; update the weights by RLS or backprop. Contrast Q policy iteration (Bradtke, Ydstie, Barto): model-free policy iteration with a control policy update, where a stable initial control is needed.
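A minimal model-free sketch of the greedy Q update for LQR: the plant is only sampled through a `step` callable, the Q kernel is fit by batch least squares in a quadratic basis, and probing noise provides excitation. The plant interface, noise scale, and episode sizes are all assumptions for illustration:

```python
import numpy as np

def quad_basis(z):
    """Quadratic basis: products z_i z_j for i <= j, so Q(z) = z^T H z is
    linear in the basis weights."""
    n = len(z)
    return np.array([z[i] * z[j] for i in range(n) for j in range(i, n)])

def unpack_symmetric(theta, d):
    """Rebuild the symmetric kernel H from the fitted weights
    (off-diagonal weights correspond to 2*H_ij, so halve them)."""
    H = np.zeros((d, d)); k = 0
    for i in range(d):
        for j in range(i, d):
            H[i, j] = theta[k] if i == j else theta[k] / 2
            H[j, i] = H[i, j]; k += 1
    return H

def adhdp_lqr(step, n, m, Q, R, L0, episodes=20, N=200, seed=0):
    """Greedy Q update (ADHDP) for LQR. `step(x, u) -> x_next` is the
    unknown plant: it is only sampled, never known, so A and B never appear."""
    rng = np.random.default_rng(seed)
    L = L0                                       # need not be stabilizing
    theta = np.zeros((n + m) * (n + m + 1) // 2)
    for _ in range(episodes):
        Phi, b = [], []
        x = rng.standard_normal(n)
        for _ in range(N):
            u = L @ x + 0.1 * rng.standard_normal(m)   # probing noise for PE
            xn = step(x, u)
            un = L @ xn                                # greedy action at next state
            r = x @ Q @ x + u @ R @ u                  # measured stage cost
            # Greedy Q update target: Q_new(x,u) = r + Q_old(x', h(x'))
            Phi.append(quad_basis(np.concatenate([x, u])))
            b.append(r + theta @ quad_basis(np.concatenate([xn, un])))
            x = xn
        theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(b), rcond=None)
        H = unpack_symmetric(theta, n + m)
        L = -np.linalg.solve(H[n:, n:], H[n:, :n])     # policy improvement from H
    return L
```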

  35. Q learning actually solves the Riccati equation WITHOUT knowing the plant dynamics. Model-free ADP is direct OPTIMAL ADAPTIVE CONTROL, and it works for nonlinear systems. Open questions: proofs? robustness? comparison with adaptive control methods?

  36. ADP for Discrete-Time H-infinity Control (Asma Al-Tamimi): finding the Nash game equilibrium via • HDP • DHP • AD HDP (Q learning) • AD DHP.

  37. ADP for DT H∞ Optimal Control Systems (Asma Al-Tamimi). [Diagram: plant with control input u_k, disturbance w_k, penalty output z_k, and measured output y_k.] With state feedback u_k = L x_k, find the control u_k so that Σ_k ‖z_k‖² ≤ γ² Σ_k ‖w_k‖² for all L2 disturbances and a prescribed gain γ², when the system is at rest, x_0 = 0.

  38. Two known ways for the discrete-time H-infinity iterative solution (Asma Al-Tamimi): policy iteration for the game solution, which requires a stable initial policy; and ADP greedy iteration, which does not require a stable initial policy. Both require full knowledge of the system dynamics.

  39. DT Game Heuristic Dynamic Programming: Forward-in-Time Formulation (Asma Al-Tamimi). An approximate dynamic programming (ADP) scheme in which one performs an incremental optimization at each step, equivalently written as a greedy value update for the game.

  40. HDP, Linear System Case (Asma Al-Tamimi). Value function update: solve by batch least-squares or RLS; then update the control gain and the disturbance gain. A, B, E are needed. She showed that this is equivalent to iteration on the underlying game Riccati equation, which is known to converge (Stoorvogel, Başar).
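A sketch of that underlying game Riccati iteration in a standard zero-sum completion-of-squares form; it assumes the indicated inverse exists and γ is large enough, and the matrices are placeholders:

```python
import numpy as np

def game_riccati_hdp(A, B, E, Q, R, gamma, iters=200):
    """Iterate the DT zero-sum game Riccati recursion
        P <- Q + A^T P A - N^T M^{-1} N,
    where M is the joint (control, disturbance) curvature block and N the
    cross term. Note A, B, E are all needed here; the Q-learning
    formulation on the next slides removes that requirement."""
    n, m = A.shape[0], B.shape[1]
    P = np.zeros((n, n))
    for _ in range(iters):
        M = np.block([
            [R + B.T @ P @ B, B.T @ P @ E],
            [E.T @ P @ B, E.T @ P @ E - gamma**2 * np.eye(E.shape[1])]])
        N = np.vstack([B.T @ P @ A, E.T @ P @ A])
        G = np.linalg.solve(M, N)   # stacked gains: u = -G[:m] x, w = -G[m:] x
        P = Q + A.T @ P @ A - N.T @ G
    return P, -G[:m], -G[m:]
```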

  41. Q-Learning for DT H-infinity Control: Action-Dependent Heuristic Dynamic Programming (Asma Al-Tamimi). Dynamic programming is backward-in-time; adaptive dynamic programming runs forward-in-time.

  42. Q Learning for H-infinity Control (Asma Al-Tamimi). Linear quadratic case: V and Q are quadratic. Q function update, then control action and disturbance updates. A, B, E are NOT needed.

  43. A quadratic basis set is used to allow on-line solution: the quadratic Kronecker basis, with Q(z) = z^T H z. Q function update: solve for the "NN weights", the elements of the kernel matrix H, using batch least-squares or on-line RLS; then perform the control and disturbance updates. Probing noise is injected to get persistence of excitation. Proof: the iteration still converges to the exact result (Asma Al-Tamimi).
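A sketch of the quadratic Kronecker basis and one on-line RLS step for the kernel weights; the joint vector z = [x; u; w] and all parameter names are assumptions for illustration:

```python
import numpy as np

def kron_quad_basis(z):
    """Quadratic Kronecker basis: the distinct entries z_i z_j (i <= j) of
    kron(z, z), so that Q(z) = z^T H z is linear in the basis weights.
    Solving for those weights recovers the kernel matrix H."""
    n = len(z)
    return np.array([z[i] * z[j] for i in range(n) for j in range(i, n)])

def rls_step(theta, Pcov, phi, target, lam=1.0):
    """One recursive least-squares update fitting theta^T phi ~ target,
    where target is the Q-update right-hand side at the current sample.
    Probing noise injected into u and w keeps phi persistently exciting."""
    gain = Pcov @ phi / (lam + phi @ Pcov @ phi)
    theta = theta + gain * (target - theta @ phi)
    Pcov = (Pcov - np.outer(gain, phi @ Pcov)) / lam
    return theta, Pcov
```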

  44. H-inf Q-Learning Convergence Proofs (Asma Al-Tamimi). Convergence: H-inf Q learning is equivalent to solving the underlying game Riccati equation without knowing the system matrices. The result is a model-free direct adaptive controller that converges to an H-infinity optimal controller: direct H-infinity adaptive control, with no requirement whatsoever on the plant model matrices.

  45. [Figure-only slide] (Asma Al-Tamimi)

  46. The H-infinity game Q function, compared to the Q function for the H2 optimal control case.

  47. ADP for Nonlinear Systems: Convergence Proof (Asma Al-Tamimi). • HDP

  48. HDP for nonlinear systems (Asma Al-Tamimi): given the system dynamics, iterate the value function recursion.

  49. Flavor of the proofs: proof of convergence of DT nonlinear HDP (Asma Al-Tamimi).
