Autonomous Motion Learning for Near Optimal Control. By Alan Jennings, School of Engineering, University of Dayton, Dayton, OH, August 2012. Dissertation defense in partial fulfillment of the requirements for the degree of Doctor of Philosophy in electrical engineering.
Motivation
Consider human learning:
• An intelligent system: able to solve new problems and become an expert
Consider computer accomplishments:
• Computers have beaten chess and Jeopardy! grandmasters
• But one cannot be repurposed for the other
Consider general-purpose learning:
• People can grow up to be presidents, design fashion, play croquet, identify liars, train animals, predict weather…
• This has not been accomplished
The foundation for general-purpose learning is a developmental framework:
• Shaped by environment and experiences
• Complex value systems guide learning
• An infant stage restricts exploration until basic skills are established
• IBM’s Deep Blue beat Kasparov in their second match in 1997
• IBM’s Watson beat Jennings and Rutter in 2011
• In 2011, Google gained Nevada licenses for self-driving cars
Image credit: Google Prius image, Flickr user Steve Jurvetson
Alan Jennings, Dissertation Defense, July 2012
Context in Developmental Learning
• Developmental learning seeks to mimic the progressive learning process
  • Infant -> Toddler -> Child -> Young adult -> …
• The solution/knowledge should be unguided by the programmer
• Learning basic tasks supports learning high-level tasks
  • Proverbial walking before running
• The robot then learns general tasks of increasing complexity at increasing proficiency
• Does not require reasoning/understanding/consciousness
My Contributions
Autonomous motion learning:
• General purpose rigid body motion optimization
  • Provides a novel high-level interface at the robot geometry level
  • Allows novice roboticists or computers to design motions
  • However, has high computation requirements
• Optimal inverse functions from a global search
  • Organizes motions in continuous, optimal inverse functions
  • Provides a set of reflexive responses for use online
  • Efficiently searches a high-dimensional space using agents and local gradients
• Improving motions by unbounded resolution
  • Nodes are added to an interpolation, approaching the optimal continuous function in the limit
  • Efficiently collects and “understands” experiences
  • Motions are not limited by initial programming resolution or initial training time
Motivating Example
Use of general purpose programs to solve control problems
• Use a CAD package to draw the robot
• Use a kinematic program for the equations of motion
• Use an optimal control program to solve
• The optimal control problem is introduced
  • Finding the input with the lowest cost among inputs satisfying constraints.
Motivating Example
Use of general purpose programs for solving control problems
• The optimal control problem: finding the control input with the lowest cost among inputs satisfying constraints.
• Human creativity comes in at the design level, not the optimization.
[Workflow figure. Questions: "What does it look like?", "What are the controls?", "What is trying to be done?". Steps: Draft project, Set up Simulink, Set up DIDO. Labels: Optimal Control, Dynamics, Mass & joints.]
Motivating Example: General Optimal Control Problem
Use of general purpose programs for solving control problems
• The optimal control problem: finding the control input with the lowest cost among inputs satisfying constraints.
• Typically solved by discretizing over time
  • Optimizes a set of variables, not the continuous function
• Local search method
• Applies to an isolated problem
  • Changing the final value requires a new optimization
[Figure: general optimal control problem. Labels: x(t), u(t) → g(t); Xo, xo, ϕ, ψo, Xf, ψf, xf, J.]
Motivating Example: The way forward
• If the system dynamics and initial state are repeatable, then the problem is really only to find a control signal.
• Continuous signals can be approximated by parameterization,
• so motion primitives can be composed solely by a vector function of an output.
Motion primitive example problem:
• System: pendulum actuated at the base
• Cost: (torque)², $J = \int u(\tau)^2 \, d\tau$
• Output: initial disturbance, $y = \theta(t_0)$
• Constraints:
  • Reach final value: $\theta(t_f) = 0$
  • Saturation: $-u_{\max} \le u(t) \le u_{\max}$
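As a sketch, the pendulum example can be collected into one constrained minimization. The limits $t_0$ and $t_f$ follow the slide's notation. The dynamics function $f$ is only a placeholder, since the slide does not give the pendulum equations. The parameter vector $a$ and inverse function $h$ are assumptions borrowed from the notation of the later slides.

```latex
\begin{aligned}
\min_{u(\cdot)} \quad & J = \int_{t_0}^{t_f} u(\tau)^2 \, d\tau \\
\text{subject to} \quad & \dot{x}(t) = f\bigl(x(t), u(t)\bigr), \\
& \theta(t_0) = y, \qquad \theta(t_f) = 0, \\
& -u_{\max} \le u(t) \le u_{\max}, \qquad t \in [t_0, t_f].
\end{aligned}
\qquad
\text{Motion primitive: } u(t) \approx u(t; a), \quad a = h(y).
```

Read this way, a motion primitive replaces a fresh optimization for each disturbance $y$ with a single evaluation of $h$.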
Diversity and Progression in Motion Primitives
• Continuous, optimal inverse function
  • Motion primitives should be continuous so that changes in the system behavior are not abrupt
• Global search required for discovery
  • A global search offers the possibility of finding alternative motion primitives
  • Finding isolated optima requires testing candidates that local conditions indicate would perform worse
• Progression via increasing resolution
  • After optimizing at a given resolution, performance is limited because the optimal signal does not lie in the space of the parameterization. So the resolution must be increased to improve performance.
Optimal Inverse Functions: High level concept
[Flowchart, Optimization: Initialize population; Move agents (lower J(x), maintain f(x)); Check for removal or settling conditions; Form cluster; Move to new x*; Set of hk(yd)’s.]
[Flowchart, Execution: Operator; Select inverse function, hk(yd); Get yd, evaluate hk.]
• The population covers a broad area and uses local gradients to improve.
• Converging agents are removed, so the number of agents quickly drops.
• Settled agents create a motion primitive and use the local gradient to expand to new outputs.
• The operator has a choice of inverse functions to select from.
  • Can use softer criteria for preference.
• Inverse functions are continuous and easily calculated, making them suited for real-time use.
Optimal Inverse Functions: Mechanics of the method
[Figure labels: Output, Cost, Step]
Improving a given agent
• Restrict motion to the null space of the output gradient
• Move opposite the cost gradient
• Saturation
  • If gradients are large -> limits effect
  • If the cost gradient is small -> small step
  • If the output gradient is small -> ease the null space restriction
• Boundary constraint reduces step length
• Minimum step for settling
• Remove particles that are too close
  • Quickly reduces population size
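The agent update described on this slide can be sketched as a projected gradient step. This is a minimal illustration, not the dissertation's implementation. The function name `null_space_step`, the `step` and `max_step` parameters, and the quadratic test problem are all assumptions made for the example. The boundary and particle-removal rules are omitted.

```python
import math

def null_space_step(grad_cost, grad_output, step=0.1, max_step=0.5):
    """Return a move that lowers the cost while holding the output fixed to first order."""
    gf_sq = sum(f * f for f in grad_output)
    if gf_sq > 0.0:
        # Component of the cost gradient that lies along the output gradient.
        scale = sum(j * f for j, f in zip(grad_cost, grad_output)) / gf_sq
    else:
        # Zero output gradient: no restriction is applied (assumed handling).
        scale = 0.0
    # Project the cost gradient onto the null space of the output gradient.
    projected = [j - scale * f for j, f in zip(grad_cost, grad_output)]
    # Move opposite the projected cost gradient.
    move = [-step * p for p in projected]
    # Saturate the step length so large gradients have a limited effect.
    length = math.sqrt(sum(m * m for m in move))
    if length > max_step:
        move = [m * max_step / length for m in move]
    return move

# Hypothetical test problem: cost J(x) = x0^2 + x1^2, output f(x) = x0 + x1.
x = [2.0, 0.0]
for _ in range(200):
    move = null_space_step([2 * x[0], 2 * x[1]], [1.0, 1.0])
    x = [xi + mi for xi, mi in zip(x, move)]
```

For this linear output the projection keeps `x[0] + x[1]` at its starting value of 2, and the agent settles at the lowest-cost point on that line.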
Optimal Inverse Functions: Mechanics of the method
[Figure labels: Increasing yd, Decreasing yd]
Form a cluster of optimal points
• Change the output by moving along the output gradient
• Repeat the optimizing steps
• Test for continuity/optimality
  • Output changes in the expected direction
  • Not too far (discontinuity)
  • Not too close (ill-conditioned surface)
  • Settled (optimality satisfied)
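A minimal sketch of the cluster-forming loop, under stated assumptions: the cost and output functions are a hypothetical two-dimensional test problem, and the helper names (`shift_output`, `settle`) and step sizes are illustrative. The continuity and spacing tests listed on the slide are left out for brevity.

```python
def cost_grad(x):
    # Hypothetical cost J(x) = x0^2 + x1^2.
    return [2 * x[0], 2 * x[1]]

def output(x):
    # Hypothetical output f(x) = x0 + x1.
    return x[0] + x[1]

def output_grad(x):
    return [1.0, 1.0]

def shift_output(x, delta_y):
    """Move along the output gradient so the output changes by delta_y (first order)."""
    g = output_grad(x)
    gg = sum(v * v for v in g)
    return [xi + delta_y * gi / gg for xi, gi in zip(x, g)]

def settle(x, step=0.1, tol=1e-10, max_iter=1000):
    """Repeat null-space descent steps until the move falls below tol."""
    for _ in range(max_iter):
        gj, gf = cost_grad(x), output_grad(x)
        scale = sum(a * b for a, b in zip(gj, gf)) / sum(b * b for b in gf)
        move = [-step * (a - scale * b) for a, b in zip(gj, gf)]
        x = [xi + mi for xi, mi in zip(x, move)]
        if max(abs(m) for m in move) < tol:
            break
    return x

# Grow a cluster toward increasing output, re-optimizing after each shift.
cluster = []
x = settle([2.0, 0.0])
for _ in range(5):
    cluster.append((output(x), x))
    x = settle(shift_output(x, 0.5))
```

Each stored pair is one sample of an optimal inverse function. For this test problem the optimum at output y is (y/2, y/2), which gives a simple check on the cluster.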
Optimal Inverse Functions: Testing of the method
• Combination of functions
  • Multiple extrema
  • Saddle points
  • 2-dim for verification
• Expected result
  • Clusters between output extrema
[Figure panels: Linear/Quadratic Cost; Periodic Cost; Quadratic Cost; Quadratic Cost]
Optimal Inverse Functions: Testing of the method
[Figure: Quadratic Cost, Periodic-Linear output]
Optimal Inverse Functions: Practical example
• Robot control problem
  • Precision is dependent on the pose
  • Radial precision is optimized via joint angles for varying radial distance
• Planar robot, Motoman HP-3:
• Complex robot, Motoman IA-20:
Optimal Inverse Functions: Practical example
• Links are shown by solid arrows.
• The effective length to the tip is shown by a dashed arrow.
• The arc showing the sensitivity for a joint is matched by color.
• Each link has a different radius to the tip and therefore a different sensitivity.
• In addition, the direction of sensitivity is different.
• The problem effectively finds the joint locations that reduce sensitivity in the radial distance.
Optimal Inverse Functions: Practical example
• Output is adjusted as desired (additional task of finding the angle of the plane and the in-plane angle)
• Operator selects an inverse function
Optimal Inverse Functions
• The method searches a large space efficiently by:
  • Having agents congregate to locally optimal solutions (increasing the effective search area of each), and
  • Eliminating neighboring points (once locations of optima are sketched out, fewer agents are needed).
• Sets of continuous, optimal inverse functions
  • Can be used in real time, and
  • Reduce the burden on the operator without reducing optimality
Unbounded Resolution: High level concept
[Block diagram labels: Optimization; Reflex Function; Memory Model; Operator or Higher Level Planner; Cubic Interpolation; System]
• To have continuous learning, one must have unbounded resolution.
• Unbounded resolution leads to exponential growth in complexity.
• Must make efficient use of experience.
Unbounded Resolution: Mechanics of the method
[Block diagram labels: Optimization; Reflex Function; Memory Model; Operator or Higher Level Planner; Cubic Interpolation; System]
System assumptions
• t and a are bounded
• y(a) and J(a) are in $C^2$ and constant
Unbounded Resolution: Why cubic interpolation
• Adding a node to a cubic interpolation allows all experiences to be transferred.
• Power series parameters are ill conditioned as the effective area of the basis approaches extremes.
• Fourier series parameters typically create a less smooth optimization surface.
• The radial basis function scaling parameter is either too small at low resolutions or too large at high resolutions, and automatically changing it means data cannot be mapped exactly.
• Sigmoid neural network parameters are large with respect to the input magnitude, resulting in poor optimization scaling.
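The transfer property can be illustrated with a small sketch. It assumes a cubic Hermite form in which each node stores a time, a value, and a slope; the slides do not say which cubic scheme the method uses, so this representation and the function names are assumptions. Inserting a node whose value and slope are read off the current curve leaves the curve unchanged, so earlier samples still describe the higher-resolution waveform exactly.

```python
def hermite_eval(nodes, t):
    """Evaluate a piecewise cubic Hermite curve; nodes is a sorted list of (time, value, slope)."""
    for (t0, p0, m0), (t1, p1, m1) in zip(nodes, nodes[1:]):
        if t0 <= t <= t1:
            h = t1 - t0
            s = (t - t0) / h
            h00 = 2 * s**3 - 3 * s**2 + 1
            h10 = s**3 - 2 * s**2 + s
            h01 = -2 * s**3 + 3 * s**2
            h11 = s**3 - s**2
            return h00 * p0 + h10 * h * m0 + h01 * p1 + h11 * h * m1
    raise ValueError("t is outside the node range")

def hermite_slope(nodes, t):
    """Evaluate the time derivative of the same curve."""
    for (t0, p0, m0), (t1, p1, m1) in zip(nodes, nodes[1:]):
        if t0 <= t <= t1:
            h = t1 - t0
            s = (t - t0) / h
            d00 = 6 * s**2 - 6 * s
            d10 = 3 * s**2 - 4 * s + 1
            d01 = -6 * s**2 + 6 * s
            d11 = 3 * s**2 - 2 * s
            return (d00 * p0 + d10 * h * m0 + d01 * p1 + d11 * h * m1) / h
    raise ValueError("t is outside the node range")

def insert_node(nodes, t_new):
    """Add a node at t_new that reproduces the current curve exactly."""
    new = (t_new, hermite_eval(nodes, t_new), hermite_slope(nodes, t_new))
    return sorted(nodes + [new])

# Hypothetical three-node waveform, refined to four nodes.
coarse = [(0.0, 0.0, 1.0), (1.0, 1.0, -1.0), (2.0, 0.5, 0.0)]
fine = insert_node(coarse, 0.4)
```

After the insertion, the extra node's value and slope become new free parameters that a later optimization can adjust.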
Unbounded Resolution: Why locally weighted regression
• Locally weighted regression (LWR) performs a least-squared-error regression where the error is scaled by the distance to the test point.
• Local weighting allows global nonlinear behavior
• Quadratic regression to accurately model optima
• Provides the gradient for optimization (and the Hessian)
• Directions with insufficient data are identified from eigenvalues
• Allows for autonomously determining which samples must be tested
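A one-dimensional sketch of locally weighted quadratic regression, under stated assumptions: the Gaussian weighting kernel, the `bandwidth` parameter, and the function names are illustrative choices, not details taken from the slides. The eigenvalue test for poorly sampled directions only matters in higher dimensions and is not shown.

```python
import math

def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def lwr_quadratic(samples, query, bandwidth=0.5):
    """Fit J ~ c0 + c1*d + c2*d^2 with d = a - query, weighting samples near the query most."""
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for a, j in samples:
        d = a - query
        w = math.exp(-((d / bandwidth) ** 2))  # assumed Gaussian distance weighting
        phi = [1.0, d, d * d]
        for r in range(3):
            atb[r] += w * phi[r] * j
            for c in range(3):
                ata[r][c] += w * phi[r] * phi[c]
    c0, c1, c2 = solve(ata, atb)
    # Local estimates at the query point: value, gradient, and second derivative.
    return c0, c1, 2.0 * c2

# Hypothetical experience: samples of J(a) = (a - 1)^2 + 3 on a grid.
samples = [(-1.0 + 0.25 * i, (-1.0 + 0.25 * i - 1.0) ** 2 + 3.0) for i in range(17)]
value, gradient, hessian = lwr_quadratic(samples, 0.5)
```

Because the fit is centered on the query point, the linear and quadratic coefficients give the gradient and curvature directly, which is what the optimizer needs.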
Unbounded Resolution: Testing of the method
Problem design:
• Motivation: possibly internal resonance; distance traveled, material processed, …; internal limitations
• Flattens peaks in the absolute distance -> minimize RMS
• Cost: (distance to sine wave)², $J_2 = \int \left( u(\tau) - \frac{\sin(2\pi\tau) + 2}{4} \right)^2 d\tau$
• Output: average value, $y = \int u(\tau) \, d\tau$
• Saturation applied to u(t)
Results:
• Sinusoidal shape, saturating at the closer side
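The cost and output of this test problem can be evaluated numerically for any candidate waveform. The sketch below assumes a unit time interval (one period of the sine), saturation limits of 0 and 1, and a midpoint-rule integration; none of these values are stated on the slide, and the function names are illustrative.

```python
import math

U_MIN, U_MAX = 0.0, 1.0  # assumed saturation limits

def target(t):
    """Reference waveform from the cost definition: (sin(2*pi*t) + 2) / 4."""
    return (math.sin(2 * math.pi * t) + 2) / 4

def saturate(u):
    return min(max(u, U_MIN), U_MAX)

def evaluate(u_func, n=1000):
    """Return (cost, output) for a waveform on [0, 1] using the midpoint rule."""
    cost = 0.0
    out = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        u = saturate(u_func(t))
        cost += (u - target(t)) ** 2 / n
        out += u / n
    return cost, out

# A waveform that tracks the target has zero cost and an average value of 0.5.
cost_track, y_track = evaluate(target)
# A constant waveform with the same average value pays for its shape error.
cost_flat, y_flat = evaluate(lambda t: 0.5)
```

Both waveforms produce the same output, so the cost alone separates them. For outputs other than 0.5 the waveform must depart from the target, which is where the trade-off studied on this slide arises.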
Unbounded Resolution: Testing of the method
• Waveform results
  • Near optimal compared to direct optimization
• Exponential learning rate
• The results exploit saturation.
• Going from 4 to 9 nodes, the cost decreases but the shape appears identical by sight.
Unbounded Resolution: Practical example
[Block diagram, unknown to the method: Voltage out; Amplifier; Motor; Current Peak Detector; Tachometer. Note: sampled after the run, does not need to be sampled continuously.]
• Objective:
  • Control the motor voltage to spin the motor to a given speed at a set time with the minimum peak current.
• Only modifications
  • Adjusted parameters for the range of u, y & J
  • Increased the measure of data required to deal with process variation
  • Ideal cost based on steady state
Unbounded Resolution: Practical example
• Completely automated
• Progressive improvement
• Sizable variation
  • Direct optimization on an average of 10 trials still did not converge
  • However, LWR provided a sufficiently accurate estimate of the gradients to converge
• Thirteen sets of data
  • Multiple runs gave similar results
Unbounded Resolution: Practical example
• 7 dim in 17 hours
  • About 40,000 samples
  • Method parameters were not optimized
• Results make sense
  • Final voltage determines output
  • Initial voltage very similar
  • Initial slope flattens
My Contributions
Related publications and presentations:
• Journal submissions
  • “Unbounded Motion Optimization by Developmental Learning,” revision submitted to IEEE Systems, Man and Cybernetics Part B
  • “Optimal Inverse Functions Created via Population Based Optimization,” submitted to IEEE Systems, Man and Cybernetics Part B
• Conference presentations
  • “Memory-Based Motion Optimization for Unbounded Resolution,” Computational Intelligence and Bioinformatics, IASTED, 753-31, Nov 2011
  • “Population Based Optimization for Variable Operating Points,” Congress on Evolutionary Computation, IEEE, Jun 2011
  • “Constrained Near-Optimal Control Using a Numerical Kinetic Solver,” Robotics and Applications, IASTED, 706-21, Nov 2010
  • “Biomimetic Learning, Not Learning Biomimetics: A survey of developmental learning,” National Aerospace and Electronics Conference (NAECON), IEEE, July 2010
• Posters
  • “Memory Based Optimization for Unbounded Learning,” 2011 Great Midwest Regional Space Grant Consortia Meeting, also NASA Futures Forum, Feb 2012
  • “Constrained Near-Optimal Control Using a Numerical Kinetic Solver,” 2009 Great Midwest Regional Space Grant Consortia, 3rd place
Commencement
Future applications:
• Implementation for novel locomotion
  • Implement on an inchworm
  • The challenge is automating the tests, such as defining distance traveled
  • It would be very interesting to reduce variation
• Learn a control law for regulation
  • Develop a control law for the pendulum
  • Question of what disturbance to use and what metric for cost or output (possibly response time; the operator sets the urgency)
• Address multidimensional outputs
  • Robots are used to provide multiple outputs
  • A manifold of the output may not be represented in the output space (think of a screw thread: despite moving continuously, there are multiple surfaces with the same horizontal coordinates)