Using OpenRDK to learn walk parameters for the Humanoid Robot NAO
F. Giannone (presenter), A. Cherubini, L. Iocchi, M. Lombardo, G. Oriolo
Overview: environment
• Robotic agent: the humanoid robot NAO, produced by Aldebaran
• Application: robotic soccer
• SDK
• Simulator
Overview: (sub)tasks
• Vision Module: process raw data from the environment
• Modelling Module: elaborate the raw data to obtain more reliable information
• Behaviour Control Module: decide the best behaviour to accomplish the agent goal
• Motion Control Module: actuate the robot motors accordingly (this is the subtask addressed first; see the sketch below)
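To make the data flow concrete, here is a minimal C++ sketch of the four-module pipeline. It is only an illustration under assumed names; it is not the actual OpenRDK API.

    // Hypothetical interfaces illustrating the data flow between the four modules.
    // A sketch, not the actual OpenRDK API: class and method names are invented.
    #include <vector>

    struct RawImage {};                                   // raw data from the environment (camera)
    struct WorldModel {};                                 // more reliable information built from percepts
    struct VelocityCommand { double v = 0, omega = 0; };  // desired walk command
    struct JointCommands { std::vector<double> angles; }; // motor set-points

    class VisionModule {
    public:
        WorldModel process(const RawImage&) { return {}; }           // process raw data
    };
    class ModellingModule {
    public:
        WorldModel refine(const WorldModel& m) { return m; }         // filter/fuse percepts
    };
    class BehaviourControlModule {
    public:
        VelocityCommand decide(const WorldModel&) { return {}; }     // choose the best behaviour
    };
    class MotionControlModule {
    public:
        JointCommands actuate(const VelocityCommand&) { return {}; } // drive the robot motors
    };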
Make NAO walk… how?
NAO is equipped with a set of motion utilities, including a walk implementation that can be:
• called through an interface (NaoQi Motion Proxy)
• partially customized by tuning some parameters
Main advantage: ready to use (…to be tuned)
…and a drawback: based on an unknown walk model, so no flexibility at all!
For these reasons we decided to develop our own walk model and to tune it using machine learning techniques.
SPQR Walking Library development workflow
1. Develop the walk model (the SPQR walk model) using Matlab
2. Test the walk model on the Webots simulator
3. Design and implement a C++ library (the SPQR Walking Library) for our RDK Soccer Agent
4. Test our walking RDK agent on the Webots simulator and on the real NAO robot
5. Finally, tune the walk parameters (on the Webots simulator and on NAO)
A simple walking RAgent for NAO
• Simple Behaviour Module: switches between two states, walk and stand
• Motion Control Module: uses the SPQR Walking Library
• NaoQi Adaptor: connects to the real NAO (NaoQi) through shared memory (Smemy)
• Webots Client: connects to the Webots simulator through a TCP channel
SPQR Walking Engine Model
NAO model characteristics: 21 degrees of freedom, no actuated trunk, no dynamic model available.
We follow the "static walking pattern": use an a-priori definition of the desired trajectories, obtained by
• choosing a set of output variables: the 3D coordinates of selected points of the robot;
• choosing and parametrizing the desired trajectories for these variables at each phase of the gait.
The walk is driven by velocity commands (v, ω), where v is the linear velocity and ω is the angular velocity.
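As an illustration of the "parametrized trajectory" idea, the sketch below generates a swing-foot trajectory in the xz plane as a function of the normalized gait phase. The trajectory shapes are assumptions for illustration only; the parameter names Xsw0, Xtot, Zsw loosely follow the ones listed later in the slides, but the actual SPQR trajectories may differ.

    #include <cmath>

    // Hypothetical swing-foot trajectory in the sagittal (xz) plane as a function
    // of the normalized gait phase s in [0, 1]. Shapes are illustrative assumptions.
    struct FootPose { double x, z; };

    FootPose swingFootTrajectory(double s, double Xsw0, double Xtot, double Zsw) {
        const double kPi = 3.14159265358979323846;
        FootPose p;
        // move the foot forward from its initial offset Xsw0 to Xsw0 + Xtot
        p.x = Xsw0 + Xtot * s;
        // lift the foot with a half-sine profile of maximum height Zsw
        p.z = Zsw * std::sin(kPi * s);
        return p;
    }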
SPQR velocity commands
The Behavior Control Module issues velocity commands (v, ω) to the Motion Control Module, which outputs the joints matrix. The command selects the gait state:
• (v, 0): Initial Half Step, then Rectilinear Walk Swing
• (v, ω): Curvilinear Walk Swing
• (0, ω): Turn Step
• (0, 0): Final Half Step, back to the Stand Position
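A minimal sketch of this state selection in C++ follows. The enum values mirror the states named on the slide; the selection function and its thresholds are assumptions, not the library's actual types.

    #include <cmath>

    enum class WalkState {
        StandPosition, InitialHalfStep, RectilinearWalkSwing,
        CurvilinearWalkSwing, TurnStep, FinalHalfStep
    };

    // Hypothetical mapping from a velocity command (v, omega) to the swing state,
    // following the transitions on the slide: (v,0) -> rectilinear swing,
    // (v,omega) -> curvilinear swing, (0,omega) -> turn step, (0,0) -> stop.
    WalkState selectSwingState(double v, double omega) {
        const double eps = 1e-6;
        if (std::fabs(v) < eps && std::fabs(omega) < eps) return WalkState::FinalHalfStep;
        if (std::fabs(v) < eps)                           return WalkState::TurnStep;
        if (std::fabs(omega) < eps)                       return WalkState::RectilinearWalkSwing;
        return WalkState::CurvilinearWalkSwing;
    }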
SPQR walking subtasks and parameters
Biped walking alternates a swing (single support) phase and a double support phase; the parameter SS% sets their relative duration.
SPQR walk subtasks and their parameters:
• Foot trajectories in the xz plane: Xtot, Xsw0, Xds, Zst, Zsw
• Arm control: Ks
• Hip yaw/pitch control (turn): Hyp
• Center of mass trajectory in the lateral direction: Yft, Yss, Yds, Kr
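Collected in one place, these parameters could be grouped as in the sketch below. The grouping follows the slide; the struct itself and its zero defaults are hypothetical, not the library's actual definition.

    // Hypothetical container for the SPQR walk parameters listed on the slide.
    // Default values are placeholders, not tuned values.
    struct SpqrWalkParameters {
        // Gait timing: relative duration of single vs. double support (SS%)
        double SSpercent = 0.0;
        // Foot trajectories in the xz plane
        double Xtot = 0.0, Xsw0 = 0.0, Xds = 0.0, Zst = 0.0, Zsw = 0.0;
        // Arm control
        double Ks = 0.0;
        // Hip yaw/pitch control (turn)
        double Hyp = 0.0;
        // Center of mass trajectory in the lateral direction
        double Yft = 0.0, Yss = 0.0, Yds = 0.0, Kr = 0.0;
    };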
Walk tuning: main issues
• Possible choices: by hand, or by using machine learning techniques
• Machine learning seems the best solution: less human interaction, and it explores the search space in a more systematic way
• …but take care of some aspects:
 • you need to define an effective fitness function
 • you need to choose the right algorithm to explore the parameter space
 • only a limited number of experiments can be done on a real robot
SPQR Learning System Architecture
The Learner (which uses the learning library) sends the experiments of each iteration to the RAgent (which uses the walking library). The RAgent executes them on Webots or on the real NAO and returns the data used to evaluate the fitness (position data from GPS).
SPQR Learner
The Learner applies a chosen learning strategy: Policy Gradient (e.g., PGPR), the Nelder-Mead simplex method, or a genetic algorithm.
• First iteration? Yes: return the initial iteration and the iteration information.
• Otherwise: apply the chosen algorithm (strategy) and return the next iteration and the iteration information.
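A minimal sketch of this control flow, using a strategy pattern over the three algorithms, is shown below. The interfaces are assumptions for illustration, not the actual SPQR learning library.

    #include <memory>
    #include <vector>

    // One "iteration" is a set of parameter vectors (policies) to evaluate.
    // This is a simplification of the real iteration information.
    using Policy = std::vector<double>;
    struct Iteration { std::vector<Policy> experiments; };

    // Hypothetical strategy interface: Policy Gradient (e.g., PGPR),
    // Nelder-Mead simplex, and a genetic algorithm would all implement it.
    class LearningStrategy {
    public:
        virtual ~LearningStrategy() = default;
        virtual Iteration initialIteration() = 0;
        virtual Iteration nextIteration(const std::vector<double>& fitnesses) = 0;
    };

    class Learner {
    public:
        explicit Learner(std::unique_ptr<LearningStrategy> s) : strategy_(std::move(s)) {}
        Iteration step(const std::vector<double>& fitnesses) {
            // First iteration: return the initial experiments;
            // afterwards: apply the chosen algorithm to propose the next ones.
            if (first_) { first_ = false; return strategy_->initialIteration(); }
            return strategy_->nextIteration(fitnesses);
        }
    private:
        std::unique_ptr<LearningStrategy> strategy_;
        bool first_ = true;
    };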
Policy Gradient (PG) iteration
1. Given a point p in the parameter space ℝ^K, generate n (n = mK) policies from p: each component p_k is set to p_k, p_k + ε_k, or p_k − ε_k.
2. Evaluate the policies.
3. For each k ∈ {1, …, K}, compute the average fitnesses F_k^+, F_k^0, F_k^−.
4. For each k ∈ {1, …, K}: if F_k^0 > F_k^+ and F_k^0 > F_k^−, then Δ_k = 0; else Δ_k = F_k^+ − F_k^−.
5. Normalize the step, Δ* = η · Δ / ‖Δ‖ (with step size η), and update the policy: p' = p + Δ*.
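The sketch below implements one such PG iteration. It assumes a fixed perturbation eps[k] per parameter, a step size eta, and random +ε/0/−ε perturbations, and it abstracts the experiment behind an evaluate() callback; these choices are consistent with the slide but are not a verbatim copy of the SPQR implementation.

    #include <cmath>
    #include <cstdlib>
    #include <functional>
    #include <vector>

    using Policy = std::vector<double>;

    // One policy-gradient iteration in the spirit of the slide.
    // evaluate(policy) must return the fitness of a policy (e.g., from an experiment).
    Policy pgIteration(const Policy& p,
                       const std::vector<double>& eps,   // per-parameter perturbation
                       double eta, int m,                // step size, policies per parameter
                       const std::function<double(const Policy&)>& evaluate) {
        const std::size_t K = p.size();
        const int n = m * static_cast<int>(K);           // n = mK test policies

        // Generate and evaluate n policies: each component is p[k], p[k]+eps[k], or p[k]-eps[k].
        std::vector<Policy> policies(n, p);
        std::vector<std::vector<int>> signs(n, std::vector<int>(K));
        std::vector<double> fitness(n);
        for (int i = 0; i < n; ++i) {
            for (std::size_t k = 0; k < K; ++k) {
                signs[i][k] = (std::rand() % 3) - 1;     // -1, 0, or +1
                policies[i][k] = p[k] + signs[i][k] * eps[k];
            }
            fitness[i] = evaluate(policies[i]);
        }

        // For each k, average the fitness over policies with +eps, 0, -eps on that component.
        Policy delta(K, 0.0);
        for (std::size_t k = 0; k < K; ++k) {
            double Fp = 0, F0 = 0, Fm = 0;
            int np = 0, n0 = 0, nm = 0;
            for (int i = 0; i < n; ++i) {
                if (signs[i][k] > 0)      { Fp += fitness[i]; ++np; }
                else if (signs[i][k] < 0) { Fm += fitness[i]; ++nm; }
                else                      { F0 += fitness[i]; ++n0; }
            }
            if (np) Fp /= np;
            if (n0) F0 /= n0;
            if (nm) Fm /= nm;
            // If leaving the parameter unchanged is best, do not move along it.
            delta[k] = (F0 > Fp && F0 > Fm) ? 0.0 : (Fp - Fm);
        }

        // Normalize the step and update: p' = p + eta * delta / |delta|.
        double norm = 0.0;
        for (double d : delta) norm += d * d;
        norm = std::sqrt(norm);
        Policy next = p;
        if (norm > 0.0)
            for (std::size_t k = 0; k < K; ++k) next[k] += eta * delta[k] / norm;
        return next;
    }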
Enhancing PG: PGPR
At each iteration i, the gradient estimate Δ(i) can be used to obtain a metric measuring the relevance of each parameter; the relevance is accumulated over iterations with a forgetting factor. Given the relevance and a threshold T, PGPR prunes the less relevant parameters in the next iterations.
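The slide does not preserve the exact relevance formula, so the sketch below only illustrates the idea: an exponentially-weighted accumulation of |Δ_k| with a forgetting factor lambda (this specific formula is an assumption), followed by pruning of parameters whose relevance falls below the threshold T.

    #include <cmath>
    #include <vector>

    // Hypothetical parameter-relevance bookkeeping for PGPR.
    // relevance[k] accumulates |delta[k]| across iterations with forgetting
    // factor lambda in (0,1); the exact formula used by PGPR may differ.
    void updateRelevance(std::vector<double>& relevance,
                         const std::vector<double>& delta, double lambda) {
        for (std::size_t k = 0; k < relevance.size(); ++k)
            relevance[k] = lambda * relevance[k] + std::fabs(delta[k]);
    }

    // Parameters whose relevance is below the threshold T are frozen (pruned)
    // in the next iterations, shrinking the search space.
    std::vector<bool> pruneMask(const std::vector<double>& relevance, double T) {
        std::vector<bool> active(relevance.size());
        for (std::size_t k = 0; k < relevance.size(); ++k)
            active[k] = relevance[k] >= T;
        return active;
    }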
Curvilinear biped walking experiment
The robot moves along a curve with radius R for a time t. The fitness function combines the path length travelled by the robot and the radial error with respect to the desired circle of radius R.
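The exact weighting of the two fitness terms is not preserved in this text, so the sketch below only shows the structure: reward path length, penalize the mean radial error; the linear combination and the weight w are assumptions.

    #include <cmath>
    #include <vector>

    struct Position2D { double x, y; };

    // Hypothetical fitness for the curvilinear walking experiment:
    // F = pathLength - w * meanRadialError. The combination and the weight w
    // are assumptions; the slide only names path length and radial error.
    double curvilinearFitness(const std::vector<Position2D>& traj,
                              Position2D center, double R, double w) {
        if (traj.size() < 2) return 0.0;
        double length = 0.0, radialError = 0.0;
        for (std::size_t i = 0; i < traj.size(); ++i) {
            if (i > 0)
                length += std::hypot(traj[i].x - traj[i-1].x, traj[i].y - traj[i-1].y);
            double r = std::hypot(traj[i].x - center.x, traj[i].y - center.y);
            radialError += std::fabs(r - R);
        }
        radialError /= traj.size();
        return length - w * radialError;
    }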
Simulators in learning tasks
• Advantages: you can test the gait model and the learning algorithm without being biased by noise
• Limits: the results of the experiments on the simulator can be ported to the real robot, but solutions specialized for the simulated model may not be as effective on the real robot (e.g., the simulator does not take asymmetries into account, and its models are not very accurate)
Results (1)
• Five sessions of PG, 20 iterations each, all starting from the same initial configuration
• SS%, Ks, Yft have been set to hand-tuned values
• 16 policies for each iteration
• The fitness increases in a regular way
• Low variance among the five simulations
Results (2)
Final parameter sets for the five PG runs and for five runs of PGPR (plots of the parameters Zsw, Xsw0, Xs, Kr).
Bibliography • A. Cherubini, F. Giannone, L. Iocchi, M. Lombardo, G. Oriolo. “Policy Gradient Learning for a Humanoid Soccer Robot”. Accepted for Journal of Robotics and Autonomous Systems. • A. Cherubini, F. Giannone, L. Iocchi, and P. F. Palamara, “An extended policy gradient algorithm for robot task learning”, Proc. of IEEE/RSJ International Conference on Intelligent Robots and System, 2007. • A. Cherubini, F. Giannone, and L. Iocchi, “Layered learning for a soccer legged robot helped with a 3D simulator”, Proc. of 11th International Robocup Symposium, 2007. • http://openrdk.sourceforge.net • http://www.aldebaran-robotics.com/ • http://spqr.dis.uniroma1.it
Any questions?