King Fahd University of Petroleum and Minerals. COE 584/484: Robotics. Stochastic Optimization of Bipedal Walking using Gyro Feedback and Phase Resetting. Muhammad Al-Nasser Mohammad Shahab. March 2008 COE584: Robotics. Outline. Problem Definition Physical Description
King Fahd University of Petroleum and Minerals COE 584/484: Robotics Stochastic Optimization of Bipedal Walking using Gyro Feedback and Phase Resetting Muhammad Al-Nasser Mohammad Shahab March 2008 COE584: Robotics
Outline • Problem Definition • Physical Description • Humanoid Walking System • Feedback • Gyroscope • Phase Resetting • Stochastic Optimization • PGRL • Experimentation • Comments
Problem Definition • Authors • Felix Faber & Sven Behnke, University of Freiburg, Germany • Problem Statement: • “to optimize the walking pattern of a humanoid robot for forward speed using suitable metaheuristics”
First Humanoid Robot! 1206 AD Ibn Ismail Ibn al-Razzaz Al-Jazari A boat with four programmable automatic musicians that floated on a lake to entertain guests at royal drinking parties!!
Problem Definition • Problems? • Sensor noise: camera, gyroscope, ultrasonic, force, … • Inaccurate actuators: motors, … • Environment disturbances: unknown surface, … • Nonlinear dynamics: i.e. a complex system to control
Physical Description • Jupp, team NimbRo • 60 cm, 2.3 kg • Pocket PC
Physical Description • Pitch joint to bend trunk • Each leg • 3DOF hip • Knee • 2DOF ankle • Each arm • 2DOF shoulders • elbow
Humanoid Walking System • [Diagram: leg motion trajectory → controller → joint motor positions → robot walks!] • One Approach • Model-Based (Geometric Model) • Accurate model • Solving motion equations for all joints (offline) • 19 degrees of freedom • Nonlinear model equations • Computational complexity
Humanoid Walking System • [Diagram: controller → joint motor positions] • 2nd Approach • Central Pattern Generators (CPG) • Sinusoidal joint trajectories generated • Bio-inspired • No need for a model
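A minimal sketch of the CPG idea: a joint angle follows a sinusoid in time, with no robot model involved. The function name `sinusoid_joint_angle` and its parameters (amplitude, frequency, phase offset, centre angle) are illustrative assumptions and do not come from the slides.

```python
import math

def sinusoid_joint_angle(t, amplitude, frequency_hz, phase=0.0, center=0.0):
    """Joint angle (rad) at time t (s) for a sinusoidal pattern generator."""
    return center + amplitude * math.sin(2.0 * math.pi * frequency_hz * t + phase)

# Sample one 0.5 s cycle of a hypothetical joint oscillating at 2 Hz.
samples = [
    sinusoid_joint_angle(i * 0.025, amplitude=0.3, frequency_hz=2.0)
    for i in range(21)
]
```

Each joint would get its own amplitude, offset and phase; those values are exactly the kind of parameters that can be tuned later instead of being derived from a model.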
Humanoid Walking System • Open-loop (no feedback) Gait • Mechanism • Shifting weight from one leg to the other • Shortening the leg not needed for support • Leg motion in forward direction
Humanoid Walking System • Open-loop Gait • Clock-driven, with the trunk phase as the central clock • Trunk phase advances with time at the ‘foot step frequency’ • Right leg motion phase = trunk phase + π/2 • Left leg motion phase = trunk phase − π/2
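The clock-driven phase relations can be sketched as follows. Only the ±π/2 offsets come from the slide; wrapping phases to [−π, π) and advancing the trunk phase by 2π · frequency · dt are assumptions made here for illustration, as are all function names.

```python
import math

def wrap_phase(phi):
    """Wrap an angle to the interval [-pi, pi) (assumed convention)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi

def advance_trunk_phase(trunk_phase, step_frequency_hz, dt):
    """Advance the central clock by one control tick of length dt seconds."""
    return wrap_phase(trunk_phase + 2.0 * math.pi * step_frequency_hz * dt)

def leg_phases(trunk_phase):
    """Return (right, left) leg motion phases derived from the trunk phase."""
    right = wrap_phase(trunk_phase + math.pi / 2.0)
    left = wrap_phase(trunk_phase - math.pi / 2.0)
    return right, left
```

Because both legs read the same clock, they always stay half a cycle apart, which is what makes the weight shift alternate between them.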
Humanoid Walking System (continued) • Kinematic mapping for the left and right legs (leg, swing, foot), from “Human-Like Walking using Toes Joint and Straight Stance Leg” by Behnke • Leg extension • Swing: leg swing amplitude • r: roll, p: pitch, y: yaw
Feedback • Overall Control System • [Diagram: mapping → controller → joint motor positions] • Gyroscope: gyro reading = inclination (balance), angular velocity • Force-sensing resistors: trigger when a foot touches the ground (‘High’ or ‘Low’)
Feedback • Gyroscope • device for measuring orientation, based on the principles of conservation of angular momentum • Remember Physics 101!
Feedback • [Diagram: gyro reading fed back to the joint motor positions] • P-Control • A growing gyro reading means the robot is falling • Proportional Control • Reactive action proportional to the ‘error’ (error = sensor value − desired value) • Desired values = zero (i.e. no inclination) • Other: Proportional-Integral Control • Action proportional to the ‘error’ and to the accumulated ‘error’
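A minimal sketch of proportional and proportional-integral control as described above. The class name, the gains `kp` and `ki`, and the choice to return a correction that opposes the error are assumptions made for illustration; the slides do not state the gains or the sign convention.

```python
class PIController:
    """P controller when ki == 0, PI controller otherwise."""

    def __init__(self, kp, ki=0.0, desired=0.0):
        self.kp = kp
        self.ki = ki
        self.desired = desired  # zero means "no inclination"
        self.integral = 0.0     # accumulated error over time

    def update(self, sensor_value, dt):
        """Return a corrective action for one control tick of length dt."""
        error = sensor_value - self.desired
        self.integral += error * dt
        # Assumed sign: the action counteracts the measured error.
        return -(self.kp * error + self.ki * self.integral)
```

With `ki=0.0` the output depends only on the current gyro reading; a non-zero `ki` also reacts to an error that persists over several ticks.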
Feedback • Overall System • [Diagram: mapping → P-control → joint motor positions]
Feedback • Overall System • [Diagram: online adaptation (stochastic optimization) → controller → joint motor positions] • Adaptive Control • Online tuning of the ‘parameters’ of the controller
Stochastic Optimization Approach • Goal: • Adjust parameters to achieve a faster and more stable walk. • A fitness function (cost function) is used to express the optimization goals (i.e. speed & robustness) • f(·): R^N → R, where N is the number of parameters of interest
Stochastic Optimization Approach • The parameters are those of the kinematic mapping (Behnke paper)
Stochastic Optimization Approach • We evaluate f at a given set of parameters • x = [x1, x2, ..., xN] (Table 1) • Now, how do we find the parameter values that result in the highest fitness value? • Use a metaheuristic method called PGRL • [Formula residue: +1, d < dexp]
Policy Gradient Reinforcement Learning (PGRL) • An optimization method to maximize the walking speed • It automatically searches a set of possible parameters aiming to find the fastest walk that can be achieved
Policy Gradient Reinforcement Learning • How does PGRL work? • 1st: randomly generate B test policies {x1, x2, …, xB} around an initially given parameter vector xπ (where x = [x1, x2, …, xN]) • Each parameter j of a test policy xi is randomly set to the corresponding parameter of xπ plus one of −ε, 0, or +ε • where 1 ≤ i ≤ B and 1 ≤ j ≤ N • ε is a small constant value
Policy Gradient Reinforcement Learning • 2nd: • Each test policy is evaluated by the ‘fitness function’. • For each parameter j, the test policies are grouped into 3 categories, depending on whether the jth parameter was modified by −ε, 0, or +ε
Policy Gradient Reinforcement Learning • 3rd: construct the vector a = [a1, a2, …, aN] • Each aj is computed from the average fitness of each category for parameter j
Policy Gradient Reinforcement Learning • 4th (finally): adjust xπ by a step along a, where η is a scalar step size
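The four steps can be sketched as one PGRL iteration. The slide formulas for a and for the update did not survive, so this sketch assumes the commonly cited Kohl and Stone formulation: aj is zero when the unperturbed category has the best average, otherwise it is the difference between the +ε and −ε averages, and xπ moves by η along the normalised a. The function names and the handling of empty categories are also assumptions.

```python
import math
import random

def _mean(values):
    return sum(values) / len(values) if values else None

def pgrl_step(x_pi, fitness, num_policies, epsilon, eta, rng=random):
    """One PGRL iteration; returns the adjusted parameter vector."""
    n = len(x_pi)
    # 1st: B test policies, each parameter perturbed by -eps, 0 or +eps.
    signs = [[rng.choice((-1, 0, 1)) for _ in range(n)]
             for _ in range(num_policies)]
    policies = [[x + s * epsilon for x, s in zip(x_pi, row)] for row in signs]
    # 2nd: evaluate every test policy with the fitness function.
    scores = [fitness(p) for p in policies]
    # 3rd: build a from the per-category average fitness of each parameter.
    a = []
    for j in range(n):
        avg = {s: _mean([f for row, f in zip(signs, scores) if row[j] == s])
               for s in (-1, 0, 1)}
        if avg[-1] is None or avg[1] is None:
            a.append(0.0)  # assumption: no estimate without both +/- samples
        elif avg[0] is not None and avg[0] > avg[1] and avg[0] > avg[-1]:
            a.append(0.0)  # leaving the parameter unchanged scored best
        else:
            a.append(avg[1] - avg[-1])
    # 4th: move x_pi by a step of length eta along the normalised a.
    norm = math.sqrt(sum(v * v for v in a))
    if norm == 0.0:
        return list(x_pi)
    return [x + eta * v / norm for x, v in zip(x_pi, a)]
```

On the robot, `fitness` would be a measured walking trial rather than a cheap function call, which is why the number of test policies B per iteration matters.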
Extension to PGRL • Adaptive step size after g steps, where s is the number of fitness function evaluations and S is the maximum allowed value of s
Overall • Overall System • [Diagram: PGRL → xπ → controller → joint motor positions]
Results • Initial • Speed is 21.3 cm/s • Fitness is 1.36 • After 1000 iterations • Speed is 34.0 cm/s • Fitness is 1.52 • Speed increase of about 60%
Glossary • Stance leg: • The leg which is on the floor during the walk. • Swing leg: • The leg which is moving during the walk. • Single support: • The case where the robot is touching the floor with one leg. • Double support: • The case where the robot is touching the floor with both legs.