190 likes | 391 Views
Developmental Mechanisms for Life-Long Autonomous Learning in Robots. Pierre-Yves Oudeyer Project-Team INRIA-ENSTA- ParisTech FLOWERS http:// www.pyoudeyer.com http:// flowers.inria.fr. Developmental robotics. Sensorimotor and social learning: Autonomous Open, « life-long learning »
E N D
Developmental Mechanisms for Life-Long Autonomous Learning in Robots Pierre-Yves Oudeyer Project-Team INRIA-ENSTA-ParisTech FLOWERS http://www.pyoudeyer.com http://flowers.inria.fr
Developmental robotics • Sensorimotor and social learning: • Autonomous • Open, « life-long learning » • Real world, physical and social • Experimental validation Fundamental understanding of the mechanisms of development Application to assistive robotics • Intrinsic Motivation • Maturation • Imitation, • Social guidance
“Engineered” robot learning « Real » world Developmental approach • Engineer shows, with fixed interaction protocol in the lab: • Target: • Engineer provides a reward/fitness function: Axe 2 Axe 1 Which generic reward function for spontaneous curiosity driven learning? Behaviour of human (non-engineer) OR Regression algorithms (e.g. LGP, LWPR, Gaussian Mixture Regression) ? • Target: ? Action policy State/ context Action • Optimization algorithms (e.g. NAC, non-linear Nelder-Mead, …)
Learning from interactions with non-engineers • Intuitive multimodal interfaces • Synthesis and recognition of emotion in speech (IJHCS, 2001, 5 patents) • Clicker-training (RAS, 2002; 1 patent) • Physical human-robot interfaces (Humanoids 2011) • User studies (Humanoids 2009, HRI 2011) • Adaptation: learning flexible teaching interfaces(Conn. Sci., 2006, ICDL 2011, IROS 2010) Non-engineer human behaviour ?
Spontaneous active exploration, artificial curiosity Intrinsic Motivation Berlyne (1960), Csikszentmihalyi (1996) Dayan and Belleine (2002) Non ! Quelle fonction de récompense générique ? in the vicinity of • Non-stationary function, difficult to model • Algorithms for empirical evaluation of de/dt with statistical regression • IAC (2004, 2007), R-IAC (2009), SAGG-RIAC (2010) • McSAGG-RIAC (2011), SGIM (2011)
Exploring and learning generalized forward and inverse models Parameterized by Parameterized by
Active learning of models • Explore zones where: • Uncertainty/errors maximal • Least explored • Assume: • Spatial or temporal stationarity • Everythingislearnablewithinlifetime complexe simple complexe Developmental approach Explore zones where empirically learning progress is maximal complexe simple • Which • experiment ? complexe
Sensori state at t+1 IAC, IEEE Trans. EC (2007) R-IAC, IEEE Trans. AMD (2009) Sensori state Classic machine learner M (e.g. neural net, SVM, Gaussianprocess) Prediction Context state Error feedback Action state Meta machine learner metaM Local model of learning progress Local model of learning progress Action selection system . . . Intrinsicreward Progressive categorization
The Playground Experiments (IEEE Trans. EC 2007; Connection Science 2006; AAAI Work. Dev. Learn. 2005)
Experimentationson Open Learning in the Real World • PlaygroundExperiments • Autonomouslearning of novel affordances and and skills, e.g. object manipulationIEEE TEC, 2007; IROS 2010; IEEE TAMD, 2009; Front. Neurorobotics, 2007; Connect. Sc., 2006; IEEE ICDL 2010, 2011 • Self-organization of developmental trajectories, bootstrapping of communication New hypotheses for understanding infant developmentFront. Neuroscience 2007, Infant and Child Dev. 2008, Connect. Science 2006 complex simple complex
Active learning of inverse modelsSAGG-RIAC(RAS, 2012) From the active choice of action, followed by observation of effect … Redundancy of sensorimotor spaces … to the active choice of effect, followed by the search of a corresponding action policy through goal-directed optimization (e.g. using NAC, POWER, PI^2-CMA, …) self-defined RL problem (Context, Movement) Effect Spontaneous active exploration of a space of fitness functions parameterized by where one iteratively chooses the which maximizes the empirical evaluation of:
Experimentalevaluation of active learningefficiency Apprentissage de la locomotion omnidirectionnelle Control Space: Task Space: • Performance higherthan more classical active learningalgorithms in real sensorimotorspaces (non-stationary, non homogeneous) (IEEE TAMD 2009; ICDL 2010, 2011; IROS 2010; RAS 2012)
Maturational constraints • Progressive growths of DOF number and spatio-temporal resolution • Adaptive maturationalschedulecontrolled by active learning/learningprogress (Bjorklund, 1997; Turkewitz and Kenny, 1985)
McSAGG-RIAC Maturationally constrained curiosity-driven learning (IEEE ICDL-Epirob 2011a)
SGIM: SociallyGuidedIntrinsic Motivation (ICDL-Epirob, 2011b)
« Life-long » Experimentation • Experimentation of algorithms for « life-long » learning in the real world • Technological experimental platforms: robust, reconfigurable, precise, easily repaired, cheap Acroban (Siggraph 2010, IROS 2011, World Expo, South Korea, 2012)
« Life-long » Experimentation • Experimentation of algorithms for « life-long » learning in the real world • Technological experimental platforms: robust, reconfigurable, precise, easily repaired, cheap • Ergo-Robots • Mid-term: open-source distribution of the platform to the scientific community Ergo-Robots (Exhibition « Mathematics, a beautiful elsewhere », Fond. Cartier, 2011-2012)
Baranes, A., Oudeyer, P-Y. (2012) Active Learning of Inverse Models with Intrinsically Motivated Goal Exploration in Robots, Robotics and Autonomous Systems. http://www.pyoudeyer.com/RAS-SAGG-RIAC-2012.pdf Baranes, A., Oudeyer, P-Y. (2011a) The Interaction of Maturational Constraints and Intrinsic Motivation in Active Motor Development, in Proceedings of IEEE ICDL-Epirob 2011. http://flowers.inria.fr/BaranesOudeyerICDL11.pdf Lopes, M., Melo, F., Montesano, L. (2009) Active Learning for Reward Estimation in Inverse Reinforcement Learning, European Conference on Machine Learning (ECML/PKDD), Bled, Slovenia, 2009. http://flowers.inria.fr/mlopes/myrefs/09-ecml-airl.pdf Nguyen, M., Baranes, A., Oudeyer, P-Y. (2011b) Bootstrapping Intrinsically Motivated Learning with Human Demonstrations, in Proceedings of IEEE ICDL-Epirob 2011. http://flowers.inria.fr/NguyenBaranesOudeyerICDL11.pdf Oudeyer P-Y, Kaplan , F. and Hafner, V. (2007) Intrinsic Motivation Systems for Autonomous Mental Development, IEEE Transactions on Evolutionary Computation, 11(2), pp. 265--286. http://www.pyoudeyer.com/ims.pdf Baranes, A., Oudeyer, P-Y. (2009 )R-IAC: Robust intrinsically motivated exploration and active learning, IEEE Transactions on Autonomous Mental Development, 1(3), pp. 155--169. Ly, O., Lapeyre, M., Oudeyer, P-Y. (2011) Bio-inspired vertebral column, compliance and semi-passive dynamics in a lightweight robot, in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2011), San Francisco, US. Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress, Manuel Lopes, Tobias Lang, Marc Toussaint and Pierre-Yves Oudeyer. Neural Information Processing Systems (NIPS 2012), Tahoe, USA. http://flowers.inria.fr/mlopes/myrefs/12-nips-zeta.pdf The Strategic Student Approach for Life-Long Exploration and Learning, Manuel Lopes and Pierre-Yves Oudeyer. In Proceedings of IEEE ICDL-Epirob 2012, http://flowers.inria.fr/mlopes/myrefs/12-ssp.pdf