Machine Learning and Motion Planning Dave Millman October 17, 2007
Machine Learning intro • Machine Learning (ML) • The study of algorithms which improve automatically through experience. - Mitchell • General description • Data driven • Extract some information from data • Mathematically based • Probability, statistics, information theory, computational learning theory, optimization
A very small set of uses of ML • Text • Document labeling, part-of-speech tagging, summarization • Vision • Object recognition, handwriting recognition, emotion labeling, surveillance • Sound • Speech recognition, music genre classification • Finance • Algorithmic trading • Medical, biological, chemical, on and on and on…
A few types of ML • Supervised • Given: labeled data • Usual goal: learn a function from inputs to labels • Ex: SVM, Neural Networks, Boosting, Nearest Neighbors, Decision Trees • Unsupervised • Given: unlabeled data • Usual goal: cluster data, learn conditional probabilities • Ex: k-means, hierarchical clustering
A few types of ML (cont.) • Semi-Supervised • Given: labeled and unlabeled data • Usual goal: use the unlabeled data to augment the labeled data • Ex: cluster the data, then label unlabeled points from the clusters • Reinforcement • Given: reward function and set of actions • Goal: learn a policy which optimizes the reward function • Ex: Q-learning, Ant-Q
General Idea: Reinforcement Learning • Markov Decision Process (MDP) • State space (fully or partially observable) • Action space (static or time-dependent) • Transition function produces the next state (depends only on the present state and action, not on the past) • Reward function (based on the state and action taken)
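To make the MDP components concrete, here is a minimal Python sketch of one way to represent a tiny MDP; the state names, transition probabilities, and reward values are purely illustrative and are not taken from any of the cited papers.

```python
import random

# Toy MDP: a state space, an action space, a transition function, and a reward function.
# All names and numbers below are illustrative only.
STATES = ["far", "near", "goal"]
ACTIONS = ["move", "wait"]

def transition(state, action):
    """Return the next state; it depends only on the current state and action (Markov property)."""
    if action == "wait":
        return state
    if state == "far":
        return "near" if random.random() < 0.8 else "far"
    if state == "near":
        return "goal" if random.random() < 0.5 else "near"
    return "goal"

def reward(state, action, next_state):
    """Reward received for taking `action` in `state` and arriving in `next_state`."""
    return 1.0 if next_state == "goal" and state != "goal" else 0.0
```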
Textbook Q-Learning [MI06] • Learning flocking behavior • N agents • discrete time steps • Agent i, partner j • Define Q-values Q(s_t, a_t) • s_t - state • a_t - action
Our textbook example • State of i • R = floor(|x_i - x_j|), the discretized distance between i and its partner j • Actions for i • a_1 - Attract to j • a_2 - Parallel positive orientation to j • a_3 - Parallel negative orientation to j • a_4 - Repulsion from j
Reward Function - no predator • Distances R1, R2, R3 s.t. R1 < R2 < R3
Reward Function - predator • Distances R1, R2, R3 s.t. R1 < R2 < R3
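The flocking setup above (state = discretized distance R to the partner, actions a_1-a_4, distance thresholds R1 < R2 < R3) maps naturally onto tabular Q-learning. Below is a minimal Python sketch; the learning rate, discount factor, exploration rate, and the per-band reward values are illustrative assumptions, not the values used in [MI06].

```python
import random
from collections import defaultdict

ACTIONS = ["attract", "parallel_pos", "parallel_neg", "repulse"]  # a_1..a_4 from the slide
R1, R2, R3 = 2, 5, 10            # distance thresholds with R1 < R2 < R3 (illustrative values)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(float)           # Q[(state, action)], where state = discretized distance R

def reward_no_predator(R):
    """Illustrative stand-in for the paper's reward table: favor a comfortable
    distance band, penalize being too close or too far from the partner."""
    if R < R1:
        return -1.0              # too close
    if R <= R2:
        return +1.0              # comfortable band
    if R <= R3:
        return 0.0               # acceptable
    return -1.0                  # too far

def choose_action(state):
    """Epsilon-greedy selection over the four flocking actions."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, r, next_state):
    """Standard one-step Q-learning update."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
```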
Don’t repeat work!! • Basic planners work from scratch • Ex: path planning for parking - no difference between the first time and the hundredth time • Ideally, learn some general, higher-level “strategies” that can be reused • General solution patterns in the problem space
Viability Filtering [KP07] • Agent can “see”: perceptual information • Range-finder-like virtual sensors • Database of successful, perceptually parameterized motions • From the agent’s own experimentation or an external source • Database exploited for future queries • Search is based on what has previously been successful in similar situations
Sensors in Viability Filtering - some defs • X - set of agent states • E - set of environment states • Augmented state x+ = (x, e) ∈ X+ = {(x, e) | x ∈ X, e ∈ E} • Sensor functions σ_i(x+): X+ → R • At a specific augmented state x+ ∈ X+ • def sensor state s = (σ_1(x+), …, σ_n(x+)) • Sensor space S - the set of all sensor state values, s ∈ S
Finally • Def: locally situated state of the agent x̃ = (s, x') ∈ X̃, where x' is some state information independent of the sensory information • Now we want to collect data to train a viability classifier V(x̃): X̃ → {viable, nonviable} • Note: errors in V(·) could cause problems
Check Viability, not Collision

Function IS_NONVIABLE(x+)
    if is_collision(x+) then return True
    s := (σ_1(x+), …, σ_n(x+))
    x' := extract_internal_state(x+)
    x̃ := (s, x')
    return ¬V(x̃)
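The pseudocode above can be mirrored in runnable Python. Everything below is a hedged sketch: the `sensors`, `extract_internal_state`, and `is_collision` stand-ins, the dictionary-based augmented state, and the k-nearest-neighbor classifier are placeholders, not the actual sensor functions or viability model of [KP07].

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# --- Hypothetical stand-ins; a real system would supply these. ---
def sensors(x_plus):
    """Sensor state s = (sigma_1(x+), ..., sigma_n(x+)), e.g. range-finder readings."""
    return np.asarray(x_plus["ranges"])

def extract_internal_state(x_plus):
    """x': agent state information independent of the sensors (e.g. speed)."""
    return np.asarray([x_plus["speed"]])

def is_collision(x_plus):
    return x_plus["in_collision"]

# Viability model: any classifier over locally situated states (s, x') could be used;
# a k-NN classifier trained on fake data serves as a placeholder here.
rng = np.random.default_rng(0)
train_X = rng.random((50, 5))                   # fake (s, x') feature vectors
train_y = (train_X[:, 0] > 0.3).astype(int)     # fake labels: 1 = viable, 0 = nonviable
viability_model = KNeighborsClassifier(n_neighbors=5).fit(train_X, train_y)

def is_nonviable(x_plus):
    """Mirror of IS_NONVIABLE: reject on collision, otherwise consult the learned model."""
    if is_collision(x_plus):
        return True
    s = sensors(x_plus)
    x_prime = extract_internal_state(x_plus)
    features = np.concatenate([s, x_prime]).reshape(1, -1)
    return viability_model.predict(features)[0] == 0

# Example query on a hypothetical augmented state:
x_plus = {"ranges": [0.9, 0.8, 0.7, 0.6], "speed": 1.0, "in_collision": False}
print(is_nonviable(x_plus))
```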
Results and Further work • Bootstrapping • Use of history to create macroscopic plans • Model transfer
Training a Dog [B02] • MIT Media Lab system where the user interactively trains the dog using “click training” • Uses acoustic patterns as cues for actions • Can be taught cues on different acoustic patterns • Can create new actions from state-space search • Simplified Q-learning based on animal training techniques
Training a Dog (cont.) • Predictable regularities • animals tend toward successful states • small time window • Maximize use of supervisor feedback • limit the state space by only looking at states that matter, e.g. if utterance u followed by action a produces a reward, then utterance u is important • Easy to train • Credit accumulation • allow a state-action pair to delegate credit to another state-action pair
Alternatives to Q-Learning • Q-decomposition [RZ03] • Complex agent modeled as a set of simpler subagents • Each subagent has its own reward function • An arbitrator decides the best action based on “advice” from the subagents • A simple world with initial state S0 and three terminal states SL, SU, SR, each with an associated reward of dollars and/or euros. The discount factor is γ ∈ (0, 1). [fig. from RZ03]
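A small sketch of the Q-decomposition idea: each subagent keeps its own Q-table, trained against its own reward signal (e.g. dollars vs. euros), and the arbitrator picks the action whose summed Q-values are largest. The simple one-step update below is a simplification; [RZ03] discusses how subagents should actually update (in particular, on-policy SARSA-style updates against the arbitrator's policy), and the action names are illustrative.

```python
from collections import defaultdict

ACTIONS = ["left", "up", "right"]   # illustrative action names

class SubAgent:
    """A subagent with its own reward signal and its own Q-table."""
    def __init__(self, alpha=0.1, gamma=0.9):
        self.Q = defaultdict(float)
        self.alpha, self.gamma = alpha, gamma

    def update(self, s, a, r, s_next):
        """One-step update using the subagent's *own* reward r."""
        best_next = max(self.Q[(s_next, b)] for b in ACTIONS)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])

def arbitrate(subagents, s):
    """Arbitrator: choose the action whose summed Q-value ("advice") is largest."""
    return max(ACTIONS, key=lambda a: sum(sub.Q[(s, a)] for sub in subagents))

# Usage: one subagent per currency, each updated with its own reward, then arbitrated.
dollars, euros = SubAgent(), SubAgent()
action = arbitrate([dollars, euros], s="S0")
```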
Learning Behavior with Q-Decomp [CT06] • Q-Decomp as the learning technique • Reward function obtained via Inverse Reinforcement Learning (IRL) [NR00] • Mimicking the behavior of an “expert”
Support Vector Path Planning [M06] • Uses the SVM algorithm to generate a smooth path • Not really machine learning, but a neat application of an ML algorithm • Here is the idea
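A hedged sketch of that idea: label obstacle points on one side of the desired corridor as one class and those on the other side as the other class, fit a kernel SVM, and read the smooth path off the maximum-margin decision boundary. The obstacle coordinates, kernel, parameters, and the bisection used to trace the boundary are illustrative choices, not the exact formulation in [M06].

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative obstacle points: +1 on one side of the desired corridor, -1 on the other.
left_obstacles  = np.array([[1.0, 2.0], [2.5, 2.2], [4.0, 2.5], [5.5, 2.1]])
right_obstacles = np.array([[1.2, 0.2], [2.8, -0.1], [4.2, 0.3], [5.8, 0.0]])
X = np.vstack([left_obstacles, right_obstacles])
y = np.array([+1] * len(left_obstacles) + [-1] * len(right_obstacles))

# RBF-kernel SVM: the decision boundary f(x) = 0 is a smooth curve between the two classes.
svm = SVC(kernel="rbf", C=1000.0, gamma=0.5).fit(X, y)

def path_point(x_coord, y_lo=-2.0, y_hi=4.0, tol=1e-3):
    """Find the y where the decision function crosses zero at this x (simple bisection).
    Assumes the boundary crosses the [y_lo, y_hi] interval exactly once."""
    f = lambda yy: svm.decision_function([[x_coord, yy]])[0]
    lo, hi = y_lo, y_hi
    if f(lo) > 0:                 # make sure f(lo) < 0 < f(hi) before bisecting
        lo, hi = hi, lo
    while abs(hi - lo) > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

# Trace the smooth path through the obstacle field.
path = [(x, path_point(x)) for x in np.linspace(1.0, 5.8, 20)]
```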
Videos • Robot learning to pick up objects • http://www.cs.ou.edu/~fagg/movies/index.html#torso_2004 • Training a Dog • http://characters.media.mit.edu/projects/dobie.html
References • [NR00] A. Y. Ng and S. Russell. Algorithms for inverse reinforcement learning. In Proc. 17th International Conf. on Machine Learning, pages 663-670. Morgan Kaufmann, San Francisco, CA, 2000. • [B02] B. Blumberg et al. Integrated learning for interactive synthetic characters. In SIGGRAPH ‘02: Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, pages 417-426, New York, NY, USA, 2002. ACM Press. • [RZ03] S. J. Russell and A. Zimdars. Q-decomposition for reinforcement learning agents. In ICML, pages 656-663, 2003. • [MI06] K. Morihiro, T. Isokawa, H. Nishimura, and N. Matsui. Emergence of flocking behavior based on reinforcement learning. In Knowledge-Based Intelligent Information and Engineering Systems, pages 699-706, 2006. • [CT06] T. Conde and D. Thalmann. Learnable behavioural model for autonomous virtual agents: low-level learning. In AAMAS ‘06: Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, pages 89-96, New York, NY, USA, 2006. ACM Press. • [M06] J. Miura. Support vector path planning. In Intelligent Robots and Systems (IROS), 2006 IEEE/RSJ International Conference on, pages 2894-2899, 2006. • [KP07] M. Kalisiak and M. van de Panne. Faster motion planning using learned local viability models. In ICRA, pages 2700-2705, 2007.
Machine Learning Ref • [M07] Mehryar Mohri. Foundations of Machine Learning course notes. http://www.cs.nyu.edu/~mohri/ml07.html • [M97] Tom M. Mitchell. Machine Learning. McGraw-Hill, 1997. • [RN05] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall Series in Artificial Intelligence, Englewood Cliffs, New Jersey, 1995. • [CV95] Corinna Cortes and Vladimir Vapnik. Support-Vector Networks. Machine Learning, 20, 1995. • [V98] Vladimir N. Vapnik. Statistical Learning Theory. Wiley, 1998. • [KV94] Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.