Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida, Koh Hosoda Presented by: Subarna Sadhukhan
Reinforcement learning • Vision-based reinforcement learning by which a robot learns to shoot a ball into a goal; the aim is a method that acquires the shooting behavior automatically. • The robot and its environment are modeled as two synchronized finite state automata interacting in a discrete-time cyclical process. • The robot senses the current state and selects an action; the environment then decides the transition to a new state and generates a reward back to the robot. • Through this loop the robot learns purposive behavior for achieving the given goal.
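A minimal sketch of this sense-act-reward loop, assuming hypothetical Environment and Agent interfaces (the paper does not specify any API):

```python
# Minimal sketch of the sense-act-reward loop described above.
# The Environment/Agent interfaces are illustrative, not from the paper.

class Environment:
    def reset(self):
        """Return the initial observed state."""
        raise NotImplementedError

    def step(self, action):
        """Apply the action, transition to a new state, return (next_state, reward)."""
        raise NotImplementedError

class Agent:
    def select_action(self, state):
        raise NotImplementedError

    def update(self, state, action, reward, next_state):
        raise NotImplementedError

def run_episode(env, agent, max_steps=1000):
    state = env.reset()
    for _ in range(max_steps):
        action = agent.select_action(state)     # robot senses the state and selects an action
        next_state, reward = env.step(action)   # environment transitions and returns a reward
        agent.update(state, action, reward, next_state)
        state = next_state
```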
Environment – ball and goal • Robot – mobile, with a single camera • Nothing about the system dynamics is known in advance • Assume only that the robot can discriminate the set S of states and take actions from the set A on the world
Q-learning Let Q*(s,a) be the expected return for taking action a in situation s and acting optimally afterwards:
Q*(s,a) = r(s,a) + γ Σ_{s'} T(s,a,s') max_{a'} Q*(s',a')
where T(s,a,s') is the probability of a transition from s to s' under action a, r(s,a) is the reward for the state-action pair (s,a), and γ is the discounting factor. Since T and r are not known, the estimate is updated incrementally from experience:
Q(s,a) ← (1 − α) Q(s,a) + α ( r + γ max_{a'} Q(s',a') )
where r is the actual reward received for taking a, s' is the next state, and α is the learning rate.
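A table-based sketch of this update rule; the epsilon-greedy exploration and the particular values of α and γ are illustrative assumptions, not taken from the paper:

```python
import random
from collections import defaultdict

ALPHA = 0.25   # learning rate alpha (illustrative value)
GAMMA = 0.8    # discounting factor gamma (illustrative value)

ACTIONS = list(range(9))        # the 9 actions defined later
Q = defaultdict(float)          # Q[(state, action)] -> current estimate of the return

def q_update(s, a, r, s_next):
    """Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a'))."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * (r + GAMMA * best_next)

def select_action(s, eps=0.1):
    """Epsilon-greedy action selection over the learned Q-values."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])
```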
State Set • 9×27 + 27 + 9 = 279 states: 3 positions × 3 sizes of the ball (9 sub-states) combined with 3 positions × 3 sizes × 3 orientations of the goal (27 sub-states), plus the 27 states in which the ball is not visible and the 9 states in which the goal is not visible
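A small sketch that enumerates such a state set and confirms the count; the category labels (positions, sizes, orientations) are illustrative and may differ from the paper's exact names:

```python
from itertools import product

POSITIONS    = ["left", "center", "right"]
SIZES        = ["small", "medium", "large"]
ORIENTATIONS = ["left-oriented", "front", "right-oriented"]

ball_sub = list(product(POSITIONS, SIZES))                # 3 x 3 = 9 ball sub-states
goal_sub = list(product(POSITIONS, SIZES, ORIENTATIONS))  # 3 x 3 x 3 = 27 goal sub-states

states = ([(b, g) for b, g in product(ball_sub, goal_sub)]   # both visible: 9 * 27
          + [("ball-lost", g) for g in goal_sub]             # only the goal visible: 27
          + [(b, "goal-lost") for b in ball_sub])            # only the ball visible: 9

print(len(states))  # 243 + 27 + 9 = 279
```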
Action set • Two motors (left and right wheel) • Each motor takes one of three commands: forward, stop, back • 9 actions in all • State-action deviation problem: the same motion produces a large change in the image when the ball/goal is near the observer but only a small change when it is far away, so a single action far from the target may barely change the observed state
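The 9 actions can be enumerated as pairs of motor commands, one per wheel; a sketch:

```python
from itertools import product

MOTOR_COMMANDS = ["forward", "stop", "back"]

# One command per motor (left wheel, right wheel): 3 x 3 = 9 actions in all.
ACTIONS = list(product(MOTOR_COMMANDS, repeat=2))
print(len(ACTIONS))  # 9
```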
Learning from Easy Missions • Delayed reinforcement problem: there is no explicit teacher signal, and a reward is received only after the ball has been kicked into the goal; r(s,a) = 1 only in the goal state • Construct the learning schedule so that the robot learns in easy situations at the early stages and only later in more difficult ones – Learning from Easy Missions (LEM)
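The reward itself is trivial to write down; the difficulty is that it says nothing until a shot actually succeeds. A minimal sketch:

```python
def reward(ball_in_goal: bool) -> float:
    """r(s, a) = 1 only in the goal state; everywhere else the robot receives nothing,
    which is what makes the reinforcement delayed."""
    return 1.0 if ball_in_goal else 0.0
```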
Complexity analysis • k states, m possible actions • Plain Q-learning: roughly m^k steps of random search to receive the first reward, roughly m^(k−1) for the second, and so on, hence on the order of m^k steps in total • LEM: about m·k steps, since the robot receives a reward at each step of the schedule
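A back-of-the-envelope comparison for illustrative values of k and m (the exponential figure is the rough random-search estimate above, not an exact result from the paper):

```python
k, m = 10, 9   # illustrative: 10 states between start and goal, 9 actions per state

plain_q = m ** k   # rough order of steps to stumble on the first reward by random search
lem     = m * k    # rough order of steps when a reward is obtained at each stage

print(f"plain Q-learning ~ {plain_q:,} steps, LEM ~ {lem} steps")
# plain Q-learning ~ 3,486,784,401 steps, LEM ~ 90 steps
```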
Implementing LEM • Rough ordering of situations by observed size: small → medium → large (the size of the ball/goal in the image roughly indicates how close the robot is to reaching the goal) • The state space is categorized into sub-states such as ball size, ball position, and so on • Let n be the size of the state space and m the number of ordered sets • Applying LEM with the m ordered sets takes roughly m times the cost of exploring one small subset, as opposed to the cost of searching the entire state space of size n at once
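A sketch of such a schedule, reusing the hypothetical Environment/Agent interfaces from the earlier sketches; the reset_to helper and the fixed episode budget per stage are assumptions, not details from the paper:

```python
import random

def learn_with_lem(env, agent, ordered_subsets, episodes_per_stage=100, max_steps=300):
    """Train on the easiest subset of start situations first, then progressively
    harder ones, reusing the same Q-table (inside `agent`) across all stages.

    ordered_subsets: e.g. [S1, S2, ...] with S1 nearest to the goal.
    env.reset_to(s0) is a hypothetical helper that places the robot in situation s0.
    """
    for start_states in ordered_subsets:            # easiest set first
        for _ in range(episodes_per_stage):
            state = env.reset_to(random.choice(start_states))
            for _ in range(max_steps):
                action = agent.select_action(state)
                next_state, reward = env.step(action)
                agent.update(state, action, reward, next_state)
                state = next_state
                if reward > 0:                      # ball went into the goal
                    break
```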
When to shift • S1 is the set of states nearest to the goal, S2 the next nearest, and so on • Shifting from S_(k−1) to S_k occurs when the Q-values learned for the current set have (almost) stopped improving:
Σ_{s∈S_(k−1)} max_a Q_t(s,a) − Σ_{s∈S_(k−1)} max_a Q_(t−Δt)(s,a) < δ
where Δt indicates the time interval (number of steps) over which the change is measured and δ is a small threshold • We suppose that the current state set S_(k−1) can transit only to its neighbors
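A sketch of this shift test, based on the criterion as reconstructed above (sum of the best Q-values over the current set, compared at an interval of Δt steps; the threshold δ is an assumption):

```python
def sum_max_q(Q, state_set, actions):
    """Sum over the current set of the best Q-value available in each state."""
    return sum(max(Q[(s, a)] for a in actions) for s in state_set)

def should_shift(q_sum_now, q_sum_dt_ago, delta=1e-3):
    """Shift from S_(k-1) to S_k once the Q-values of the current set have
    (almost) stopped improving over the last delta-t steps."""
    return (q_sum_now - q_sum_dt_ago) < delta
```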