COMP 650: POMDP’s real life applications. Rahul Kumar Department of Computer Science Rice University April 18, 2013. Long-term user intention prediction for wheelchair navigation using POMDP.
COMP 650: POMDP’s real life applications • Rahul Kumar • Department of Computer Science • Rice University • April 18, 2013
Long-term user intention prediction for wheelchair navigation using POMDP • Reference: Taha et al., "POMDP-based long-term user intention prediction for wheelchair navigation." • *Image taken from http://robotzeitgeist.com/tag/dementia
Outline • Motivation • Introduction • POMDP quick review • Problem specification / formulation • Online assistance • Experimental results • Conclusions
Motivation: Why make wheelchairs smart? • Growing aging population. • Increase in accidents and other calamities. • Diseases that impair motor control.
Reactive Wheelchairs • "Reactive" refers to systems that do not use a representation of the environment. • The most popular approach among intention-recognition wheelchairs. • Rely on local or temporal information collected online. • Used by systems with limited power or processing capacity. • Examples: Rolland-III, NavChair, etc.
POMDP - 1 • A general framework for sequential decision making where states are hidden and action outcomes are stochastic. • Widely used in assistive applications.
POMDP - 2 • S – set of states • A – set of actions • Z – set of observations • T – conditional transition probabilities, S x A x S -> [0,1] • O – conditional observation probabilities, A x S x Z -> [0,1] • R – reward function, A x S -> real number
POMDP agent overview • [Diagram with components: Environment, Observation, State Estimator, Belief State, Policy, Action]
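The state estimator in the overview can be illustrated with a Bayes-filter belief update. This is a minimal sketch, not code from the presentation: the table layout follows the S x A x S and A x S x Z index orders given for the transition and observation probabilities, the name `O` for the observation table is an assumption, and the two-state numbers are made up.

```python
def belief_update(belief, action, observation, T, O):
    """Return b'(s') proportional to O[a][s'][z] * sum_s T[s][a][s'] * b(s)."""
    n = len(belief)
    unnormalized = []
    for s_next in range(n):
        # Prediction step: probability of landing in s_next after the action.
        predicted = sum(T[s][action][s_next] * belief[s] for s in range(n))
        # Correction step: weight by the likelihood of the observation.
        unnormalized.append(O[action][s_next][observation] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]

# Toy model with two states, one action, two observations (made-up numbers).
T = [[[0.9, 0.1]], [[0.2, 0.8]]]   # T[s][a][s']
O = [[[0.7, 0.3], [0.4, 0.6]]]     # O[a][s'][z]
b0 = [0.5, 0.5]
b1 = belief_update(b0, action=0, observation=1, T=T, O=O)
```

The policy then chooses an action from `b1` rather than from a known state, which is what lets the agent act while the true state stays hidden.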
POMDP Generation • For an efficient POMDP system, we need a proper: • State space • Transition model • Observation model
State Space • Spatial States : Wheelchair location = {s1,s2,s3,… } • Destination states : Places of Interest = {d1,d2,..} • Joint representation of both of them = {s1d1,s2d1,…}
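The joint representation can be sketched as a Cartesian product of the two sets. The three locations and two destinations below are placeholders chosen for illustration, not the map used in the paper.

```python
from itertools import product

spatial_states = ["s1", "s2", "s3"]   # placeholder wheelchair locations
destinations = ["d1", "d2"]           # placeholder places of interest

# Iterate destinations in the outer loop so the order matches s1d1, s2d1, ...
joint_states = [s + d for d, s in product(destinations, spatial_states)]
```

The joint space grows as the product of the two set sizes, so adding a destination adds one copy of every spatial state.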
Transition model • The transition model specifies the probability of moving from one state to another when a certain action is executed. • Actions = global navigation commands = {North, South, East, West, Stop} • Observations = joystick movements = {Up, Down, Right, Left, NoInput} • Transition probabilities are calculated directly from the map topology.
Observation model • The observation model is learned from training data collected from a particular user. • In indoor settings, a wheelchair user usually performs repetitive tasks. • For example, a task can be going from the living room to the kitchen.
Reward function • -1 for each action • +100 for an action that leads to the destination.
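One way to read "calculated from the map topology" is sketched below. The three-room corridor, the `success` probability of 0.9, and the rule that a failed or impossible move leaves the wheelchair in place are all assumptions made for illustration. The sketch covers only the spatial part of the state.

```python
ACTIONS = ["North", "South", "East", "West", "Stop"]

# Hypothetical topology: location -> {action: neighbouring location}.
topology = {
    "s1": {"East": "s2"},
    "s2": {"West": "s1", "East": "s3"},
    "s3": {"West": "s2"},
}

def transition_probs(location, action, success=0.9):
    """Return {next_location: probability} for one location-action pair."""
    target = topology[location].get(action)
    if action == "Stop" or target is None:
        # No neighbour in that direction (or Stop): the wheelchair stays put.
        return {location: 1.0}
    # Assumed noise model: the move succeeds with probability `success`.
    return {target: success, location: 1.0 - success}
```

For example, `transition_probs("s1", "North")` keeps all probability on `s1`, because the hypothetical map has no room to the north of it.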
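Learning the observation model from recorded trips could look like the count-based estimate below. This is a sketch under assumptions: the training samples are invented, the add-one smoothing is a common choice rather than something stated in the slides, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

OBSERVATIONS = ["Up", "Down", "Right", "Left", "NoInput"]

def estimate_observation_model(samples, smoothing=1.0):
    """Estimate P(joystick input | state) from (state, input) pairs."""
    counts = defaultdict(Counter)
    for state, z in samples:
        counts[state][z] += 1
    model = {}
    for state, c in counts.items():
        # Smoothing keeps unseen inputs from getting probability zero.
        total = sum(c.values()) + smoothing * len(OBSERVATIONS)
        model[state] = {z: (c[z] + smoothing) / total for z in OBSERVATIONS}
    return model

# Invented training data: joystick inputs recorded in two joint states.
samples = [("s1d1", "Up"), ("s1d1", "Up"), ("s1d1", "Right"), ("s2d1", "Right")]
model = estimate_observation_model(samples)
```

Because indoor tasks repeat, a modest number of recorded trips can already cover the states a given user visits often.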
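The reward scheme can be written as a small function. The slide does not say whether the +100 replaces the -1 step cost or is added to it; the sketch below assumes it replaces it, and the argument names are hypothetical.

```python
STEP_COST = -1
GOAL_REWARD = 100

def reward(next_location, destination_location):
    """Reward for one action, judged by where the action leads."""
    if next_location == destination_location:
        # Assumption: the goal reward replaces the per-action cost.
        return GOAL_REWARD
    return STEP_COST
```

The per-action cost favors short routes, while the large goal reward makes reaching the destination dominate the total.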
Experimental results - 1 • Artificial data was generated based on the user's activity in the environment. • The ZMDP software package was used; it provides several heuristic search algorithms for POMDPs and MDPs. • Starting points were known, but destinations were unknown. • 100% success in predicting the destination.
Conclusion • A POMDP is employed for long-term user intention prediction in wheelchair navigation. • Unlike other papers, the approach does not rely on behavior selection.
Future work • Enhance the capabilities and the intelligence of the system through automated activity monitoring and task extraction.
POMDP Hands * Image taken from http://matanyahorowitz.com/index.php
Overview • Motivation • Approach/Big Picture • Example/Intuition • Model Construction • Results
Motivation • If you know all shapes and positions exactly, you can generate a trajectory that will work. • *Slide taken from Hsiao et al.
Problem at hand • How can the robot determine the configuration of an object it has to manipulate?
Approach/Big Picture • Partition the space: identify and separate regions that have similar properties. • Reduce uncertainty in the configuration by taking actions that act as “funnels”, i.e. that map large sets of initial states to smaller sets of resulting states. • We work with a set of guarded compliant motions; these actions act as funnels.
Example • [Figure: partial policy graph for the robot]
Abstract model construction • Action space: two guarded compliant move commands for each degree of freedom. • Transition probability: estimated by sampling a large number of triplets from given initial states. • Observation probability: contact sensors have some uncertainty in determining contact. • Reward: 15 for reaching the goal, -50 for lifting in the wrong configuration, -1 for each motion, -5 for being in unstable or boundary states.
Experiment • Similar to the previous problem, except that the block is stepped. • High-fidelity simulation: 92% success, average reward -1.59. • Fixed policy: 81% success, average reward -10.632.
Future work • Address problems with shape uncertainty. • Handle interaction with other objects.
References • Hsiao et al., "Grasping POMDPs."
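The sampling-based transition estimate can be sketched as follows, assuming the triplets are (state, action, resulting state). The `toy_simulate` function stands in for a simulator and its dynamics are invented; the state and action names are placeholders, not the abstract states from the paper.

```python
import random
from collections import Counter, defaultdict

def estimate_transitions(simulate, states, actions, n_samples=1000, seed=0):
    """Estimate P(s' | s, a) from sampled (s, a, s') triplets."""
    rng = random.Random(seed)
    counts = defaultdict(Counter)
    for _ in range(n_samples):
        s = rng.choice(states)
        a = rng.choice(actions)
        counts[(s, a)][simulate(s, a, rng)] += 1
    estimate = {}
    for key, c in counts.items():
        total = sum(c.values())
        estimate[key] = {s_next: n / total for s_next, n in c.items()}
    return estimate

def toy_simulate(state, action, rng):
    # Invented dynamics: "hold" never moves; "push" makes contact 80% of the time.
    if action == "hold":
        return state
    return "contact" if rng.random() < 0.8 else state

est = estimate_transitions(toy_simulate, ["free", "contact"], ["hold", "push"])
```

More samples per state-action pair tighten the estimate, at the cost of more simulator runs.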