CMPUT 551 Analyzing abstraction and approximation within MDP/POMDP environment Magdalena Jankowska (M.Sc. - Algorithms) Ilya Levner (M.Sc - AI/ML)
OUTLINE • Project Overview • Analytical results • Maze Domain • Experiments • Results • Conclusions and Future Work
MDP environment: Maze domain • states • actions • transitions between states • immediate rewards • Markov property • optimal value V* of each state
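To make the maze-domain MDP concrete, here is a minimal sketch in Python, assuming a deterministic 4-connected grid with a step cost of -1 and a single absorbing goal; the grid size, reward, and discount below are illustrative choices, not the project's exact settings. Value iteration recovers the optimal value V* of each state.

```python
import numpy as np

# Minimal deterministic maze MDP: 4-connected grid, step reward -1, absorbing goal.
N = 6                                         # grid side length (illustrative)
GOAL = (N - 1, N - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GAMMA = 0.95

def step(state, action):
    """Transition function: move to the neighbouring cell if it is inside the grid, else stay."""
    if state == GOAL:
        return state, 0.0                     # the goal is absorbing
    r, c = state
    nr, nc = r + action[0], c + action[1]
    if 0 <= nr < N and 0 <= nc < N:
        return (nr, nc), -1.0
    return state, -1.0                        # bumped into the boundary: stay, still pay the step cost

def value_iteration(tol=1e-6):
    """Dynamic programming for the optimal value V* of each state."""
    V = np.zeros((N, N))
    while True:
        delta = 0.0
        for r in range(N):
            for c in range(N):
                if (r, c) == GOAL:
                    continue
                best = max(reward + GAMMA * V[nxt]
                           for nxt, reward in (step((r, c), a) for a in ACTIONS))
                delta = max(delta, abs(best - V[r, c]))
                V[r, c] = best
        if delta < tol:
            return V

V_star = value_iteration()                    # V_star[r, c] is the optimal value of cell (r, c)
```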
Project Overview • Analyze the MDP/POMDP domain in the presence of: • State Abstraction • Errors in the state transition function • Errors in the V* function due to • State Abstraction • Machine Learning • Evaluate the effectiveness of a lookahead search policy in the presence of errors.
Questions • When is the problem an MDP? • If it is not an MDP: can we recast the Markov property? • Limited lookahead: does it help?
No state abstraction, imperfect value function V • MDP • V can now be used as a heuristic • Limited lookahead: usually with an admissible heuristic function (sketched below) • Combining lookahead with learning: • Learning Real-Time A* (LRTA*) • Real-Time Dynamic Programming (RTDP)
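The limited-lookahead policy mentioned above can be sketched as follows, reusing `step`, `ACTIONS`, `GOAL`, and `GAMMA` from the maze sketch: expand the deterministic search tree to a fixed ply and back up the (possibly imperfect) value estimate V at the leaves.

```python
def lookahead_value(state, V, depth):
    """Depth-limited backup: max over actions of immediate reward plus discounted successor value."""
    if state == GOAL:
        return 0.0
    if depth == 0:
        return V[state]                       # leaf: fall back on the heuristic value estimate
    return max(reward + GAMMA * lookahead_value(nxt, V, depth - 1)
               for nxt, reward in (step(state, a) for a in ACTIONS))

def lookahead_policy(state, V, depth):
    """Act greedily with respect to the depth-limited backed-up values."""
    def q(action):
        nxt, reward = step(state, action)
        return reward + GAMMA * lookahead_value(nxt, V, depth - 1)
    return max(ACTIONS, key=q)
```

LRTA* and RTDP additionally write the backed-up value back into V[state] after each move, so the estimate improves with experience.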
“Abstracted” value function • We know where we are, but the value function is the same for all states in the abstracted state G
“Abstracted” value function • In a given abstracted state, the value is the average of V* over all the states in that abstracted state • not admissible • lookahead may help the agent get outside the abstraction boundary (a sketch follows below)
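A sketch of how such an abstracted value function can be built from the earlier V_star, assuming the abstraction is a regular k×k tiling of the maze (as in the experiments later) and the abstracted value is the tile average:

```python
def abstract_values(V, k):
    """Replace V* inside every k-by-k tile by that tile's average value."""
    V_abs = np.empty_like(V)
    for r0 in range(0, V.shape[0], k):
        for c0 in range(0, V.shape[1], k):
            tile = V[r0:r0 + k, c0:c0 + k]
            V_abs[r0:r0 + k, c0:c0 + k] = tile.mean()
    return V_abs

V_abstract = abstract_values(V_star, k=2)     # e.g. tiles of size 2x2
```

Passing V_abstract to the lookahead_policy above is one way to reproduce the behaviour discussed on the next slides: inside a tile the values are flat, so a deep enough lookahead is needed to see across the tile boundary.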
Does lookahead always help? [Maze figures: lookahead from the agent's cell toward the abstracted region G, shown at depth 1 and at depth 3]
State abstraction • not Markovian • a special case of a POMDP • transitions between abstracted states and the rewards depend on the history • in some special cases it is Markovian
How to recast the Markov property? • If we know the underlying MDP: update a belief over states, giving a fully observed MDP in belief space • solve the belief MDP • use V* of the underlying states as a heuristic • Real-Time Dynamic Programming in belief space
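A sketch of the belief update behind "fully observed MDP in belief space", assuming the underlying transition model T(s'|s,a) and an observation model P(o|s') are given as arrays; in this project the abstracted state plays the role of the observation, so O[observation] would be an indicator over the concrete states inside that abstracted block.

```python
def belief_update(belief, action, observation, T, O):
    """Bayes filter over underlying states: b'(s') ∝ P(o | s') · Σ_s T(s' | s, a) · b(s)."""
    # belief: length-|S| probability vector; T[action]: |S| x |S| matrix; O[observation]: length-|S| vector.
    predicted = belief @ T[action]            # prediction: push the belief through the dynamics
    corrected = predicted * O[observation]    # correction: weight by the observation likelihood
    return corrected / corrected.sum()        # renormalise to a probability distribution
```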
How to recast the Markov property? • If we do not know the underlying MDP: use the history as part of the state description • How long a history do we need? • In general: the whole history • Special cases: only part of it
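When the underlying MDP is unknown, the history-as-state idea can be sketched as a sliding window over the last k observation/action pairs (k is a design choice; as the slide notes, in general the whole history may be needed):

```python
from collections import deque

class HistoryState:
    """Treat the last k observation/action pairs as the agent's state description."""
    def __init__(self, k):
        self.window = deque(maxlen=k)

    def update(self, observation, action):
        self.window.append((observation, action))
        return tuple(self.window)             # hashable, so it can index a value table
```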
Error in the transition function • Errors here can be crucial • The agent can easily be trapped in loops
Error in the transition function, example: no state abstraction, perfect V* • A corridor of ten cells (numbered 1–10) with the goal G at one end • Two actions: right and left • Real dynamics: each action moves the agent in its own direction with probability 100% • What we think: each action splits its probability 65% / 35% between the two directions (see the sketch below)
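The corridor example can be sketched as below. The real dynamics are deterministic as stated on the slide; the believed split of probability between the two directions is an illustrative assumption (the slide's exact assignment of 65%/35% to each action is not recoverable), chosen here so that the model mismatch actually traps the agent, which is the behaviour the slide warns about.

```python
# Corridor of cells 1..10, goal at cell 10; with unit step cost, V*(s) = -(10 - s).
V = {s: float(-(10 - s)) for s in range(1, 11)}

def real_step(s, a):
    """Real dynamics from the slide: each action moves deterministically (100%)."""
    return min(s + 1, 10) if a == "right" else max(s - 1, 1)

def believed_model(s, a):
    """Erroneous believed dynamics (illustrative probabilities): the agent thinks
    'left' is the action more likely to move it toward the goal."""
    right, left = min(s + 1, 10), max(s - 1, 1)
    if a == "right":
        return [(right, 0.35), (left, 0.65)]
    return [(right, 0.65), (left, 0.35)]

def greedy_action(s, gamma=1.0):
    """One-step lookahead using the believed model and the perfect V*."""
    def q(a):
        return sum(p * (-1.0 + gamma * V[s2]) for s2, p in believed_model(s, a))
    return max(("left", "right"), key=q)

# Acting greedily under the wrong model: the agent keeps choosing 'left',
# actually moves left every time, and ends up stuck against the far wall.
s, trajectory = 5, [5]
for _ in range(12):
    s = real_step(s, greedy_action(s))
    trajectory.append(s)
```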
Experimental Setup • 48x48 cell maze • 3 Experiments • State Abstraction • Machine Learning (ANN) • State Abstraction and Machine Learning • Error Measurements • Relative Score (global policy error) • Distance to goal (sample score error)
State Abstraction Error(s) • Abstraction Tile size varied • k = 1, 2, 3, 4, 6, 8, 12, 24, 48 • Ply Depth 1 – 7 @ 10 games/ply depth
Machine Learning Error • 2 – h – 100 ANN, inputs (x, y), output V*(s) • The approximation error is varied by changing the number of hidden nodes h in the network (1–20)
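A sketch of the kind of value-function approximator described here: a small feed-forward network with (x, y) inputs regressing onto V*. The scikit-learn MLPRegressor below is a stand-in for the original ANN (an assumption, not the project's implementation); as in the experiments, shrinking the hidden layer h increases the approximation error.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_value_net(V_star, h):
    """Train a 2-input, h-hidden-unit regressor on (x, y) -> V*(x, y)."""
    n = V_star.shape[0]
    coords = np.array([(x, y) for x in range(n) for y in range(n)], dtype=float) / n
    targets = V_star.reshape(-1)
    net = MLPRegressor(hidden_layer_sizes=(h,), max_iter=5000, random_state=0)
    net.fit(coords, targets)
    return net                                # net.predict(coords) approximates V*

value_net = fit_value_net(V_star, h=4)        # fewer hidden nodes -> larger value error
```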
Conclusion Most important results: • analysis of lookahead for the “abstracted” value function, especially experimentally • demonstration of the possible adverse effects of errors in the transition function • answers to the questions about the Markov property and an investigation of ways to restore it
Future Work • Improve Policy Error Evaluation Measures • Further analytical work on lookahead