
Learning Modal Continuous Models

Presentation Transcript


  1. Learning Modal Continuous Models Joseph Xu Soar Workshop 2012

  2. Setting: Continuous Environment • Input to the agent is a set of objects with continuous properties • Position, rotation, scaling, ... • Output is fixed-length vector of continuous numbers • Agent runs in lock-step with environment • Fully observable [Figure: agent/environment loop; each object (A, B) carries a continuous property vector (px, py, pz, rx, ry, rz), and the agent sends back a continuous output vector]
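
To make the representation concrete, here is a minimal sketch (the property names, their ordering, and the flattening scheme are assumptions for illustration, not details from the talk) of turning a scene of objects into the fixed-length continuous vectors exchanged with the environment each step:

```python
import numpy as np

# Hypothetical per-object property layout: position (px, py, pz) and rotation (rx, ry, rz).
PROPS = ("px", "py", "pz", "rx", "ry", "rz")

def flatten_scene(scene):
    """Flatten {object name: property dict} into one fixed-length continuous state vector."""
    return np.array([scene[obj][p] for obj in sorted(scene) for p in PROPS])

scene = {
    "A": {"px": 0.2, "py": 1.2, "pz": 0.0, "rx": 0.0, "ry": 0.0, "rz": 0.2},
    "B": {"px": 3.4, "py": 3.9, "pz": 0.0, "rx": 0.0, "ry": 0.0, "rz": 0.0},
}
x = flatten_scene(scene)      # continuous input vector, length 12
u = np.array([-9.0, 5.8])     # continuous output vector chosen by the agent
```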

  3. Levels of Problem Solving • Problem-solving methods and the knowledge each requires, ordered from faster task completion and more general solutions to slower task completion and more specific solutions: • Symbolic Planning: requires a symbolic model • Model-Free Methods (RL): require a symbolic abstraction • Sampling Methods (RRT): require a continuous model • Motor Babbling / Goal Recognition: require no prior knowledge

  4. Continuous Model Learning • Learn a function y = f(x, u) • x: current continuous state vector • u: current output vector • y: state vector in the next time step

  5. Locally Weighted Regression • Given a query consisting of the current state x and motor command u (e.g. left voltage: -0.6, right voltage: 1.2), find its k nearest neighbors among the stored training examples and predict the next state with a weighted linear regression over them [Figure: a query, its k nearest neighbors, and the weighted linear regression prediction]
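
A minimal numpy sketch of the LWR step this slide illustrates, assuming the query is the stacked vector [x; u], Gaussian distance weights, and a bias term (the bandwidth and weighting scheme are assumptions, not specified in the slides):

```python
import numpy as np

def lwr_predict(Q_train, Y_train, query, k=10, bandwidth=1.0):
    """Predict the next state for query = [x; u] from stored training pairs."""
    d = np.linalg.norm(Q_train - query, axis=1)           # Euclidean distance to every example
    nn = np.argsort(d)[:k]                                # k nearest neighbors
    w = np.exp(-(d[nn] / bandwidth) ** 2)                 # Gaussian distance weights
    A = np.hstack([Q_train[nn], np.ones((len(nn), 1))])   # neighbors with a bias column
    sw = np.sqrt(w)[:, None]
    beta, *_ = np.linalg.lstsq(A * sw, Y_train[nn] * sw, rcond=None)  # weighted least squares
    return np.append(query, 1.0) @ beta                   # linear prediction at the query

# toy usage: 200 random (x, u) pairs with a noisy linear next-state function
rng = np.random.default_rng(0)
Q = rng.uniform(-1, 1, size=(200, 3))                     # e.g. 2-D state + 1-D output
Y = Q @ rng.normal(size=(3, 2)) + 0.01 * rng.normal(size=(200, 2))
print(lwr_predict(Q, Y, np.array([0.1, -0.2, 0.3])))
```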

  6. Problems with LWR • Euclidean distance doesn’t capture relational similarity • Averages over neighbors exhibiting different types of interactions [Figure: a query scene and four retrieved neighbors showing different object interactions]

  7. Problems with LWR • Euclidean distance doesn’t capture relational similarity • Averages over neighbors exhibiting different types of interactions [Figure: the prediction obtained by averaging over two dissimilar neighbors]

  8. Modal Models • Object behavior can be categorized into different Modes • Behavior within a single mode is usually simple and smooth (inertia, gravity, etc...) • Behaviors across modes can be discontinuous and complex (collisions, drops) • Modes can often be distinguished by discrete spatial relationships between objects • Learn two-level models composed of: • A classifier that determines the active mode using spatial relationships • A set of linear functions (initial hypothesis), one for each mode [Figure: the scene feeds a mode classifier, which selects among the mode 1, mode 2, and mode 3 models to produce the prediction]
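
A minimal sketch of the two-level model at prediction time; the classifier interface and the per-mode (A, b) parameterization below are assumptions used for illustration:

```python
import numpy as np

class ModalModel:
    """Two-level model: a mode classifier over spatial relations plus one linear function per mode."""

    def __init__(self, classifier, mode_coefs):
        self.classifier = classifier    # maps a scene's binary spatial relations to a mode id
        self.mode_coefs = mode_coefs    # mode id -> (A, b) of a linear model y = A @ [x; u] + b

    def predict(self, x, u, relations):
        mode = self.classifier(relations)          # 1. pick the active mode from the scene's relations
        A, b = self.mode_coefs[mode]
        return A @ np.concatenate([x, u]) + b      # 2. apply that mode's linear model

# hypothetical usage with two modes and a trivial classifier
coefs = {1: (np.eye(3, 4), np.zeros(3)), 2: (np.zeros((3, 4)), np.zeros(3))}
model = ModalModel(lambda rel: 1 if rel["touch(A,B)"] else 2, coefs)
print(model.predict(np.zeros(2), np.zeros(2), {"touch(A,B)": 1}))
```

Because each mode's model is linear, prediction inside a mode stays cheap and smooth; all of the discontinuity is pushed into the classifier.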

  9. Unsupervised Learning of Modes From Data • Training data: continuous feature vectors collected from the environment over time • Expectation Maximization clusters the examples into modes, so that each learned mode matches a distinct behavior in the data [Figure: a time series of training data generated by two modes, and the two learned modes recovered by EM]

  10. Expectation Maximization • Expectation: assuming your current model parameters are correct, what is the likelihood that model m generated data point i? • Maximization: assuming each data point was generated by the most probable model, modify each model’s parameters to maximize the likelihood of generating the data • Iterate until convergence to a local maximum
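
A rough sketch of EM for a mixture of linear models in this spirit (soft responsibilities, weighted least squares in the M-step, a fixed isotropic noise level in the E-step; all of these modeling choices are assumptions rather than details from the talk):

```python
import numpy as np

def em_linear_modes(Q, Y, n_modes=2, n_iter=50, noise=0.1, seed=0):
    """Cluster training pairs (q, y) into modes, each with its own linear model y ~ [q, 1] @ beta_m."""
    rng = np.random.default_rng(seed)
    n = len(Q)
    Qb = np.hstack([Q, np.ones((n, 1))])                # inputs with a bias column
    resp = rng.dirichlet(np.ones(n_modes), size=n)      # randomly initialized responsibilities
    for _ in range(n_iter):
        betas = []
        for m in range(n_modes):                        # M-step: weighted least squares per mode
            sw = np.sqrt(resp[:, m])[:, None]
            beta, *_ = np.linalg.lstsq(Qb * sw, Y * sw, rcond=None)
            betas.append(beta)
        logp = np.empty((n, n_modes))
        for m in range(n_modes):                        # E-step: responsibility from residual likelihood
            resid = Y - Qb @ betas[m]
            logp[:, m] = -np.sum(resid ** 2, axis=1) / (2 * noise ** 2)
        logp -= logp.max(axis=1, keepdims=True)         # stabilize before exponentiating
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
    return betas, resp.argmax(axis=1)                   # per-mode linear maps and mode labels

# toy usage: data generated by two different linear modes
rng = np.random.default_rng(1)
Q = rng.uniform(-1, 1, size=(300, 2))
Y = np.where(Q[:, :1] > 0, Q @ [[2.0], [0.0]], Q @ [[0.0], [-3.0]])
betas, labels = em_linear_modes(Q, Y, n_modes=2)
```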

  11. Learning Classifier • Compute discrete spatial relations for each training scene, e.g. left-of(A,B) = 1, right-of(A,B) = 0, on-top(A,B) = 0, touch(A,B) = 0 • Label each example with the mode EM assigned to it • This yields a supervised training set: binary spatial-relation attributes with the learned mode as the class [Figure: training data over time, each example tagged with an attribute bit-vector and its class, learned mode 1 or learned mode 2]

  12. Learning Classifier • Learn a decision tree over the spatial-relation attributes (e.g. touch(A, B), left-of(A, B)) that predicts the mode class • Use the linear model of that mode for items classified into the same mode [Figure: decision tree branching on touch(A, B) and left-of(A, B) down to mode 1 and mode 2 leaves]
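
A small sketch of this step using scikit-learn's DecisionTreeClassifier as a stand-in for the classifier learner (the talk does not name the tree learner; the attribute rows and labels below are made up for illustration):

```python
from sklearn.tree import DecisionTreeClassifier

# Each row: binary spatial relations for one training scene, in this (hypothetical) order:
# left-of(A,B), right-of(A,B), on-top(A,B), touch(A,B)
X_rel = [
    [1, 0, 0, 0],   # scenes that EM assigned to mode 1
    [1, 0, 0, 0],
    [0, 1, 0, 1],   # scenes that EM assigned to mode 2
    [0, 1, 1, 1],
]
modes = [1, 1, 2, 2]    # class labels produced by the unsupervised step

clf = DecisionTreeClassifier().fit(X_rel, modes)
print(clf.predict([[0, 1, 0, 1]]))   # predicted active mode for a new scene
```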

  13. Prediction Accuracy Experiment • 2 Block Environment • Agent has two outputs (dx, dy) which control the x and y offsets of the controlled block at every time step • The pushed block can’t be moved except by pushing it with the controlled block • Blocks are always axis-aligned, there’s no momentum • Training • Instantiate Soar agent in a variety of spatial configurations • Run 10 time steps, each step is a training example • Testing • Instantiate Soar agent in some configuration • Check accuracy of prediction for next time step
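
A toy sketch of such an environment under stated assumptions (unit axis-aligned blocks addressed by their lower-left corners, overlap resolved along the axis of least penetration; none of these details are given in the talk):

```python
import numpy as np

class TwoBlockEnv:
    """Toy 2-block pushing world: axis-aligned unit blocks, no momentum."""
    SIZE = 1.0

    def __init__(self, controlled, pushed):
        self.c = np.array(controlled, float)   # lower-left corner of the controlled block
        self.p = np.array(pushed, float)       # lower-left corner of the pushed block

    def step(self, dx, dy):
        self.c += (dx, dy)                     # the agent directly offsets the controlled block
        d = self.c - self.p
        if np.all(np.abs(d) < self.SIZE):      # blocks overlap after the move
            pen = self.SIZE - np.abs(d)
            ax = int(np.argmin(pen))           # push along the axis of least penetration
            self.p[ax] -= np.sign(d[ax]) * pen[ax]
        return np.concatenate([self.c, self.p])  # observed continuous state

env = TwoBlockEnv(controlled=(0.0, 0.0), pushed=(1.5, 0.0))
print(env.step(0.7, 0.0))   # the controlled block pushes the other block to the right
```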

  14. Prediction Accuracy – Pushed Block

  15. Classification Performance

  16. Prediction Performance Without Classification Errors

  17. Levels of Problem Solving • Problem-solving methods and the knowledge each requires, ordered from faster task completion and more general solutions to slower task completion and more specific solutions: • Symbolic Planning: requires a symbolic model • Model-Free Methods (RL): require a symbolic abstraction • Sampling Methods (RRT): require a continuous model • Motor Babbling / Goal Recognition: require no prior knowledge

  18. Symbolic Abstraction • Lump continuous states sharing symbolic properties into a single symbolic state • Should be Predictable • Planning requires an accurate model (ex. STRIPS operators) • Tends to require more states, more symbolic properties • Should be General • Fast planning and transferable solutions • Tends to require fewer states, fewer symbolic properties [Figure: many continuous configurations of C1 and C2 collapse into two symbolic states, S1: intersect(C1, C2) and S2: ~intersect(C1, C2)]
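
A minimal sketch of what such an abstraction looks like as code: an abstract state is just the set of spatial predicates true of the continuous scene (the predicate test and the scene encoding below are assumptions for illustration):

```python
def intersects(a, b, size=1.0):
    # axis-aligned unit squares given by their lower-left corners (hypothetical convention)
    return all(abs(a[i] - b[i]) < size for i in range(2))

def abstract_state(scene, predicates):
    """Lump the continuous scene into the set of predicates that hold in it."""
    return frozenset(name for name, test in predicates if test(scene))

predicates = [("intersect(C1,C2)", lambda s: intersects(s["C1"], s["C2"]))]
scene = {"C1": (0.0, 0.0), "C2": (0.4, 0.2)}
print(abstract_state(scene, predicates))   # frozenset({'intersect(C1,C2)'})
```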

  19. Symbolic Abstraction • Hypothesis: a contiguous region of continuous space that shares a single behavioral mode makes a good abstract state • Planning within modes is simple because of linear behavior • Combinatorial search occurs at the symbolic level • The spatial predicates used in the continuous model’s decision tree are a reasonable approximation

  20. Abstraction Experiment • 3 blocks, goal is to push c2 to t • Demonstrate a solution trace to the agent • Agent stores the sequence of abstract states in the solution in epmem • Agent tries to follow the plan in an analogous task • Abstraction should include predicates about c1, c2, and t, and avoid predicates about d1, d2, d3 [Figure: the demonstration task and an analogous test task with blocks c1, c2, distractors d1, d2, d3, and target t]

  21. Generalization Performance 80 Tasks Total (16 average)

  22. Conclusions • For continuous environments with interacting objects, modal models are more general and accurate than a uniform model • The relationships that distinguish between modes serve as a useful symbolic abstraction over the continuous state • All this work takes Soar toward being able to autonomously learn and improve behavior in continuous environments

  23. Evaluation • Nuggets: • Modal model learning is more accurate and general than uniform models • Abstraction learning results are promising, but preliminary • Coal: • Scaling issues: linear regression is exponential in the number of objects • Linear modes are insufficient for more complex physics such as bouncing -> catastrophic failure
