
Connecting Learning and Logic




  1. Connecting Learning and Logic. Eyal Amir, U. of Illinois, Urbana-Champaign. Joint work with: Dafna Shahaf, Allen Chang. (Cambridge presentation, May 2006)

  2. Problem: Learn Actions’ Effects
  • Given: a sequence of observations over time
    • Example: Action a was executed
    • Example: State feature f has value T
  • Want: an estimate of the actions’ effect model
    • Example: a is executable if the state satisfies some property
    • Example: under condition _, a has effect _

  3. Example: Light Switch

  Time  Action  Observe (after action)
                Posn.  Bulb  Switch
  0     —       E      —     ~up
  1     go-W    ~E     ~on   —
  2     sw-up   ~E     ~on   —      (FAIL)
  3     go-E    E      —     ~up
  4     sw-up   E      —     up
  5     go-W    ~E     on    —
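The domain in slide 3's table can be sketched as a small simulator. This is a hedged reconstruction, not the paper's code: the `step` function and the exact failure semantics of sw-up (fails when the agent is not at the east wall, leaving the state unchanged) are read off the table and are assumptions.

```python
# A minimal sketch of the light-switch domain from the table above.
# State features: E (agent at east wall), up (switch up), on (bulb on).
# The action semantics are inferred from the observation sequence.

def step(state, action):
    """Apply an action to a state dict; return (new_state, failed)."""
    s = dict(state)
    if action == "go-W":
        s["E"] = False
    elif action == "go-E":
        s["E"] = True
    elif action == "sw-up":
        if not s["E"]:            # the switch is at the east wall
            return s, True        # action fails, state unchanged
        s["up"] = True
        s["on"] = True            # flipping the switch turns the bulb on
    return s, False

s0 = {"E": True, "up": False, "on": False}
s1, fail1 = step(s0, "go-W")
s2, fail2 = step(s1, "sw-up")     # fails: agent is not at the switch
s3, _ = step(s2, "go-E")
s4, _ = step(s3, "sw-up")
print(fail2, s4["on"])            # sw-up fails at west, succeeds at east
```

Replaying the table's action sequence this way reproduces the FAIL at time 2 and the bulb turning on after time 4.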

  4. Example: Light Switch
  (Figure: two west/east room diagrams; State 1 satisfies ~up ∧ ~on ∧ E, State 2 satisfies up ∧ on ∧ E.)
  • Flipping the switch changes the world state
  • We do not observe the state fully

  5. Motivation: Exploration Agents
  • Exploring partially observable domains
    • Interfaces to new software
    • Game-playing/companion agents
    • Robots exploring buildings, cities, planets
    • Agents acting in the WWW
  • Difficulties:
    • No knowledge of actions’ effects a priori
    • Many features
    • Partially observable domain

  6. Rest of This Talk
  • Actions in partially observed domains
  • Efficient learning algorithms
  • Related Work & Conclusions
  • [Theory behind Algorithms]

  7. Learning Transition Models
  (Figure: knowledge states k1..k4 and world states s1..s4, linked over time by actions a1..a4 and the transition relation.)
  • Learning: update knowledge of the transition relation and of the state of the world

  8. Action Model: ⟨State, Transition⟩ Set
  (Figure: candidate ⟨state, transition⟩ pairs combining states 1, 2, 3 with transition relations T1+, T2+, T3+.)
  • Problem: n world features → 2^(2^n) transitions
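The blowup on slide 8 can be made concrete with two one-line counting functions. The function names are mine; the count 2^(2^n) is the slide's.

```python
# With n boolean features there are 2**n world states, and (per the
# slide's count) 2**(2**n) candidate transition relations to distinguish.

def num_states(n):
    return 2 ** n

def num_transition_relations(n):
    return 2 ** (2 ** n)

for n in (1, 2, 3, 4):
    print(n, num_states(n), num_transition_relations(n))
```

Already at n = 4 there are 2^16 = 65536 candidates, so explicitly enumerating ⟨state, transition⟩ pairs is hopeless for realistic n; this motivates the logical encoding on the next slides.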

  9. Rest of This Talk
  • Actions in partially observed domains
  • Efficient algorithms
    • Updating a Directed Acyclic Graph (DAG)
    • Factored update (flat formula repn.)
  • Related Work & Conclusions
  • [Theory behind Algorithms]

  10. Compact Encoding (Sometimes)
  • Transition belief state = a logical formula (transition relation and state)
  • Observation = logical state formulae

  11. Compact Encoding (Sometimes)
  • Transition belief state = a logical formula (transition relation and state)
  • Observation = logical state formulae
  • Actions = propositional symbols assert effect rules
    • “sw-up causes on ∧ up if E”
    • “go-W keeps up” (= “go-W causes up if up” …)
    • Prop. symbols: go-W≈up, sw-up→on|E, sw-up→up|E
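Slide 11's encoding turns each candidate effect rule into one propositional symbol. A sketch of that naming scheme, with the `Rule` container and the `->`/`|` symbol syntax as my assumptions (the rules themselves are the slide's):

```python
from collections import namedtuple

# Each candidate effect rule "action causes effect if condition"
# becomes a single propositional symbol.
Rule = namedtuple("Rule", ["action", "effect", "condition"])

def symbol(rule):
    """A readable propositional-symbol name for a candidate rule."""
    return f"{rule.action}->{rule.effect}|{rule.condition}"

r1 = Rule("sw-up", "on", "E")    # "sw-up causes on if E"
r2 = Rule("go-W", "up", "up")    # "go-W keeps up" = "go-W causes up if up"
print(symbol(r1), symbol(r2))
```

Learning then amounts to constraining truth values of these symbols, so the number of variables stays polynomial in the number of features rather than doubly exponential.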

  12. Updating the Status of “locked” (Time 0)
  • expl(0) = init_locked
  • tr1: PressB causes ¬locked if locked
  • tr2: PressB causes locked if ¬locked

  13. Updating the Status of “locked” (Time t)
  • expl(t) is built on top of expl(0) = init_locked
  • tr1: PressB causes ¬locked if locked
  • tr2: PressB causes locked if ¬locked

  14. Updating the Status of “locked” (Time t+1)
  • expl(t+1) = (“locked” holds because PressB caused it) ∨ (“locked” holds because PressB did not change it), built over expl(t)
  • tr1: PressB causes ¬locked if locked
  • tr2: PressB causes locked if ¬locked

  15. Algorithm: Update of a DAG
  • Given: action a, observation o, transition-belief formula φt
  1. For each fluent f:
    • kb := kb ∧ (logical formula “a is executable”)
    • expl'f := logical formula for the possible explanations of f’s value after action a
    • replace every fluent g in expl'f with a pointer to explg
  2. Update explf := expl'f
  3. φt+1 is the result of step 2 together with o
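The key trick in slide 15 is that each new explanation points at last step's explanation nodes instead of copying them, so one update adds a constant amount of structure per fluent. A hedged sketch of that idea; the `Node` shape and the toy explanation formula (caused, or held-and-kept) are illustrative assumptions, not the paper's exact encoding.

```python
# Explanations form a DAG: new nodes reference old ones by pointer.
class Node:
    def __init__(self, op, children):
        self.op = op              # "or" / "and" / "leaf"
        self.children = children

def update_fluent(f, action, expl):
    """New explanation for f: action caused f, or f held and was kept."""
    caused = Node("leaf", [f"{action}->{f}"])
    kept = Node("and", [Node("leaf", [f"{action}~{f}"]), expl[f]])
    return Node("or", [caused, kept])   # shares expl[f], no copying

expl = {"locked": Node("leaf", ["init_locked"])}
for _ in range(3):                      # three PressB steps
    expl["locked"] = update_fluent("locked", "PressB", expl)

def size(node, seen=None):
    """Count distinct nodes (each shared node counted once)."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(size(c, seen) for c in node.children
                   if isinstance(c, Node))

print(size(expl["locked"]))   # a constant number of new nodes per step
```

Each step adds four nodes regardless of how large the existing explanation is, which is where the constant-time update claim on the next slide comes from.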

  16. Fast Update: DAG Action Model
  • DAG-update algorithm takes constant time (using a hash table) to update the formula
  • Algorithm is exact
  • Resulting DAG has size O(Tnk + |φ0|)
    • T steps, n features, k features in action preconditions
  • Still only n features/variables
  • Use φt with a DAG-DPLL SAT solver

  17. Experiments: DAG Update

  18. Experiments: DAG Update

  19. Experiments: DAG Queries

  20. Rest of This Talk
  • Actions in partially observed domains
  • Efficient algorithms
    • Updating a Directed Acyclic Graph (DAG)
    • Factored update (flat formula repn.)
  • Related Work & Conclusions
  • [Theory behind Algorithms]

  21. Distribution for Some Actions
  • Project[a](φ ∨ ψ) ≡ Project[a](φ) ∨ Project[a](ψ)
  • Project[a](φ ∧ ψ) ≡ Project[a](φ) ∧ Project[a](ψ)
  • Project[a](¬φ) ≡ ¬Project[a](φ) ∧ Project[a](TRUE)
  • Compute the update for the literals in the formula separately, and combine the results
  • Holds for: known success/failure; 1:1 actions

  22. Actions That Map States 1:1
  • Reason for distribution over ∧: Project[a](φ ∧ ψ) ≡ Project[a](φ) ∧ Project[a](ψ)
  (Figure: 1:1 vs. non-1:1 mappings between states.)

  23. Algorithm: Factored Learning
  • Given: action a, observation o, transition-belief formula φt
  1. Precompute the update for every literal
  2. Decompose φt recursively, update every literal separately, and combine the results
  3. Conjoin the result of step 2 with o, producing φt+1
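Slide 23's factored update can be sketched as a recursion that pushes a precomputed per-literal table through the formula's connectives, which is valid in the 1:1 / known-success cases slide 21 identifies. The tuple-based formula representation and the toy literal table are assumptions for illustration.

```python
# Factored update: project each literal via a lookup table, and let the
# and/or structure of the formula distribute over the projection.

def project(formula, literal_update):
    """Apply a per-literal projection through and/or structure."""
    op = formula[0]
    if op in ("and", "or"):
        return (op,) + tuple(project(sub, literal_update)
                             for sub in formula[1:])
    return literal_update[formula]          # a literal: table lookup

# Toy table for an action that keeps E and ~up and makes "on" true:
table = {("lit", "E"): ("lit", "E"),
         ("lit", "~up"): ("lit", "~up"),
         ("lit", "on"): ("lit", "true")}

phi = ("and", ("lit", "E"), ("lit", "~up"))
print(project(phi, table))
```

The work is one table lookup per literal occurrence, matching the O(|φt|) update time claimed on the next slide.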

  24. Fast Update of Action Model
  • Factored Learning takes time O(|φt|) to update the formula
  • Algorithm is exact when
    • we know that actions are 1:1 mappings between states, or
    • actions’ effects are always the same
  • Otherwise, the result is approximate: it includes the exact action model, but also others
  • Resulting representation is flat (CNF)

  25. Compact Flat Representation: How?
  • Keep some property of φ invariant, e.g.,
    • k-CNF (CNF with ≤k literals per clause)
    • bounded number of clauses
  • Factored Learning gives a compact representation if
    • we know whether the action succeeded, or
    • action failure leaves the affected propositions in a specified nondeterministic state, or
    • we approximate: discard large clauses (allows more states)

  26. Compact Representation in CNF
  • Action affects and depends on ≤k features → |φt+1| ≤ |φt|·n^(k(k+1))
  • Actions always have the same effect → |φt+1| ≤ O(t·n)
    • If also every feature is observed every ≤k steps → |φt+1| ≤ O(n^(k+1))
    • If (instead) the number of actions is ≤k → |φt+1| ≤ O(n·2^(k log k))

  27. Experiments: Factored Learning

  28. Summary
  • Learning of effects and preconditions of actions in partially observable domains
  • Shown in this talk:
    • Exact DAG update for any action
    • Exact CNF update, if actions are 1:1 or without conditional effects
    • The model can be updated efficiently without increasing the number of variables in the belief state
    • Compact representation
  • Applications: adventure games, virtual worlds, robots

  29. Innovation in This Research
  • First scalable learning algorithm for partially observable dynamic domains
  • Algorithm (DAG):
    • always exact and optimal
    • constant update time
  • Algorithm (Factored):
    • exact for actions that always have the same effect
    • polynomial update time
  • Can solve problems with n > 1000 domain features (> 2^1000 states)

  30. Current Approaches and Work
  • Reinforcement Learning & HMMs
    • [Chrisman ’92], [McCallum ’95], [Boyen & Koller ’98], [Murphy et al. ’00], [Kearns, Mansour, Ng ’00]
    • Maintain a probability distribution over the current state
    • Problem: exact solutions are intractable for domains of high (>100) dimensionality
    • Problem: approximate solutions have unbounded errors, or make strong mixing assumptions
  • Learning AI-planning operators
    • [Wang ’95], [Benson ’95], [Pasula et al. ’04], …
    • Problem: they assume a fully observable domain

  31. Open Problems
  • Efficient inference with the learned formula
  • Compact, efficient stochastic learning
  • Average case of formula size?
  • Dynamic observation models; filtering in expanding worlds
  • Software: http://www.cs.uiuc.edu/~eyal

  32. Acknowledgements
  • Dafna Shahaf
  • Megan Nance
  • Brian Hlubocky
  • Allen Chang
  • … and the rest of my incredible group of students

  33. THE END

  34. Talk Outline
  • Actions in partially observed domains
  • Representation and update of models
  • Efficient algorithms
  • Related Work & Conclusions

  35. Compact Encoding (Sometimes)
  • Transition belief state = a logical formula (transition relation and state)
  • Observation = logical state formulae

  36. Compact Encoding (Sometimes)
  • Transition belief state = a logical formula (transition relation and state)
  • Observation = logical state formulae
  • Actions = propositional symbols assert effect rules
    • “sw-up causes on ∧ up if E”
    • “go-W keeps up” (= “go-W causes up if up” …)
    • Prop. symbols: go-W≈up, sw-up→on|E, sw-up→up|E

  37. Example: Light Switch
  • Initial belief state (time 0) = set of pairs: { ⟨E,~on,~up⟩, ⟨E,on,~up⟩ } × all transition relations. Space = O(2^(2^n))
  • New encoding: E ∧ ~up. Space = 2
  • Question: how to update the new representation?

  38. Updating Action Model
  • Transition belief state represented by φ
  • Action-Definition(a)_{t,t+1} ≡ ⋀_f [ (a_t ∧ (a→f ∨ (a≈f ∧ f_t)) → f_{t+1}) ∧ (a_t ∧ f_{t+1} → (a→f ∨ (a≈f ∧ f_t))) ]  (effect axioms + explanation closure)
  • Update: Project[a](φ_t) = logical results at time t+1 of φ_t ∧ Action-Definition(a)_{t,t+1}

  39. Example Update: Light Switch
  • Transition belief state: φ_t = E_t ∧ ~up_t
  • Project[sw-on](φ_t) = (E_{t+1} ↔ sw-on→E ∨ (E_t ∧ sw-on≈E)) ∧ (~up_{t+1} ↔ sw-on→~up ∨ (~up_t ∧ sw-on≈~up)) ∧ …
  • Update: Project[a](φ_t) = logical results at time t+1 of φ_t ∧ Action-Definition(a)_{t,t+1}

  40. Updating Action Model
  • Transition belief state represented by φ
  • φ_{t+1} = Update[o](Project[a](φ_t))
  • Actions: Project[a](φ_t) = logical results at time t+1 of φ_t ∧ Action-Definition(a)_{t,t+1}
  • Observations: Update[o](φ) = φ ∧ o
  • Theorem: formula filtering is equivalent to the ⟨transition, state⟩-set semantics
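The filtering equation on slide 40 can be written as a generic loop. Project is passed in as a function because its definition depends on the action model; conjoining the observation is purely syntactic. The tuple formula encoding and the identity stand-in for Project are assumptions.

```python
# phi_{t+1} = Update[o](Project[a](phi_t)), as a loop over the sequence.

def update_with_obs(phi, obs):
    """Update[o](phi) = phi AND o."""
    return ("and", phi, obs)

def filter_sequence(phi0, steps, project):
    """steps: iterable of (action, observation) pairs."""
    phi = phi0
    for action, obs in steps:
        phi = update_with_obs(project(action, phi), obs)
    return phi

# Identity projection as a stand-in (the real Project rewrites phi
# using Action-Definition(a)):
result = filter_sequence("E & ~up",
                         [("go-W", "~E"), ("sw-up", "~on")],
                         lambda a, phi: phi)
print(result)
```

The theorem on the slide says this syntactic loop tracks exactly the same information as updating the explicit ⟨transition, state⟩ pair set.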

  41. Larger Picture: An Exploration Agent
  (Figure: architecture linking a Decision Making Module, Knowledge Base, Interface Module, World Model, Learning Module, Filtering Module, and commonsense extraction.)

  42. Example: Light Switch
  • Initial belief state (time 0) = set of pairs: { ⟨E,~on,~up⟩, ⟨E,on,~up⟩ } × all transition relations
  • Apply action a = go-W
  • Resulting belief state (after action):
    • { ⟨E,~on,~up⟩ } × { transitions that map to the same state }
    • { ⟨E,on,~up⟩ } × { transitions that map to the same state }
    • { ⟨~E,~on,~up⟩ } × { transitions that set the position to ~E }
    • …

  43. Example: Light Switch
  • Resulting belief state (after action):
    • { ⟨E,~on,~up⟩ } × { transitions that map to the same state }
    • { ⟨E,on,~up⟩ } × { transitions that map to the same state }
    • { ⟨~E,~on,~up⟩ } × { transitions that set the position to ~E }
    • …
  • Observe: ~E, ~on
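The pair-set semantics of slides 42-43 can be sketched directly: a transition belief state is a set of ⟨state, transition⟩ pairs, an action advances each pair's state, and an observation discards inconsistent pairs. The tuple state encoding and the two toy candidate transitions are illustrative assumptions.

```python
# States are (E, on, up) tuples; two toy candidate transition functions
# stand in for "all transition relations" on the slide.

def t_identity(s):
    return s                      # maps every state to itself

def t_go_west(s):
    E, on, up = s
    return (False, on, up)        # sets the position to ~E

belief = {(s, t)
          for s in [(True, False, False), (True, True, False)]
          for t in (t_identity, t_go_west)}

# Apply the action go-W: advance each pair's state via its transition.
after = {(t(s), t) for (s, t) in belief}

# Observe ~E, ~on: keep only the consistent pairs.
filtered = {(s, t) for (s, t) in after if not s[0] and not s[1]}
print(len(belief), len(after), len(filtered))
```

Only the pair whose transition actually moved the agent west survives the observation, which is exactly how observations prune candidate action models in this semantics.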

  44. Experiments w/DAG-Update

  45. Some Learned Rules
  • Pickup(b1) causes Holding(b1)
  • Stack(b3,b5) causes On(b3,b5)
  • Pickup() does not cause Arm-Empty
  • Move(room1,room4) causes At(book5,room4) if In-Briefcase(book5)
  • Move(room1,room4) does not cause At(book5,room4) if ¬In-Briefcase(book5)

  46. Approximate Learning
  • Always: the result of Factored-Learning(φt) includes the exact action model
  • The same compactness results apply
  • Approximation decreases size: discarding clauses longer than k (allows more action models) → |φt| = O(n^k)
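The approximation step on slide 46 is a one-line filter over the CNF. A sketch, with the list-of-frozensets CNF encoding as an assumption:

```python
# Keep the CNF k-bounded by discarding clauses with more than k literals.
# Dropping clauses weakens the formula, so the result allows MORE action
# models than the exact one, never fewer.

def discard_long_clauses(cnf, k):
    return [clause for clause in cnf if len(clause) <= k]

cnf = [frozenset({"a"}),
       frozenset({"a", "b"}),
       frozenset({"a", "b", "c"})]
print(discard_long_clauses(cnf, 2))
```

Since at most O(n^k) distinct clauses of length ≤k exist over n variables, this bounds the formula size as the slide states.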

  47. More in the Paper
  • An algorithm that uses deduction to update the model representation exactly, in all cases
  • Arbitrary preconditions and conditional effects
  • Formal justification of the algorithms, and complexity results

  48. Experiments

  49. DAG-SLAF: The Algorithm
  • Input: a formula φ, an action-observation sequence ⟨ai,oi⟩, i = 1..t
  • Initialize: for each fluent f, expl_f := init_f; kb := φ, where each f is replaced by init_f
  • Process sequence: for i = 1..t do Update-Belief(ai, oi)
  • Return kb ∧ base ∧ ⋀_f (f ↔ expl_f)

  50. Current Game + Translation
  • LambdaMOO
    • MUD code base
    • Uses a database to store the game world
    • Emphasis on player-world interaction
    • Powerful in-game programming language
  • The game sends agents a logical description of the world
