Connecting Learning and Logic
Eyal Amir, U. of Illinois, Urbana-Champaign
Joint work with: Dafna Shahaf, Allen Chang
Cambridge Present., May 2006
Problem: Learn Actions’ Effects
• Given: a sequence of observations over time
  • Example: Action a was executed
  • Example: State feature f has value T
• Want: an estimate of the actions’ effect model
  • Example: a is executable if the state satisfies some property
  • Example: under condition _, a has effect _
Example: Light Switch
Time  Action  Observe (after action)
              Posn.  Bulb  Switch
0             E            ~up
1     go-W    ~E     ~on
2     sw-up   ~E     ~on           FAIL
3     go-E    E            ~up
4     sw-up   E            up
5     go-W    ~E     on
Example: Light Switch
State 1: ~up ∧ ~on ∧ E
State 2: up ∧ on ∧ E
• Flipping the switch changes world state
• We do not observe the state fully
[Figure: west and east positions, with switch (~up / up) and bulb (~on / on), shown for State 1 and State 2]
Motivation: Exploration Agents
• Exploring partially observable domains
  • Interfaces to new software
  • Game-playing/companion agents
  • Robots exploring buildings, cities, planets
  • Agents acting in the WWW
• Difficulties:
  • No a priori knowledge of actions’ effects
  • Many features
  • Partially observable domain
Rest of This Talk
• Actions in partially observed domains
• Efficient learning algorithms
• Related Work & Conclusions
• [Theory behind Algorithms]
Learning Transition Models
• Learning: Update knowledge of the transition relation and state of the world
[Figure: timeline of knowledge states k1–k4, actions a1–a4, and world states s1–s4, linked by the transition relation and transition knowledge]
Action Model: <State,Transition> Set
[Figure: sets of <state, transition> pairs over states 1–3 and transitions T1+, T2+, T3+]
Problem: n world features → 2^(2^n) transitions
Rest of This Talk
• Actions in partially observed domains
• Efficient algorithms
  • Updating a Directed Acyclic Graph (DAG)
  • Factored update (flat formula repn.)
• Related Work & Conclusions
• [Theory behind Algorithms]
Compact Encoding (Sometimes)
• Transition Belief State = a logical formula (transition relation and state)
• Observation = logical state formulae
• Actions = propositional symbols that assert effect rules
  • “sw-up causes on ∧ up if E”
  • “go-W keeps up” (= “go-W causes up if up” …)
  • Prop symbols: go-W^{≈up}, sw-up^{on}_{E}, sw-up^{up}_{E}
Updating the Status of “Locked”: Time 0
• expl(0): init_locked
• tr1 = “PressB causes ¬locked if locked”
• tr2 = “PressB causes locked if ¬locked”
Updating the Status of “Locked”: Time t
[Figure: DAG with node expl(t) built on top of expl(0): init_locked, with the transition propositions tr1 and tr2 as leaves]
• tr1 = “PressB causes ¬locked if locked”
• tr2 = “PressB causes locked if ¬locked”
Updating the Status of “Locked”: Time t+1
• expl(t+1) = (expl(t) ∧ ¬tr1) ∨ (¬expl(t) ∧ tr2)
  • expl(t) ∧ ¬tr1: “locked” holds because PressB did not change it
  • ¬expl(t) ∧ tr2: “locked” holds because PressB caused it
• expl(0): init_locked
• tr1 = “PressB causes ¬locked if locked”
• tr2 = “PressB causes locked if ¬locked”
Algorithm: Update of a DAG
• Given: action a, observation o, transition-belief formula φ_t
1. For each fluent f:
  • kb := kb ∧ logical formula “a is executable”
  • expl'_f := logical formula for the possible explanations for f’s value after action a
  • replace every fluent g in expl'_f with a pointer to expl_g
2. Update expl_f := expl'_f
3. φ_{t+1} is the result of step 2 together with o
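The update above can be illustrated with a minimal Python sketch. It is not the deck's implementation, and it makes simplifying assumptions: effects are unconditional (each action either causes f, causes ~f, or leaves f alone), the “a is executable” formula is replaced by a consistency constraint, and all names (`Var`, `dag_update`, `consistent_models`, the proposition strings) are hypothetical. Formulas are nested tuples whose sub-formulas are shared by reference, so each step adds a constant number of nodes per fluent.

```python
from itertools import product

# Formula nodes are tuples; reusing an old node inside a new one shares it.
def Var(name): return ("var", name)
def Not(x): return ("not", x)
def And(*xs): return ("and",) + xs
def Or(*xs): return ("or",) + xs

def variables(node, found=None):
    """Collect the variable names that occur in a formula."""
    found = set() if found is None else found
    if node[0] == "var":
        found.add(node[1])
    else:
        for child in node[1:]:
            variables(child, found)
    return found

def evaluate(node, assignment):
    """Evaluate a formula under a truth assignment to its variables."""
    tag = node[0]
    if tag == "var":
        return assignment[node[1]]
    if tag == "not":
        return not evaluate(node[1], assignment)
    results = (evaluate(child, assignment) for child in node[1:])
    return all(results) if tag == "and" else any(results)

def dag_update(expl, kb, action, observation):
    """One step: build expl'_f for every fluent, then conjoin the observation."""
    new_expl = {}
    for f, old in expl.items():
        causes_f = Var(f"{action} causes {f}")
        causes_not_f = Var(f"{action} causes ~{f}")
        # Assumption: an action model never claims both effects at once.
        kb = kb + [Not(And(causes_f, causes_not_f))]
        # f holds afterwards iff the action caused it, or it held before
        # and the action did not cause ~f. `old` is a pointer, not a copy.
        new_expl[f] = Or(causes_f, And(old, Not(causes_not_f)))
    for f, value in observation.items():
        kb = kb + [new_expl[f] if value else Not(new_expl[f])]
    return new_expl, kb

def consistent_models(kb):
    """Brute-force enumeration of assignments satisfying every kb formula."""
    names = sorted(set().union(*(variables(c) for c in kb)))
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if all(evaluate(c, assignment) for c in kb):
            yield assignment

# The switch is initially down; after sw-up we observe that it is up.
expl = {"up": Var("init_up")}
kb = [Not(Var("init_up"))]
expl, kb = dag_update(expl, kb, "sw-up", {"up": True})
models = list(consistent_models(kb))
```

In this toy run the only assignment consistent with the knowledge base is the one in which “sw-up causes up” is true. The brute-force enumeration stands in for the SAT-based querying that the deck mentions and only scales to tiny examples.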
Fast Update: DAG Action Model
• DAG-update algorithm takes constant time (using hash table) to update formula
• Algorithm is exact
• Result DAG has size O(Tnk + |φ_0|)
  • T steps, n features, k features in action preconditions
• Still only n features/variables
• Use φ_t with a DAG-DPLL SAT-solver
Experiments: DAG Update
Experiments: DAG Queries
Rest of This Talk
• Actions in partially observed domains
• Efficient algorithms
  • Updating a Directed Acyclic Graph (DAG)
  • Factored update (flat formula repn.)
• Related Work & Conclusions
• [Theory behind Algorithms]
Distribution for Some Actions
• Project[a](φ ∨ ψ) ≡ Project[a](φ) ∨ Project[a](ψ)
• Project[a](φ ∧ ψ) ≡ Project[a](φ) ∧ Project[a](ψ)
• Project[a](¬φ) ≡ ¬Project[a](φ) ∧ Project[a](TRUE)
• Compute update for literals in the formula separately, and combine the results
• Known Success/Failure
• 1:1 Actions
Actions that map states 1:1
• Reason for distribution over ∧: Project[a](φ ∧ ψ) ≡ Project[a](φ) ∧ Project[a](ψ)
[Figure: a 1:1 state mapping contrasted with a non-1:1 state mapping]
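A small Python check can make the 1:1 condition concrete. This sketch simplifies the setting: a formula is represented by the set of states satisfying it, the action is a single known state-to-state map, and Project is just the image of a state set (the deck's Project also tracks the unknown transition model). The names `image`, `one_to_one` and `many_to_one` are hypothetical.

```python
def image(action, states):
    """States reachable by applying the action to any state in the set."""
    return {action[s] for s in states}

# Two actions over four states: a permutation and a map that merges 0 and 1.
one_to_one = {0: 1, 1: 2, 2: 3, 3: 0}
many_to_one = {0: 0, 1: 0, 2: 2, 3: 3}

phi = {0, 2}   # states satisfying phi
psi = {1, 2}   # states satisfying psi

# For the 1:1 action, projecting the conjunction equals conjoining projections.
exact_1to1 = image(one_to_one, phi & psi)
factored_1to1 = image(one_to_one, phi) & image(one_to_one, psi)

# For the non-1:1 action, the factored result is only a superset.
exact_merge = image(many_to_one, phi & psi)
factored_merge = image(many_to_one, phi) & image(many_to_one, psi)

print(exact_1to1, factored_1to1)      # {3} {3}
print(exact_merge, factored_merge)    # {2} {0, 2}
```

The merged state 0 is reachable from a state in phi and from a different state in psi, so it survives the factored computation even though no single state satisfies both formulas. This is why factoring over ∧ is exact for 1:1 actions and only an over-approximation otherwise.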
Algorithm: Factored Learning
• Given: action a, observation o, transition-belief formula φ_t
1. Precompute update for every literal
2. Decompose φ_t recursively, update every literal separately, and combine the results
3. Conjoin the result of 2. with o, producing φ_{t+1}
Fast Update of Action Model
• Factored Learning algorithm takes time O(|φ_t|) to update formula
• Algorithm is exact when
  • We know that actions are 1:1 mappings between states
  • Actions’ effects are always the same
• Otherwise, approximate result: includes exact action model, but also others
• Resulting representation is flat (CNF)
Compact Flat Representation: How?
• Keep some property invariant, e.g.,
  • k-CNF (CNF with k literals per clause)
  • #clauses bounded
• Factored Learning: compact repn. if
  • We know if action succeeded, or
  • Action failure leaves affected propositions in a specified nondeterministic state, or
  • Approximate: We discard large clauses (allows more states)
Compact Representation in CNF
• Action affects and depends on ≤k features: |φ_{t+1}| ≤ |φ_t|·n^(k(k+1))
• Actions always have same effect: |φ_{t+1}| ≤ O(t·n)
• If also every feature observed every ≤k steps: |φ_{t+1}| ≤ O(n^(k+1))
• If (instead) the number of actions ≤k: |φ_{t+1}| ≤ O(n·2^(k log k))
Experiments: Factored Learning
Summary
• Learning of effects and preconditions of actions in partially observable domains
• Showed in this talk:
  • Exact DAG update for any action
  • Exact CNF update, if actions 1:1 or w/o conditional effects
  • Can update model efficiently without increase in #variables in belief state
  • Compact representation
• Adventure games, virtual worlds, robots
Innovation in this Research
• First scalable learning algorithm for partially observable dynamic domains
• Algorithm (DAG)
  • Always exact and optimal
  • Takes constant update time
• Algorithm (Factored)
  • Exact for actions that always have same effect
  • Takes polynomial update time
• Can solve problems with n>1000 domain features (>2^1000 states)
Current Approaches and Work
• Reinforcement Learning & HMMs
  • [Chrisman ’92], [McCallum ’95], [Boyen & Koller ’98], [Murphy et al. ’00], [Kearns, Mansour, Ng ’00]
  • Maintain probability distribution over current state
  • Problem: Exact solution intractable for domains of high (>100) dimensionality
  • Problem: Approximate solutions have unbounded errors, or make strong mixing assumptions
• Learning AI-Planning operators
  • [Wang ’95], [Benson ’95], [Pasula et al. ’04], …
  • Problem: Assume fully observable domain
Open Problems
• Efficient inference with learned formula
• Compact, efficient stochastic learning
• Average case of formula size?
• Dynamic observation models, filtering in expanding worlds
• Software: http://www.cs.uiuc.edu/~eyal
Acknowledgements
• Dafna Shahaf
• Megan Nance
• Brian Hlubocky
• Allen Chang
• … and the rest of my incredible group of students
THE END
Talk Outline
• Actions in partially observed domains
• Representation and update of models
• Efficient algorithms
• Related Work & Conclusions
Example: Light Switch
• Initial belief state (time 0) = set of pairs: { <E,~on,~up>, <E,on,~up> } × all transition rels.
  Space = O(2^(2^n))
• New encoding: E ∧ ~up
  Space = 2
• Question: how to update new representation?
Updating Action Model
• Transition belief state represented by φ
• Action-Definition(a)_{t,t+1} ≡ ⋀_f [ ((a_t ∧ (a^f ∨ (a^f_f ∧ f_t))) → f_{t+1}) ∧ ((a_t ∧ f_{t+1}) → (a^f ∨ (a^f_f ∧ f_t))) ]
  (effect axioms + explanation closure)
• Update: Project[a](φ_t) = logical results at time t+1 of φ_t ∧ Action-Definition(a)_{t,t+1}
Example Update: Light Switch
• Transition belief state: φ_t = E_t ∧ ~up_t
• Project[sw-on](φ_t) = (E_{t+1} ↔ sw-on^E_E ∨ sw-on^E) ∧ (~up_{t+1} ↔ sw-on^{~up}_{~up} ∨ sw-on^{~up}) ∧ …
• Update: Project[a](φ_t) = logical results at time t+1 of φ_t ∧ Action-Definition(a)_{t,t+1}
Updating Action Model
• Transition belief state represented by φ
• φ_{t+1} = Update[o](Project[a](φ_t))
• Actions: Project[a](φ_t) = logical results at time t+1 of φ_t ∧ Action-Definition(a)_{t,t+1}
• Observations: Update[o](φ) = φ ∧ o
Theorem: formula filtering is equivalent to <transition,state>-set semantics
Larger Picture: An Exploration Agent
[Figure: agent architecture with components Commonsense extraction, Decision Making Module, Knowledge Base, Interface Module, World Model, Learning Module, Filtering Module]
Example: Light Switch
• Initial belief state (time 0) = set of pairs: { <E,~on,~up>, <E,on,~up> } × all transition rels.
• Apply action a = go-W.
• Resulting belief state (after action)
  • { <E,~on,~up> } × { transitions map to same state }
  • { <E,on,~up> } × { transitions map to same state }
  • { <~E,~on,~up> } × { transitions set position to ~E }
  • …
Example: Light Switch
• Resulting belief state (after action)
  • { <E,~on,~up> } × { transitions map to same state }
  • { <E,on,~up> } × { transitions map to same state }
  • { <~E,~on,~up> } × { transitions set position to ~E }
  • …
• Observe: ~E, ~on
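The set semantics of this example can be sketched by brute force in Python. This is an illustration under stated assumptions, not the deck's algorithm: instead of all transition relations, only three hypothetical candidate effect models for go-W are considered (`HYPOTHESES`), each an unconditional assignment to some features, and the function names are made up for the sketch.

```python
from itertools import product

FEATURES = ("E", "on", "up")

# Hypothetical candidate effect models for go-W (a tiny stand-in for
# "all transition relations").
HYPOTHESES = {
    "no-op": {},
    "sets ~E": {"E": False},
    "sets on": {"on": True},
}

def apply_effects(state, effects):
    """Return the successor of a state tuple under an effect assignment."""
    values = dict(zip(FEATURES, state))
    values.update(effects)
    return tuple(values[f] for f in FEATURES)

def filter_step(belief, observation):
    """Progress every <state, model> pair, then keep pairs matching the observation."""
    progressed = {(apply_effects(state, HYPOTHESES[h]), h) for state, h in belief}
    return {
        (state, h)
        for state, h in progressed
        if all(state[FEATURES.index(f)] == v for f, v in observation.items())
    }

# Time 0: position E and switch ~up are known; the bulb is unknown.
initial_states = [(True, False, False), (True, True, False)]
belief0 = set(product(initial_states, HYPOTHESES))

# Execute go-W, then observe ~E and ~on.
belief1 = filter_step(belief0, {"E": False, "on": False})
print(belief1)  # {((False, False, False), 'sets ~E')}
```

Only the pair whose model moves the agent out of E, and whose bulb was off to begin with, survives the observation. The explicit set grows with the number of states and candidate models, which is the blow-up that the logical-formula encoding in the deck is meant to avoid.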
Experiments w/DAG-Update
Some Learned Rules
• Pickup(b1) causes Holding(b1)
• Stack(b3,b5) causes On(b3,b5)
• Pickup() does not cause Arm-Empty
• Move(room1,room4) causes At(book5,room4) if In-Briefcase(book5)
• Move(room1,room4) does not cause At(book5,room4) if ¬In-Briefcase(book5)
Approximate Learning
• The result of Factored-Learning(φ_t) always includes the exact action model
• Same compactness results apply
• Approximation decreases size: discard clauses with more than k literals (allows more action models), so |φ_t| = O(n^k)
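The clause-discarding approximation can be sketched in a few lines of Python. The CNF representation (lists of `(variable, polarity)` pairs), the helper names and the example formula are all assumptions made for illustration. Dropping a clause can only weaken the formula, so every model of the exact formula remains a model of the approximation.

```python
from itertools import product

def discard_large_clauses(cnf, k):
    """Keep only clauses with at most k literals."""
    return [clause for clause in cnf if len(clause) <= k]

def models(cnf, variables):
    """Enumerate all assignments (as value tuples) that satisfy the CNF."""
    satisfying = set()
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(any(assignment[v] == sign for v, sign in clause) for clause in cnf):
            satisfying.add(values)
    return satisfying

variables = ["a", "b", "c"]
# a AND (~a OR b OR c)
exact = [[("a", True)], [("a", False), ("b", True), ("c", True)]]
approx = discard_large_clauses(exact, k=2)

exact_models = models(exact, variables)
approx_models = models(approx, variables)
print(len(exact_models), len(approx_models))  # 3 4
```

Here the 3-literal clause is dropped, so the approximation admits one extra assignment (a true, b and c false) while keeping all three exact models.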
More in the Paper
• An algorithm that uses deduction to update the model representation exactly in all cases
• Arbitrary preconditions and conditional effects
• Formal justification of algorithms and complexity results
Experiments
DAG-SLAF: The Algorithm
Input: a formula φ, an action-observation sequence <a_i,o_i>, i=1..t
Initialize:
  for each fluent f, expl_f := init_f
  kb := φ, where each f is replaced by init_f
Process Sequence:
  for i=1..t do Update-Belief(a_i,o_i)
  return kb ∧ base ∧ (f ↔ expl_f)
Current Game + Translation
• LambdaMOO
  • MUD code base
  • Uses database to store game world
  • Emphasis on player-world interaction
  • Powerful in-game programming language
• Game sends agents logical description of world