Connecting Learning and Logic
Eyal Amir, U. of Illinois, Urbana-Champaign
Joint work with: Dafna Shahaf, Allen Chang
Cambridge Presentation, May 2006
Problem: Learn Actions’ Effects
• Given: a sequence of observations over time
  • Example: action a was executed
  • Example: state feature f has value T
• Want: an estimate of the actions’ effect model
  • Example: a is executable if the state satisfies some property
  • Example: under condition _, a has effect _
Example: Light Switch

Time | Action      | Posn. | Bulb | Switch
-----+-------------+-------+------+-------
0    |             | E     |      | ~up
1    | go-W        | ~E    | ~on  |
2    | sw-up FAIL  | ~E    | ~on  |
3    | go-E        | E     |      | ~up
4    | sw-up       | E     |      | up
5    | go-W        | ~E    | on   |

(Observations are made after the action.)
Example: Light Switch
[Diagram: two world states, each with a west room and an east room]
• State 1: ~up ∧ ~on ∧ E
• State 2: up ∧ on ∧ E
• Flipping the switch changes the world state
• We do not observe the state fully
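The running example can be made concrete with a tiny simulator. This is a sketch under assumptions the slides leave implicit (that sw-up is executable only when the agent is at E, and that go-W/go-E only move the agent); the `step` function and the feature names E/on/up are illustrative, chosen to match the observation table above.

```python
# A minimal simulator for the light-switch domain (hypothetical encoding;
# the talk never fixes concrete semantics for the actions).
# State features: E (agent at east wall), on (bulb lit), up (switch up).

def step(state, action):
    """Apply an action; return (new_state, success)."""
    s = dict(state)
    if action == "go-W":
        s["E"] = False
    elif action == "go-E":
        s["E"] = True
    elif action == "sw-up":
        if not s["E"]:          # the switch is at the east wall: action fails
            return s, False
        s["up"] = True
        s["on"] = True          # "sw-up causes on ∧ up if E"
    return s, True

# Replay rows 1-4 of the observation table:
s = {"E": True, "on": False, "up": False}
for a in ["go-W", "sw-up", "go-E", "sw-up"]:
    s, ok = step(s, a)
    print(a, "FAIL" if not ok else s)
```

The failed sw-up at time 2 (agent at the west wall) reproduces the FAIL row of the table, and the final state matches the last observations (E, up, then on after go-W).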
Motivation: Exploration Agents
• Exploring partially observable domains
  • Interfaces to new software
  • Game-playing/companion agents
  • Robots exploring buildings, cities, planets
  • Agents acting in the WWW
• Difficulties:
  • No knowledge of actions’ effects a priori
  • Many features
  • Partially observable domain
Rest of This Talk • Actions in partially observed domains • Efficient learning algorithms • Related Work & Conclusions • [Theory behind Algorithms]
Learning Transition Models
[Diagram: world states s1…s4 evolve under actions a1…a4; knowledge states k1…k4 track transition knowledge]
• Learning: update knowledge of the transition relation and of the state of the world
Action Model: <State, Transition> Set
[Diagram: a belief state as a set of <state, transition-relation> pairs]
• Problem: with n world features there are 2^(2^n) possible transition relations
Rest of This Talk • Actions in partially observed domains • Efficient algorithms • Updating a Directed Acyclic Graph (DAG) • Factored update (flat formula repn.) • Related Work & Conclusions • [Theory behind Algorithms]
Compact Encoding (Sometimes) • Transition Belief State = a logical formula (over the transition relation and the state) • Observation = a logical state formula
Compact Encoding (Sometimes)
• Transition Belief State = a logical formula (over the transition relation and the state)
• Observation = a logical state formula
• Actions = propositional symbols asserting effect rules
  • “sw-up causes on ∧ up if E”
  • “go-W keeps up” (= “go-W causes up if up” …)
  • Propositional symbols: go-W≈up, sw-up^on_E, sw-up^up_E
Updating the Status of “Locked”: Time 0
[Diagram: explanation DAG at time 0]
• expl(0) = init_locked
• tr1: “PressB causes ¬locked if locked”
• tr2: “PressB causes locked if ¬locked”
Updating the Status of “Locked”: Time t
[Diagram: expl(t) built on top of expl(0)]
• tr1: “PressB causes ¬locked if locked”
• tr2: “PressB causes locked if ¬locked”
Updating the Status of “Locked”: Time t+1
[Diagram: expl(t+1) refers back to expl(t) through two cases]
• “locked” holds because PressB did not change it, or
• “locked” holds because PressB caused it
• tr1: “PressB causes ¬locked if locked”
• tr2: “PressB causes locked if ¬locked”
Algorithm: Update of a DAG
• Given: action a, observation o, transition-belief formula φ_t
1. kb := kb ∧ the logical formula “a is executable”
2. For each fluent f:
   • expl′_f := a logical formula for the possible explanations of f’s value after action a
   • replace every fluent g in expl′_f with a pointer to expl_g
   • update expl_f := expl′_f
3. φ_{t+1} is the result of step 2 together with o
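The pointer step is what makes this fast: the new explanation references the previous step's nodes instead of copying them. A sketch with an assumed data structure (the slide gives only the outline); the `Node` class and the "causes"/"keeps" rule labels are illustrative.

```python
# Hypothetical sketch of the DAG update: expl'_f references the old expl_f
# by pointer, so each step adds O(1) new nodes per fluent (no copying).

class Node:
    def __init__(self, op, kids=(), label=None):
        self.op, self.kids, self.label = op, tuple(kids), label

def update_step(expl, action):
    """f holds after `action` iff the action caused f, or f held before and
    the action kept it (the two explanations from the 'locked' slides)."""
    new_expl = {}
    for f, old in expl.items():
        caused = Node("atom", label=f"{action} causes {f}")
        kept = Node("and", [Node("atom", label=f"{action} keeps {f}"), old])
        new_expl[f] = Node("or", [caused, kept])   # `old` is shared, not copied
    return new_expl

expl = {f: Node("atom", label=f"init_{f}") for f in ["on", "up", "E"]}
for a in ["go-W", "sw-up"]:
    expl = update_step(expl, a)
```

Because every step only wraps the previous node, the whole structure after T steps is a DAG of size linear in T, matching the size bound on the next slide.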
Fast Update: DAG Action Model
• The DAG-update algorithm takes constant time (using a hash table) to update the formula
• The algorithm is exact
• The resulting DAG has size O(Tnk + |φ_0|)
  • T steps, n features, k features in action preconditions
• Still only n features/variables
• Use φ_t with a DAG-DPLL SAT solver
Experiments: DAG Update
Experiments: DAG Queries
Rest of This Talk • Actions in partially observed domains • Efficient algorithms • Updating a Directed Acyclic Graph (DAG) • Factored update (flat formula repn.) • Related Work & Conclusions • [Theory behind Algorithms]
Distribution for Some Actions
• Project[a](φ ∨ ψ) ≡ Project[a](φ) ∨ Project[a](ψ)
• Project[a](φ ∧ ψ) ≡ Project[a](φ) ∧ Project[a](ψ)
• Project[a](¬φ) ≡ ¬Project[a](φ) ∧ Project[a](TRUE)
• Compute the update for the literals in the formula separately, and combine the results
• Holds when: success/failure is known; actions are 1:1
Actions that Map States 1:1
• Reason for distribution over ∧: Project[a](φ ∧ ψ) ≡ Project[a](φ) ∧ Project[a](ψ)
[Diagram: a 1:1 transition vs. a non-1:1 transition between state sets]
Algorithm: Factored Learning
• Given: action a, observation o, transition-belief formula φ_t
1. Precompute the update for every literal
2. Decompose φ_t recursively, update every literal separately, and combine the results
3. Conjoin the result of step 2 with o, producing φ_{t+1}
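Under the 1:1 assumption, the distribution properties two slides back let step 2 recurse over the formula's structure. A minimal sketch with a hypothetical tuple encoding of formulas; the `lit_update` table stands in for the precomputation in step 1, and its entries here are toy values.

```python
# Factored update sketch: the projection distributes over ∧ and ∨,
# so we recurse and look up each literal's precomputed projection.
# Formula encoding (assumed): ("lit", name) | ("not", φ) | (op, φ, ψ).

def project(phi, lit_update):
    """lit_update maps each literal name to its already-computed projection."""
    op = phi[0]
    if op == "lit":
        return lit_update[phi[1]]
    if op == "not":
        # For 1:1 actions; the slide's extra conjunct Project[a](TRUE)
        # is dropped here for brevity.
        return ("not", project(phi[1], lit_update))
    left, right = project(phi[1], lit_update), project(phi[2], lit_update)
    return (op, left, right)

# Toy literal table for an action that keeps E and makes `up` true:
lit_update = {"E": ("lit", "E"), "up": ("lit", "true")}
phi = ("and", ("lit", "E"), ("or", ("lit", "up"), ("not", ("lit", "E"))))
print(project(phi, lit_update))
```

The recursion touches each subformula once, which is where the O(|φ_t|) update time on the next slide comes from.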
Fast Update of Action Model
• The Factored Learning algorithm takes time O(|φ_t|) to update the formula
• The algorithm is exact when
  • we know that actions are 1:1 mappings between states, or
  • actions’ effects are always the same
• Otherwise, the result is approximate: it includes the exact action model, but also others
• The resulting representation is flat (CNF)
Compact Flat Representation: How?
• Keep some property of φ invariant, e.g.,
  • k-CNF (CNF with ≤ k literals per clause)
  • a bounded number of clauses
• Factored Learning yields a compact representation if
  • we know whether the action succeeded, or
  • action failure leaves the affected propositions in a specified nondeterministic state, or
  • we approximate: discard large clauses (allows more states)
Compact Representation in CNF
• Action affects and depends on ≤ k features: |φ_{t+1}| ≤ |φ_t|·n^k(k+1)
• Actions always have the same effect: |φ_{t+1}| ≤ O(t·n)
• If, in addition, every feature is observed at least every k steps: |φ_{t+1}| ≤ O(n^{k+1})
• If (instead) the number of actions is ≤ k: |φ_{t+1}| ≤ O(n·2^{k log k})
Experiments: Factored Learning
Summary
• Learning of effects and preconditions of actions in partially observable domains
• Shown in this talk:
  • Exact DAG update for any action
  • Exact CNF update, if actions are 1:1 or without conditional effects
  • The model can be updated efficiently with no increase in the number of variables in the belief state
  • Compact representation
• Applications: adventure games, virtual worlds, robots
Innovation in this Research
• First scalable learning algorithm for partially observable dynamic domains
• Algorithm (DAG):
  • always exact and optimal
  • takes constant update time
• Algorithm (Factored):
  • exact for actions that always have the same effect
  • takes polynomial update time
• Can solve problems with n > 1000 domain features (> 2^1000 states)
Current Approaches and Work
• Reinforcement Learning & HMMs
  • [Chrisman ’92], [McCallum ’95], [Boyen & Koller ’98], [Murphy et al. ’00], [Kearns, Mansour, Ng ’00]
  • Maintain a probability distribution over the current state
  • Problem: exact solutions are intractable for domains of high (>100) dimensionality
  • Problem: approximate solutions have unbounded errors, or make strong mixing assumptions
• Learning AI-planning operators
  • [Wang ’95], [Benson ’95], [Pasula et al. ’04], …
  • Problem: assume a fully observable domain
Open Problems • Efficient inference with the learned formula • Compact, efficient stochastic learning • Average case of formula size? • Dynamic observation models, filtering in expanding worlds • Software: http://www.cs.uiuc.edu/~eyal
Acknowledgements • Dafna Shahaf • Megan Nance • Brian Hlubocky • Allen Chang • … and the rest of my incredible group of students
THE END
Talk Outline • Actions in partially observed domains • Representation and update of models • Efficient algorithms • Related Work & Conclusions
Example: Light Switch
• Initial belief state (time 0) = set of pairs: { <E,~on,~up>, <E,on,~up> } × all transition relations; space = O(2^(2^n))
• New encoding: the formula E ∧ ~up; space = 2
• Question: how do we update the new representation?
Updating Action Model
• Transition belief state represented by φ
• Action-Definition(a)_{t,t+1} = ⋀_f [ (a_t ∧ (a^f ∨ (a^f_f ∧ f_t)) → f_{t+1}) ∧ (a_t ∧ f_{t+1} → (a^f ∨ (a^f_f ∧ f_t))) ]   (effect axioms + explanation closure)
• Update: Project[a](φ_t) = the logical results at time t+1 of φ_t ∧ Action-Definition(a)_{t,t+1}
Example Update: Light Switch
• Transition belief state: φ_t = E_t ∧ ~up_t
• Project[sw-on](φ_t) = (E_{t+1} ↔ (sw-on^E_E ∨ sw-on^E)) ∧ (~up_{t+1} ↔ (sw-on^{~up}_{~up} ∨ sw-on^{~up})) ∧ …
• Update: Project[a](φ_t) = the logical results at time t+1 of φ_t ∧ Action-Definition(a)_{t,t+1}
Updating Action Model
• Transition belief state represented by φ
• φ_{t+1} = Update[o](Project[a](φ_t))
• Actions: Project[a](φ_t) = the logical results at time t+1 of φ_t ∧ Action-Definition(a)_{t,t+1}
• Observations: Update[o](φ) = φ ∧ o
• Theorem: formula filtering is equivalent to the <transition, state>-set semantics
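The filtering equations above can be sketched as a generic loop: the projection is domain-specific and left abstract here, and observations are conjoined exactly as in Update[o](φ) = φ ∧ o. The tuple encoding of formulas and the placeholder projection are illustrative, not part of the talk.

```python
# Skeleton of the filtering loop φ_{t+1} = Update[o](Project[a](φ_t)).
# `project` is supplied by the caller; the tuple formula encoding is assumed.

def filter_sequence(phi, steps, project):
    """steps: list of (action, observation-formula) pairs."""
    for action, obs in steps:
        phi = project(action, phi)    # progress φ through the action
        phi = ("and", phi, obs)       # Update[o](φ) = φ ∧ o
    return phi

identity_project = lambda a, phi: phi   # placeholder projection for the demo
result = filter_sequence(("lit", "E"),
                         [("go-W", ("lit", "~E"))],
                         identity_project)
```

Plugging in a real projection (e.g. the factored one from earlier in the deck) turns this skeleton into the full learning filter.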
Larger Picture: An Exploration Agent
[Architecture diagram: an Interface Module connects the world to a Filtering Module, a Learning Module, a Knowledge Base fed by commonsense extraction, a World Model, and a Decision Making Module]
Example: Light Switch
• Initial belief state (time 0) = set of pairs: { <E,~on,~up>, <E,on,~up> } × all transition relations
• Apply action a = go-W
• Resulting belief state (after the action):
  • { <E,~on,~up> } × { transitions that map to the same state }
  • { <E,on,~up> } × { transitions that map to the same state }
  • { <~E,~on,~up> } × { transitions that set the position to ~E }
  • …
Example: Light Switch
• Resulting belief state (after the action):
  • { <E,~on,~up> } × { transitions that map to the same state }
  • { <E,on,~up> } × { transitions that map to the same state }
  • { <~E,~on,~up> } × { transitions that set the position to ~E }
  • …
• Observe: ~E, ~on
Experiments w/ DAG-Update
Some Learned Rules • Pickup(b1) causes Holding(b1) • Stack(b3,b5) causes On(b3,b5) • Pickup() does not cause Arm-Empty • Move(room1,room4) causes At(book5,room4) if In-Briefcase(book5) • Move(room1,room4) does not cause At(book5,room4) if ¬In-Briefcase(book5)
Approximate Learning
• The result of Factored-Learning(φ_t) always includes the exact action model
• The same compactness results apply
• Approximation decreases size: discard clauses with more than k literals (allows more action models); |φ_t| = O(n^k)
More in the Paper
• An algorithm that uses deduction to update the model representation exactly in all cases
• Arbitrary preconditions and conditional effects
• Formal justification of the algorithms, and complexity results
Experiments
DAG-SLAF: The Algorithm
• Input: a formula φ, an action-observation sequence <a_i, o_i>, i = 1..t
• Initialize: for each fluent f, expl_f := init_f; kb := φ, where each f is replaced by init_f
• Process Sequence: for i = 1..t do Update-Belief(a_i, o_i)
• Return: kb ∧ base ∧ ⋀_f (f ↔ expl_f)
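A near-direct transcription of this pseudocode as a sketch: Update-Belief is stubbed since the slide only names it, the `base` conjunct is omitted, and the tuple encoding of formulas is an assumption, not the talk's representation.

```python
# DAG-SLAF skeleton (hypothetical encoding; Update-Belief is a stub).

def dag_slaf(fluents, phi, steps, update_belief):
    """phi: initial formula; steps: [(a_i, o_i)] for i = 1..t."""
    expl = {f: ("init", f) for f in fluents}      # expl_f := init_f
    kb = [phi]                                     # stands in for φ[f := init_f]
    for a, o in steps:                             # Process Sequence
        kb, expl = update_belief(kb, expl, a, o)
    # return kb ∧ ⋀_f (f ↔ expl_f)   (the `base` conjunct is omitted here)
    return kb + [("iff", f, e) for f, e in expl.items()]

# Trivial stub: record the observation, leave the explanations unchanged.
stub = lambda kb, expl, a, o: (kb + [o], expl)
out = dag_slaf(["on"], ("lit", "E"), [("sw-up", ("lit", "up"))], stub)
```

Replacing the stub with the per-fluent DAG update from "Algorithm: Update of a DAG" yields the full procedure.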
Current Game + Translation
• LambdaMOO
  • MUD code base
  • Uses a database to store the game world
  • Emphasis on player-world interaction
  • Powerful in-game programming language
• The game sends agents a logical description of the world