Transfer Learning. Lisa Torrey, University of Wisconsin – Madison, CS 540
Transfer Learning in Humans
• Education
  • Hierarchical curriculum
  • Learning tasks share common stimulus-response elements
• Abstract problem-solving
  • Learning tasks share general underlying principles
• Multilingualism
  • Knowing one language affects learning in another
  • Transfer can be both positive and negative
Transfer Learning in AI [Figure: Given source Task S, learn target Task T]
Goals of Transfer Learning [Figure: performance versus training; transfer aims for a higher start, a higher slope, and a higher asymptote]
Inductive Learning [Figure: search through the allowed hypotheses, a subset of all hypotheses]
Transfer in Inductive Learning [Figure: search through the allowed hypotheses, a subset of all hypotheses] Thrun and Mitchell 1995: Transfer slopes for gradient descent
Transfer in Inductive Learning: Bayesian methods [Figure: Bayesian learning and Bayesian transfer] Prior distribution + Data = Posterior distribution. Raina et al. 2006: Transfer a Gaussian prior
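A minimal sketch of "prior + data = posterior", assuming a one-dimensional Gaussian mean with known noise variance. This conjugate case is chosen only for illustration and is not the method of Raina et al.; the function name `gaussian_posterior` and all numbers are hypothetical.

```python
def gaussian_posterior(prior_mean, prior_var, data, noise_var):
    """Conjugate update for the mean of a Gaussian with known noise variance."""
    n = len(data)
    if n == 0:
        return prior_mean, prior_var
    sample_mean = sum(data) / n
    # Precisions (inverse variances) add: one term from the prior, one from the data.
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + n * sample_mean / noise_var)
    return post_mean, post_var


# Hypothetical target-task data: only two observations are available.
target_data = [2.0, 2.4]

# No transfer: a broad, uninformative prior.
no_transfer = gaussian_posterior(0.0, 100.0, target_data, noise_var=1.0)

# Transfer: a narrow prior assumed to have been estimated on a source task.
with_transfer = gaussian_posterior(2.1, 0.25, target_data, noise_var=1.0)
```

With so little target data, the transferred prior gives a tighter posterior than the broad one. The same mechanism produces negative transfer if the source prior is centered in the wrong place.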
Transfer in Inductive Learning: Hierarchical methods [Figure: concept hierarchy with nodes Pipe, Surface, Circle, Line, Curve] Stracuzzi 2006: Learn Boolean concepts that can depend on each other
Transfer in Inductive Learning: Dealing with missing data or labels [Figure: Task S and Task T] Shi et al. 2008: Transfer via active learning
Reinforcement Learning
Agent: Q(s1, a) = 0; π(s1) = a1; Q(s1, a1) ← Q(s1, a1) + Δ; π(s2) = a2
Environment:
• δ(s1, a1) = s2
• r(s1, a1) = r2
• δ(s2, a2) = s3
• r(s2, a2) = r3
[Figure: agent-environment loop over states s1, s2, s3, actions a1, a2, and rewards r2, r3]
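The agent-environment loop on this slide can be sketched as tabular Q-learning. The slide leaves Δ unspecified; the sketch assumes the standard one-step Q-learning increment, and the learning rate, discount factor, and reward values are hypothetical.

```python
ALPHA, GAMMA = 0.5, 0.9  # assumed learning rate and discount factor

# Deterministic environment from the slide: delta gives the next state, reward gives r.
delta = {("s1", "a1"): "s2", ("s2", "a2"): "s3"}
reward = {("s1", "a1"): 0.0, ("s2", "a2"): 1.0}  # hypothetical values for r2 and r3
actions = ["a1", "a2"]

Q = {}  # unseen (state, action) pairs are treated as Q = 0


def q(s, a):
    return Q.get((s, a), 0.0)


def update(s, a):
    """Apply Q(s, a) <- Q(s, a) + Delta and return the next state."""
    s_next = delta[(s, a)]
    best_next = max(q(s_next, b) for b in actions)
    step = ALPHA * (reward[(s, a)] + GAMMA * best_next - q(s, a))
    Q[(s, a)] = q(s, a) + step
    return s_next


# Twenty episodes following pi(s1) = a1 and pi(s2) = a2; s3 is terminal.
for _ in range(20):
    s = update("s1", "a1")
    update(s, "a2")
```

Under these assumptions Q(s2, a2) approaches 1.0 and Q(s1, a1) approaches 0.9, the discounted value of the reward one step later.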
Transfer in Reinforcement Learning
• Starting-point methods
• Hierarchical methods
• Alteration methods
• New RL algorithms
• Imitation methods
Transfer in Reinforcement Learning: Starting-point methods [Figure: the source task supplies the initial Q-table for target-task training, compared with no transfer] Taylor et al. 2005: Value-function transfer
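A minimal tabular sketch of a starting-point method: the target task's initial Q-table is filled from source Q-values through a hand-written mapping. This illustrates the idea only and is not the procedure of Taylor et al.; the function `transfer_q_table`, the state and action names, and the values are all hypothetical.

```python
def transfer_q_table(source_q, state_map, action_map, default=0.0):
    """Build an initial target Q-table from a source Q-table.

    state_map and action_map send each target state or action to the source
    state or action it is assumed to resemble (None means no counterpart).
    """
    target_q = {}
    for t_state, s_state in state_map.items():
        for t_action, s_action in action_map.items():
            # Pairs with no source counterpart fall back to the default value.
            target_q[(t_state, t_action)] = source_q.get((s_state, s_action), default)
    return target_q


# Hypothetical source values, e.g. learned in a smaller game.
source_q = {("near_goal", "shoot"): 0.8, ("near_goal", "pass"): 0.3}

# Hypothetical mapping into a larger game with one extra teammate.
state_map = {"near_goal_3v2": "near_goal", "midfield_3v2": None}
action_map = {"shoot": "shoot", "pass_to_2": "pass", "pass_to_3": "pass"}

initial_q = transfer_q_table(source_q, state_map, action_map)
```

Target-task training then starts from `initial_q` instead of an all-zero table, which is where a higher starting performance would come from.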
Transfer in Reinforcement Learning: Hierarchical methods [Figure: task hierarchy with nodes Soccer, Pass, Shoot, Run, Kick] Mehta et al. 2008: Transfer a learned hierarchy
Transfer in Reinforcement Learning: Alteration methods [Figure: Task S with original states, actions, and rewards altered into new states, actions, and rewards] Walsh et al. 2006: Transfer aggregate states
Transfer in Reinforcement Learning: New RL algorithms [Figure: the reinforcement learning agent-environment loop, with the update Q(s1, a1) ← Q(s1, a1) + Δ] Torrey et al. 2006: Transfer advice about skills
Transfer in Reinforcement Learning: Imitation methods [Figure: timeline of target training, marking the period in which the source policy is used] Torrey et al. 2007: Demonstrate a strategy
My Research
• Starting-point methods
• Hierarchical methods
• Alteration methods
• New RL algorithms: Skill Transfer
• Imitation methods: Macro Transfer
RoboCup Domain
• 3-on-2 KeepAway
• 3-on-2 BreakAway
• 2-on-1 BreakAway
• 3-on-2 MoveDownfield
Inductive Logic Programming
IF [ ] THEN pass(Teammate)
IF distance(Teammate) ≤ 5 THEN pass(Teammate)
IF distance(Teammate) ≤ 10 THEN pass(Teammate)
…
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 15 THEN pass(Teammate)
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)
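The rules above suggest a general-to-specific search that starts from an empty rule body and adds one condition at a time. A rough propositional sketch of that search follows. Real ILP systems search over first-order clauses, so this is a simplification; the scoring function and the example data are hypothetical.

```python
import operator

# Candidate conditions as (feature, comparison, threshold); thresholds mirror the slide.
LITERALS = [("distance", "<=", 5), ("distance", "<=", 10),
            ("angle", ">=", 15), ("angle", ">=", 30)]
OPS = {"<=": operator.le, ">=": operator.ge}


def covers(body, example):
    """A rule body covers an example when every condition holds."""
    return all(OPS[op](example[feat], thr) for feat, op, thr in body)


def score(body, positives, negatives):
    """Assumed score: covered positives minus covered negatives."""
    return (sum(covers(body, e) for e in positives)
            - sum(covers(body, e) for e in negatives))


def learn_rule(positives, negatives):
    """Greedily add the condition that most improves the score; stop when none does."""
    body = []
    best = score(body, positives, negatives)
    while True:
        candidates = [body + [lit] for lit in LITERALS if lit not in body]
        if not candidates:
            return body
        top = max(candidates, key=lambda b: score(b, positives, negatives))
        top_score = score(top, positives, negatives)
        if top_score <= best:
            return body
        body, best = top, top_score


# Hypothetical states: positives are states where passing worked, negatives where it failed.
positives = [{"distance": 4, "angle": 35}, {"distance": 3, "angle": 40}]
negatives = [{"distance": 4, "angle": 10}, {"distance": 12, "angle": 40}]
rule = learn_rule(positives, negatives)
```

On this toy data the search needs one distance condition and one angle condition before the rule separates the positives from the negatives.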
Advice Taking: Batch Reinforcement Learning via Support Vector Regression (RL-SVR)
[Figure: the agent interacts with the environment for Batch 1, Q-functions are computed, the agent interacts with the environment for Batch 2, and so on]
Find Q-functions that minimize: ModelSize + C × DataMisfit
Advice Taking: Batch Reinforcement Learning with Advice (KBKR)
[Figure: the agent interacts with the environment for Batch 1, Q-functions are computed using the advice, the agent interacts with the environment for Batch 2, and so on]
Find Q-functions that minimize: ModelSize + C × DataMisfit + µ × AdviceMisfit
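The slide names the three terms of the objective without defining them. The sketch below evaluates the objective for a linear Q-function under assumed definitions: ModelSize as the L1 norm of the weights, DataMisfit as the summed absolute error, and AdviceMisfit as the summed violation of advised lower bounds on Q. These definitions, the function `objective`, and the numbers are illustrative assumptions, not the KBKR formulation itself.

```python
def objective(w, data, advice, C=1.0, mu=1.0):
    """ModelSize + C * DataMisfit + mu * AdviceMisfit for a linear Q-function.

    data:   list of (features, target_q) pairs from a training batch
    advice: list of (features, lower_bound) pairs, read as "Q should be at
            least lower_bound in this state" (an assumed form of advice)
    """
    def q(x):
        return sum(wi * xi for wi, xi in zip(w, x))

    model_size = sum(abs(wi) for wi in w)
    data_misfit = sum(abs(q(x) - y) for x, y in data)
    # Advice is soft: violating it costs something but is allowed.
    advice_misfit = sum(max(0.0, lower - q(x)) for x, lower in advice)
    return model_size + C * data_misfit + mu * advice_misfit


# Hypothetical weights that fit the data exactly but violate the advice by 0.5.
w = [0.5, 0.0]
data = [([1.0, 0.0], 0.5), ([2.0, 0.0], 1.0)]
advice = [([1.0, 1.0], 1.0)]

without_advice = objective(w, data, advice, mu=0.0)
with_advice = objective(w, data, advice, mu=2.0)
```

Because the advice enters as a penalty rather than a hard constraint, a learner can follow it early on and override it once the data contradict it, with µ setting how much the advice counts.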
Skill Transfer Algorithm
[Figure: pipeline Source → ILP → Mapping → Advice Taking → Target, with [Human advice] as an additional input]
Example rule learned by ILP:
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)
Selected Results Skill transfer to 3-on-2 BreakAway from several tasks
Macro-Operators
[Figure: a macro as a sequence of nodes pass(Teammate) → move(Direction) → shoot(goalRight) → shoot(goalLeft), annotated with rules of the following form]
IF [ ... ] THEN pass(Teammate)
IF [ ... ] THEN move(ahead)
IF [ ... ] THEN shoot(goalRight)
IF [ ... ] THEN shoot(goalLeft)
IF [ ... ] THEN pass(Teammate)
IF [ ... ] THEN move(left)
IF [ ... ] THEN shoot(goalRight)
IF [ ... ] THEN shoot(goalRight)
Demonstration: An imitation method [Figure: timeline of target training, marking the period in which the source policy is used]
Macro Transfer Algorithm [Figure: pipeline Source → ILP → Demonstration → Target]
Macro Transfer Algorithm: Learning structures
Positive: BreakAway games that scored
Negative: BreakAway games that didn’t score
ILP learns a rule such as:
IF actionTaken(Game, StateA, pass(Teammate), StateB)
AND actionTaken(Game, StateB, move(Direction), StateC)
AND actionTaken(Game, StateC, shoot(goalRight), StateD)
AND actionTaken(Game, StateD, shoot(goalLeft), StateE)
THEN isaGoodGame(Game)
Macro Transfer Algorithm: Learning rules for arcs
Positive: states in good games that took the arc
Negative: states in good games that could have taken the arc but didn’t
[Figure: nodes pass(Teammate) and shoot(goalRight), with ILP learning a rule for each arc]
IF [ … ] THEN loop(State, Teammate)
IF [ … ] THEN enter(State)
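One way to picture how a learned macro could be executed is as a small state machine in which each node carries an action, an enter rule, and a loop rule. The sketch below is an assumption-laden illustration: the rule bodies are elided on the slides, so the predicates here are hypothetical stand-ins, and preferring the next node's enter rule over the current node's loop rule is a design choice of this sketch.

```python
from collections import namedtuple

# Each node has an action plus two predicates over the current state.
MacroNode = namedtuple("MacroNode", ["action", "enter", "loop"])


def run_macro(nodes, states):
    """Follow the macro over a sequence of states and return the chosen actions.

    At each state, advance if the next node's enter rule fires; otherwise stay
    if the current node's loop rule fires; otherwise abandon the macro.
    """
    chosen = []
    i = -1  # not yet inside any node
    for state in states:
        if i + 1 < len(nodes) and nodes[i + 1].enter(state):
            i += 1
        elif i >= 0 and nodes[i].loop(state):
            pass  # remain in the current node
        else:
            break
        chosen.append(nodes[i].action)
    return chosen


# Hypothetical two-node macro: move ahead while far from the goal, then shoot.
macro = [
    MacroNode("move(ahead)",
              enter=lambda s: s["goal_dist"] > 10,
              loop=lambda s: s["goal_dist"] > 10),
    MacroNode("shoot(goalRight)",
              enter=lambda s: s["goal_dist"] <= 10,
              loop=lambda s: False),
]

trace = run_macro(macro, [{"goal_dist": 20}, {"goal_dist": 15}, {"goal_dist": 8}])
```

In this toy run the agent loops in the move node twice and then enters the shoot node.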
Selected Results Macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway
Summary
• Machine learning systems are often designed for standalone tasks
• Transfer is a natural learning ability that we would like to incorporate into machine learners
• There are some successes, but challenges remain, such as avoiding negative transfer and automating the mapping between tasks