Transfer Learning with Inter-Task Mappings Matthew E. Taylor Joint work with Peter Stone Department of Computer Sciences The University of Texas at Austin
Transfer Motivation • Learning tabula rasa can be unnecessarily slow • Humans can use information from previous tasks • Soccer with different numbers of players • Agents: leverage learned knowledge in novel/modified tasks • Learn faster • Larger and more complex problems become tractable • Different numbers of state variables and actions in tasks
Common TL Metrics • Also: total reward accumulated
Transfer Goals • Autonomous transfer • AI Goal • Explore the world, learning • Transfer autonomously • Utilize past knowledge • Learn difficult tasks faster • Engineering Goal • Learn a set of simple tasks • Eventually learn target task • Total time reduction
Transfer via Inter-Task Mappings • Source task: π(S) → A • Target task: π’(S’) → A’ • π is not defined for S’ and A’ • ρ is a transfer functional • ρ is task-dependent: it relies on inter-task mappings
Inter-Task Mappings • χA: atarget → asource — given a target task action, return a similar source task action • χX: xtarget → xsource — similar, but for state variables: for each x in a target task state s = ⟨x1, x2, …, xn⟩ • ρ is formed automatically from χA and χX to enable transfer of: • π(s) • Q(s, a) • Rules • Model • etc.
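The bullets above can be sketched in code. This is a minimal illustration, not the authors' implementation: the task names, state variables, and mapping entries (e.g. `pass_far`, `dist_k3`) are invented; only the roles of χA, χX, and ρ follow the slide.

```python
# chi_A: map each target-task action to a similar source-task action
# (hypothetical Keepaway-style actions, for illustration only)
chi_A = {"hold": "hold", "pass_near": "pass", "pass_far": "pass"}

# chi_X: map each target-task state variable to a similar source-task variable
chi_X = {"dist_k1": "dist_k1", "dist_k2": "dist_k2", "dist_k3": "dist_k2"}

def transfer_q(q_source, chi_A, chi_X):
    """A rho-style transfer functional for Q-functions: build an initial
    target-task Q-function by translating target states/actions back into
    source-task terms and reusing the learned source-task estimate."""
    def q_target(state, action):
        # Translate the target state into source-task variables
        # (many-to-one mappings collapse onto the shared source variable).
        source_state = {chi_X[var]: value for var, value in state.items()}
        source_action = chi_A[action]
        return q_source(source_state, source_action)
    return q_target
```

The same construction transfers a policy π(s) instead of Q(s, a) by returning the mapped source action's choice rather than a value.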
Transfer Functional: ρCMAC • New states and actions in the target task → new tiles • (figure: source CMAC tiles mapped to target CMAC tiles) • Counterintuitive: Q-values are very low-level and very task-specific
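One way to read the ρCMAC idea as code: new target-task tiles start from the weights of the source tiles they map to, rather than from zero. A hedged sketch, assuming a dictionary-backed CMAC weight table; the tile names, weights, and action mapping are invented for illustration.

```python
# Learned source-task CMAC weights, keyed by (tile, action).
# Values here are invented for illustration.
source_weights = {("tile_7", "hold"): 0.4, ("tile_7", "pass"): 1.2}

# Hypothetical action mapping chi_A: target action -> source action.
chi_A = {"hold": "hold", "pass_near": "pass", "pass_far": "pass"}

def rho_cmac(source_weights, target_actions, chi_A):
    """Initialize target-task CMAC weights from source-task weights: every
    target action inherits the weight of the source action it maps to, so
    the target learner starts from transferred Q-values instead of zeros."""
    target_weights = {}
    for (tile, src_action), weight in source_weights.items():
        for tgt_action in target_actions:
            if chi_A[tgt_action] == src_action:
                target_weights[(tile, tgt_action)] = weight
    return target_weights
```

This makes the "counterintuitive" point concrete: the transferred quantities are raw function-approximator weights, about as low-level and task-specific as knowledge gets, yet copying them still speeds target-task learning.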
Sample Results • Can significantly reduce target task time and total time • Able to learn inter-task mappings with little data • (figure: Keepaway transfer, 3 vs. 2 to 4 vs. 3 — source task time and target task time plotted against source task episodes)
Empirical Domains • Robot Soccer Keepaway • Server Job Scheduling • Mountain Car • Killer Application? • Epilepsy? • Robotics?
Open Questions: 1/3 • Optimize for total time? • (figure: source task time and target task time plotted against source task episodes)
Open Questions: 2/3 • Guarantee transfer efficacy? • Avoid Negative Transfer (“Giveaway”)? • Similarity measure? • Jumpstart in Target • MDP similarity [Ferns, others] • Analysis of learned source task knowledge
Open Questions: 3/3 • Learn an inter-task mapping efficiently? • Sample complexity • Computational complexity • Select Source Task? • In library (sunk cost) • To learn first (total time metric)
MASTER Overview: Modeling Approximate State Transitions by Exploiting Regression • Record observed (ssource, asource, s’source) tuples in the source task • Record a small number of (starget, atarget, s’target) tuples in the target task • Learn a one-step transition model for the target task: M(starget, atarget) → s’target • For every possible action mapping χA and every possible state variable mapping χX: • Transform the recorded source task tuples • Calculate the error of the transformed source task tuples on the target task model: Σ (M(stransformed, atransformed) − s’transformed)² • Return the χA, χX with lowest error