Inductive Transfer With Context-sensitive Neural Networks
Danny Silver, Ryan Poirier, & Duane Currie
Acadia University, Wolfville, NS, Canada
danny.silver@acadiau.ca
Outline
• Machine Lifelong Learning (ML3) and Inductive Transfer
• Multiple Task Learning (MTL) and its Limitations
• csMTL – context-sensitive MTL
• Empirical Studies of csMTL
• Conclusions and Future Work
Machine Lifelong Learning (ML3)
• Considers methods of retaining and using learned knowledge to improve the effectiveness and efficiency of future learning [Thrun97]
• We investigate systems that must learn:
  • From impoverished training sets
  • For diverse domains of related/unrelated tasks
  • Where practice of the same task is possible
• Applications: IA, User Modeling, Robotics, DM
Knowledge-Based Inductive Learning: An ML3 Framework
[Diagram: training examples (x, f(x)) from instance space X enter an inductive learning system (short-term memory), producing a model of classifier h with h(x) ~ f(x) on testing examples; domain knowledge (long-term memory) provides retention & consolidation, knowledge transfer, and inductive bias selection.]
Knowledge-Based Inductive Learning: An ML3 Framework
[Same diagram, with the inductive learning system realized as a Multiple Task Learning (MTL) network: inputs x1 … xn, a shared hidden layer, and multiple outputs f1(x), f2(x), …, fk(x).]
Multiple Task Learning (MTL)
• Multiple hypotheses develop in parallel within one back-propagation network [Caruana, Baxter 93-95] (sketched below)
• An inductive bias occurs through shared use of a common internal representation
• Knowledge or inductive transfer to the primary task f1(x) depends on the choice of secondary tasks
[Diagram: inputs x1 … xn feed a common feature layer (common internal representation); task-specific representation produces outputs f1(x), f2(x), …, fk(x).]
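As a concrete sketch of this architecture (an illustration of ours, assuming PyTorch; the class name, layer sizes, and activation are not from the slides), a single shared hidden layer feeds k task-specific output heads:

```python
# A minimal MTL sketch, assuming PyTorch; sizes and names are illustrative.
import torch
import torch.nn as nn

class MTLNet(nn.Module):
    """One shared hidden layer (common internal representation) feeding
    k task-specific output heads f1(x) .. fk(x)."""
    def __init__(self, n_inputs, n_hidden, n_tasks):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.Sigmoid())
        # One output weight vector per task; the inductive bias arises because
        # every task back-propagates error through the common hidden layer.
        self.heads = nn.ModuleList([nn.Linear(n_hidden, 1) for _ in range(n_tasks)])

    def forward(self, x):
        h = self.shared(x)                                  # common features
        return torch.cat([head(h) for head in self.heads], dim=1)
```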
Lifelong Learning with MTL
[Charts: mean percent misclassification for the Band domain, Logic domain, and Coronary Artery Disease domain (series A–D).]
Limitations of MTL for ML3
• Problems with multiple outputs:
  • Training examples must have matching target values (illustrated below)
  • Redundant representation
  • Frustrates practice of a task
  • Prevents a fluid development of domain knowledge
  • No way to naturally associate examples with tasks
• Inductive transfer is limited to the sharing of hidden node weights
• Inductive transfer relies on selecting related secondary tasks
[Diagram: the MTL network of [Caruana, Baxter] – inputs x1 … xn, common feature layer, task-specific representation, outputs f1(x) … fk(x).]
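The matching-target-value problem can be made concrete: an MTL batch needs a target for every output, so single-task examples must be padded and masked out of the loss (a hedged sketch assuming PyTorch; the NaN-padding convention is ours, not the authors'):

```python
# Sketch of the matching-target problem: each MTL example needs a target for
# every task output, so single-task examples are padded with NaN and masked.
import torch

targets = torch.tensor([[1.0, float('nan')],   # labeled only for task 1
                        [float('nan'), 0.0]])  # labeled only for task 2
outputs = torch.tensor([[0.8, 0.3],
                        [0.6, 0.2]])           # network outputs f1(x), f2(x)
mask = ~torch.isnan(targets)                   # which targets actually exist
sq_err = (outputs - torch.nan_to_num(targets)) ** 2
loss = (sq_err * mask).sum() / mask.sum()      # mean error over real targets only
```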
Context-sensitive MTL (csMTL)
• Recently developed an alternative approach that is meant to overcome these limitations:
  • Uses a single-output neural network structure
  • Context inputs associate an example with a task
  • All weights are shared - focus shifts from learning separate tasks to learning a domain of tasks
  • No measure of task relatedness is required
[Diagram: context inputs c1 … ck and primary inputs x1 … xn feed one shared network with a single output for all tasks, y' = f'(c, x); a code sketch follows.]
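A matching sketch of the csMTL architecture (again assuming PyTorch; names and sizes are illustrative): the one-hot context c is simply concatenated with the primary inputs x, and every weight is shared across tasks:

```python
# A minimal csMTL sketch, assuming PyTorch; all weights are shared and a
# single output y' = f'(c, x) serves every task in the domain.
import torch
import torch.nn as nn

class CsMTLNet(nn.Module):
    def __init__(self, n_context, n_primary, n_hidden):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_context + n_primary, n_hidden),  # shared by all tasks
            nn.Sigmoid(),
            nn.Linear(n_hidden, 1),                      # one output for all tasks
        )

    def forward(self, c, x):
        # c is a one-hot task context; examples from different tasks can be
        # mixed freely in one training set - no matching targets are needed.
        return self.net(torch.cat([c, x], dim=1))
```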
Context-sensitive MTL (csMTL)
• We have recently shown that csMTL imposes two important constraints, relating:
  • Context weights and bias weights (illustrated below)
  • Context weights and output weights
• As a consequence, VC(csMTL) < VC(MTL)
[Diagram: the csMTL network – context inputs c1 … ck, primary inputs x1 … xn, one output y' = f'(c, x) for all tasks.]
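One way to see the first constraint (our illustration for one-hot contexts; the paper's formal statement may differ): the context weight for the active task acts only as a task-specific offset to each hidden node's shared bias:

```latex
% For task k (one-hot context with c_k = 1), hidden node j computes
\mathrm{net}_j \;=\; \sum_{i=1}^{n} w_{ji}\,x_i \;+\; v_{jk} \;+\; b_j ,
% so the context weight v_{jk} can only shift the shared bias b_j; every
% other weight is common to all tasks, restricting the hypothesis space.
```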
csMTL Empirical Studies – Task Domains
• Band
  • 7 tasks (T0–T6), 2 primary inputs
  • [Diagram: seven bands of positive examples in the input plane]
• Logic (the task T0 is coded below)
  • T0 = (x1 > 0.5 ∧ x2 > 0.5) ∨ (x3 > 0.5 ∧ x4 > 0.5)
  • 6 tasks, 10 primary inputs
• fMRI
  • 2 tasks, 24 primary inputs
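For concreteness, T0 can be coded directly. Note that the logical operators were lost in extraction and are reconstructed here as a disjunction of conjunctions, so treat the exact formula as an assumption; the thresholds are from the slide:

```python
# Illustrative encoding of the Logic-domain task T0; the boolean operators
# are reconstructed (an assumption), the 0.5 thresholds are from the slide.
def t0(x):
    """T0 = (x1 > 0.5 and x2 > 0.5) or (x3 > 0.5 and x4 > 0.5)."""
    return (x[0] > 0.5 and x[1] > 0.5) or (x[2] > 0.5 and x[3] > 0.5)

print(t0([0.9, 0.8, 0.1, 0.2]))  # True: the first conjunct holds
print(t0([0.1, 0.8, 0.1, 0.9]))  # False: neither conjunct holds
```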
Why is csMTL doing so well?
• Consider two unrelated tasks:
  • From a task-relatedness perspective, correlation or mutual information over all examples is 0
  • From an example-by-example perspective, 50% of examples have matching target values (verified numerically below)
• csMTL transfers knowledge at the example level
• Greater sharing of representation
[Diagram: the csMTL network – context inputs c1 … ck, primary inputs x1 … xn, one output y'.]
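This example-level view is easy to check numerically (a small simulation of ours, not from the slides): two independent random Boolean tasks have essentially zero correlation, yet agree on about half of all examples:

```python
# Two unrelated Boolean tasks: correlation ~0, yet ~50% of examples share the
# same target value - the example-level overlap that csMTL can exploit.
import random

random.seed(0)
f1 = [random.randint(0, 1) for _ in range(100_000)]
f2 = [random.randint(0, 1) for _ in range(100_000)]
match = sum(a == b for a, b in zip(f1, f2)) / len(f1)
print(f"fraction of matching targets: {match:.3f}")  # ~0.500
```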
csMTL Results – Same Task
• Learn the primary task with transfer from 5 secondary tasks
• 20 training examples per task, all examples drawn from the same function
[Diagrams: the MTL network with inputs x1 … x10 and outputs f1(x) … f5(x), beside the csMTL network with context inputs c1 … c5, primary inputs x1 … x10, and single output f'(c, x); the data set-up is sketched below.]
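The csMTL side of this set-up is straightforward to assemble (a sketch under our assumptions: one one-hot context per task and the primary task counted alongside the 5 secondary tasks; the stacking convention is ours):

```python
# Illustrative construction of a csMTL training set: one one-hot context per
# task, 20 examples per task, 10 primary inputs (counts from the slide).
import torch

n_tasks, n_per_task, n_primary = 6, 20, 10   # primary task + 5 secondary tasks
x = torch.rand(n_tasks * n_per_task, n_primary)
c = torch.eye(n_tasks).repeat_interleave(n_per_task, dim=0)  # one-hot contexts
# All rows can be shuffled into a single training set for the csMTL network;
# no example ever needs a target value for more than its own task.
```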
Measure of Task Relatedness?
• Early conjecture: context-to-hidden-node weight vectors can be used to measure task relatedness
• Not true: two hypotheses for the same examples can develop that (demonstrated below)
  • have equivalent function
  • use different representation
• Transfer is functional in nature
[Diagram: the csMTL network – context inputs c1 … ck, primary inputs x1 … xn, one output y' for all tasks.]
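The "equivalent function, different representation" point can be shown directly (a standalone sketch of ours): permuting the hidden units of a one-hidden-layer network, together with the matching output-weight columns, leaves the computed function unchanged while scrambling every context-to-hidden weight vector:

```python
# Permuting hidden units (and the matching output-weight columns) preserves
# the function but changes the representation - so context-to-hidden weight
# vectors cannot serve as a reliable measure of task relatedness.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(16, 5), nn.Sigmoid(), nn.Linear(5, 1))
net2 = nn.Sequential(nn.Linear(16, 5), nn.Sigmoid(), nn.Linear(5, 1))
perm = torch.randperm(5)
with torch.no_grad():
    net2[0].weight.copy_(net[0].weight[perm])     # permute hidden-unit rows
    net2[0].bias.copy_(net[0].bias[perm])
    net2[2].weight.copy_(net[2].weight[:, perm])  # permute matching columns
    net2[2].bias.copy_(net[2].bias)
z = torch.rand(4, 16)  # e.g., 6 context + 10 primary inputs, concatenated
assert torch.allclose(net(z), net2(z), atol=1e-6)  # same function, new weights
```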
Conclusions
• csMTL is a method of inductive transfer using multiple tasks:
  • Single task output, additional context inputs
  • Shifts focus to learning a continuous domain of tasks
  • Eliminates redundant task representation (multiple outputs)
• Empirical studies:
  • csMTL performs inductive transfer at or above the level of MTL
  • Without a measure of relatedness
• A machine lifelong learning (ML3) system based on two csMTL networks is also proposed in the paper
Future Work
• Relationship between the theory of Hints [Abu-Mostafa] and secondary tasks (inductive bias, VC dimension)
• Conditions under which csMTL ANNs succeed / fail
• Exploring domains with real-valued context inputs
• Will csMTL work with other ML methods?
• Develop and test a csMTL ML3 system
An ML3 Based on csMTL
[Diagram: a short-term learning network f1(c, x), fed by task context and standard inputs, receives representational transfer from consolidated domain knowledge (CDK) for rapid learning; functional transfer (virtual examples) supports slow consolidation into the long-term consolidated domain knowledge network f'(c, x), which has inputs c1 … ck, x1 … xn and one output for all tasks. The virtual-example step is sketched below.]
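The functional-transfer arrow in this diagram can be sketched as virtual-example generation (our illustration, assuming the PyTorch-style networks above; the function name and probe distribution are hypothetical): random inputs are labeled by the short-term network, and the resulting virtual examples are replayed to slowly train the consolidated network:

```python
# Hedged sketch of functional transfer via virtual examples: random inputs
# are labeled by the short-term network f1(c, x), and the resulting virtual
# examples are replayed to slowly consolidate the long-term CDK network.
import torch

def make_virtual_examples(short_term_net, context, n_examples, n_primary):
    """Label random primary inputs with the short-term network's outputs.
    `context` is a (1, k) one-hot task vector; probes are uniform in [0, 1)."""
    x = torch.rand(n_examples, n_primary)          # random probe inputs
    c = context.expand(n_examples, -1)             # fixed one-hot task context
    with torch.no_grad():
        y = short_term_net(c, x)                   # virtual targets f1(c, x)
    return c, x, y
```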
Thank You!
• danny.silver@acadiau.ca
• http://plato.acadiau.ca/courses/comp/dsilver/
• http://birdcage.acadiau.ca:8080/ml3/
Inductive Bias and Knowledge Transfer
• Human learners use inductive bias
• Inductive bias depends upon:
  • Knowledge of the task domain
  • Selection of the most related tasks
[Illustration: a street map (Ash St, Elm St, Pine St, Oak St crossing First, Second, Third) as an everyday example of transferring domain knowledge.]
Requirements for an ML3 System: Req. for Long-term Retention …
• Effective Retention
  • Resist the introduction and accumulation of error
  • Retention of new task knowledge should improve related prior task knowledge (practice should improve performance)
• Efficient Retention
  • Minimize redundant use of memory via consolidation
• Meta-knowledge Collection
  • e.g., example distribution over the input space
• Ensures Effective and Efficient Indexing
  • Selection of related prior knowledge for inductive bias should be accurate and rapid
Requirements for an ML3 System: Req. for Short-term Learning …
• Effective (transfer) Learning
  • New learning should benefit from related prior task knowledge
  • ML3 hypotheses should meet or exceed the accuracy of hypotheses developed without the benefit of transfer
• Efficient (transfer) Learning
  • Transfer should reduce training time
  • Any increase in space complexity should be minimized
• Transfer versus Training Examples
  • Must weigh the relevance and accuracy of prior knowledge against the number and accuracy of available training examples
MTL – A Recent Example
• Stream flow rate prediction [Lisa Gaudette, 2006]
• x = weather data, f(x) = flow rate
Benefits of csMTL ML3
• Long-term Consolidation …
  • Effective retention (all tasks in the DK net improve)
  • Efficient retention (redundancy eliminated)
  • Meta-knowledge collection (context cues)
• Short-term Learning …
  • Effective learning (inductive transfer)
  • Efficient learning (representation + function)
  • Transfer / training examples used appropriately