A Simple and Effective Method for Incorporating Advice into Kernel Methods
Richard Maclin, University of Minnesota - Duluth
Jude Shavlik, Trevor Walker, Lisa Torrey, University of Wisconsin - Madison
The Setting
Given
• Examples of a classification/regression task
• Advice from an expert about the task
Do
• Learn an accurate model
Knowledge-Based Classification/Regression
Advice
IF goal center is close and goalie isn't covering it THEN Shoot!
As a formal rule:
IF distGoalCenter ≤ 15 and angleGoalieGCenter ≥ 25
THEN Qshoot(x) ≥ 0.9
Knowledge-Based Support Vector Methods
[Fung et al., 2002, 2003 (KBSVM); Mangasarian et al., 2005 (KBKR)]
min   (size of model) + C |s| + (penalty for not following advice)
such that   f(x) = y ± s
            + constraints that represent advice, with slack terms
            (hence advice can be refined)
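In slightly more formal terms, the linear program has roughly this shape. This is a schematic sketch only: the advice-slack weight μ is a name assumed here, and the actual KBKR advice constraints are derived via a theorem of the alternative rather than written directly.

\begin{align*}
\min_{f,\,s,\,z}\quad & \|f\| \;+\; C\,\|s\|_1 \;+\; \mu\,\|z\|_1 \\
\text{such that}\quad & -s_i \;\le\; f(x_i) - y_i \;\le\; s_i, \qquad i = 1,\dots,E \\
& \text{advice constraints, relaxed by slacks } z .
\end{align*}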
Our Motivation
• KBKR adds many terms to the optimization problem
• Want an equally accurate but more efficient method
• Scale to a large number of rules
• KBKR alters advice in ways that are somewhat hard to understand (rotation and translation)
• Focus on a simpler method
Our Contribution – ExtenKBKR
• Method for incorporating advice that is more efficient than KBKR
• Advice defined extensionally rather than intensionally (as in KBKR)
Knowledge-Based SVM
[Figure: KBSVM advice region and separating surface]
Also a penalty for rotating and translating the advice region
Our Extensional KBSVM
[Figure: advice region populated with sampled, pseudo-labeled points]
Note: a point from one class may be pseudo-labeled with the other class
Incorporating Advice in KBKR
Advice format:   Bx ≤ d  ⟹  f(x) ≥ β
Example:
IF distGoalCenter ≤ 15 and angleGoalieGCenter ≥ 25
THEN Qshoot(x) ≥ 0.9
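A minimal Python sketch of this advice format may help. The Advice class, the feature ordering, and the rewriting of ≥ conditions as ≤ rows are illustrative assumptions, not the paper's code:

import numpy as np

class Advice:
    """One piece of advice: IF B @ x <= d THEN f(x) >= beta.
    Illustrative only; not the paper's implementation."""
    def __init__(self, B, d, beta):
        self.B = np.asarray(B, dtype=float)   # one row per condition
        self.d = np.asarray(d, dtype=float)
        self.beta = float(beta)

    def applies(self, x):
        """True if x lies inside the advice region Bx <= d."""
        return bool(np.all(self.B @ x <= self.d))

# The shoot rule, with assumed feature order
# x = [distGoalCenter, angleGoalieGCenter]; the condition
# angleGoalieGCenter >= 25 becomes -angleGoalieGCenter <= -25.
shoot = Advice(B=[[1, 0], [0, -1]], d=[15, -25], beta=0.9)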
Choosing Examples "Under" Advice
• Training data – advice adds a second label (see the sketch below)
  • more weight if labeled the same
  • less if labeled differently
• Unlabeled data – a semi-supervised method
• Generated data – but it can be complex to generate meaningful data
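The extensional step itself is small. The sketch below, which assumes the hypothetical Advice class above, collects the points that fall under an advice item and pairs them with the advised output β:

def pseudo_examples(advice, X, mu=1.0):
    """Extensional reading of one advice item: every available point x
    inside the region Bx <= d is paired with the advised target beta.
    A minimal sketch; the paper adds such points to the LP as soft
    f(x) >= beta constraints with their own penalty weight."""
    inside = [x for x in X if advice.applies(x)]
    return [(x, advice.beta, mu) for x in inside]

For an already-labeled training point this yields the "second label" above: when the true label agrees with the advice, the point is effectively up-weighted; when they conflict, the two labels pull against each other.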
Size of Linear Program
E – number of examples
Mk – number of examples per advice item (expect Mk << E)
Artificial Data: Methodology
• 10 input variables
• Two functions (generated as sketched below):
  f1 = 20 x1 x2 x3 x4 – 1.25
  f2 = 5 x5 – 5 x2 + 3 x6 – 2 x4 – 0.5
• Selected C, μ1, μ2 with a tuning set
• Considered adding 0 or 5 pseudo points
• Used a Gaussian kernel
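For concreteness, the two target functions can be generated as follows. Drawing the 10 inputs uniformly from [0, 1] and using 500 examples are assumptions, since the slide states neither:

import numpy as np

rng = np.random.default_rng(0)

def make_data(n=500):
    """The two artificial target functions from this slide over 10
    inputs. Uniform [0, 1] inputs and n = 500 are assumed, not given."""
    X = rng.uniform(0.0, 1.0, size=(n, 10))
    x1, x2, x3, x4, x5, x6 = (X[:, i] for i in range(6))
    f1 = 20 * x1 * x2 * x3 * x4 - 1.25
    f2 = 5 * x5 - 5 * x2 + 3 * x6 - 2 * x4 - 0.5
    return X, f1, f2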
Artificial Data: Advice IF x1 ≥ .7 x2 ≥ .7 x3 ≥ .7 x4 ≥ .7 THEN f1(x) ≥ 4 IF x5 ≥ .7 x2 ≤ .3 x6 ≥ .7 x4 ≤ .3 THEN f2(x) ≥ 5 IF x5 ≥ .6 x6 ≥ .6 THEN PREFER f2(x) TO f1(x) BY .1 IF x5 ≤ .3 x6 ≤ .3 THEN PREFER f1(x) TO f2(x) BY .1 IF x2 ≥ .7 x4 ≥ .7 THEN PREFER f1(x) TO f2 (x) BY .1 IF x2 ≤ .3 x4 ≤ .3 THEN PREFER f2(x) TO f1(x) BY .1
RoboCup: Methodology
• Test on 2-on-1 BreakAway
• 13 tiled features
• Average over 10 runs
• Selected C, μ1, μ2 with a tuning set
• Use a linear model (the tiled features supply the non-linearity)
RoboCup Performance
ExtenKBKR is twice as fast as KBKR in CPU cycles
Related Work
• Knowledge-Based Kernel Methods
  • Fung et al., NIPS 2002; COLT 2003
  • Mangasarian et al., JMLR 2005
  • Maclin et al., AAAI 2005
  • Le et al., ICML 2006
  • Mangasarian and Wild, IEEE Trans. Neural Networks 2006
• Other Methods Using Prior Knowledge
  • Schoelkopf et al., NIPS 1998
  • Epshteyn & DeJong, ECML 2005
  • Sun & DeJong, ICML 2005
• Semi-supervised SVMs
  • Wu & Srihari, KDD 2004
  • Franz et al., DAGM 2004
Future Work
• Label "near" examples to allow advice to expand
• Analyze predictions for pseudo-labeled examples to determine how advice was refined
• Test on semi-supervised learning tasks
Conclusions
ExtenKBKR
• Key idea: sample advice (an extensional definition) and train using standard methods
• Empirically as accurate as KBKR
• Empirically more efficient than KBKR
• Easily adapted to other forms of advice
Acknowledgements
• US Naval Research Laboratory grant N00173-06-1-G002 (to RM)
• DARPA grant HR0011-04-1-0007 (to JS)