This research paper proposes a simpler and more efficient method, called ExtenKBKR, for incorporating advice into kernel methods. The method uses extensional advice (sampled examples) instead of intensional advice (rules compiled into the optimization), improving efficiency while maintaining accuracy. The paper supports this with experiments on artificial data and in the RoboCup domain.
A Simple and Effective Method for Incorporating Advice into Kernel Methods
Richard Maclin (University of Minnesota, Duluth)
Jude Shavlik, Trevor Walker, Lisa Torrey (University of Wisconsin, Madison)
The Setting: Knowledge-Based Classification/Regression
Given
• Examples of a classification/regression task
• Advice from an expert about the task
Do
• Learn an accurate model
Advice
Informally: IF the goal center is close and the goalie isn't covering it THEN Shoot!
Formally: IF distGoalCenter ≤ 15 and angleGoalieGCenter ≥ 25 THEN Qshoot(x) ≥ 0.9
Knowledge-Based Support Vector Methods [Fung et al., 2002, 2003 (KBSVM); Mangasarian et al., 2005 (KBKR)]
min (size of model) + C|s| + (penalties for not following advice)
such that f(x) = y ± s (slack terms s)
plus constraints that represent the advice, with slack so the advice can be refined
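The base optimization above can be sketched as a linear program. Below is a minimal, illustrative sketch (my own, not the paper's code): a plain linear model with L1 regularization as the "size of model" term and slack variables s for fit errors. The advice penalties of KBSVM/KBKR would add analogous slack terms and constraints, which are omitted here.

```python
import numpy as np
from scipy.optimize import linprog

# Toy regression data: y = 2x, so the ideal weight is w = 2.
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 4.0, 6.0])
n = len(xs)
C = 10.0  # penalty on the fit slacks |s|

# Variables (all >= 0): [w+, w-, b+, b-, s_1..s_n], with w = w+ - w-, b = b+ - b-.
# Objective: |w| + C * sum(s), i.e. "size of model + C|s|" from the slide.
# (The tiny 1e-6 on b+/b- just avoids a degenerate unbounded ray in b+ = b-.)
c = np.concatenate([[1.0, 1.0, 1e-6, 1e-6], C * np.ones(n)])

# Constraints encoding f(x_i) = y_i +/- s_i, i.e. |w*x_i + b - y_i| <= s_i:
#   w*x_i + b - s_i <= y_i   and   -(w*x_i + b) - s_i <= -y_i
A, b_ub = [], []
for i in range(n):
    s_row = np.zeros(n)
    s_row[i] = -1.0
    A.append(np.concatenate([[xs[i], -xs[i], 1.0, -1.0], s_row]))
    b_ub.append(ys[i])
    A.append(np.concatenate([[-xs[i], xs[i], -1.0, 1.0], s_row]))
    b_ub.append(-ys[i])

res = linprog(c, A_ub=np.array(A), b_ub=np.array(b_ub))  # bounds default to >= 0
w = res.x[0] - res.x[1]
b = res.x[2] - res.x[3]
print(w, b)  # w close to 2, b close to 0
```

With C large enough, fitting the data exactly (w = 2, s = 0) is cheaper than absorbing the error into the slacks, which is the same trade-off the knowledge-based formulations make between following the data and following the advice.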
Our Motivation
• KBKR adds many terms to the optimization problem
• We want a method that is as accurate but more efficient
• It should scale to a large number of rules
• KBKR alters advice in hard-to-interpret ways (rotation and translation of the advice region)
• So we focus on a simpler method
Our Contribution: ExtenKBKR
• A method for incorporating advice that is more efficient than KBKR
• Advice is defined extensionally (by sampled examples) rather than intensionally (as in KBKR)
Knowledge-Based SVM: the optimization also penalizes rotation and translation of the advice region
Our Extensional KBSVM. Note: a point from one class may be pseudo-labeled with the other class
Incorporating Advice in KBKR
Advice format: Bx ≤ d ⇒ f(x) ≥ β
Example: IF distGoalCenter ≤ 15 and angleGoalieGCenter ≥ 25 THEN Qshoot(x) ≥ 0.9
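The IF-part of such a rule is a conjunction of linear conditions, which fits the Bx ≤ d form once "≥" conditions are negated. A small sketch (the feature ordering [distGoalCenter, angleGoalieGCenter] and the helper name are my assumptions for illustration):

```python
import numpy as np

# Rule body: distGoalCenter <= 15  AND  angleGoalieGCenter >= 25
# The ">=" condition is flipped to fit Bx <= d:  -angle <= -25.
B = np.array([[1.0,  0.0],   # distGoalCenter <= 15
              [0.0, -1.0]])  # -angleGoalieGCenter <= -25
d = np.array([15.0, -25.0])

def in_advice_region(x):
    """True iff x satisfies every row of Bx <= d."""
    return bool(np.all(B @ x <= d))

print(in_advice_region(np.array([10.0, 40.0])))  # True: dist = 10 <= 15, angle = 40 >= 25
print(in_advice_region(np.array([20.0, 40.0])))  # False: dist = 20 violates the first row
```

KBKR turns this implication into constraints of the linear program directly (the intensional route); ExtenKBKR instead uses such a membership test to generate example points, as described next.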
Choosing Examples "Under" Advice
• Training data: add a second label
  - more weight if labeled the same
  - less if labeled differently
• Unlabeled data: a semi-supervised method
• Generated data: but it can be complex to generate meaningful data
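The generated-data idea can be sketched as follows: sample points inside the advice region, pseudo-label them with the advised value, and fit with those points weighted relative to the real examples. All numbers, feature names, and the weight mu below are illustrative assumptions of mine, not the paper's settings, and a simple weighted least-squares fit stands in for the paper's LP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Real training data: features [distGoalCenter, angleGoalieGCenter] -> Q-value.
# None of these points fall inside the advice region.
X = np.array([[30.0, 10.0], [25.0, 5.0], [20.0, 15.0], [35.0, 20.0]])
y = np.array([0.1, 0.2, 0.2, 0.0])

# Advice: IF dist <= 15 and angle >= 25 THEN Q(x) >= 0.9.
# Extensional version: sample M points in the region, pseudo-label them 0.9.
M, mu = 20, 1.0  # mu = weight on pseudo-labeled points (illustrative)
X_adv = np.column_stack([rng.uniform(0, 15, M), rng.uniform(25, 90, M)])
y_adv = np.full(M, 0.9)

def fit_weighted(X, y, w):
    """Weighted least squares for a linear model [w1, w2, bias]."""
    A = np.column_stack([X, np.ones(len(X))])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw.ravel(), rcond=None)
    return coef

def predict(coef, x):
    return coef[0] * x[0] + coef[1] * x[1] + coef[2]

coef_plain = fit_weighted(X, y, np.ones(len(X)))
coef_advice = fit_weighted(np.vstack([X, X_adv]),
                           np.concatenate([y, y_adv]),
                           np.concatenate([np.ones(len(X)), mu * np.ones(M)]))

x_test = np.array([10.0, 40.0])  # inside the advice region
print(predict(coef_plain, x_test), predict(coef_advice, x_test))
```

Because the advice enters only as extra weighted examples, the optimization itself stays a standard (weighted) learning problem, which is the source of ExtenKBKR's efficiency.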
Size of the Linear Program
E = number of examples
Mk = number of examples per advice item (expect Mk << E)
Artificial Data: Methodology
• 10 input variables
• Two target functions:
  f1 = 20·x1·x2·x3·x4 − 1.25
  f2 = 5·x5 − 5·x2 + 3·x6 − 2·x4 − 0.5
• Selected C and the advice-penalty parameters with a tuning set
• Considered adding 0 or 5 pseudo points
• Used a Gaussian kernel
Artificial Data: Advice
IF x1 ≥ .7 and x2 ≥ .7 and x3 ≥ .7 and x4 ≥ .7 THEN f1(x) ≥ 4
IF x5 ≥ .7 and x2 ≤ .3 and x6 ≥ .7 and x4 ≤ .3 THEN f2(x) ≥ 5
IF x5 ≥ .6 and x6 ≥ .6 THEN PREFER f2(x) TO f1(x) BY .1
IF x5 ≤ .3 and x6 ≤ .3 THEN PREFER f1(x) TO f2(x) BY .1
IF x2 ≥ .7 and x4 ≥ .7 THEN PREFER f1(x) TO f2(x) BY .1
IF x2 ≤ .3 and x4 ≤ .3 THEN PREFER f2(x) TO f1(x) BY .1
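The two target functions are easy to code up as a sanity check (this check is mine, not from the paper). Notably, the first rule is only approximately true: at the region's corner x1 = x2 = x3 = x4 = 0.7, f1 falls slightly below the advised threshold of 4, which illustrates why the formulations above keep slack on advice rather than enforcing it exactly.

```python
# Target functions from the slide (inputs x1..x10 in [0, 1]; only some are used).
def f1(x):
    return 20 * x[0] * x[1] * x[2] * x[3] - 1.25

def f2(x):
    return 5 * x[4] - 5 * x[1] + 3 * x[5] - 2 * x[3] - 0.5

# First advice rule: IF x1..x4 >= .7 THEN f1(x) >= 4.
# At the region's corner x1 = x2 = x3 = x4 = 0.7:
x_corner = [0.7, 0.7, 0.7, 0.7, 0.0, 0.0]
print(f1(x_corner))  # 20 * 0.7**4 - 1.25 = 3.552, slightly below 4:
                     # the advice is approximate and needs slack

# Deeper inside the region the rule does hold:
x_inner = [0.8, 0.8, 0.8, 0.8, 0.0, 0.0]
print(f1(x_inner))   # 20 * 0.8**4 - 1.25 = 6.942 >= 4
```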
RoboCup: Methodology
• Tested on 2-on-1 BreakAway
• 13 tiled features
• Averaged over 10 runs
• Selected C and the advice-penalty parameters with a tuning set
• Used a linear model (the tiled features provide non-linearity)
RoboCup Performance ExtenKBKR twice as fast as KBKR in CPU cycles
Related Work
• Knowledge-based kernel methods
  - Fung et al., NIPS 2002; COLT 2003
  - Mangasarian et al., JMLR 2005
  - Maclin et al., AAAI 2005
  - Le et al., ICML 2006
  - Mangasarian & Wild, IEEE Trans. Neural Networks 2006
• Other methods using prior knowledge
  - Schoelkopf et al., NIPS 1998
  - Epshteyn & DeJong, ECML 2005
  - Sun & DeJong, ICML 2005
• Semi-supervised SVMs
  - Wu & Srihari, KDD 2004
  - Franz et al., DAGM 2004
Future Work
• Label "near" examples to allow advice to expand
• Analyze predictions for pseudo-labeled examples to determine how the advice was refined
• Test on semi-supervised learning tasks
Conclusions ExtenKBKR • Key idea: sample advice (extensional definition) and train using standard methods • Empirically as accurate as KBKR • Empirically more efficient than KBKR • Easily adapted to other forms of advice
Acknowledgements • US Naval Research Laboratory grant N00173-06-1-G002 (to RM) • DARPA grant HR0011-04-1-0007 (to JS)