At Phystat 2005, Jerry Friedman embeds learning methods in a general framework, clarifies the relationships between them, and explores new variations. He introduces ensemble learning and questions the importance of interpretability in rule induction.
Deconstructing Jerry (Phystat 2005)
Jerry “The Demystifier” Friedman • Embed methods in a general framework • Clarify the relationships between methods • And better discern what is truly new • Jerry “The Generalizer” Friedman • But having understood, for example, that Boosting and Bagging are “merely” interesting variations on a theme, infinitely many other variations immediately spring to (Jerry’s) mind!
Ensemble Learning • Pick base learners fm(x) from a function class, usually trees, and build, incrementally, an ensemble {fm(x)}… a forest. • Each base learner is selected by minimizing a loss function.
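The incremental build described above can be sketched as a stagewise procedure: at each step, fit a new base learner (here a depth-1 tree, a "stump") that minimizes squared-error loss on the current residual, and add it to the ensemble. This is a minimal illustration, not Friedman's exact algorithm; the shrinkage value and stump learner are assumptions for the sketch.

```python
# Minimal sketch of stagewise ensemble building with tree stumps.
# Squared-error loss and shrinkage factor are illustrative choices.
import numpy as np

def fit_stump(x, r):
    """Choose the split t and leaf values minimizing sum((r - f(x))^2)."""
    best = None
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        cl, cr = left.mean(), right.mean()
        loss = ((left - cl) ** 2).sum() + ((right - cr) ** 2).sum()
        if best is None or loss < best[0]:
            best = (loss, t, cl, cr)
    _, t, cl, cr = best
    return lambda z, t=t, cl=cl, cr=cr: np.where(z <= t, cl, cr)

def build_forest(x, y, n_terms=20, shrink=0.5):
    """Incrementally add base learners f_m, each fit to the current residual."""
    ensemble, pred = [], np.zeros_like(y)
    for _ in range(n_terms):
        f = fit_stump(x, y - pred)   # minimize loss on residual y - F(x)
        ensemble.append(f)
        pred = pred + shrink * f(x)  # F(x) <- F(x) + shrink * f_m(x)
    return ensemble, pred

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = np.sin(x) + 0.1 * rng.standard_normal(200)
forest, pred = build_forest(x, y)
print(np.mean((y - pred) ** 2))  # residual error shrinks as terms are added
```

Each pass fits only the part of y the current forest has not yet explained, which is the sense in which the ensemble is built "incrementally."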
Jerry “The Visigoth” Friedman, champion of (fruitful) environmental vandalism • Reduce the forest to a pile of twigs (terminal nodes) and note that each twig induces a rule, such as: r(ETjet1, ETjet2, …) = I(ETjet1 > 31.42) × I(ETjet2 > 51.2) × …, where I(x) = 1 if x is true, 0 otherwise.
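A twig rule like the one above is just a product of indicator functions along the path to a terminal node. A direct transcription (the cut values are the illustrative ones from the slide):

```python
# A terminal-node "twig" rule as a product of indicator functions.
def I(cond):
    """Indicator: 1 if the condition is true, 0 otherwise."""
    return 1 if cond else 0

def r(ET_jet1, ET_jet2):
    # r = I(ET_jet1 > 31.42) * I(ET_jet2 > 51.2)
    return I(ET_jet1 > 31.42) * I(ET_jet2 > 51.2)

print(r(40.0, 60.0))  # 1: the event passes both cuts
print(r(40.0, 20.0))  # 0: the event fails the second cut
```

The product form means the rule fires (returns 1) only when every cut on the path is satisfied, which is exactly the region of feature space belonging to that terminal node.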
Interpretability? A rule such as r(ETjet1, ETjet2, …) = I(ETjet1 > 31.42) × I(ETjet2 > 51.2) × … is easy to understand, and this is surely a splendid thing. But… is not so brightly lit! Provided the signal and background models are well understood, interpretability is unimportant.
It would be interesting to try some other function classes: for example, f(x; p) = tanh(p0 + p·x), with, for example, a suitable loss function, building up, term by term, the function F(x) = Σm f(x; pm).
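The same stagewise idea works with this alternative function class: fit one tanh ridge function at a time to the current residual, then add it to the running sum. A hedged sketch, assuming squared-error loss and plain gradient descent for each term (the slide does not specify either choice):

```python
# Sketch: build F(x) = sum_m a_m * tanh(p0_m + p_m . x) term by term.
# Squared-error loss and gradient descent are illustrative assumptions.
import numpy as np

def fit_tanh_term(X, r, steps=500, lr=0.1, seed=0):
    """Fit one ridge function a*tanh(p0 + p.x) to the residual r."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    p0, p, a = 0.0, rng.standard_normal(d) * 0.1, 0.1
    for _ in range(steps):
        z = np.tanh(p0 + X @ p)
        g = a * z - r                 # gradient of the loss w.r.t. prediction
        dz = a * (1 - z ** 2)         # chain rule through tanh
        a -= lr * np.mean(g * z)
        p0 -= lr * np.mean(g * dz)
        p -= lr * (X.T @ (g * dz)) / n
    return p0, p, a

def build_F(X, y, n_terms=10):
    """Greedy stagewise fit: each new term targets the current residual."""
    terms, pred = [], np.zeros(len(y))
    for m in range(n_terms):
        p0, p, a = fit_tanh_term(X, y - pred, seed=m)
        terms.append((p0, p, a))
        pred = pred + a * np.tanh(p0 + X @ p)
    return terms, pred

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, (300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
terms, pred = build_F(X, y)
```

Structurally this is a single-hidden-layer network grown one unit at a time, which is one way to see why "infinitely many other variations" of the boosting theme spring to mind once the framework is general.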