NIPS 2002 workshops
Negative Results and Open Problems
Isabelle Guyon, Clopinet
Why?
• Negative results are seldom reported. A wealth of knowledge is lost.
• Negative results are sometimes more informative than positive results.
• Negative results sometimes point to important open problems.
History
• We had better pay attention to negative results… neural networks have been “killed” several times by negative results.
• Timeline: Bain (1873), James (1890), Ramon y Cajal (1906), Rashevsky (1938), McCulloch & Pitts (1943), Hebb (1949), Farley & Clark (1954), Rochester (1956), Widrow & Hoff (1959), Rosenblatt (1962), Minsky & Papert (1969), Kohonen/Anderson (1972), Hopfield (1982), Backprop (1986), SVMs (1992).
Negative prejudices 1) Against multilayer networks
• 1969 - Minsky & Papert: XOR problem.
• 1983 - Kirkpatrick et al.: simulated annealing.
• 1985 - Hinton & Sejnowski: Boltzmann machine.
• 1986 - Backprop.
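The XOR objection applies only to single-layer perceptrons; one hidden layer removes it. A minimal numpy sketch of this point (the network size, learning rate, and iteration count are illustrative choices, not from the talk):

```python
import numpy as np

# XOR is not linearly separable, so a single-layer perceptron cannot learn it;
# a network with one hidden layer trained by plain backprop can.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # 2-4-1 network
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

lr = 1.0  # illustrative learning rate
for _ in range(10000):
    h = sigmoid(X @ W1 + b1)              # hidden activations
    out = sigmoid(h @ W2 + b2)            # network output
    d_out = (out - y) * out * (1 - out)   # backprop of squared error
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

print(out.round(2).ravel())  # approaches [0, 1, 1, 0] for most initializations
```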
Negative prejudices 2) Against polynomials
• 1975 - T. Poggio: kernel trick for polynomials.
• 1977 - J. Schürmann: feature selection.
• 1984 - T. Kohonen: popularizes Poggio’s results. Polynomials lost popularity when backprop arrived in 1986 and became an example of an “overfitting model”.
• 1992 - SVMs: renewed interest.
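The polynomial kernel trick in one line: the kernel computes the inner product of an explicit monomial expansion without ever building it. A small numeric check of the identity for degree 2 in two dimensions (the feature map phi is written out by hand purely for illustration):

```python
import numpy as np

def poly2_kernel(x, y):
    """Degree-2 polynomial kernel (x.y + 1)^2, computed without expansion."""
    return (x @ y + 1.0) ** 2

def phi(x):
    """Explicit feature map for the same kernel in 2-D: 6 monomial features."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([0.5, -1.0])
y = np.array([2.0, 3.0])
# The kernel equals the inner product in the expanded monomial space:
print(poly2_kernel(x, y), phi(x) @ phi(y))  # both print the same value
```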
Negative prejudices 3) Against kernel methods
• 1964 - Aizerman et al.: potential functions.
• 1967 - Cover & Hart: nearest neighbors.
• 1982 - Hopfield nets, distributed associative memories: a parody of the “grandmother cell” methods.
• 1992 - SVMs: renewed interest.
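One way to see the potential-function idea concretely: run a perceptron in kernel space, where each misclassified point adds its “potential” K(·, x_i) to the decision function. A toy sketch (the dataset, kernel width, and epoch count are made up for illustration):

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    """Gaussian 'potential function' K(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def kernel_perceptron(X, y, epochs=20, gamma=1.0):
    """Potential-function learning: misclassified points get their
    potential added to the decision function (a perceptron in kernel space)."""
    alpha = np.zeros(len(X))
    for _ in range(epochs):
        for i in range(len(X)):
            f = sum(alpha[j] * y[j] * rbf(X[j], X[i], gamma) for j in range(len(X)))
            if y[i] * f <= 0:
                alpha[i] += 1.0
    return alpha

# XOR-like 2-class toy set (labels in {-1, +1}), not linearly separable.
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
y = np.array([-1., -1., 1., 1.])
alpha = kernel_perceptron(X, y)
pred = [np.sign(sum(alpha[j] * y[j] * rbf(X[j], x) for j in range(len(X)))) for x in X]
print(pred)  # [-1, -1, 1, 1] once the potentials separate the classes
```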
Negative prejudices 4) Against biased estimators
• 1922 - Fisher: promotes the use of unbiased estimators in statistics.
• 1971 - Vapnik-Chervonenkis theory: biased estimators can be “good”: they may generalize better.
(Example courtesy of V. Vapnik.)
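Vapnik’s point can be reproduced in a few lines: with few noisy samples, a deliberately biased shrinkage estimator (ridge regression here) typically has lower parameter error than the unbiased least-squares estimate. A simulation sketch; all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, sigma, lam, trials = 20, 25, 2.0, 10.0, 200
w_true = rng.normal(size=d)

err_ols, err_ridge = 0.0, 0.0
for _ in range(trials):
    X = rng.normal(size=(n, d))
    y = X @ w_true + sigma * rng.normal(size=n)
    # Unbiased least-squares estimate vs. biased (shrunk) ridge estimate.
    w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    err_ols += np.sum((w_ols - w_true) ** 2) / trials
    err_ridge += np.sum((w_ridge - w_true) ** 2) / trials

print(f"OLS error: {err_ols:.2f}  ridge error: {err_ridge:.2f}")
# The biased estimator typically wins when n is small relative to d.
```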
Other prejudices
• Against greedy search.
• For multivariate feature selection.
• For small VC dimension.
• For sparse solutions.
• For introducing domain knowledge.
• Your negative result here.
Some of my negative results
• SVM clustering does not work in high dimensions.
• Feature selection with correlation methods often works better than multivariate methods.
• Dumb linear, poly2, and Gaussian kernels are hard to beat.
• It is hard to say which of MSE or SVM training is best for kernel classifiers.
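As a sketch of the correlation-based filter mentioned above: rank features by absolute Pearson correlation with the target and keep the top ones. The helper name and the toy data are hypothetical, not from the talk:

```python
import numpy as np

def correlation_ranking(X, y):
    """Rank features by |Pearson correlation| with the target (univariate filter)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()) + 1e-12)
    return np.argsort(-np.abs(corr))  # best features first

# Toy check: only features 0 and 1 carry signal, the rest are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * rng.normal(size=100)
print(correlation_ranking(X, y)[:2])  # typically picks features 0 and 1
```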