9.520 Networks for Learning: Regression and Classification
Tomaso Poggio + Alessandro Verri
Multidisciplinary Approach to the Learning Problem
LEARNING THEORY AND ALGORITHMS
Classification + Regression:
• Information extraction (text classification, …)
• Computer vision (object recognition)
• Computer graphics (TTVS)
• Sound classification
• Bioinformatics (DNA arrays)
• Artificial financial markets (society of learning agents)
ENGINEERING APPLICATIONS, PLAUSIBILITY PROOFS
NEUROSCIENCE: MODELS AND EXPERIMENTS
Learning: Brains and Machines
CBCL: about 20 people...
Overview of overview
o Supervised learning: the problem and how to frame it within classical math
o Examples of in-house applications
o Learning and the brain
Learning from Examples
[diagram: a set of examples (INPUT1, OUTPUT1), (INPUT2, OUTPUT2), …, (INPUTn, OUTPUTn) used to learn a box f mapping INPUT to OUTPUT]
Learning from Examples: formal setting
Given a set of $\ell$ examples (past data)
$$D_\ell = \{(x_i, y_i) \in X \times Y\}_{i=1}^{\ell}$$
Question: find a function $f$ such that $f(x)$ is a good predictor of $y$ for a future input $x$.
Neural Networks
By the way… the term Neural Network has been overused. Look at the following definition of Neural Networks (from the Wall Street Journal, Nov 2, 1998): “...the investment method is based on a set of neural networks, that is a complex series of mathematical equations and algorithms incorporated into a software program.”
Classical equivalent view: supervised learning as a problem of multivariate function approximation
[plot: data sampled from a function f, the function f itself, and an approximation of f, on y vs. x axes]
Generalization: estimating the value of the function where there are no data
Regression: the function is real valued
Classification: the function is binary
Statistical Learning Theory: key questions and foundations
Key questions: when is generalization possible? What are the bounds on the generalization error?
The problem of multivariate function approximation is ill-posed for a finite sample of “examples”.
Generalization requires prior assumptions / regularization / capacity control to restrict the space of functions / hypotheses / architectures, for instance by smoothness (telephone directory example).
Thus there is a trade-off between capacity control and sample size: the theory characterizes this trade-off.
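To make the trade-off concrete, here is a small numerical sketch (entirely illustrative: the target function, sample size, and λ values are my choices, not the course's). With few noisy samples, a high-capacity fit can interpolate the data but generalizes badly; a regularization penalty trades a little training error for much better test error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Few noisy samples of a smooth target function: the regime in which
# function approximation from data is ill-posed.
def target(x):
    return np.sin(2 * np.pi * x)

n = 10
x_train = rng.uniform(0, 1, n)
y_train = target(x_train) + 0.1 * rng.standard_normal(n)
x_test = np.linspace(0, 1, 200)

# High-capacity hypothesis space: degree n-1 polynomials (can interpolate).
degree = n - 1
Phi_train = np.vander(x_train, degree + 1)
Phi_test = np.vander(x_test, degree + 1)

# lam = 1e-12 is essentially unregularized interpolation; larger lam
# enforces a smoothness-like penalty on the coefficients.
for lam in [1e-12, 1e-6, 1e-2]:
    A = Phi_train.T @ Phi_train + lam * np.eye(degree + 1)
    w = np.linalg.solve(A, Phi_train.T @ y_train)
    train_err = np.mean((Phi_train @ w - y_train) ** 2)
    test_err = np.mean((Phi_test @ w - target(x_test)) ** 2)
    print(f"lambda={lam:g}  train MSE={train_err:.4f}  test MSE={test_err:.4f}")
```

The near-interpolating fit has almost zero training error and a huge test error; moderate regularization gives up a little fit for far better generalization.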
A unified theory
• Regularization networks -- such as Gaussian Radial Basis Functions -- and also Support Vector Machines for regression (SVMR) and for classification (SVMC) can be justified in terms of a new, fundamental statistical theory of learning.
• The theory -- developed mainly by Vapnik -- deals with the problem of learning from finite and small sample sizes.
• Its focus is on the problem of capacity control, i.e. the learning and statistical version of regularization.
• Its key motivation is to do classification and regression without density estimation.
Statistical Learning Theory: specific algorithms
To restrict the space of hypotheses there is a “classical” solution: Regularization Networks.
The function $f$ that minimizes
$$H[f] = \frac{1}{\ell} \sum_{i=1}^{\ell} (y_i - f(x_i))^2 + \lambda \|f\|_K^2$$
has the form
$$f(x) = \sum_{i=1}^{\ell} c_i K(x, x_i)$$
where $K$ is the kernel (basis function) associated with the RKHS norm $\|\cdot\|_K$ (the regularizer term).
Wahba 1990; Poggio and Girosi, 1989; Smale and Cucker, 2001
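A regularization network of this form fits in a few lines: build the Gram matrix on the examples, solve a linear system for the coefficients $c_i$, and predict by the kernel expansion. A minimal sketch with a Gaussian kernel (the kernel width, λ, and the toy data are illustrative choices):

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # K(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def fit_regularization_network(X, y, lam=1e-2, sigma=1.0):
    # Minimizing (1/l) sum_i (y_i - f(x_i))^2 + lam ||f||_K^2 over the RKHS
    # gives f(x) = sum_i c_i K(x, x_i) with (K + lam * l * I) c = y.
    l = len(X)
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * l * np.eye(l), y)

def predict(X_train, c, X_new, sigma=1.0):
    # Evaluate the kernel expansion f(x) = sum_i c_i K(x, x_i).
    return gaussian_kernel(X_new, X_train, sigma) @ c

# Toy usage: regression on noisy samples of a smooth function.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
c = fit_regularization_network(X, y)
X_new = np.linspace(-3, 3, 5)[:, None]
print(predict(X, c, X_new))
```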
Non-classical framework: more general loss functions
We will see how a “small” extension of the classical framework can be made to include RN, SVMC, and SVMR: replace the square loss in the functional above with a general loss function $V(y_i, f(x_i))$.
Girosi, Caprile, Poggio, 1990
Equivalence to networks
The three techniques admit the same solution…
…and can all be “written” as the same type of network.
[diagram: a one-hidden-layer network with inputs $x_1, \ldots, x_n$, kernel units $K(x, x_i)$, coefficients $c_1, \ldots, c_N$, and a summation node producing $f$]
Unified framework: RN, SVMR and SVMC
“New” result: one equation includes Regularization Networks, e.g. Radial Basis Functions, and Support Vector Machines (classification and regression), and some multilayer perceptrons. Statistical learning theory applies.
Review by Evgeniou, Pontil and Poggio, Advances in Computational Mathematics, 2000
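To see how one functional covers all three algorithms, here is a sketch that minimizes $\frac{1}{\ell}\sum_i V(y_i, f(x_i)) + \lambda \|f\|_K^2$ by subgradient descent on the coefficients of $f(x) = \sum_i c_i K(x, x_i)$. The solver and step sizes are illustrative (the actual algorithms use closed-form or QP solutions), but the three losses are the standard ones:

```python
import numpy as np

def square_grad(y, f):          # RN: V = (y - f)^2
    return -2 * (y - f)

def hinge_grad(y, f):           # SVMC: V = max(0, 1 - y f), with y in {-1, +1}
    return np.where(y * f < 1, -y, 0.0)

def eps_insensitive_grad(y, f, eps=0.1):  # SVMR: V = max(0, |y - f| - eps)
    return np.where(np.abs(y - f) > eps, -np.sign(y - f), 0.0)

def fit(K, y, loss_grad, lam=1e-2, lr=0.01, steps=2000):
    # Subgradient descent on (1/l) sum_i V(y_i, (Kc)_i) + lam * c^T K c,
    # where K is the Gram matrix on the training examples (e.g. from the
    # Gaussian kernel above) and f(x) = sum_i c_i K(x, x_i).
    l = len(y)
    c = np.zeros(l)
    for _ in range(steps):
        f = K @ c
        c -= lr * (K @ loss_grad(y, f) / l + 2 * lam * (K @ c))
    return c
```

Swapping `loss_grad` between `square_grad`, `hinge_grad`, and `eps_insensitive_grad` switches between RN, SVMC, and SVMR without touching anything else; that is the sense in which the framework is unified.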
Theory summary
We will introduce:
• classical regularization
• Vapnik's theory with the notion of VC dimension
• an extension of VC dimension (in order to apply the theory to both RN and SVM, and to regression as well as classification)
• SVM and properties such as:
  - SVMC is a special case of SVMR
  - relations between Regularization, SVM and BPD
  - Bayes interpretation of Regularization, SVM, BPD
• beyond SVM and boosting
Overview of overview
o Supervised learning: the problem and how to frame it within classical math
o Examples of in-house applications
o Learning and the brain
Learning from Examples: Applications
[diagram: INPUT → f → OUTPUT]
• Object identification
• Object categorization
• Graphics
• Finance
• Bioinformatics
• …
Face identification
A view-based system: 15 views
Performance: 98% on a 68-person database
Beymer, 1995
Learning from Examples: Applications
[diagram: INPUT → f → OUTPUT]
• Object identification
• Object categorization
• Face expression
• Graphics
• Finance
• Bioinformatics
• …
Application: a trainable object detection system
Scanning in x, y, and scale
Preprocessing with an overcomplete dictionary of Haar wavelets
[training pipeline: training database → QP solver → SVM classifier]
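The pipeline — scan a fixed-size window over positions and scales, compute wavelet-type features, classify each window — can be sketched generically. This is not the CBCL system itself: the two Haar-like features, window size, and scale steps below are placeholder choices, and `classify` stands in for the trained SVM:

```python
import numpy as np

def integral_image(img):
    # Cumulative sums let any rectangle sum be read off in O(1).
    return img.cumsum(0).cumsum(1)

def rect_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1, c0:c1] from the integral image (exclusive ends).
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0: total -= ii[r0 - 1, c1 - 1]
    if c0 > 0: total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0: total += ii[r0 - 1, c0 - 1]
    return total

def haar_features(window):
    # Two illustrative two-rectangle Haar features: left-minus-right
    # and top-minus-bottom contrast.
    ii = integral_image(window)
    h, w = window.shape
    left = rect_sum(ii, 0, 0, h, w // 2)
    right = rect_sum(ii, 0, w // 2, h, w)
    top = rect_sum(ii, 0, 0, h // 2, w)
    bottom = rect_sum(ii, h // 2, 0, h, w)
    return np.array([left - right, top - bottom])

def detect(img, classify, win=19, stride=2, scales=(1.0, 1.25, 1.5)):
    # Scan in x, y, and scale; report windows the classifier accepts.
    hits = []
    for s in scales:
        size = int(win * s)
        for r in range(0, img.shape[0] - size + 1, stride):
            for c in range(0, img.shape[1] - size + 1, stride):
                x = haar_features(img[r:r + size, c:c + size])
                if classify(x) > 0:
                    hits.append((r, c, size))
    return hits
```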
Learning Object Detection: Finding Frontal Faces
Training database: 1000+ real and 3000+ virtual face patterns, 50,000+ non-face patterns
Sung, Poggio 1995
Support Vectors are Sparse!
[image panels labeled FACES and NON-FACES: some of the support vectors found by SVM training with thousands of face and non-face examples]
…that is, the coefficients $c_i$ in $f(x) = \sum_i c_i K(x, x_i)$ are sparse!
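The sparsity claim is easy to check empirically: after training an SVM, only the examples with nonzero dual coefficients (the support vectors) appear in the expansion. A minimal sketch using scikit-learn on synthetic data (the dataset and kernel settings are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic two-class data: two well-separated Gaussian blobs.
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(+2, 1, (200, 2))])
y = np.array([-1] * 200 + [+1] * 200)

clf = SVC(kernel="rbf", C=1.0).fit(X, y)

# Most training examples end up with c_i = 0: only the support
# vectors contribute to f(x) = sum_i c_i K(x, x_i).
print(f"{len(clf.support_)} support vectors out of {len(X)} examples")
```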
Recent work on face detection
• Detection of faces in images
• Robustness against slight rotations in depth and in the image plane
• Full-face vs. component-based classifiers
Heisele, Pontil, Poggio, 2000
New Classifier: Combining Component Detectors
Heisele, Pontil, Poggio, 2000
The best existing system for face detection? Heisele, Poggio et al., 2000
Trainable System for Object Detection: pedestrian detection (training)
Papageorgiou and Poggio, 1998
System installed in an experimental Mercedes
A fast version, integrated with a real-time obstacle detection system
[MPEG video]
Constantine Papageorgiou
Learning from Examples: Applications
[diagram: INPUT → f → OUTPUT]
• Object identification
• Object categorization
• Face expression
• Graphics
• Finance
• Bioinformatics
• …
Image Analysis
IMAGE ANALYSIS: OBJECT RECOGNITION AND POSE ESTIMATION
[images: input image ⇒ Bear (0° view); input image ⇒ Bear (45° view)]
The problem
The main goal is to estimate basic facial parameters through learning, e.g. the degree of mouth openness.
The Three Stages
1. Face Detection
2. Localization of Facial Features
3. Analysis of Facial Parts
Learning from Examples: Applications
[diagram: INPUT → f → OUTPUT]
• Object identification
• Object categorization
• Face expression
• Graphics
• Finance
• Bioinformatics
• …
Image Synthesis
UNCONVENTIONAL GRAPHICS
[images: Θ = 0° view ⇒ synthesized image; Θ = 45° view ⇒ synthesized image]
nFX Interactive nFX Toono (S. Librande, R. Belfer)
Supermodels
[MPEG video] (Steve Lines)
A trainable system for TTVS (text-to-visual speech)
• Input: text
• Output: photorealistic talking face uttering the text
Tony Ezzat
TTVS: video Tony Ezzat, T. Poggio
Reconstructed 3D Face Models from 1 image Blanz and Vetter, MPI SigGraph ‘99
Learning from Examples: Applications
[diagram: INPUT → f → OUTPUT]
• Object identification
• Object categorization
• Face expression
• Graphics
• Finance
• Bioinformatics
• …
Example of a subproject: The Electronic Market Maker (EMM)
Artificial agents – learning algorithms – buy and sell stocks
[diagram: public limit/market orders, bid/ask prices, and bid/ask sizes flow between the market and the EMM; inputs include all available market information, buy/sell inventory control, and user control/calibration; a learning feedback loop closes the system]
Nicholas Chang
Learning from Examples: Applications
[diagram: INPUT → f → OUTPUT]
• Object identification
• Object categorization
• Face expression
• Graphics
• Finance
• Bioinformatics
• …
Bioinformatics application: predicting the type of cancer from DNA chip signals
The learning-from-examples paradigm:
[diagram: examples → statistical learning algorithm → prediction; a new sample fed to the trained algorithm produces its prediction]
Bioinformatics application: predicting the type of cancer from DNA chips
A new feature-selection SVM: only 38 training examples, 7100 features.
AML vs. ALL: with 40 genes, 34/34 correct, 0 rejects; with 5 genes, 31/31 correct, 3 rejects, of which 1 is an error.
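The slide does not spell out the feature-selection method; as an illustration of the general idea, here is a sketch of one well-known approach, recursive feature elimination with a linear SVM. The data are synthetic (38 samples as on the slide, but only 1000 features for speed), and all parameters are illustrative:

```python
import numpy as np
from sklearn.svm import LinearSVC

def svm_rfe(X, y, n_keep, drop_frac=0.5):
    # Recursive feature elimination: repeatedly train a linear SVM,
    # rank features by |weight|, and discard the lowest-ranked ones.
    keep = np.arange(X.shape[1])
    while len(keep) > n_keep:
        clf = LinearSVC(C=1.0, dual=False).fit(X[:, keep], y)
        ranks = np.abs(clf.coef_).ravel()
        n_next = max(n_keep, int(len(keep) * (1 - drop_frac)))
        keep = keep[np.argsort(ranks)[::-1][:n_next]]
    return keep

# Toy usage: 38 samples, many features, few informative, mimicking
# the small-sample / high-dimensional regime of the DNA-chip task.
rng = np.random.default_rng(0)
y = np.array([1] * 19 + [-1] * 19)
X = rng.standard_normal((38, 1000))
X[:, :5] += 2.0 * y[:, None]      # make features 0..4 informative
print(svm_rfe(X, y, n_keep=5))    # should mostly recover indices 0..4
```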
Overview of overview
o Supervised learning: the problem and how to frame it within classical math
o Examples of in-house applications
o Learning and the brain
Model of view-invariant identification
A graphical rewriting of a Regularization Network (GRBF), a learning technique
[diagram: view-tuned units plotted against view angle, feeding a view-invariant output]
Poggio, Edelman, Nature, 1990
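The core idea — view-invariant recognition from a few stored example views — is a Gaussian RBF network whose hidden units are tuned to the training views, with an output unit that pools them. A toy sketch, where a 1-D "view angle" stands in for the image measurements the real model uses and all numbers are illustrative:

```python
import numpy as np

def grbf_response(view, centers, sigma=30.0):
    # Each hidden unit is a Gaussian tuned to one stored example view
    # (in degrees); the output unit sums the view-tuned responses,
    # yielding a roughly view-invariant signal.
    units = np.exp(-(view - centers) ** 2 / (2 * sigma ** 2))
    return units, units.sum()

# A few stored example views of the object.
centers = np.array([-120.0, -60.0, 0.0, 60.0, 120.0])

for view in [-150, -90, 0, 45, 150]:
    units, out = grbf_response(view, centers)
    print(f"view {view:>5}: max unit {units.max():.2f}, output {out:.2f}")
```

Individual units respond only near their preferred view (the view-tuned cells of the following slides), while the pooled output stays high across the whole range of views.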
The Visual System Simplified: Two Visual Pathways, “What” and “Where”
[diagram: dorsal stream: “where”; ventral stream: “what”]
Desimone & Ungerleider, 1989
Recording Sites in Anterior IT Logothetis, Pauls, and Poggio, 1995; Logothetis, Pauls, 1995
Model’s predictions: View-tuned Neurons VIEW-TUNED UNITS VIEW ANGLE
A “View-Tuned” IT Cell
[figure: responses of a single IT cell to target views spanning roughly −168° to +168° in 12° steps, and to distractor objects; scale bars: 60 spikes/sec, 800 msec]
Logothetis et al., 1995