LS-SVMlab & Large scale modeling
Kristiaan Pelckmans, ESAT-SCD/SISTA
J.A.K. Suykens, B. De Moor
Content
I. Overview
II. Classification
III. Regression
IV. Unsupervised Learning
V. Time-series
VI. Conclusions and Outlook
People
• Contributors to LS-SVMlab: Kristiaan Pelckmans, Johan Suykens, Tony Van Gestel, Jos De Brabanter, Lukas Lukas, Bart Hamers, Emmanuel Lambert
• Supervisors: Bart De Moor, Johan Suykens, Joos Vandewalle

Acknowledgements
Our research is supported by grants from several funding agencies and sources: Research Council K.U.Leuven: Concerted Research Action GOA-Mefisto 666 (Mathematical Engineering), IDO (IOTA Oncology, Genetic networks), several PhD/postdoc & fellow grants; Flemish Government: Fund for Scientific Research FWO Flanders (several PhD/postdoc grants, projects G.0407.02 (support vector machines), G.0080.01 (collective intelligence), G.0256.97 (subspace), G.0115.01 (bio-i and microarrays), G.0240.99 (multilinear algebra), G.0197.02 (power islands), research communities ICCoS, ANMMM), AWI (Bil. Int. Collaboration South Africa, Hungary and Poland), IWT (Soft4s (softsensors), STWW-Genprom (gene promotor prediction), GBOU McKnow (knowledge management algorithms), Eureka-Impact (MPC control), Eureka-FLiTE (flutter modeling), several PhD grants); Belgian Federal Government: DWTC (IUAP IV-02 (1996-2001) and IUAP V-10-29 (2002-2006): Dynamical Systems and Control: Computation, Identification & Modelling), Program Sustainable Development PODO-II (CP-TR-18: Sustainability effects of Traffic Management Systems); Direct contract research: Verhaert, Electrabel, Elia, Data4s, IPCOS. JS is a professor at K.U.Leuven Belgium and a postdoctoral researcher with FWO Flanders. BDM and JWDW are full professors at K.U.Leuven Belgium.
I. Overview

Goal of the Presentation
• Overview & intuition
• Demonstration of LS-SVMlab
• Pinpoint research challenges
Preparation for NIPS 2002
• Research results and challenges
• Towards applications
• Overview of LS-SVMlab
I.2 Research overview
"Learning, generalization, extrapolation, identification, smoothing, modeling"
• Prediction (black-box modeling)
• Points of view: statistical learning, machine learning, neural networks, optimization, SVM
I.3 Towards applications
• System identification
• Financial engineering
• Biomedical signal processing
• Data mining
• Bio-informatics
• Text mining
• Adaptive signal processing
I.4 LS-SVMlab (2)
• Starting points:
• Modularity
• Object-oriented & functional interface
• Basic building blocks for advanced research
• Website and tutorial
• Reproducibility (preprocessing)
II. Classification
"Learn the decision function associated with a set of labeled data points in order to predict the labels of unseen data"
• Least Squares Support Vector Machines
• Bayesian framework
• Different norms
• Coding schemes
II.1 Least Squares Support Vector Machines (LS-SVM(γ, σ²))
• Least-squares cost function + regularization & equality constraints
• Non-linearity by Mercer kernels
• Primal-dual interpretation (Lagrange multipliers)
• Primal parametric model ↔ dual non-parametric model (see the formulation below)
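For reference, the standard LS-SVM classifier formulation (as in Suykens et al., 2002, listed in the references). The primal parametric model:

\min_{w,b,e}\; \mathcal{J}(w,e) = \tfrac{1}{2} w^\top w + \tfrac{\gamma}{2} \sum_{i=1}^{N} e_i^2
\quad \text{s.t.} \quad y_i \left[ w^\top \varphi(x_i) + b \right] = 1 - e_i, \qquad i = 1,\dots,N

and the dual non-parametric model, in terms of the Lagrange multipliers α:

y(x) = \mathrm{sign}\Big( \sum_{i=1}^{N} \alpha_i\, y_i\, K(x, x_i) + b \Big),
\qquad K(x,z) = \varphi(x)^\top \varphi(z)

with, e.g., the RBF kernel K(x,z) = \exp(-\|x - z\|^2 / \sigma^2); γ is the regularization constant.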
II.1 LS-SVM(γ, σ²)
"Learning representations from relations"
II.2 Bayesian Inference
• Bayes rule (MAP)
• Closed-form formulas
• Approximations: Hessian in the optimum, Gaussian distribution
• Three levels of posteriors
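In generic form (a standard restatement; the slide's own level-specific formulas are not reproduced here), the MAP rule on the first level infers the model parameters from the data D:

p(w, b \mid D, \mu, \zeta, \mathcal{H}) = \frac{ p(D \mid w, b, \zeta, \mathcal{H})\; p(w, b \mid \mu, \mathcal{H}) }{ p(D \mid \mu, \zeta, \mathcal{H}) }

with the second level inferring the hyper-parameters (μ, ζ) and the third level inferring the kernel parameter and comparing models H, following the Bayesian LS-SVM framework of Suykens et al. (2002).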
II.3 SVM formulations & norms
• 1-norm + inequality constraints: SVM; extensions to any convex cost function
• 2-norm + equality constraints: LS-SVM; weighted versions
A side-by-side comparison is given below.
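In the notation of the classifier formulation above (a standard comparison):

\text{SVM:}\quad \min_{w,b,\xi}\; \tfrac{1}{2} w^\top w + C \sum_i \xi_i
\quad \text{s.t.}\quad y_i \left[ w^\top \varphi(x_i) + b \right] \ge 1 - \xi_i,\; \xi_i \ge 0

\text{LS-SVM:}\quad \min_{w,b,e}\; \tfrac{1}{2} w^\top w + \tfrac{\gamma}{2} \sum_i e_i^2
\quad \text{s.t.}\quad y_i \left[ w^\top \varphi(x_i) + b \right] = 1 - e_i

The 1-norm on the slacks ξ with inequality constraints yields a quadratic program and a sparse solution; the 2-norm on the errors e with equality constraints yields a linear system, with weighted versions used to recover robustness.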
II.4 Coding schemes
Multi-class classification task → (multiple) binary classifiers
• Encoding: multi-class labels (e.g. 1 2 4 6 2 1 3 …) are mapped to ±1 codewords, one bit per binary classifier
• Decoding: the ±1 outputs of the binary classifiers are mapped back to a class label
[Original slide: encoding/decoding diagram with codeword rows such as … -1 -1 -1 1 … and … 1 -1 -1 -1 …]
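A minimal sketch of the idea (a hypothetical NumPy illustration, not the LS-SVMlab API): one-vs-all encoding of class labels into ±1 codewords, and minimum-Hamming-distance decoding of binary-classifier outputs.

import numpy as np

def encode_one_vs_all(labels, classes):
    """Encode class labels into +/-1 codewords, one column per binary classifier."""
    # codebook[c, j] = +1 iff class c is the "positive" class of binary problem j
    codebook = -np.ones((len(classes), len(classes)), dtype=int)
    np.fill_diagonal(codebook, 1)
    idx = np.searchsorted(classes, labels)   # classes must be sorted
    return codebook, codebook[idx]           # (codebook, per-sample codewords)

def decode_min_hamming(outputs, codebook, classes):
    """Map +/-1 classifier outputs to the class with the nearest codeword."""
    dist = (outputs[:, None, :] != codebook[None, :, :]).sum(axis=2)
    return classes[np.argmin(dist, axis=1)]

classes = np.array([1, 2, 3, 4, 6])
labels = np.array([1, 2, 4, 6, 2, 1, 3])     # label sequence from the slide
codebook, Y = encode_one_vs_all(labels, classes)
print(decode_min_hamming(Y, codebook, classes))   # recovers the original labels

One-vs-all is just one possible codebook; the same minimum-distance decoder works for any error-correcting output code.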
III. Regression
"Learn the underlying function from a set of data points and its corresponding noisy targets in order to predict the values of unseen data"
• LS-SVM(γ, σ²)
• Cross-validation (CV)
• Bayesian inference
• Robustness
III.1 LS-SVM(γ, σ²)
• Least-squares cost function + regularization & equality constraints
• Mercer kernels
• Lagrange multipliers: primal parametric model ↔ dual non-parametric model
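Written out (the standard LS-SVM regression derivation, cf. Suykens et al., 2002), the primal problem is

\min_{w,b,e}\; \tfrac{1}{2} w^\top w + \tfrac{\gamma}{2} \sum_{i=1}^{N} e_i^2
\quad \text{s.t.} \quad y_i = w^\top \varphi(x_i) + b + e_i, \qquad i = 1,\dots,N.

Eliminating w and e from the Lagrangian optimality conditions leaves the dual linear system

\begin{bmatrix} 0 & \mathbf{1}^\top \\ \mathbf{1} & \Omega + I/\gamma \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix} =
\begin{bmatrix} 0 \\ y \end{bmatrix},
\qquad \Omega_{ij} = K(x_i, x_j),

with the non-parametric model \hat y(x) = \sum_i \alpha_i K(x, x_i) + b.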
III.1 LS-SVM(γ, σ²) (2)
• Regularization parameter γ:
• do not fit the noise (overfitting)!
• trade-off between noise and information (illustrated in the sketch below)
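A minimal NumPy sketch (an illustration, not the LS-SVMlab MATLAB interface) that solves the dual system above for an RBF kernel; gam and sig2 stand in for γ and σ², and γ enters as the ridge term I/γ: a larger γ fits the data more closely, a smaller γ smooths more.

import numpy as np

def rbf_kernel(X, Z, sig2):
    """RBF kernel matrix K[i,j] = exp(-||x_i - z_j||^2 / sig2)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sig2)

def lssvm_train(X, y, gam, sig2):
    """Solve [[0, 1^T], [1, K + I/gam]] [b; alpha] = [0; y]."""
    n = len(y)
    K = rbf_kernel(X, X, sig2)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gam
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                    # b, alpha

def lssvm_predict(Xtest, X, b, alpha, sig2):
    """Evaluate yhat(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(Xtest, X, sig2) @ alpha + b

# Toy usage on noisy sine data.
X = np.linspace(0, 6, 50)[:, None]
y = np.sin(X).ravel() + 0.1 * np.random.randn(50)
b, alpha = lssvm_train(X, y, gam=10.0, sig2=0.5)
yhat = lssvm_predict(X, X, b, alpha, sig2=0.5)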
III.2 Cross-validation (CV)
"How do we estimate the generalization power of a model?"
• Division into training set and test set
• Repeated division: leave-one-out CV (fast implementation)
• L-fold cross-validation
• Generalized cross-validation (GCV)
• Complexity criteria: AIC, BIC, …
[Original slide: diagram of index strips 1 … n showing which points are held out in each scheme]
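For linear smoothers ŷ = S y, which the LS-SVM with fixed (γ, σ²) is, leave-one-out and GCV admit the classical closed forms below (general identities; whether LS-SVMlab's "fast implementation" uses exactly these is an assumption):

y_i - \hat y_i^{(-i)} = \frac{y_i - \hat y_i}{1 - S_{ii}},
\qquad
\mathrm{GCV} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat y_i}{1 - \mathrm{tr}(S)/n} \right)^{2},

so the leave-one-out residuals follow from a single fit instead of n refits.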
III.2 Cross-validation Procedure (CVP)
"How do we tune the model for optimal generalization performance?"
• Trade-off between fitting and model complexity
• Kernel parameters
• Optimization routine? (see the grid-search sketch below)
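A minimal sketch of such an optimization routine (a hypothetical illustration assuming the lssvm_train / lssvm_predict sketches above are in scope; LS-SVMlab provides its own tuning functions): grid search over (γ, σ²) scored by L-fold CV mean squared error.

import numpy as np

def lfold_cv_mse(X, y, gam, sig2, L=10):
    """Mean squared L-fold cross-validation error for one (gam, sig2) pair."""
    folds = np.array_split(np.random.permutation(len(y)), L)
    errs = []
    for hold in folds:
        train = np.setdiff1d(np.arange(len(y)), hold)
        b, alpha = lssvm_train(X[train], y[train], gam, sig2)
        resid = y[hold] - lssvm_predict(X[hold], X[train], b, alpha, sig2)
        errs.append(np.mean(resid ** 2))
    return np.mean(errs)

# Crude grid search over the two hyper-parameters.
grid = [(g, s) for g in 10.0 ** np.arange(-2, 4) for s in 10.0 ** np.arange(-2, 2)]
gam, sig2 = min(grid, key=lambda p: lfold_cv_mse(X, y, *p))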
III.1 LS-SVM(γ, σ²) (3)
• Kernel type and parameter
"Zoölogy as elephantism and non-elephantism"
• Model comparison
• by cross-validation or Bayesian inference
III.3 Applications
"OK, but does it work?"
• Soft4s
• together with O. Barrero, L. Hoegaerts, IPCOS (ISMC), BASF, B. De Moor
• soft-sensor
• ELIA
• together with O. Barrero, I. Goethals, L. Hoegaerts, I. Markovsky, T. Van Gestel, ELIA, B. De Moor
• prediction of short- and long-term electricity consumption
III.2 Bayesian Inference
• Bayes rule (MAP)
• Closed-form formulas
• Three levels of posteriors
III.4 Robustness
"How to build good models in the case of non-Gaussian noise or outliers"
• Influence function
• Breakdown point
• How:
• down-weighting the influence of large residuals
• mean – trimmed mean – median (compared in the sketch below)
• robust CV, GCV, AIC, …
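A small illustration (a hypothetical example, not toolbox code) of why the trimmed mean and median are preferred location estimates when outliers are present:

import numpy as np

def trimmed_mean(x, frac=0.1):
    """Discard the frac smallest and frac largest values, then average."""
    x = np.sort(x)
    k = int(frac * len(x))
    return x[k:len(x) - k].mean()

rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(0.0, 1.0, 95),   # 95 clean points around 0
                         np.full(5, 50.0)])          # 5% gross outliers
print(np.mean(sample))          # ~2.5: dragged away from 0 by the outliers
print(trimmed_mean(sample))     # close to 0: trimming discards the outliers
print(np.median(sample))        # close to 0: the median has a high breakdown point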
IV. Unsupervised Learning “Extract important features from the unlabeled data” • Kernel PCA and related methods • Nyström approximation • From Dual to primal • Fixed size LS-SVM
IV.1 Kernel PCA
• Principal Component Analysis
• Kernel-based PCA
[Original slide: 3-D scatter plots (x, y, z axes) contrasting linear PCA with kernel PCA]
IV.1 Kernel PCA (2)
• Primal-dual LS-SVM style formulations
• for kernel PCA, CCA, PLS
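A minimal NumPy sketch of the classical kernel PCA computation (the generic algorithm, not the LS-SVM primal-dual formulation of this slide): center the kernel matrix in feature space, take its top eigenvectors, and project.

import numpy as np

def kernel_pca(K, n_components):
    """Kernel PCA scores from a precomputed n x n kernel matrix K."""
    n = K.shape[0]
    ones = np.full((n, n), 1.0 / n)
    Kc = K - ones @ K - K @ ones + ones @ K @ ones   # double centering in feature space
    lam, U = np.linalg.eigh(Kc)                      # eigenvalues in ascending order
    lam, U = lam[::-1][:n_components], U[:, ::-1][:, :n_components]
    alphas = U / np.sqrt(np.maximum(lam, 1e-12))     # unit-norm components in feature space
    return Kc @ alphas                               # projections of the training data

The primal-dual LS-SVM view re-derives these components as the solution of a constrained optimization problem; the sketch above is only the classical eigendecomposition route.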
IV.2 Nyström approximation
• Sampling of the integral equation
• Approximating the feature map for a Mercer kernel (sketched below)
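A minimal sketch of the Nyström idea (an illustration under standard assumptions, not toolbox code): eigendecompose the kernel on a subsample of size m and use it to define an approximate m-dimensional feature map for any point.

import numpy as np

def nystrom_feature_map(Xsub, kernel, m_eigs=None):
    """Build an approximate feature map phi_hat from a subsample Xsub.
    kernel(X, Z) must return the kernel matrix between the rows of X and Z."""
    Kmm = kernel(Xsub, Xsub)
    lam, U = np.linalg.eigh(Kmm)
    lam = np.maximum(lam, 1e-12)
    if m_eigs is not None:                    # optionally keep only the top eigenpairs
        lam, U = lam[-m_eigs:], U[:, -m_eigs:]
    W = U / np.sqrt(lam)                      # columns scaled by 1/sqrt(lambda_k)
    def phi_hat(X):
        # phi_hat(x)^T phi_hat(z) approximates k(x, z) via k_m(x)^T Kmm^{-1} k_m(z)
        return kernel(X, Xsub) @ W
    return phi_hat

This is the basis of fixed-size LS-SVM (mentioned on the overview and NARX slides): with an explicit approximate feature map, the model can be estimated in the primal, giving a sparse representation for large data sets.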
V. Time-series “Learn to predict future values given a sequence of past values” • NARX • Recurrent vs. feedforward
V.1 NARX
• Reducible to static regression (see the sketch below)
• CV and complexity criteria
• Predicting in recurrent mode
• Fixed-size LS-SVM (sparse representation)
[Original slide: block diagram of the NARX mapping f]
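A minimal sketch (a hypothetical illustration) of the reduction to static regression: build lagged input-output regressors so that any static regressor, e.g. the LS-SVM sketched in section III, can be trained on (X, target) pairs.

import numpy as np

def narx_regressors(y, u, lags):
    """NARX regressor matrix: predict y[t] from y[t-1..t-lags] and u[t-1..t-lags]."""
    rows = []
    for t in range(lags, len(y)):
        rows.append(np.concatenate([y[t - lags:t], u[t - lags:t]]))
    return np.array(rows), y[lags:]           # (X, target)

# Usage: X, target = narx_regressors(y, u, lags=5), then fit a static model on
# (X, target). In recurrent (simulation) mode, predictions are fed back into
# the regressor in place of the measured outputs y.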
V.1 NARX (2) Santa Fe Time-series competition
V.2 Recurrent models? “How to learn recurrent dynamical models?” • Training cost = Prediction cost? • Non-parametric model class? • Convex or non-convex? • Hyper-parameters?
VI.0 References
• J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor & J. Vandewalle (2002), Least Squares Support Vector Machines, World Scientific.
• V. Vapnik (1995), The Nature of Statistical Learning Theory, Springer-Verlag.
• B. Schölkopf & A. Smola (2002), Learning with Kernels, MIT Press.
• T. Poggio & F. Girosi (1990), "Networks for approximation and learning", Proceedings of the IEEE, 78, 1481-1497.
• N. Cristianini & J. Shawe-Taylor (2000), An Introduction to Support Vector Machines, Cambridge University Press.
VI. Conclusions
"Non-linear non-parametric learning as a generalized methodology"
• Non-parametric learning
• Intuition & formulations
• Hyper-parameters
• LS-SVMlab

Questions?