Can Machines Transfer Knowledge from Task to Task? Isabelle Guyon, Clopinet, California http://clopinet.com/ul
CREDITS
Data donors:
Handwriting recognition (AVICENNA) -- Reza Farrahi Moghaddam, Mathias Adankon, Kostyantyn Filonenko, Robert Wisnovsky, and Mohamed Chériet (École de technologie supérieure de Montréal, Quebec) contributed the dataset of Arabic manuscripts. The toy example (ULE) is the MNIST handwritten digit database made available by Yann LeCun and Corinna Cortes.
Object recognition (RITA) -- Antonio Torralba, Rob Fergus, and William T. Freeman collected and made publicly available the 80 million tiny images dataset. Vinod Nair and Geoffrey Hinton collected and made publicly available the CIFAR datasets. See the tech report Learning Multiple Layers of Features from Tiny Images, by Alex Krizhevsky, 2009, for details.
Human action recognition (HARRY) -- Ivan Laptev and Barbara Caputo collected and made publicly available the KTH human action recognition datasets. Marcin Marszałek, Ivan Laptev, and Cordelia Schmid collected and made publicly available the Hollywood 2 dataset of human actions and scenes.
Text processing (TERRY) -- David Lewis formatted and made publicly available the RCV1-v2 Text Categorization Test Collection.
Ecology (SYLVESTER) -- Jock A. Blackard, Denis J. Dean, and Charles W. Anderson of the US Forest Service, USA, collected and made available the Forest Cover Type dataset.
Web platform: server made available by Prof. Joachim Buhmann, ETH Zurich, Switzerland. Computer admin.: Thomas Fuchs, ETH Zurich. Webmaster: Olivier Guyon, MisterP.net, France. Platform: Causality Workbench.
Co-organizers:
• David W. Aha, Naval Research Laboratory, USA.
• Gideon Dror, Academic College of Tel-Aviv Yaffo, Israel.
• Vincent Lemaire, Orange Research Labs, France.
• Graham Taylor, NYU, New York, USA.
• Gavin Cawley, University of East Anglia, UK.
• Danny Silver, Acadia University, Canada.
• Vassilis Athitsos, UT Arlington, Texas, USA.
Protocol review and advising:
• Olivier Chapelle, Yahoo!, California, USA.
• Gerard Rinkus, Brandeis University, USA.
• Urs Mueller, Net-Scale Technologies, USA.
• Yoshua Bengio, Université de Montréal, Canada.
• David Grangier, NEC Labs, USA.
• Andrew Ng, Stanford University, Palo Alto, California, USA.
• Yann LeCun, NYU, New York, USA.
• Richard Bowden, University of Surrey, UK.
• Philippe Dreuw, Aachen University, Germany.
• Ivan Laptev, INRIA, France.
• Jitendra Malik, UC Berkeley, USA.
• Greg Mori, Simon Fraser University, Canada.
• Christian Vogler, ILSP, Athens, Greece.
http://clopinet.com/ul
What is the problem? http://clopinet.com/ul
Can learning about... http://clopinet.com/ul
help us learn about… http://clopinet.com/ul
Can learning about… publicly available data http://clopinet.com/ul
help us learn about… personal data [Photos of family and friends, labeled with first names: Philip, Thomas, Omar, Anna, GM, Solene, Martin, Bernhard] http://clopinet.com/ul
Transfer learning: a common data representation [same photos of family and friends as above] http://clopinet.com/ul
How? http://clopinet.com/ul
Vocabulary: source domain with source task labels; target domain with target task labels http://clopinet.com/ul
Key questions: Labels available? Tasks the same? Domains the same? http://clopinet.com/ul
Taxonomy of transfer learning (adapted from: A survey on transfer learning, Pan and Yang, 2010)
• Labels available in target domain: Inductive TL
  • Labels available in source domain: Multi-task TL
  • No labels in source domain: Self-taught TL
• Same source and target task: Transductive TL
• Labels avail. ONLY in source domain: Semi-supervised TL
• Different source and target tasks: Cross-task TL
• No labels in both source and target domains: Unsupervised TL
http://clopinet.com/ul
Unsupervised transfer learning http://clopinet.com/ul
What can you do with NO labels?
• No learning at all:
  • Normalization of examples or features
  • Construction of features (e.g. products)
  • Generic data transformations (e.g. taking the log, Fourier transform, smoothing, etc.)
• Unsupervised learning:
  • Manifold learning to reduce dimension (and/or orthogonalize features)
  • Sparse coding to expand dimension
  • Clustering to construct features
  • Generative models and latent variable models
http://clopinet.com/ul
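As an illustration of the "no learning at all" options above, here is a minimal sketch (not from the original slides) of generic, label-free preprocessing: a log transform, per-feature standardization, and a few product features. The synthetic data and the choice of transformations are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.lognormal(size=(100, 5))        # hypothetical unlabeled data: 100 examples x 5 features

# Generic data transformation: take the log of strictly positive features
X_log = np.log(X)

# Normalization of features: zero mean, unit variance per column
X_std = (X_log - X_log.mean(axis=0)) / (X_log.std(axis=0) + 1e-12)

# Construction of features: pairwise products of the standardized features
i, j = np.triu_indices(X_std.shape[1])
X_prod = X_std[:, i] * X_std[:, j]

X_new = np.hstack([X_std, X_prod])      # representation usable for any later task
print(X_new.shape)
```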
Unsupervised transfer learning [diagram]: 1) a preprocessor P (paired with a reconstructor R) is trained on unlabeled source-domain data; 2) given target task labels, a classifier C is trained on top of P in the target domain (e.g., images of John), and the resulting P + C pipeline is applied to new target-domain examples (e.g., Emily). http://clopinet.com/ul
Manifold learning • PCA • ICA • Kernel PCA • Kohonen maps • Auto-encoders • MDS, Isomap, LLE, Laplacian Eigenmaps • Regularized principal manifolds http://clopinet.com/ul
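A quick sketch of the simplest manifold-learning option in the list above, PCA, together with a nonlinear alternative (Isomap), using scikit-learn. The dataset and the number of components are arbitrary stand-ins, not choices made in the original slides.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

X, _ = load_digits(return_X_y=True)      # labels deliberately ignored

# Linear dimensionality reduction (also orthogonalizes the features)
X_pca = PCA(n_components=10).fit_transform(X)

# Nonlinear manifold learning
X_iso = Isomap(n_components=10, n_neighbors=10).fit_transform(X)

print(X_pca.shape, X_iso.shape)
```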
Deep Learning Greedy layer-wise unsupervised pre-training of multi-layer neural networks and Bayesian networks, including: • Deep Belief Networks (stacks of Restricted Boltzmann machines) • Stacks of auto-encoders [diagram: preprocessor / reconstructor pair] http://clopinet.com/ul
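The sketch below (not from the slides) shows the basic preprocessor/reconstructor pair behind this idea: a single auto-encoder layer trained by gradient descent on unlabeled data. Stacking several such layers, each trained on the codes of the previous one, gives the greedy layer-wise pre-training described above. The layer sizes, learning rate, and synthetic data are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((500, 64))                      # stand-in for unlabeled 8x8 image patches
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

n_in, n_hid, lr = X.shape[1], 16, 0.1
W1 = rng.normal(0, 0.1, (n_in, n_hid)); b1 = np.zeros(n_hid)   # preprocessor (encoder)
W2 = rng.normal(0, 0.1, (n_hid, n_in)); b2 = np.zeros(n_in)    # reconstructor (decoder)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(200):
    H = sigmoid(X @ W1 + b1)                   # codes
    X_hat = H @ W2 + b2                        # reconstruction
    err = X_hat - X
    # gradient descent on the mean squared reconstruction error
    gW2 = H.T @ err / len(X);  gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * H * (1 - H)
    gW1 = X.T @ dH / len(X);   gb1 = dH.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

codes = sigmoid(X @ W1 + b1)                   # representation re-usable for a later task
print(codes.shape)
```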
Clustering • K-means and variants w. cluster overlap (Gaussian mixtures, fuzzy C-means) • Hierarchical clustering • Graph partitioning • Spectral clustering http://clopinet.com/ul
Example: K-means
• Start with random cluster centers.
• Iterate:
  • Assign the examples to their closest center to form clusters.
  • Re-compute the centers by averaging the cluster members.
• Create features, e.g. f_k(x) = exp(-γ ||x - x_k||), where x_k is the k-th cluster center.
[Figure: clusters of the ULE validation set after 5 iterations]
http://clopinet.com/ul
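A minimal sketch of this recipe. Random data stands in for the ULE digits; K = 20 clusters (as on the results slide), 5 iterations, and the value of gamma are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((500, 784))                   # stand-in for 784-pixel digit images
K, gamma, n_iter = 20, 0.1, 5

# K-means: start with random centers, then alternate assignment / re-estimation
centers = X[rng.choice(len(X), K, replace=False)].copy()
for _ in range(n_iter):
    # squared Euclidean distance of every example to every center
    d2 = (X**2).sum(1)[:, None] - 2 * X @ centers.T + (centers**2).sum(1)[None, :]
    assign = d2.argmin(axis=1)               # assign examples to their closest center
    for k in range(K):
        members = X[assign == k]
        if len(members):                     # keep the old center if a cluster is empty
            centers[k] = members.mean(axis=0)

# Feature construction: f_k(x) = exp(-gamma * ||x - x_k||)
d2 = (X**2).sum(1)[:, None] - 2 * X @ centers.T + (centers**2).sum(1)[None, :]
features = np.exp(-gamma * np.sqrt(np.maximum(d2, 0)))
print(features.shape)                        # 20 features instead of 784 raw inputs
```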
Results on ULE: do better!
[Learning curves: AUC vs. log2(number of training examples)]
• Raw data (784 features): ALC = 0.79
• K-means (20 features): ALC = 0.84
Current best: AUC = 1, ALC = 0.96
http://clopinet.com/ul
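For reference, ALC is the Area under the Learning Curve: the AUC of a classifier is measured for increasing numbers of training examples, and the area under that curve (plotted against log2 of the number of examples) is reported. The sketch below recomputes such a score on a stand-in task (scikit-learn digits, logistic regression); the training-set sizes and the exact normalization used in the challenge are assumptions that may differ from the official protocol.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in binary task: digit "3" vs. the rest
X, y = load_digits(return_X_y=True)
y = (y == 3).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                          random_state=0, stratify=y)

rng = np.random.default_rng(0)
sizes = [2, 4, 8, 16, 32, 64, 128]
aucs = []
for n in sizes:
    # draw n/2 examples per class so that the classifier can always be fit
    pos = rng.choice(np.flatnonzero(y_tr == 1), n // 2, replace=False)
    neg = rng.choice(np.flatnonzero(y_tr == 0), n // 2, replace=False)
    idx = np.concatenate([pos, neg])
    clf = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
    aucs.append(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))

# Area under the learning curve (AUC vs. log2 of training-set size),
# rescaled to [0, 1] by the width of the x-range (trapezoid rule)
x, a = np.log2(sizes), np.asarray(aucs)
alc = np.sum((a[1:] + a[:-1]) / 2 * np.diff(x)) / (x[-1] - x[0])
print(f"ALC = {alc:.3f}")
```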
Unsupervised learning (resources) • Unsupervised Learning. Z. Ghahramani. http://www.gatsby.ucl.ac.uk/~zoubin/course04/ul.pdf • Nonlinear dimensionality reduction. http://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction • Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering. Y. Bengio et al. http://books.nips.cc/papers/files/nips16/NIPS2003_AA23.pdf • Data Clustering: A Review. Jain et al. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.2720 • Why Does Unsupervised Pre-training Help DL? D. Erhan et al. http://jmlr.csail.mit.edu/papers/volume11/erhan10a/erhan10a.pdf • Efficient sparse coding algorithms. H. Lee et al. http://www.eecs.umich.edu/~honglak/nips06-sparsecoding.pdf http://clopinet.com/ul
Taxonomy of transfer learning (recap of the taxonomy shown earlier; adapted from: A survey on transfer learning, Pan and Yang, 2010) http://clopinet.com/ul
Cross-task transfer learning http://clopinet.com/ul
How can you do it? • Data representation learning: • Deep neural networks • Deep belief networks (re-use the internal representation created by the hidden units and/or output units) • Similarity or kernel learning: • Siamese neural networks • Graph-theoretic methods http://clopinet.com/ul
Data representation learning [diagram]:
1) Using source task labels (e.g., recognizing the sea), train a preprocessor P and a classifier C on the source domain.
2) Using target task labels (e.g., recognizing John), re-use P and train a new classifier C on the target domain; the resulting pipeline is then applied to new target-domain examples (e.g., Emily).
http://clopinet.com/ul
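A minimal sketch of this idea on stand-in tasks: a small scikit-learn MLP is trained on a source task (digits 0 to 7), its first hidden layer is re-used as the representation P, and a linear classifier C is trained on a different target task (8 vs. 9) from only a few labels. The network size and the task split are illustrative assumptions, not the setup of any particular paper.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

# Source task: recognize digits 0-7 (many labels). Target task: 8 vs. 9 (few labels).
src, tgt = y < 8, y >= 8
X_src, y_src = X[src], y[src]
X_tgt, y_tgt = X[tgt], (y[tgt] == 9).astype(int)

# 1) Train a small network on the source task.
net = MLPClassifier(hidden_layer_sizes=(64,), activation='relu',
                    max_iter=500, random_state=0).fit(X_src, y_src)

# 2) Re-use its first hidden layer as the preprocessor P for the target task.
def hidden(Z):
    return np.maximum(0, Z @ net.coefs_[0] + net.intercepts_[0])   # ReLU activations

rng = np.random.default_rng(0)
idx = rng.permutation(len(X_tgt))
tr, te = idx[:20], idx[20:]                    # only 20 labeled target examples
clf = LogisticRegression(max_iter=1000).fit(hidden(X_tgt[tr]), y_tgt[tr])
print("target-task accuracy with transferred features:",
      round(clf.score(hidden(X_tgt[te]), y_tgt[te]), 3))
```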
Kernel learning [diagram]:
1) Using source task labels (same as or different from the target task), train a preprocessor P and a similarity/kernel S on the source domain.
2) Using target task labels (e.g., recognizing John), train a classifier C on top of P in the target domain; the pipeline is then applied to new target-domain examples (e.g., Emily).
http://clopinet.com/ul
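In the same spirit, here is a rough sketch of similarity (metric/kernel) learning: a linear embedding trained with a contrastive-style objective on pairs from source classes (digits 0 to 7), whose induced similarity can then be applied to target classes (8 and 9) never seen during training. This is a toy stand-in for a Siamese network; the embedding dimension, margin, and learning rate are arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
X = X / 16.0                                     # pixel values scaled to [0, 1]

# Learn a linear embedding L on SOURCE classes (digits 0-7) with a contrastive
# objective: same-class pairs pulled together, different-class pairs pushed apart.
src = y < 8
Xs, ys = X[src], y[src]
rng = np.random.default_rng(0)
L = rng.normal(0, 0.1, (16, X.shape[1]))         # 16-dimensional embedding (arbitrary)
margin, lr = 1.0, 0.05

for step in range(5000):
    i, j = rng.integers(len(Xs), size=2)         # random pair of source examples
    diff = Xs[i] - Xs[j]
    z = L @ diff                                 # difference in embedding space
    d2 = z @ z                                   # squared embedding distance
    if ys[i] == ys[j]:
        grad = 2 * np.outer(z, diff)             # same class: decrease the distance
    elif d2 < margin:
        grad = -2 * np.outer(z, diff)            # different class: push out to the margin
    else:
        continue
    L -= lr * grad

# The learned similarity transfers to TARGET classes (8 and 9) never seen in training.
def similarity(a, b):
    za, zb = L @ a, L @ b
    return np.exp(-np.sum((za - zb) ** 2))       # kernel-style similarity score

tgt = np.flatnonzero(y >= 8)
print(similarity(X[tgt[0]], X[tgt[1]]))
```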
Cool results in cross-task transfer learning
[Results table: source task vs. target tasks; pos = Part-Of-Speech tagging, chunk = Chunking, ner = Named Entity Recognition, srl = Semantic Role Labeling]
NLP (almost) from scratch. Collobert et al., 2011, submitted to JMLR.
http://clopinet.com/ul
Cross-task transfer (resources) • A Survey on Transfer Learning. Pan and Yang. http://www1.i2r.a-star.edu.sg/~jspan/publications/TLsurvey_0822.pdf • Distance metric learning: A comprehensive survey. Yang and Jin. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.91.4732 • Signature Verification using a "Siamese" Time Delay Neural Network. Bromley et al. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.4792 • Learning the kernel matrix with semi-definite programming. Lanckriet et al. http://jmlr.csail.mit.edu/papers/volume5/lanckriet04a/lanckriet04a.pdf • NLP (almost) from scratch. Collobert et al., 2011. http://leon.bottou.org/morefiles/nlp.pdf http://clopinet.com/ul
Taxonomy of transfer learning (recap of the taxonomy shown earlier; adapted from: A survey on transfer learning, Pan and Yang, 2010) http://clopinet.com/ul
Multi-task learning http://clopinet.com/ul
Multi-task learning [diagram]: source task labels (e.g., recognizing the sea) on the source domain and target task labels (e.g., recognizing John) on the target domain are used jointly to train a shared preprocessor P and classifiers C; the pipeline is then applied to new target-domain examples (e.g., Emily). http://clopinet.com/ul
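A tiny sketch of the shared-representation idea behind multi-task learning: two related tasks defined on the same inputs are trained jointly, so the hidden layer must serve both. The tasks (digit parity and magnitude) and the use of scikit-learn's multi-label MLPClassifier are illustrative assumptions, not the method of the original slides.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

# Two related tasks defined on the same inputs:
#   task 1: is the digit even?      task 2: is the digit >= 5?
# Training them jointly as a multi-label problem forces the shared hidden
# layer to encode features useful for both tasks at once.
Y = np.column_stack([(y % 2 == 0), (y >= 5)]).astype(int)

net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                    random_state=0).fit(X, Y)
print(net.predict(X[:3]))             # one 0/1 prediction per task and per example
```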
Cool results in multi-task learning One-Shot Learning with a Hierarchical Nonparametric Bayesian Model, Salakhutdinov-Tenenbaum-Torralba, 2010 http://clopinet.com/ul
Taxonomy of transfer learning (recap of the taxonomy shown earlier; adapted from: A survey on transfer learning, Pan and Yang, 2010) http://clopinet.com/ul
Self-taught learning http://clopinet.com/ul
Self-taught learning [diagram]: unlabeled source-domain data is used to train a preprocessor P; target task labels (e.g., recognizing John) are then used to train a classifier C on top of P in the target domain, and the pipeline is applied to new target-domain examples (e.g., Emily). http://clopinet.com/ul
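A minimal sketch of the self-taught recipe of Raina et al.: learn a sparse code from unlabeled data of unrelated classes (here digits 0 to 7, labels discarded), then represent the scarce labeled target data (8 vs. 9) in that code and train a simple classifier. The dataset, dictionary size, and hyper-parameters are arbitrary stand-ins.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
X = X / 16.0

# Unlabeled source data: digits 0-7 (their labels are simply ignored).
# Target task: 8 vs. 9 with only a handful of labels.
X_unlab = X[y < 8]
X_tgt, y_tgt = X[y >= 8], (y[y >= 8] == 9).astype(int)

# Sparse coding on the unlabeled data
dico = MiniBatchDictionaryLearning(n_components=50, alpha=1.0,
                                   random_state=0).fit(X_unlab)

rng = np.random.default_rng(0)
idx = rng.permutation(len(X_tgt))
tr, te = idx[:20], idx[20:]                    # only 20 labeled target examples
clf = LogisticRegression(max_iter=1000).fit(dico.transform(X_tgt[tr]), y_tgt[tr])
print("accuracy with self-taught features:",
      round(clf.score(dico.transform(X_tgt[te]), y_tgt[te]), 3))
```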
Cool results in self-taught learning
[Table comparing settings by availability of source / target task labels: unsupervised, semi-supervised, multi-task, and self-taught]
Self-taught learning. R. Raina et al., 2007.
http://clopinet.com/ul
Inductive transfer learning (resources) • Multitask learning. R. Caruana. http://www.cs.cornell.edu/~caruana/mlj97.pdf • Learning deep architectures for AI. Y. Bengio. http://www.iro.umontreal.ca/~lisa/pointeurs/TR1312.pdf • Transfer Learning Techniques for Deep Neural Nets. S. M. Gutstein thesis. http://robust.cs.utep.edu/~gutstein/sg_home_files/thesis.pdf • One-Shot Learning with a Hierarchical Nonparametric Bayesian Model. R. Salakhutdinov et al. http://dspace.mit.edu/bitstream/handle/1721.1/60025/MIT-CSAIL-TR-2010-052.pdf?sequence=1 • Self-taught learning. R. Raina et al. http://www.stanford.edu/~rajatr/papers/icml07_SelfTaughtLearning.pdf http://clopinet.com/ul