Online and Batch Learning of Pseudo-Metrics
Shai Shalev-Shwartz, Hebrew University, Jerusalem
Joint work with Yoram Singer, Google Inc., and Andrew Y. Ng, Stanford University
Our Technique • Map instances into a space in which distances correspond to labels
Outline • Distance learning setting • Large margin for distances • An online learning algorithm • Online loss analysis • A dual version • Experiments: • Online - document filtering • Batch - handwritten digit recognition
Problem Setting • Training examples: two instances $x_t, x'_t \in \mathbb{R}^n$ and a similarity label $y_t \in \{+1, -1\}$ ($+1$ = similar, $-1$ = dissimilar) • Hypothesis class: pseudo-metrics of the form $d_A(x, x') = \sqrt{(x - x')^\top A\, (x - x')}$, where $A$ is a symmetric positive semi-definite matrix
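Not on the slides: a minimal sketch of evaluating such a pseudo-metric, assuming a symmetric PSD matrix A (the function name is hypothetical):

```python
import numpy as np

def pseudo_metric(A, x, x_prime):
    """d_A(x, x') = sqrt((x - x')^T A (x - x')) for a symmetric PSD A."""
    u = x - x_prime
    return np.sqrt(u @ A @ u)

# With A = I the pseudo-metric reduces to the Euclidean distance.
A = np.eye(3)
print(pseudo_metric(A, np.array([1.0, 0.0, 0.0]), np.zeros(3)))  # -> 1.0
```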
Large Margin for Pseudo-Metrics • A sample $S$ is $\gamma$-separated w.r.t. a metric $d_A$ and threshold $b$ if, for every pair, $y_t = +1 \Rightarrow d_A^2(x_t, x'_t) \le b - \gamma$ and $y_t = -1 \Rightarrow d_A^2(x_t, x'_t) \ge b + \gamma$
Batch Formulation • Maximize the margin: $\max_{\gamma, A, b}\ \gamma$ s.t. $y_t\,(b - d_A^2(x_t, x'_t)) \ge \gamma$ for all $t$, $\|A\|_F \le 1$, $A \succeq 0$ • Equivalently, minimize the norm: $\min_{A, b}\ \|A\|_F^2$ s.t. $y_t\,(b - d_A^2(x_t, x'_t)) \ge 1$ for all $t$, $A \succeq 0$
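A sketch of the second (min-norm) formulation as a semidefinite program in cvxpy; the use of cvxpy and all names here are illustrative assumptions, not the authors' solver:

```python
import cvxpy as cp
import numpy as np

def batch_pseudo_metric(us, ys):
    """Solve: min ||A||_F^2  s.t.  y_t (b - u_t^T A u_t) >= 1,  A PSD.

    us: difference vectors u_t = x_t - x'_t; ys: labels in {+1, -1}.
    Assumes the sample is separable, so the program is feasible.
    """
    n = us[0].shape[0]
    A = cp.Variable((n, n), PSD=True)   # PSD=True enforces A >= 0
    b = cp.Variable()
    # u^T A u written as the (affine-in-A) trace inner product <A, u u^T>
    cons = [y * (b - cp.sum(cp.multiply(A, np.outer(u, u)))) >= 1
            for u, y in zip(us, ys)]
    cp.Problem(cp.Minimize(cp.sum_squares(A)), cons).solve()
    return A.value, b.value
```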
Pseudo-metric Online Learning Algorithm (POLA) • If $y_t = +1$ we want that $d_A^2(x_t, x'_t) \le b - 1$; if $y_t = -1$ we want that $d_A^2(x_t, x'_t) \ge b + 1$ • For $t = 1, 2, \ldots$: • Get two instances $x_t, x'_t$ • Calculate distance $d_A^2(x_t, x'_t)$ • Predict $\hat{y}_t = \mathrm{sign}(b - d_A^2(x_t, x'_t))$ • Get true label $y_t$ and suffer hinge-loss $\ell_t = \max\{0,\ 1 - y_t\,(b - d_A^2(x_t, x'_t))\}$ • Update matrix $A$ and threshold $b$ (see the sketches below)
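A minimal sketch of one online round under the unit-margin hinge loss above; `pola_round` is a hypothetical name, and the update itself follows the next slide:

```python
import numpy as np

def pola_round(A, b, x, x_prime, y):
    """One round of the online protocol: predict, then suffer hinge loss."""
    u = x - x_prime
    d2 = u @ A @ u                       # squared pseudo-distance
    y_hat = 1 if d2 <= b else -1         # predict "similar" below threshold
    loss = max(0.0, 1.0 - y * (b - d2))  # hinge loss with unit margin
    return y_hat, loss, u
```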
Core Update: Two Projections • Start with the current pair $(A_t, b_t)$ • Each example defines a half-space: the set of all zero-loss matrices and thresholds • $(\hat{A}_t, \hat{b}_t)$ is the projection of $(A_t, b_t)$ onto this half-space • $(A_{t+1}, b_{t+1})$ is the projection of $(\hat{A}_t, \hat{b}_t)$ onto the PSD cone
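A sketch of the two projections under the same assumptions: projecting onto the zero-loss half-space is an additive rank-one correction (step size loss / (||u||^4 + 1), the squared norm of the lifted example), and projecting onto the PSD cone clips negative eigenvalues:

```python
import numpy as np

def pola_update(A, b, u, y, loss):
    """POLA's core update as two successive projections (a sketch)."""
    if loss > 0.0:
        # 1) Projection onto the half-space of zero-loss (A, b) pairs.
        alpha = loss / (np.linalg.norm(u) ** 4 + 1.0)
        A = A - alpha * y * np.outer(u, u)
        b = b + alpha * y
        # 2) Projection onto the PSD cone: zero out negative eigenvalues.
        lam, V = np.linalg.eigh(A)
        A = (V * np.maximum(lam, 0.0)) @ V.T
    return A, b
```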
Online Learning • Goal – minimize cumulative loss • Why Online? • Online processing tasks (e.g. Text Filtering) • Simple to implement • Memory and run-time efficient • Worst-case bounds on the performance • Online to batch conversions
Online Loss Bound • For any sequence of examples with $\|x_t - x'_t\|^2 \le R^2$ and any fixed matrix $A^*$ and threshold $b^*$, the cumulative loss of POLA is bounded by the loss suffered by $(A^*, b^*)$ plus a "complexity" term depending on $\|A^*\|_F$ and $b^*$ • The loss bound does not depend on the dimension
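One plausible reconstruction of the bound's shape for a zero-loss comparator; the exact constants on the slide are not recoverable, so this is an assumption based on the standard projection-style analysis:

```latex
% Assumption: \|x_t - x'_t\|^2 \le R^2 and (A^*, b^*) attains zero loss.
\sum_t \ell_t^2 \;\le\; \bigl(R^4 + 1\bigr)
    \bigl( \|A^* - A_1\|_F^2 + (b^* - b_1)^2 \bigr)
```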
Incorporating Kernels • Matrix $A$ can be written as $A = \sum_{i,j} \alpha_{i,j}\, u_i u_j^\top$, where $u_i = x_i - x'_i$ are differences of previously observed instances • Therefore $d_A^2(x, x')$ depends only on inner products between instances and can be computed with a kernel
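A sketch of the resulting kernelized squared distance, assuming the coefficient matrix alpha and the stored pairs (x_i, x'_i) are available (all names hypothetical):

```python
import numpy as np

def kernel_dist2(alpha, pairs, k, x, x_prime):
    """d_A^2(x, x') for A = sum_ij alpha[i, j] u_i u_j^T, u_i = x_i - x'_i.

    Only inner products <u_i, x - x'> are needed, and each one expands
    into four kernel evaluations.
    """
    def inner(pair):
        xi, xpi = pair
        return k(xi, x) - k(xi, x_prime) - k(xpi, x) + k(xpi, x_prime)

    g = np.array([inner(p) for p in pairs])
    return float(g @ alpha @ g)
```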
Online Experiments • Task: document filtering according to topics • Dataset: Reuters-21578 • 10,000 documents • Documents labeled as relevant or irrelevant • Relevant documents are rare (1%–10% of the entire set) • Algorithms: • POLA • 1-Nearest Neighbor (1-NN) • Perceptron Algorithm • Perceptron Algorithm with Uneven Margins (PAUM) (Li, Zaragoza, Herbrich, Shawe-Taylor, Kandola)
POLA for Document Filtering • Get a document • Calculate the distance to the relevant documents observed so far, using the current matrix • Predict: the document is relevant iff the distance to the closest relevant document is smaller than the current threshold • Get the true label • Update the matrix and threshold (a sketch of this loop follows)
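A sketch of this loop in Python, under the assumption that each incoming document is paired with its closest relevant predecessor, that `pola_update` is the two-projection update sketched earlier, and that the initialization shown is hypothetical:

```python
import numpy as np

def filter_stream(stream, n, pola_update):
    """Flag a document as relevant iff its learned distance to the
    closest previously seen relevant document is below threshold b."""
    A, b = np.eye(n), 2.0                        # assumed initialization
    relevant = []
    for x, y in stream:                          # y in {+1, -1}
        if relevant:
            r = min(relevant, key=lambda v: (x - v) @ A @ (x - v))
            u = x - r
            d2 = u @ A @ u
            y_hat = 1 if d2 <= b else -1         # the filtering decision
            loss = max(0.0, 1.0 - y * (b - d2))  # hinge loss
            A, b = pola_update(A, b, u, y, loss)
        if y == 1:
            relevant.append(x)
    return A, b
```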
Document Filtering Results • Scatter plots compare POLA against 1-NN, the Perceptron, and PAUM (POLA error on the y-axis, the competitor's error on the x-axis) • Each blue point corresponds to one topic • Points beneath the black diagonal line mean that POLA wins
Batch Experiments • Task: handwritten digit recognition • Dataset: MNIST • 45 binary classification problems (all pairs of digits) • 10,000 training examples • 10,000 test examples • Algorithms: k-NN with various metrics (a k-NN sketch follows): • Pseudo-metric learned by POLA • Euclidean distance • Metric induced by Fisher Discriminant Analysis (FDA) • Metric learned by Relevant Component Analysis (RCA) (Bar-Hillel, Hertz, Shental, and Weinshall)
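Since any PSD matrix factors as A = WᵀW, k-NN under the learned pseudo-metric is equivalent to Euclidean k-NN after mapping every point with W. A sketch with scikit-learn; the pipeline is illustrative, not the authors' setup:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_with_learned_metric(A, X_train, y_train, X_test, k=3):
    """k-NN under d_A via the factorization A = W^T W: map every point
    with W and run ordinary Euclidean k-NN in the mapped space."""
    lam, V = np.linalg.eigh(A)
    W = (V * np.sqrt(np.maximum(lam, 0.0))).T   # A = W^T W
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_train @ W.T, y_train)
    return clf.predict(X_test @ W.T)
```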
MNIST Results • Scatter plots compare POLA against the Euclidean distance, FDA, and RCA metrics (POLA error on the y-axis, the competitor's error on the x-axis) • Each blue point corresponds to one binary classification problem • Points beneath the black diagonal line mean that POLA wins • RCA was applied after using PCA as a pre-processing step
Toy Problem • A color-coded matrix of Euclidean distances between pairs of images
Mapping Found by POLA • Our pseudo-metric: $d_A(x, x') = \|W x - W x'\|_2$, where $A = W^\top W$, so POLA implicitly learns a linear mapping $x \mapsto W x$
Summary and Extensions • An online algorithm for learning pseudo-metrics • Formal properties, good experimental results • Extensions: • Alternative regularization schemes to the Frobenius norm • "Learning to learn": learning a metric from one set of classes and applying it to another set of related classes