Online and Batch Learning of Pseudo-Metrics
Shai Shalev-Shwartz, Hebrew University, Jerusalem
Joint work with Yoram Singer, Google Inc., and Andrew Y. Ng, Stanford University
Our Technique • Map instances into a space in which distances correspond to labels
Outline • Distance learning setting • Large margin for distances • An online learning algorithm • Online loss analysis • A dual version • Experiments: • Online - document filtering • Batch - handwritten digit recognition
Problem Setting • Training examples: two instances $x_t, x'_t \in \mathbb{R}^n$ and a similarity label $y_t \in \{+1, -1\}$ ($+1$ = similar, $-1$ = dissimilar) • Hypothesis class: pseudo-metrics of the form $d_A(x, x') = \sqrt{(x - x')^\top A\, (x - x')}$, where $A$ is a symmetric positive semi-definite matrix
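Not on the slides: a minimal sketch of evaluating such a pseudo-metric, assuming a symmetric PSD matrix A (the function name is hypothetical):

```python
import numpy as np

def pseudo_metric(A, x, x_prime):
    """d_A(x, x') = sqrt((x - x')^T A (x - x')) for a symmetric PSD A."""
    u = x - x_prime
    return np.sqrt(u @ A @ u)

# With A = I the pseudo-metric reduces to the Euclidean distance.
A = np.eye(3)
print(pseudo_metric(A, np.array([1.0, 0.0, 0.0]), np.zeros(3)))  # -> 1.0
```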
Large Margin for Pseudo-Metrics • A sample $S$ is $\gamma$-separated w.r.t. a metric $d_A$ and threshold $b$ if, for every pair, $y_t = +1 \Rightarrow d_A^2(x_t, x'_t) \le b - \gamma$ and $y_t = -1 \Rightarrow d_A^2(x_t, x'_t) \ge b + \gamma$
Batch Formulation • Maximize the margin: $\max_{\gamma, A, b}\ \gamma$ s.t. $y_t\,(b - d_A^2(x_t, x'_t)) \ge \gamma$ for all $t$, $\|A\|_F \le 1$, $A \succeq 0$ • Equivalently, minimize the norm: $\min_{A, b}\ \|A\|_F^2$ s.t. $y_t\,(b - d_A^2(x_t, x'_t)) \ge 1$ for all $t$, $A \succeq 0$
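A sketch of the second (min-norm) formulation as a semidefinite program in cvxpy; the use of cvxpy and all names here are illustrative assumptions, not the authors' solver:

```python
import cvxpy as cp
import numpy as np

def batch_pseudo_metric(us, ys):
    """Solve: min ||A||_F^2  s.t.  y_t (b - u_t^T A u_t) >= 1,  A PSD.

    us: difference vectors u_t = x_t - x'_t; ys: labels in {+1, -1}.
    Assumes the sample is separable, so the program is feasible.
    """
    n = us[0].shape[0]
    A = cp.Variable((n, n), PSD=True)   # PSD=True enforces A >= 0
    b = cp.Variable()
    # u^T A u written as the (affine-in-A) trace inner product <A, u u^T>
    cons = [y * (b - cp.sum(cp.multiply(A, np.outer(u, u)))) >= 1
            for u, y in zip(us, ys)]
    cp.Problem(cp.Minimize(cp.sum_squares(A)), cons).solve()
    return A.value, b.value
```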
Pseudo-metric Online Learning Algorithm (POLA) • If $y_t = +1$ we want that $d_A^2(x_t, x'_t) \le b - 1$; if $y_t = -1$ we want that $d_A^2(x_t, x'_t) \ge b + 1$ • For $t = 1, 2, \ldots$: • Get two instances $x_t, x'_t$ • Calculate distance $d_A^2(x_t, x'_t)$ • Predict $\hat{y}_t = \mathrm{sign}(b - d_A^2(x_t, x'_t))$ • Get true label $y_t$ and suffer hinge-loss $\ell_t = \max\{0,\ 1 - y_t\,(b - d_A^2(x_t, x'_t))\}$ • Update matrix $A$ and threshold $b$ (see the sketches below)
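A minimal sketch of one online round under the unit-margin hinge loss above; `pola_round` is a hypothetical name, and the update itself follows the next slide:

```python
import numpy as np

def pola_round(A, b, x, x_prime, y):
    """One round of the online protocol: predict, then suffer hinge loss."""
    u = x - x_prime
    d2 = u @ A @ u                       # squared pseudo-distance
    y_hat = 1 if d2 <= b else -1         # predict "similar" below threshold
    loss = max(0.0, 1.0 - y * (b - d2))  # hinge loss with unit margin
    return y_hat, loss, u
```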
Core Update: Two Projections • Start with the current pair $(A_t, b_t)$ • Each example defines a half-space: the set of all zero-loss matrices and thresholds • $(\hat{A}_t, \hat{b}_t)$ is the projection of $(A_t, b_t)$ onto this half-space • $(A_{t+1}, b_{t+1})$ is the projection of $(\hat{A}_t, \hat{b}_t)$ onto the PSD cone
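A sketch of the two projections under the same assumptions: projecting onto the zero-loss half-space is an additive rank-one correction (step size loss / (||u||^4 + 1), the squared norm of the lifted example), and projecting onto the PSD cone clips negative eigenvalues:

```python
import numpy as np

def pola_update(A, b, u, y, loss):
    """POLA's core update as two successive projections (a sketch)."""
    if loss > 0.0:
        # 1) Projection onto the half-space of zero-loss (A, b) pairs.
        alpha = loss / (np.linalg.norm(u) ** 4 + 1.0)
        A = A - alpha * y * np.outer(u, u)
        b = b + alpha * y
        # 2) Projection onto the PSD cone: zero out negative eigenvalues.
        lam, V = np.linalg.eigh(A)
        A = (V * np.maximum(lam, 0.0)) @ V.T
    return A, b
```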
Online Learning • Goal – minimize cumulative loss • Why Online? • Online processing tasks (e.g. Text Filtering) • Simple to implement • Memory and run-time efficient • Worst-case bounds on the performance • Online to batch conversions
Online Loss Bound • For any sequence of examples with $\|x_t - x'_t\|^2 \le R^2$ and any fixed matrix $A^*$ and threshold $b^*$, the cumulative loss of POLA is bounded by the loss suffered by $(A^*, b^*)$ plus a "complexity" term depending on $\|A^*\|_F$ and $b^*$ • The loss bound does not depend on the dimension
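One plausible reconstruction of the bound's shape for a zero-loss comparator; the exact constants on the slide are not recoverable, so this is an assumption based on the standard projection-style analysis:

```latex
% Assumption: \|x_t - x'_t\|^2 \le R^2 and (A^*, b^*) attains zero loss.
\sum_t \ell_t^2 \;\le\; \bigl(R^4 + 1\bigr)
    \bigl( \|A^* - A_1\|_F^2 + (b^* - b_1)^2 \bigr)
```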
Incorporating Kernels • Matrix $A$ can be written as $A = \sum_{i,j} \alpha_{i,j}\, u_i u_j^\top$, where $u_i = x_i - x'_i$ are differences of previously observed instances • Therefore $d_A^2(x, x')$ depends only on inner products between instances and can be computed with a kernel
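A sketch of the resulting kernelized squared distance, assuming the coefficient matrix alpha and the stored pairs (x_i, x'_i) are available (all names hypothetical):

```python
import numpy as np

def kernel_dist2(alpha, pairs, k, x, x_prime):
    """d_A^2(x, x') for A = sum_ij alpha[i, j] u_i u_j^T, u_i = x_i - x'_i.

    Only inner products <u_i, x - x'> are needed, and each one expands
    into four kernel evaluations.
    """
    def inner(pair):
        xi, xpi = pair
        return k(xi, x) - k(xi, x_prime) - k(xpi, x) + k(xpi, x_prime)

    g = np.array([inner(p) for p in pairs])
    return float(g @ alpha @ g)
```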
Online Experiments • Task: document filtering according to topics • Dataset: Reuters-21578 • 10,000 documents • Documents labeled as relevant or irrelevant • Relevant documents are rare (1%–10% of the entire set) • Algorithms: • POLA • 1-Nearest Neighbor (1-NN) • Perceptron Algorithm • Perceptron Algorithm with Uneven Margins (PAUM) (Li, Zaragoza, Herbrich, Shawe-Taylor, Kandola)
POLA for Document Filtering • Get a document • Calculate the distance to the relevant documents observed so far, using the current matrix • Predict: the document is relevant iff the distance to the closest relevant document is smaller than the current threshold • Get the true label • Update the matrix and threshold (a sketch of this loop follows)
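A sketch of this loop in Python, under the assumption that each incoming document is paired with its closest relevant predecessor, that `pola_update` is the two-projection update sketched earlier, and that the initialization shown is hypothetical:

```python
import numpy as np

def filter_stream(stream, n, pola_update):
    """Flag a document as relevant iff its learned distance to the
    closest previously seen relevant document is below threshold b."""
    A, b = np.eye(n), 2.0                        # assumed initialization
    relevant = []
    for x, y in stream:                          # y in {+1, -1}
        if relevant:
            r = min(relevant, key=lambda v: (x - v) @ A @ (x - v))
            u = x - r
            d2 = u @ A @ u
            y_hat = 1 if d2 <= b else -1         # the filtering decision
            loss = max(0.0, 1.0 - y * (b - d2))  # hinge loss
            A, b = pola_update(A, b, u, y, loss)
        if y == 1:
            relevant.append(x)
    return A, b
```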
Document Filtering Results • Scatter plots compare POLA against 1-NN, the Perceptron, and PAUM (POLA error on the y-axis, the competitor's error on the x-axis) • Each blue point corresponds to one topic • Points beneath the black diagonal line mean that POLA wins
Batch Experiments • Task: handwritten digit recognition • Dataset: MNIST • 45 binary classification problems (all pairs of digits) • 10,000 training examples • 10,000 test examples • Algorithms: k-NN with various metrics (a k-NN sketch follows): • Pseudo-metric learned by POLA • Euclidean distance • Metric induced by Fisher Discriminant Analysis (FDA) • Metric learned by Relevant Component Analysis (RCA) (Bar-Hillel, Hertz, Shental, and Weinshall)
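Since any PSD matrix factors as A = WᵀW, k-NN under the learned pseudo-metric is equivalent to Euclidean k-NN after mapping every point with W. A sketch with scikit-learn; the pipeline is illustrative, not the authors' setup:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_with_learned_metric(A, X_train, y_train, X_test, k=3):
    """k-NN under d_A via the factorization A = W^T W: map every point
    with W and run ordinary Euclidean k-NN in the mapped space."""
    lam, V = np.linalg.eigh(A)
    W = (V * np.sqrt(np.maximum(lam, 0.0))).T   # A = W^T W
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_train @ W.T, y_train)
    return clf.predict(X_test @ W.T)
```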
MNIST Results • Scatter plots compare POLA against the Euclidean distance, FDA, and RCA metrics (POLA error on the y-axis, the competitor's error on the x-axis) • Each blue point corresponds to one binary classification problem • Points beneath the black diagonal line mean that POLA wins • RCA was applied after using PCA as a pre-processing step
Toy Problem • A color-coded matrix of Euclidean distances between pairs of images
Mapping Found by POLA • Our pseudo-metric: $d_A(x, x') = \|W x - W x'\|_2$, where $A = W^\top W$, so POLA implicitly learns a linear mapping $x \mapsto W x$
Summary and Extensions • An online algorithm for learning pseudo-metrics • Formal properties, good experimental results • Extensions: • Alternative regularization schemes to the Frobenius norm • "Learning to learn": learning a metric from one set of classes and applying it to another set of related classes