
A Survey on Distance Metric Learning (Part 2)

A Survey on Distance Metric Learning (Part 2). Gerry Tesauro IBM T.J.Watson Research Center. Acknowledgement. Lecture material shamelessly adapted from the following sources: Kilian Weinberger: “Survey on Distance Metric Learning” slides IBM summer intern talk slides (Aug. 2006)


Presentation Transcript


  1. A Survey on Distance Metric Learning (Part 2) Gerry Tesauro IBM T.J.Watson Research Center

  2. Acknowledgement • Lecture material shamelessly adapted from the following sources: • Kilian Weinberger: • “Survey on Distance Metric Learning” slides • IBM summer intern talk slides (Aug. 2006) • Sam Roweis slides (NIPS 2006 workshop on “Learning to Compare Examples”) • Yann LeCun talk slides (CVPR 2005, 2006)

  3. Outline – Part 2 • Neighbourhood Components Analysis (Goldberger et al.), Metric Learning by Collapsing Classes (Globerson & Roweis) • Metric Learning for Kernel Regression (Weinberger & Tesauro) • Metric learning for RL basis function construction (Keller et al.) • Similarity learning for image processing (LeCun et al.)

  4. Neighborhood Components Analysis: distance metric for visualization and kNN (Goldberger et al. 2004)
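The slide states NCA's purpose but not its objective. The sketch below assumes the standard formulation of Goldberger et al. (2004). Each point $x_i$ picks a neighbour $x_j$ with softmax probability $p_{ij} \propto \exp(-\|Ax_i - Ax_j\|^2)$, with $p_{ii} = 0$. NCA then maximizes the expected number of points whose chosen neighbour shares their class. The helper name `nca_objective` is hypothetical, and the code only evaluates the objective for a given A. It does not optimize A.

```python
import numpy as np

def nca_objective(A, X, labels):
    """Expected number of points classified correctly by NCA's stochastic
    nearest-neighbour rule after mapping x -> A x (hypothetical helper)."""
    Z = X @ A.T                                           # z_i = A x_i
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)                          # p_ii = 0: a point never picks itself
    logits = -sq - np.max(-sq, axis=1, keepdims=True)     # shift rows for numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)                     # softmax over j gives p_ij
    same_class = labels[:, None] == labels[None, :]
    return float(np.sum(P * same_class))

# Two tight, well-separated classes: nearly all probability mass falls on
# same-class neighbours, so the objective is close to n = 4.
X = np.array([[0.0, 0.0], [0.0, 0.1], [10.0, 0.0], [10.0, 0.1]])
labels = np.array([0, 0, 1, 1])
print(nca_objective(np.eye(2), X, labels))
```

With A = 0 every distance collapses and each point picks among the others uniformly, which drives the objective down to 4/3 here. Learning A moves it toward the separated case.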

  5. Weinberger & Tesauro, AISTATS 2007 Metric Learning for Kernel Regression

  6. Killing three birds with one stone: We construct a method for linear dimensionality reduction that generates a meaningful distance metric optimally tuned for distance-based kernel regression

  7. Kernel Regression • Given a training set $\{(x_j, y_j), j = 1,\dots,N\}$ where x is a d-dimensional vector and y is real-valued, estimate the value of a test point $x_i$ by a weighted average of the samples: $\hat{y}_i = \sum_{j \neq i} y_j k_{ij} \big/ \sum_{j \neq i} k_{ij}$, where $k_{ij} = k_D(x_i, x_j)$ is a distance-based kernel function using distance metric D
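Here is a minimal NumPy sketch of the weighted-average estimate in its leave-one-out form, $\hat{y}_i = \sum_{j \neq i} y_j k_{ij} / \sum_{j \neq i} k_{ij}$. It assumes a Gaussian kernel with σ = 1 on the squared distance $\|A(x_i - x_j)\|^2$. The function name `kernel_regression_loo` is hypothetical.

```python
import numpy as np

def kernel_regression_loo(X, y, A):
    """Leave-one-out kernel regression estimates y_hat_i, using
    k_ij = exp(-||A (x_i - x_j)||^2) (Gaussian kernel, sigma = 1)."""
    Z = X @ A.T                                           # map every sample through A
    D = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    K = np.exp(-D)
    np.fill_diagonal(K, 0.0)                              # x_i is excluded from its own estimate
    return K @ y / K.sum(axis=1)                          # row-normalized weighted average

# Three points on a line: the middle one sees equal-weight neighbours on both sides.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])
print(kernel_regression_loo(X, y, np.eye(1)))
```

The division fails if every kernel value in a row underflows to zero. A practical version would need a guard for points that lie far from all others.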

  8. Choice of Kernel • Many functional forms for $k_{ij}$ can be used in MLKR; our empirical work uses the Gaussian kernel $k_{ij} \propto \exp\!\big(-D(x_i, x_j)/\sigma^2\big)$, where σ is a kernel width parameter (we can set σ = 1 w.l.o.g., since we learn D) • The resulting softmax regression estimate is similar to Roweis' softmax classifier
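The claim that σ can be fixed to 1 can be checked numerically. A Gaussian kernel of width σ under metric A equals a width-1 kernel under the rescaled matrix A/σ, so learning A absorbs σ. Any constant prefactor on the kernel cancels in the weighted average, so the sketch omits it. The helper name `gaussian_kernel` and the random test values are illustrative only.

```python
import numpy as np

def gaussian_kernel(xi, xj, A, sigma=1.0):
    """Unnormalized Gaussian kernel on D(xi, xj) = ||A (xi - xj)||^2.
    Constant prefactors are dropped: they cancel in the weighted average."""
    d = A @ (xi - xj)
    return float(np.exp(-(d @ d) / sigma**2))

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
xi, xj = rng.normal(size=3), rng.normal(size=3)
sigma = 2.5

# Width sigma under A equals width 1 under A / sigma.
k_wide = gaussian_kernel(xi, xj, A, sigma=sigma)
k_unit = gaussian_kernel(xi, xj, A / sigma)
print(k_wide, k_unit)
```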

  9. Distance Metric for Nearest Neighbor Regression • Learn a linear transformation that allows us to estimate the value of a test point from its nearest neighbors

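  10. Mahalanobis Metric • The distance function is a pseudo-Mahalanobis metric, $D(x_i, x_j) = \|A(x_i - x_j)\|^2 = (x_i - x_j)^\top M (x_i - x_j)$ with $M = A^\top A$ (generalizes Euclidean distance, which is the case A = I)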
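This sketch checks the equivalence $\|A(x_i - x_j)\|^2 = (x_i - x_j)^\top M (x_i - x_j)$ for $M = A^\top A$. It also shows why the metric is only "pseudo": when A is rank-deficient, a nonzero direction in its null space has length zero. The helper names and the 2 × 4 shape of A are illustrative assumptions.

```python
import numpy as np

def dist_A(xi, xj, A):
    """Squared distance ||A (xi - xj)||^2 under the linear map A."""
    d = A @ (xi - xj)
    return float(d @ d)

def dist_M(xi, xj, M):
    """Squared distance (xi - xj)^T M (xi - xj) for a PSD matrix M."""
    d = xi - xj
    return float(d @ M @ d)

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 4))          # rectangular, so rank(A) = 2 < 4
M = A.T @ A                          # positive semidefinite by construction
xi, xj = rng.normal(size=4), rng.normal(size=4)
print(dist_A(xi, xj, A), dist_M(xi, xj, M))   # equal up to rounding

# The last right singular vector of A lies in its null space, so moving
# along it changes x but leaves D unchanged.
v = np.linalg.svd(A)[2][-1]
print(dist_A(xi, xi + v, A))
```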

  11. General Metric Learning Objective • Find a parameterized distance function $D_\theta$ that minimizes the total leave-one-out cross-validation loss, e.g. for regression $\mathcal{L} = \sum_i (y_i - \hat{y}_i)^2$ • e.g. params θ = elements $A_{ij}$ of the matrix A • Since we're solving for A rather than M, the optimization is non-convex → use gradient descent
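Here is a minimal sketch of the objective for the regression case. It assumes a squared-error loss $\mathcal{L}(A) = \sum_i (y_i - \hat{y}_i)^2$ over leave-one-out Gaussian-kernel estimates, with σ = 1 and $D = \|A(x_i - x_j)\|^2$. The name `mlkr_loss` is hypothetical. Because the loss is written directly in terms of A, a gradient-based optimizer can update the entries $A_{ij}$ without any constraint keeping M positive semidefinite.

```python
import numpy as np

def mlkr_loss(A, X, y):
    """Leave-one-out squared error sum_i (y_i - y_hat_i)^2, where y_hat_i is the
    Gaussian-kernel (sigma = 1) weighted average of the other samples."""
    Z = X @ A.T
    D = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    K = np.exp(-D)
    np.fill_diagonal(K, 0.0)                 # leave-one-out: drop j = i
    y_hat = K @ y / K.sum(axis=1)
    return float(np.sum((y - y_hat) ** 2))

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])
print(mlkr_loss(np.eye(1), X, y))
```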

  12. Gradient Computation • The gradient of the loss with respect to A is expressed in terms of the difference vectors $x_{ij} = x_i - x_j$ • For a fast implementation: • Don't sum over all i-j pairs; only go up to ~1000 nearest neighbors for each sample i • Maintain the nearest neighbors in a heap-tree structure, and update the heap tree every 15 gradient steps • Ignore sufficiently small values of $k_{ij}$ ($< e^{-34}$) • Even better data structures: cover trees, k-d trees
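The following is our own chain-rule derivation, assuming σ = 1 and the squared leave-one-out loss $\mathcal{L} = \sum_i (\hat{y}_i - y_i)^2$. It gives $\partial\mathcal{L}/\partial A = 4A \sum_i \sum_{j \neq i} (\hat{y}_i - y_i)\, p_{ij}\, (\hat{y}_i - y_j)\, x_{ij} x_{ij}^\top$, with $p_{ij} = k_{ij} / \sum_{l \neq i} k_{il}$. The sketch computes this over all pairs, without the neighbour-truncation and heap tricks listed on the slide, and checks it against central finite differences. The function names are hypothetical.

```python
import numpy as np

def mlkr_loss_and_grad(A, X, y):
    """Squared leave-one-out loss and its analytic gradient dL/dA,
    with k_ij = exp(-||A (x_i - x_j)||^2) (sigma = 1)."""
    diffs = X[:, None, :] - X[None, :, :]              # x_ij = x_i - x_j, shape (n, n, d)
    Zd = diffs @ A.T                                    # A x_ij, shape (n, n, r)
    K = np.exp(-np.sum(Zd ** 2, axis=-1))               # k_ij
    np.fill_diagonal(K, 0.0)                            # leave-one-out: drop j = i
    P = K / K.sum(axis=1, keepdims=True)                # p_ij = k_ij / sum_l k_il
    y_hat = P @ y
    loss = float(np.sum((y_hat - y) ** 2))
    # W_ij = (y_hat_i - y_i) * p_ij * (y_hat_i - y_j)
    W = (y_hat - y)[:, None] * P * (y_hat[:, None] - y[None, :])
    S = np.einsum("ij,ijk,ijl->kl", W, diffs, diffs)    # sum_ij W_ij x_ij x_ij^T
    return loss, 4.0 * A @ S

def numeric_grad(A, X, y, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        E = np.zeros_like(A)
        E[idx] = eps
        G[idx] = (mlkr_loss_and_grad(A + E, X, y)[0]
                  - mlkr_loss_and_grad(A - E, X, y)[0]) / (2 * eps)
    return G

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.sin(X[:, 0])
A = 0.5 * rng.normal(size=(2, 3))        # rectangular A: learns a 2-D projection
loss, G = mlkr_loss_and_grad(A, X, y)
print(np.max(np.abs(G - numeric_grad(A, X, y))))   # should be tiny
```

The dense sum costs O(n²d²), which is acceptable for a gradient check like this. The slide's tricks (about 1000 nearest neighbours per point, skipping negligible $k_{ij}$) are what make it scale.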

  13. Learned Distance Metric example [figure: the region D < 1 under the original Euclidean metric vs. under the learned metric]

  14. “Twin Peaks” test • Training: n = 8000 • We added 3 dimensions with 1000% noise • We rotated 5 dimensions randomly
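The test construction can be sketched as follows, under loudly stated assumptions. The slide does not give the target function, the input range, or exactly how "1000% noise" is defined. Here the target is a hypothetical product of sines on two inputs. The noise standard deviation is assumed to be 10× that of the signal inputs, and all 5 dimensions are rotated. The rotation is a random orthogonal matrix from a QR decomposition, with signs fixed so that det = +1.

```python
import numpy as np

def make_twin_peaks_like(n=8000, noise_scale=10.0, seed=0):
    """Hypothetical reconstruction of the setup: 2 signal inputs,
    3 added noise inputs, then a random rotation of all 5 dimensions."""
    rng = np.random.default_rng(seed)
    signal = rng.uniform(-np.pi, np.pi, size=(n, 2))      # assumed input range
    y = np.sin(signal[:, 0]) * np.sin(signal[:, 1])       # assumed target function
    # "1000% noise" read as noise std = 10 x the signal inputs' std (assumption)
    noise = noise_scale * signal.std() * rng.normal(size=(n, 3))
    X = np.hstack([signal, noise])
    # Random rotation: QR of a Gaussian matrix, sign-corrected, with det forced to +1.
    Q, R = np.linalg.qr(rng.normal(size=(5, 5)))
    Q = Q * np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]
    return X @ Q.T, y, Q

X, y, Q = make_twin_peaks_like()
# Undo the rotation to confirm that the last 3 coordinates are the noisy ones.
X_orig = X @ Q
print(X_orig[:, 2:].std() / X_orig[:, :2].std())   # roughly noise_scale
```

After the rotation, no single input coordinate is pure signal. A method that only rescales individual features cannot separate signal from noise here, while a full linear map A can.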

  15. Input Variance [figure; panel labels: Noise, Signal]

  16. Test data

  17. Test data

  18. Output Variance [figure; panel labels: Signal, Noise]

  19. DimReduction with MLKR • FG-NET face data: 82 persons, 984 face images annotated with age

  20. DimReduction with MLKR • FG-NET face data: 82 persons, 984 face images annotated with age

  21. DimReduction with MLKR • PowerManagement data (d = 21) • Force A to be rectangular • Project onto eigenvectors of A • Allows visualization of the data
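A rectangular A (r × d with r < d) maps the data directly into r dimensions. A rectangular matrix has no eigenvectors of its own, so this sketch assumes one reading of "project onto eigenvectors of A": use the top eigenvectors of $M = A^\top A$, which are the right singular vectors of A. Scaling them by the square roots of the eigenvalues makes 2-D Euclidean distances reproduce the learned distances exactly when rank(A) ≤ 2. The 2 × 21 matrix here is a random stand-in for a learned A, not a result from the slides.

```python
import numpy as np

def project_2d(X, A):
    """Project X onto the top-2 eigenvectors of M = A^T A, scaled by
    sqrt(eigenvalue), for plotting (hypothetical helper)."""
    M = A.T @ A
    evals, evecs = np.linalg.eigh(M)                  # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:2]                 # indices of the 2 largest
    W = evecs[:, top] * np.sqrt(np.clip(evals[top], 0.0, None))
    return X @ W

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 21))    # stand-in for a learned 2 x 21 matrix
X = rng.normal(size=(50, 21))   # stand-in data with d = 21
Y = project_2d(X, A)            # 50 x 2 coordinates, ready for a scatter plot
i, j = 0, 1
print(np.sum((Y[i] - Y[j]) ** 2), np.sum((A @ (X[i] - X[j])) ** 2))  # equal up to rounding
```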

  22. Robot arm results (8-dim and 32-dim): regression error

  23. Unity Data Center Prototype [figure: a Resource Arbiter allocating 8 xSeries servers among App Managers; labels include Trade3 (WebSphere 5.1, DB2), Batch, Demand (HTTP req/sec), Value(#srvrs), Value(RT), SLA, Maximize Total SLA Revenue, 5 sec] • Objective: Learn long-range resource value estimates for each application manager • State Variables (~48): • Arrival rate • ResponseTime • QueueLength • iatVariance • rtVariance • Action: # of servers allocated by the Arbiter • Reward: SLA(Resp. Time) (Tesauro, AAAI 2005; Tesauro et al., ICAC 2006)

  24. Power & Performance Management • Objective: Manage systems to multiple objectives: minimize Resp. Time and minimize Power Usage • State Variables (21): • Power Cap • Power Usage • CPU Utilization • Temperature • # of requests arrived • Workload intensity (# Clients) • Response Time • Action: Power Cap • Reward: SLA(Resp. Time) – Power Usage (Kephart et al., ICAC 2007)

  25. IBM Regression Results TEST ERROR MLKR 14/47 3/5 10/22

  26. Metric Learning for RL basis function construction (Keller et al., ICML 2006) • RL dataset of state-action-reward tuples $\{(s_i, a_i, r_i), i = 1,\dots,N\}$

  27. Value Iteration • Define an iterative “bootstrap” calculation: $V_{k+1}(s) = \max_a \big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V_k(s') \big]$ • Each round of VI must iterate over all states in the state space • Try to speed this up using state aggregation (Bertsekas & Castanon, 1989) • Idea: Use NCA to aggregate states: • project states into a lower-dimensional representation, keeping states with similar Bellman error close together • use the projected states to define a set of basis functions $\{\phi_i\}$ • learn a linear value function over the basis functions: $V = \sum_i \theta_i \phi_i$
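The cost this slide points to is easy to see in a tabular sketch: every iteration applies the standard Bellman optimality backup to all states. This minimal NumPy version assumes a known model, with a transition array `P` and a reward table `R` (both hypothetical names and layouts). It does not implement the NCA-based state aggregation of Keller et al. It only shows the baseline that aggregation is meant to speed up.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Tabular value iteration.
    P[a, s, t]: probability of moving from state s to state t under action a.
    R[s, a]:    expected immediate reward for taking action a in state s."""
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        # Bellman backup for every state at once: one full sweep of the state space.
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Two absorbing states with reward 1 per step: V = 1 / (1 - gamma) = 10.
P = np.array([[[1.0, 0.0], [0.0, 1.0]]])
R = np.ones((2, 1))
print(value_iteration(P, R))
```

With state aggregation, the sweep runs over a much smaller set of aggregated states or basis-function weights θ instead of every individual state.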

  28. Chopra et al. 2005: Similarity metric for image verification. Problem: Given a pair of face images, decide whether they are from the same person.

  29. Chopra et al. 2005: Similarity metric for image verification. Problem: Given a pair of face images, decide whether they are from the same person. Too difficult for a linear mapping!
