Multi-Label Prediction via Compressed Sensing
By Daniel Hsu, Sham M. Kakade, John Langford, Tong Zhang (NIPS 2009)
Presented by: Lingbo Li, ECE, Duke University, 01-22-2010
* Some notes are directly copied from the original paper.
Outline
• Introduction
• Preliminaries
• Learning Reduction
• Compression and Reconstruction
• Empirical Results
• Conclusion
Introduction
• Large database of images.
• Goal: predict who or what is in a given image.
• Samples: images x with corresponding label vectors y ∈ {0,1}^d, where d is the total number of entities in the whole database.
• One-against-all algorithm: learn a binary predictor for each label (class). Computation is expensive when d is large (e.g., the image data used later contains roughly 22k unique labels).
• Assume the output vector y is sparse.
Introduction
[Illustration: a d-dimensional 0/1 label vector with only a few entries equal to 1.]
Compressed sensing: any sparse vector y can, with high probability, be compressed to a dimension logarithmic in d and still be reconstructed exactly from the compressed measurements.
Main idea: “Learn to predict compressed label vectors, and then use sparse reconstruction algorithm to recover uncompressed labels from these predictions”
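A toy numerical illustration of the compression step (a sketch with made-up dimensions, not code from the paper): a d-dimensional label vector with only k non-zero entries is mapped to an m-dimensional vector with m on the order of k·log d.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 10_000, 6                        # label-space dimension, sparsity
m = 4 * k * int(np.ceil(np.log2(d)))    # compressed dimension, O(k log d)

y = np.zeros(d)
y[rng.choice(d, size=k, replace=False)] = 1.0   # sparse 0/1 label vector

A = rng.standard_normal((m, d)) / np.sqrt(m)    # random compression matrix
z = A @ y                                       # compressed label vector

print(f"{d} -> {m}")   # y is recoverable from z by a sparse reconstruction algorithm
```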
Preliminaries
• X: input space;
• Y: output (label) space, where Y ⊆ {0,1}^d;
• Training data: samples {(x_1, y_1), …, (x_n, y_n)} from X × Y;
• Goal: learn a predictor F: X → R^d with low mean-squared error E_x ||F(x) − E[y|x]||².
Assume
• d is very large;
• the expected value E[y|x] is sparse, with only a few non-zero entries.
Learning reduction
• Linear compression function A: R^d → R^m, where m ≪ d;
• Goal: learn a predictor H: X → R^m of the compressed labels.
Original problem: predict the label y with the predictor F, from samples (x, y), to minimize E_x ||F(x) − E[y|x]||².
Reduced problem: predict the compressed label Ay with the predictor H, from compressed samples (x, Ay), to minimize E_x ||H(x) − A E[y|x]||².
Reduction: training and prediction
Training: compress each label vector y_i to Ay_i and learn the regressor H on the compressed samples {(x_i, Ay_i)}.
Prediction: for a new input x, compute H(x) and apply a reconstruction algorithm R to obtain a sparse estimate of the label vector.
Reconstruction algorithm R: if H(x) is close to A E[y|x], then R(H(x)) should be close to E[y|x].
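A minimal end-to-end sketch of the reduction on synthetic data, assuming ridge regression as the regressor H and scikit-learn's OMP as the reconstruction R (the paper allows any regressor and any valid reconstruction algorithm; all names and dimensions here are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge, OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n, p, d, k = 2000, 50, 512, 5            # samples, features, labels, sparsity
m = 4 * k * int(np.ceil(np.log2(d)))     # compressed dimension, O(k log d)

# Synthetic multi-label data: sparse 0/1 label vectors that depend linearly on x.
X = rng.standard_normal((n, p))
M = np.zeros((d, p))
active = rng.choice(d, size=40, replace=False)
M[active] = rng.standard_normal((40, p)) / np.sqrt(p)
Y = (X @ M.T > 1.5).astype(float)        # only a few labels fire per example

A = rng.standard_normal((m, d)) / np.sqrt(m)   # compression matrix

# Training: regress the compressed labels Z = Y A^T on the inputs X.
Z = Y @ A.T
H = Ridge(alpha=1.0).fit(X, Z)

# Prediction: z_hat = H(x), then reconstruct a k-sparse label vector from z_hat.
def predict_labels(x, n_labels=k):
    z_hat = H.predict(x.reshape(1, -1)).ravel()
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_labels, fit_intercept=False)
    omp.fit(A, z_hat)                     # solve z_hat ~ A y with sparse y
    return np.flatnonzero(omp.coef_)      # indices of predicted labels

print("predicted:", predict_labels(X[0]), "true:", np.flatnonzero(Y[0]))
```

Note that the learning step only ever touches the m regression targets, never the d-dimensional label space directly; that is where the computational saving comes from.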
Compression Functions
Examples of valid compression functions: standard random-projection constructions, e.g.,
• random matrices with i.i.d. Gaussian entries;
• a random subset of rows of the Hadamard matrix (the construction used in the experiments below).
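A sketch of the two constructions above (helper names are mine; the Hadamard-based one mirrors the choice described in the experiments):

```python
import numpy as np
from scipy.linalg import hadamard

def gaussian_compression(m, d, rng):
    """Random matrix with i.i.d. N(0, 1/m) entries."""
    return rng.standard_normal((m, d)) / np.sqrt(m)

def subsampled_hadamard(m, d, rng):
    """m randomly chosen rows of the d x d Hadamard matrix (d must be a power of 2)."""
    H = hadamard(d).astype(float)
    rows = rng.choice(d, size=m, replace=False)
    return H[rows] / np.sqrt(m)

rng = np.random.default_rng(0)
A = subsampled_hadamard(m=64, d=1024, rng=rng)   # 1024 labels -> 64 measurements
```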
Reconstruction Algorithms
Examples of valid reconstruction algorithms: iterative and greedy algorithms such as
• Orthogonal Matching Pursuit (OMP)
• Forward-Backward Greedy (FoBa)
• Compressive Sampling Matching Pursuit (CoSaMP)
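A from-scratch sketch of OMP, the simplest of the algorithms listed above: greedily pick the column of A most correlated with the current residual, then re-fit by least squares on the selected support (a simplified version; FoBa and CoSaMP refine this greedy scheme):

```python
import numpy as np

def omp(A, z, k):
    """Return a k-sparse y_hat such that A @ y_hat approximates z."""
    m, d = A.shape
    support = []
    residual = z.astype(float)
    coef = np.zeros(0)
    for _ in range(k):
        corr = np.abs(A.T @ residual)
        if support:
            corr[support] = -np.inf              # do not re-select chosen columns
        support.append(int(np.argmax(corr)))
        A_s = A[:, support]
        coef, *_ = np.linalg.lstsq(A_s, z, rcond=None)   # least squares on the support
        residual = z - A_s @ coef
    y_hat = np.zeros(d)
    y_hat[support] = coef
    return y_hat
```

In the reduction, z is the predicted compressed vector H(x) and y_hat is the recovered estimate of the label vector.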
General Robustness Guarantees
Sparsity error is defined as sperr_k(v) = ||v − v_k||², where v_k is the best k-sparse approximation of v.
The guarantee: the error of the reconstructed predictions is bounded in terms of the regression error of H plus the sparsity error of E[y|x].
What if the reduction creates a problem harder to solve than the original problem?
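A small helper making the definition concrete (hypothetical names): the best k-sparse approximation keeps the k largest-magnitude entries, and the sparsity error is the squared norm of what is discarded.

```python
import numpy as np

def best_k_sparse(v, k):
    """Zero out all but the k largest-magnitude entries of v (k >= 1)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def sparsity_error(v, k):
    """sperr_k(v) = ||v - v_k||^2, with v_k the best k-sparse approximation of v."""
    return float(np.sum((v - best_k_sparse(v, k)) ** 2))
```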
Linear Prediction
• If there is a perfect linear predictor of E[y|x], i.e., E[y|x] = Mx for some matrix M, then there is a perfect linear predictor of the compressed expectation A E[y|x], namely (AM)x.
So in the linear setting, the reduction does not make the problem harder.
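A quick numerical check of this remark on random matrices (purely illustrative): if E[y|x] = Mx, then AM predicts A E[y|x] with zero error.

```python
import numpy as np

rng = np.random.default_rng(0)
p, d, m = 20, 300, 40
M = rng.standard_normal((d, p))    # perfect linear predictor of E[y|x]
A = rng.standard_normal((m, d))    # compression matrix
x = rng.standard_normal(p)

assert np.allclose(A @ (M @ x), (A @ M) @ x)   # compressed predictor is simply A M
```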
Experimental Results
• Experiment 1: image data (collected by the ESP Game). 65k images, 22k unique labels; keep the 1k most frequent labels (the least frequent occurs 39 times, the most frequent about 12k times); 4 labels per image on average; half of the data for training and half for testing.
• Experiment 2: text data (collected from http://delicious.com/). 16k labeled web pages, 983 unique labels (the least frequent occurs 21 times, the most frequent about 6500 times); 19 labels per web page on average; half of the data for training and half for testing.
• Compression function A: select m random rows of the Hadamard matrix.
• Test greedy and iterative reconstruction algorithms: OMP, FoBa, CoSaMP, and Lasso.
• Use correlation decoding (CD) as a baseline method for comparison (see the sketch below).
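A hedged sketch of the correlation-decoding baseline, assuming CD ranks labels by the correlation between the columns of A and the predicted compressed vector H(x) and outputs the top-scoring ones (function name is mine):

```python
import numpy as np

def correlation_decode(A, z_hat, n_labels):
    """Indices of the n_labels columns of A most correlated with z_hat = H(x)."""
    scores = A.T @ z_hat
    return np.argsort(-scores)[:n_labels]
```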
Experimental Results
[Plots: precision of the different reconstruction algorithms. Top two panels: image data; bottom panel: text data.]
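The precision measure behind these plots can be sketched as follows, assuming it is computed over the top-k predicted labels per example (the exact evaluation protocol is described in the paper):

```python
import numpy as np

def precision_at_k(scores, true_labels, k):
    """scores: (n, d) predicted label scores; true_labels: (n, d) 0/1 matrix."""
    top_k = np.argsort(-scores, axis=1)[:, :k]            # k highest-scoring labels per example
    hits = np.take_along_axis(true_labels, top_k, axis=1)
    return float(hits.mean())
```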
Conclusion
• Application of compressed sensing to the multi-label prediction problem with output sparsity;
• an efficient reduction in which the number of regression problems is only logarithmic in the number of labels;
• robustness guarantees from the compressed case back to the original case, and vice versa in the linear prediction setting.