
Self-taught Learning: Transfer Learning from Unlabeled Data






Presentation Transcript


  1. Self-taught Learning: Transfer Learning from Unlabeled Data. Rajat Raina, Honglak Lee, Roger Grosse, Alexis Battle, Chaitanya Ekanadham, Helen Kwong, Benjamin Packer, Narut Sereewattanawoot, Andrew Y. Ng. Stanford University.

  2. The “one learning algorithm” hypothesis • There is some evidence that the human brain uses essentially the same algorithm to understand many different input modalities. • Example: ferret experiments, in which the “input” for vision was plugged into the auditory part of the brain, and the auditory cortex learns to “see.” (Roe et al., 1992; Hawkins & Blakeslee, 2004)

  3. The “one learning algorithm” hypothesis • There is some evidence that the human brain uses essentially the same algorithm to understand many different input modalities. • Example: ferret experiments, in which the “input” for vision was plugged into the auditory part of the brain, and the auditory cortex learns to “see.” • If we could find this one learning algorithm, we would be done. (Finally!) (Roe et al., 1992; Hawkins & Blakeslee, 2004)

  4. Finding a deep learning algorithm • If the brain really does use one learning algorithm, it would suffice to: find a learning algorithm for a single layer, and show that it can build a small number of layers. • We evaluate our algorithms: against biology (e.g., sparse RBMs for V2; poster yesterday, Lee et al.), and on applications (this talk).

  5. Supervised learning • Train and test on labeled examples (Cars vs. Motorcycles). • Supervised learning algorithms may not work well with limited labeled data.

  6. Learning in humans • Your brain has 10^14 synapses (connections). • You will live for 10^9 seconds. • If each synapse requires 1 bit to parameterize, you need to “learn” 10^14 bits in 10^9 seconds. • Or, 10^5 bits per second. • Human learning is largely unsupervised, and uses readily available unlabeled data. (Geoffrey Hinton, personal communication)

  7. Supervised learning • Train and test on labeled examples (Cars vs. Motorcycles).

  8. “Brain-like” Learning • Train and test on Cars vs. Motorcycles, now with additional unlabeled images (randomly downloaded from the Internet).

  9. “Brain-like” Learning • Labeled Digits + unlabeled English characters? • Labeled Webpages + unlabeled newspaper articles? • Labeled Russian Speech + unlabeled English speech?

  10. “Self-taught Learning” • Labeled Digits + unlabeled English characters? • Labeled Webpages + unlabeled newspaper articles? • Labeled Russian Speech + unlabeled English speech?

  11. Recent history of machine learning • 20 years ago: Supervised learning. • 10 years ago: Semi-supervised learning. • 10 years ago: Transfer learning. • Next: Self-taught learning? [Figure: labeled Cars and Motorcycles; related labeled classes (Bus, Tractor, Aircraft, Helicopter, Motorcycle, Car); unlabeled natural scenes.]

  12. Self-taught Learning • Labeled examples and unlabeled examples (notation below). • The unlabeled and labeled data: need not share labels y; need not share a generative distribution. • Advantage: such unlabeled data is often easy to obtain.
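The notation for the labeled and unlabeled sets was an image on this slide and did not survive transcription. The following is a reconstruction in the notation of the accompanying ICML 2007 paper (the symbols m, k, x_l, x_u are taken from that paper, not from this transcript):

    \text{Labeled examples: } \big\{ (x_l^{(1)}, y^{(1)}), \ldots, (x_l^{(m)}, y^{(m)}) \big\}, \qquad x_l^{(i)} \in \mathbb{R}^{n}
    \text{Unlabeled examples: } \big\{ x_u^{(1)}, \ldots, x_u^{(k)} \big\}, \qquad x_u^{(i)} \in \mathbb{R}^{n}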

  13. A self-taught learning algorithm • Overview: represent each labeled or unlabeled input x as a sparse linear combination of “basis vectors” b. • Example: x = 0.8 * b_87 + 0.3 * b_376 + 0.5 * b_411 (each basis vector shown as an image patch on the slide).

  14. A self-taught learning algorithm • Key steps: 1) Learn good bases b using the unlabeled data. 2) Use these learnt bases to construct “higher-level” features for the labeled data. 3) Apply a standard supervised learning algorithm on these features. • Example: x = 0.8 * b_87 + 0.3 * b_376 + 0.5 * b_411.

  15. Learning the bases: Sparse coding • Given only unlabeled data, we find good bases b using sparse coding, minimizing a reconstruction error plus a sparsity penalty (objective reconstructed below). • (Efficient algorithms: Lee et al., NIPS 2006.) • [Details: an extra normalization constraint on the bases b is required.]
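The objective itself was an image on the slide. Here is a hedged reconstruction following the sparse coding formulation of Lee et al. (NIPS 2006) as used in the ICML 2007 paper; the activation symbols a^{(i)} and the sparsity weight β are taken from that paper, not from this transcript:

    \min_{b,\, a} \;\; \sum_{i} \Big\| x_u^{(i)} - \sum_{j} a_j^{(i)} b_j \Big\|_2^{2}
        \;+\; \beta \sum_{i} \big\| a^{(i)} \big\|_1
        \qquad \text{s.t. } \| b_j \|_2 \le 1 \;\; \forall j
    % first term: reconstruction error; second term: sparsity penalty;
    % the norm constraint on b is the extra normalization mentioned above.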

  16. Example bases • Bases learnt on natural images look like “edges.” • Bases learnt on handwritten characters look like “strokes.”

  17. Constructing features • Using the learnt bases b, compute features for the examples x_l from the classification task by solving the same reconstruction-error-plus-sparsity-penalty problem over the activations, with the bases held fixed. • Finally, learn a classifier using a standard supervised learning algorithm (e.g., SVM) over these features. • Example: x_l = 0.8 * b_87 + 0.3 * b_376 + 0.5 * b_411.
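A minimal end-to-end sketch of this pipeline in Python, using scikit-learn's dictionary learning and Lasso as stand-ins for the sparse coding solver of Lee et al. (NIPS 2006); the synthetic data arrays and the hyperparameters n_bases and beta are illustrative assumptions, not values from the talk:

    import numpy as np
    from sklearn.decomposition import DictionaryLearning
    from sklearn.linear_model import Lasso
    from sklearn.svm import LinearSVC

    # Illustrative data: each row is one example, e.g. a flattened image patch.
    rng = np.random.RandomState(0)
    X_unlabeled = rng.randn(1000, 64)
    X_train, y_train = rng.randn(40, 64), rng.randint(0, 2, 40)
    X_test = rng.randn(10, 64)

    n_bases, beta = 128, 0.1  # number of basis vectors and sparsity weight (assumed values)

    # Step 1: learn bases b from unlabeled data.
    dico = DictionaryLearning(n_components=n_bases, alpha=beta, max_iter=50, random_state=0)
    dico.fit(X_unlabeled)
    bases = dico.components_          # shape (n_bases, n_dims); rows are the b_j

    # Step 2: for each labeled example x_l, solve an L1-penalized reconstruction
    # with the bases held fixed; the resulting activations are the new features.
    # (sklearn's Lasso objective differs from the slide's by a constant scaling.)
    def sparse_features(X, bases, beta):
        lasso = Lasso(alpha=beta, max_iter=5000)
        feats = np.empty((X.shape[0], bases.shape[0]))
        for i, x in enumerate(X):
            lasso.fit(bases.T, x)     # reconstruct x from the columns b_j
            feats[i] = lasso.coef_    # sparse activations a for this example
        return feats

    train_feats = sparse_features(X_train, bases, beta)
    test_feats = sparse_features(X_test, bases, beta)

    # Step 3: standard supervised learning (here a linear SVM) on the new features.
    clf = LinearSVC().fit(train_feats, y_train)
    print(clf.predict(test_feats))

With real data the random arrays would be replaced by actual image patches (or document/audio vectors), but the three steps and their ordering are exactly those listed on the slide.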

  18. Image classification • Large image (Platypus from the Caltech101 dataset) • Feature visualization

  19. Image classification • Platypus image (Caltech101 dataset) • Feature visualization

  20. Image classification • Platypus image (Caltech101 dataset) • Feature visualization

  21. Image classification • Platypus image (Caltech101 dataset) • Feature visualization

  22. Image classification (Caltech101, 15 labeled images per class) • Other reported results: Fei-Fei et al., 2004: 16%; Berg et al., 2005: 17%; Holub et al., 2005: 40%; Serre et al., 2005: 35%; Berg et al., 2005: 48%; Zhang et al., 2006: 59%; Lazebnik et al., 2006: 56%. • Self-taught learning: 36.0% error reduction.

  23. Character recognition (Digits, Handwritten English, English font) • Handwritten English classification (20 labeled images per handwritten character), bases learnt on digits: 8.2% error reduction. • English font classification (20 labeled images per font character), bases learnt on handwritten English: 2.8% error reduction.

  24. Text classification (Reuters newswire, UseNet articles, Webpages) • Webpage classification (2 labeled documents per class), bases learnt on Reuters newswire: 4.0% error reduction. • UseNet classification (2 labeled documents per class), bases learnt on Reuters newswire: 6.5% error reduction.

  25. Shift-invariant sparse coding • Reconstruct the input from sparse features and basis functions. (Algorithms: Grosse et al., UAI 2007)
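A hedged sketch of the reconstruction model behind shift-invariant sparse coding, following Grosse et al. (UAI 2007): each input is reconstructed as a sum of basis functions convolved with sparse activation signals, so a basis can appear at any offset in the signal. The symbols a_j, β, and the convolution * below are assumptions based on that paper, not text from the slide:

    x \;\approx\; \sum_{j} b_j * a_j
    \qquad\qquad
    \min_{b,\, a} \;\; \Big\| x - \sum_{j} b_j * a_j \Big\|_2^{2} \;+\; \beta \sum_{j} \| a_j \|_1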

  26. Audio classification • Speaker identification (5 labels, TIMIT corpus, 1 sentence per speaker), bases learnt on different dialects: 8.7% error reduction. • Musical genre classification (5 labels, 18 seconds per genre), bases learnt on different genres and songs: 5.7% error reduction. (Details: Grosse et al., UAI 2007)

  27. Sparse deep belief networks • Sparse RBM: visible layer v, hidden layer h, parameters W, b, c. (Details: Lee et al., NIPS 2007. Poster yesterday.)
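As a hedged sketch of the sparse RBM referenced here (Lee et al., NIPS 2007): a standard RBM energy over visible units v and hidden units h with weights W and biases b, c, trained with an added regularizer that pushes each hidden unit's mean activation toward a small target sparsity. The target p, the weight λ, and the exact regularizer form below are taken from that paper's description, not from this transcript:

    E(v, h) \;=\; -\sum_{i,j} v_i W_{ij} h_j \;-\; \sum_{i} b_i v_i \;-\; \sum_{j} c_j h_j
    \min_{W, b, c} \;\; -\sum_{l=1}^{m} \log \sum_{h} P\big(v^{(l)}, h\big)
        \;+\; \lambda \sum_{j} \Big( p \;-\; \tfrac{1}{m} \sum_{l=1}^{m} \mathbb{E}\big[ h_j \,\big|\, v^{(l)} \big] \Big)^{2}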

  28. Sparse deep belief networks • Image classification (Caltech101 dataset): 3.2% error reduction. (Details: Lee et al., NIPS 2007. Poster yesterday.)

  29. Summary • Self-taught learning: unlabeled data does not share the labels of the classification task (e.g., unlabeled images for a Cars vs. Motorcycles task). • Use unlabeled data to discover features. • Use sparse coding to construct an easy-to-classify, “higher-level” representation (a sparse linear combination of the learnt bases).

  30. THE END

  31. Related Work • Weston et al., ICML 2006: make stronger assumptions on the unlabeled data. • Ando & Zhang, JMLR 2005: for natural language tasks and character recognition, use heuristics to construct a transfer learning task using unlabeled data.
