Transfer Learning with Applications to Text Classification

Presentation Transcript


  1. Jing Peng, Computer Science Department. Transfer Learning with Applications to Text Classification

  2. Machine learning: • the study of algorithms that • improve performance P • on some task T • using experience E • Well-defined learning task: <P, T, E>

  3. Learning to recognize targets in images:

  4. Learning to classify text documents:

  5. Learning to build forecasting models:

  6. Growth of Machine Learning • Machine learning is the preferred approach to • Speech processing • Computer vision • Medical diagnosis • Robot control • News article processing • … • This machine learning niche is growing: • Improved machine learning algorithms • Lots of data available • Software too complex to code by hand • …

  7. Learning • Given training data and a hypothesis space H • Least squares methods • Learning focuses on minimizing both the approximation error (how close the best hypothesis in H comes to the target) and the estimation error (how far the learned hypothesis is from that best one)
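A minimal sketch of the least squares setup (synthetic data, my construction, not from the slides): the fitted weights minimize the squared training error, and their gap from the true weights illustrates estimation error.

```python
# Ordinary least squares on synthetic data (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # 100 examples, 3 features
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)   # noisy linear targets

# Least squares: w_hat = argmin_w ||X w - y||^2
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)  # close to w_true; the remaining gap reflects estimation error
```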

  8. Transfer Learning with Applications to Text Classification • Main challenge: transfer learning with • high-dimensional data (more than 4000 features) • overlapping feature sets (fewer than 80% of the features are the same) • Solution with performance bounds

  9. Standard Supervised Learning • Train a classifier on labeled New York Times documents; test on unlabeled New York Times documents • Accuracy: 85.5%

  10. In reality… • Labeled data from the test domain is not available! • Train on labeled Reuters documents; test on unlabeled New York Times documents • Accuracy: 64.1%

  11. Domain Difference → Performance Drop • Ideal setting: train on New York Times, test on New York Times → 85.5% • Realistic setting: train on Reuters, test on New York Times → 64.1%
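The NYT/Reuters accuracies above are the slides' experimental numbers; the mechanism behind the drop can be imitated in a few lines. A minimal sketch (entirely synthetic; the two-feature setup is my construction, not the slides' data): a classifier that leans on a domain-specific correlation loses accuracy when that correlation reverses in the test domain.

```python
# Domain shift demo: train on a "source" domain, test on a shifted one.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_domain(n, flip):
    """x0 carries a weak signal that is stable across domains; x1 carries
    a strong signal that reverses (flip=-1) in the target domain."""
    y = rng.integers(0, 2, n)
    s = 2 * y - 1
    x0 = s + rng.normal(scale=2.0, size=n)         # weak, stable
    x1 = flip * s + rng.normal(scale=0.5, size=n)  # strong, unstable
    return np.column_stack([x0, x1]), y

X_train, y_train = make_domain(2000, flip=+1)   # source domain
X_same, y_same = make_domain(2000, flip=+1)     # same-domain test
X_shift, y_shift = make_domain(2000, flip=-1)   # shifted-domain test

clf = LogisticRegression().fit(X_train, y_train)
print("same domain:   ", clf.score(X_same, y_same))    # high
print("shifted domain:", clf.score(X_shift, y_shift))  # much lower
```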

  12. High-Dimensional Data Transfer • High-dimensional data: text categorization, image classification • The number of features in our experiments is more than 4000 • Challenges: • High dimensionality: more features than training examples • Euclidean distance becomes meaningless

  13. Why Dimension Reduction? [figure: maximum and minimum pairwise distances, D_MAX and D_MIN, in high-dimensional space]

  14. Curse of Dimensionality [figure: pairwise-distance behavior as the number of dimensions grows]

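The D_MAX/D_MIN effect in these figures is easy to reproduce. A minimal sketch (uniform random points, my construction): as the dimensionality grows, the ratio of the farthest to the nearest neighbor distance approaches 1, which is why Euclidean distance becomes meaningless.

```python
# Distance concentration: D_MAX / D_MIN shrinks toward 1 in high dimensions.
import numpy as np

rng = np.random.default_rng(0)
for d in [2, 10, 100, 1000, 4000]:
    X = rng.uniform(size=(500, d))     # 500 random points in [0, 1]^d
    q = rng.uniform(size=d)            # a random query point
    dist = np.linalg.norm(X - q, axis=1)
    print(f"d={d:5d}  D_MAX/D_MIN = {dist.max() / dist.min():.2f}")
```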
  16. High-Dimensional Data Transfer • High-dimensional data: text categorization, image classification • The number of features in our experiments is more than 4000 • Challenges: • High dimensionality: more features than training examples • Euclidean distance becomes meaningless • Feature sets completely overlapping? No: less than 80% of the features are the same • Marginally not so related? Harder to find transferable structures • A proper similarity definition is needed

  17. PAC (Probably Approximately Correct) learning requirement • Training and test distributions must be the same
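For reference, a standard formal reading of this requirement (my phrasing, not the slide's): the m training samples and the evaluation error are governed by one and the same distribution D.

```latex
% PAC guarantee: with probability at least 1 - \delta over an i.i.d.
% sample S of size m from D, the learned hypothesis h_S is nearly as
% good as the best hypothesis in H, under the SAME distribution D.
\[
  \Pr_{S \sim D^{m}}\!\left[\,
    \operatorname{err}_{D}(h_{S})
      \;\le\; \min_{h \in H} \operatorname{err}_{D}(h) + \epsilon
  \,\right] \;\ge\; 1 - \delta
\]
```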

  18. Transfer between high-dimensional overlapping distributions • Overlapping distributions: data from the two domains may not come from the same part of the space; at best, the two regions overlap

  23. Transfer between high-dimensional overlapping distributions • Problems with overlapping distributions: the overlapping features alone may not provide sufficient predictive power, so it is hard to predict correctly

  27. Transfer between high-dimensional overlapping distributions • Overlapping distributions: use the union of all features and fill in the missing values with “zeros”? Does it help?

  32. Transfer between high-dimensional overlapping distributions • With zero-filling, D²(A, B) = 0.0181 > D²(A, C) = 0.0101 • A is misclassified into the class of C instead of B

  33. Transfer between high-dimensional overlapping distributions • When one uses the union of overlapping and non-overlapping features and replaces the missing values with “zeros”, the distance between the two marginal distributions p(x) can become asymptotically very large as a function of the non-overlapping features, which then become the dominant factor in the similarity measure
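The distortion is easy to see numerically. A minimal sketch (toy numbers of my own, not the A/B/C values from the slides): two points that nearly agree on every shared feature end up far apart once the zero-filled, non-overlapping features enter the distance.

```python
# Zero-filling the union of features lets the non-overlapping part
# dominate the Euclidean distance.
import numpy as np

shared = np.array([0.2, 0.4, 0.1])     # features both domains observe
a_only = np.array([0.9, 0.7])          # features only A's domain observes
b_only = np.array([0.8, 0.6])          # features only B's domain observes

# Union layout: [shared | A-only | B-only], zeros where a value is missing
A = np.concatenate([shared, a_only, np.zeros(2)])
B = np.concatenate([shared + 0.05, np.zeros(2), b_only])

d_shared = np.linalg.norm(A[:3] - B[:3])   # distance on shared features only
d_union = np.linalg.norm(A - B)            # distance after zero-filling
print(d_shared, d_union)  # ~0.09 vs ~1.52: non-overlapping features dominate
```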

  34. Transfer between high-dimensional overlapping distributions • High dimensionality can obscure important features

  36. Transfer between high-dimensional overlapping distributions • [figure] The “blues” are closer to the “greens” than to the “reds”

  37. LatentMap: two-step correction • Missing-value regression • brings the marginal distributions closer • Latent-space dimensionality reduction • further brings the marginal distributions closer • ignores unimportant, noisy, and “error-imported” features • identifies transferable substructures across the two domains

  38. Missing Value Regression • Predict the missing values (recall the previous example) • 1. Project onto the overlapping features z • 2. Map from z to x, using a relationship found by regression

  44. Missing Value Regression • With the regressed values, D(img(A'), B) = 0.0109 < D(img(A'), C) = 0.0125 • A is correctly classified as in the same class as B
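A minimal sketch of the idea (my construction using ordinary linear regression; the paper's exact regression model may differ): learn a map from the overlapping features z to a domain's private features x, then fill a cross-domain point's missing values with the regressed estimates instead of zeros.

```python
# Missing-value regression: fill missing features via a learned z -> x map.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Domain 2 data: shared features Z2 and private features X2
Z2 = rng.normal(size=(200, 3))
W = rng.normal(size=(3, 2))
X2 = Z2 @ W + 0.1 * rng.normal(size=(200, 2))   # private features depend on z

reg = LinearRegression().fit(Z2, X2)            # steps 1 + 2: the z -> x map

# A point from domain 1 has only the shared features; regress the rest
z_a = rng.normal(size=(1, 3))
x_a_filled = reg.predict(z_a)                   # img(A'): regressed values
a_full = np.concatenate([z_a, x_a_filled], axis=1)
print(a_full)  # distances on a_full no longer collapse to zero-padding
```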

  45. Dimensionality Reduction [figure: the word-vector matrix, built from the overlapping features plus the filled-in missing values]

  50. Dimensionality Reduction • Project the word-vector matrix onto its most important, inherent subspace
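A minimal sketch of such a projection (assuming an SVD-style reduction; the slides do not name the exact method): keep only the top singular directions of the filled word-vector matrix.

```python
# Project the filled word-vector matrix onto a low-dimensional latent space.
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((300, 4000))        # 300 documents x 4000 features, values filled

U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 50                             # latent dimensionality (illustrative)
M_latent = U[:, :k] * s[:k]        # documents represented in the latent space
print(M_latent.shape)              # (300, 50)
```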
