Learning with information of features 2009-06-05

Presentation Transcript


  1. Learning with information of features 2009-06-05

  2. Contents Motivation Incorporating prior knowledge on features into learning (AISTATS’07) Regularized learning with networks of features (NIPS’08) Conclusion

  3. Contents Motivation Incorporating prior knowledge on features into learning (AISTATS’07) Regularized learning with networks of features (NIPS’08) Conclusion

  4. Motivation Given data X ∈ R^{n×d} plus prior information about the samples: manifold structure information (LapSVM), transformation invariance (VSVM, ISSVM), permutation invariance (π-SVM), imbalance information (SVM for imbalanced distributions), cluster structure information (Structure SVM)

  5. Motivation Information in the sample space (space spanned by samples)

  6. Motivation Prior information in the feature or attribute space (space spanned by features)

  7. Motivation + prior information about the features for better generalization

  8. Contents Motivation Incorporating prior knowledge on features into learning (AISTATS’07) Regularized learning with networks of features (NIPS’08) Conclusion

  9. Incorporating prior knowledge on features into learning (AISTATS’07) • Motivation • Kernel design by meta-features • A toy example • Handwritten digit recognition aided by meta-features • Towards a theory of meta-features

  10. Incorporating prior knowledge on features into learning (AISTATS’07) • Motivation • Kernel design by meta-features • A toy example • Handwritten digit recognition aided by meta-features • Towards a theory of meta-features

  11. Incorporating prior knowledge on features into learning (AISTATS’07) Image recognition task. Feature: a pixel (gray level). The coordinate (x, y) of a pixel can be treated as a feature of the feature, i.e., a meta-feature. Features with similar meta-features, or more specifically adjacent pixels, should be assigned similar weights. The paper proposes a framework for incorporating meta-features into learning.

  12. Incorporating prior knowledge on features into learning (AISTATS’07) • Motivation • Kernel design by meta-features • A toy example • Handwritten digit recognition aided by meta-features • Towards a theory of meta-features

  13. Kernel design by meta-features In the standard linear SVM we solve the usual margin-maximization problem (minimize the norm of w subject to the margin constraints), which can be viewed as finding the maximum a posteriori hypothesis under those constraints when we place a Gaussian prior on w. There the covariance matrix C equals the identity matrix, i.e. all weights are assumed to be independent and to have the same variance.
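A minimal LaTeX sketch of this view, assuming the usual soft-margin formulation (the slack variables ξ_i, the bias b and the trade-off constant λ are notation introduced here, not taken from the slide):

\[
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\, w^\top C^{-1} w \;+\; \lambda \sum_{i=1}^{n} \xi_i
\qquad \text{s.t. } y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0 ,
\]
\[
p(w) \;\propto\; \exp\!\bigl(-\tfrac{1}{2}\, w^\top C^{-1} w\bigr),
\qquad C = I \ \Rightarrow\ \tfrac{1}{2}\, w^\top C^{-1} w = \tfrac{1}{2}\,\|w\|^2 .
\]

With C = I the penalty is exactly the standard SVM regularizer; the next slide replaces C = I with a covariance built from meta-features.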

  14. Kernel design by meta-features We can use meta-features to create a better prior on w: features with similar meta-features are expected to have similar weights, i.e., the weights should be a smooth function of the meta-features. Use a Gaussian prior on w defined by a covariance matrix C, where the covariance between a pair of weights is taken to be a decreasing function of the distance between their meta-features.
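One concrete choice, given only as a sketch (the squared-exponential form and the bandwidth σ_u are assumptions; the slide only requires some decreasing function of the meta-feature distance):

\[
C_{ij} \;=\; \operatorname{Cov}(w_i, w_j) \;=\; \exp\!\left(-\,\frac{\|u_i - u_j\|^2}{2\,\sigma_u^2}\right),
\]

so features whose meta-features u_i, u_j are close (e.g. adjacent pixels) get strongly correlated weights, and the induced linear kernel is k(x, x') = x^T C x'.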

  15. Kernel design by meta-features • The invariance is incorporated through the assumption that the weights are smooth in the meta-feature space. • Gaussian process: x → y, smoothness of y in the feature space. This work: u → w, smoothness of the weight w in the meta-feature space.

  16. Incorporating prior knowledge on features into learning (AISTATS’07) • Motivation • Kernel design by meta-features • A toy example • Handwritten digit recognition aided by meta-features • Towards a theory of meta-features

  17. A toy problem MNIST dataset (2 vs. 5)

  18. Incorporating prior knowledge on features into learning (AISTATS’07) • Motivation • Kernel design by meta-features • A toy example • Handwritten digit recognition aided by meta-features • Towards a theory of meta-features

  19. Handwritten digit recognition aided by meta-features

  20. Handwritten digit recognition aided by meta-features Define features and meta-features (the same height for all isosceles triangles)

  21. Handwritten digit recognition aided by meta-features 3 inputs: 40×20×20 = 16,000 (40 stands for the (u_r, u_φ) combinations and 20×20 for the center position). 2 inputs: 8,000 (the same feature under a rotation of 180°). Total: 24,000 features.

  22. Handwritten digit recognition aided by meta-features Define the covariance matrix • The weights of features with different sizes, orientations or numbers of inputs are uncorrelated. • This gives 40+20 identical blocks of size 400×400 (each block covers the 20×20 = 400 center positions, and 60 blocks × 400 = 24,000 features).

  23. Handwritten digit recognition aided by meta-features

  24. Incorporating prior knowledge on features into learning (AISTATS’07) • Motivation • Kernel design by meta-features • A toy example • Handwritten digit recognition aided by meta-features • Towards a theory of meta-features

  25. Towards a theory of meta-features

  26. Towards a theory of meta-features

  27. Towards a theory of meta-features

  28. Towards a theory of meta-features

  29.

  30. Contents Motivation Incorporating prior knowledge on features into learning (AISTATS’07) Regularized learning with networks of features (NIPS’08) Conclusion

  31. Regularized learning with networks of features (NIPS’08) • Motivation • Regularized learning with networks of features • Extensions to feature network regularization • Experiment

  32. Regularized learning with networks of features (NIPS’08) • Motivation • Regularized learning with networks of features • Extensions to feature network regularization • Experiment

  33. Motivation • In supervised learning problems, we may know which features yield similar information about the target variable. • When predicting the topic of a document, we may know that two words are synonyms. • In image recognition, we know which pixels are adjacent. • Such synonymous or neighboring features are near-duplicates and should be expected to have similar weights in an accurate model.

  34. Regularized learning with networks of features (NIPS’08) • Motivation • Regularized learning with networks of features • Extensions to feature network regularization • Experiment

  35. Regularized learning with networks of features A directed network or graph of features, G: vertices are the features of the model; edges link features whose weights are believed to be similar; P_ij is the weight of the directed edge from vertex i to vertex j.
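A hedged sketch of the regularized objective referred to on the next slide, written for a generic per-example loss L and regularization constants α, β (these symbols are notation assumed here, not taken from the slides):

\[
\min_{w}\ \sum_{t=1}^{n} L\bigl(y_t,\, w^\top x_t\bigr)
\;+\; \alpha \sum_{i=1}^{d} \Bigl(w_i - \sum_{j} P_{ij}\, w_j\Bigr)^{2}
\;+\; \beta\, \|w\|^2 .
\]

The middle term pulls each weight toward the P-weighted average of its neighbors' weights; in matrix form the two penalty terms equal w^T M w with M = α(I - P)^T (I - P) + β I, which is the M that appears on the next slide.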

  36. Regularized learning with networks of features • Minimizing the above loss function is equivalent to finding the MAP estimate for w, where w is a priori normally distributed with mean zero and covariance matrix (2M)⁻¹. • If P is sparse (only kd entries for k << d), the additional matrix multiply is O(d), yet the covariance structure constructed over w can be dense. • The feature network regularization penalty is identical to locally linear embedding (LLE), except that the embedding is found for feature weights rather than for data instances.

  37. Regularized learning with networks of features (NIPS’08) • Motivation • Regularized learning with networks of features • Extensions to feature network regularization • Experiment

  38. Extensions to feature network regularization • Regularizing with classes of features: in machine learning, features can often be grouped into classes such that all the weights of the features in a given class are drawn from the same underlying distribution. • Consider k disjoint classes of features whose weights are drawn i.i.d. from N(μ_i, σ²), with μ_i unknown but σ² known and shared across all classes. • Connecting all pairs of features within a class gives a number of edges that scales quadratically in the clique sizes, resulting in feature graphs that are not sparse.

  39. Extensions to feature network regularization Solution: connect the features of each class to a virtual vertex u_k with unit-weight edges; the value of u_k can be optimized along with the weights.
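As a sketch (the class index set C_k and the coefficient α are notation assumed here), the star construction replaces the within-class clique penalty by

\[
\alpha \sum_{k}\ \sum_{i \in C_k} \bigl(w_i - u_k\bigr)^2 ,
\]

which needs only one edge per feature (linear in the class size) rather than a clique (quadratic in it); minimizing over the free value u_k sets it to the mean of the class weights, so the penalty shrinks within-class weights toward a common, learned mean.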

  40. Extensions to feature network regularization • Incorporating feature dissimilarities. • Regularizing features with the graph Laplacian: the network penalty penalizes each feature equally, whereas the graph Laplacian penalty penalizes each edge equally, so the Laplacian penalty focuses most of the regularization cost on features with many neighbors.
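For comparison, a sketch of the two penalties and of a dissimilarity edge, written for an undirected edge set E with weights A_ij (notation assumed here):

\[
\text{network penalty: } \sum_{i}\Bigl(w_i - \sum_{j} P_{ij}\, w_j\Bigr)^2
\qquad
\text{Laplacian penalty: } w^\top L\, w = \sum_{(i,j)\in E} A_{ij}\,\bigl(w_i - w_j\bigr)^2 ,
\]

with L = D - A and D the diagonal degree matrix; since every edge contributes its own term, high-degree features absorb most of the regularization cost. A dissimilarity edge can instead penalize (w_i + w_j)^2, encouraging the two weights to take opposite signs.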

  41. Regularized learning with networks of features (NIPS’08) • Motivation • Regularized learning with networks of features • Extensions to feature network regularization • Experiment

  42. Experiments • Experiments on 20 Newsgroups. Features: the 11,376 words that occurred in at least 20 documents. Feature similarity: each word is represented as a binary vector denoting its presence/absence in the 20,000 documents, and similarity is the cosine between these binary vectors (each feature is linked to its 25 nearest neighbors).
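A minimal Python sketch of this graph construction, assuming the documents are already vectorized (the function name and the row-normalization of the edge weights are my assumptions, not details taken from the paper):

```python
import numpy as np

def build_feature_graph(X_binary, k=25):
    """X_binary: (n_docs, n_words) 0/1 occurrence matrix.
    Links each word to its k most cosine-similar words and
    returns a row-normalized edge-weight matrix P."""
    X = X_binary.astype(float)
    # Cosine similarity between the word (column) vectors.
    norms = np.linalg.norm(X, axis=0) + 1e-12
    S = (X.T @ X) / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)            # no self-edges

    P = np.zeros_like(S)
    for i in range(S.shape[0]):
        nbrs = np.argsort(S[i])[-k:]    # indices of the k most similar words
        P[i, nbrs] = S[i, nbrs]
    return P / (P.sum(axis=1, keepdims=True) + 1e-12)   # rows sum to 1
```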

  43. Experiments on 20 Newsgroups

  44. Experiments • Experiments on sentiment classification (product review datasets, with sentimentally charged words from the SentiWordNet dataset). 1. The 200 words from SentiWordNet that also occurred in the product reviews at least 100 times; words with high positive and high negative sentiment scores form a ‘positive word cluster’ and a ‘negative word cluster’, plus two virtual features with a dissimilarity edge between them.

  45. Sentiment Classification

  46. Sentiment Classification 2. Computed the correlations of all features with the 200 SentiWordNet features, so that each word is represented as a 200-dimensional vector of correlations with these highly charged sentiment words; feature similarity is then computed from these vectors (100 nearest neighbors).
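A short Python sketch of this second construction (all function and variable names are mine, and the Pearson-correlation and cosine steps are assumptions about details the slide leaves unspecified):

```python
import numpy as np

def sentiment_correlation_graph(X_counts, senti_idx, k=100):
    """X_counts: (n_docs, n_words) term matrix; senti_idx: column indices
    of the 200 SentiWordNet words. Represents every word by its vector of
    correlations with those words and links it to its k nearest neighbors."""
    X = X_counts.astype(float)
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardize columns
    R = (Xs.T @ Xs[:, senti_idx]) / X.shape[0]           # (n_words, 200) correlations
    Rn = R / (np.linalg.norm(R, axis=1, keepdims=True) + 1e-12)
    S = Rn @ Rn.T                                        # cosine similarity of correlation vectors
    np.fill_diagonal(S, 0.0)
    P = np.zeros_like(S)
    for i in range(S.shape[0]):
        nbrs = np.argsort(S[i])[-k:]
        P[i, nbrs] = S[i, nbrs]
    return P / (P.sum(axis=1, keepdims=True) + 1e-12)
```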

  47. Sentiment Classification

  48. Contents Motivation Incorporating prior knowledge on features into learning (AISTATS’07) Regularized learning with networks of features (NIPS’08) Conclusion

  49. Conclusion • Both works rely on a smoothness assumption on the feature weights. • They are restricted to particular applications by the need to define meta-features or a feature similarity graph. • Could feature information be derived directly from the given data, e.g., from the discriminative power of individual features?

  50. Conclusion • Fisher’s discriminant ratio (F1): emphasizes the geometric characteristics of the class distributions, or more specifically the manner in which the classes are separated, which is most critical for classification accuracy. • Ratio of the separated region (F2). • Feature efficiency (F3).
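As a reminder of the first measure, for a single feature and two classes Fisher’s discriminant ratio takes the standard form (this definition is general knowledge, not taken from the slide):

\[
F1 \;=\; \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2},
\]

where μ_c and σ_c² are the mean and variance of the feature within class c; larger values indicate a feature that separates the two classes better on its own.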
