Learning with information of features (2009-06-05)
Contents • Motivation • Incorporating prior knowledge on features into learning (AISTATS'07) • Regularized learning with networks of features (NIPS'08) • Conclusion
Motivation • Given data X ∈ R^(n×d) plus prior information about the samples: • Manifold structure information: LapSVM • Transformation invariance: VSVM, ISSVM • Permutation invariance: π-SVM • Imbalance information: SVM for imbalanced distributions • Cluster structure information: Structure SVM
Motivation • Information in the sample space (the space spanned by the samples).
Motivation • Prior information in the feature or attribute space (the space spanned by the features).
Motivation • Combining the data with prior information about the features should yield better generalization.
Contents • Motivation • Incorporating prior knowledge on features into learning (AISTATS'07) • Regularized learning with networks of features (NIPS'08) • Conclusion
Incorporating prior knowledge on features into learning (AISTATS'07) • Motivation • Kernel design by meta-features • A toy example • Handwritten digit recognition aided by meta-features • Towards a theory of meta-features
Incorporating prior knowledge on features into learning (AISTATS'07) • Image recognition task: each feature is a pixel (gray level). • The coordinate (x, y) of a pixel can be treated as a feature of the feature, i.e. a meta-feature. • Features with similar meta-features, or more specifically adjacent pixels, should be assigned similar weights. • The paper proposes a framework for incorporating meta-features into learning.
Incorporating prior knowledge on features into learning (AISTATS'07) • Motivation • Kernel design by meta-features • A toy example • Handwritten digit recognition aided by meta-features • Towards a theory of meta-features
Kernel design by meta-features • In the standard approach to linear SVM, we solve a constrained quadratic program (reconstructed below), which can be viewed as finding the most probable hypothesis satisfying the margin constraints under a Gaussian prior on w. • The covariance matrix C equals the identity matrix, i.e. all weights are assumed to be independent and to have the same variance.
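The objective itself appeared as an equation image on the slide and was lost in extraction. A hedged reconstruction of the constrained linear-SVM form the text alludes to, with a general Gaussian prior covariance C (the paper's exact notation may differ):

```latex
% Hard-margin linear SVM with a Gaussian prior w ~ N(0, C); the standard case is C = I
\min_{w}\; \tfrac{1}{2}\, w^{\top} C^{-1} w
\quad \text{s.t.} \quad y_i\, w^{\top} x_i \;\ge\; 1, \qquad i = 1, \dots, n .
% Maximizing the prior density exp(-w^T C^{-1} w / 2) subject to the margin
% constraints is equivalent to this quadratic program; with C = I it reduces
% to the familiar minimization of ||w||^2 / 2.
```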
Kernel design by meta-features • We can use meta-features to create a better prior on w: features with similar meta-features are expected to have similar weights, i.e. the weights should be a smooth function of the meta-features. • Use a Gaussian prior on w defined by a covariance matrix C, where the covariance between a pair of weights is taken to be a decreasing function of the distance between their meta-features.
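A minimal sketch of this construction, assuming an RBF-style decreasing function of meta-feature distance (the paper may use a different function); `meta` holds one meta-feature vector per feature, e.g. the pixel coordinates:

```python
import numpy as np

def metafeature_covariance(meta, length_scale=2.0):
    """Covariance over feature weights: C[i, j] decays with the distance
    between the meta-features of feature i and feature j."""
    d2 = ((meta[:, None, :] - meta[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    return np.exp(-d2 / (2.0 * length_scale ** 2))

# Example: the meta-feature of each pixel feature is its (x, y) coordinate
xs, ys = np.meshgrid(np.arange(8), np.arange(8))
meta = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)   # (64, 2)
C = metafeature_covariance(meta)

def kernel(x1, x2, C=C):
    # Linear kernel induced by the Gaussian prior w ~ N(0, C): k(x, x') = x^T C x'
    return x1 @ C @ x2
```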
Kernel design by meta-features • The invariance is incorporated through the assumption that the weights are smooth in the meta-feature space. • Gaussian process: x → y, smoothness of y in the feature space. This work: u → w, smoothness of the weight w in the meta-feature space.
Incorporating prior knowledge on features into learning (AISTATS'07) • Motivation • Kernel design by meta-features • A toy example • Handwritten digit recognition aided by meta-features • Towards a theory of meta-features
A toy problem • MNIST dataset (digits 2 vs. 5)
Incorporating prior knowledge on features into learning (AISTATS'07) • Motivation • Kernel design by meta-features • A toy example • Handwritten digit recognition aided by meta-features • Towards a theory of meta-features
Handwritten digit recognition aided by meta-features
Handwritten digit recognition aided by meta-features • Define the features and their meta-features (the three-input features are arranged as isosceles triangles, with the same height used for all triangles).
Handwritten digit recognition aided by meta-features • Features with 3 inputs: 40 × 20 × 20 = 16,000 (40 values for the meta-features u_r and u_φ, and 20 × 20 for the center position). • Features with 2 inputs: 8,000 (the same feature results from a rotation of 180°). • In total: 24,000 features.
Handwritten digit recognition aided by meta-features • Define the covariance matrix: the weights of features with different sizes, orientations, or numbers of inputs are uncorrelated. • This yields 40 + 20 identical blocks of size 400 × 400.
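A hedged sketch of assembling such a block-diagonal prior with scipy; the within-block covariance below is an illustrative RBF over the 20 × 20 center positions, not the paper's exact choice:

```python
import numpy as np
import scipy.sparse as sp

def within_block_cov(grid=20, length_scale=2.0):
    """Covariance over the grid x grid center positions inside one block."""
    xs, ys = np.meshgrid(np.arange(grid), np.arange(grid))
    pos = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * length_scale ** 2))

B = within_block_cov()                            # one 400 x 400 block
n_blocks = 60                                     # 40 + 20 identical blocks
C = sp.block_diag([B] * n_blocks, format="csr")   # 24000 x 24000; blocks are uncorrelated
```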
Handwritten digit recognition aided by meta-features
Incorporating prior knowledge on features into learning (AISTATS'07) • Motivation • Kernel design by meta-features • A toy example • Handwritten digit recognition aided by meta-features • Towards a theory of meta-features
Towards a theory of meta-features
Contents • Motivation • Incorporating prior knowledge on features into learning (AISTATS'07) • Regularized learning with networks of features (NIPS'08) • Conclusion
Regularized learning with networks of features (NIPS'08) • Motivation • Regularized learning with networks of features • Extensions to feature network regularization • Experiment
Motivation • In supervised learning problems, we may know which features yield similar information about the target variable. • When predicting the topic of a document, we may know that two words are synonyms. • In image recognition, we know which pixels are adjacent. • Such synonymous or neighboring features are near-duplicates and should be expected to have similar weights in an accurate model.
Regularized learning with networks of features (NIPS'08) • Motivation • Regularized learning with networks of features • Extensions to feature network regularization • Experiment
Regularized learning with networks of features • A directed network, or graph, of features G: the vertices are the features of the model, and the edges link features whose weights are believed to be similar. • P_ij is the weight of the directed edge from vertex i to vertex j.
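Written out (reconstructed from the NIPS'08 formulation; α denotes the regularization strength), the penalty pulls every weight toward the P-weighted average of its neighbors:

```latex
\frac{\alpha}{2} \sum_{i=1}^{d} \Bigl( w_i - \sum_{j} P_{ij}\, w_j \Bigr)^{2}
  \;=\; \frac{\alpha}{2}\, \bigl\lVert (I - P)\, w \bigr\rVert_{2}^{2}
  \;=\; \frac{\alpha}{2}\, w^{\top} M\, w,
\qquad M = (I - P)^{\top} (I - P).
```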
Regularized learning with networks of features • Minimizing the above loss function (prediction loss plus the network penalty) is equivalent to finding the MAP estimate for w, where w is a priori normally distributed with mean zero and covariance matrix 2M⁻¹. • If P is sparse (only kd entries, with k ≪ d), the additional matrix multiply is O(d), even though the induced covariance structure over w can be dense. • The feature-network regularization penalty is identical to the LLE objective, except that the embedding is found for feature weights rather than for data instances.
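A minimal training sketch under this penalty, assuming logistic loss as the base loss and a sparse row-stochastic P; the function name and the choice of optimizer are illustrative, not the paper's implementation:

```python
import numpy as np
import scipy.sparse as sp
from scipy.optimize import minimize

def network_regularized_logreg(X, y, P, alpha=1.0):
    """Fit w by minimizing  sum_i log(1 + exp(-y_i x_i^T w)) + (alpha/2) ||(I - P) w||^2.

    X: (n, d) data matrix, y: labels in {-1, +1},
    P: sparse (d, d) row-stochastic feature-similarity graph."""
    n, d = X.shape
    I_minus_P = sp.identity(d, format="csr") - P

    def objective(w):
        margins = y * (X @ w)
        loss = np.logaddexp(0.0, -margins).sum()     # logistic loss
        r = I_minus_P @ w                            # each weight minus its neighborhood average
        penalty = 0.5 * alpha * (r @ r)
        grad = -X.T @ (y / (1.0 + np.exp(margins))) + alpha * (I_minus_P.T @ r)
        return loss + penalty, grad

    res = minimize(objective, np.zeros(d), jac=True, method="L-BFGS-B")
    return res.x
```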
Regularized learning with networks of features (NIPS'08) • Motivation • Regularized learning with networks of features • Extensions to feature network regularization • Experiment
Extensions to feature network regularization • Regularizing with classes of features: in machine learning, features can often be grouped into classes such that all weights of the features in a given class are drawn from the same underlying distribution. • Assume k disjoint classes of features whose weights are drawn i.i.d. from N(μ_i, σ²), with μ_i unknown but σ² known and shared across all classes. • Connecting every pair of features within a class forms cliques, so the number of edges scales quadratically in the class sizes, giving feature graphs that are not sparse.
Extensions to feature network regularization • Solution: introduce a virtual vertex u_k for each class and connect every feature in the class to u_k with an edge of weight 1 (the slide's figure shows this star construction); the u_k can be optimized together with the feature weights.
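Read as a penalty, the star construction amounts to the following (a hedged reconstruction; u_k is the virtual weight of class k, learned jointly with w, and the exact scaling in the paper may differ):

```latex
% One edge of weight 1 from every feature i in class C_k to the virtual vertex u_k
\frac{\alpha}{2} \sum_{k=1}^{K} \sum_{i \in \mathcal{C}_k} \bigl( w_i - u_k \bigr)^{2},
\qquad u_1, \dots, u_K \text{ optimized together with } w .
% The number of edges is now linear in the class sizes instead of quadratic.
```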
Extensions to feature network regularization • Incorporating feature dissimilarities. • Regularizing features with the graph Laplacian: the network penalty penalizes each feature equally, while the graph Laplacian penalty penalizes each edge equally. • The Laplacian penalty therefore focuses most of the regularization cost on features with many neighbors.
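For comparison, the Laplacian form of the penalty in standard notation, assuming symmetric edge weights; the comment notes one common way a dissimilarity edge can be encoded:

```latex
% One quadratic term per edge, so high-degree features absorb most of the cost
\frac{\alpha}{2} \sum_{i,j} P_{ij}\, \bigl( w_i - w_j \bigr)^{2}
  \;=\; \alpha\, w^{\top} L\, w,
\qquad L = D - P,\quad D_{ii} = \sum_{j} P_{ij} .
% A dissimilarity edge can instead penalize (w_i + w_j)^2,
% pushing the two weights toward opposite signs.
```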
Regularized learning with networks of features (NIPS'08) • Motivation • Regularized learning with networks of features • Extensions to feature network regularization • Experiment
Experiments • Experiments on 20 Newsgroups. • Features: the 11,376 words that occur in at least 20 documents. • Feature similarity: each word is represented by a binary vector marking its presence or absence in the 20,000 documents, and similarity is the cosine between these binary vectors; each feature is linked to its 25 nearest neighbors.
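A hedged sketch of how such a feature graph could be built; using the cosine similarities themselves as edge weights and normalizing rows of P to sum to one are assumptions, not details given on the slide:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.neighbors import NearestNeighbors

def build_feature_graph(X_binary, k=25):
    """X_binary: (n_docs, n_words) presence/absence matrix.
    Returns a row-stochastic P (n_words, n_words) linking each word
    to its k most cosine-similar words."""
    signatures = X_binary.T                         # one row per word
    nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(signatures)
    dist, idx = nn.kneighbors(signatures)           # first neighbor is the word itself
    d = signatures.shape[0]
    rows = np.repeat(np.arange(d), k)
    cols = idx[:, 1:].ravel()
    sims = 1.0 - dist[:, 1:].ravel()                # cosine similarity of each edge
    P = sp.csr_matrix((sims, (rows, cols)), shape=(d, d))
    row_sums = np.asarray(P.sum(axis=1)).ravel()
    row_sums[row_sums == 0] = 1.0
    return sp.diags(1.0 / row_sums) @ P             # rows sum to one
```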
Experiments on 20 Newsgroups
Experiments • Experiments on sentiment classification (product-review datasets, with sentimentally charged words taken from the SentiWordNet dataset). • 1. The 200 words from SentiWordNet that also occur in the product reviews at least 100 times. Words with high positive and high negative sentiment scores form a 'positive word cluster' and a 'negative word cluster', together with two virtual features and a dissimilarity edge between them.
Sentiment Classification
Sentiment Classification • 2. Compute the correlations of all features with the SentiWordNet features, so that each word is represented as a 200-dimensional vector of correlations with these highly charged sentiment words. Feature similarity is then computed from those vectors; each feature is linked to its 100 nearest neighbors.
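A hedged sketch of this second construction, using dense arrays for clarity; the resulting rows can be passed to the same nearest-neighbor graph builder as above with k = 100:

```python
import numpy as np

def correlation_signatures(X_counts, senti_idx):
    """Represent each word by its Pearson correlation with the sentiment words.

    X_counts: (n_docs, n_words) term matrix; senti_idx: column indices of the
    200 SentiWordNet words. Returns an (n_words, 200) correlation matrix."""
    Z = (X_counts - X_counts.mean(axis=0)) / (X_counts.std(axis=0) + 1e-12)
    return (Z.T @ Z[:, senti_idx]) / X_counts.shape[0]
```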
Sentiment Classification
Contents • Motivation • Incorporating prior knowledge on features into learning (AISTATS'07) • Regularized learning with networks of features (NIPS'08) • Conclusion
Conclusion • Both approaches rely on a smoothness assumption on the feature weights. • They are restricted to particular applications, because the meta-features or the feature-similarity graph have to be defined by hand. • Could feature information be derived directly from the given data, for example from the discriminative power of individual features?
Conclusion • Fisher's discriminant ratio (F1): emphasizes the geometrical characteristics of the class distributions, or more specifically the manner in which the classes are separated, which is the most critical factor for classification accuracy. • Ratio of the separated region (F2). • Feature efficiency (F3).
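For reference, the usual per-feature form of the first measure, following Ho and Basu's data-complexity measures (F2 and F3 are defined over feature value ranges and are not reproduced here):

```latex
% Fisher's discriminant ratio for a two-class problem, maximized over features f
F1 \;=\; \max_{f}\; \frac{\bigl(\mu_{1,f} - \mu_{2,f}\bigr)^{2}}{\sigma_{1,f}^{2} + \sigma_{2,f}^{2}},
```

where μ_{c,f} and σ²_{c,f} are the mean and variance of feature f within class c.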