Feature Selection & Maximum Entropy Advanced Statistical Methods in NLP Ling 572 January 26, 2012
Roadmap • Feature selection and weighting • Feature weighting • Chi-square feature selection • Chi-square feature selection example • HW #4 • Maximum Entropy • Introduction: Maximum Entropy Principle • Maximum Entropy NLP examples
Feature Selection Recap • Problem: Curse of dimensionality • Data sparseness, computational cost, overfitting • Solution: Dimensionality reduction • New feature set r’ s.t. |r’| < |r| • Approaches: • Global & local approaches • Feature extraction: • New features in r’ are transformations of features in r • Feature selection: • Wrapper techniques • Feature scoring
Feature Weighting • For text classification, typical weights include: • Binary: weights in {0,1} • Term frequency (tf): • # occurrences of $t_k$ in document $d_i$ • Inverse document frequency (idf): • $df_k$: # of docs in which $t_k$ appears; $N$: # docs • $idf = \log(N / (1 + df_k))$ • $tfidf = tf \cdot idf$
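The weighting scheme above can be sketched in a few lines of Python. This is a minimal illustration rather than the course's reference implementation: the function name `tfidf_weights` and the toy documents are assumptions, and the idf uses the slide's smoothed form log(N / (1 + df_k)).

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: tf * idf} dict per tokenized document.

    `docs` is a list of token lists (hypothetical input format).
    """
    n_docs = len(docs)
    # df_k: number of documents in which term t_k appears at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # tf: raw count of t_k in document d_i
        weights.append({
            term: count * math.log(n_docs / (1 + df[term]))
            for term, count in tf.items()
        })
    return weights


# Toy corpus, invented for illustration only.
docs = [
    ["maximum", "entropy", "model"],
    ["feature", "selection", "feature"],
    ["entropy", "feature"],
    ["chi", "square", "test"],
]
weights = tfidf_weights(docs)
```

With this smoothed idf, a term that occurs in every document gets a slightly negative weight, since N / (1 + df_k) falls below 1; some variants clip such weights at zero.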
Chi Square • Tests for presence/absence of relation between random variables • Bivariate analysis tests 2 random variables • Can test strength of relationship • (Strictly speaking) doesn’t test direction
Chi Square Example (due to F. Xia) • Can gender predict shoe choice? • A: male/female (features) • B: shoe choice (classes: {sandal, sneaker, …})
Comparing Distributions • Observed distribution (O): • Expected distribution (E): Due to F. Xia
Computing Chi Square • Expected value for cell = row_total × column_total / table_total • $\chi^2 = \frac{(6-9.5)^2}{9.5} + \frac{(17-11)^2}{11} + \dots = 14.026$
Calculating $\chi^2$ • Tabulate contingency table of observed values: O • Compute row, column totals • Compute table of expected values, given row/col totals • Assuming no association • Compute $\chi^2$
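The four steps above can be sketched as a short Python function. The function name `chi_square` is hypothetical, and the observed counts below are an assumption: the original tables did not survive in this transcript, so the numbers were chosen only to agree with the two terms and the total quoted on the slides (6 vs. 9.5, 17 vs. 11, and a statistic of roughly 14.03).

```python
def chi_square(observed):
    """Pearson chi-square statistic for a contingency table.

    `observed` is a list of rows of counts. Returns (statistic, expected).
    """
    # Step 2: row and column totals
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    # Step 3: expected counts assuming no association:
    # row_total * column_total / table_total
    expected = [[r * c / table_total for c in col_totals] for r in row_totals]
    # Step 4: sum (O - E)^2 / E over every cell
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return stat, expected


# Step 1: ASSUMED observed counts (not recovered from the slides):
# rows are the two genders, columns are five shoe types.
observed = [
    [6, 17, 13, 9, 5],
    [13, 5, 7, 16, 9],
]
stat, expected = chi_square(observed)
print(stat)
print(expected[0])
```

Under these assumed counts, the first two expected values in the top row are 9.5 and 11, matching the terms shown on the slide.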
For 2x2 Table • O: • E:
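The O and E tables for this slide were not preserved. As a hedged illustration of the 2x2 case, the sketch below uses the standard closed-form shortcut for a table with cells a, b (first row) and c, d (second row), namely N(ad − bc)² / ((a+b)(c+d)(a+c)(b+d)), and checks it against the cell-by-cell sum of (O − E)² / E. The function names and the counts are assumptions, and the shortcut is not necessarily the form the original slide displayed.

```python
def chi_square_2x2(a, b, c, d):
    """Closed-form chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_2x2_longhand(a, b, c, d):
    """Same statistic via expected counts, for comparison."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    cells = ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1))
    total = 0.0
    for o, i, j in cells:
        e = rows[i] * cols[j] / n  # expected count under independence
        total += (o - e) ** 2 / e
    return total


# Invented counts, e.g. term present/absent vs. in-class/out-of-class.
short = chi_square_2x2(10, 20, 30, 40)
long = chi_square_2x2_longhand(10, 20, 30, 40)
```

In feature selection for text classification, a table like this is typically built per (term, class) pair, and terms are ranked by the resulting score.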
$\chi^2$ Test • Test whether random variables are independent • Null hypothesis: the 2 R.V.s are independent • Compute $\chi^2$ statistic: $\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$ • Compute degrees of freedom • df = (# rows − 1)(# cols − 1) • Shoe example, df = (2 − 1)(5 − 1) = 4
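To turn the statistic and degrees of freedom into a decision about the null hypothesis, one looks up the upper-tail probability of the chi-square distribution. The sketch below is a standard-library illustration, not part of the original slides: it relies on the closed form of the tail probability that holds when df is even (which covers df = 4 in the shoe example), and the helper names are hypothetical. For arbitrary df, a library routine such as `scipy.stats.chi2.sf` would be the usual choice.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df = (# rows - 1)(# cols - 1) for a contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for chi-square with EVEN df.

    Uses exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!, valid only for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


df = degrees_of_freedom(2, 5)          # shoe example: 2 rows, 5 columns
p_value = chi2_sf_even_df(14.026, df)  # statistic quoted on the slides
print(df, p_value)
```

A p-value below the chosen significance level (0.05 is a common convention) leads to rejecting the null hypothesis of independence; for df = 4 this corresponds to a statistic above roughly 9.49.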