Feature Selection & Maximum Entropy

Advanced Statistical Methods in NLP, Ling 572, January 26, 2012. Roadmap: feature selection and weighting; feature weighting; chi-square feature selection; chi-square feature selection example; HW #4; Maximum Entropy; introduction to the Maximum Entropy Principle.


  1. Feature Selection & Maximum Entropy Advanced Statistical Methods in NLP Ling 572 January 26, 2012

  2. Roadmap • Feature selection and weighting • Feature weighting • Chi-square feature selection • Chi-square feature selection example • HW #4 • Maximum Entropy • Introduction: Maximum Entropy Principle • Maximum Entropy NLP examples

  7. Feature Selection Recap • Problem: Curse of dimensionality • Data sparseness, computational cost, overfitting • Solution: Dimensionality reduction • New feature set r’ s.t. |r’| < |r| • Approaches: • Global & local approaches • Feature extraction: • New features in r’ are transformations of features in r • Feature selection: • Wrapper techniques • Feature scoring

  12. Feature Weighting • For text classification, typical weights include: • Binary: weights in {0,1} • Term frequency (tf): • # occurrences of t_k in document d_i • Inverse document frequency (idf): • df_k: # of docs in which t_k appears; N: # docs • idf = log(N / (1 + df_k)) • tfidf = tf * idf
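
The weighting scheme on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the function name `tfidf_weights` and the toy corpus are hypothetical, and the natural logarithm is assumed because the slide does not state a base.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Return one {term: tf * idf} dict per tokenized document.

    Uses the slide's definition idf = log(N / (1 + df_k)); the log base
    (natural log here) is an assumption.
    """
    n_docs = len(docs)
    # df_k: number of documents in which term t_k appears at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # tf: raw count of t_k in document d_i
        weights.append({
            term: count * math.log(n_docs / (1 + df[term]))
            for term, count in tf.items()
        })
    return weights

# Hypothetical toy corpus, already tokenized
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran", "ran"],
    ["the", "bird", "flew"],
]
weights = tfidf_weights(docs)
print(weights[2])
```

Because of the 1 in the denominator, a term that occurs in every document (here "the") receives a slightly negative idf rather than zero.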

  16. Chi Square • Tests for presence/absence of relation between random variables • Bivariate analysis tests 2 random variables • Can test strength of relationship • (Strictly speaking) doesn’t test direction

  20. Chi Square Example • Can gender predict shoe choice? • A: male/female → Features • B: shoe choice → Classes: {sandal, sneaker,…} (Due to F. Xia)

  28. Comparing Distributions • Observed distribution (O): • Expected distribution (E): Due to F. Xia

  33. Computing Chi Square • Expected value for cell = row_total * column_total / table_total • $X^2 = (6-9.5)^2/9.5 + (17-11)^2/11 + \dots = 14.026$

  37. Calculating $X^2$ • Tabulate contingency table of observed values: O • Compute row, column totals • Compute table of expected values, given row/column totals • Assuming no association • Compute $X^2$
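
The four steps above can be sketched as a short Python function. The observed table for the shoe example is not preserved in this transcript, so the counts below are illustrative only: they are an assumption, chosen to agree with the two cells that do survive in the slides (observed 6 against expected 9.5, observed 17 against expected 11). The function name `chi_square` is likewise hypothetical.

```python
def chi_square(observed):
    """Pearson's X^2 for a contingency table given as a list of rows."""
    # Step 2: row and column totals
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Step 3: expected count under the no-association assumption
            e = row_totals[i] * col_totals[j] / table_total
            # Step 4: accumulate (O - E)^2 / E over all cells
            x2 += (o - e) ** 2 / e
    return x2

# Step 1: observed table. ILLUSTRATIVE counts (assumption, not from the slides):
# rows = male, female; columns = five shoe types.
observed = [
    [6, 17, 13, 9, 5],
    [13, 5, 7, 16, 9],
]
x2 = chi_square(observed)
print(round(x2, 3))
```

With these assumed counts the result comes out close to the 14.026 reported on the slides; a small difference in the last digit would be consistent with rounding.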

  44. For 2x2 Table • O: • E:
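
The tables on this slide are not preserved in the transcript. As a hedged sketch of the 2x2 case, assume the observed table is laid out as [[a, b], [c, d]] (for example, term present/absent against in-class/not-in-class). Under that assumption the statistic has a well-known closed form, N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)), which the code below checks against the cell-by-cell sum. The function names and the counts are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """X^2 for the 2x2 table [[a, b], [c, d]] via the closed-form shortcut."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom

def chi_square_cellwise(a, b, c, d):
    """Same statistic, computed as the sum of (O - E)^2 / E over the four cells."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    cells = ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1))
    total = 0.0
    for o, i, j in cells:
        e = rows[i] * cols[j] / n  # expected count under independence
        total += (o - e) ** 2 / e
    return total

# Hypothetical counts: a = term present & in class, b = present & not in class,
# c = absent & in class, d = absent & not in class.
shortcut = chi_square_2x2(30, 10, 20, 40)
cellwise = chi_square_cellwise(30, 10, 20, 40)
print(shortcut, cellwise)
```

The shortcut avoids building the expected table, which is convenient when scoring every term in a large vocabulary.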

  50. $X^2$ Test • Test whether random variables are independent • Null hypothesis: the 2 R.V.s are independent • Compute $X^2$ statistic: $X^2 = \sum_{i,j} (O_{ij} - E_{ij})^2 / E_{ij}$ • Compute degrees of freedom • df = (# rows - 1)(# cols - 1) • Shoe example, df = (2-1)(5-1) = 4
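
To turn the statistic and its degrees of freedom into a decision, one compares against the chi-square distribution. The sketch below is one way to do that without a statistics library: it uses the closed-form tail probability that holds when df is even, which is a convenience chosen here rather than anything stated on the slides (a table lookup or `scipy.stats.chi2.sf` would serve the same purpose). The function names and the 0.05 significance level are assumptions.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """df = (# rows - 1)(# cols - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def chi2_sf_even_df(x, df):
    """P(X^2 >= x) for a chi-square distribution with positive even df.

    For df = 2k the tail is exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

df = degrees_of_freedom(2, 5)           # shoe example: 2 rows, 5 columns
p_value = chi2_sf_even_df(14.026, df)   # statistic reported on the slides
alpha = 0.05                            # assumed significance level
print(df, p_value, p_value < alpha)
```

For the shoe example the p-value falls well below 0.05 (equivalently, 14.026 exceeds the df = 4 critical value of 9.488), so under this assumed threshold the null hypothesis of independence would be rejected.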
