
What Is the Optimal Number of Features? A Learning-Theoretic Perspective

Presentation Transcript


  1. What Is the Optimal Number of Features? A Learning-Theoretic Perspective Amir Navot Joint work with Ran Gilad-Bachrach, Yiftah Navot and Naftali Tishby

  2. What is Feature Selection? • Feature selection: select a “good” small subset of features (out of the given set) • “Good” subset: one that allows building good classifiers • Feature selection is a special form of dimensionality reduction

  3. Reasons to do Feature Selection • Reduces computational complexity • Saves the cost of measuring extra features • The selected features can provide insights about the nature of the problem • Improves accuracy

  4. The Questions • Under which conditions can feature selection improve classification accuracy? • What is the optimal number of features? • How does this number depend on the training set size? We discuss these questions by analyzing one simple setting

  5. Two Gaussians - Problem Setting • Binary classification task: the label y ∈ {±1} is equally likely, and x | y ~ N(y·μ, I) for some μ ∈ R^N • The coordinates x_1, …, x_N are the features • Optimal classifier: h(x) = sign(⟨μ, x⟩) • Given a feature subset F: h_F(x) = sign(Σ_{i∈F} μ_i x_i) • If μ is known: using all the features is optimal
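To make the setting concrete, here is a minimal simulation sketch (mine, not from the talk); the particular μ, with μ_i = 1/i, is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(mu, m, rng):
    """Draw m labeled points: y = +/-1 with equal probability, x ~ N(y*mu, I)."""
    y = rng.choice([-1, 1], size=m)
    x = y[:, None] * mu + rng.standard_normal((m, len(mu)))
    return x, y

mu = 1.0 / np.arange(1, 51)          # assumed example: 50 features with decaying means
x, y = sample(mu, 100_000, rng)
bayes_pred = np.sign(x @ mu)         # optimal classifier when mu is known
print("Bayes error:", np.mean(bayes_pred != y))
```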

  6. Problem Setting – Cont. • Assume that μ is estimated from a sample S = {(x_j, y_j)}_{j=1}^m of size m • Given an estimator μ̂ and a feature subset F, we consider the classifier h_F(x) = sign(Σ_{i∈F} μ̂_i x_i) • We want to find the subset of features that minimizes the average generalization error E_S[err(h_F)] • We assume μ_1 ≥ μ_2 ≥ … ≥ μ_N ≥ 0 ⇒ we only need to find the optimal number of features n • We consider the optimal estimator: μ̂ = (1/m) Σ_{j=1}^m y_j x_j
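Continuing the sketch above (function names are mine): since y·x ~ N(μ, I), the optimal estimator is the sample mean of y_j·x_j, and under the sorted-means assumption a subset of size n is just the first n coordinates:

```python
def estimate_mu(x, y):
    """ML estimator: mu_hat = (1/m) * sum_j y_j x_j, so mu_hat_i ~ N(mu_i, 1/m)."""
    return (y[:, None] * x).mean(axis=0)

def error_first_n(mu_hat, x, y, n):
    """Test error of the restricted classifier sign(sum_{i<=n} mu_hat_i x_i)."""
    return np.mean(np.sign(x[:, :n] @ mu_hat[:n]) != y)

x_tr, y_tr = sample(mu, 20, rng)     # small training set: m = 20
mu_hat = estimate_mu(x_tr, y_tr)
x_te, y_te = sample(mu, 100_000, rng)
for n in (5, 20, 50):
    print(f"n={n}: test error {error_first_n(mu_hat, x_te, y_te, n):.3f}")
```

With m this small, an intermediate n typically beats using all 50 features, which is the phenomenon the talk quantifies.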

  7. Illustration [figure]

  8. Result • The number of features that minimizes the average error for a training set of size m is the n that maximizes G(n) = Σ_{i=1}^n μ_i² / √(Σ_{i=1}^n μ_i² + n/m) • Observations: • If the μ_i decrease fast enough, there is a non-trivial optimal n < N • When m → ∞, using all the features is the optimal choice • The decision on adding feature n+1 depends on the other features
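The criterion is a one-liner to evaluate. A self-contained sketch (the choice μ_i = 1/i is again my assumption) that returns the maximizer of G(n):

```python
import numpy as np

def optimal_n(mu, m):
    """Return the n maximizing G(n) = S_n / sqrt(S_n + n/m), S_n = sum_{i<=n} mu_i^2."""
    mu = np.sort(np.abs(mu))[::-1]           # enforce mu_1 >= mu_2 >= ... >= 0
    s = np.cumsum(mu ** 2)                   # S_n for n = 1..N
    n = np.arange(1, len(mu) + 1)
    return int(n[np.argmax(s / np.sqrt(s + n / m))])

mu = 1.0 / np.arange(1, 101)
for m in (10, 100, 1000, 10**6):
    print(f"m={m}: optimal n = {optimal_n(mu, m)}")
```

The maximizer grows with m, matching the observations above.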

  9. Solving for a Specific μ

  10. Solving for a Specific μ - Cont. [figure; x-axis: number of features]
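The transcript does not preserve the specific μ used here. As a hypothetical stand-in, μ_i = 1/i gives the kind of curve these slides presumably show, by tracing the approximate average error Φ(−G(n)) against the number of features (the approximation is derived on the next slides):

```python
import numpy as np
from scipy.stats import norm

m, N = 100, 200
mu = 1.0 / np.arange(1, N + 1)     # assumed mu, not from the talk
n = np.arange(1, N + 1)
s = np.cumsum(mu ** 2)
pred_err = norm.cdf(-s / np.sqrt(s + n / m))   # approximate average error for each n
print("predicted optimal n:", int(n[np.argmin(pred_err)]))
```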

  11. Proof • For a given estimator μ̂ of μ, the generalization error of the classifier that uses the first n features is: Φ( −Σ_{i=1}^n μ̂_i μ_i / √(Σ_{i=1}^n μ̂_i²) ) (Φ is the CDF of a standard Gaussian)
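The formula holds because, for a fixed μ̂, the margin y·Σ_{i≤n} μ̂_i x_i is Gaussian with mean Σ μ̂_i μ_i and variance Σ μ̂_i². A quick Monte Carlo sanity check of the closed form (a sketch, with an arbitrary μ and μ̂ of my choosing):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu = np.array([1.0, 0.5, 0.2])
mu_hat = mu + 0.3 * rng.standard_normal(3)   # some fixed, imperfect estimate

# Closed form: Phi(-<mu_hat, mu> / ||mu_hat||)
closed = norm.cdf(-(mu_hat @ mu) / np.linalg.norm(mu_hat))

# Monte Carlo: error rate of sign(<mu_hat, x>) on fresh samples
y = rng.choice([-1, 1], size=500_000)
x = y[:, None] * mu + rng.standard_normal((500_000, 3))
print(f"closed form: {closed:.4f}   Monte Carlo: {np.mean(np.sign(x @ mu_hat) != y):.4f}")
```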

  12. Proof – Cont. • We want to find the number of features n that minimizes the average error: E_S[ Φ( −Σ_{i=1}^n μ̂_i μ_i / √(Σ_{i=1}^n μ̂_i²) ) ] • Lemma: Φ( −E[Σ_{i=1}^n μ̂_i μ_i] / √(E[Σ_{i=1}^n μ̂_i²]) ) is a good approximation of the average error when m is large enough.

  13. Proof – Cont. • Therefore, minimizing the average error is equivalent to maximizing G(n) = E[Σ_{i=1}^n μ̂_i μ_i] / √(E[Σ_{i=1}^n μ̂_i²]) • For the optimal estimator, μ̂_i ~ N(μ_i, 1/m), so E[μ̂_i μ_i] = μ_i² and E[μ̂_i²] = μ_i² + 1/m • Therefore, G(n) = Σ_{i=1}^n μ_i² / √(Σ_{i=1}^n μ_i² + n/m)

  14. “Empirical Proof” of the Lemma [figure; x-axis: number of features]
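The figure itself is not preserved, but the comparison behind it is easy to redo: draw μ̂ repeatedly from its sampling distribution N(μ_i, 1/m), average the exact error of slide 11, and compare with the approximation of slide 12 (a sketch, again with the assumed μ_i = 1/i):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
m, N, reps = 50, 100, 2000
mu = 1.0 / np.arange(1, N + 1)
n = np.arange(1, N + 1)

avg_err = np.zeros(N)
for _ in range(reps):
    mu_hat = mu + rng.standard_normal(N) / np.sqrt(m)   # mu_hat_i ~ N(mu_i, 1/m)
    avg_err += norm.cdf(-np.cumsum(mu_hat * mu) / np.sqrt(np.cumsum(mu_hat ** 2)))
avg_err /= reps                                          # average exact error per n

s = np.cumsum(mu ** 2)
approx = norm.cdf(-s / np.sqrt(s + n / m))
print("empirical optimal n:", int(n[np.argmin(avg_err)]),
      "  approximation's optimal n:", int(n[np.argmin(approx)]))
```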

  15. Linear SVM Error (averaged over 200 repeats, C=0.01, using Gavin Cawley’s toolbox) [figure; x-axis: number of features]
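The slide’s experiment used Gavin Cawley’s MATLAB toolbox; a rough scikit-learn substitute (my translation, not the original code) trains a linear SVM with a small C on the first n features and averages the test error over repeats:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(3)
m, N = 50, 100
mu = 1.0 / np.arange(1, N + 1)                 # assumed mu, as before

def sample(size):
    y = rng.choice([-1, 1], size=size)
    return y[:, None] * mu + rng.standard_normal((size, N)), y

for n in (5, 20, 100):
    errs = []
    for _ in range(20):                        # the slide averaged over 200 repeats
        (x_tr, y_tr), (x_te, y_te) = sample(m), sample(5000)
        clf = LinearSVC(C=0.01).fit(x_tr[:, :n], y_tr)
        errs.append(np.mean(clf.predict(x_te[:, :n]) != y_te))
    print(f"n={n}: mean test error {np.mean(errs):.3f}")
```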

  16. Conclusions • Even when all the features carry information and are independent: • using all the features may be suboptimal • the decision to add a feature depends on the others • The optimal number of features depends critically on the sample size (i.e., on the quality of the model estimation)

  17. What Is the Optimal Number of Features? A Learning-Theoretic Perspective Amir Navot Joint work with Ran Gilad-Bachrach, Yiftah Navot and Naftali Tishby
