Logistic Regression

  1. Logistic Regression • 10701/15781 Recitation, February 5, 2008 • Parts of the slides are from previous years’ recitation and lecture notes, and from Prof. Andrew Moore’s data mining tutorials.

  2. Discriminative Classifier • Learn P(Y|X) directly • Logistic regression for binary classification: P(Y=1|X,w) = 1 / (1 + exp(-(w0 + Σi wi Xi))), and P(Y=0|X,w) = 1 − P(Y=1|X,w) • Note: a generative classifier instead learns P(X|Y) and P(Y) to get P(Y|X) under some modeling assumption, e.g. P(X|Y) ~ N(μy, 1), etc.
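
A minimal numeric sketch of the binary formula above; the names x, w, w0 and the helper p_y1 are illustrative, not from the slides:

import numpy as np

def p_y1(x, w, w0):
    # P(Y=1 | X=x, w) = 1 / (1 + exp(-(w0 + sum_i w_i x_i)))
    score = w0 + np.dot(w, x)
    return 1.0 / (1.0 + np.exp(-score))

# e.g. p_y1(np.array([1.0, 2.0]), np.array([0.5, -0.3]), 0.1)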

  3. Decision Boundary • For which X is P(Y=1|X,w) ≥ P(Y=0|X,w)? • Decision boundary from NB? • Linear classification rule!
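
A short check, not on the original slide, of why this rule is linear, using the form of P(Y=1|X,w) above:

  P(Y=1|X,w) ≥ P(Y=0|X,w)
  ⟺ ln [ P(Y=1|X,w) / P(Y=0|X,w) ] ≥ 0
  ⟺ w0 + Σi wi Xi ≥ 0

So we predict Y=1 exactly when the linear score w0 + Σi wi Xi is non-negative; the decision boundary is the hyperplane w0 + Σi wi Xi = 0. Gaussian Naive Bayes with class-independent variances induces a boundary of the same linear form, which is what the NB question on the slide points to.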

  4. LR more generally • In the more general case with K classes: for k < K, P(Y=k|X,w) = exp(wk0 + Σi wki Xi) / (1 + Σj=1..K-1 exp(wj0 + Σi wji Xi)) • for k = K, P(Y=K|X,w) = 1 / (1 + Σj=1..K-1 exp(wj0 + Σi wji Xi))
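
A sketch of this K-class rule, with class K treated as the reference class; W, b, and the function name are illustrative, not from the slides:

import numpy as np

def p_y_given_x(x, W, b):
    # W: (K-1, n) weights, b: (K-1,) intercepts for classes 1..K-1
    scores = np.exp(b + W @ x)             # exp(w_k0 + sum_i w_ki x_i) for k < K
    denom = 1.0 + scores.sum()
    return np.append(scores, 1.0) / denom  # length-K vector of P(Y=k | X=x)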

  5. How to learn P(Y|X) • Logistic regression: maximize the conditional log likelihood l(w) = Σ_l [ Y^l ln P(Y^l=1|X^l,w) + (1 − Y^l) ln P(Y^l=0|X^l,w) ] • Good news: l(w) is a concave function of w • Bad news: no closed-form solution, so use gradient ascent
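
A sketch of that objective, assuming labels y in {0,1}; X (an m-by-n data matrix), y, w, w0 and the function name are illustrative:

import numpy as np

def cond_log_likelihood(X, y, w, w0):
    p1 = 1.0 / (1.0 + np.exp(-(w0 + X @ w)))  # P(Y^l=1 | X^l, w) for each example
    return np.sum(y * np.log(p1) + (1 - y) * np.log(1 - p1))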

  6. Gradient ascent (/descent) • General framework for finding a maximum (or minimum) of a continuous (differentiable) function, say f(w) • Start with some initial value w(1) and compute the gradient vector ∇f(w(1)) • The next value w(2) is obtained by moving some distance from w(1) in the direction of steepest ascent, i.e., along the gradient (for a minimum, move along the negative of the gradient instead: steepest descent)
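
A generic sketch of that loop; grad_f, eta, and the fixed step count stand in for the gradient of f, the step size, and a stopping rule:

def gradient_ascent(w, grad_f, eta=0.1, n_steps=100):
    for _ in range(n_steps):
        w = w + eta * grad_f(w)  # steepest ascent: step along the gradient
    return w

# e.g. gradient_ascent(0.0, lambda w: 3 - 2 * w) approaches w = 1.5,
# the maximizer of f(w) = 3w - w**2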

  7. Gradient ascent for LR • Iterate until the change in w is below a threshold • For all i, update w_i ← w_i + η Σ_l X_i^l ( Y^l − P(Y^l=1|X^l,w) )
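
A sketch of one such update over the whole training set; eta and the names X (m-by-n), y (labels in {0,1}), w, w0 are illustrative:

import numpy as np

def lr_gradient_step(X, y, w, w0, eta=0.01):
    p1 = 1.0 / (1.0 + np.exp(-(w0 + X @ w)))  # current P(Y^l=1 | X^l, w)
    err = y - p1                              # Y^l - P(Y^l=1 | X^l, w)
    w = w + eta * (X.T @ err)                 # w_i += eta * sum_l X_i^l * err_l
    w0 = w0 + eta * err.sum()                 # intercept term, with X_0^l = 1
    return w, w0

Repeating this step until the change in w falls below the threshold gives the loop on the slide.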

  8. Regularization • Overfitting is a problem, especially when the data is very high dimensional and training data is sparse • Regularization: use a “penalized log likelihood” which penalizes large values of w, e.g. l(w) − (λ/2) Σ_i w_i² • The modified gradient ascent update: w_i ← w_i + η Σ_l X_i^l ( Y^l − P(Y^l=1|X^l,w) ) − η λ w_i
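
The same step as above with the L2 penalty folded in; lam is an illustrative name, and the intercept is left unpenalized here:

import numpy as np

def lr_regularized_step(X, y, w, w0, eta=0.01, lam=0.1):
    p1 = 1.0 / (1.0 + np.exp(-(w0 + X @ w)))
    err = y - p1
    w = w + eta * (X.T @ err) - eta * lam * w  # extra -eta*lam*w_i from the penalty
    w0 = w0 + eta * err.sum()
    return w, w0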

  9. Applet http://www.cs.technion.ac.il/~rani/LocBoost/

  10. NB vs LR • Consider Y boolean, X continuous, X=(X1,…,Xn) • Number of parameters • NB: 4n+1 (a mean and a variance per feature per class, plus the class prior, assuming Gaussian class-conditionals) • LR: n+1 (the weights w0, w1, …, wn) • Parameter estimation method • NB: uncoupled (each parameter has its own closed-form estimate) • LR: coupled (all weights are fit jointly by maximizing the conditional likelihood)
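
As a concrete check of those counts (n = 30 is just an illustrative choice): with 30 continuous features, Gaussian NB estimates a mean and a variance for each feature under each of the two classes (2·30 + 2·30 = 120 parameters) plus one class prior, giving 4n + 1 = 121 parameters, while LR estimates only the weights w0, w1, …, w30, i.e. n + 1 = 31 parameters.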

  11. NB vs LR • Asymptotic comparison (number of training examples → infinity) • When the model assumptions are correct: NB and LR produce identical classifiers • When the model assumptions are incorrect: LR is less biased, since it does not assume conditional independence, and is therefore expected to outperform NB
