Logistic Regression 10701/15781 Recitation February 5, 2008 Parts of the slides are from previous years' recitation and lecture notes, and from Prof. Andrew Moore's data mining tutorials.
Discriminative Classifier • Learn P(Y|X) directly • Logistic regression for binary classification: P(Y=1|X,w) = 1 / (1 + exp(−(w0 + Σi wi Xi))), and P(Y=0|X,w) = 1 − P(Y=1|X,w) Note: Generative classifier: learn P(X|Y), P(Y) to get P(Y|X) under some modeling assumption, e.g. P(X|Y=y) ~ N(μy, 1), etc.
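As a minimal sketch of the model above (the function names and example weights are illustrative, not from the slides), the conditional probability can be computed with NumPy:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def p_y1_given_x(x, w, w0):
    """P(Y=1 | X=x, w) = sigmoid(w0 + sum_i w_i x_i)."""
    return sigmoid(w0 + np.dot(w, x))

# Hypothetical 2-dimensional example
x = np.array([1.5, -0.5])
w = np.array([2.0, 1.0])
print(p_y1_given_x(x, w, w0=-1.0))  # P(Y=1|x) is about 0.82
```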
Decision Boundary • For which X is P(Y=1|X,w) ≥ P(Y=0|X,w)? This holds exactly when w0 + Σi wi Xi ≥ 0, so the boundary is the hyperplane w0 + Σi wi Xi = 0 • Decision boundary from NB? Linear classification rule!
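A short sketch of the resulting classifier (names are illustrative): the predicted label depends only on the sign of the linear score, which is what makes the rule linear:

```python
import numpy as np

def classify(x, w, w0):
    """Predict Y=1 exactly when w0 + w.x >= 0,
    i.e. when P(Y=1|x,w) >= P(Y=0|x,w)."""
    return 1 if w0 + np.dot(w, x) >= 0 else 0
```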
LR more generally • In the more general case with K classes: P(Y=k|X) = exp(wk0 + Σi wki Xi) / (1 + Σj<K exp(wj0 + Σi wji Xi)) for k < K, and P(Y=K|X) = 1 / (1 + Σj<K exp(wj0 + Σi wji Xi)) for k = K
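One possible implementation of the K-class probabilities above, with class K as the reference class (the function name and argument layout are assumptions for illustration):

```python
import numpy as np

def multiclass_lr_probs(x, W, w0):
    """P(Y=k|x) for k = 1..K. W is a (K-1, n) weight matrix and
    w0 a length-(K-1) intercept vector; class K is the reference."""
    scores = np.exp(w0 + W @ x)           # exp(wk0 + sum_i wki xi), k < K
    denom = 1.0 + scores.sum()
    return np.append(scores / denom, 1.0 / denom)  # last entry: P(Y=K|x)
```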
How to learn P(Y|X) • Logistic regression • Maximize conditional log likelihood: l(w) = Σl [ Y^l ln P(Y^l=1|X^l,w) + (1 − Y^l) ln P(Y^l=0|X^l,w) ], where l indexes training examples • Good news: concave function of w • Bad news: no closed form solution, so use gradient ascent
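The objective can be written down directly; a minimal sketch (assuming the sigmoid parameterization used earlier, with X an m-by-n data matrix and y a vector of 0/1 labels):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cond_log_likelihood(X, y, w, w0):
    """l(w) = sum_l [ y^l ln P(Y=1|x^l,w) + (1 - y^l) ln P(Y=0|x^l,w) ]."""
    p = sigmoid(w0 + X @ w)               # P(Y=1|x^l,w) for every example
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```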
Gradient ascent (/descent) • General framework for finding a maximum (or minimum) of a continuous (differentiable) function, say f(w) • Start with some initial value w(1) and compute the gradient vector • The next value w(2) is obtained by moving some distance from w(1) in the direction of steepest ascent, i.e., along the gradient (for descent toward a minimum, along the negative of the gradient)
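A generic sketch of the loop (the step size eta, threshold tol, and stopping rule are illustrative choices, not prescribed by the slides):

```python
import numpy as np

def gradient_ascent(grad, w, eta=0.1, tol=1e-6, max_iter=10000):
    """Repeatedly step along the gradient (uphill) until the
    update is smaller than the threshold tol."""
    for _ in range(max_iter):
        step = eta * grad(w)              # steepest ascent: +gradient
        w = w + step
        if np.linalg.norm(step) < tol:
            break
    return w
```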
Gradient ascent for LR Iterate until change < threshold: for all i, wi ← wi + η Σl Xi^l (Y^l − P(Y^l=1|X^l,w))
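A sketch of one such update, vectorized (X is m-by-n, y holds 0/1 labels, and the helper names are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_gradient_step(X, y, w, w0, eta=0.1):
    """One ascent step: wi <- wi + eta * sum_l xi^l (y^l - P(Y=1|x^l,w))."""
    err = y - sigmoid(w0 + X @ w)         # y^l - P(Y^l=1|x^l,w)
    return w + eta * (X.T @ err), w0 + eta * err.sum()
```

Iterating lr_gradient_step until the change in w falls below a threshold gives the full training loop.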
Regularization • Overfitting is a problem, especially when data is very high dimensional and training data is sparse • Regularization: use a "penalized log likelihood function" l(w) − (λ/2) Σi wi^2, which penalizes large values of w • the modified gradient ascent update: wi ← wi + η Σl Xi^l (Y^l − P(Y^l=1|X^l,w)) − η λ wi
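With the L2 penalty, the update only gains the extra shrinkage term; a sketch under the same assumptions as above (lam is the penalty weight λ, and the intercept is left unpenalized, a common though not mandatory choice):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_lr_step(X, y, w, w0, eta=0.1, lam=1.0):
    """Ascent step on l(w) - (lam/2)*||w||^2: the extra -eta*lam*w
    term shrinks large weights toward zero."""
    err = y - sigmoid(w0 + X @ w)
    return w + eta * (X.T @ err) - eta * lam * w, w0 + eta * err.sum()
```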
Applet http://www.cs.technion.ac.il/~rani/LocBoost/
NB vs LR • Consider Y boolean, X continuous, X=(X1,…,Xn) • Number of parameters • NB: 4n+1 (class prior plus a per-class mean and variance for each Xi) • LR: n+1 • Parameter estimation method • NB: uncoupled (each parameter estimated independently) • LR: coupled (weights estimated jointly)
NB vs LR • Asymptotic comparison (as the number of training examples → infinity) • When model assumptions are correct • NB and LR produce identical classifiers • When model assumptions are incorrect • LR is less biased: it does not assume conditional independence • therefore LR is expected to outperform NB