Logistic Regression 10701/15781 Recitation February 5, 2008 Parts of the slides are from previous years' recitation and lecture notes, and from Prof. Andrew Moore's data mining tutorials.
Discriminative Classifier • Learn P(Y|X) directly • Logistic regression for binary classification: P(Y=1|X,w) = 1 / (1 + exp(−(w0 + Σi wi Xi))), and P(Y=0|X,w) = 1 − P(Y=1|X,w) Note: Generative classifier: learn P(X|Y), P(Y) to get P(Y|X) under some modeling assumption, e.g. P(X|Y=y) ~ N(μy, 1), etc.
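As a minimal sketch of the model above (the function names and example weights are illustrative, not from the slides), the conditional probability can be computed with NumPy:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def p_y1_given_x(x, w, w0):
    """P(Y=1 | X=x, w) = sigmoid(w0 + sum_i w_i x_i)."""
    return sigmoid(w0 + np.dot(w, x))

# Hypothetical 2-dimensional example
x = np.array([1.5, -0.5])
w = np.array([2.0, 1.0])
print(p_y1_given_x(x, w, w0=-1.0))  # P(Y=1|x) is about 0.82
```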
Decision Boundary • For which X is P(Y=1|X,w) ≥ P(Y=0|X,w)? This holds exactly when w0 + Σi wi Xi ≥ 0, so the boundary is the hyperplane w0 + Σi wi Xi = 0 • Decision boundary from NB? Linear classification rule!
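A short sketch of the resulting classifier (names are illustrative): the predicted label depends only on the sign of the linear score, which is what makes the rule linear:

```python
import numpy as np

def classify(x, w, w0):
    """Predict Y=1 exactly when w0 + w.x >= 0,
    i.e. when P(Y=1|x,w) >= P(Y=0|x,w)."""
    return 1 if w0 + np.dot(w, x) >= 0 else 0
```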
LR more generally • In the more general case with K classes: P(Y=k|X) = exp(wk0 + Σi wki Xi) / (1 + Σj<K exp(wj0 + Σi wji Xi)) for k < K, and P(Y=K|X) = 1 / (1 + Σj<K exp(wj0 + Σi wji Xi)) for k = K
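One possible implementation of the K-class probabilities above, with class K as the reference class (the function name and argument layout are assumptions for illustration):

```python
import numpy as np

def multiclass_lr_probs(x, W, w0):
    """P(Y=k|x) for k = 1..K. W is a (K-1, n) weight matrix and
    w0 a length-(K-1) intercept vector; class K is the reference."""
    scores = np.exp(w0 + W @ x)           # exp(wk0 + sum_i wki xi), k < K
    denom = 1.0 + scores.sum()
    return np.append(scores / denom, 1.0 / denom)  # last entry: P(Y=K|x)
```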
How to learn P(Y|X) • Logistic regression • Maximize conditional log likelihood: l(w) = Σl [ Y^l ln P(Y^l=1|X^l,w) + (1 − Y^l) ln P(Y^l=0|X^l,w) ], where l indexes training examples • Good news: concave function of w • Bad news: no closed form solution, so use gradient ascent
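The objective can be written down directly; a minimal sketch (assuming the sigmoid parameterization used earlier, with X an m-by-n data matrix and y a vector of 0/1 labels):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cond_log_likelihood(X, y, w, w0):
    """l(w) = sum_l [ y^l ln P(Y=1|x^l,w) + (1 - y^l) ln P(Y=0|x^l,w) ]."""
    p = sigmoid(w0 + X @ w)               # P(Y=1|x^l,w) for every example
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```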
Gradient ascent (/descent) • General framework for finding a maximum (or minimum) of a continuous (differentiable) function, say f(w) • Start with some initial value w(1) and compute the gradient vector • The next value w(2) is obtained by moving some distance from w(1) in the direction of steepest ascent, i.e., along the gradient (for descent toward a minimum, along the negative of the gradient)
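A generic sketch of the loop (the step size eta, threshold tol, and stopping rule are illustrative choices, not prescribed by the slides):

```python
import numpy as np

def gradient_ascent(grad, w, eta=0.1, tol=1e-6, max_iter=10000):
    """Repeatedly step along the gradient (uphill) until the
    update is smaller than the threshold tol."""
    for _ in range(max_iter):
        step = eta * grad(w)              # steepest ascent: +gradient
        w = w + step
        if np.linalg.norm(step) < tol:
            break
    return w
```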
Gradient ascent for LR Iterate until change < threshold: for all i, wi ← wi + η Σl Xi^l (Y^l − P(Y^l=1|X^l,w))
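A sketch of one such update, vectorized (X is m-by-n, y holds 0/1 labels, and the helper names are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_gradient_step(X, y, w, w0, eta=0.1):
    """One ascent step: wi <- wi + eta * sum_l xi^l (y^l - P(Y=1|x^l,w))."""
    err = y - sigmoid(w0 + X @ w)         # y^l - P(Y^l=1|x^l,w)
    return w + eta * (X.T @ err), w0 + eta * err.sum()
```

Iterating lr_gradient_step until the change in w falls below a threshold gives the full training loop.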
Regularization • Overfitting is a problem, especially when data is very high dimensional and training data is sparse • Regularization: use a "penalized log likelihood function" l(w) − (λ/2) Σi wi^2, which penalizes large values of w • the modified gradient ascent update: wi ← wi + η Σl Xi^l (Y^l − P(Y^l=1|X^l,w)) − η λ wi
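With the L2 penalty, the update only gains the extra shrinkage term; a sketch under the same assumptions as above (lam is the penalty weight λ, and the intercept is left unpenalized, a common though not mandatory choice):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_lr_step(X, y, w, w0, eta=0.1, lam=1.0):
    """Ascent step on l(w) - (lam/2)*||w||^2: the extra -eta*lam*w
    term shrinks large weights toward zero."""
    err = y - sigmoid(w0 + X @ w)
    return w + eta * (X.T @ err) - eta * lam * w, w0 + eta * err.sum()
```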
Applet http://www.cs.technion.ac.il/~rani/LocBoost/
NB vs LR • Consider Y boolean, X continuous, X=(X1,…,Xn) • Number of parameters • NB: 4n+1 (class prior plus a per-class mean and variance for each Xi) • LR: n+1 • Parameter estimation method • NB: uncoupled (each parameter estimated independently) • LR: coupled (weights estimated jointly)
NB vs LR • Asymptotic comparison (as the number of training examples → infinity) • When model assumptions are correct • NB and LR produce identical classifiers • When model assumptions are incorrect • LR is less biased: it does not assume conditional independence • therefore LR is expected to outperform NB