This lecture introduces logistic regression, its computation, and the Bernoulli model. It covers conditional probability, reduction to a harder problem, the logit transform, prediction with confidence, and maximum likelihood estimation.
CS480/680: Intro to ML. Lecture 03: Logistic Regression. Yao-Liang Yu
Outline • Announcements • Bernoulli model • Logistic regression • Computation
Announcements • Assignment 1 due in two weeks • More questions on Piazza, please • 2% bonus for top 10
Outline • Announcements • Bernoulli model • Logistic regression • Computation
Classification revisited • ŷ = sign(xᵀw + b) • How confident are we about ŷ? • |xᵀw + b| seems a good indicator • real-valued; hard to interpret • ways to transform it into [0,1] (see the sketch below) • Better(?) idea: learn confidence directly
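One simple way to map the real-valued score xᵀw + b into [0,1] is the sigmoid, which is the choice the rest of the lecture builds on. A minimal sketch (NumPy; the weights, bias, and test point below are made up purely for illustration):

```python
import numpy as np

def sigmoid(t):
    # maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-t))

# hypothetical weights and a test point, just for illustration
w = np.array([1.0, -2.0])
b = 0.5
x = np.array([0.3, 0.1])

score = x @ w + b          # real-valued, hard to interpret on its own
y_hat = np.sign(score)     # the usual hard decision
conf = sigmoid(score)      # in (0, 1): one way to read off "confidence"
print(y_hat, conf)
```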
Conditional probability • P(Y=1 | X=x): conditional on seeing x, what is the chance of this instance being positive, i.e., Y=1? • obviously, a value in [0,1] • P(Y=0 | X=x) = 1 − P(Y=1 | X=x), if two classes • more generally, the probabilities sum to 1 • Notation (Simplex). Δ^(c−1) := { p ∈ ℝ^c : p ≥ 0, Σ_k p_k = 1 }
Reduction to a harder problem • P(Y=1 | X=x) = E(1{Y=1} | X=x) • Let Z = 1{Y=1}; then P(Y=1 | X=x) is the regression function for (X, Z) • use linear regression for binary Z? • Exploit structure! • conditional probabilities live in a simplex • Never reduce to an unnecessarily harder problem
Bernoulli model • Let P(Y=1 | X=x) = p(x; w), parameterized by w • Conditional likelihood on {(x₁, y₁), …, (xₙ, yₙ)}: P(Y₁=y₁, …, Yₙ=yₙ | X₁=x₁, …, Xₙ=xₙ) • simplifies to ∏ᵢ p(xᵢ; w)^(yᵢ) (1 − p(xᵢ; w))^(1−yᵢ) if independence holds • assuming yᵢ is {0,1}-valued
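Under the independence assumption, the log of that product is easy to evaluate. A minimal sketch (NumPy; the probabilities and labels below are toy values, not tied to any particular model p(x; w)):

```python
import numpy as np

def conditional_log_likelihood(p, y):
    # p[i] = p(x_i; w) = P(Y=1 | X=x_i), y[i] in {0, 1}
    # log of prod_i p_i^{y_i} (1 - p_i)^{1 - y_i}, valid under independence
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# toy values, just for illustration
p = np.array([0.9, 0.2, 0.7])
y = np.array([1, 0, 1])
print(conditional_log_likelihood(p, y))
```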
Naïve solution • Find w to maximize conditional likelihood • What is the solution if p(x; w) does not depend on x? • What is the solution if p(x; w) does not depend on w?
Outline • Announcements • Bernoulli model • Logistic regression • Computation
Logit transform • p(x; w) = wᵀx? p ≥ 0 not guaranteed… • log p(x; w) = wᵀx? better! • but LHS is negative while RHS is real-valued… • Logit transform: log( p(x; w) / (1 − p(x; w)) ) = wᵀx • or equivalently, the odds ratio p / (1 − p) = exp(wᵀx)
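A minimal sketch of the logit and its inverse, the sigmoid (NumPy; the probabilities below are arbitrary test values):

```python
import numpy as np

def logit(p):
    # log-odds: maps (0, 1) onto the whole real line
    return np.log(p / (1 - p))

def sigmoid(t):
    # inverse of the logit: maps the real line back into (0, 1)
    return 1.0 / (1.0 + np.exp(-t))

p = np.array([0.1, 0.5, 0.9])
print(logit(p))             # negative, zero, positive
print(sigmoid(logit(p)))    # recovers p
```

Setting logit(p(x; w)) = wᵀx and inverting gives the familiar model p(x; w) = 1 / (1 + e^(−wᵀx)).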
Prediction with confidence • ŷ = 1 if p = P(Y=1 | X=x) > ½, iff wᵀx > 0 • Decision boundary wᵀx = 0 • ŷ = sign(wᵀx) as before, but with confidence p(x; w)
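The "iff" is a one-line consequence of the sigmoid model: p = 1 / (1 + e^(−wᵀx)) > ½ iff 1 + e^(−wᵀx) < 2 iff e^(−wᵀx) < 1 iff wᵀx > 0.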
Not just a classification algorithm • Logistic regression does more than classification • it estimates conditional probabilities • under the logit transform assumption • Having confidence in predictions is nice • the price is an assumption that may or may not hold • If classification is the sole goal, then it is doing extra work • as we shall see, SVM only estimates the decision boundary
More than logistic regression • F(p) transforms p from [0,1] to ℝ • then equate F(p) to a linear function wᵀx • but there are many other choices for F! • precisely, F can be the inverse of any distribution function!
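For instance, taking F to be the inverse of the standard normal distribution function gives probit regression. A minimal sketch comparing the two links (standard-library Python only; the score value is made up):

```python
import math

def normal_cdf(t):
    # standard normal distribution function, via the error function
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def sigmoid(t):
    # logistic distribution function (the choice behind logistic regression)
    return 1.0 / (1.0 + math.exp(-t))

# any distribution function maps the real-valued score into a probability in [0, 1]
score = 0.7   # hypothetical value of w'x
print(normal_cdf(score))  # probit link
print(sigmoid(score))     # logit link
```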
Logistic distribution • Cumulative distribution function: F(x) = 1 / (1 + e^(−(x−μ)/s)) • Mean μ, variance s²π²/3 • Sigmoid: μ = 0, s = 1
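A quick sketch checking that the sigmoid is exactly the μ = 0, s = 1 case of this distribution function (NumPy; the grid of test points is arbitrary):

```python
import numpy as np

def logistic_cdf(x, mu=0.0, s=1.0):
    # distribution function of the logistic distribution
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

x = np.linspace(-3, 3, 7)
print(np.allclose(logistic_cdf(x), sigmoid(x)))  # True: sigmoid is the mu=0, s=1 case
```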
Outline • Announcements • Bernoulli model • Logistic regression • Computation
Maximum likelihood • Minimize the negative log-likelihood objective f(w) = −Σᵢ [ yᵢ log p(xᵢ; w) + (1 − yᵢ) log(1 − p(xᵢ; w)) ]
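Under the model p(x; w) = 1 / (1 + e^(−wᵀx)), the objective and its gradient take familiar closed forms; a minimal sketch (NumPy; X, y, and w below are toy values just to show the call pattern):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def neg_log_likelihood(w, X, y):
    # f(w) = -sum_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ],  p_i = sigmoid(x_i' w)
    p = sigmoid(X @ w)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient(w, X, y):
    # grad f(w) = X' (p - y)
    return X.T @ (sigmoid(X @ w) - y)

# toy data, first column acts as a bias feature
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
w = np.zeros(2)
print(neg_log_likelihood(w, X, y), gradient(w, X, y))
```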
Newton’s algorithm • w ← w − η [∇²f(w)]⁻¹ ∇f(w) • η = 1: iteratively reweighted least-squares • the Hessian ∇²f(w) = Σᵢ p(xᵢ; w)(1 − p(xᵢ; w)) xᵢxᵢᵀ is PSD • uncertain predictions (p near ½) get bigger weight
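A minimal sketch of this update with η = 1 (NumPy; the toy data below is made up and deliberately non-separable so the maximum likelihood solution is finite):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def irls(X, y, num_iters=10):
    # Newton's method for the negative log-likelihood with step size eta = 1:
    #   gradient  g = X' (p - y)
    #   Hessian   H = X' diag(p * (1 - p)) X   (PSD; weights largest where p is near 1/2)
    d = X.shape[1]
    w = np.zeros(d)
    for _ in range(num_iters):
        p = sigmoid(X @ w)
        weights = p * (1 - p)                 # uncertain predictions get bigger weight
        g = X.T @ (p - y)
        H = X.T @ (X * weights[:, None])      # X' diag(weights) X
        w = w - np.linalg.solve(H, g)         # Newton step, eta = 1
    return w

# toy, non-separable data, just to show the call pattern
X = np.array([[1.0,  2.0], [1.0, -1.0], [1.0,  0.5],
              [1.0, -2.0], [1.0,  1.0], [1.0, -0.5]])
y = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0])
print(irls(X, y))
```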
Comparison
A word about implementation • Numerically computing exponentials can be tricky • they easily underflow or overflow • The usual trick: • estimate the range of the exponents • shift the mean of the exponents to 0
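The identity behind the trick is log Σᵢ exp(zᵢ) = c + log Σᵢ exp(zᵢ − c) for any shift c. A minimal sketch (NumPy) using the maximum as the shift; the mean, as on the slide, works the same way:

```python
import numpy as np

def log_sum_exp(z):
    # shift the exponents before exponentiating so np.exp never overflows
    c = np.max(z)                      # np.mean(z) is another valid choice of shift
    return c + np.log(np.sum(np.exp(z - c)))

z = np.array([1000.0, 1001.0, 1002.0])
print(log_sum_exp(z))                  # ~1002.41; the naive version overflows
```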
More than 2 classes • Softmax: P(Y=k | X=x) = exp(wₖᵀx) / Σⱼ exp(wⱼᵀx) • again, nonnegative and sums to 1 • Negative log-likelihood (y is one-hot): −Σᵢ Σₖ yᵢₖ log P(Y=k | X=xᵢ)
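A minimal sketch of the softmax probabilities and the one-hot negative log-likelihood for a single example (NumPy; the per-class weight matrix W, the point x, and the label are all hypothetical):

```python
import numpy as np

def softmax(z):
    # nonnegative and sums to 1; shift by the max for numerical stability
    z = z - np.max(z)
    e = np.exp(z)
    return e / np.sum(e)

def nll_one_hot(p, y):
    # y is one-hot, so this picks out -log p_k for the true class k
    return -np.sum(y * np.log(p))

W = np.array([[ 0.2, -0.1],
              [-0.3,  0.4],
              [ 0.1,  0.0]])           # one weight vector per class (3 classes, 2 features)
x = np.array([1.0, 2.0])
p = softmax(W @ x)                     # class probabilities
y = np.array([0.0, 1.0, 0.0])          # one-hot label for class 1
print(p, nll_one_hot(p, y))
```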
Questions?
Generalized linear models (GLM) • y ~ Bernoulli(p), p = p(x; w): logistic regression • y ~ Normal(μ, σ²), μ = μ(x; w): (weighted) least-squares regression • GLM: y ~ exp( θφ(y) − A(θ) ), with natural parameter θ, sufficient statistic φ(y), and log-partition function A(θ)
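To make the Bernoulli case concrete with standard exponential-family algebra: taking φ(y) = y and natural parameter θ = log( p / (1 − p) ), we can write Bernoulli(p) as P(Y = y) = exp( θy − A(θ) ) with log-partition function A(θ) = log(1 + e^θ). Setting θ = wᵀx and using E[Y] = A′(θ) = 1 / (1 + e^(−θ)) recovers exactly the logistic regression model p(x; w) = 1 / (1 + e^(−wᵀx)).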