
Logistic Regression: Computation and Bernoulli Model

This lecture introduces logistic regression, its computation, and the Bernoulli model. It covers topics such as conditional probability, reduction to a harder problem, logit transform, prediction with confidence, and maximum likelihood estimation.

Presentation Transcript


  1. CS480/680: Intro to ML Lecture 03: Logistic Regression Yao-Liang Yu

  2. Outline • Announcements • Bernoulli model • Logistic regression • Computation Yao-Liang Yu

  3. Outline • Announcements • Bernoulli model • Logistic regression • Computation Yao-Liang Yu

  4. Announcements • Assignment 1 due in two weeks • More questions on Piazza please • 2% bonus for top 10 Yao-Liang Yu

  5. Outline • Announcements • Bernoulli model • Logistic regression • Computation Yao-Liang Yu

  6. Classification revisited • ŷ = sign(x^T w + b) • How confident are we about ŷ? • |x^T w + b| seems a good indicator • real-valued; hard to interpret • ways to transform into [0,1] • Better(?) idea: learn confidence directly Yao-Liang Yu

  7. Conditional probability • P(Y=1 | X=x): conditional on seeing x, what is the chance of this instance being positive, i.e., Y=1? • obviously, value in [0,1] • P(Y=0 | X=x) = 1 – P(Y=1 | X=x), if two classes • more generally, sum to 1 Notation (Simplex). Δ^{c−1} := { p ∈ R^c : p ≥ 0, Σ_k p_k = 1 } Yao-Liang Yu

  8. Reduction to a harder problem • P(Y=1 | X=x) = E(1_{Y=1} | X=x) • Let Z = 1_{Y=1}; then this is the regression function for (X, Z) • use linear regression for binary Z? • Exploit structure! • conditional probabilities are in a simplex • Never reduce to an unnecessarily harder problem Yao-Liang Yu
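
The identity on this slide is just the expectation of an indicator variable, written out:

    E(1_{Y=1} | X=x) = 1 · P(Y=1 | X=x) + 0 · P(Y=0 | X=x) = P(Y=1 | X=x),

so estimating the conditional probability is the same as estimating the regression function of Z = 1_{Y=1} on X.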

  9. Bernoulli model • Let P(Y=1 | X=x) = p(x; w), parameterized by w • Conditional likelihood on {(x1, y1), …, (xn, yn)} (written out below) • simplifies if independence holds • Assuming each yi is {0,1}-valued Yao-Liang Yu
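
Written out under the independence assumption mentioned on the slide, with each yi in {0,1}, the conditional likelihood is

    L(w) = ∏_{i=1}^{n} P(Y = y_i | X = x_i) = ∏_{i=1}^{n} p(x_i; w)^{y_i} (1 − p(x_i; w))^{1 − y_i}.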

  10. Naïve solution • Find w to maximize conditional likelihood • What is the solution if p(x; w) does not depend on x? • What is the solution if p(x; w) does not depend on w? Yao-Liang Yu
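
As a sanity check for the first question (a standard calculation, not spelled out on the slide): if p(x; w) ≡ p is a single constant, the likelihood above becomes p^{Σ_i y_i} (1 − p)^{n − Σ_i y_i}, which is maximized at the empirical frequency p̂ = (1/n) Σ_i y_i.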

  11. Outline • Announcements • Bernoulli model • Logistic regression • Computation Yao-Liang Yu

  12. Logit transform • p(x; w) = w^T x? p ≥ 0 not guaranteed… • log p(x; w) = w^T x? better! • LHS nonpositive, RHS real-valued… • Logit transform (see below) • Or equivalently the odds ratio Yao-Liang Yu
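
Concretely, the logit transform maps (0,1) onto the whole real line, so it can be equated with a linear function:

    logit(p) = log( p / (1 − p) ) = w^T x   ⟺   p(x; w) = 1 / (1 + exp(−w^T x)),

where p / (1 − p) is the odds ratio mentioned on the slide.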

  13. Prediction with confidence • ŷ = 1 if p = P(Y=1 | X=x) > ½, i.e., iff w^T x > 0 • Decision boundary w^T x = 0 • ŷ = sign(w^T x) as before, but now with confidence p(x; w) Yao-Liang Yu
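
A minimal NumPy sketch of this prediction rule (the function name and the NumPy dependency are illustrative, not part of the lecture):

    import numpy as np

    def predict_with_confidence(w, x):
        # p(x; w) = P(Y=1 | X=x) under the logistic model
        p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
        yhat = 1 if p > 0.5 else 0    # same rule as before: yhat = 1 iff w^T x > 0
        return yhat, max(p, 1.0 - p)  # confidence in the predicted label

For instance, if w^T x ≈ 2 the returned label is 1 with confidence about 0.88.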

  14. Not just a classification algorithm • Logistic regression does more than classification • it estimates conditional probabilities • under the logit transform assumption • Having confidence in predictions is nice • the price is an assumption that may or may not hold • If classification is the sole goal, then we are doing extra work • as we shall see, SVM only estimates the decision boundary Yao-Liang Yu

  15. More than logistic regression • F(p) transforms p from [0,1] to R • Then, equate F(p) to a linear function w^T x • But there are many other choices for F! • precisely the inverse of any distribution function! Yao-Liang Yu
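
For example (a standard instance of this recipe, not on the slide): taking F to be the inverse of the standard normal CDF Φ gives probit regression, i.e., p(x; w) = Φ(w^T x); taking F to be the inverse of the logistic CDF on the next slide gives logistic regression.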

  16. Logistic distribution • Cumulative distribution function • Mean μ, variance s²π²/3 • Sigmoid: μ = 0, s = 1 Yao-Liang Yu
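
Written out, the CDF of the logistic distribution with location μ and scale s is

    F(x) = 1 / (1 + exp(−(x − μ)/s)),   mean μ,  variance s²π²/3,

and with μ = 0, s = 1 this is exactly the sigmoid σ(z) = 1 / (1 + e^{−z}).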

  17. Outline • Announcements • Bernoulli model • Logistic regression • Computation Yao-Liang Yu

  18. Maximum likelihood • Minimize negative log-likelihood objective function f Yao-Liang Yu
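
Taking the negative log of the conditional likelihood from the Bernoulli model slide gives the objective f minimized here:

    f(w) = − Σ_{i=1}^{n} [ y_i log p(x_i; w) + (1 − y_i) log(1 − p(x_i; w)) ]
         = Σ_{i=1}^{n} log( 1 + exp(−ỹ_i w^T x_i) ),   where ỹ_i = 2 y_i − 1 ∈ {−1, +1}.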

  19. Newton’s algorithm • η = 1: iteratively reweighted least-squares • the Hessian is PSD • uncertain predictions get bigger weight Yao-Liang Yu
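
A minimal NumPy sketch of the Newton / IRLS iteration (a rough illustration of the update, not the lecture's reference implementation; it assumes X is an n-by-d matrix with rows x_i^T, y is a {0,1}-valued vector, and the Hessian stays invertible):

    import numpy as np

    def logistic_newton(X, y, iters=10):
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(iters):
            p = 1.0 / (1.0 + np.exp(-X @ w))   # p_i = P(Y=1 | x_i) under current w
            D = p * (1.0 - p)                  # Newton weights; largest when p_i is near 1/2
            grad = X.T @ (p - y)               # gradient of the negative log-likelihood
            H = X.T @ (X * D[:, None])         # Hessian X^T diag(D) X, which is PSD
            w = w - np.linalg.solve(H, grad)   # full Newton step (eta = 1)
        return w

Each step solves a weighted least-squares system, which is why the η = 1 iteration is described as iteratively (re)weighted least squares.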

  20. Comparison Yao-Liang Yu

  21. A word about implementation • Numerically computing exponentials can be tricky • easily underflows or overflows • The usual trick • estimate the range of the exponents • shift the mean of the exponents to 0 Yao-Liang Yu
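
A small NumPy sketch of the shifting trick (shown here with the common choice of shifting by the largest exponent; the identity holds for any constant shift, including centering the exponents at 0 as on the slide):

    import numpy as np

    def log_sum_exp(z):
        # log(sum_k exp(z_k)) = c + log(sum_k exp(z_k - c)) for any constant c
        z = np.asarray(z, dtype=float)
        c = np.max(z)                             # shift so the largest exponent becomes 0
        return c + np.log(np.sum(np.exp(z - c)))  # no overflow in exp after the shift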

  22. More than 2 classes • Softmax • Again, nonnegative and sum to 1 • Negative log-likelihood (y is one-hot) Yao-Liang Yu
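
Written out for c classes with one weight vector w_k per class, the softmax model is

    P(Y = k | X = x) = exp(w_k^T x) / Σ_{j=1}^{c} exp(w_j^T x),

which is nonnegative and sums to 1 over k, and with one-hot labels y_i the negative log-likelihood is

    f(w_1, …, w_c) = − Σ_{i=1}^{n} Σ_{k=1}^{c} y_{ik} log P(Y = k | X = x_i).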

  23. Questions? Yao-Liang Yu

  24. Generalized linear models (GLM) • y ~ Bernoulli(p); p = p(x; w) • Logistic regression • y ~ Normal(μ, σ²); μ = μ(x; w) • (weighted) least-squares regression • GLM: y ~ exp( θ φ(y) − A(θ) ), with sufficient statistic φ(y), natural parameter θ, and log-partition function A(θ) Yao-Liang Yu
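
As a concrete instance of the GLM form (standard facts, filling in the slide's labels): for the Bernoulli model, the sufficient statistic is φ(y) = y, the natural parameter is θ = log(p / (1 − p)) (the logit of p), and the log-partition function is A(θ) = log(1 + e^θ), which is why logistic regression falls out of this framework; for the Normal model with fixed variance, maximizing the likelihood over μ is exactly (weighted) least-squares regression.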
