
Logistic Regression: Classification with Confidence

Learn about logistic regression and how it can be used for classification with confidence. Topics include the Bernoulli model, computation, conditional probability, reduction to a harder problem, and more.

Presentation Transcript


  1. CS480/680: Intro to ML Lecture 04: Logistic Regression Yao-Liang Yu

  2. Outline • Announcements • Bernoulli model • Logistic regression • Computation Yao-Liang Yu

  3. Outline • Announcements • Bernoulli model • Logistic regression • Computation Yao-Liang Yu

  4. Announcements • Assignment 1 due next week Yao-Liang Yu

  5. Outline • Announcements • Bernoulli model • Logistic regression • Computation Yao-Liang Yu

  6. Classification revisited • ŷ = sign( xᵀw + b ) • How confident are we about ŷ? • |xᵀw + b| seems a good indicator • real-valued; hard to interpret • there are ways to transform it into [0,1] • Better(?) idea: learn confidence directly Yao-Liang Yu
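
A minimal sketch of this idea, assuming a weight vector w and a bias b have already been learned; the sigmoid squashing of the score is just one possible choice, anticipating the logistic model introduced below.

    import numpy as np

    def predict_with_score(x, w, b):
        score = x @ w + b                      # real-valued; |score| hints at confidence
        y_hat = np.sign(score)                 # hard label via the sign of the score
        conf = 1.0 / (1.0 + np.exp(-score))    # one way to squash the score into [0, 1]
        return y_hat, conf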

  7. Conditional probability • P(Y=1 | X=x): conditional on seeing x, what is the chance of this instance being positive, i.e., Y=1? • obviously, value in [0,1] • P(Y=0 | X=x) = 1 – P(Y=1 | X=x), if two classes • more generally, sum to 1 Notation (Simplex). Δ_{c-1} := { p ∈ R^c : p ≥ 0, Σ_k p_k = 1 } Yao-Liang Yu

  8. Reduction to a harder problem • P(Y=1 | X=x) = E(1_{Y=1} | X=x) • Let Z = 1_{Y=1}; then P(Y=1 | X=x) is the regression function for (X, Z) • use linear regression for binary Z? • Exploit structure! • conditional probabilities are in a simplex • Never reduce to an unnecessarily harder problem Yao-Liang Yu

  9. Bernoulli model • Let P(Y=1 | X=x) = p(x; w), parameterized by w • Conditional likelihood on {(x_1, y_1), …, (x_n, y_n)}: • simplifies if independence holds • Assuming y_i is {0,1}-valued Yao-Liang Yu
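
A hedged sketch of this conditional likelihood (in log form), assuming independent examples, {0,1}-valued labels, and the parameterization p(x; w) = sigmoid(wᵀx) that the logit transform below will justify.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def conditional_log_likelihood(w, X, y):
        """Sum of log P(Y = y_i | X = x_i), with y_i in {0,1} and p(x; w) = sigmoid(w^T x)."""
        p = sigmoid(X @ w)
        return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))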

  10. Naïve solution • Find w to maximize conditional likelihood • What is the solution if p(x; w) does not depend on x? • What is the solution if p(x; w) does not depend on w? Yao-Liang Yu

  11. Generalized linear models (GLM) • y ~ Bernoulli(p); p = p(x; w) → logistic regression • y ~ Normal(μ, σ²); μ = μ(x; w) → (weighted) least-squares regression • GLM: y ~ exp( θ φ(y) – A(θ) ), with natural parameter θ, sufficient statistic φ(y), and log-partition function A(θ) Yao-Liang Yu
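
A small numeric illustration (my own, not from the slides) checking that the Bernoulli pmf fits the GLM form exp(θ φ(y) – A(θ)) with natural parameter θ = logit(p), sufficient statistic φ(y) = y, and log-partition A(θ) = log(1 + e^θ).

    import numpy as np

    p = 0.3
    theta = np.log(p / (1 - p))        # natural parameter = logit(p)
    A = np.log(1 + np.exp(theta))      # log-partition function A(theta)
    for y in (0, 1):
        bernoulli = p**y * (1 - p)**(1 - y)    # usual Bernoulli pmf
        glm_form = np.exp(theta * y - A)       # exp(theta*phi(y) - A(theta)) with phi(y) = y
        print(y, bernoulli, glm_form)          # the two values agree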

  12. Outline • Announcements • Bernoulli model • Logistic regression • Computation Yao-Liang Yu

  13. Logit transform • p(x; w) = wᵀx? p ≥ 0 not guaranteed… • log p(x; w) = wᵀx? better! • but LHS is negative while RHS is real-valued… • Logit transform: log[ p(x; w) / (1 – p(x; w)) ] = wᵀx • Or equivalently, the odds ratio p/(1 – p) equals exp(wᵀx) Yao-Liang Yu
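
A plain sketch of the logit transform and its inverse, the sigmoid.

    import numpy as np

    def logit(p):
        """Log-odds: maps (0, 1) onto the whole real line."""
        return np.log(p / (1 - p))

    def sigmoid(z):
        """Inverse of the logit: maps R back into (0, 1)."""
        return 1.0 / (1.0 + np.exp(-z))

    # Setting logit(p(x; w)) = w^T x is the same as p(x; w) = sigmoid(w^T x).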

  14. Prediction with confidence • ŷ = 1 if p = P(Y=1 | X=x) > ½, i.e., iff wᵀx > 0 • Decision boundary: wᵀx = 0 • ŷ = sign(wᵀx) as before, but now with confidence p(x; w) Yao-Liang Yu
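
A sketch of this prediction rule, assuming the sigmoid parameterization above.

    import numpy as np

    def predict(X, w):
        """Hard label plus confidence: p > 1/2 exactly when w^T x > 0."""
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # P(Y=1 | X=x) under the logit model
        y_hat = (p > 0.5).astype(int)        # same decision boundary w^T x = 0
        return y_hat, p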

  15. Not just a classification algorithm • Logistic regression does more than classification • it estimates conditional probabilities • under the logit transform assumption • Having confidence in a prediction is nice • the price is an assumption that may or may not hold • If classification is the sole goal, then we are doing extra work • as we shall see, the SVM only estimates the decision boundary Yao-Liang Yu

  16. More than logistic regression • F(p) transforms p from [0,1] to R • Then, equate F(p) to a linear function wᵀx • But there are many other choices for F! • precisely, the inverse of any distribution function works! Yao-Liang Yu
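
For instance, replacing the logistic CDF with the Gaussian CDF gives the probit link; a small comparison of the two, using SciPy as an illustrative choice.

    import numpy as np
    from scipy.stats import logistic, norm

    z = np.linspace(-3, 3, 7)
    print(np.round(logistic.cdf(z), 3))   # F = logistic CDF -> logistic regression
    print(np.round(norm.cdf(z), 3))       # F = Gaussian CDF  -> probit regression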

  17. Logistic distribution • Cumulative distribution function: F(x) = 1 / (1 + exp(–(x – μ)/s)) • Mean μ, variance s²π²/3 Yao-Liang Yu
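
A minimal sketch of this CDF with location μ and scale s.

    import numpy as np

    def logistic_cdf(x, mu=0.0, s=1.0):
        """Logistic distribution CDF; mean mu, variance s^2 * pi^2 / 3."""
        return 1.0 / (1.0 + np.exp(-(x - mu) / s))

    # With mu = 0 and s = 1 this is exactly the sigmoid used above.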

  18. Outline • Announcements • Bernoulli model • Logistic regression • Computation Yao-Liang Yu

  19. Maximum likelihood • Minimize negative log-likelihood Yao-Liang Yu
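
A minimal sketch, assuming plain gradient descent (the step size and iteration count are arbitrary choices) on the average negative log-likelihood from the Bernoulli model above.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit_logistic_gd(X, y, lr=0.1, n_iter=1000):
        """Gradient descent on the average negative log-likelihood (y in {0,1})."""
        w = np.zeros(X.shape[1])
        for _ in range(n_iter):
            grad = X.T @ (sigmoid(X @ w) - y) / len(y)   # gradient of the average NLL
            w -= lr * grad
        return w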

  20. Newton’s algorithm • with step size η = 1, the update is iteratively reweighted least-squares • the Hessian is PSD • uncertain predictions get bigger weight Yao-Liang Yu
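
A hedged sketch of the Newton step with η = 1; the weights p(1 – p) peak at p = ½, so uncertain examples influence each weighted least-squares step the most. The small ridge term is my addition for numerical safety, not part of the slide.

    import numpy as np

    def fit_logistic_newton(X, y, n_iter=20):
        """Newton's method (eta = 1), i.e. iteratively reweighted least squares."""
        w = np.zeros(X.shape[1])
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-(X @ w)))
            s = p * (1 - p)                        # weights; largest where p is near 1/2
            grad = X.T @ (p - y)
            hess = X.T @ (X * s[:, None])          # X^T S X, positive semidefinite
            w -= np.linalg.solve(hess + 1e-8 * np.eye(X.shape[1]), grad)
        return w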

  21. Comparison Yao-Liang Yu

  22. A word about implementation • Numerically computing exponentials can be tricky • they easily underflow or overflow • The usual trick: • estimate the range of the exponents • shift the mean of the exponents to 0 Yao-Liang Yu
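
One common form of this trick, shown here on the softmax used on the next slide; this variant shifts the largest exponent to 0 (shifting by the mean, as the slide suggests, works the same way, since a constant shift cancels in the ratio).

    import numpy as np

    def stable_softmax(z):
        """Shift the exponents before exponentiating; the result is unchanged."""
        z = z - np.max(z)        # largest exponent is now 0, so exp cannot overflow
        e = np.exp(z)
        return e / e.sum()

    print(stable_softmax(np.array([1000.0, 1001.0, 1002.0])))   # no overflow warnings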

  23. More than 2 classes • Softmax: P(Y=k | X=x) = exp(w_kᵀx) / Σ_j exp(w_jᵀx) • Again, nonnegative and sum to 1 • Negative log-likelihood (y is one-hot) Yao-Liang Yu
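
A minimal multiclass sketch, assuming one weight vector per class (the columns of W) and one-hot labels Y.

    import numpy as np

    def softmax(Z):
        """Row-wise softmax: entries are nonnegative and each row sums to 1."""
        Z = Z - Z.max(axis=1, keepdims=True)     # the shift trick from the previous slide
        E = np.exp(Z)
        return E / E.sum(axis=1, keepdims=True)

    def multiclass_nll(W, X, Y):
        """Average negative log-likelihood with one-hot labels Y (shape n x c)."""
        P = softmax(X @ W)                       # P[i, k] = P(Y = k | X = x_i)
        return -np.sum(Y * np.log(P)) / X.shape[0]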

  24. Questions? Yao-Liang Yu
