CS480/680: Intro to ML. Lecture 04: Logistic Regression. Yao-Liang Yu
Outline • Announcements • Bernoulli model • Logistic regression • Computation
Announcements • Assignment 1 due next week
Classification revisited • ŷ = sign(wᵀx + b) • How confident are we about ŷ? • |wᵀx + b| seems a good indicator • but it is real-valued and hard to interpret • there are ways to transform it into [0,1] • Better(?) idea: learn the confidence directly
Conditional probability • P(Y=1 | X=x): conditioned on seeing x, what is the chance of this instance being positive, i.e., Y=1? • obviously a value in [0,1] • P(Y=0 | X=x) = 1 − P(Y=1 | X=x) when there are two classes • more generally, the class probabilities sum to 1. Notation (Simplex). Δ^{c−1} := { p ∈ R^c : p ≥ 0, Σ_k p_k = 1 }
Reduction to a harder problem • P(Y=1 | X=x) = E(1_{Y=1} | X=x) • Let Z = 1_{Y=1}; then P(Y=1 | X=x) is the regression function of (X, Z) • so why not use linear regression on the binary Z? • Exploit structure! • conditional probabilities live in a simplex, which plain regression ignores • Never reduce to an unnecessarily harder problem
Bernoulli model • Let P(Y=1 | X=x) = p(x; w), parameterized by w • Conditional likelihood on {(x1, y1), …, (xn, yn)} (written out below) • simplifies to a product if independence holds • assuming each yi is {0,1}-valued
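Written out, this is the standard reconstruction of the likelihood the slide refers to (the i.i.d. assumption is what makes the product form valid):

```latex
L(w) = \prod_{i=1}^{n} P(Y = y_i \mid X = x_i)
     = \prod_{i=1}^{n} p(x_i; w)^{y_i}\,\bigl(1 - p(x_i; w)\bigr)^{1-y_i}
```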
Naïve solution • Find w to maximize the conditional likelihood • What is the solution if p(x; w) does not depend on x? • What is the solution if p(x; w) does not depend on w?
Generalized linear models (GLM) • y ~ Bernoulli(p), with p = p(x; w) tied to the natural parameter: logistic regression • y ~ Normal(μ, σ²), with μ = μ(x; w): (weighted) least-squares regression • In general, a GLM posits y ~ exp(θ φ(y) − A(θ)), where θ is the natural parameter, φ(y) the sufficient statistics, and A(θ) the log-partition function
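To see why the Bernoulli model fits this template (a standard derivation, not taken from the slide), write its mass function in exponential-family form; the logit emerges as the natural parameter:

```latex
p^{y}(1-p)^{1-y} = \exp\bigl( y\,\theta - A(\theta) \bigr),
\qquad \theta = \log\tfrac{p}{1-p},
\qquad A(\theta) = \log(1 + e^{\theta}),
\qquad \phi(y) = y .
```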
Logit transform • p(x; w) = wᵀx? p ≥ 0 is not guaranteed… • log p(x; w) = wᵀx? better, but the LHS is negative while the RHS is real-valued… • Logit transform: log[p / (1−p)] = wᵀx, i.e., p(x; w) = 1 / (1 + exp(−wᵀx)) • or equivalently, the odds ratio p / (1−p) = exp(wᵀx) (see the sketch below)
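A minimal Python sketch of the logit/sigmoid pair (function names are illustrative, not from the course):

```python
import numpy as np

def sigmoid(z):
    """Inverse of the logit: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Log odds: maps a probability in (0, 1) to the real line."""
    return np.log(p / (1.0 - p))

# The two are inverses of each other (up to rounding):
z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z))           # approx [0.119, 0.5, 0.953]
print(logit(sigmoid(z)))    # recovers [-2., 0., 3.]
```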
Prediction with confidence • ŷ = 1 if p = P(Y=1 | X=x) > ½, which holds iff wᵀx > 0 • Decision boundary: wᵀx = 0 • ŷ = sign(wᵀx) as before, but now with confidence p(x; w)
Not just a classification algorithm • Logistic regression does more than classify • it estimates conditional probabilities • valid under the logit transform assumption • Having confidence in a prediction is nice • the price is an assumption that may or may not hold • If classification is the sole goal, this is extra work • as we shall see, SVM estimates only the decision boundary
More than logistic regression • F(p) transforms p from [0,1] to R • Then equate F(p) with a linear function wᵀx • But there are many other choices for F • precisely, the inverse of any cumulative distribution function works (e.g., F = Φ⁻¹, the inverse Gaussian CDF, gives probit regression)
Logistic distribution • Cumulative distribution function: F(x) = 1 / (1 + exp(−(x − μ)/s)) • mean μ, variance s²π²/3 • with μ = 0 and s = 1, this CDF is exactly the sigmoid of the logit transform
Maximum likelihood • Minimize the negative log-likelihood: −Σᵢ [ yᵢ log p(xᵢ; w) + (1 − yᵢ) log(1 − p(xᵢ; w)) ] (a sketch follows)
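A minimal sketch of this objective and plain gradient descent on it, assuming an n×d feature matrix X and {0,1} labels y (the logaddexp rewriting and the step-size defaults are illustrative choices, not the course's prescribed implementation):

```python
import numpy as np

def nll(w, X, y):
    """Negative log-likelihood: -sum_i [y_i log p_i + (1-y_i) log(1-p_i)],
    rewritten as sum_i [log(1 + exp(x_i.w)) - y_i x_i.w] for stability."""
    z = X @ w
    return np.sum(np.logaddexp(0.0, z) - y * z)

def grad(w, X, y):
    """Gradient of the NLL: X^T (p - y), with p = sigmoid(Xw)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y)

def fit_gd(X, y, lr=0.1, iters=1000):
    """Plain gradient descent; lr and iters are illustrative defaults."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w -= lr * grad(w, X, y)
    return w
```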
Newton’s algorithm • w ← w − η H⁻¹ ∇ℓ(w), where H = XᵀDX with D = diag(pᵢ(1 − pᵢ)) is PSD • uncertain predictions (pᵢ near ½) get bigger weight • with η = 1 this is iteratively reweighted least-squares (see the sketch below)
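A sketch of the Newton step written as weighted least-squares, under the same conventions as the gradient-descent sketch above; the small ridge term is an illustrative safeguard against a singular Hessian, not part of the lecture:

```python
import numpy as np

def newton_irls(X, y, iters=20, ridge=1e-8):
    """Newton's method with eta = 1, i.e., iteratively reweighted least-squares.
    Hessian H = X^T D X with D = diag(p*(1-p)) is PSD; the weight p*(1-p)
    peaks at p = 1/2, so uncertain predictions get bigger weight."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        g = X.T @ (p - y)                                # gradient
        D = p * (1.0 - p)                                # Newton weights
        H = X.T @ (D[:, None] * X) + ridge * np.eye(X.shape[1])
        w -= np.linalg.solve(H, g)                       # full Newton step
    return w
```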
Comparison
A word about implementation • Numerically computing the exponential can be tricky • it easily underflows or overflows • The usual trick: • estimate the range of the exponents • shift the mean of the exponents to 0 (illustrated below)
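A concrete instance of the trick; this sketch shifts by the max rather than the mean (any constant shift cancels out of the final value, but max-shifting additionally guarantees every exponent is ≤ 0):

```python
import numpy as np

def logsumexp(z):
    """Compute log(sum(exp(z))) without overflow: shift the exponents
    so the largest is 0, then add the shift back at the end."""
    m = z.max()
    return m + np.log(np.sum(np.exp(z - m)))

z = np.array([1000.0, 1001.0, 1002.0])
print(logsumexp(z))   # approx 1002.41; naive np.log(np.sum(np.exp(z))) overflows
```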
More than 2 classes • Softmax: P(Y=k | X=x) = exp(w_kᵀx) / Σ_j exp(w_jᵀx) • again nonnegative and summing to 1 • Negative log-likelihood (y one-hot): −Σᵢ Σₖ y_{ik} log P(Y=k | X=xᵢ) (sketched below)
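A minimal sketch combining the softmax with the shift trick from the previous slide, assuming an n×c score matrix XW and one-hot labels Y (names are illustrative):

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax; subtracting the row max leaves the result
    unchanged (the shift cancels) and keeps every exponent <= 0."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def cross_entropy(W, X, Y):
    """Negative log-likelihood with one-hot Y: -sum_i log P(Y = y_i | x_i)."""
    P = softmax(X @ W)          # n x c class probabilities
    return -np.sum(Y * np.log(P))
```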
Questions?