Logistic Regression Jia-Bin Huang Virginia Tech ECE-5424G / CS-5824 Spring 2019
Administrative • Please start HW 1 early! • Questions are welcome!
Two principles for estimating parameters • Maximum Likelihood Estimate (MLE): choose θ that maximizes the probability of the observed data D, i.e., θ_MLE = argmax_θ P(D | θ) • Maximum a posteriori (MAP) estimation: choose θ that is most probable given the prior P(θ) and the data, i.e., θ_MAP = argmax_θ P(θ | D) = argmax_θ P(D | θ) P(θ) Slide credit: Tom Mitchell
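As a minimal sketch of the two principles (not from the slides; the coin-flip data and the Beta(a, b) prior are assumptions chosen for illustration), consider estimating a Bernoulli parameter θ:

```python
import numpy as np

# Hypothetical coin-flip data: 1 = heads, 0 = tails
flips = np.array([1, 1, 1, 0, 1, 1, 0, 1])

# MLE: theta that maximizes P(data | theta) -> the observed frequency
theta_mle = flips.mean()

# MAP with a Beta(a, b) prior: the prior acts like (a - 1) imaginary heads
# and (b - 1) imaginary tails added to the counts
a, b = 2.0, 2.0
theta_map = (flips.sum() + a - 1) / (len(flips) + a + b - 2)

print(theta_mle, theta_map)
```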
Naïve Bayes classifier • Want to learn P(Y | X1, …, Xn) • But modeling it directly requires a huge number of parameters • How about applying Bayes rule? P(Y | X1, …, Xn) ∝ P(X1, …, Xn | Y) P(Y) • P(X1, …, Xn | Y): for Boolean features, need on the order of 2(2^n − 1) parameters • P(Y): need 1 parameter • Apply the conditional independence assumption • P(X1, …, Xn | Y) = ∏_i P(Xi | Y): need only about 2n parameters
Naïve Bayes classifier • Bayes rule: P(Y = y_k | X1, …, Xn) = P(Y = y_k) P(X1, …, Xn | Y = y_k) / Σ_j P(Y = y_j) P(X1, …, Xn | Y = y_j) • Assume conditional independence among the Xi's: P(Y = y_k | X1, …, Xn) ∝ P(Y = y_k) ∏_i P(Xi | Y = y_k) • Pick the most probable Y: y* = argmax_k P(Y = y_k) ∏_i P(Xi | Y = y_k) Slide credit: Tom Mitchell
Example • Estimate the parameters P(Y) and P(Xi | Y) from the class-survey data, then classify a test example by combining conditional independence with Bayes rule (worked numbers omitted; the survey attributes are defined below)
Naïve Bayes algorithm – discrete Xi • For each value y_k: estimate π_k ≡ P(Y = y_k) • For each value x_ij of each attribute Xi: estimate θ_ijk ≡ P(Xi = x_ij | Y = y_k) • Classify a new example X_new: Y_new = argmax_k π_k ∏_i P(Xi_new | Y = y_k) Slide credit: Tom Mitchell
Estimating parameters: discrete Xi, Y • Maximum likelihood estimates (MLE): π̂_k = #D{Y = y_k} / |D|, θ̂_ijk = #D{Xi = x_ij ∧ Y = y_k} / #D{Y = y_k} Slide credit: Tom Mitchell
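A minimal sketch of the discrete algorithm with binary features (illustrative only; the function names and the toy data are assumptions, not from the slides):

```python
import numpy as np

def fit_discrete_nb(X, y):
    """Estimate P(Y = k) and P(X_i = 1 | Y = k) by counting (MLE, binary features)."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])        # pi_k = P(Y = k)
    cond = np.array([X[y == k].mean(axis=0) for k in classes])   # P(X_i = 1 | Y = k)
    return classes, priors, cond

def predict_discrete_nb(x_new, classes, priors, cond):
    """Pick the class k maximizing P(Y = k) * prod_i P(X_i = x_i | Y = k)."""
    likelihood = np.where(x_new == 1, cond, 1.0 - cond)  # per-feature likelihoods
    scores = priors * likelihood.prod(axis=1)
    return classes[np.argmax(scores)]

# Toy example with 4 binary features (think F, S, D, G from the survey slide)
X = np.array([[1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0],
              [0, 0, 0, 1]])
y = np.array([1, 0, 1, 0])
classes, priors, cond = fit_discrete_nb(X, y)
print(predict_discrete_nb(np.array([1, 0, 1, 0]), classes, priors, cond))
```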
F = 1 iff you live in Fox Ridge • S = 1 iff you watched the Super Bowl last night • D = 1 iff you drive to VT • G = 1 iff you went to the gym in the last month
Naïve Bayes: Subtlety #1 • Often the Xi are not really conditionally independent • Naïve Bayes often works pretty well anyway • It often gives the right classification, even when not the right probability [Domingos & Pazzani, 1996] • What is the effect on the estimated P(Y | X)? • What if we have two identical copies of a feature, e.g., Xi = Xk? (Its evidence gets counted twice.) Slide credit: Tom Mitchell
Naïve Bayes: Subtlety #2 • The MLE estimate for P(Xi = x_ij | Y = y_k) might be zero (for example, Xi = birthdate, x_ij = Feb_4_1995) • Why worry about just one parameter out of many? A single zero factor makes the whole product ∏_i P(Xi | Y = y_k) zero, vetoing that class no matter what the other features say • What can we do to address this? • MAP estimates (adding “imaginary” examples) Slide credit: Tom Mitchell
Estimating parameters: discrete Xi, Y • Maximum likelihood estimates (MLE): θ̂_ijk = #D{Xi = x_ij ∧ Y = y_k} / #D{Y = y_k} • MAP estimates (Dirichlet priors): add l “imaginary” examples of each value, θ̂_ijk = (#D{Xi = x_ij ∧ Y = y_k} + l) / (#D{Y = y_k} + l·J), where J is the number of values Xi can take (l = 1 gives Laplace smoothing) Slide credit: Tom Mitchell
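A small sketch of the MLE count versus the smoothed MAP count (the pseudo-count l and the example numbers are my own, not values from the slide):

```python
def estimate_theta(count_xy, count_y, num_values, l=1.0):
    """Estimate P(X_i = x | Y = y_k): MLE ratio vs. MAP with l 'imaginary' examples."""
    mle = count_xy / count_y
    map_ = (count_xy + l) / (count_y + l * num_values)
    return mle, map_

# A value never observed with this class: the MLE is exactly 0, the MAP estimate is not
print(estimate_theta(count_xy=0, count_y=25, num_values=2, l=1.0))  # (0.0, ~0.037)
```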
What if we have continuous Xi? • Gaussian Naïve Bayes (GNB): assume P(Xi = x | Y = y_k) = N(x; μ_ik, σ_ik) • Additional assumptions on σ_ik, e.g. that it: • is independent of Y (σ_i) • is independent of Xi (σ_k) • is independent of both (σ) Slide credit: Tom Mitchell
Naïve Bayes algorithm – continuous Xi • For each value y_k: estimate π_k ≡ P(Y = y_k) • For each attribute Xi: estimate the class-conditional mean μ_ik and variance σ_ik • Classify a new example X_new: Y_new = argmax_k π_k ∏_i N(Xi_new; μ_ik, σ_ik) Slide credit: Tom Mitchell
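A minimal Gaussian Naïve Bayes sketch, assuming one variance per (feature, class) pair; the function names and the small variance floor are my additions, not from the slides:

```python
import numpy as np

def fit_gnb(X, y):
    """Estimate priors pi_k, class-conditional means mu_ik and variances sigma_ik^2."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    vars_ = np.array([X[y == k].var(axis=0) + 1e-9 for k in classes])  # floor for stability
    return classes, priors, means, vars_

def predict_gnb(x_new, classes, priors, means, vars_):
    """argmax_k  log pi_k + sum_i log N(x_i; mu_ik, sigma_ik^2)  (logs avoid underflow)."""
    log_lik = -0.5 * (np.log(2 * np.pi * vars_) + (x_new - means) ** 2 / vars_)
    scores = np.log(priors) + log_lik.sum(axis=1)
    return classes[np.argmax(scores)]
```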
Things to remember • Probability basics • Conditional probability, joint probability, Bayes rule • Estimating parameters from data • Maximum likelihood (MLE): maximize P(D | θ) • Maximum a posteriori (MAP) estimation: maximize P(θ | D) • Naïve Bayes
Logistic Regression • Hypothesis representation • Cost function • Logistic regression with gradient descent • Regularization • Multi-class classification
Threshold the classifier output h_θ(x) at 0.5: • If h_θ(x) ≥ 0.5, predict “y = 1” • If h_θ(x) < 0.5, predict “y = 0” (Figure: Malignant? 1 (Yes) / 0 (No) vs. Tumor Size) Slide credit: Andrew Ng
Classification: y = 0 or 1 • h_θ(x) from linear regression can be > 1 or < 0 • Logistic regression: 0 ≤ h_θ(x) ≤ 1 • Despite the name, logistic regression is actually a classification algorithm Slide credit: Andrew Ng
Hypothesis representation • Want 0 ≤ h_θ(x) ≤ 1 • h_θ(x) = g(θᵀx), where g(z) = 1 / (1 + e^(−z)) • g is the sigmoid function, also called the logistic function Slide credit: Andrew Ng
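The sigmoid and the hypothesis written directly in code (a straight transcription of the formulas above; the bias-term convention is the standard one assumed throughout):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)): squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); x is assumed to include the intercept term x_0 = 1."""
    return sigmoid(theta @ x)
```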
Interpretation of hypothesis output • h_θ(x) = estimated probability that y = 1 on input x, i.e., h_θ(x) = P(y = 1 | x; θ) • Example: if x = [x_0; x_1] = [1; tumorSize] and h_θ(x) = 0.7 • Tell the patient there is a 70% chance of the tumor being malignant Slide credit: Andrew Ng
Logistic regression • h_θ(x) = g(θᵀx), g(z) = 1 / (1 + e^(−z)) • Suppose we predict “y = 1” if h_θ(x) ≥ 0.5, i.e., θᵀx ≥ 0 • and predict “y = 0” if h_θ(x) < 0.5, i.e., θᵀx < 0 Slide credit: Andrew Ng
Decision boundary • E.g., h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2) • Predict “y = 1” if θ_0 + θ_1 x_1 + θ_2 x_2 ≥ 0; the line θᵀx = 0 is the (linear) decision boundary (Figure: Age vs. Tumor Size) Slide credit: Andrew Ng
Non-linear decision boundary • E.g., h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_1² + θ_4 x_2²) • Predict “y = 1” when this weighted sum of features is ≥ 0 (for example, a circular boundary such as x_1² + x_2² ≥ 1) Slide credit: Andrew Ng
Where does the form of h_θ(x) come from? • Logistic regression hypothesis representation • Consider learning f: X → Y, where • X is a vector of real-valued features ⟨X_1, …, X_n⟩ • Y is Boolean • Assume all X_i are conditionally independent given Y • Model P(X_i | Y = y_k) as Gaussian N(μ_ik, σ_i) • Model P(Y) as Bernoulli(π) • What is P(Y | X)? Slide credit: Tom Mitchell
Applying Bayes rule: P(Y = 1 | X) = P(Y = 1) P(X | Y = 1) / [P(Y = 1) P(X | Y = 1) + P(Y = 0) P(X | Y = 0)] • Divide numerator and denominator by P(Y = 1) P(X | Y = 1): = 1 / (1 + P(Y = 0) P(X | Y = 0) / (P(Y = 1) P(X | Y = 1))) • Apply exp(ln(·)): = 1 / (1 + exp(ln[P(Y = 0) P(X | Y = 0) / (P(Y = 1) P(X | Y = 1))])) • Plug in the Gaussian class-conditionals and simplify: = 1 / (1 + exp(w_0 + Σ_i w_i X_i)) for suitable weights w, i.e., a sigmoid of a linear function of X Slide credit: Tom Mitchell
Logistic Regression • Hypothesis representation • Cost function • Logistic regression with gradient descent • Regularization • Multi-class classification
Training set with m examples: {(x⁽¹⁾, y⁽¹⁾), (x⁽²⁾, y⁽²⁾), …, (x⁽ᵐ⁾, y⁽ᵐ⁾)}, with x ∈ ℝ^(n+1), x_0 = 1, y ∈ {0, 1}, and h_θ(x) = 1 / (1 + e^(−θᵀx)) • How do we choose the parameters θ? Slide credit: Andrew Ng
Cost function for Linear Regression • J(θ) = (1/m) Σ_i (1/2)(h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² • With the sigmoid hypothesis, this squared-error cost is non-convex in θ, so it is a poor choice for logistic regression Slide credit: Andrew Ng
Cost function for Logistic Regression • J(θ) = (1/m) Σ_i Cost(h_θ(x⁽ⁱ⁾), y⁽ⁱ⁾), where Cost(h_θ(x), y) = −log(h_θ(x)) if y = 1, and −log(1 − h_θ(x)) if y = 0 Slide credit: Andrew Ng
Logistic regression cost function • Combined into one expression: Cost(h_θ(x), y) = −y log(h_θ(x)) − (1 − y) log(1 − h_θ(x)) • If y = 1: Cost = −log(h_θ(x)) • If y = 0: Cost = −log(1 − h_θ(x)) • J(θ) = −(1/m) Σ_i [ y⁽ⁱ⁾ log h_θ(x⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − h_θ(x⁽ⁱ⁾)) ] Slide credit: Andrew Ng
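The same cost in vectorized NumPy, a sketch assuming X already has an intercept column of ones (the small epsilon is my addition to avoid log(0)):

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum[ y log h + (1 - y) log(1 - h) ]."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X @ theta))   # h_theta(x) for every example at once
    eps = 1e-12                            # numerical guard against log(0)
    return -(1.0 / m) * np.sum(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
```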
Logistic regression • Learning: fit the parameters θ by minimizing J(θ) • Prediction: given a new x, output h_θ(x) = 1 / (1 + e^(−θᵀx)), interpreted as P(y = 1 | x; θ) Slide credit: Andrew Ng
Where does the cost come from? • Training set with m examples {(x⁽ⁱ⁾, y⁽ⁱ⁾)} • Maximum likelihood estimate for parameter θ: θ_MLE = argmax_θ P(⟨x⁽¹⁾, y⁽¹⁾⟩, …, ⟨x⁽ᵐ⁾, y⁽ᵐ⁾⟩ | θ) • Maximum conditional likelihood estimate for parameter θ: θ_MCLE = argmax_θ ∏_i P(y⁽ⁱ⁾ | x⁽ⁱ⁾; θ) Slide credit: Tom Mitchell
Goal: choose θ to maximize the conditional likelihood of the training data • Training data D = {⟨x⁽¹⁾, y⁽¹⁾⟩, …, ⟨x⁽ᵐ⁾, y⁽ᵐ⁾⟩} • Data likelihood = ∏_i P(x⁽ⁱ⁾, y⁽ⁱ⁾; θ) • Data conditional likelihood = ∏_i P(y⁽ⁱ⁾ | x⁽ⁱ⁾; θ) • Taking the negative log of the conditional likelihood gives exactly the logistic regression cost J(θ) above Slide credit: Tom Mitchell
Logistic Regression • Hypothesis representation • Cost function • Logistic regression with gradient descent • Regularization • Multi-class classification
Gradient descent • Goal: min_θ J(θ) • Repeat { θ_j := θ_j − α ∂J(θ)/∂θ_j } (simultaneously update all θ_j) • Good news: J(θ) is convex, so gradient descent reaches the global minimum • Bad news: no analytical (closed-form) solution Slide credit: Andrew Ng
Gradient descent • Goal: min_θ J(θ) • The gradient works out to ∂J(θ)/∂θ_j = (1/m) Σ_i (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) x_j⁽ⁱ⁾ • Repeat { θ_j := θ_j − α (1/m) Σ_i (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) x_j⁽ⁱ⁾ } (simultaneously update all θ_j) Slide credit: Andrew Ng
Gradient descent for Linear Regression • Repeat { θ_j := θ_j − α (1/m) Σ_i (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) x_j⁽ⁱ⁾ }, with h_θ(x) = θᵀx • Gradient descent for Logistic Regression • Repeat { θ_j := θ_j − α (1/m) Σ_i (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) x_j⁽ⁱ⁾ }, with h_θ(x) = 1 / (1 + e^(−θᵀx)) • The update rules look identical; only the hypothesis h_θ(x) differs Slide credit: Andrew Ng
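A minimal gradient-descent sketch for logistic regression; the learning rate, iteration count, and zero initialization are arbitrary illustrative choices, not values from the slides:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Repeat: theta_j := theta_j - alpha * (1/m) sum_i (h_theta(x_i) - y_i) * x_ij,
    updating all theta_j simultaneously (done here as one vectorized step)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))   # predictions for all examples
        grad = (X.T @ (h - y)) / m             # gradient of J(theta)
        theta = theta - alpha * grad
    return theta
```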
Logistic Regression • Hypothesis representation • Cost function • Logistic regression with gradient descent • Regularization • Multi-class classification
How about MAP? • Maximum conditional likelihood estimate (MCLE): θ_MCLE = argmax_θ ∏_i P(y⁽ⁱ⁾ | x⁽ⁱ⁾; θ) • Maximum conditional a posteriori estimate (MCAP): θ_MCAP = argmax_θ P(θ) ∏_i P(y⁽ⁱ⁾ | x⁽ⁱ⁾; θ)
Prior P(θ) • Common choice of P(θ): • Normal distribution, zero mean, identity covariance • “Pushes” parameters towards zero • Corresponds to L2 regularization • Helps avoid very large weights and overfitting Slide credit: Tom Mitchell
MLE vs. MAP • Maximum conditional likelihood estimate (MCLE) update: θ_j := θ_j + α Σ_i (y⁽ⁱ⁾ − h_θ(x⁽ⁱ⁾)) x_j⁽ⁱ⁾ • Maximum conditional a posteriori estimate (MCAP) update with a zero-mean Gaussian prior: θ_j := θ_j − αλθ_j + α Σ_i (y⁽ⁱ⁾ − h_θ(x⁽ⁱ⁾)) x_j⁽ⁱ⁾ (the extra −αλθ_j term shrinks the weights)
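A sketch of one MCAP (L2-regularized) gradient step; the value of lambda and the convention of leaving the bias θ_0 unregularized are assumptions on my part, not stated on the slide:

```python
import numpy as np

def regularized_step(theta, X, y, alpha=0.1, lam=1.0):
    """One MCAP gradient step: the Gaussian prior contributes a weight-decay term lam * theta.
    The intercept theta_0 is conventionally left out of the penalty."""
    m = X.shape[0]
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    grad = (X.T @ (h - y)) / m
    penalty = (lam / m) * np.concatenate(([0.0], theta[1:]))  # skip the bias term
    return theta - alpha * (grad + penalty)
```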
Logistic Regression • Hypothesis representation • Cost function • Logistic regression with gradient descent • Regularization • Multi-class classification
Multi-class classification • Email foldering/tagging: Work, Friends, Family, Hobby • Medical diagnosis: Not ill, Cold, Flu • Weather: Sunny, Cloudy, Rain, Snow Slide credit: Andrew Ng
(Figure: binary classification with two classes vs. multiclass classification with three or more classes, shown as scatter plots)
One-vs-all (one-vs-rest) • Turn the 3-class problem into three binary problems: Class 1 vs. rest, Class 2 vs. rest, Class 3 vs. rest • For each, fit a classifier h_θ⁽ⁱ⁾(x) = P(y = i | x; θ) Slide credit: Andrew Ng
One-vs-all • Train a logistic regression classifier h_θ⁽ⁱ⁾(x) for each class i to predict the probability that y = i • Given a new input x, pick the class i that maximizes h_θ⁽ⁱ⁾(x) Slide credit: Andrew Ng
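A one-vs-all sketch that reuses the `gradient_descent` trainer sketched earlier (assumed available in scope); class labels are assumed to be integers 0 … num_classes − 1:

```python
import numpy as np

def train_one_vs_all(X, y, num_classes, alpha=0.1, num_iters=1000):
    """Train one binary logistic regression classifier per class (y == i vs. the rest)."""
    thetas = [gradient_descent(X, (y == i).astype(float), alpha, num_iters)
              for i in range(num_classes)]
    return np.array(thetas)           # shape: (num_classes, n)

def predict_one_vs_all(thetas, x):
    """Pick the class whose classifier assigns the highest probability h_theta^(i)(x)."""
    probs = 1.0 / (1.0 + np.exp(-thetas @ x))
    return int(np.argmax(probs))
```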
Discriminative approach (e.g., logistic regression): estimate P(y | x) directly (or a discriminant function, e.g., SVM) • Prediction: ŷ = argmax_y P(y | x) • Generative approach (e.g., Naïve Bayes): estimate P(x | y) and P(y) • Prediction: ŷ = argmax_y P(x | y) P(y)