Section 3

Section 3 Probit and Logit Models

Dichotomous Data • Suppose data is discrete but there are only 2 outcomes • Examples • Graduate high school or not • Patient dies or not • Working or not • Smoker or not • In data, yi=1 if yes, yi =0 if no

How to model the data generating process? • There are only two outcomes • Research question: What factors impact whether the event occurs? • To answer, will model the probability the outcome occurs • Pr(Yi=1) when yi=1 or • Pr(Yi=0) = 1- Pr(Yi=1) when yi=0

Think of the problem from an MLE perspective • Likelihood for i’th observation • $L_i = \Pr(Y_i=1)^{Y_i}\,[1 - \Pr(Y_i=1)]^{1-Y_i}$ • When yi=1, only relevant part is Pr(Yi=1) • When yi=0, only relevant part is [1 - Pr(Yi=1)]

L = Σi ln[Li] = Σi { yi ln[Pr(yi=1)] + (1-yi) ln[Pr(yi=0)] } • Notice that up to this point, the model is generic. The log likelihood function will be determined by the assumptions concerning how we determine Pr(yi=1)
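The generic log likelihood above can be sketched in a few lines of Python. This is only an illustration: the outcomes and the two sets of candidate probabilities are made up, and the function name is hypothetical.

```python
import math

def bernoulli_log_likelihood(y, p):
    """Sum over observations of y*ln(p) + (1-y)*ln(1-p)."""
    total = 0.0
    for yi, pi in zip(y, p):
        total += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total

# Hypothetical outcomes and two hypothetical sets of Pr(yi=1)
y = [1, 0, 1, 1, 0]
p_informative = [0.8, 0.2, 0.7, 0.9, 0.1]  # probabilities that track the outcomes
p_flat = [0.6] * 5                         # same probability for everyone

ll_informative = bernoulli_log_likelihood(y, p_informative)
ll_flat = bernoulli_log_likelihood(y, p_flat)
print(ll_informative, ll_flat)
```

Probabilities that track the observed outcomes give a log likelihood closer to zero, which is what maximum likelihood rewards.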

Modeling the probability • There is some process (biological, social, decision theoretic, etc.) that determines the outcome y • Some of the variables that affect the outcome are observed, some are not • Requires that we model how these factors impact the probabilities • Model from a ‘latent variable’ perspective

Consider a woman’s decision to work • yi* = the person’s net benefit to work • Two components of yi* • Characteristics that we can measure • Education, age, income of spouse, prices of child care • Some we cannot measure • How much you like spending time with your kids • How much you like/hate your job

We aggregate these two components into one equation • yi* = β0 + x1iβ1+ x2iβ2+… xkiβk+ εi = xi β + εi • xi β (measurable characteristics but with uncertain weights) • εi random unmeasured characteristics • Decision rule: person will work if yi* > 0 (if net benefits are positive) • yi=1 if yi*>0 • yi=0 if yi*≤0

yi=1 if yi*>0 • yi* = xi β + εi > 0 only if • εi > - xi β • yi=0 if yi*≤0 • yi* = xi β + εi ≤ 0 only if • εi ≤ - xi β

Suppose xi β is ‘big.’ • High wages • Low husband’s income • Low cost of child care • We would expect this person to work, UNLESS, there is some unmeasured ‘variable’ that counteracts this

Suppose a mom really likes spending time with her kids, or she hates her job. • The unmeasured component of the net benefit of working, εi, is then large and negative • If we observe her working, εi must not have been too negative, since • yi=1 if εi > - xi β

Consider the opposite. Suppose we observe someone NOT working. • Then εi must not have been large enough to offset xi β, since • yi=0 if εi ≤ - xi β
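The decision rule can be checked with a small simulation. The sketch below assumes, for illustration only, that εi follows a standard logistic distribution (the case used for the logit in this section). The index value xb = 0.5 and the function names are hypothetical.

```python
import math
import random

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate_share_working(xb, n, seed=0):
    """Draw eps, form y* = xb + eps, and return the share with y* > 0."""
    rng = random.Random(seed)
    working = 0
    for _ in range(n):
        # Inverse-CDF draw from the standard logistic distribution
        u = rng.uniform(1e-12, 1 - 1e-12)
        eps = math.log(u) - math.log1p(-u)
        if xb + eps > 0:          # decision rule: y = 1 if y* > 0
            working += 1
    return working / n

xb = 0.5                          # hypothetical value of xi*beta
share = simulate_share_working(xb, n=100_000)
print(share, logistic_cdf(xb))    # simulated share vs Pr(eps > -xb)
```

With a large number of draws, the simulated share with yi=1 should sit close to Pr(εi > - xi β).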

Logit • Recall yi =1 if εi > - xi β • Suppose εi follows a logistic distribution with CDF F • Pr(εi > - xi β) = 1 - F(- xi β) • The logistic is also a symmetric distribution, so • 1 - F(- xi β) • = F(xi β) • = exp(xi β)/(1+exp(xi β))

When εi follows a logistic distribution • Pr(yi =1) = exp(xi β)/(1+exp(xi β)) • Pr(yi=0) = 1/(1+exp(xi β))

Example: Workplace smoking bans • Smoking supplements to 1991 and 1993 National Health Interview Survey • Asked all respondents whether they currently smoke • Asked workers about workplace tobacco policies • Sample: workers • Key variables: current smoking and whether they faced a workplace ban
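A minimal Python sketch of these two probabilities, with a check of the symmetry property used in the derivation. The index values are arbitrary and the function name is hypothetical.

```python
import math

def logit_prob(xb):
    """Pr(y=1) = exp(xb) / (1 + exp(xb)), written to avoid overflow."""
    if xb >= 0:
        return 1.0 / (1.0 + math.exp(-xb))
    e = math.exp(xb)
    return e / (1.0 + e)

for xb in (-2.0, 0.0, 1.5):       # arbitrary index values
    p1 = logit_prob(xb)           # Pr(y=1)
    p0 = 1.0 - p1                 # Pr(y=0), equal to 1 / (1 + exp(xb))
    # Symmetry of the logistic: 1 - F(-xb) = F(xb)
    symmetric = math.isclose(1.0 - logit_prob(-xb), p1)
    print(xb, round(p1, 4), round(p0, 4), symmetric)
```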

Data: workplace1.dta • Sample program: workplace1.doc • Results: workplace1.log

Description of variables in data
. desc;
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------
smoker          byte   %9.0g                  is current smoking
worka           byte   %9.0g                  has workplace smoking bans
age             byte   %9.0g                  age in years
male            byte   %9.0g                  male
black           byte   %9.0g                  black
hispanic        byte   %9.0g                  hispanic
incomel         float  %9.0g                  log income
hsgrad          byte   %9.0g                  is hs graduate
somecol         byte   %9.0g                  has some college
college         float  %9.0g
------------------------------------------------------------------------

Summary statistics
. sum;
    Variable |     Obs        Mean    Std. Dev.        Min         Max
-------------+--------------------------------------------------------
      smoker |   16258      .25163     .433963           0           1
       worka |   16258    .6851396    .4644745           0           1
         age |   16258    38.54742    11.96189          18          87
        male |   16258    .3947595     .488814           0           1
       black |   16258    .1119449    .3153083           0           1
-------------+--------------------------------------------------------
    hispanic |   16258    .0607086    .2388023           0           1
     incomel |   16258    10.42097    .7624525    6.214608    11.22524
      hsgrad |   16258    .3355271    .4721889           0           1
     somecol |   16258    .2685447    .4432161           0           1
     college |   16258    .3293763    .4700012           0           1

Running a probit • probit smoker age incomel male black hispanic hsgrad somecol college worka; • The first variable after ‘probit’ is the discrete outcome, the rest of the variables are the independent variables • Includes a constant as a default

Running a logit • logit smoker age incomel male black hispanic hsgrad somecol college worka; • Same as probit, just change the first word

Running linear probability • reg smoker age incomel male black hispanic hsgrad somecol college worka, robust; • Simple regression. • Default standard errors are incorrect (heteroskedasticity) • robust option produces standard errors that are valid under an arbitrary form of heteroskedasticity

Probit Results
Probit estimates                                  Number of obs   =      16258
                                                  LR chi2(9)      =     819.44
                                                  Prob > chi2     =     0.0000
Log likelihood = -8761.7208                       Pseudo R2       =     0.0447
------------------------------------------------------------------------------
      smoker |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0012684   .0009316    -1.36   0.173    -.0030943    .0005574
     incomel |   -.092812   .0151496    -6.13   0.000    -.1225047   -.0631193
        male |   .0533213   .0229297     2.33   0.020     .0083799    .0982627
       black |  -.1060518    .034918    -3.04   0.002      -.17449   -.0376137
    hispanic |  -.2281468   .0475128    -4.80   0.000    -.3212701   -.1350235
      hsgrad |  -.1748765   .0436392    -4.01   0.000    -.2604078   -.0893453
     somecol |   -.363869   .0451757    -8.05   0.000    -.4524118   -.2753262
     college |  -.7689528   .0466418   -16.49   0.000     -.860369   -.6775366
       worka |  -.2093287   .0231425    -9.05   0.000    -.2546873   -.1639702
       _cons |    .870543    .154056     5.65   0.000     .5685989    1.172487
------------------------------------------------------------------------------

How to measure fit? • Regression (OLS) • minimize sum of squared errors • Or, maximize R2 • The model is designed to maximize predictive capacity • Not the case with Probit/Logit • MLE models pick distribution parameters so as to best describe the data generating process • May or may not ‘predict’ the outcome well

Pseudo R2 • LLk log likelihood with all variables • LL1 log likelihood with only a constant • 0 > LLk > LL1 so | LLk | < |LL1| • Pseudo R2 = 1 - LLk/LL1 • Bounded between 0 and 1 • Not anything like an R2 from a regression
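As a worked check, the Pseudo R2 in the probit output can be recovered from the two numbers Stata reports, using the ratio form 1 - LLk/LL1. The sketch relies on the standard identity LR chi2 = 2(LLk - LL1) to back out the constant-only log likelihood, which the output does not print. Variable names are illustrative.

```python
ll_k = -8761.7208     # log likelihood with all variables (probit output)
lr_chi2 = 819.44      # LR chi2(9) from the probit output

# LR chi2 = 2 * (ll_k - ll_1), so the constant-only log likelihood is:
ll_1 = ll_k - lr_chi2 / 2.0

pseudo_r2 = 1.0 - ll_k / ll_1
print(round(ll_1, 4), round(pseudo_r2, 4))
```

The result agrees with the reported Pseudo R2 of 0.0447 to rounding.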

Predicting Y • Let b be the estimated value of β • For any candidate vector of xi , we can predict probabilities, Pi • Pi = Ф(xib) • Once you have Pi, pick a threshold value, T, so that you predict • Yp = 1 if Pi > T • Yp = 0 if Pi ≤ T • Then compare, fraction correctly predicted

Question: what value to pick for T? • Can pick .5 • Intuitive. More likely to engage in the activity than to not engage in it • However, when the sample mean of y is small, this criterion does a poor job of predicting Yi=1 • However, when the sample mean of y is close to 1, this criterion does a poor job of predicting Yi=0
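The threshold rule and the fraction correctly predicted can be sketched as follows. The outcomes and predicted probabilities are hypothetical, chosen only to show that the answer depends on T.

```python
def fraction_correct(y, p, threshold=0.5):
    """Predict 1 when p > threshold, then return the share of correct predictions."""
    predictions = [1 if pi > threshold else 0 for pi in p]
    hits = sum(1 for yi, yp in zip(y, predictions) if yi == yp)
    return hits / len(y)

# Hypothetical outcomes and predicted probabilities
y = [1, 0, 0, 1, 0, 0]
p = [0.45, 0.20, 0.30, 0.60, 0.10, 0.55]

print(fraction_correct(y, p, threshold=0.5))
print(fraction_correct(y, p, threshold=0.4))
```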

* predict probability of smoking;
predict pred_prob_smoke;
* get detailed descriptive data about predicted prob;
sum pred_prob, detail;
* predict binary outcome with 50% cutoff;
gen pred_smoke1=pred_prob_smoke>=.5;
label variable pred_smoke1 "predicted smoking, 50% cutoff";
* compare actual values;
tab smoker pred_smoke1, row col cell;

. sum pred_prob, detail;
                         Pr(smoker)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0959301       .0615221
 5%     .1155022       .0622963
10%     .1237434       .0633929       Obs               16258
25%     .1620851       .0733495       Sum of Wgt.       16258
50%     .2569962                      Mean           .2516653
                        Largest       Std. Dev.      .0960007
75%     .3187975       .5619798
90%     .3795704       .5655878       Variance       .0092161
95%     .4039573       .5684112       Skewness       .1520254
99%     .4672697       .6203823       Kurtosis       2.149247

Notice two things • Sample mean of the predicted probabilities is close to the sample mean outcome • 99% of the probabilities are less than .5 • Should predict few smokers if we use a 50% cutoff

           | predicted smoking,
is current |      50% cutoff
   smoking |         0          1 |     Total
-----------+----------------------+----------
         0 |    12,153         14 |    12,167
           |     99.88       0.12 |    100.00
           |     74.93      35.90 |     74.84
           |     74.75       0.09 |     74.84
-----------+----------------------+----------
         1 |     4,066         25 |     4,091
           |     99.39       0.61 |    100.00
           |     25.07      64.10 |     25.16
           |     25.01       0.15 |     25.16
-----------+----------------------+----------
     Total |    16,219         39 |    16,258
           |     99.76       0.24 |    100.00
           |    100.00     100.00 |    100.00
           |     99.76       0.24 |    100.00

Check on-diagonal elements. • The last number in each 2x2 element is the fraction in the cell • The model correctly predicts 74.75 + 0.15 = 74.90% of the obs • It only predicts a small fraction of smokers

Do not be amazed by the 75 percent correct prediction • If you said everyone has a ȳ chance of smoking, where ȳ is the sample fraction of smokers (a case of no covariates), you would be correct a fraction Max[ȳ, 1-ȳ] of the time

In this case, 25.16% smoke. • If everyone had the same chance of smoking, we would assign everyone Pr(y=1) = .2516 • With a 50% cutoff we would predict that nobody smokes • We would be correct for the 1 - .2516 = 74.84% of people who do not smoke

Key points about prediction • MLE models are not designed to maximize prediction • Should not be surprised they do not predict well • In this case, the fraction correctly predicted is not a particularly good measure of predictive capacity
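The comparison with the no-covariate benchmark can be reproduced from the cell counts in the cross-tabulation of actual against predicted smoking. This is a short sketch and the variable names are illustrative.

```python
# Cell counts from the tabulation of smoker (rows) by pred_smoke1 (columns)
n_00, n_01 = 12153, 14    # non-smokers: predicted 0, predicted 1
n_10, n_11 = 4066, 25     # smokers:     predicted 0, predicted 1
n = n_00 + n_01 + n_10 + n_11

# Share on the diagonal: correctly predicted by the probit with a 50% cutoff
model_accuracy = (n_00 + n_11) / n

# Benchmark with no covariates: predict the more common outcome for everyone
share_smokers = (n_10 + n_11) / n
baseline_accuracy = max(share_smokers, 1 - share_smokers)

# Share of actual smokers the model identifies
smokers_identified = n_11 / (n_10 + n_11)

print(round(model_accuracy, 4), round(baseline_accuracy, 4), round(smokers_identified, 4))
```

The model's share of correct predictions is less than a tenth of a percentage point above the benchmark, and it identifies well under 1 percent of actual smokers.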

Translating coefficients in probit: Continuous Covariates • Pr(yi=1) = Φ[β0 + x1iβ1+ x2iβ2+… xkiβk] • Suppose that x1i is a continuous variable • d Pr(yi=1) /d x1i = ? • What is the change in the probability of an event given a change in x1i?

Marginal Effect • d Pr(yi=1) /d x1i • = β1φ[β0 + x1iβ1+ x2iβ2+… xkiβk] • Notice two things: the marginal effect depends on the other parameters, and it depends on the values of x.

Translating Coefficients: Discrete Covariates • Pr(yi=1) = Φ[β0 + x1iβ1+ x2iβ2+… xkiβk] • Suppose that x2i is a dummy variable (1 if yes, 0 if no) • Marginal effect makes no sense, cannot change x2i by a little amount. It is either 1 or 0. • Redefine the variable of interest. Compare outcomes with and without x2i

y1 = Pr(yi=1 | x2i=1) = Φ[β0 + x1iβ1+ β2 + x3iβ3 +… ] • y0 = Pr(yi=1 | x2i=0) = Φ[β0 + x1iβ1+ x3iβ3 +… ] • Marginal effect = y1 - y0 • Difference in probabilities with and without x2i

In STATA • Marginal effects for continuous variables, STATA picks sample means for X’s • Change in probabilities for dichotomous covariates, STATA picks sample means for the other X’s

STATA command for Marginal Effects • mfx compute; • Must be run after the estimation command, while the estimates are still active in the program.

Marginal effects after probit
      y  = Pr(smoker) (predict)
         =  .24093439
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |  -.0003951      .00029   -1.36   0.173  -.000964  .000174   38.5474
 incomel |  -.0289139      .00472   -6.13   0.000   -.03816 -.019668    10.421
   male* |   .0166757       .0072    2.32   0.021   .002568  .030783    .39476
  black* |  -.0320621      .01023   -3.13   0.002  -.052111 -.012013   .111945
hispanic*|  -.0658551      .01259   -5.23   0.000  -.090536 -.041174   .060709
  hsgrad*|   -.053335      .01302   -4.10   0.000   -.07885  -.02782   .335527
 somecol*|  -.1062358      .01228   -8.65   0.000  -.130308 -.082164   .268545
 college*|  -.2149199      .01146  -18.76   0.000  -.237378 -.192462   .329376
   worka*|  -.0668959      .00756   -8.84   0.000   -.08172 -.052072    .68514
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
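The mfx figures can be approximately reproduced by hand from the probit coefficients and the sample means reported earlier. The sketch below uses Python's standard-library normal distribution. It computes the marginal effect of age (continuous) and the discrete change for worka (dummy), holding the other covariates at their means. Variable names are illustrative, and small differences from Stata's output reflect rounding of the printed coefficients.

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard normal: cdf is Phi, pdf is phi

# Probit coefficients and sample means copied from the output in this section
const = 0.870543
coef = {"age": -0.0012684, "incomel": -0.092812, "male": 0.0533213,
        "black": -0.1060518, "hispanic": -0.2281468, "hsgrad": -0.1748765,
        "somecol": -0.363869, "college": -0.7689528, "worka": -0.2093287}
means = {"age": 38.54742, "incomel": 10.42097, "male": 0.3947595,
         "black": 0.1119449, "hispanic": 0.0607086, "hsgrad": 0.3355271,
         "somecol": 0.2685447, "college": 0.3293763, "worka": 0.6851396}

# Index x*b evaluated at the sample means
xb = const + sum(coef[k] * means[k] for k in coef)
p_at_means = std_normal.cdf(xb)

# Continuous covariate: beta * phi(x*b)
me_age = coef["age"] * std_normal.pdf(xb)

# Dummy covariate: Phi(index with worka=1) - Phi(index with worka=0)
xb_no_worka = xb - coef["worka"] * means["worka"]
effect_worka = (std_normal.cdf(xb_no_worka + coef["worka"])
                - std_normal.cdf(xb_no_worka))

print(round(p_at_means, 4), round(me_age, 7), round(effect_worka, 4))
```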

Interpret results • incomel is log income, so a 10% increase in income (about 0.1 log points) will reduce smoking by roughly 0.3 percentage points • 10 year increase in age will decrease smoking rates .4 percentage points • Those with a college degree are 21.5 percentage points less likely to smoke • Those that face a workplace smoking ban have 6.7 percentage point lower probability of smoking

Do not confuse percentage point and percent differences • A 6.7 percentage point drop is about 28% of the predicted smoking rate of 24 percent (evaluated at the sample means of the covariates) • Blacks have smoking rates that are 3.2 percentage points lower than others, which is 13 percent of that 24 percent rate

Comparing Marginal Effects

When will results differ? • Normal and logit CDF look • Similar in the mid point of the distribution • Different in the tails • You obtain more observations in the tails of the distribution when • Sample sizes are large • The sample mean of y approaches 1 or 0 • These situations will produce more differences in estimates

Some nice properties of the Logit • Outcome, y=1 or 0 • Treatment, x=1 or 0 • Other covariates, z • Context, • y = whether a baby is born with a low birth weight • x = whether the mom smoked or not during pregnancy

Risk ratio • RR = Prob(y=1|x=1)/Prob(y=1|x=0) • Ratio of the probabilities of an event when x is and is not observed • How much does smoking elevate the chance your child will be a low weight birth?
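A rough numerical comparison of the two CDFs is sketched below. To put them on a common scale, the logistic is rescaled to unit variance (the standard logistic has variance π²/3). That rescaling is an assumption made here for illustration, not something taken from the slides.

```python
import math
from statistics import NormalDist

normal = NormalDist()

def logistic_cdf_unit_variance(z):
    """Logistic CDF rescaled so the distribution has variance 1."""
    scale = math.pi / math.sqrt(3.0)
    return 1.0 / (1.0 + math.exp(-z * scale))

def tail_ratio(z):
    """Logistic probability relative to normal probability at the same z."""
    return logistic_cdf_unit_variance(z) / normal.cdf(z)

for z in (0.0, -1.0, -2.0, -3.0, -4.0):
    print(z, round(normal.cdf(z), 5),
          round(logistic_cdf_unit_variance(z), 5),
          round(tail_ratio(z), 2))
```

In relative terms the two agree closely near the center, while the logistic puts noticeably more probability far out in the tails.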

Let Yyx be the probability y=1 or 0 given x=1 or 0 • Think of the risk ratio the following way • Y11 is the probability Y=1 when X=1 • Y10 is the probability Y=1 when X=0 • Y11 = RR*Y10

Odds Ratio • OR = A/B = [Y11/Y01]/[Y10/Y00] • A = [Pr(Y=1|X=1)/Pr(Y=0|X=1)] = odds of Y occurring if you are a smoker • B = [Pr(Y=1|X=0)/Pr(Y=0|X=0)] = odds of Y occurring if you are not a smoker • What are the relative odds of Y happening if you do or do not experience X?

Suppose Pr(Yi =1) = F(β0 + β1Xi + β2Z) and F is the logistic function • Can show that • OR = exp(β1) = $e^{\beta_1}$ • This number is typically reported by most statistical packages
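The result follows because, under the logistic, the odds Pr(Y=1)/Pr(Y=0) equal exp(β0 + β1X + β2Z), so the ratio of the odds at X=1 and X=0 leaves only exp(β1). A small numerical check is sketched below, with hypothetical coefficients and covariate values.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

# Hypothetical coefficients and covariate value, for illustration only
b0, b1, b2 = -2.0, 0.7, 0.3
z = 1.0

p_x1 = logistic(b0 + b1 + b2 * z)    # Pr(Y=1 | X=1)
p_x0 = logistic(b0 + b2 * z)         # Pr(Y=1 | X=0)

odds_ratio = odds(p_x1) / odds(p_x0)
risk_ratio = p_x1 / p_x0

print(round(odds_ratio, 4), round(math.exp(b1), 4), round(risk_ratio, 4))
```

The odds ratio equals exp(β1) whatever value Z takes, while the risk ratio changes with Z.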
