‘Interpreting results from statistical modelling – a seminar for social scientists’
Dr Vernon Gayle and Dr Paul Lambert (Stirling University)
Tuesday 29th April 2008
• Our experience has shown that the results of statistical models can easily be misrepresented
• In this seminar we demonstrate that correctly interpreting the results of statistical models often requires more detailed knowledge than is commonly appreciated
• We illustrate some approaches to best practice in this area
• The seminar is aimed primarily at quantitative social researchers working with micro-social survey data
Principles of model construction and interpretation
$y_i = b_0 + b_1 X_{1i} + \dots + b_k X_{ki} + u_i$
Today we are interested in b:
• “What does b tell us?”
• “Where’s the action?”
• Going beyond “significance and sign”
Statistical Models
The idea of generalized linear models (glm) brings together a wealth of disparate topics; thinking of these models under one general umbrella aids interpretation. Now I would say that generalized linear mixed models (glmm) are the natural extension.
Statistical Modelling Process (Davies and Dale, 1994, p.5)
• Model formulation [make assumptions]
• Model fitting [quantify systematic relationships and random variation]
• (Model criticism) [review assumptions]
• Model interpretation [assess results]
Building Models
REMEMBER: real data are much messier and more badly behaved (in real life people do odd stuff), and the resulting models are harder to interpret, than the data used in books and at workshops.
• Many of you are experienced data analysts (otherwise see our handout)
• Always be guided by substantive theory (the economists are good at this, but a bit rigid)
• Consider the “functional form” of the variables (especially the outcome)
• Start with “main effects”; fit more complicated models later
Some Common Models
• Continuous Y: linear regression
• Binary Y: logit; probit
• Categorical Y: multinomial logit
• Ordered categorical Y: continuation ratio; cumulative logit
• Count Y: Poisson
• Repeated binary Y: panel probit (logit)
I must not use Stepwise Regression
A very very simple example
• A fictitious data set based on a short, steep Scottish hill race (record time 31 minutes; 5 miles and 1,200 feet of ascent)
• A group of 73 male runners
• Times from 32 to 60 minutes (mean 42.7; s.d. 8.32)
• Heights from 60 to 70 inches (5 ft to 5 ft 10 in)
• Weights from 140 lb to 161 lb (10 st to 11 st 7 lb)
• Everyone finishes (i.e. there are no censored cases)
A (vanilla) Regression
Simple Stata output:

regress time height weight

------------------------------------------------------------------------------
        time |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |   1.010251   .0813485    12.42   0.000     .8480067    1.172495
      weight |   .7369447   .0370876    19.87   0.000     .6629759    .8109135
       _cons |  -131.5619   6.834839   -19.25   0.000    -145.1936   -117.9303
------------------------------------------------------------------------------
Reading the weight coefficient: on average (ceteris paribus), a one-pound increase in weight is associated with an increase of 0.74 minutes in the runner’s time.
Reading the height coefficient: on average (ceteris paribus), a one-inch increase in height is associated with an increase of about 1 minute in the runner’s time (remember this is a fell race; being tall does not necessarily help you).
Reading the intercept (_cons): this is b0. In this model it is the average predicted time for a runner who is 0 inches tall and weighs 0 pounds, which is nonsensical (here it is -131.6 minutes).
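As a minimal sketch (in Python, which the original slides do not use), the fitted equation can be turned into predicted times. The coefficients are copied from the Stata output above; the function name `predicted_time` and the example runners are hypothetical.

```python
# Coefficients copied from the Stata output (fictitious hill-race data)
B0 = -131.5619        # intercept (_cons)
B_HEIGHT = 1.010251   # minutes per inch
B_WEIGHT = 0.7369447  # minutes per pound


def predicted_time(height_in: float, weight_lb: float) -> float:
    """Predicted race time in minutes from the fitted linear model."""
    return B0 + B_HEIGHT * height_in + B_WEIGHT * weight_lb


# Two hypothetical runners of the same height whose weights differ by 10 lb
lighter = predicted_time(65, 145)
heavier = predicted_time(65, 155)
difference = heavier - lighter  # equals 10 * B_WEIGHT, whatever the common height

# The intercept is simply the "prediction" at 0 inches and 0 pounds
at_zero = predicted_time(0, 0)

print(f"lighter: {lighter:.2f} min, heavier: {heavier:.2f} min")
print(f"difference: {difference:.2f} min")
print(f"prediction at 0 in / 0 lb: {at_zero:.2f} min")
```

The 10 lb gap translates into about 7.37 minutes (10 × 0.74) at any height, and the prediction at zero height and weight just reproduces the meaningless intercept.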
A (vanilla) Regression
A better parameterized model: height centred at 60 inches (height0) and weight centred at 140 lb (weight0)

------------------------------------------------------
        time |      Coef.   Std. Err.      t    P>|t|
-------------+----------------------------------------
     height0 |   1.010251   .0813485    12.42   0.000
     weight0 |   .7369447   .0370876    19.87   0.000
       _cons |   32.22542   .5126303    62.86   0.000
------------------------------------------------------

The intercept b0 is now the average predicted time for a runner who is 60 inches tall and weighs 140 pounds. The slope coefficients are unchanged.
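A quick arithmetic check, sketched in Python: centring only shifts the intercept, so the centred intercept should equal the original intercept plus each slope times its centring value. The coefficients are copied from the two Stata outputs; the constant names are illustrative.

```python
# Coefficients from the uncentred model (regress time height weight)
B0 = -131.5619
B_HEIGHT = 1.010251
B_WEIGHT = 0.7369447

# Centring values used to create height0 and weight0
HEIGHT_CENTRE = 60   # inches
WEIGHT_CENTRE = 140  # pounds

# time = b0 + b1*height + b2*weight
#      = (b0 + b1*60 + b2*140) + b1*(height - 60) + b2*(weight - 140)
b0_centred = B0 + B_HEIGHT * HEIGHT_CENTRE + B_WEIGHT * WEIGHT_CENTRE

print(f"implied centred intercept: {b0_centred:.5f}")  # Stata reports 32.22542
```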
regress time height0 weight0, beta

-----------------------------------------------------------------
        time |      Coef.   Std. Err.      t    P>|t|        Beta
-------------+---------------------------------------------------
     height0 |   1.010251   .0813485    12.42   0.000    .4659946
     weight0 |   .7369447   .0370876    19.87   0.000    .7456028
       _cons |   32.22542   .5126303    62.86   0.000
-----------------------------------------------------------------

Standardized beta coefficients are reported instead of confidence intervals. The beta coefficients are the regression coefficients obtained by first standardizing all variables to have a mean of 0 and a standard deviation of 1. Beta coefficients can be useful when comparing the effects of variables measured on different scales (i.e. in different units, such as inches and pounds).
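The relationship between b and Beta can be sketched in a few lines of Python: Beta = b × s.d.(x) / s.d.(y), and refitting on standardized variables gives the same number. The sketch uses a one-predictor regression and small hypothetical toy data (not the hill-race data); the same rescaling applies to each coefficient in a multiple regression.

```python
from statistics import mean, stdev


def ols_slope(x: list, y: list) -> float:
    """Least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def standardize(values: list) -> list:
    """Rescale to mean 0 and standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical toy data, NOT the hill-race data set
height = [61, 63, 64, 66, 68, 70]
time = [35.0, 38.5, 37.0, 43.0, 44.5, 49.0]

b = ols_slope(height, time)                                     # unstandardized
beta_refit = ols_slope(standardize(height), standardize(time))  # refit on z-scores
beta_formula = b * stdev(height) / stdev(time)                  # rescale b

print(f"b = {b:.4f}, Beta (refit) = {beta_refit:.4f}, Beta (formula) = {beta_formula:.4f}")
```

With a single predictor, Beta is simply the correlation between x and y.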
X Variable Measurement – e.g. Age
• Linear units (e.g. months)
• Resolution of measurement (are years better?)
• Is a squared term appropriate? (e.g. age may not be linear in employment models)
• A banded variable (age bands allow the direction of the effect to change; women’s employment behaviour at ages 20-29 might differ from that at 30-39)
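A minimal Python sketch of the alternative codings listed above: a linear term, a squared term and a set of band dummies. The band boundaries are hypothetical and should follow substantive theory; in a model one band is left out as the reference category, and ages outside every band get zeros on all band dummies.

```python
# Hypothetical age bands (inclusive); choose them on substantive grounds
AGE_BANDS = [(16, 19), (20, 29), (30, 39), (40, 49), (50, 64)]


def age_terms(age: int) -> dict:
    """Return linear, squared and banded codings of age for one person."""
    terms = {"age": age, "age_sq": age ** 2}
    for low, high in AGE_BANDS:
        # 1 if the person falls in this band, 0 otherwise
        terms[f"age_{low}_{high}"] = int(low <= age <= high)
    return terms


print(age_terms(34))
```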
Binary Outcomes
• The logit model is popular in sociology, social geography, social policy, education, etc.
• The probit model is more widely used in economics
Example: Drew, D., Gray, J. and Sime, N. (1992) Against the odds: The Education and Labour Market Experiences of Black Young People
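The deviance, sometimes called G², is -2 × log likelihood. Differences in deviance between nested models have (approximately) a chi-squared distribution, with degrees of freedom equal to the difference in the number of parameters.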
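As a sketch of how the deviance is used in practice, the Python below computes G² from a log likelihood and the likelihood-ratio test for two nested models. The log likelihoods are hypothetical, and the p-value uses closed-form chi-squared tail probabilities that hold only for 1 or 2 degrees of freedom; statistical software handles the general case.

```python
import math


def deviance(log_likelihood: float) -> float:
    """G2 = -2 * log likelihood."""
    return -2.0 * log_likelihood


def lr_test(ll_restricted: float, ll_full: float, df: int) -> tuple:
    """Change in deviance between nested models and its chi-squared p-value."""
    g2_change = deviance(ll_restricted) - deviance(ll_full)
    if df == 1:
        # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
        p_value = math.erfc(math.sqrt(g2_change / 2))
    elif df == 2:
        # chi2_2 is exponential with mean 2, so P(chi2_2 > x) = exp(-x / 2)
        p_value = math.exp(-g2_change / 2)
    else:
        raise NotImplementedError("closed form only for 1 or 2 d.f. in this sketch")
    return g2_change, p_value


# Hypothetical log likelihoods: the full model adds one parameter
g2, p = lr_test(ll_restricted=-1250.0, ll_full=-1245.0, df=1)
print(f"change in deviance = {g2:.2f} on 1 d.f., p = {p:.4f}")
```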
The estimate, also known as the ‘coefficient’, ‘log odds’, ‘parameter estimate’ or beta (b). It is measured on the log-odds scale.
This is the standard error of the estimate, also measured on the log-odds scale.
This is the odds ratio. It is the exponential (i.e. the anti-log) of the estimate.
Comparison of odds: an odds ratio greater than 1 means “higher odds”; less than 1 means “lower odds” (relative to the reference category).
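A small Python sketch of the two points above: the odds ratio is exp(b), and its position relative to 1 gives the direction of the comparison. The estimates in the loop are hypothetical values on the log-odds scale.

```python
import math


def odds_ratio(b: float) -> float:
    """Exponentiate a log-odds estimate to get an odds ratio."""
    return math.exp(b)


def describe(b: float) -> str:
    """Direction of the comparison with the reference category."""
    ratio = odds_ratio(b)
    if ratio > 1:
        return "higher odds"
    if ratio < 1:
        return "lower odds"
    return "no difference"


# Hypothetical estimates (log-odds scale)
for b in (1.16, 0.0, -0.69):
    print(f"b = {b:+.2f}  odds ratio = {odds_ratio(b):.2f}  ({describe(b)})")
```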
Naïve Odds
• In this model (after controlling for other factors), White pupils (the reference category) have odds of 1.0 and Afro-Caribbean pupils have odds of 3.2
• Reporting this in isolation is a naïve presentation of the effect because it ignores the other factors in the model
A Comparison
• First pupil: 4+ higher passes; White; professional parents; male; graduate parents; two-parent family
• Second pupil: 0 higher passes; Afro-Caribbean; manual parents; male; non-graduate parents; one-parent family
Odds are multiplicative

                      First pupil   Second pupil
4+ Higher Grades              1.0            1.0
Ethnic Origin                 1.0            3.2
Social Class                  1.0            0.5
Gender                        1.0            1.0
Parental Education            1.0            0.6
No. of Parents                1.0            0.9
Odds (product)                1.0           0.86
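The multiplication in the table can be checked with a couple of lines of Python. The odds for the second pupil are copied from the table; the sketch also shows the equivalent route of adding the log odds and exponentiating once.

```python
import math

# Odds for the second pupil, copied from the table (first pupil = 1.0 throughout)
second_pupil = {
    "4+ higher grades": 1.0,
    "ethnic origin": 3.2,
    "social class": 0.5,
    "gender": 1.0,
    "parental education": 0.6,
    "no. of parents": 0.9,
}

# Odds multiply ...
combined = math.prod(second_pupil.values())
# ... which is the same as adding log odds and exponentiating
combined_via_logs = math.exp(sum(math.log(v) for v in second_pupil.values()))

print(f"combined odds: {combined:.3f}")  # 3.2 * 0.5 * 0.6 * 0.9
```

The product is 0.864, which the table rounds to 0.86: the isolated odds of 3.2 for ethnic origin is more than offset by the pupil's other characteristics.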
Naïve Odds
• Drew, D., Gray, J. and Sime, N. (1992) warn of this danger…
• …yet naïvely presenting isolated odds ratios is still widespread (e.g. Connolly, 2006, British Educational Research Journal, 32(1), pp. 3-21)
• We should avoid reporting isolated odds ratios where possible!
Generally, people find it hard to interpret results directly on the logit scale (i.e. the b coefficients).
Log Odds, Odds, Probability
• Log odds converted to odds: odds = exp(log odds)
• Probability = odds / (1 + odds)
• Odds = probability / (1 - probability)
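Odds are asymmetric – beware! Odds below 1 are squeezed between 0 and 1, while odds above 1 run from 1 to infinity: odds of 0.5 and odds of 2 are equally strong effects in opposite directions, but only on the log-odds scale do they sit symmetrically around zero (-0.69 and +0.69).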
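The three conversions can be written as small Python functions (a sketch; the function names are illustrative). The loop at the end illustrates the asymmetry: odds of 2 and 0.5 are mirror images only on the log-odds scale.

```python
import math


def odds_from_log_odds(log_odds: float) -> float:
    return math.exp(log_odds)


def prob_from_odds(odds: float) -> float:
    return odds / (1 + odds)


def odds_from_prob(p: float) -> float:
    return p / (1 - p)


# Round trip: probability -> odds -> probability
p = 0.3
assert abs(prob_from_odds(odds_from_prob(p)) - p) < 1e-12

# Asymmetry: 0.5 and 2 are "equal and opposite" only after taking logs
for odds in (0.5, 1.0, 2.0):
    print(f"odds {odds:>4}: log odds {math.log(odds):+.3f}, "
          f"probability {prob_from_odds(odds):.3f}")
```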
A Simple Stata Example
• Youth Cohort Study (1990), n = c. 14,000 16-17 year olds
• y = 1: pupil has 5+ GCSE passes (grades A*-C)
• X variables: gender; parents in the service class (NS-SEC)
Stata output (logit)

Logistic regression                               Number of obs   =      14022
                                                  LR chi2(2)      =     807.67
                                                  Prob > chi2     =     0.0000
Log likelihood =  -9260.22                        Pseudo R2       =     0.0418

-------------------------------------------------------------------------------
     t0fiveac |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         boys |  -.1495507   .0349946    -4.27   0.000     -.218139   -.0809625
service class |   1.398813   .0526951    26.55   0.000     1.295532    1.502093
        _cons |   -.309116   .0247608   -12.48   0.000    -.3576462   -.2605857
-------------------------------------------------------------------------------
Reading the Coef. column: the estimates are log odds. Sign = direction; size = strength.
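To see what the log-odds estimates mean as probabilities, the Python sketch below (coefficients copied from the output above; function names illustrative) converts the linear predictor back with the inverse logit for the four combinations of gender and service-class background.

```python
import math

# Logit estimates copied from the Stata output (YCS 1990)
B_CONS = -0.309116
B_BOYS = -0.1495507
B_SERVICE = 1.398813


def inv_logit(x: float) -> float:
    """Probability from log odds: exp(x) / (1 + exp(x))."""
    return 1 / (1 + math.exp(-x))


def p_five_gcse(boy: int, service: int) -> float:
    """Predicted probability of 5+ GCSE passes (A*-C)."""
    return inv_logit(B_CONS + B_BOYS * boy + B_SERVICE * service)


for service in (0, 1):
    girl, boy = p_five_gcse(0, service), p_five_gcse(1, service)
    print(f"service class = {service}: girl {girl:.3f}, boy {boy:.3f}, gap {girl - boy:.3f}")
```

The same gender coefficient (-.15) produces a gap of about 0.036 outside the service class and about 0.029 within it: on the probability scale, the size of an effect depends on where the other variables place the pupil.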
The standard errors (Std. Err.) are also measured on the logit scale. A smaller standard error indicates a more precise estimate of the coefficient (b).
The z column: $z = b / \mathrm{s.e.}(b)$, and the Wald statistic is $\chi^2 = (b / \mathrm{s.e.}(b))^2$ on 1 d.f. A very crude test of significance is whether b is at least twice its standard error.
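The z and Wald statistics can be recomputed from the Coef. and Std. Err. columns; the Python sketch below does so for each row of the logit output, together with the crude "twice its standard error" check.

```python
# (b, s.e.) pairs copied from the Stata logit output
estimates = {
    "boys": (-0.1495507, 0.0349946),
    "service class": (1.398813, 0.0526951),
    "_cons": (-0.309116, 0.0247608),
}

results = {}
for name, (b, se) in estimates.items():
    z = b / se               # reported in the z column
    wald = z ** 2            # Wald chi-squared on 1 d.f.
    crude = abs(b) > 2 * se  # crude check: is b more than twice its s.e.?
    results[name] = (z, wald, crude)
    print(f"{name:>13}: z = {z:7.2f}, Wald chi2 = {wald:8.2f}, |b| > 2 s.e.: {crude}")
```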
The P>|z| column gives the formal significance test (p-values).
The confidence interval of b (on the logit scale) is b ± 1.96 × s.e.(b); e.g. for boys the lower limit is -.15 - (1.96 × .035) ≈ -.22. Remember: if the 95% confidence interval does not include zero, b is significant at the 5% level.
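The confidence interval, and its counterpart for the odds ratio, can be sketched in Python using the boys coefficient from the logit output. Stata uses the exact 97.5th percentile of the standard normal (about 1.959964) rather than the rounded 1.96, so the sketch takes it from `statistics.NormalDist`; the function name `wald_ci` is illustrative.

```python
import math
from statistics import NormalDist

# 97.5th percentile of N(0, 1): about 1.959964 (1.96 when rounded)
Z_975 = NormalDist().inv_cdf(0.975)


def wald_ci(b: float, se: float, z: float = Z_975) -> tuple:
    """95% confidence interval for b on the logit scale."""
    return b - z * se, b + z * se


b_boys, se_boys = -0.1495507, 0.0349946
low, high = wald_ci(b_boys, se_boys)

# Exponentiating the limits gives the interval for the odds ratio
or_low, or_high = math.exp(low), math.exp(high)
excludes_zero = not (low <= 0 <= high)

print(f"logit scale: [{low:.4f}, {high:.4f}]  excludes 0: {excludes_zero}")
print(f"odds ratio:  [{or_low:.3f}, {or_high:.3f}]")
```

The odds-ratio interval (roughly 0.80 to 0.92) excludes 1, which is the odds-ratio equivalent of the logit-scale interval excluding 0.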
A Thought on Goodness of Fit
• Standard linear models: R² is an easy, consistent measure of goodness of fit
• Nested models: the change in deviance (G²) follows a chi-squared distribution (with the associated d.f.)
• Non-nested, non-linear models (e.g. logit models from two different surveys): changes in deviance cannot be compared AND there is no direct equivalent of R²
• Various ‘pseudo’ R² measures exist; none takes the full 0-1 range, and technically they should not be used to compare non-nested models (but they may be adequate in many practical situations)
• The user-written Stata package spost9_ado produces a number of ‘pseudo’ R² measures
• Discussion at http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm
• Some analysts use Bayesian Information Criterion (BIC) type measures; these evaluate (and favour) parsimony, and are possibly a good idea for comparing across models
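As a sketch (Python, with values copied from the logit output shown earlier), McFadden's pseudo R², which is the measure Stata's logit reports as Pseudo R2, is 1 - LL(model)/LL(null). The null log likelihood is not printed, so it is back-calculated here from the LR chi2 statistic; the BIC line assumes the common definition -2 LL + k ln(n) with k = 3 estimated parameters.

```python
import math

# Values copied from the Stata logit output (YCS 1990)
LL_MODEL = -9260.22  # Log likelihood
LR_CHI2 = 807.67     # LR chi2(2) = 2 * (LL_model - LL_null)
N_OBS = 14022
K_PARAMS = 3         # boys, service class, _cons

# Back-calculate the intercept-only log likelihood from the LR statistic
ll_null = LL_MODEL - LR_CHI2 / 2

# McFadden's pseudo R2
mcfadden_r2 = 1 - LL_MODEL / ll_null

# BIC: smaller is better; the k * ln(n) term penalises extra parameters
bic = -2 * LL_MODEL + K_PARAMS * math.log(N_OBS)

print(f"null log likelihood: {ll_null:.3f}")
print(f"McFadden pseudo R2:  {mcfadden_r2:.4f}")  # Stata reports 0.0418
print(f"BIC:                 {bic:.2f}")
```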
Probit / Logit
• Converting between estimates (Amemiya, 1981): logit b ≈ probit b × 1.6; probit b ≈ logit b / 1.6
• Logit or probit? Some say logit for a purely discrete Y (e.g. pregnancy), whereas probit appeals to an underlying continuous distribution
• Some people make silly claims (e.g. the case of unemployment in Germany)
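A minimal sketch of the Amemiya (1981) rule of thumb in Python, applied to the logit estimates from the YCS output shown earlier; the dictionary and function names are illustrative, and the result is only an approximation.

```python
SCALE = 1.6  # Amemiya (1981) approximate ratio of logit to probit coefficients


def logit_to_probit(b_logit: float) -> float:
    return b_logit / SCALE


def probit_to_logit(b_probit: float) -> float:
    return b_probit * SCALE


# Logit estimates from the Stata output (YCS 1990)
logit_b = {"boys": -0.1495507, "service class": 1.398813, "_cons": -0.309116}
approx_probit = {name: logit_to_probit(b) for name, b in logit_b.items()}

for name, b in approx_probit.items():
    print(f"{name:>13}: logit {logit_b[name]:+.3f} -> approx. probit {b:+.3f}")
```

The approximate probit values for the constant and for boys are close to the figures (-.19 and -.09) used in the probit example.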
Probit
• b is expressed on the standard cumulative normal scale (Φ)
• Unlike the logit, a calculator might not have the appropriate function
• Use software or Excel [=NORMSDIST()]
Probability of 5+ GCSE (A*-C) passes
• girl, non-service-class family: Φ(-.19) = .42
• boy, non-service-class family: Φ(-.19 - .09) = .39
• gender effect: .03
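In Python the standard normal CDF (the equivalent of Excel's NORMSDIST) is available as `statistics.NormalDist().cdf`, so the calculation above can be sketched as follows. The probit values -.19 and -.09 are copied from the slide.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard cumulative normal, like Excel's NORMSDIST

B_CONS, B_BOYS = -0.19, -0.09  # probit estimates as given on the slide

p_girl = phi(B_CONS)          # girl, non-service-class family
p_boy = phi(B_CONS + B_BOYS)  # boy, non-service-class family
gender_effect = p_girl - p_boy

print(f"girl: {p_girl:.3f}, boy: {p_boy:.3f}, gender effect: {gender_effect:.3f}")
```

Unrounded, the two probabilities are about .425 and .390, so the gender effect is about .035; the .03 on the slide comes from subtracting the rounded values.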
• Stata has dprobit, which reports dF/dx; for a dummy variable this is the effect of a discrete change from 0 to 1
• Continuous X variables are interpreted at their mean
• Analysts often demonstrate specific values or combinations of X
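The two kinds of dF/dx described above can be sketched in Python. For a continuous X the marginal effect is the standard normal density at the linear predictor (evaluated at the means) times b; for a dummy it is the difference between two cumulative probabilities. All numbers below are hypothetical, not taken from the YCS model, and the function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def dfdx_continuous(xb_at_means: float, b: float) -> float:
    """Slope of the probability with respect to a continuous X, at the means."""
    return STD_NORMAL.pdf(xb_at_means) * b


def dfdx_dummy(xb_dummy_zero: float, b: float) -> float:
    """Change in probability as a dummy moves from 0 to 1.

    xb_dummy_zero is the linear predictor with the dummy at 0 and the
    other X variables at their means.
    """
    return STD_NORMAL.cdf(xb_dummy_zero + b) - STD_NORMAL.cdf(xb_dummy_zero)


# Hypothetical linear predictor at the means and hypothetical coefficients
xb_bar = -0.2
print(f"continuous X, b = 0.05: dF/dx = {dfdx_continuous(xb_bar, 0.05):.4f}")
print(f"dummy X, b = 0.40:      dF/dx = {dfdx_dummy(xb_bar, 0.40):.4f}")
```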
Categorical Data (Multinomial Logit)
Categorical Y example (YCS 1990): what the pupil was doing in October after Year 11
0 Education
1 Unemployment
2 Training
3 Employment
Multinomial Logit
The multinomial logit model can be thought of as a set of paired logits:
• 1 Education / 0 Unemployment
• 1 Education / 0 Training
• 1 Education / 0 Employment
Education (coded y = 1 in each pair) is the common base category for these paired models. The betas are readily interpreted as in a logit.
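A Python sketch of how the paired logits fit together: with education as the base, each category's linear predictor is its log odds against education, and the category probabilities follow from a softmax. The linear predictors are hypothetical. Note the sign convention: the code uses log(P(category) / P(education)), so coding education as 1 in a pair, as above, simply flips the sign of that pair's betas.

```python
import math


def category_probs(etas: dict) -> dict:
    """Probabilities from multinomial-logit linear predictors (softmax).

    Each eta is log(P(category) / P(base)); the base category has eta = 0.
    """
    denom = sum(math.exp(eta) for eta in etas.values())
    return {cat: math.exp(eta) / denom for cat, eta in etas.items()}


# Hypothetical linear predictors for one pupil, with education as the base
etas = {"education": 0.0, "unemployment": -2.0, "training": -1.0, "employment": -0.5}
probs = category_probs(etas)

for cat, p in probs.items():
    # Pairwise log odds against education recovers each eta
    pair_log_odds = math.log(p / probs["education"])
    print(f"{cat:>12}: P = {p:.3f}, log(P / P(education)) = {pair_log_odds:+.2f}")
```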