Seminar on Interpreting Statistical Modelling Results for Social Researchers

‘Interpreting results from statistical modelling – A seminar for Scottish Government Social Researchers’
Professor Vernon Gayle and Dr Paul Lambert (Stirling University)
Wednesday 1st April 2009

Structure of the Seminar (it should really take a whole semester!) • Principles of model construction and interpretation • Key variables – measurement and functional form • Presenting results • Longitudinal data analysis • Individuals in households – multilevel models

‘Interpreting results from statistical modelling – A seminar for Scottish Government Social Researchers’ • Our experience has shown that the results of statistical models can easily be misrepresented • In this seminar we demonstrate that the correct interpretation of results from statistical models often requires more detailed knowledge than is commonly appreciated • We illustrate some approaches to best practice in this area • This seminar is primarily aimed at quantitative social researchers working with micro-social survey data

Principles of model construction and interpretation: $y_i = b_0 + b_1 X_{1i} + \dots + b_k X_{ki} + u_i$. Today we are interested in b • “What does b tell us?” • “Where’s the action?” • Going beyond “significance and sign”

Statistical Models • The idea of generalized linear models (glm) brings together a wealth of disparate topics; thinking of these models under a general umbrella term aids interpretation • Now we would say that generalized linear mixed models (glmm) are the natural extension

Statistical Modelling Process (Davies and Dale, 1994 p.5) • Model formulation [make assumptions] • Model fitting [quantify systematic relationships & random variation] • (Model criticism) [review assumptions] • Model interpretation [assess results]

Building Models REMEMBER – real data are much messier and more badly behaved than the data used in books and at workshops (in real life people do odd stuff), and the models fitted to them are harder to interpret

Building models and the ‘Workflow of Data Analysis’ • Model building is part of a complex process of data preparation (‘data management’) and iterative model building and adjustment (the overall sequence is the ‘workflow’) - Long (2009) • In our view, Stata has unparalleled functionality for conducting (and documenting) the joint processes of data preparation and model building • A wide range of statistical capabilities • Audit trail • Effective and efficient data preparation options • Routines for saving and exporting estimation results (more later) Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press. Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass

Building Models • Many of you are experienced data analysts (otherwise see our handout) • Always be guided by substantive theory (the economists are good at this – but a bit rigid) • Consider the “functional form” of the variables (especially the outcome) • Start with “main effects” – more complicated models later

X and Y • The focus today is on the X variables, but their interpretation is contingent on the type of Y variable being modelled

Some Common (GLMM) Models (examples to follow) • Continuous Y: linear regression • Binary Y: logit; probit • Categorical Y: multinomial logit • Ordered categorical Y: continuation ratio; cumulative logit • Count Y: Poisson

A very, very simple revision example: a fictitious data set from a statistics test in a university social science department (n = 60) • y = end-of-term test score: mean 59.65; c.i. 53.03, 66.27; max 100; min 10 • X variables: hours spent in class (register): mean 16.13; c.i. 14.10, 18.17; max 24; min 2 • Minutes spent on tutorials (webct): mean 64.90; c.i. 48.39, 81.41; max 360; min 0

A (vanilla) Regression: Simple Stata Output…
regress score classhours webctmins
------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  classhours |   2.483933   .2101313    11.82   0.000     2.063152    2.904714
   webctmins |   .0934255   .0258709     3.61   0.001       .04162     .145231
       _cons |   13.51257    3.03875     4.45   0.000     7.427573    19.59756
------------------------------------------------------------------------------
Adj R-squared = 0.84

A (vanilla) Regression: interpreting classhours (b = 2.483933) • On average (ceteris paribus, i.e. holding tutorial minutes constant), each additional hour spent in class is associated with a test score 2.48 points higher

A (vanilla) Regression: interpreting webctmins (b = .0934255) • On average (ceteris paribus, i.e. holding class hours constant), each additional minute spent on tutorials is associated with a test score 0.09 points higher • Because the unit is one minute, this is equivalent to about 5.6 points per extra hour of tutorials (60 × 0.0934)

A (vanilla) Regression: interpreting _cons (13.51257) • This is the intercept b0 • In this model it is the predicted (average) score of a student with zero class attendance and zero tutorial work • The minimum observed class attendance is 2 hours, so this value is an extrapolation beyond the data
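The interpretations of b1, b2 and b0 can be checked by computing predicted scores from the fitted equation. A minimal Python sketch using the coefficients reported in the output; the function name `predicted_score` is ours, for illustration only. The last line plugs in the rounded sample means from the data description, so it only approximately reproduces the mean score (an OLS fit with an intercept passes through the point of means).

```python
# Coefficients copied from the regress output (score on classhours, webctmins)
B0 = 13.51257            # _cons
B_CLASSHOURS = 2.483933
B_WEBCTMINS = 0.0934255

def predicted_score(classhours, webctmins):
    """Fitted test score for given class hours and tutorial minutes."""
    return B0 + B_CLASSHOURS * classhours + B_WEBCTMINS * webctmins

# One extra class hour, tutorial minutes held fixed: the difference equals b1
print(round(predicted_score(11, 60) - predicted_score(10, 60), 4))  # 2.4839
# 60 extra tutorial minutes, class hours held fixed: 60 * b2
print(round(predicted_score(10, 120) - predicted_score(10, 60), 2))
# Intercept: both X variables at zero (an extrapolation for classhours)
print(predicted_score(0, 0))
# At the (rounded) sample means the prediction is close to the mean score
print(round(predicted_score(16.13, 64.90), 2))
```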

A (vanilla) Regression: standardized (beta) coefficients
regress score classhours webctmins, beta
------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
  classhours |   2.483933   .2101313    11.82   0.000                 .7623306
   webctmins |   .0934255   .0258709     3.61   0.001                 .2328889
       _cons |   13.51257    3.03875     4.45   0.000                        .
------------------------------------------------------------------------------
• With the beta option, standardized beta coefficients are reported instead of confidence intervals • The beta coefficients are the regression coefficients obtained by first standardizing all variables to have a mean of 0 and a standard deviation of 1 • Beta coefficients can be useful when comparing the effects of variables measured on different scales (e.g. in different units such as inches and pounds) • Here classhours has the larger standardized effect (see also the t values)
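The equivalence described above can be sketched in Python: refitting on z-scores gives the same coefficient as rescaling the raw b by sd(x) / sd(y). To keep it short and dependency-free, this sketch assumes a single predictor and uses hypothetical toy numbers, not the seminar data; with several predictors each beta is likewise b_j × sd(x_j) / sd(y).

```python
from statistics import mean, stdev

def ols_slope(x, y):
    """Least-squares slope from a one-predictor regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def zscores(values):
    """Standardize to mean 0 and standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def standardized_beta(b, sd_x, sd_y):
    """Rescale an unstandardized coefficient: beta = b * sd(x) / sd(y)."""
    return b * sd_x / sd_y

# Hypothetical toy data (NOT the seminar data set)
hours = [2, 8, 12, 16, 20, 24]
score = [20, 35, 55, 60, 80, 90]

b = ols_slope(hours, score)
beta_refit = ols_slope(zscores(hours), zscores(score))       # refit on z-scores
beta_rescaled = standardized_beta(b, stdev(hours), stdev(score))
print(round(b, 4), round(beta_refit, 6), round(beta_rescaled, 6))  # the two betas agree
```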

X Variable Measurement – e.g. Age • Linear units (e.g. months) • Resolution of measurement (are years better?) • Is a squared term appropriate? (e.g. the effect of age may not be linear in employment models) • A banded variable (age bands allow the direction of the effect to change; women’s employment behaviour between 20–29 might be different from that at 30–39)
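With a squared term the effect of age is no longer a single number. A short sketch, assuming hypothetical coefficients b1 (age) and b2 (age squared) on whatever scale the model uses (e.g. the logit scale in an employment model): the slope at a given age is b1 + 2·b2·age, and the fitted curve turns at age = −b1 / (2·b2).

```python
# Hypothetical coefficients for age and age squared (illustration only)
b_age = 0.30
b_age_sq = -0.004

def age_slope(age):
    """Marginal effect of age at a given age: b1 + 2 * b2 * age."""
    return b_age + 2 * b_age_sq * age

# Age at which the curve peaks (b2 < 0) or bottoms out (b2 > 0)
turning_point = -b_age / (2 * b_age_sq)
print(round(turning_point, 2))
# Positive effect before the turning point, negative after it
print(round(age_slope(25), 3), round(age_slope(50), 3))
```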

Binary Outcomes • Logit model is popular in sociology, social geography, social policy, education etc • Probit model is more widely used in economics

A Simple Stata Example: Youth Cohort Study (1990), n = c.14,000 16–17 year olds • y = 0: pupil has fewer than 5 GCSEs (grades A*–C) • y = 1: pupil has 5+ GCSEs (grades A*–C) • X vars: gender; parents in service class (NS-SEC)

Stata output: logit
Logistic regression                               Number of obs   =      14022
                                                  LR chi2(2)      =     807.67
                                                  Prob > chi2     =     0.0000
Log likelihood = -9260.22                         Pseudo R2       =     0.0418
------------------------------------------------------------------------------
    t0fiveac |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        boys |  -.1495507   .0349946    -4.27   0.000     -.218139   -.0809625
service class|   1.398813   .0526951    26.55   0.000     1.295532    1.502093
       _cons |   -.309116   .0247608   -12.48   0.000    -.3576462   -.2605857
------------------------------------------------------------------------------

Stata output (logit): coefficients • The estimates are on the logit (log odds) scale • Sign = direction; size = strength • e.g. boys have lower log odds than girls of gaining 5+ GCSEs (b = -.15), holding service-class background constant
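Because log odds are hard to communicate, one option is to convert them into predicted probabilities with the inverse logit, 1 / (1 + exp(−xb)). A minimal Python sketch using the coefficients reported above (the variable names are ours). Note that the same boys coefficient implies a slightly different probability gap in each class group (about 3.6 versus 2.9 percentage points), because the logit scale is non-linear.

```python
import math

def inv_logit(xb):
    """Convert a value on the logit (log odds) scale to a probability."""
    return 1 / (1 + math.exp(-xb))

# Coefficients copied from the logit output (y = 1: 5+ GCSEs at A*-C)
B_CONS = -0.309116
B_BOYS = -0.1495507
B_SERVICE = 1.398813

for boys in (0, 1):
    for service in (0, 1):
        xb = B_CONS + B_BOYS * boys + B_SERVICE * service
        print(f"boys={boys} service={service}  log odds={xb:.3f}  "
              f"P(5+ GCSEs)={inv_logit(xb):.2f}")

# Gender gap in probability within each class group
gap_nonservice = inv_logit(B_CONS) - inv_logit(B_CONS + B_BOYS)
gap_service = inv_logit(B_CONS + B_SERVICE) - inv_logit(B_CONS + B_SERVICE + B_BOYS)
print(round(gap_nonservice, 3), round(gap_service, 3))
```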

Stata output (logit): standard errors • Standard errors are also measured on the logit scale • Smaller standard errors indicate more precise estimates of the coefficients (b)

Stata output (logit): test statistics • $z = b / \mathrm{s.e.}(b)$; Wald $\chi^2 = \left(b / \mathrm{s.e.}(b)\right)^2$ on 1 d.f. • A very crude test of significance is whether |b| is more than twice its standard error (e.g. boys: -.1495507 / .0349946 = -4.27)
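The z and Wald statistics, and the two-sided p-value, can be reproduced from b and its standard error. A minimal sketch, assuming a standard normal reference distribution (as in Stata's z column); `wald_test` is a hypothetical helper name.

```python
import math

def wald_test(b, se):
    """Return z, Wald chi-square (1 d.f.) and two-sided p-value for H0: b = 0."""
    z = b / se
    wald_chi2 = z ** 2
    # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, wald_chi2, p

# 'boys' coefficient and standard error from the logit output
z, chi2, p = wald_test(-0.1495507, 0.0349946)
print(f"z = {z:.2f}, Wald chi2 = {chi2:.2f}, p = {p:.6f}")
# Crude rule of thumb: is |b| more than twice its standard error?
print(abs(-0.1495507) > 2 * 0.0349946)
```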

Stata output (logit): p values • Formal significance test: P>|z| is the two-sided p value for the null hypothesis that b = 0

Stata output (logit): confidence intervals • Confidence interval of b (on the logit scale): $b \pm 1.96 \times \mathrm{s.e.}(b)$, e.g. for boys $-0.15 \pm (1.96 \times 0.035) = (-0.22, -0.08)$ • Remember: if the 95% confidence interval does not include zero, b is significant at the 5% level
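The interval can be computed directly from b and its standard error. A small sketch using the boys estimate (the helper name `wald_ci` is hypothetical); Stata uses a critical value of about 1.959964 rather than 1.96, so the last digits can differ slightly from the output.

```python
def wald_ci(b, se, z_crit=1.96):
    """Approximate 95% confidence interval b +/- 1.96 * s.e. (logit scale)."""
    return b - z_crit * se, b + z_crit * se

lo, hi = wald_ci(-0.1495507, 0.0349946)
print(f"({lo:.4f}, {hi:.4f})")   # compare Stata: -.218139 to -.0809625
print("excludes zero:", not (lo <= 0 <= hi))
```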

Logit • More on interpreting logit models this afternoon • Especially why not to use odds ratios!

Probit / Logit • Approximate conversion (Amemiya 1981): logit b ≈ 1.6 × probit b, so probit b ≈ logit b / 1.6 • Logit or probit? Some say logit suits a purely discrete y (e.g. pregnancy), whereas probit appeals to an underlying continuous distribution • Some people make silly claims (e.g. the case of unemployment in Germany)
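Applying the Amemiya rule of thumb to the logit estimates from the GCSE example gives rough probit equivalents. A sketch only: the factor 1.6 is an approximation and the helper names are ours. The converted constant and boys coefficient round to -0.19 and -0.09, in line with the probit values used in the probit example that follows.

```python
AMEMIYA_FACTOR = 1.6  # rule of thumb: logit b is roughly 1.6 times probit b

def logit_to_probit(b_logit):
    return b_logit / AMEMIYA_FACTOR

def probit_to_logit(b_probit):
    return b_probit * AMEMIYA_FACTOR

# Logit coefficients from the GCSE example
for name, b in [("_cons", -0.309116), ("boys", -0.1495507),
                ("service class", 1.398813)]:
    print(f"{name:14s} logit {b: .3f}   approx. probit {logit_to_probit(b): .3f}")
```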

[Figure: Logit / Probit]

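Probit • b is expressed on the probit (standard normal) scale; the standard cumulative normal distribution function (F) converts the linear predictor into a probability • Unlike for logit, a calculator might not have the appropriate function • Use software or Excel [=NORMSDIST()]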

Probit • Probability of 5+ GCSE (A*–C) passes • Girl, non-service-class family: $F(-0.19) = 0.42$ • Boy, non-service-class family: $F(-0.19 - 0.09) = F(-0.28) = 0.39$ • Gender effect = 0.42 - 0.39 = 0.03
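The same calculation in Python, in place of a calculator or Excel's NORMSDIST(): the standard cumulative normal can be written with the error function as F(x) = 0.5 × (1 + erf(x / √2)). A minimal sketch using the rounded probit values from the slide.

```python
import math

def std_normal_cdf(x):
    """Standard cumulative normal F(x), via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

p_girl = std_normal_cdf(-0.19)          # girl, non-service-class family
p_boy = std_normal_cdf(-0.19 - 0.09)    # boy, non-service-class family
print(f"girl {p_girl:.2f}, boy {p_boy:.2f}, gender effect {p_girl - p_boy:.2f}")
# girl 0.42, boy 0.39, gender effect 0.03
```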

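Probit • Stata has the dprobit command, which reports dF/dx rather than b: for a dummy variable this is the effect on the probability of a discrete change from 0 to 1 • For continuous X variables, dF/dx is the marginal effect evaluated at the means of the X variables • Analysts often demonstrate effects at specific values / combinations of the X variables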
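The two quantities described above can be sketched as follows: for a continuous X, dF/dx = φ(xb) × b, where φ is the standard normal density; for a dummy, the discrete change F(xb + b) − F(xb). This is an illustration with hypothetical inputs (`xb_at_means` stands for the linear predictor evaluated at the means of the other X variables), not a reimplementation of dprobit.

```python
import math

def norm_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def norm_cdf(x):
    """Standard cumulative normal F(x)."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def marginal_effect_continuous(xb, b):
    """dF/dx for a continuous X at linear predictor xb: phi(xb) * b."""
    return norm_pdf(xb) * b

def discrete_change_dummy(xb_without, b):
    """Change in probability when a dummy moves from 0 to 1."""
    return norm_cdf(xb_without + b) - norm_cdf(xb_without)

# Hypothetical values for illustration only
xb_at_means = -0.19
print(round(marginal_effect_continuous(xb_at_means, 0.05), 4))
print(round(discrete_change_dummy(xb_at_means, -0.09), 4))
```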

Probit • Knowledge of probit is relevant to more advanced modelling approaches for binary outcome data, which tend to be developed in the probit framework, for example • bivariate probit (see Greene 2003) • ML probit with sample selection (Van de Ven and Van Praag 1981) • random effects dynamic probit model (Stewart 2006)

Categorical Data (Multinomial Logit) • Categorical Y example: YCS 1990, what the pupil was doing in October after Year 11 • 0 Education • 1 Unemployment • 2 Training • 3 Employment

Multinomial Logit • The multinomial logit model = a set of pairs of logits: 1 Unemployment / 0 Education; 1 Training / 0 Education; 1 Employment / 0 Education • The base category of y (here y = 0, education) is the reference category in each pair • Betas are readily interpreted as in logit

Multinomial Logit
Multinomial logistic regression                   Number of obs   =      13925
                                                  LR chi2(3)      =      80.25
                                                  Prob > chi2     =     0.0000
Log likelihood = -12653.444                       Pseudo R2       =     0.0032
------------------------------------------------------------------------------
    t1dooct4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1. unemplo~t |
       girls |  -.0840972   .0977346    -0.86   0.390    -.2756536    .1074591
       _cons |  -3.041328   .0708045   -42.95   0.000    -3.180102   -2.902553
-------------+----------------------------------------------------------------
2. training  |
       girls |   -.245671   .0526523    -4.67   0.000    -.3488675   -.1424744
       _cons |  -1.604877   .0369626   -43.42   0.000    -1.677322   -1.532431
-------------+----------------------------------------------------------------
3. employm~t |
       girls |  -.3961514   .0477778    -8.29   0.000    -.4897941   -.3025087
       _cons |  -1.291088   .0325547   -39.66   0.000    -1.354894   -1.227282
------------------------------------------------------------------------------
(t1dooct4==0. education is the base outcome)
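Predicted probabilities are often easier to present than the pairs of logits. A minimal sketch using the coefficients above: each outcome's probability is exp(xb_j) divided by the sum over all outcomes, with xb = 0 for the base outcome (education). The dictionary and function names are ours. Under these estimates girls are more likely than boys to be in education (roughly 0.72 versus 0.66).

```python
import math

# Coefficients from the mlogit output; education (y = 0) is the base outcome
EQUATIONS = {
    "unemployment": (-3.041328, -0.0840972),   # (_cons, girls)
    "training":     (-1.604877, -0.245671),
    "employment":   (-1.291088, -0.3961514),
}

def mlogit_probs(girls):
    """Predicted probability of each destination for girls = 0 or 1."""
    expxb = {"education": 1.0}   # base outcome: linear predictor 0, exp(0) = 1
    for outcome, (cons, b_girls) in EQUATIONS.items():
        expxb[outcome] = math.exp(cons + b_girls * girls)
    total = sum(expxb.values())
    return {outcome: v / total for outcome, v in expxb.items()}

for girls in (0, 1):
    probs = mlogit_probs(girls)
    print("girls" if girls else "boys ", {k: round(v, 3) for k, v in probs.items()})
```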

Logit Unemployment / Education • Compare the unemployment equation from the multinomial logit with a binary logit (1 Unemployment / 0 Education) fitted only to pupils in those two categories:
Logistic regression                               Number of obs   =      10051
                                                  LR chi2(1)      =       0.74
                                                  Prob > chi2     =     0.3899
Log likelihood = -1803.3779                       Pseudo R2       =     0.0002
------------------------------------------------------------------------------
    t1dooct4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       girls |  -.0840972   .0977345    -0.86   0.390    -.2756533    .1074589
       _cons |  -3.041328   .0708044   -42.95   0.000    -3.180102   -2.902554
------------------------------------------------------------------------------
• The estimates match the unemployment equation of the multinomial logit (girls -.0840972; _cons -3.041328) • The match is exact here because, with a single binary X, both models are saturated; with more covariates, separately fitted pairwise logits give similar but not identical estimates

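Multinomial Logit • The multinomial logit model is NOT an ordinal model • The pairs (1 Unemployment / 0 Education; 1 Training / 0 Education; 1 Employment / 0 Education) say nothing directly about Unemployment / Training, Unemployment / Employment or Training / Employment • Those contrasts are implied by differences between the equations, or can be obtained by changing the base category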
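The contrasts that the output does not show directly can be recovered as differences between coefficients, because log(P_j / P_k) = xb_j − xb_k. A sketch for the girls coefficients; it gives point estimates only, since standard errors for these contrasts also need the covariances of the estimates (one reason to refit with a different base category). The names are ours, for illustration.

```python
# girls coefficients from the mlogit output (each relative to education)
b_girls = {
    "education": 0.0,            # base outcome: coefficient fixed at zero
    "unemployment": -0.0840972,
    "training": -0.245671,
    "employment": -0.3961514,
}

def contrast(numerator, denominator):
    """girls effect on the log odds of `numerator` versus `denominator`."""
    return b_girls[numerator] - b_girls[denominator]

for num, den in [("training", "unemployment"),
                 ("employment", "unemployment"),
                 ("employment", "training")]:
    print(f"{num} vs {den}: {contrast(num, den):.4f}")
```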

Data with Ordinal Outcomes • A large amount of data analysed within sociological studies consists of categorical outcome variables that can plausibly be considered as having a substantively interesting order (for example levels of attainment of educational qualifications) • Standard log-linear models do not take ordinality into account

Data with Ordinal Outcomes • Two different models: the continuation ratio model and the proportional odds model • Both models have ‘logit’-style b interpretations

Reversing Category Codes – Proportional Odds Model • Reversing the order of the categories reverses the signs of the results, but their substantive meaning is not changed • This can work well with attitude scales!
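This palindromic invariance can be checked numerically. A minimal sketch, assuming the cumulative logit parameterization P(Y ≤ j) = invlogit(cut_j − xb) (the form used by Stata's ologit) and hypothetical cut points for a four-category attitude item: reversing the categories, flipping the sign of xb and mirroring the cut points reproduces exactly the same probabilities.

```python
import math

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def cumulative_logit_probs(eta, cuts):
    """Category probabilities under a proportional odds (cumulative logit) model.

    P(Y <= j) = inv_logit(cut_j - eta), with cuts in increasing order.
    """
    cum = [inv_logit(c - eta) for c in cuts] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical four-category attitude item (strongly disagree ... strongly agree)
cuts = [-1.0, 0.5, 2.0]
eta = 0.8   # hypothetical linear predictor x*b for one respondent

original = cumulative_logit_probs(eta, cuts)
# Reverse the category codes: flip the sign of x*b and mirror the cut points
reversed_model = cumulative_logit_probs(-eta, [-c for c in reversed(cuts)])
print([round(p, 4) for p in original])
print([round(p, 4) for p in reversed(reversed_model)])  # same probabilities
```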

Reversing Category Codes – Continuation Ratio Model • Reversing the order of the categories changes both the results and their substantive meaning: the model is not palindromically invariant

The b that refer to the cut points (or partitions) in these two ordinal models have slightly different interpretations
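One way to see the difference: in the proportional odds model each cut point governs a cumulative probability P(Y ≤ j), while in the continuation ratio model it governs the conditional probability of stopping at stage j given that stage j was reached, P(Y = j | Y ≥ j). A sketch with hypothetical values fed into both models; the continuation ratio parameterization here (sign convention and direction of the stages) is an assumption and differs between programs.

```python
import math

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def continuation_ratio_probs(eta, cuts):
    """Category probabilities under a continuation ratio model.

    Assumed parameterization: P(Y = j | Y >= j) = inv_logit(cut_j - eta).
    """
    probs, still_at_risk = [], 1.0
    for c in cuts:
        h = inv_logit(c - eta)          # chance of stopping at this stage
        probs.append(still_at_risk * h)
        still_at_risk *= 1 - h          # chance of continuing beyond it
    probs.append(still_at_risk)         # highest category
    return probs

def cumulative_logit_probs(eta, cuts):
    """Proportional odds model: P(Y <= j) = inv_logit(cut_j - eta)."""
    cum = [inv_logit(c - eta) for c in cuts] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical cut points and linear predictor, same numbers in both models
cuts, eta = [-1.0, 0.5, 2.0], 0.8
cr = continuation_ratio_probs(eta, cuts)
po = cumulative_logit_probs(eta, cuts)
print("continuation ratio:", [round(p, 3) for p in cr])
print("proportional odds: ", [round(p, 3) for p in po])
```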

Some thoughts on these ordinal models • Proportional odds model: palindromic invariance (e.g. attitudinal scores); motivated by an appeal to the existence of an underlying continuous and perhaps unobservable random variable • Continuation ratio model: natural baseline (hierarchy in the data); single direction of movement; categories of Y really are discrete; Y categories denote a shift or change from one state to another, not a coarse grouping of some finer scale

Poisson Regression • Poisson regression is used to fit models to the number of occurrences (counts) of an event • It is especially relevant if the outcome takes few values, or the event is rare • In some circumstances, however, counts can reasonably be modelled as continuous outcomes, e.g. when there is a wide range of different counts and little clustering around 0 • Examples of the Poisson distribution: soldiers kicked to death by horses (Bortkewitsch 1898); patterns of buzz bomb launches against London in WWII (Clarke 1946); telephone wrong numbers (Thorndike 1926)
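A sketch of the two pieces of the Poisson model: the probability of observing k events given a mean, and, for regression with an exposure variable (e.g. person-years), an expected count of exposure × exp(xb), with log(exposure) entering as an offset. The coefficients and exposure are hypothetical; exp(b1) is the ratio of expected counts (the rate ratio) for a one-unit change in x.

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson count with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def expected_count(b0, b1, x, exposure):
    """Poisson regression with exposure: E[Y] = exposure * exp(b0 + b1*x).

    Equivalent to log(E[Y]) = b0 + b1*x + log(exposure), the offset term.
    """
    return exposure * math.exp(b0 + b1 * x)

# Hypothetical coefficients and exposure (illustration only)
b0, b1 = -3.0, 0.4
mu_0 = expected_count(b0, b1, x=0, exposure=100)
mu_1 = expected_count(b0, b1, x=1, exposure=100)
print(round(mu_0, 3), round(mu_1, 3), round(mu_1 / mu_0, 4))  # ratio = exp(b1)
print([round(poisson_pmf(k, mu_0), 3) for k in range(5)])
```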

A Thought on Goodness of Fit • Standard linear models: R2 is an easy, consistent measure of goodness of fit • Nested models: the change in deviance (G2, based on -2 LL) approximately follows a chi-square distribution, with d.f. equal to the difference in the number of parameters • Non-nested non-linear models: changes in deviance cannot be compared AND there is no direct equivalent of R2 (e.g. logit models from two different surveys) • Various ‘pseudo’ R2 measures exist; most cannot reach the full 0–1 range, and technically they should not be used to compare non-nested models (but they may be adequate in many practical situations)
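The nested-model comparison can be illustrated with the GCSE logit output: the LR chi2(2) of 807.67 is the change in deviance between the constant-only model and the fitted model. A sketch recovering the constant-only log likelihood and McFadden's pseudo R2 (the measure Stata reports for logit). The closed-form tail probability exp(−x/2) holds only for 2 d.f.; other d.f. need a library function such as SciPy's `scipy.stats.chi2.sf`.

```python
import math

# Values reported in the logit output for the GCSE example
ll_model = -9260.22
lr_chi2 = 807.67          # LR chi2(2): model vs constant-only model
df = 2

# G2 = -2 * (LL_null - LL_model), so LL_null can be recovered
ll_null = ll_model - lr_chi2 / 2

# McFadden's pseudo R2 = 1 - LL_model / LL_null
mcfadden_r2 = 1 - ll_model / ll_null

# For 2 d.f. the chi-square upper tail has a closed form: exp(-x / 2)
p_value = math.exp(-lr_chi2 / 2)
print(f"LL_null = {ll_null:.3f}, pseudo R2 = {mcfadden_r2:.4f}, p = {p_value:.3g}")
```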

A Thought on Goodness of Fit • The handy user-written Stata package spost9_ado (its fitstat command) produces a number of ‘pseudo’ R2 measures • Discussion at http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm • Some analysts use Bayesian Information Criterion (BIC)-type measures, which penalize additional parameters and so favour parsimony; these are possibly a good idea for comparing across models
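A minimal sketch of BIC = −2 LL + k ln(N), using the GCSE logit (LL = -9260.22, three parameters, N = 14022) and, for comparison, the constant-only model whose log likelihood is recovered from the LR chi2. Lower values are preferred, and comparisons are only meaningful between models fitted to the same observations. The penalty of ln(N), about 9.5 per parameter here, is what makes BIC favour parsimony.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """BIC = -2 * LL + k * ln(N); lower values are preferred."""
    return -2 * log_likelihood + n_params * math.log(n_obs)

n = 14022
# GCSE logit: k = 3 (boys, service class, _cons)
full = bic(-9260.22, 3, n)
# Constant-only model: LL = -9260.22 - 807.67 / 2, recovered from LR chi2
null = bic(-9260.22 - 807.67 / 2, 1, n)
# Each extra parameter must improve -2 LL by more than ln(N) to lower BIC
penalty_per_param = math.log(n)
print(f"BIC full = {full:.2f}, BIC null = {null:.2f}, "
      f"penalty per parameter = {penalty_per_param:.2f}")
```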

Some Common (GLMM) Models • Continuous Y, linear regression: regress y x1 x2 • Binary Y, logit: logit y x1 x2 • Binary Y, probit: probit y x1 x2 • Categorical Y, multinomial logit: mlogit y x1 x2 • Count Y, Poisson: poisson deaths x1 x2, exposure(pyears) • Ordered categorical Y, continuation ratio: ocratio y x1 x2 (user-written) • Ordered categorical Y, cumulative logit: ologit y x1 x2

Conclusions • The results of statistical models can easily be misrepresented • The correct interpretation of results from statistical models often requires more detailed knowledge than is commonly appreciated • Social science analysts should pay more attention to developing the appropriate model context • Knowing about a wider range of glms / glmms is important • Thinking about the exact interpretation of b will help
