What is Structural Equation Modeling? (A Very Brief Introduction) Patrick Sturgis, University of Surrey
What is SEM? • SEM is not one statistical ‘technique’. • It integrates a number of different multivariate techniques into one model fitting process. • It is essentially an integration of: • Measurement theory • Factor analysis • Regression • Simultaneous equation modeling • Path analysis
SEM is essentially Path Analysis using Latent Variables
What are Latent Variables? • Most/all variables in the social world are not directly observable. • This makes them ‘latent’ or hypothetical constructs. • We measure latent variables with observable indicators, e.g. questionnaire items. • We can think of the variance of an observable indicator as being partially caused by: • The latent construct in question • Other factors (error)
True score and measurement error • x = t + e, where x is the measured score, t is the true score (the true point on the continuum) and e is the error • Systematic error: mean of errors ≠ 0 • Random error: mean of errors = 0
The True Score Equation • x = t + e can be expressed diagrammatically [Diagram: true score and error both point to the observed item] • Problem: with one indicator, the equation is unidentified. We can’t separate true score and error.
Identifying True Score & Error • This means we need multiple indicators of each latent variable • With multiple indicators we can use Factor Analysis to estimate these parameters • Factor analysis transforms correlated observed variables into uncorrelated components • We can then use a subset of components to summarise the observed relationships
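The need for multiple indicators can be illustrated with a simple counting rule. The sketch below is an illustration rather than part of the slides: it assumes a one-factor model with the factor variance fixed to 1, so that p indicators supply p(p+1)/2 observed variances and covariances while the model needs p loadings and p error variances. The function name is hypothetical, and a non-negative count is a necessary condition for identification, not a sufficient one.

```python
def one_factor_degrees_of_freedom(p: int) -> int:
    """Observed moments minus free parameters for a one-factor model.

    Assumes the factor variance is fixed to 1 (an illustrative choice),
    leaving p loadings and p error variances to estimate.
    """
    observed_moments = p * (p + 1) // 2   # variances plus covariances
    free_parameters = 2 * p               # loadings plus error variances
    return observed_moments - free_parameters


for p in range(1, 5):
    df = one_factor_degrees_of_freedom(p)
    if df < 0:
        status = "not identified"
    elif df == 0:
        status = "just-identified"
    else:
        status = "over-identified"
    print(f"{p} indicator(s): df = {df} ({status})")
```

Under these assumptions the count is negative for one or two indicators, zero for three, and positive for four, which is consistent with the point that a single indicator cannot separate true score from error.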
A Common Factor Model [Diagram: a latent construct with arrows to Indicator 1, Indicator 2, Indicator 3 and Indicator 4; loadings shown: .6, .5, .5, .7] • Indicators become conditionally independent, given the latent construct • Factor loadings = regression of the indicators on the factor
Factor Analysis • So the factor loading is the standardised regression coefficient of the indicator on the latent variable. • Squaring the factor loading gives us the proportion of variance ‘explained’ by the latent variable (factor). • This can be considered as the true score component of the item. • 1 minus the proportion of variance explained by the factor gives us the residual or ‘error’ variance. • Thus, the variance of the factor contains only the true score component of each item.
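As a worked illustration of the arithmetic above, the following sketch squares the four loadings shown on the common factor model slide (.6, .5, .5, .7). It assumes the loadings are standardised and treats the order in which they are listed as the indicator order, which the transcript does not confirm. The variable names are illustrative only.

```python
# Standardised loadings, in the order they are listed on the slide (assumed order).
loadings = [0.6, 0.5, 0.5, 0.7]

# Squared loading = share of item variance explained by the factor (true score part).
explained = [lam ** 2 for lam in loadings]

# For a standardised item, the remainder is residual or 'error' variance.
residual = [1 - e for e in explained]

for i, (e, r) in enumerate(zip(explained, residual), start=1):
    print(f"Indicator {i}: explained = {e:.2f}, residual = {r:.2f}")

# Under conditional independence, the model-implied correlation between two
# indicators is the product of their loadings.
implied_r12 = loadings[0] * loadings[1]
print(f"Implied correlation, indicators 1 and 2: {implied_r12:.2f}")
```

A loading of .6, for example, implies that 36% of that item's variance is attributed to the factor and 64% to error.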
Benefits of Latent Variables • Most social concepts are complex and multi-faceted • Using single measures will not adequately cover the full conceptual map • Systematic error biases descriptive and causal inferences • Stochastic error in dependent variables leaves estimates unbiased but less efficient • Stochastic error in independent variables attenuates estimates of associational effect sizes
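The last two points can be checked with a small simulation. This is a minimal sketch under stated assumptions, not an analysis from the slides: the true slope of 0.5, the sample size, the unit error variances and the helper `slope` are all hypothetical choices made for illustration. With unit-variance true scores and unit-variance random error, the noisy predictor has reliability 0.5, so its slope is expected to shrink to roughly half the true value, while random error in the outcome should leave the slope close to the true value.

```python
import random

random.seed(1)


def slope(xs, ys):
    """Ordinary least squares slope from regressing ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


n = 20_000
true_beta = 0.5                                         # hypothetical true effect

t = [random.gauss(0, 1) for _ in range(n)]              # true scores
y = [true_beta * ti + random.gauss(0, 1) for ti in t]   # outcome
x_noisy = [ti + random.gauss(0, 1) for ti in t]         # predictor with random error
y_noisy = [yi + random.gauss(0, 1) for yi in y]         # outcome with random error

slope_clean = slope(t, y)            # error-free benchmark
slope_noisy_x = slope(x_noisy, y)    # attenuated towards zero
slope_noisy_y = slope(t, y_noisy)    # unbiased, but estimated less precisely

print(f"no measurement error:   {slope_clean:.3f}")
print(f"error in the predictor: {slope_noisy_x:.3f}")
print(f"error in the outcome:   {slope_noisy_y:.3f}")
```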
Remember: SEM is essentially Path Analysis using Latent Variables • We now know about latent variables; what about path analysis?
Path Analysis • Sewall Wright, a biologist, developed the fundamental ideas of path analysis in the 1920s. • The diagrammatic representation of a theoretical model. • Standardised notation. • Estimation of a series of regression models to ‘decompose’ effects: • Direct • Indirect • Total
Direct, Indirect and Total Effects • Example: effect of exam nerves on exam performance [Path diagram with nodes: Exam stress, Physical/mental anxiety, Exam preparation, Exam performance; the paths carry one negative and four positive signs]
Effect Decomposition • We can break down effects of X on Y into direct, indirect and total. • Direct = a • Indirect = b × c • Total = a + b × c [Path diagram: X1 affects Y2 directly (path a) and indirectly through Y1 (paths b and c)]
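The decomposition can be sketched in a few lines. The path coefficients below are hypothetical values chosen for illustration, and `decompose_effects` is an illustrative helper, not a function from any SEM package.

```python
def decompose_effects(a: float, b: float, c: float) -> dict:
    """Split the effect of X1 on Y2 into direct, indirect and total parts.

    a is the direct path; b and c are the two paths that run through Y1.
    """
    indirect = b * c
    return {"direct": a, "indirect": indirect, "total": a + indirect}


# Hypothetical standardised path coefficients.
effects = decompose_effects(a=0.30, b=0.40, c=0.50)
for name, value in effects.items():
    print(f"{name}: {value:.2f}")
```

With these assumed values, the indirect effect is 0.40 × 0.50 = 0.20, so the total effect is 0.30 + 0.20 = 0.50.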
Standard Symbols for Path Analysis • Latent variable • Observed variable • Residual or Error Term • Causal Effect • Covariance Path [The symbol drawn next to each label is not preserved in this transcript]
So when a path diagram includes latent variables… …it becomes a SEM [Path diagram: variables gender, class, attitude and behaviour; indicators O1 to O12; error terms e1 to e14; several paths fixed to 1]
Simultaneous Equations • We might estimate this as 3 separate models: • Factor model, saving out the factor scores as a variable • Regression Y on X • Regression Y on X, Z • In SEM, we estimate the equations simultaneously
Estimation & Model Fit • A variety of estimators are available, but maximum likelihood (ML) predominates • Model fit can be tested by comparing the likelihoods of the specified and baseline models • This tests the significance of the difference between the model-implied and observed variance-covariance matrices • With large n, no model fits! • ‘Adjusted’ indices (RMSEA, CFI, etc.) • Perhaps more useful for testing nested models, where the comparator is more substantively meaningful.
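The remark that no model fits with large n can be illustrated numerically. This is a sketch under stated assumptions: it uses the common textbook relations that the ML test statistic is (n − 1) times the minimised fit function and that RMSEA = sqrt(max(χ² − df, 0) / (df × (n − 1))); some software uses n rather than n − 1. The discrepancy value and degrees of freedom are hypothetical.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Root mean square error of approximation (textbook formula, n - 1 version)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


discrepancy = 0.02   # hypothetical minimised ML fit function value
df = 5               # hypothetical model degrees of freedom

results = {}
for n in (200, 2_000, 20_000):
    chi_square = (n - 1) * discrepancy   # the test statistic grows with sample size
    results[n] = (chi_square, rmsea(chi_square, df, n))
    print(f"n = {n:>6}: chi-square = {chi_square:7.2f}, RMSEA = {results[n][1]:.3f}")
```

For the same assumed degree of misfit, the chi-square statistic rises in proportion to the sample size, so it passes any fixed critical value once n is large enough, whereas RMSEA rescales the statistic by the degrees of freedom and the sample size.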
Other things we can do with SEM… • Panel data models (tomorrow!) • Categorical endogenous variables • Multiple group models • Latent variable interactions • Model missing data • Complex sample data • Multi-level SEM