What is… The Analysis of Longitudinal Survey Data Paul Lambert University of Stirling Prepared for: National Centre for Research Methods, Research Methods Festival, St Catherine’s College, Oxford, 7 July 2010. Also see: www.longitudinal.stir.ac.uk / www.dames.org.uk.
So what’s distinct about the analysis of longitudinal survey data? You already know: • It involves working with (survey) datasets that contain longitudinal information (data about time), using the specialist statistical techniques appropriate to them. What you may not realise: • There are distinct groups of techniques and data types • There are complex data and data management components July 2010: LDA
1. Types of longitudinal survey data • Survey resources • Longitudinal [‘...of or about time...’] • {Analysis is concerned with time} • Data cover more than one time point • [e.g. Taris 2000; Blossfeld and Rohwer 2002] • Repeated measures over time • [e.g. Menard 2002; Martin et al. 2006] Data analysis is used to give a parsimonious summary of patterns of relations between variables in the survey dataset
Types of data and analysis traditions for longitudinal surveys cf. www.longitudinal.stir.ac.uk
[Data type: 1/6] Temporal effects in single cross-sectional surveys • Temporal effects are (a) present and (b) of interest in most social science studies • We can measure differences between people in terms of their age / year of birth • These differences matter empirically and are interesting substantively • But we can’t tell whether the differences are due to age, period or cohort (or to other things that are collinear with these, e.g. life course stage or major events)
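A minimal Python sketch of why a single cross-section cannot separate these effects, assuming made-up ages and one hypothetical survey year (2008): because period is the same for every respondent, birth cohort is an exact linear function of age, so the two are perfectly negatively correlated and no regression can attribute a difference to one rather than the other.

```python
# Hypothetical single-wave survey: every respondent is interviewed in the same year.
SURVEY_YEAR = 2008
ages = [25, 34, 41, 52, 63]

# Period is constant, so cohort is fully determined by age: cohort = period - age.
cohorts = [SURVEY_YEAR - a for a in ages]


def pearson(x, y):
    """Pearson correlation coefficient using only the standard library."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


r = pearson(ages, cohorts)
print(cohorts)       # [1983, 1974, 1967, 1956, 1945]
print(round(r, 10))  # -1.0: age and cohort are perfectly collinear
```

The same identity (age = period - cohort) limits the other designs too; it only changes which two of the three terms vary.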
Longitudinal statements drawn from cross-sectional data are common... • We typically fit linear/curvilinear trend lines for time effects • Treiman (2009: 162) discusses nonlinear specifications of time and age effects • e.g. year-of-birth effect on literacy in China: a discontinuity at 1955, a curve over 1955-1967, and a knot at 1967
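Treiman's own specification is not reproduced here. As an assumption, one way to code "a discontinuity at 1955, a curve over 1955-1967 and a knot at 1967" is to derive extra regressors from year of birth and enter them alongside a linear year-of-birth term in an ordinary regression. The function and column names below are hypothetical.

```python
def birth_year_terms(yob, break_year=1955, knot_year=1967):
    """Hypothetical regressors for a nonlinear year-of-birth effect.

    Intended to sit alongside a plain linear year-of-birth term in a regression.
    """
    # Step dummy: allows a discontinuity (jump) at break_year.
    after_break = 1 if yob >= break_year else 0
    # Years elapsed within the break_year..knot_year window, capped at both ends.
    window = min(max(yob - break_year, 0), knot_year - break_year)
    # Squared term: curvature confined to the window.
    curve = window ** 2
    # Linear spline term: slope change after knot_year (continuous at the knot).
    after_knot = max(yob - knot_year, 0)
    return {"yob": yob, "after_break": after_break, "curve": curve, "after_knot": after_knot}


for year in (1950, 1955, 1960, 1967, 1975):
    print(birth_year_terms(year))
```

With these columns, the coefficient on `after_break` captures the jump at 1955, `curve` allows curvature within the window, and `after_knot` lets the slope change after 1967 while the fitted line stays continuous there.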
[Data type: 2/6] Repeated cross-sections: surveys on the same topics, conducted on multiple occasions with different people. Data example: GHS pooled ‘time-series’ dataset (UKDA, SN: 5664), adults aged 25-65 only
Repeated cross-sections • Easy to communicate and appealing: they show how things have changed between particular time points • Can distinguish any two of age / period / cohort • Easier to analyse, with less data management However... • They don’t offer the other QnLR attractions (nature of changers; residual heterogeneity; causality; durations) • Hidden complications: are sampling methods and variable operationalisations really comparable over time? • More on this below...
Example: Labour Force Survey yearly stats
LFS and time (example in SPSS from www.longitudinal.stir.ac.uk)
[Data type: 3/6] Panel datasets: information collected on the same cases at more than one point in time • The ‘classic’ longitudinal design • Incorporates ‘follow-up’, ‘repeated measures’ and ‘cohort’ designs, on both large and small scales • Several major panel studies in the UK, e.g. www.esds.ac.uk/longitudinal • Many cross-sectional surveys feature additional panel elements
Complex data example: BHPS panel dataset [SN 5151]
Panel data advantages • Study ‘changers’: how many there are, what they are like, and what caused the change • Control for individuals’ unknown characteristics (‘residual heterogeneity’) • Develop a full and reliable life history • e.g. family formation, employment patterns
Example: Panel transitions
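The slide's transition table is not reproduced here. As a hedged sketch of the underlying calculation, the Python below counts wave-to-wave moves between states from long-format records and row-normalises them into transition proportions; the person IDs, waves and employment states are invented for illustration.

```python
from collections import Counter, defaultdict

# Invented long-format records: (person_id, wave, state)
records = [
    (1, 1, "employed"), (1, 2, "employed"), (1, 3, "unemployed"),
    (2, 1, "unemployed"), (2, 2, "employed"), (2, 3, "employed"),
    (3, 1, "employed"), (3, 2, "employed"), (3, 3, "unemployed"),
]


def transition_counts(rows):
    """Count moves between states in consecutive waves, person by person."""
    by_person = defaultdict(dict)
    for pid, wave, state in rows:
        by_person[pid][wave] = state
    counts = Counter()
    for waves in by_person.values():
        for w in sorted(waves):
            if w + 1 in waves:  # skip gaps: only adjacent waves form a transition
                counts[(waves[w], waves[w + 1])] += 1
    return counts


def transition_proportions(counts):
    """Row-normalise: share of those in the origin state at t found in each state at t+1."""
    totals = Counter()
    for (origin, _), n in counts.items():
        totals[origin] += n
    return {(o, d): n / totals[o] for (o, d), n in counts.items()}


counts = transition_counts(records)
props = transition_proportions(counts)
print(props)
```

Here 3 of the 5 employed-to-next-wave observations stay employed (0.6), and the single unemployed case moves into work (1.0).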
Panel data can be ‘wide’ or ‘long’ • Which to use depends upon the analytical approach • Wide format (one row per case) is simpler to envisage, but unbalanced data (cases missing at some waves) then require missing-value handling or imputation • Long format (one row per case per wave) is harder to manipulate (e.g. to cross-check), but is more flexible in the types of analysis it supports
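A minimal Python sketch of moving from wide to long format, assuming a hypothetical GHQ-style variable stored as one column per wave (`ghq_w1`, `ghq_w2`, ...); in Stata the equivalent step is usually done with `reshape long`. Once the data are long, the unbalanced panel needs no imputation: absent waves simply produce no record.

```python
# Invented wide-format panel: one row per person, one column per wave (None = not interviewed).
wide = [
    {"pid": 1, "ghq_w1": 10, "ghq_w2": 12, "ghq_w3": None},
    {"pid": 2, "ghq_w1": 8, "ghq_w2": None, "ghq_w3": 9},
]


def wide_to_long(rows, stub="ghq_w", id_var="pid", value_name="ghq"):
    """Reshape to one record per person-wave; missing waves are simply omitted."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col.startswith(stub) and value is not None:
                wave = int(col[len(stub):])  # wave number taken from the column suffix
                long_rows.append({id_var: row[id_var], "wave": wave, value_name: value})
    return sorted(long_rows, key=lambda r: (r[id_var], r["wave"]))


for rec in wide_to_long(wide):
    print(rec)
```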
Panel models: regression-style models with various estimators that recognise the repeated contacts, e.g. random effects; fixed effects; population average; linear (example model: influences on GHQ score in the BHPS; Stata examples available via www.dames.org.uk/workshops)
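To show what "controlling for residual heterogeneity" means for the fixed-effects estimator, here is a minimal single-regressor sketch in Python with invented data. It computes only the point estimate by demeaning within persons; it is not the BHPS GHQ model from the slide, and a real analysis (e.g. Stata's `xtreg, fe`) would also handle several regressors and standard errors.

```python
from collections import defaultdict


def pooled_slope(rows):
    """Ordinary least-squares slope of y on x, ignoring who the observations belong to."""
    n = len(rows)
    mx = sum(x for _, x, _ in rows) / n
    my = sum(y for _, _, y in rows) / n
    sxy = sum((x - mx) * (y - my) for _, x, y in rows)
    sxx = sum((x - mx) ** 2 for _, x, _ in rows)
    return sxy / sxx


def within_slope(rows):
    """Fixed-effects (within) slope: demean x and y by person, then regress through the origin."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for pid, x, y in rows:
        t = totals[pid]
        t[0] += x
        t[1] += y
        t[2] += 1
    sxy = sxx = 0.0
    for pid, x, y in rows:
        sx, sy, n = totals[pid]
        dx, dy = x - sx / n, y - sy / n  # deviations from each person's own mean
        sxy += dx * dy
        sxx += dx * dx
    return sxy / sxx


# Invented data: y = person effect + 2 * x, with person effects correlated with x.
rows = [(1, 1, 12), (1, 2, 14), (1, 3, 16),   # person 1: effect 10
        (2, 5, 10), (2, 6, 12), (2, 7, 14)]   # person 2: effect 0
print(pooled_slope(rows))  # about -0.143: confounded by the person effects
print(within_slope(rows))  # 2.0: recovers the within-person effect
```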
[Data type: 4/6] Cohort datasets: information on a group of cases that share a common circumstance, collected repeatedly as they progress through the life course • An intuitive type of repeated contact data • e.g. the ‘7-up’ series • Often contributes to cross-cohort comparisons • e.g. UK birth cohort studies in 1946, 1958, 1970 and 2000
Cohort data and analysis in the social sciences • Many circumstances parallel other panel types: • Large-scale studies are ambitious and expensive • Small-scale cohorts are still quite common... • Attrition problems are often more severe • Study duration imposes considerable limits • Glenn (2005) argues that ‘cohort analysis’ should be specifically directed to understanding the effects of ageing/progression over time • Other uses of cohort data are effectively the same as panel data analysis • It remains hard, even with extensive cohort data, to understand ageing effects authoritatively, because age, period and cohort are exactly linearly dependent (age = period – cohort)
[Data type: 5/6] Event history data analysis [esp. Blossfeld et al. 2007]: the focus shifts to the length of time spent in a ‘state’, analysing the determinants and patterns of time in state(s) • Data sources are panel / cohort studies, or retrospective interviews (...which are prone to recall errors...) • Analysis of event durations: ‘event history analysis’; ‘survival data analysis’; ‘failure time analysis’; ‘hazards’; ‘risks’; ... • Analysis of event patterns: ‘sequence analysis’; ‘trajectory analysis’; ‘optimal matching analysis’; ‘latent growth curves’
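For the "survival data analysis" strand, the Kaplan-Meier estimator is a standard descriptive starting point for durations with right-censoring. The Python sketch below is a minimal version under stated assumptions (invented unemployment spell lengths in months; censored spells coded 0), not a substitute for the survival routines in a statistics package.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survivor function S(t).

    durations: time spent in the state for each case.
    events: 1 if the spell ended during observation, 0 if right-censored.
    Returns (time, S(t)) pairs at each distinct time where an event occurred.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Group cases with the same duration; censored cases at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Invented unemployment spells (months) and whether each spell was seen to end.
months = [2, 3, 3, 5, 6, 8]
observed_end = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(months, observed_end):
    print(t, round(s, 3))
```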
Key to event histories is the ‘state space’ (the set of possible states a case can occupy)
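A minimal Python sketch of how a state space turns a sequence of observations into episodes ("spells"), assuming a hypothetical three-state labour-market space and monthly data. Each spell records its state, start period, length and whether it was seen to end; the last spell is right-censored because observation stops, which is the information duration models need.

```python
# Hypothetical state space for labour-market histories.
STATE_SPACE = {"E": "employed", "U": "unemployed", "N": "not in labour force"}


def sequence_to_spells(states, state_space=STATE_SPACE):
    """Collapse a period-by-period state sequence into (state, start, length, ended) spells."""
    for s in states:
        if s not in state_space:
            raise ValueError(f"state {s!r} is not in the state space")
    spells = []
    start = 0
    for t in range(1, len(states) + 1):
        # A spell closes when the state changes or the observation window ends.
        if t == len(states) or states[t] != states[start]:
            ended = t < len(states)  # False for the final, right-censored spell
            spells.append((states[start], start, t - start, ended))
            start = t
    return spells


history = ["E", "E", "U", "U", "U", "E", "E"]  # invented monthly observations
print(sequence_to_spells(history))
# [('E', 0, 2, True), ('U', 2, 3, True), ('E', 5, 2, False)]
```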
Example: Cox regression (SPSS example at www.longitudinal.stir.ac.uk)
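The SPSS example itself is on the website. As an illustration of what Cox regression estimates, the Python sketch below maximises the Cox partial likelihood for a single covariate by Newton-Raphson, handling tied event times with the Breslow approximation. The data are invented, and in practice a packaged routine (e.g. Stata's `stcox` or R's `coxph`) would be used and would also report standard errors.

```python
import math


def cox_score_info(beta, times, events, x):
    """Score and information of the Cox partial log-likelihood for one covariate (Breslow ties)."""
    score = info = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored cases contribute only through risk sets
        # Risk set: cases still in the state just before t_i.
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        score += x[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info


def cox_fit(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson estimate of the log hazard ratio for covariate x."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Invented spells: duration, 1 = event observed, and a 0/1 group covariate.
times = [5, 8, 12, 3, 9, 15, 2, 7]
events = [1, 1, 0, 1, 1, 0, 1, 1]
group = [0, 0, 0, 1, 1, 1, 1, 0]
beta_hat = cox_fit(times, events, group)
print(round(beta_hat, 4), round(math.exp(beta_hat), 4))  # log hazard ratio, hazard ratio
```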
[Data type: 6/6] Time series data: a statistical summary of one particular concept, collected at repeated time points from one or more subjects Examples: • Unemployment rates by year in the UK • University entrance rates by year by country Comments: • Panel = many variables, few time points (‘cross-sectional time series’ to economists) • Time series = few variables, many time points • Descriptive analyses, e.g. charts of statistics over time • Advanced modelling analyses typically include ‘autoregressive’ terms (e.g. lag effects) amongst the explanatory factors
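A minimal sketch of the "autoregressive term" idea in Python: the lagged value of the series is used as an explanatory variable in an ordinary least-squares fit of y_t = c + phi * y_{t-1}. The series is synthetic and noise-free so the fit can be checked; real time-series work would also consider stationarity and serial correlation, which this sketch ignores.

```python
def fit_ar1(series):
    """OLS estimates (c, phi) for y_t = c + phi * y_{t-1} + error."""
    y = series[1:]   # current values
    x = series[:-1]  # lagged values used as the explanatory variable
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    phi = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    c = my - phi * mx
    return c, phi


# Synthetic, noise-free series generated from y_t = 1 + 0.5 * y_{t-1}, starting at 10.
series = [10.0]
for _ in range(5):
    series.append(1 + 0.5 * series[-1])

c, phi = fit_ar1(series)
print(round(c, 6), round(phi, 6))  # 1.0 0.5
```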
….Six types of data/analysis…!
[...and then there’s another thing...] 2. Data management issues • Working with longitudinal survey data is made more challenging by important issues of ‘data management’ • Variable operationalisations for comparisons, e.g. strategies for standardisation and harmonisation • Linking datasets within a study • Linking with other datasets to enhance analysis [on the value of organising your data and files, see e.g. Long 2009] • Recognising data structure in analysis, e.g. missing data; survey effects; modelling specifications
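As a hedged illustration of linking datasets within a study, the Python sketch below attaches household-level variables to person-level records from the same wave, using a hypothetical (household id, wave) key, and flags records that fail to match so that match rates can be checked before analysis. The field names echo common panel conventions but are invented here.

```python
# Invented person-level and household-level records from one wave of a panel.
persons = [
    {"pid": 101, "hid": 1, "wave": 1, "age": 34},
    {"pid": 102, "hid": 1, "wave": 1, "age": 36},
    {"pid": 201, "hid": 2, "wave": 1, "age": 51},
    {"pid": 301, "hid": 3, "wave": 1, "age": 29},
]
households = [
    {"hid": 1, "wave": 1, "tenure": "owner"},
    {"hid": 2, "wave": 1, "tenure": "renter"},
]


def link_households(person_rows, household_rows, keys=("hid", "wave")):
    """Left-join household variables onto person records, flagging unmatched persons."""
    lookup = {tuple(h[k] for k in keys): h for h in household_rows}
    linked = []
    for p in person_rows:
        h = lookup.get(tuple(p[k] for k in keys))
        row = dict(p)  # copy so the source records are left unchanged
        if h is not None:
            row.update({k: v for k, v in h.items() if k not in keys})
        row["hh_matched"] = h is not None
        linked.append(row)
    return linked


linked = link_households(persons, households)
print(sum(r["hh_matched"] for r in linked), "of", len(linked), "persons matched")
```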
Dealing with complex data: in the UK we host many projects and centres that help to enable the analysis of complex longitudinal data for social science research • Specifying suitably complex statistical models • e.g. the Centre for Multilevel Modelling (‘E-Stat’, a generic tool for specifying advanced models; Realcom, for analysing longitudinal missing data); the Lancaster-Warwick-Stirling NCRM Node; ULSC (Essex) on survey design effects • Resources on accessing and handling complex data • e.g. ESDS; ADMIN Node; Obesity e-lab; DAMES Node • (see Session 17 in yesterday’s programme)
My own pet project concerns the comparability of variables over time (see www.dames.org.uk)
…‘Effect proportional scaling’ using parents’ occupational advantage
3. Some closing comments on the analysis of longitudinal survey data Why bother with all this? • Focus on change / stability • Focus on the life course • Distinguish age, period and cohort effects • Career trajectories / life course sequences • Focus on time / durations • Substantive role of durations (e.g. unemployment) • Getting the ‘full picture’ • Causality and residual heterogeneity • Examining multivariate relationships • Representative conclusions [e.g. Abbott 2006; Mayer 2005; Menard 2002; Baltagi 2001; Rose 2000; Dale and Davies 1994; Hannan and Tuma 1979; Moser 1958]
Research traditions • ‘geographers study space and economists study time’ [adage quoted in Fotheringham et al. 2000: 245] • There is a vast economics literature using techniques for temporal analysis • Other social science disciplines are, to some degree, catching up • However, methodological research on longitudinal models and on data quality cuts across disciplines [e.g. Dale and Davies 1994] • Data expansions since c.1990 have led to more encompassing models and new substantive application areas • For example: • [Platt 2005]: ethnic minorities’ social mobility 1971-2001 • [Pahl & Pevalin 2005]: friendship patterns over time • [Verbakel & de Graaf 2008]: spouses’ effects on careers 1941-2003 • ...One challenge is getting used to talking about time in a more disciplined way: e.g. traditional sociological characterisations of ‘the past’ and ‘social change’ may not be empirically satisfactory
What’s exciting in the analysis of longitudinal social survey data? • A personal view:
References • Abbott, A. (2006). 'Mobility: What? When? How?' in Morgan, S.L., Grusky, D.B. and Fields, G.S. (eds.) Mobility and Inequality. Stanford: Stanford University Press. • Baltagi, B.H. (2001). Econometric Analysis of Panel Data. New York: Wiley. • Blossfeld, H.P. and Rohwer, G. (2002). Techniques of Event History Modelling: New Approaches to Causal Analysis, 2nd Edition. Mahwah, NJ: Lawrence Erlbaum Associates. • Blossfeld, H.P., Golsch, K. and Rohwer, G. (2007). Event History Analysis with Stata. New York: Lawrence Erlbaum. • Davies, R.B. (1994). 'From Cross-Sectional to Longitudinal Analysis' in Dale, A. and Davies, R.B. (eds.) Analysing Social and Political Change: A Casebook of Methods. London: Sage. • Fotheringham, A.S., Brunsdon, C. and Charlton, M. (2000). Quantitative Geography: Perspectives on Spatial Data Analysis. London: Sage. • Glenn, N.D. (2005). Cohort Analysis, 2nd Edition. London: Sage. • Hannan, M.T. and Tuma, N.B. (1979). 'Methods for Temporal Analysis'. Annual Review of Sociology, 5, 303-328. • Li, Y. and Heath, A.F. (2008). Socio-Economic Position and Political Support of Black and Ethnic Minority Groups in the United Kingdom, 1972-2005 [computer file]. 2nd Edition. Colchester, Essex: UK Data Archive [distributor], SN: 5666. • Long, J.S. (2009). The Workflow of Data Analysis using Stata. Boca Raton, Texas: • Martin, J., Bynner, J., Kalton, G., Boyle, P., Goldstein, H., Gayle, V., Parsons, S. and Piesse, A. (2006). Strategic Review of Panel and Cohort Studies. London: Longview, and www.longviewuk.com/ • Mayer, K.U. (2005). 'Life courses and life chances in a comparative perspective' in Svallfors, S. (ed.) Analyzing Inequality: Life Chances and Social Mobility in Comparative Perspective. Stanford: Stanford University Press. • Menard, S. (2002). Longitudinal Research, 2nd Edition. London: Sage, Number 76 in Quantitative Applications in the Social Sciences Series. • Moser, C.A. (1958). Survey Methods in Social Investigation. London: Heinemann.
• Pahl, R. and Pevalin, D. (2005). 'Between family and friends: a longitudinal study of friendship choice'. British Journal of Sociology, 56(3), 433-450. • Platt, L. (2005). Migration and Social Mobility: The Life Chances of Britain's Minority Ethnic Communities. Bristol: The Policy Press. • Rose, D. (2000). Researching Social and Economic Change: The Uses of Household Panel Studies. London: Routledge. • Taris, T.W. (2000). A Primer in Longitudinal Data Analysis. London: Sage. • Treiman, D.J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey-Bass. • Verbakel, E. and de Graaf, P.M. (2008). 'Resources of the Partner: Support or Restriction in the Occupational Career? Developments in the Netherlands Between 1940 and 2003'. European Sociological Review, 24(1), 81-95.