Longitudinal Workforce Analysis using Routinely Collected Data: Challenges and Possibilities

Shereen Hussein, BSc MSc PhD, King’s College London

Longitudinal Analysis
General advantages
• Can control for individual heterogeneity
• Subjects serve as their own controls
• Between-subject variation excluded from error
• Can better assess causality than cross-sectional data
General challenges
• Conventional statistical methods require independence between observations; longitudinal data are likely to violate this assumption
• Missing data due to attrition
• Data availability
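One way to see why each subject can serve as its own control is a standard individual fixed-effects model. This is a textbook illustration, not a model taken from the slides; the symbols y, x, α, β and ε are assumptions introduced here.

```latex
% Outcome for subject i at time t, with a subject-specific intercept \alpha_i
y_{it} = \alpha_i + \beta x_{it} + \varepsilon_{it}

% Subtracting each subject's own mean removes \alpha_i,
% so stable between-subject differences drop out of the error term
y_{it} - \bar{y}_i = \beta \, (x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)
```

Repeated observations on the same subject share the same intercept, which is also why they are not independent of each other.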

Workforce Data Example: NMDS-SC (National Minimum Data Set for Social Care) • Structure • Design • Coverage • Time span • Type of information collected • Data collection and archiving • Size

NMDS-SC data structure
• Social care providers in England complete NMDS-SC returns
• Aggregate information on the workforce is held in the providers’ database
• Detailed information on all or some individual workers is held in the workers’ database
• The two databases are linkable

NMDS-SC longitudinal analysis: potential
• Data coverage
  • Wide range of providers and individual workers’ information
  • Sector-specific: uniqueness
  • Hierarchical structure
• Workforce development and business sustainability
• Timely
  • Demographics, austerity, unemployment
• Economics
  • Care costs, including turnover costs
  • Pay
• Linkable to local data characteristics

Challenges in NMDS-SC longitudinal analysis • No sampling framework • No regular intervals for data collection • Irregularities in data completion by different providers • Additions/alterations of variables and fields • Cumulative nature and consequences on data size and structure • Archiving

Challenges in NMDS-SC longitudinal analysis (continued)
Computational
• Data size
• Innovation in system design and architecture
• Accumulative property
• Scalability of the system
• Changes in data fields
• Variable additions and omissions
• Data over-ride and archiving
• Software and hardware issues
Methodological
• Unusual patterns of follow-up
• Censoring
• Variability in the database over time
• Unbalanced cohort design
• Missing data
• Update frequency
• Attrition
• True exit
• Other methodological issues

Providers’ level longitudinal mapping • From December 2007 to March 2011 • Linked 18 separate databases on the providers’ level • Each has from 13,095 to 25,266 records; 421,671 valid records included in the construction • Number of updates ranged from 0 to 18 per provider • Continuous process, more records added every 3 months
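The linkage step can be sketched as stacking the periodic snapshots into one update history per provider. This is a minimal sketch with made-up data, not the actual NMDS-SC processing: the field names `provider_id` and `last_updated`, the snapshot labels, and the rule that each new distinct update date after the first counts as one update are all assumptions.

```python
from collections import defaultdict
from datetime import date


def link_snapshots(snapshots):
    """Stack snapshots into one update history per provider.

    snapshots: list of (label, records); each record is a dict with the
    hypothetical keys 'provider_id' and 'last_updated'.
    Returns {provider_id: sorted list of distinct update dates}.
    """
    history = defaultdict(set)
    for _label, records in snapshots:
        for rec in records:
            history[rec["provider_id"]].add(rec["last_updated"])
    return {pid: sorted(dates) for pid, dates in history.items()}


def count_updates(history):
    # Assumed rule: the first observed date is the baseline return,
    # and every later distinct date counts as one update.
    return {pid: len(dates) - 1 for pid, dates in history.items()}


# Three toy snapshots standing in for the separate provider databases.
snapshots = [
    ("2007-12", [{"provider_id": "A", "last_updated": date(2007, 11, 2)},
                 {"provider_id": "B", "last_updated": date(2007, 10, 15)}]),
    ("2008-03", [{"provider_id": "A", "last_updated": date(2008, 2, 20)},
                 {"provider_id": "B", "last_updated": date(2007, 10, 15)}]),
    ("2008-06", [{"provider_id": "A", "last_updated": date(2008, 2, 20)},
                 {"provider_id": "C", "last_updated": date(2008, 5, 1)}]),
]

history = link_snapshots(snapshots)
updates = count_updates(history)
print(updates)
```

Provider B appears in two snapshots without changing its return, so it contributes records to the linked file but no update events. This is one reason the number of valid records and the number of updates can differ widely.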

Meta-data analysis: providers with different number of events

Specific example 1: Providers with 18 updates

Specific example 2: Providers with 2 updates

Density distribution plot of providers with at least 2 updates during the period December 2007 to March 2011

Density distributions of the number of days elapsed between two consecutive provider updates
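The quantity plotted in such a distribution can be derived from each provider's update dates. The sketch below uses invented dates and a hypothetical `history` mapping of provider id to update dates; it only illustrates the gap calculation, not the density estimation itself.

```python
from datetime import date


def days_between_updates(update_dates):
    """Return the gaps, in days, between consecutive update dates."""
    ordered = sorted(update_dates)
    return [(later - earlier).days
            for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical update histories: provider id -> dates of NMDS-SC updates.
history = {
    "A": [date(2008, 1, 10), date(2008, 4, 9), date(2009, 1, 4)],
    "B": [date(2008, 3, 1)],  # a single return, so no gap can be computed
}

gaps = {pid: days_between_updates(dates) for pid, dates in history.items()}

# Pool all gaps across providers; this pooled list is what a density
# plot of elapsed days would be drawn from.
pooled = [g for provider_gaps in gaps.values() for g in provider_gaps]
print(gaps, pooled)
```

Providers with a single return contribute nothing to the pooled gaps, which is why a plot of this kind is restricted to providers with at least two updates.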

Simple example using providers’ database: workforce stability over time • Longitudinal changes in care workers’ turnover and vacancy rates over time • From January 2008 to January 2010 • Changes in reasons for leaving the sector, identified by employers • Differentiating between those with improved (reduced) turnover rates and those with worse (increased) turnover rates

Pre-analysis • Selecting and constructing providers’ panel • Including those with at least two updates within +/- 3 months of T1 and T2 • 2,953 providers with mean coverage duration of 602 days • Investigate sample representation • Data quality checks • Data manipulation/imputation
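The panel inclusion rule can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: T1 and T2 are placed at 1 January 2008 and 1 January 2010, "3 months" is approximated as 91 days, and the update closest to each time point is taken as that provider's observation.

```python
from datetime import date, timedelta

T1 = date(2008, 1, 1)        # assumed exact date for the first time point
T2 = date(2010, 1, 1)        # assumed exact date for the second time point
WINDOW = timedelta(days=91)  # assumed approximation of +/- 3 months


def closest_in_window(update_dates, target, window=WINDOW):
    """Return the update date nearest to target within the window, or None."""
    candidates = [d for d in update_dates if abs(d - target) <= window]
    if not candidates:
        return None
    return min(candidates, key=lambda d: abs(d - target))


def build_panel(history):
    """Keep providers observed near both T1 and T2.

    Returns {provider_id: (date_near_T1, date_near_T2, coverage_days)}.
    """
    panel = {}
    for pid, dates in history.items():
        first = closest_in_window(dates, T1)
        second = closest_in_window(dates, T2)
        if first is not None and second is not None:
            panel[pid] = (first, second, (second - first).days)
    return panel


# Hypothetical update histories for three providers.
history = {
    "A": [date(2008, 2, 15), date(2009, 6, 1), date(2009, 11, 20)],
    "B": [date(2008, 1, 5)],                     # nothing near T2
    "C": [date(2007, 6, 1), date(2010, 2, 1)],   # nothing near T1
}

panel = build_panel(history)
mean_coverage = sum(row[2] for row in panel.values()) / len(panel)
print(panel, mean_coverage)
```

Comparing the providers kept in `panel` with those dropped is a natural starting point for the sample representation check, since inclusion depends on update behaviour rather than on a sampling design.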

Some findings: changes in turnover rates

Reason for leaving and turnover rate changes

Analysis expansion: next steps • Consider changes over a longer period of time • Examine other providers’ characteristics • Different take on panel inclusion criteria • Link to individual workers’ longitudinal databases to examine relations with detailed workforce structure • Pay, qualifications, profile etc. • Build economic elements, e.g. specific turnover costs, into the longitudinal analysis models

Workers’ level longitudinal analysis
• A much larger database
  • Same period of time: over 11M records
• Providers not required to complete information for ‘all’ workers
  • Structural/design missing data
  • True missing data
• Linkage issues
  • More data fields required for identification and linkage
• Considerably large number of variables and fields
  • Careful planning; analysis-tailored data retrieval
• Changes in database
  • Amendments, new variables etc.
• Programming-intensive and demanding models (may not be replicable for different databases)

Issues to consider
• Suitability of models
  • Longitudinal structure
  • Competing risks
• Measurement window
  • Late entry into risk sets
  • Use proxies or other variables in the dataset
  • Adopt a suitable approach/model
• Censoring (left and right)
  • Assumptions, guided by:
    • Sector-specific knowledge
    • Intelligence from other variables in the data
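Late entry and right censoring can be illustrated with a Kaplan-Meier type estimator in which a worker only joins the risk set after first being observed. This is a generic sketch, not the model used in this research: the spell layout `(entry, exit, event)`, the time unit (months of tenure) and the toy data are assumptions, and competing risks are ignored.

```python
def km_with_late_entry(spells):
    """Kaplan-Meier survival estimate allowing late entry and right censoring.

    spells: list of (entry, exit, event) with entry < exit, where
      entry - tenure when the worker is first observed (late entry),
      exit  - tenure when the worker leaves or is last observed,
      event - True for an observed exit, False if right-censored.
    Returns a list of (time, estimated survival probability).
    """
    event_times = sorted({exit_ for _entry, exit_, event in spells if event})
    survival = 1.0
    curve = []
    for t in event_times:
        # A worker is at risk at t only if already under observation
        # (entry < t) and not yet exited or censored (t <= exit).
        at_risk = sum(1 for entry, exit_, _ in spells if entry < t <= exit_)
        exits = sum(1 for _entry, exit_, event in spells
                    if event and exit_ == t)
        survival *= 1 - exits / at_risk
        curve.append((t, survival))
    return curve


# Toy spells in months of tenure (invented for illustration).
spells = [
    (0, 6, True),    # observed from start of job, left at 6 months
    (0, 12, False),  # still in post when last observed (right-censored)
    (4, 10, True),   # first observed at 4 months, left at 10 months
    (8, 20, False),  # first observed at 8 months, right-censored at 20
]

curve = km_with_late_entry(spells)
for t, s in curve:
    print(t, round(s, 3))
```

Treating the late entrants as if they had been observed from month 0 would inflate the early risk sets and overstate retention, which is the bias the risk-set adjustment is meant to avoid.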

Current longitudinal research: watch this space! • Workforce mobility within the sector • Occupation durations • Characteristic-specific probabilities of exiting or remaining in the sector • Characteristic-specific probabilities of moving employer within the sector or having multiple jobs • Career pathways within the sector

Acknowledgments • Thanks to the Department of Health for funding this work • Thanks to Skills for Care for providing the data on a regular basis • Thanks to Analytical Research Ltd for their technical and quantitative support

Further information • Shereen.hussein@kcl.ac.uk • 02078481669 • See: • http://www.kcl.ac.uk/sspp/departments/sshm/scwru/res/knowledge/nmdslong.aspx
