BHPS User Group 19 January 2001
Overview
• News on progress with data availability and access
• Uses of the new sub-samples and weighting
• Open session: issues of interest (note that the use of histories and family data will be covered in part in the afternoon)
• Our aim over the day is a mixture of presentations and responses to queries, but also to gather information on how we can improve user support (e.g. training)
Training programme
• Currently a two-week BHPS data confrontation workshop in the Essex summer school
• A new programme of two-day courses, a mixture of:
  • basic introduction
  • more specialist courses related to particular research interests
• Also targeted courses in Scotland and Wales (and Newcastle and Manchester?)
• We are currently planning a basic course in Colchester in May
• Also note the User Conference in July
Using BHPS with multiple samples
• BHPS had a simple design up to wave six
• From wave seven: ECHP
• From wave nine: Scotland and Wales
• From wave eleven: ?
• New samples mean researchers need to ask themselves additional questions when sub-setting data
• It was always necessary to ask the following:
  • Which cases?
  • Which waves?
  • What final structure (e.g. pooled or cross-wave match)?
  • Should the analysis be weighted, and using which weights?
BHPS data - basic design
• Standard set of files for each wave, following the basic questionnaire structure:
  • Households
  • All individuals
  • Respondent individuals
  • Additional repeating group data (income sources, jobs)
• Naming is consistent across waves, except for the wave prefix, so adding an additional wave is usually a simple replication
• Match between files within a wave using wHID, wPNO etc.
• Match across waves using PID
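The two kinds of match described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are invented toy data, the `region` field and the helper names (`index_by`, `match_within_wave`, `match_across_waves`) are hypothetical, and the literal prefix `w` stands in for whatever wave prefix the real files carry.

```python
# Toy records only: real BHPS files carry many more variables.
# ASSUMPTION: "w" is a placeholder for the wave prefix; "region" is invented.

def index_by(records, key):
    """Build a lookup from one key variable to its record."""
    return {r[key]: r for r in records}

def match_within_wave(individuals, households):
    """Attach household-level fields to each individual via wHID."""
    hh = index_by(households, "wHID")
    return [{**hh[p["wHID"]], **p} for p in individuals if p["wHID"] in hh]

def match_across_waves(wave_a, wave_b):
    """Pair each person's records from two waves via the cross-wave PID."""
    b = index_by(wave_b, "PID")
    return [(p, b[p["PID"]]) for p in wave_a if p["PID"] in b]

households = [{"wHID": 101, "region": "Essex"}]
individuals = [
    {"wHID": 101, "wPNO": 1, "PID": 5001},
    {"wHID": 101, "wPNO": 2, "PID": 5002},
]
# In a later wave the same person can have a different household identifier.
next_wave = [{"wHID": 230, "wPNO": 1, "PID": 5001}]

merged = match_within_wave(individuals, households)
pairs = match_across_waves(individuals, next_wave)
```

The point of the design is that household and person numbers are only meaningful within a wave, whereas PID stays with the person, so it is the only safe key once the wave changes.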
How the new samples fit into the design
• New samples are incorporated into the standard file structure
• Generally identical to cases from the first sample in the data provided (except e.g. new entrant data)
• They can be identified using the variables:
  • wHHORIG - household level
  • wMEMORIG - individual level
  • MEMORIG - cross-wave files
ECHP sub-sample
• Starts at wave seven, when BHPS replaced the former UK-ECHP
• Sub-sample consists of the surviving Northern Ireland sample and a Great Britain sub-sample, chosen to over-represent low-income households
• Selected on the basis of ECHP wave 3 proxy measures (income was not yet available):
  • HRP unemployed now or in the last year
  • HRP receiving lone parent benefit
  • Rented housing
  • Means-tested welfare benefit
• At wave seven: 1710 respondents (235 in Northern Ireland)
Scotland and Wales extension samples
• ESRC response to devolution
• Now (probably) funded to 2003
• First wave in 1999/2000 (BHPS wave nine)
• Sample structure similar to BHPS wave one: sub-regional stratification, Highlands and Islands
• Scotland: 1459 households, 2407 respondents
• Wales: 1428 households, 2430 respondents
• Additional questions on national identity etc.
• Release with main BHPS wave nine this month
Case selection issues
• Exclude the new samples if longitudinal analysis requires data back to wave one
• They can be included in cross-sectional analysis for recent waves
• They can be included in longitudinal analysis of recent waves
• But their selection probabilities are different, so inclusion is likely to require reweighting (partly analogous to new entrants to the first sample)
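The first selection rule above might look like the following minimal sketch, using the household-level origin variable wHHORIG. The code values are placeholders, not the documented BHPS codes, and `select_households` is a hypothetical helper; check the codebook before using real values.

```python
# ASSUMPTION: these code values are placeholders, not the documented
# wHHORIG categories. Look up the real codes in the BHPS documentation.
ORIGINAL_SAMPLE = 1

def select_households(households, needs_wave_one_history):
    """Drop new-sample households when the analysis needs wave-one data."""
    if needs_wave_one_history:
        return [h for h in households if h["wHHORIG"] == ORIGINAL_SAMPLE]
    # Recent-wave analyses may keep every sample, but then need weights
    # that allow for the different selection probabilities.
    return list(households)

households = [
    {"wHID": 1, "wHHORIG": 1},  # original sample
    {"wHID": 2, "wHHORIG": 2},  # hypothetical code for a new sample
    {"wHID": 3, "wHHORIG": 3},  # hypothetical code for a new sample
]

long_run = select_households(households, needs_wave_one_history=True)
recent = select_households(households, needs_wave_one_history=False)
```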
Weighting issues - general
• Weighting is intended to adjust for situations where the analysis sample is not a random sample of the population of inference
• Two types of departure:
  • Unequal selection probabilities
  • Missing data not completely at random
• In a complex panel study there are many possible populations of inference, e.g.:
  • GB population of 1991, surviving to 1998 (for longitudinal analysis)
  • GB population of 1998 (for cross-sectional analysis)
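As a generic textbook sketch of how a weight can address both departures (this is not a statement of how the BHPS weights are actually constructed), suppose case $i$ was selected with probability $\pi_i$ and, once selected, responds with estimated probability $\hat{\rho}_i$. Both symbols are introduced here purely for illustration.

```latex
% Generic inverse-probability weight: one factor per type of departure.
w_i \;\propto\; \frac{1}{\pi_i}\cdot\frac{1}{\hat{\rho}_i}

% Weighted estimate of a population mean or proportion of y:
\hat{\bar{y}} \;=\; \frac{\sum_i w_i\, y_i}{\sum_i w_i}
```

With equal selection probabilities and response that is completely at random, every $w_i$ is the same and the estimate reduces to the unweighted sample mean.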
Weighting BHPS
• In BHPS we have longitudinal weights (wLEWGHT, wLRWGHT) for longitudinal populations of inference, and cross-sectional weights (wXEWGHT, wXRWGHT) for cross-sectional ones
• Cross-sectional weights incorporate new entrants and adjust for their unequal selection probability (new entrants can be identified through the variable wSAMPST)
• Use the longitudinal weight from the last year of the sequence
• Longitudinal weights currently exclude cases who were missing in an intermediate year
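The rule of taking the longitudinal weight from the last year of the sequence can be sketched as follows. The data layout (a dictionary of waves per person) and the helper `longitudinal_weight` are assumptions made for illustration, and returning `None` for a person missing in any wave is just one way of mirroring the exclusion described above.

```python
# ASSUMPTION: each person is a dict mapping wave number -> record, and
# every record stores its longitudinal weight under the key "wLRWGHT".

def longitudinal_weight(person_waves, sequence):
    """Return the weight from the last wave in `sequence`.

    Returns None when the person is missing from any wave in the
    sequence, since such cases carry no usable longitudinal weight.
    """
    if any(wave not in person_waves for wave in sequence):
        return None
    return person_waves[sequence[-1]]["wLRWGHT"]

complete = {1: {"wLRWGHT": 1.00}, 2: {"wLRWGHT": 1.10}, 3: {"wLRWGHT": 1.25}}
gappy = {1: {"wLRWGHT": 1.00}, 3: {"wLRWGHT": 1.30}}  # absent at wave 2

w_complete = longitudinal_weight(complete, [1, 2, 3])
w_gappy = longitudinal_weight(gappy, [1, 2, 3])
```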
When to use BHPS weights
• Always for descriptive population estimates (e.g. proportion in poverty, proportion moving from poverty to non-poverty)
• Weighted cross-sectional analyses should include new entrant sample members: the weights are adjusted to take account of their presence
• There is an argument about regression and similar models. The survey statistics perspective is that the only cost of weighting is a (modest) increase in standard errors; the gain is some protection against model misspecification
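A descriptive estimate such as the proportion in poverty is a ratio of weighted sums. The sketch below uses invented records, a hypothetical `poor` flag, and the placeholder name wXRWGHT for the weight; it shows how the weighted and unweighted answers can differ.

```python
def weighted_proportion(records, flag, weight):
    """Weighted share of records for which `flag` is true."""
    total = sum(r[weight] for r in records)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(r[weight] for r in records if r[flag]) / total

# Invented data: "poor" is a hypothetical poverty indicator.
sample = [
    {"poor": True, "wXRWGHT": 1.0},
    {"poor": False, "wXRWGHT": 2.0},
    {"poor": False, "wXRWGHT": 1.0},
]

weighted = weighted_proportion(sample, "poor", "wXRWGHT")  # 1.0 / 4.0
unweighted = sum(r["poor"] for r in sample) / len(sample)  # 1 / 3
```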
Weights for the new samples
• Standard weights continue to exclude these cases
• New cross-sectional weights from wave seven include ECHP cases (wXEWGHTE, wXRWGHTE)
• Wave nine includes two sets of cross-sectional weights for Scotland and Wales cases:
  • One set permits analysis of these samples on their own
  • A second set permits UK analysis incorporating these samples
• The wave ten release will include longitudinal weights for the new samples
Family and Household Linkages
• Collecting longitudinal data about linked individuals within households, and following individuals as they move, is a key advantage of BHPS
• Leads to research on:
  • interactions between household members in behaviour, decision making and outcomes over time (applications in political research, labour market participation, parental impacts on children, migration)
  • household and family formation and dissolution processes, and their causes and consequences
Technical issues in household linkage - cross-sectional
• The household grid (wINDALL) contains both the relationship to the household reference person and the person numbers of spouse, mother and father
• The file wEGOALT provides information about the relationship of all individuals in a household to each other
• Many other person identifiers relative to the subject can be found at various points in the data (e.g. wAIDHUA, the person number of the person cared for within the household)
• These identifiers can all be used to match data between individuals at a single wave
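A within-wave match between individuals using such a person-number pointer could be sketched as below. The field name `spouse_pno`, the `employed` flag and the helper `attach_spouse` are all hypothetical; the real variable names should be taken from the wINDALL documentation.

```python
# ASSUMPTION: "spouse_pno" stands for whichever wINDALL variable holds the
# spouse's person number; None means no spouse in the household.

def attach_spouse(individuals, spouse_key="spouse_pno"):
    """Pair every individual with their spouse's record (or None)."""
    # Person numbers are only unique within a household, so the lookup
    # key must combine the household identifier with the person number.
    by_person = {(p["wHID"], p["wPNO"]): p for p in individuals}
    return [(p, by_person.get((p["wHID"], p.get(spouse_key))))
            for p in individuals]

grid = [
    {"wHID": 101, "wPNO": 1, "spouse_pno": 2, "employed": True},
    {"wHID": 101, "wPNO": 2, "spouse_pno": 1, "employed": False},
    {"wHID": 102, "wPNO": 1, "spouse_pno": None, "employed": True},
]

linked = attach_spouse(grid)
```

The same pattern applies to any other pointer variable, for example the person number of a mother, father or person cared for.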
Technical issues in household linkage - longitudinal
• Where there is no household composition change, longitudinal matching is straightforward
• It is easiest to start analysis of household composition change using the wEGOALT data. There are two cases for each pair of persons in a household (one with ego and alter exchanged):
  • Variable wLWSTAT indicates whether alter was in the same household as ego last wave
  • Variable wNWSTAT indicates whether alter is in the same household as ego next wave
• So e.g. wREL=1 (married) and wNWSTAT=2 (different household) indicates a marriage separation
• Aggregating across cases allows computation of measures of overall household change
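The marriage-separation example can be sketched as a filter over ego-alter records. The codes wREL=1 and wNWSTAT=2 are the ones quoted above; the field names `ego_pno` and `alter_pno`, the helper `marriage_separations` and the toy rows are assumptions made for illustration.

```python
MARRIED = 1               # wREL code quoted above
DIFFERENT_HOUSEHOLD = 2   # wNWSTAT code quoted above

def marriage_separations(egoalt_rows):
    """Return the set of married pairs who are apart at the next wave.

    Each pair appears twice in the file (ego and alter exchanged), so
    pairs are stored as unordered sets to avoid double counting.
    """
    separated = set()
    for row in egoalt_rows:
        if row["wREL"] == MARRIED and row["wNWSTAT"] == DIFFERENT_HOUSEHOLD:
            pair = frozenset((row["ego_pno"], row["alter_pno"]))
            separated.add((row["wHID"], pair))
    return separated

# ASSUMPTION: "ego_pno" / "alter_pno" are hypothetical field names,
# and wREL=4 / wNWSTAT=1 are arbitrary non-matching codes.
rows = [
    {"wHID": 101, "ego_pno": 1, "alter_pno": 2, "wREL": 1, "wNWSTAT": 2},
    {"wHID": 101, "ego_pno": 2, "alter_pno": 1, "wREL": 1, "wNWSTAT": 2},
    {"wHID": 102, "ego_pno": 1, "alter_pno": 2, "wREL": 1, "wNWSTAT": 1},
    {"wHID": 103, "ego_pno": 1, "alter_pno": 2, "wREL": 4, "wNWSTAT": 2},
]

separations = marriage_separations(rows)
```

Counting the resulting pairs, or dividing by the number of married pairs at the starting wave, gives one simple aggregate measure of household change.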