390 likes | 498 Views
Data Management. With longitudinal data. Example 1. Information about change over time can be organized in rows or columns (one record per person or several records per person)
E N D
Data Management With longitudinal data
Example 1 • Information about change over time can be organized in rows or columns (one record per person or several records per person) • A data structure that provides timed information is rows is quite common in surveys that do not primarily focus on longitudinal data collection • It’s the worst possible data structure for longitudinal analysis whenever more than one transition per person is available
Data • Data from “Social Stratification in Eastern Europe after 1989,” conducted in 1993 • Nationally representative samples of individuals from Bulgaria, Hungary, Poland, and Russia. • Second survey, “Poverty and Social Structure in Transitional Countries,” • Nationally representative samples of individuals and households from Bulgaria, Hungary, Poland, and Russia (plus over-samples of poor people and Roma)
Work History Information • Respondents were asked about activities, including work status, occupation, and start year and month of each episode • For up to 18 episodes • Example data set contains 18 variables each for activity status, year, month, occupation, and class (EGP)
Manipulating data structures • The goal in this case was to create a data structure that more easily allows to retrieve and analyze status transitions • Class mobility between specific years (1948-1956, 1956-1964,1964-1976, 1976-1988,1988-1998, 1993-1998) that marked societal transformations in Eastern Europe
Spell data structure • Goal: have episodes in rows, hence, more than one row per person (as many rows per person as activities reported, up to 18) • There are many ways to do this. • An easy and labor-saving way is to use STATA’s reshape command • May need to rename variables
reshape long act year month occ egp, i(respid1) j(spell) (note: j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18) {Stata could figure out how many spells but not where to get the data from} (note: act1 not found) (note: year1 not found) (note: month1 not found) (note: occ1 not found) (note: egp1 not found) (note: act2 not found) (note: year2 not found) (note: month2 not found) (note: occ2 not found) (note: egp2 not found) (note: act3 not found) (note: year3 not found)………..
Application to Ivan’s data • Need to specify j variable so STATA knows how to create the rows • This did not work at first because it turned out that the ID variable was not a unique identifier in the data set (it was unique only within country) • Create a unique ID variable to use as i
Preparing for reshape • gen obs=_n • Simply gives each row a number in the order in which the rows occur, from 1 to N
Further steps • To be on the safe side, save data under different name (e.g. long.dta) • Delete empty spells • Create end date • Create variables that indicate activity status/GPE in the years of interest • Further manipulate the data structure
Example 2 • Data from the National Longitudinal Study of Adolescent Health • Dependent variable: timing of first pregnancy • Independent variable: has taken a pledge to remain a virgin until marriage
Two ways of looking at the effect of pledging • Do we see delay effect of pledging on timing of pregnancy? • Do we see a delay effect of pledging for sexually active pledgers?
Missing observations are those that did not participate in wave 3 or did not report timing of pregnancy Note specification of weights Analysis time (age, in years) and failures (=pregnancies)
Three groups compared Pledgers much later