Data Management

Data Management With longitudinal data

Example 1 • Information about change over time can be organized in rows or columns (several records per person or one record per person) • A data structure that stores all timed information in a single row per person, with one set of variables per episode, is quite common in surveys that do not primarily focus on longitudinal data collection • It is the worst possible data structure for longitudinal analysis whenever more than one transition per person is available

Data • Data from “Social Stratification in Eastern Europe after 1989,” conducted in 1993 • Nationally representative samples of individuals from Bulgaria, Hungary, Poland, and Russia • Second survey: “Poverty and Social Structure in Transitional Countries” • Nationally representative samples of individuals and households from Bulgaria, Hungary, Poland, and Russia (plus over-samples of poor people and Roma)

Work History Information • Respondents were asked about their activities, including work status, occupation, and the start year and month of each episode, for up to 18 episodes • The example data set contains 18 variables each for activity status, year, month, occupation, and class (EGP)

Manipulating data structures • The goal in this case was to create a data structure that makes it easier to retrieve and analyze status transitions • Class mobility between specific years (1948-1956, 1956-1964, 1964-1976, 1976-1988, 1988-1998, 1993-1998) that marked societal transformations in Eastern Europe

Spell data structure • Goal: have episodes in rows, hence more than one row per person (as many rows per person as activities reported, up to 18) • There are many ways to do this • An easy and labor-saving way is to use Stata’s reshape command • Variables may need to be renamed first
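The wide-to-long step described above can be sketched on a toy data set. The variable names (id, act1, year1, ...) and all values below are hypothetical and serve only to illustrate the mechanics of reshape; they are not taken from the survey data.

```stata
* Toy wide data (hypothetical values): one row per person,
* two episodes per person stored side by side in columns
clear
input id act1 year1 act2 year2
1 1 1985 2 1990
2 1 1987 . .
end

* Reshape to long: one row per person-episode.
* The stubs "act" and "year" are combined with the values of j
* (here 1 and 2) to locate the wide variables act1, act2, year1, year2.
reshape long act year, i(id) j(spell)

* Inspect the result, one block of rows per person
list, sepby(id)
```

After the reshape, each person contributes one row per episode slot, including slots that were never filled (person 2, spell 2), which is why empty spells still have to be deleted afterwards.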

. reshape long act year month occ egp, i(respid1) j(spell)
(note: j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18)
{Stata could figure out how many spells but not where to get the data from}
(note: act1 not found)
(note: year1 not found)
(note: month1 not found)
(note: occ1 not found)
(note: egp1 not found)
(note: act2 not found)
(note: year2 not found)
(note: month2 not found)
(note: occ2 not found)
(note: egp2 not found)
(note: act3 not found)
(note: year3 not found)………..
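The "not found" notes arise because reshape looks for variables whose names are exactly the stub followed by the value of j (act1, year1, and so on). If the wide variables carry different names, they have to be renamed first. The sketch below assumes hypothetical original names (actst1, yr1, ...); the actual names in the survey file are not shown in the slides.

```stata
* Toy wide data with names that do NOT match the reshape stubs
* (actst*, yr* are hypothetical names; values are made up)
clear
input respid1 actst1 actst2 yr1 yr2
1 1 2 1985 1990
2 1 . 1987 .
end

* List the variable names to see what is actually in the data
ds

* Rename groups of variables so that each name is stub + episode number
rename actst* act*
rename yr* year*

* Now reshape can find act1, act2, year1, year2
reshape long act year, i(respid1) j(spell)
```

Checking the variable names with ds (or describe) before calling reshape is usually the quickest way to see why a stub was not matched.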

Application to Ivan’s data • Need to specify the j variable so Stata knows how to create the rows • This did not work at first because the ID variable turned out not to be a unique identifier in the data set (it was unique only within country) • Create a unique ID variable to use as i

Preparing for reshape • gen obs=_n • Simply gives each row a number in the order in which the rows occur, from 1 to N
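A minimal sketch of this check and fix on toy data follows. The variable country and all values are assumptions for illustration; the egen alternative is one common option and is not necessarily what was done with the original data.

```stata
* Toy data: respid1 repeats across countries (hypothetical values)
clear
input country respid1
1 1
1 2
2 1
2 2
end

* isid exits with an error if the variable does not uniquely identify
* the observations; capture noisily shows the message and continues
capture noisily isid respid1

* Fix used in the slides: number the rows from 1 to N
gen obs = _n
isid obs

* Alternative (assumption): an ID built from country and the
* within-country ID, which stays stable if the rows are re-sorted
egen long uid = group(country respid1)
isid uid
```

Either obs or uid can then be passed to reshape as the i() variable.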

Further steps • To be on the safe side, save the data under a different name (e.g. long.dta) • Delete empty spells • Create an end date • Create variables that indicate activity status/EGP in the years of interest • Further manipulate the data structure
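These steps might look roughly as follows on a toy long-format file. The values are made up, the end date of a spell is assumed to be the start date of the person's next spell, and 1988 is used only as an example of a year of interest; none of this is confirmed as the exact procedure applied to the survey data.

```stata
* Toy long data after reshape (hypothetical values)
clear
input id spell act year month egp
1 1 1 1980 3 5
1 2 1 1990 6 3
1 3 . . . .
2 1 1 1985 1 7
2 2 . . . .
end

* Delete empty spells (episode slots that were never filled)
drop if missing(act) & missing(year)

* Start date as a Stata monthly date
gen start = ym(year, month)

* End date: assumed to be the start of the person's next spell;
* it stays missing for the last (still open) spell
bysort id (spell): gen end = start[_n+1]
format start end %tm

* Flag the spell that covers January 1988
gen byte in1988 = start <= ym(1988, 1) & (end > ym(1988, 1) | missing(end))

* Copy the class (EGP) held in that spell to all rows of the person
egen egp1988 = max(cond(in1988, egp, .)), by(id)

* Save under a different name so the original file is untouched
save long.dta, replace
```

Repeating the last two commands for each year of interest gives one class variable per year, from which mobility between any two years can be tabulated.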

Example 2 • Data from the National Longitudinal Study of Adolescent Health • Dependent variable: timing of first pregnancy • Independent variable: has taken a pledge to remain a virgin until marriage

Two ways of looking at the effect of pledging • Do we see a delay effect of pledging on the timing of pregnancy? • Do we see a delay effect of pledging for sexually active pledgers?

• Missing observations are respondents who did not participate in wave 3 or did not report the timing of pregnancy • Note the specification of weights • Analysis time (age, in years) and failures (= pregnancies)
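The survival-time setup these notes refer to is not shown in the transcript. A minimal sketch of how such a declaration could look is given below; the variable names age, preg and wt, the weight type, and all values are assumptions, and only numptype appears in the slides.

```stata
* Toy data (hypothetical values): age at first pregnancy or at
* censoring, pregnancy indicator, sampling weight, pledge group
clear
input age preg wt numptype
16 1 1.2 0
19 0 0.8 0
22 1 1.0 1
24 0 1.5 1
25 0 0.9 2
21 1 1.1 2
end

* Declare survival-time data: analysis time is age in years,
* a failure is a pregnancy, and sampling weights are attached here
stset age [pweight = wt], failure(preg)

* Kaplan-Meier survivor functions by group, restricted to ages 10-25
sts graph, by(numptype) tmin(10) tmax(25) noorigin
```

Because the weights are declared once in stset, subsequent st commands pick them up automatically.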

Three groups compared • Pledgers become pregnant much later

. sts graph, by(numptype) tmax(25) tmin(10) noorigin
