Life course analysis: two (complementary) cultures? Reflections on how to analyze the transition to adulthood. Francesco C. Billari Institute of Quantitative Methods Bocconi University and IGIER. Structure. Life course analysis Two cultures? The debate in statistics
Life course analysis: two (complementary) cultures? Reflections on how to analyze the transition to adulthood Francesco C. Billari Institute of Quantitative Methods Bocconi University and IGIER
Structure • Life course analysis • Two cultures? The debate in statistics • The event-based, or “causality” culture • The algorithmic, or “holistic” culture • Conclusions and perspectives
Life course analysis • The life course approach as an interdisciplinary program of study has been under development since the mid-1970s • The idea of studying the unfolding of individual lives has unavoidably brought life course scholars to emphasize complexity rather than simplicity
The four chief elements shaping life courses (Giele & Elder, 1998)
Life course analysis • The study of the transition to adulthood has been a primary field for life course scholars, and it has greatly benefited from advances in the life course approach and in life course analysis • For the Encyclopedia of Population I defined life course analysis as the statistical analysis of data on the timing of events (when do events happen?), their sequencing (in which order do events happen?), and their quantum (how many events happen?)
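The three quantities in this definition can be illustrated with a minimal sketch. The event names, ages and helper functions below are hypothetical and only serve to make timing, sequencing and quantum concrete for a single individual.

```python
# Hypothetical event record for one person: (event name, age in years).
events = [
    ("leave_home", 22.5),
    ("first_union", 24.0),
    ("first_birth", 27.0),
    ("second_birth", 29.5),
]


def timing(events, name):
    """Timing: age at the first occurrence of the named event (None if it never occurs)."""
    ages = [age for ev, age in events if ev == name]
    return min(ages) if ages else None


def sequencing(events):
    """Sequencing: distinct event names in order of first occurrence."""
    order = []
    for ev, _ in sorted(events, key=lambda e: e[1]):
        if ev not in order:
            order.append(ev)
    return order


def quantum(events, names):
    """Quantum: how many recorded events belong to the given set of names."""
    return sum(1 for ev, _ in events if ev in names)


age_at_union = timing(events, "first_union")
order_of_events = sequencing(events)
number_of_births = quantum(events, {"first_birth", "second_birth"})
```

Life course analysis then asks how these quantities vary across individuals and contexts.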
Life course analysis • Ideally, life course analysis includes the possibility to analyze the timing, sequencing, and quantum of events as depending on the elements mentioned by Giele and Elder: individual-level human development, social relations, location in time and place. • I shall argue that there are two main approaches to life course analysis, serving complementary aims
Life course analysis • The “event-based” approach focuses on events (mostly their timing and quantum) as explananda, and looks for causality • event-history analysis • program evaluation • The “holistic” approach focuses on (parts of) the life course as a whole • sequence analysis
Two “cultures”? The debate in statistics • The two approaches are loosely related to the idea of two “cultures” in statistics, which has “caused” heated debate among statisticians • Breiman (2001) in Statistical Science, with several discussants
Two “cultures”? The debate in statistics • The data modeling culture, which is mainstream in statistics, assumes that “data are generated by a given stochastic data model” (discussion by David Cox) • The algorithmic modeling culture treats the data-generating mechanism as unknown (paper by L. Breiman)
Two “cultures”? The debate in statistics • Breiman: • the focus on data models has “led to irrelevant theory and questionable scientific conclusions”, “kept statisticians from using more suitable algorithmic models”, and “prevented statisticians from working on exciting new problems”
Two “cultures”? The debate in statistics • Breiman: the algorithmic culture points to: • Rashomon or “the multiplicity of good models” • conflict between simplicity and predictive accuracy (Occam’s razor) • dimensionality problem
Two “cultures”? The debate in statistics • Cox: • the starting point is not “data” but “an issue, a question or a scientific hypothesis”, and real scientific applications aim at unraveling causal links using statistics • data collection designs also count • model-based statistical techniques provide the best opportunity to illuminate causality
The event-based or “causality” culture • In life course analysis, a series of techniques has been developed to illuminate the determinants (if possible, the causes) of the timing and quantum of events • event-history analysis • program evaluation
Event-history analysis • Event history analysis generalizes the life-table and standardization techniques that were used extensively in twentieth-century demography • it usually aims at modeling individual-level data collected from sample surveys or population registers • it focuses on the time-to-event as the dependent variable
Event-history analysis • The regression models of event history analysis have contributed to the explanation of life course dynamics by linking time-to-event with explanatory variables (covariates) • Time-varying covariates (events) provide a key link to causes (what triggers what)
Event-history analysis • The so-called “causal approach” (Blossfeld & Rohwer, 1995, 2002) assumes that all factors that are relevant to the simultaneous analysis of several trajectories are observed and included in the past history of the trajectories • E.g. if one looks at whether pregnancy causes marriage among cohabitants, pregnancy is a time-varying covariate in the hazard equation of time-to-marriage for cohabitants
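As a minimal sketch, the pregnancy example can be written as a hazard equation with a time-varying covariate. The proportional-hazards form and the symbols h^M, h^M_0, D(t) and β are assumptions made here for illustration; the slide does not commit to a particular specification.

```latex
h^{M}(t \mid D(t)) = h^{M}_{0}(t)\,\exp\bigl(\beta\, D(t)\bigr),
\qquad
D(t) =
\begin{cases}
1 & \text{if a pregnancy has started before } t,\\
0 & \text{otherwise.}
\end{cases}
```

Here t is the duration of cohabitation and h^M is the hazard of marriage. Under the assumption that all relevant factors are observed, β > 0 would be read as pregnancy raising the marriage rate.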
Event-history analysis • Problem: selectivity (or unobserved heterogeneity)… cohabitants who are more “family-oriented” bring forward both pregnancy and marriage, so part of the observed dependence is spurious • Proposal (Lillard, 1993): use of simultaneous hazard equations for interdependent processes with potentially common determinants
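A stylized way to write such a system, in the spirit of the proposal but not necessarily in its exact form, uses one hazard equation per process with correlated person-specific terms. The notation (h^M for marriage, h^P for pregnancy, D_i(t) for "pregnancy has occurred before t", ε_i, δ_i, ρ) is assumed here for illustration, as is the normality of the heterogeneity terms.

```latex
\begin{aligned}
\ln h^{M}_{i}(t) &= \ln h^{M}_{0}(t) + \beta\, D_{i}(t) + \varepsilon_{i},\\
\ln h^{P}_{i}(t) &= \ln h^{P}_{0}(t) + \delta_{i},\\
(\varepsilon_{i}, \delta_{i}) &\sim N(0, \Sigma), \qquad \operatorname{corr}(\varepsilon_{i}, \delta_{i}) = \rho .
\end{aligned}
```

A positive ρ represents unobserved “family orientation” raising both hazards. If the two equations are estimated separately, that common component is attributed to D_i(t), and β overstates the effect of pregnancy on marriage.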
Event-history analysis • Other developments: multilevel event-history models, which make it possible to capture other parts of Giele & Elder’s framework by controlling for unobserved aggregate-level factors • Example: maps of age at first sexual intercourse in Italy (Billari & Borgoni, 2002)… not causal
Program evaluation • The main task of program evaluation is to estimate the causal impact of a certain program (usually, a labor market program), the treatment, on a specific outcome. This estimate is used as a support for policy decisions
Program evaluation • Two key issues: • to illuminate policies, one wants to isolate the causal impact of a certain program from other factors that link the program with the outcome • the impact has to be estimated with the least bias, because for cost-benefit evaluations the size of the impact, not only its direction or statistical significance, matters
Program evaluation • For the transition to adulthood we may be interested: • in evaluating the causal impact of events (e.g. their timing or sequencing) in the transition to adulthood on the subsequent pathways to adulthood • Does teenage childbearing influence subsequent educational or labor outcomes during early adulthood? (e.g. Hotz et al., 1997)
Program evaluation • For the transition to adulthood we may be interested: • in evaluating the causal impact of events involving relevant others, in particular youths’ parents, on the transition to adulthood • Does parental divorce have a causal impact on educational outcomes or family choices in the transition to adulthood? (e.g. Painter and Levine, 2001)
Program evaluation • For the transition to adulthood we may be interested: • in studying the causal impact of pathways to adulthood on relevant others • Does the leaving home of a child, and in particular the transition to an “empty nest”, have a causal impact on parental outcomes, e.g. happiness? (Mazzuco, 2003)
Program evaluation • The basic “evaluation problem” or the “fundamental problem of causal inference” (Holland, 1986) is that to truly know the effect of a certain event (e.g. participation in a program), we must compare the observed outcome of an individual who has experienced the event of interest with the outcome that would have resulted had that person not experienced the event (the counterfactual)
Program evaluation • We want to estimate treatment effects: • average effect of treatment: what impact would the event have on a randomly drawn individual (if the event became compulsory); • average effect of treatment on the treated: what impact has the event had on individuals who have experienced the event (impact of the choice, more relevant to us, see Hotz et al. 1997)
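In the standard potential-outcomes notation, the two effects and the evaluation problem can be written compactly. The symbols T_i (1 if individual i experienced the event), Y_i(1) and Y_i(0) (outcomes with and without the event) are introduced here for illustration and do not appear in the slides.

```latex
\begin{aligned}
Y_i &= T_i\,Y_i(1) + (1 - T_i)\,Y_i(0),\\
\text{ATE} &= E\bigl[\,Y(1) - Y(0)\,\bigr],\\
\text{ATT} &= E\bigl[\,Y(1) - Y(0) \mid T = 1\,\bigr].
\end{aligned}
```

The first line states the evaluation problem: for each individual only one of Y_i(1) and Y_i(0) is observed, so the term E[Y(0) | T = 1] in the ATT is a counterfactual that has to be estimated from untreated individuals.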
Program evaluation • Three main approaches: • data on highly related individuals: twin studies (rare, very good on physiology); • instrumental variables (IV) approach: estimation based on an instrument that is correlated with the treatment but affects the outcome only through the treatment; such instruments are difficult to find in life course research. The most promising approach estimates bounds (Manski bounds) • propensity score matching: matching of treated and untreated individuals according to observed covariates summarized in a “propensity score”. Removes bias due to observed factors
Program evaluation • Propensity score matching. Two steps: • “parametric step”: the “propensity score” is estimated from a set of (possibly abundant) covariates that are supposed to affect the probability that the event of interest (treatment) is experienced and may also influence the outcome • e.g. one can use a probit or logit model with the probability of experiencing a parental divorce as a function of a set of youth and family characteristics
Program evaluation • Propensity score matching. Two steps: • “nonparametric step”: individuals who experienced the event of interest (treated) are matched to individuals who have not experienced the event of interest (untreated), according to the propensity scores estimated in the first step. Several matching approaches exist
Program evaluation • Example of PSM with a difference-in-differences estimator (controlling for time-constant unobserved factors): Mazzuco (2003), comparison of changes in parental well-being caused by the last child leaving home in France (“early” home leaving) and Italy (“late” home leaving). ECHP data
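The nonparametric step can be sketched as follows. This is only one of the possible matching schemes (one-to-one nearest neighbor with replacement), the propensity scores are taken as already estimated, and the function name and the toy data are hypothetical.

```python
def att_nearest_neighbor(units):
    """Estimate the average effect of treatment on the treated.

    units: list of (treated, propensity_score, outcome) tuples.
    Each treated unit is matched to the untreated unit with the closest
    propensity score (matching with replacement).
    """
    treated = [(p, y) for t, p, y in units if t]
    controls = [(p, y) for t, p, y in units if not t]
    if not treated or not controls:
        raise ValueError("need at least one treated and one untreated unit")

    differences = []
    for p_t, y_t in treated:
        # Closest untreated unit by absolute distance in propensity score.
        _, y_c = min(controls, key=lambda c: abs(c[0] - p_t))
        differences.append(y_t - y_c)
    return sum(differences) / len(differences)


# Toy data (assumed): two treated and three untreated individuals.
toy_units = [
    (True, 0.80, 5.0),
    (True, 0.60, 4.0),
    (False, 0.75, 3.0),
    (False, 0.55, 3.5),
    (False, 0.20, 1.0),
]
att_estimate = att_nearest_neighbor(toy_units)
```

In the toy data the untreated individual with score 0.20 is never used as a match, which illustrates why matching only removes bias for treated individuals who have comparable untreated counterparts.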
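A generic matched difference-in-differences estimator, written under assumed notation and not necessarily in the exact form used in the study cited above, is:

```latex
\widehat{\text{ATT}}_{\text{DID}}
= \frac{1}{N_{1}} \sum_{i:\,T_i = 1}
\Bigl[\bigl(Y_{i,\text{after}} - Y_{i,\text{before}}\bigr)
- \bigl(Y_{j(i),\text{after}} - Y_{j(i),\text{before}}\bigr)\Bigr]
```

Here T_i = 1 marks the N_1 treated individuals, j(i) is the untreated individual matched to i on the propensity score, and "before" and "after" refer to the event. Any additive, time-constant individual factor drops out of each within-person difference, which is why this estimator controls for time-constant unobserved factors.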
The holistic or “algorithmic” culture • By focusing mainly on specific events, with what Elder has called the “short-view in analytical scope”, researchers may not grasp a unitary, holistic perspective on life courses
The holistic or “algorithmic” culture • Two reasons to complement event-based analysis with holistic analysis (Billari, 2001 … Canadian Studies in Population ): • strong: life courses are seen as subject to accurate inter-temporal planning, for instance as an outcome of utility maximization
The holistic or “algorithmic” culture • Two reasons to complement event-based analysis with holistic analysis • pragmatic: the life course as a conceptual unit is thought of as being a contingent result of successive events. A holistic view is still useful as an “algorithmic” way to describe and to summarize the timing, sequencing, and quantum of life course events
Sequence analysis • In the 1990s Abbott introduced sequence analysis in the social sciences. Origins in information science and computational biology (DNA) • Life courses are represented in terms of sequences of states (time is intrinsically discrete)
Sequence analysis • As a simple example, we shall consider three states: single (S), cohabiting (C), married (M), on a monthly time scale from 20 years to 24 years and 12 months. The sequence representation of an individual life course may thus be: SSSSSSCCCCCCCSSSSSSSSSSSSSSSSSSSSSSCCCSSSSSSSSSSSMMMMMMMMM
Sequence analysis • With several sequences (e.g. from a sample survey), it is already difficult to describe them • E.g. using colors (as in genomics…). Sequences of school and work for young men in Monterrey, Mexico (Solis & Billari, 2003)
Sequence analysis • But description is not at all easy (… working on it) • So let us think “algorithmically”. Clustering and classifying is typical algorithmic thinking • OMA (Optimal Matching Analysis) is a method for the alignment of biosequences, which gives a dissimilarity (distance) measure for each pair of sequences
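The sequence representation can be built from spell data, as in the following sketch. The function names and the short example spells are hypothetical; each character stands for one month in a state.

```python
from itertools import groupby


def spells_to_sequence(spells):
    """Expand (state, months) spells, in chronological order, into a monthly state string."""
    return "".join(state * months for state, months in spells)


def sequence_to_spells(sequence):
    """Collapse a monthly state string back into (state, months) spells."""
    return [(state, len(list(run))) for state, run in groupby(sequence)]


# Hypothetical life course: 6 months single, 7 cohabiting, 2 single, 3 married.
example_spells = [("S", 6), ("C", 7), ("S", 2), ("M", 3)]
example_sequence = spells_to_sequence(example_spells)
```

The two functions are inverses of each other as long as consecutive spells are in different states, so the string loses no information relative to the spell data at monthly resolution.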
Sequence analysis • OMA operates by transforming a sequence into another one by using three elementary operations: • insertion of a state • deletion of a state • substitution of a state • Each operation has a “cost”. The distance between two sequences is the total cost of transforming a life into another one...
Sequence analysis • E.g. if insertion and deletion have cost 1 and substitution 2, the distance between SSCCMMM and SCCMM is 2 (SSCCMMM-->SCCMMM --> SCCMM)
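The distance in this example can be computed with a standard dynamic-programming recursion. This is a minimal sketch with constant costs, not the implementation of any particular package; the function name and default costs are chosen here to match the example.

```python
def om_distance(a, b, indel=1.0, substitution=2.0):
    """Minimum total cost of transforming sequence a into sequence b.

    Allowed operations: insertion and deletion of a state (cost `indel` each)
    and substitution of one state by another (cost `substitution`).
    """
    # prev[j] holds the cost of transforming the first i-1 states of a
    # into the first j states of b; row 0 is built from insertions only.
    prev = [j * indel for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * indel] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            cur[j] = min(
                prev[j] + indel,                               # delete a[i-1]
                cur[j - 1] + indel,                            # insert b[j-1]
                prev[j - 1] + (0.0 if same else substitution)  # keep or substitute
            )
        prev = cur
    return prev[len(b)]


d_example = om_distance("SSCCMMM", "SCCMM")
```

With these costs a substitution is never cheaper than a deletion followed by an insertion, so the distance depends only on the longest common subsequence of the two life courses. Other cost choices change the ranking of pairs.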
Sequence analysis • The resulting matrix of distances can be described directly (e.g. average distance, Billari 2001a) or used as input for further multivariate analyses, mostly clustering • Advantages: • works with almost every kind of sequence of states
Sequence analysis • Disadvantages: • subjective cost specification (especially in demography) • difficult to identify the determinants of group formation • “subjectivity” of clustering techniques
Sequence analysis • Other approaches: • for binary sequences (characterized by binary states): use of monothetic divisive algorithms (Billari & Piccarreta, 2001) • for sequences coded differently, in classification: machine learning (Billari, Fuernkranz, Prskawetz, 2001) • multiple correspondence analysis of sequence data
Conclusions and perspectives • The weight of the different cultures and the impact they may have on life course research is also connected to the availability of easy-to-use software packages • In general, this requires _real_ flexibility...
Software • For event-history analysis • TDA (Rohwer & Poetter, freeware) but no simultaneous hazard equations • STATA (commercial) but no simultaneous hazard equations; other general packages are less specialized (SAS, SPSS) • aML (commercial) specialized for simultaneous equations (normality assumption), available since 2000
Software • For program evaluation • Propensity Score Matching. A set of STATA programs written by Becker and Ichino is freely available
Software • For sequence analysis • TDA performs some description and computes OMA distances • distances can then be transferred to general packages for cluster analysis (we did it with SAS or STATA)
Data • In short: we need more and more longitudinal data, especially data measuring factors related to selection (ability, personality, socialization and orientations) at the beginning