Building Typologies of Individual Trajectories: An Illustration of Different Statistical Methodologies on the Same Data. Laurent Lesnard (OSC-CNRS and CREST-INSEE), Patrick Rousset (Cereq), Danièle Trancart (Gris, University of Rouen and CEE)
Plan • 1- Introduction • 2- Data and coding • 3- Some results about level of education and skills • 4- Building typology using correspondence analysis and cluster analysis • 5- Building typology using Qualitative Harmonic Analysis and cluster analysis and Optimal Matching Analysis (OMA) • 6- Building typology using Self-Organizing-Maps with a distance that includes dynamics of time • Conclusion
1-Introduction (1) • In France, the transition from school to work could be quite difficult for young people who left the educational system during the nineties, a period characterized by increased flexibility and precariousness. Unskilled young people are particularly exposed to precarious trajectories, aggravated by recurring unemployment. • Besides, the number of unskilled jobs has increased over the last decades despite the higher overall level of training, which leads to over-education: young people more often obtain jobs that normally require lower diplomas than the ones they hold.
1-Introduction (2) • Now that more than half of a generation passes at least the “Baccalauréat” (as against 26% in 1980), is level V (CASMIN level 2a, i.e. vocational education such as the “CAP” or “BEP”) still able to open the doors of skilled manual or non manual jobs for these students? • If we consider the first stage of labour-market integration as a time for socialization and for acquiring additional experience, is a first experience in an unskilled job beneficial in itself? • What is its impact on the rest of the occupational career?
1-Introduction (3) • We focus on the school-to-work transition of young women, who are more often in unskilled jobs than their male counterparts. Our analysis is based on a longitudinal school leavers’ survey, Generation 98, and we present several methods applied to the same data to describe the school-to-work transition longitudinally. • It illustrates Yvette Grelet and Nicolas Robette’s methodological paper, which offers a comprehensive and detailed discussion of the issues that must be examined in order to build typologies of individual trajectories from longitudinal (or sequential) data: the different ways of coding the data, the choice of the distance between two individuals, and the choice of the classification method. • The variants presented are: • Correspondence analysis and cluster analysis based on the Ward criterion • Qualitative Harmonic Analysis • Self-Organizing Maps (SOM) with a Kohonen algorithm • Optimal Matching Analysis (OMA)
2-Data and coding (1) • In 1998, 750 000 students or apprentices left initial education at all levels and from all training fields. • Three years later, Céreq interviewed a sample of 55 000 young people belonging to this cohort. • The ‘Generation 98’ survey aimed at analysing the different components of the transition pathways. • Respondents were asked to describe their successive jobs and employers. They also gave information on their schooling career and family background. Educational pathways are described by the exit class, diplomas and certificates, and fields of study.
2-Data and coding (2) The Level of education (qualification) French Socio-occupational Categories (PCS) : • Executive and intermediate • Skilled non manual workers (according to O. Chardon grid) • Unskilled non manual workers (according to O. Chardon grid) • Skilled manual workers • Unskilled manual workers
2-Data and coding (3) • The sample used describes the monthly work history between 1998 and 2001 in France. In total, 8 monthly situations are considered: • Non-employment spells are coded as unemployment, inactivity or studies. • Employment spells are coded in the 5 categories described above.
3-Some results about level of education and skills (1) • Two different dimensions: the qualification of the worker (level of education) and skills of the job itself.
3-Some results about level of education and skills (2) : Level of education • Figure 1: Education levels in France in 1998 • The 1998 generation is better educated than the previous one: 62% of young people reached level IV (Bac) in G98, as against 56% in G92. • 9% of young people still leave the school system without any qualification. This figure, while tending to decrease, remains quite high considering the orientation law of 1989, which aimed at giving everyone the means to reach at least level V (CAP, BEP). • Young people with no qualification are more often men, from a working-class background, and more likely to come from a large family or to have immigrant parents. Moreover, their parents are more often unskilled or unemployed than other young people's parents.
3-Some results about level of education and skills (3 ) : skills • In the classification of social and occupational groups, skilled and unskilled manual workers are separated. As for non manual workers, the boundary between skilled and unskilled is trickier. It is largely dependent on the branch and not often taken into account in the surveys. • The generally accepted definition of non skilled jobs (O. Chardon, 2001) enables us to assess the number of young people having one in the survey G98.
3-Some results about level of education and skills (4) : skills • In the end, unskilled employment concerns 34% of young people in January 1999 and 28% in April 2001. Despite this slight decrease, unskilled employment remains important: 42% of young people held at least one unskilled job during the observation period. • A higher level of education and holding a degree significantly protect against unskilled employment. The frontier between unskilled and skilled employment is shifting from level V to level IV (Figure 2). • The rate of unskilled jobs also depends on gender: young women, especially those with a low level of schooling, are more often in unskilled jobs than their male counterparts (Figure 3).
3-Some results about level of education and skills (3 ) : skills • Figure 3
4-Building typology using correspondence analysis and cluster analysis (1) • There are 28 months of observation (from January 1999 to April 2001) and 8 different states (28*8 =224 dummy variables). The analysed table is composed of 224 variables and 8083 individuals (young women in at least one unskilled position during the period). The 8 states are: unemployment, inactivity, studies, executive or intermediate, non skilled non manual worker, skilled non manual worker, non skilled manual worker, skilled manual worker. • First a correspondence analysis is performed, and then a cluster analysis with Ward criterion and consolidation with K-means on 20 factors (78% of the variance) and finally a typology in 8 groups is built.
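The coding step described above (one dummy variable per month and state) can be sketched as follows. This is only an illustration of the complete disjunctive table fed to the correspondence analysis, not the authors' code: the state labels, the function name `disjunctive_row` and the example trajectory are hypothetical.

```python
# Hypothetical short labels for the 8 states listed on the slide.
STATES = [
    "unemployment", "inactivity", "studies",
    "executive_or_intermediate",
    "unskilled_non_manual", "skilled_non_manual",
    "unskilled_manual", "skilled_manual",
]


def disjunctive_row(trajectory, states=STATES):
    """Turn one monthly trajectory into a flat list of 0/1 dummies.

    Columns are ordered month by month: the 8 state dummies of month 1,
    then the 8 state dummies of month 2, and so on.
    """
    index = {s: k for k, s in enumerate(states)}
    row = [0] * (len(trajectory) * len(states))
    for month, state in enumerate(trajectory):
        row[month * len(states) + index[state]] = 1
    return row


# Hypothetical trajectory: 10 months of unemployment, then 18 months
# in an unskilled non manual job (28 months in total).
example = ["unemployment"] * 10 + ["unskilled_non_manual"] * 18
row = disjunctive_row(example)
print(len(row), sum(row))  # 224 28
```

Each individual contributes one such row of 28 x 8 = 224 columns, with exactly one 1 per month; stacking the rows gives the table on which the correspondence analysis is run.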
4-Building typology using correspondence analysis and cluster analysis (2) • The two main clusters are made of individuals who are stable as, respectively, unskilled non manual workers (cluster 1, 34%) and unskilled manual workers (cluster 2, 17%). Two clusters are marked by long and/or frequent spells of unemployment (cluster 3, 13%) and inactivity (cluster 4, 6%). • Cluster 5 (10%) shows a transition from unemployment or unskilled non manual work to skilled non manual work. Cluster 6 (6%) describes a return to studies followed by a return to work. Cluster 7 (3%) shows the transition to skilled manual work and Cluster 8 (8%) the transition from unemployment or unskilled non manual work to higher-skilled work as executives and technicians. • Across these clusters, only 1/3 of individuals reach a skilled position at the end of the period.
4-Building typology using correspondence analysis and cluster analysis (3)
5 - Qualitative Harmonic Analysis and Optimal Matching (1) • QHA • Probabilities of transition vary with time • Trajectories (month 13 to month 40) are split into 5 spells of unequal length according to the quantiles of events: 13-18, 19-22, 23-28, 29-33, 34-40 • The proportions of time spent in each of the 8 occupational states during the 5 spells yield 40 variables (5 x 8) • Principal Component Analysis is applied to these 40 variables • The first 12 factors (78% of inertia) are used in a Hierarchical Cluster Analysis • Cluster algorithm: beta-flexible • The 10-cluster solution is optimal • OMA • Dissimilarity measure: Optimal Matching (indel = 1, substitution = 2; TDA) • This parameterization is equivalent to finding the longest common subsequence • Cluster algorithm: beta-flexible (β = -0.3) • Number of clusters: “elbow” criterion, 10-cluster solution
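The remark that indel = 1 and substitution = 2 amounts to finding the longest common subsequence (LCS) can be checked on a small example. The sketch below is a plain dynamic-programming implementation written for illustration (it is not the TDA code used for the slides); the one-letter state codes and the two sequences are hypothetical.

```python
def om_distance(a, b, indel=1.0, sub=2.0):
    """Optimal matching distance between two sequences (dynamic programming)."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel          # delete the first i elements of a
    for j in range(1, m + 1):
        d[0][j] = j * indel          # insert the first j elements of b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + indel,      # deletion
                          d[i][j - 1] + indel,      # insertion
                          d[i - 1][j - 1] + cost)   # match or substitution
    return d[n][m]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    n, m = len(a), len(b)
    l = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                l[i][j] = l[i - 1][j - 1] + 1
            else:
                l[i][j] = max(l[i - 1][j], l[i][j - 1])
    return l[n][m]


# Hypothetical codes: U = unemployment, N = unskilled non manual,
# S = skilled non manual.
a, b = "UUUNNN", "UUNNNS"
print(om_distance(a, b), len(a) + len(b) - 2 * lcs_length(a, b))  # 2.0 2
```

Because a substitution costs exactly as much as a deletion followed by an insertion, the distance equals len(a) + len(b) - 2 x LCS(a, b): only the elements outside the common subsequence are paid for.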
5 - Qualitative Harmonic Analysis and Optimal Matching (2) • First 8 clusters (OMA) and all clusters (QHA) are defined by one of the eight states • State space by and large defined by the French social class scheme • Low intra-generational social mobility • Size of these “natural” clusters varies with the method (dissimilarity + cluster algorithm) • Methods are in agreement on most of the patterns buried in the data • But differences in the size of these patterns • Studying these “discrepancies” is a promising avenue of research
[Figure: chronogram of the distribution of states by month. Horizontal axis: months 13 to 38; vertical axis: 0% to 100%. States shown: Unemployment, Inactivity, Studies, Executive and intermediate occupations, Unskilled non manual workers, Skilled non manual workers, Skilled manual workers, Unskilled manual workers.]
6- Building typology using Self-Organizing-Maps Plan • 6-1 Distance • Longitudinal specificities • The approach and the distance • The distance between states that evolves with time • The distance between trajectories • 6-2 Self-organizing-maps • The Kohonen algorithm • The Typology result • Exogenous dimensions • Projection of other methods
6-1 Distance : Longitudinal specificities • Correlation should be graded according to the time delay • When correlation is equated with incidence. • The need for a balance between low and high frequencies (as with χ²) depends on the case: • Principle of distributional equivalency: no sensitivity to the increase in weights when several small sub-categories are grouped into a main one. • Interest in a category can decrease with time and with its frequency (national service 5 years after leaving school), but the question is kept for the sake of the homogeneity of the survey. • From the individual's point of view, rare does not mean important. • Weighting months? States? • Stabilization (at the end of trajectories) strongly structures classifications. • Which length of trajectories? • The high weight of some items creates a large inertia for little variation and maintains a large number of clusters.
6-1 Distance : The approach and the distance. Hypothesis on status of employment • Hypothesis: proximities exist between employment statuses, and they evolve with time. • Theoretical example: a 1st case, with equidistance between items, needs 4 classes whereas a 2nd case, including proximities, needs 1. • 1st case: 1% US-NM-W -> unemployed, 1% S-NM-W -> unemployed, 1% US-NM-W -> inactivity, 1% S-NM-W -> inactivity. • 2nd case: 4% NM-W -> out of employment • The 2nd case increases the chance for a population to emerge • Unemployment is closer to employment at the beginning of the career and to inactivity at the end of the trajectory. • The distance must integrate proximities between states and their evolution in time.
6-1 Distance : Distance between states that evolves with time (1) • A situation, defined as the couple (state, month), introduces time. • The potential future of a present situation S towards any later situation S’ is described by a profile P_S, built from three elements: • F measures the flow between S and S’, as the empirical probability of reaching S’ starting from S. • The coefficient of temporal inertia b(t’) weights the proximity between S and S’ and decreases with the time delay. • The coefficient a rescales P_S so that it is a profile. • The distance between situations is the χ² distance between profiles.
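The exact formula of the profile is not reproduced in this text, so the sketch below is only one possible reading of the definitions above, under loudly stated assumptions: the weight of S’ in the profile of S is taken as b x F(S, S’), the temporal inertia is assumed to decay geometrically with the delay (`decay ** delay`), and the χ² metric uses the unweighted average profile as column mass. All function names and the toy trajectories are hypothetical.

```python
import math
from collections import defaultdict


def situation_profiles(trajectories, decay=0.9):
    """Profile of each situation (state, month) over strictly later situations.

    ASSUMED form: weight of S' in the profile of S = b * F(S, S'), where
    F is the share of individuals in S who are later in S', and
    b = decay ** (delay in months). The rescaling to sum 1 plays the role
    of the coefficient a. Situations of the last month have no future and
    therefore no profile.
    """
    n_months = len(trajectories[0])
    at = defaultdict(int)      # number of individuals in situation S
    joint = defaultdict(int)   # number of individuals in S and later in S'
    for traj in trajectories:
        for t, s in enumerate(traj):
            at[(s, t)] += 1
            for t2 in range(t + 1, n_months):
                joint[((s, t), (traj[t2], t2))] += 1
    profiles = defaultdict(dict)
    for (S, S2), count in joint.items():
        flow = count / at[S]
        profiles[S][S2] = decay ** (S2[1] - S[1]) * flow
    for prof in profiles.values():
        total = sum(prof.values())
        for S2 in prof:
            prof[S2] /= total
    return dict(profiles)


def column_mass(profiles):
    """Average profile, used as the chi-square metric (unweighted: an assumption)."""
    mass = defaultdict(float)
    for prof in profiles.values():
        for S2, v in prof.items():
            mass[S2] += v / len(profiles)
    return dict(mass)


def chi2_distance(p, q, mass):
    """Chi-square distance between two profiles."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 / mass[k]
                         for k in keys))


# Toy data: 4 hypothetical individuals observed over 4 months
# (U = unemployment, N = unskilled non manual, S = skilled non manual).
trajs = ["UUNN", "UNNN", "UUUN", "NNSS"]
profiles = situation_profiles(trajs)
mass = column_mass(profiles)
print(chi2_distance(profiles[("U", 0)], profiles[("N", 0)], mass))
```

With such a distance, two situations are close when they lead to similar futures, and the decay parameter controls how quickly distant months stop mattering.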
6-1 Distance : Distance between states that evolves with time (2) Main aspects, with their parallels in other methods: • Parallel with correspondence and cluster analysis: the principle of distributional equivalency. • Parallel with OM: the distance between situations and the cost of substitution. • Parallel with Markov chains. Remark: • Complementary information (concerning employment status) can be introduced to define the distance between situations.
6-1 Distance : Distances between trajectories • From the matrix of distances between situations, one derives the inertia and its principal components. • The eigenvectors are called principal events; they locate situations in the structure of employment status, time and spread. • Trajectories are coded as linear combinations of principal events. Two codings: canonical (in terms of situations) and linear combination of principal events Ee. • The distance between trajectories is the Euclidean distance between the recoded trajectories. • The weights of months are linked with the innovation.
6-2 Self organizing maps : The Kohonen algorithm • SOM, in the Kohonen version, generalizes k-means clustering by introducing a notion of neighborhood between classes. • A system of representation is associated with it: • Clusters are organized on a network, called a map, according to their proximity in the input space. • Main property: preservation of topology: • Two individuals associated with neighboring units on the map are neighbors in the input space. • Examples of structures in one or two dimensions: • green, blue and red are 3 levels of neighborhood at radius 2, 1 and 0
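A minimal sketch of a Kohonen-type update on a one-dimensional map may help to see how the neighborhood enters. It is a toy illustration, not the implementation used for the slides: the initialisation, the linear decay of the learning rate, the radius schedule and the data are all assumptions.

```python
import random


def bmu(units, x):
    """Index of the best matching unit (nearest codebook vector) for x."""
    return min(range(len(units)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(units[j], x)))


def train_som_1d(data, n_units=4, n_epochs=30, lr0=0.5, radius0=1, seed=0):
    """Train a 1-D map (a string of units) with a step neighborhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(x[k] for x in data) for k in range(dim)]
    hi = [max(x[k] for x in data) for k in range(dim)]
    # Assumed initialisation: units spread evenly along the diagonal
    # of the bounding box of the data.
    units = [[lo[k] + (hi[k] - lo[k]) * j / (n_units - 1) for k in range(dim)]
             for j in range(n_units)]
    for epoch in range(n_epochs):
        frac = epoch / n_epochs
        lr = lr0 * (1 - frac)                   # learning rate decays linearly
        radius = radius0 if frac < 0.5 else 0   # then only the winner moves
        for x in rng.sample(data, len(data)):   # random presentation order
            c = bmu(units, x)
            # The winner and its neighbors on the map move towards x.
            for j in range(max(0, c - radius), min(n_units, c + radius + 1)):
                units[j] = [w + lr * (v - w) for w, v in zip(units[j], x)]
    return units


# Toy input: two hypothetical groups of points in the plane.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11)]
units = train_som_1d(data)
print([bmu(units, x) for x in data])
```

With the radius set to 0 from the start, only the winning unit moves and the procedure reduces to an online form of k-means; the neighborhood is what ties adjacent units together and gives the map its ordering.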
6-2 Self organizing maps : The typology result (1) • Any unit is used as a graphical display including a chronogram that characterizes the referring class. • Two neighboring classes have similar chronograms. • Proximity expresses the continuity in time. Cartography of careers
6-2 Self organizing maps : The typology result (2) A second level of grouping • The symbolic representation (the equidistance between classes) conceals irregularities of the distribution in the input space. • A gap in proportion with the distance between neighboring classes expresses the local structure . • A hierarchical classification of centroids gives a global structure by subdividing the map in areas. 2nd level of grouping : a hierarchical classification of centroids
6-2 Self organizing maps : The typology result (5) The Center of the map concerns “out of employment” 1 area « out of employment » Unemployment Inactivity
6-2 Self organizing maps : The typology result (6) The North of the map concerns non manual workers Skilled towards unskilled non manual workers Skilled non manual workers Unskilled non manual workers Unemployment Unskilled towards skilled non manual workers Unskilled towards executive and intermediate occupation
6-2 Self organizing maps : The typology result (7) The South East of the map concerns workers Unemployment Unskilled manual workers Skilled manual workers
6-2 Self organizing maps : The typology result (8) The South West of the map concerns executives Executives and intermediate occupations Studies
6-2 Self organizing maps : Another application • More information and an application to career paths during the 7 years after leaving school • Rousset P., Giret J-F. (2008) A Longitudinal Analysis of Labour Market Data with SOM, Encyclopedia of Artificial Intelligence, Edition Information Science Reference • Rousset Patrick, Giret Jean-François , Classifying qualitative time series with SOM : the typology of career paths in France, Computational and ambient intelligence. 9th international work-conference on artificial neural networks, IWANN 2007 San Sebastian, Spain, June 20-22, 2007 Proceedings. - Edition : Berlin, Springer, 2007, pp. 757-764
6-2 Self organizing maps : Exogenous dimensions • A pie chart shows how each cluster is distributed across the categories of an exogenous variable • 35% of class 1 at the level “Baccalauréat” • 35% of class 1 at the level “Vocational education” • Projection of exogenous variable: the diploma
6-2 Self organizing maps : Projection of other methods • The bars show how the clusters obtained with another classification are distributed across the units of the map • SOM can be used as a support for representing any classification • Comparison of classifications: 50%, resp. 25%, of class 6 are in units 7 and 14 • Projection of other classification results: correspondence and hierarchical method (8 clusters)
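Projecting another classification on the map amounts to a cross-tabulation with row percentages. The sketch below shows the computation on made-up labels; the function name `row_percentages` and the eight individuals are hypothetical and chosen only so that the figures mirror the kind of statement made on the slide.

```python
from collections import Counter


def row_percentages(labels_a, labels_b):
    """Share (in %) of each class of labels_a falling in each class of labels_b."""
    counts = Counter(zip(labels_a, labels_b))
    totals = Counter(labels_a)
    return {(a, b): 100.0 * n / totals[a] for (a, b), n in counts.items()}


# Hypothetical labels for eight individuals: the cluster from another
# method and the SOM unit each individual is assigned to.
external = [6, 6, 6, 6, 1, 1, 2, 2]
som_unit = [7, 7, 14, 3, 1, 1, 2, 5]
table = row_percentages(external, som_unit)
print(table[(6, 7)], table[(6, 14)])  # 50.0 25.0
```

Each row of the table gives the heights of the bars drawn for one external cluster across the units of the map.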
Conclusion • Comparison between methods • Complicated by the different combinations of choices at the different stages of building an empirical typology • Outputs can be compared visually (SOM could be helpful in this regard) • But quantitative measures are needed • Overall, highly similar groups were found • Substantively, an unskilled job is almost never a springboard to a skilled one: only 1/3 reached a skilled position at the end of the period • Studying what can appear at first sight as discrepancies is a promising avenue of research for understanding the kind of patterns each method is most sensitive to
Clustering algorithms. • Classical : • - Hierarchical : • A grouping at each level and a representation with a tree. • - K-means, simple competitive learning : • Number of clusters is fixed, adapted to large data. • Neural networks : • - Multilayer perceptron. • Supervised learning. • Self-organizing maps : • Number of clusters is fixed, adapted to large data. • - Algorithm of Kohonen. • The structure of the map is fixed, system of representation specifically adapted. • Neural Gas. • Learning of the map structure, no specific system of representation.