Sylvie Huet. Modelling from data: an experience in modelling rural demography. Laboratoire d’Ingénierie pour les Systèmes Complexes. From data to models, Cergy-Pontoise, 27-28 June 2013
Context: demography in rural municipalities. How is rural Europe evolving? Coupling the demography and the residential mobility of people in order to study their evolution at a very local scale: the municipality level.
Context: demography in rural municipalities. Decision support in demography generally relies on microsimulation modelling (O'Donoghue 2001; Li and O’Donoghue 2012). Space and residential mobility also matter here, hence coupling microsimulation and agent-based modelling. There are no integrated theories, so we extract them and use data to build a globally coherent theory through dynamic modelling at the individual level and at the municipality level. A first instance: the population of the Cantal (French region) (Huet et al. 2012a, 2012b).
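A minimal sketch of the coupling idea, assuming a hypothetical yearly loop: a microsimulation step applies data-driven demographic events to each individual (here only ageing and mortality, with made-up probabilities), then an agent-based step lets individuals move between municipalities under a dwelling-capacity constraint. `Person`, `Municipality`, `death_probability` and `step` are illustrative names, not the model of Huet et al.

```python
import random
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity comparison, so list.remove() removes this exact person
class Person:
    age: int
    alive: bool = True

@dataclass(eq=False)
class Municipality:
    name: str
    dwellings: int  # hypothetical capacity: number of dwellings
    residents: list = field(default_factory=list)

def death_probability(age: int) -> float:
    # Hypothetical yearly mortality by age band; a real model would read it from data.
    if age < 60:
        return 0.001
    if age < 80:
        return 0.03
    return 0.12

def step(municipalities, rng, move_prob=0.05):
    # 1. Microsimulation part: every individual ages, mortality comes from "data".
    for m in municipalities:
        for p in m.residents:
            p.age += 1
            if rng.random() < death_probability(p.age):
                p.alive = False
        m.residents = [p for p in m.residents if p.alive]
    # 2. Agent-based part: collect movers first, then relocate each one to another
    #    municipality that still has a free dwelling (spatial constraint).
    movers = [(p, m) for m in municipalities for p in m.residents
              if rng.random() < move_prob]
    for p, origin in movers:
        options = [o for o in municipalities
                   if o is not origin and len(o.residents) < o.dwellings]
        if options:
            destination = rng.choice(options)
            origin.residents.remove(p)
            destination.residents.append(p)

if __name__ == "__main__":
    rng = random.Random(42)
    towns = [Municipality("A", 50, [Person(30) for _ in range(40)]),
             Municipality("B", 30, [Person(75) for _ in range(20)])]
    for _ in range(5):
        step(towns, rng)
    print({t.name: len(t.residents) for t in towns})
```

Collecting movers before relocating them keeps a person from moving twice in the same year.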
Problem. An interesting motivation, a well-identified overall modelling choice, a marvellous applied research question… but data! As a constraint, as theories, as results… “No fuss, no waffle, just results”
Summary: everything through the prism of data. Finding and taking a census of data. Choosing data for dynamic modelling.
Finding data. We can’t build a specific survey: the problem is too large. We can’t use a reweighted sample of individuals: there are not enough of them, and they are too difficult to access. Tenacity… and then
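To make the rejected option concrete: “reweighting” a sample usually means adjusting survey weights so that weighted totals match known margins, for instance census counts for a municipality. The sketch below uses iterative proportional fitting (raking), one common technique for this, not necessarily the one the authors had in mind; the variables and target margins are hypothetical.

```python
def rake(weights, categories, targets, n_iter=50):
    """Iterative proportional fitting (raking).

    weights:    initial survey weights, one per individual
    categories: one dict per individual, mapping variable -> category
    targets:    variable -> {category: target weighted total}
    Every category observed in the sample must appear in the targets.
    """
    w = list(weights)
    for _ in range(n_iter):
        for var, margin in targets.items():
            # Current weighted total of each category of this variable.
            totals = {}
            for wi, cat in zip(w, categories):
                totals[cat[var]] = totals.get(cat[var], 0.0) + wi
            # Scale weights so that this variable's margins are matched exactly.
            w = [wi * margin[cat[var]] / totals[cat[var]]
                 for wi, cat in zip(w, categories)]
    return w

# Hypothetical sample of four respondents and census-like margins.
sample = [{"sex": "M", "age": "young"}, {"sex": "M", "age": "old"},
          {"sex": "F", "age": "young"}, {"sex": "F", "age": "old"}]
margins = {"sex": {"M": 40, "F": 60}, "age": {"young": 30, "old": 70}}
weights = rake([1.0] * len(sample), sample, margins)
print([round(x, 2) for x in weights])
```

With only a handful of respondents per municipality, each one ends up standing for many people, which is why a thin sample is of little use at this scale.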
Finding and taking a census of data. At first, we had nothing… and finally we have too much! Confusion:
• Bed-and-breakfast network (Réseau chambres d’hôtes)
• Municipal finances
• DGF (state operating grant to municipalities)
• Tourist taxes (taxes de séjour)
• CORINE Land Cover
• Censuses 1990, 1999, 2006, …
• Family histories survey (Histoires familiales)
• Life history survey 2003 (Histoire de vie)
• Household Panel
• Permanent database of facilities (Base permanente des équipements)
• Mobility tables 1999
• SIRENE business register
• Housing survey (Enquête Logement)
• Municipal inventory (Inventaire communal) 1988, 1998
• Labour Force Survey
• SITADEL (dwellings)
• Generation surveys 1988, 1998, …
• Wage distribution (INSEE)
• Agricultural censuses 1988, 1998, 2005
• Household income
• ISSP (meaning of work)
• Employment survey (Enquête Emploi)
Turning confusion into results: DATA, MODEL, and the LINKING between them. Criteria to choose.
Summary: Finding and taking a census of data. Choosing data for dynamic modelling.
Criteria to choose among all the data?
• Quantity of work
• People and ideas
• Building the various dynamics (and their couplings)
• Calibrating and validating the model
1. Criteria: time and cognitive costs!
• The ones we don’t really talk about, linked to the quantity of work
• Cost of investigating the data sources
• Ease of use of the statistical tools, and representativeness
• Possible reuse of generic objects and dynamics in other countries
Laborious, difficult, not valued, not publishable, not a research problem, too long to explain… What a costly approach! For every possible source, it requires studying:
• the list of questions
• the list of variables (not necessarily the direct answers to questions)
• the list of modalities of each variable
• the representativeness at various scales and for various populations
• the hidden or underlying model and theories
As a result, many people always use the same survey, just as we use the same tools or the same methods.
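The per-source checklist above can be kept as a small catalogue, so that choosing a source becomes a query rather than a fresh reading of the documentation. A minimal sketch, assuming hypothetical field names and illustrative entries (the variables and scales listed are examples, not a documented description of these sources):

```python
from dataclasses import dataclass, field

@dataclass
class SourceRecord:
    name: str
    questions: list = field(default_factory=list)        # questionnaire items
    variables: dict = field(default_factory=dict)        # variable -> list of modalities
    representative_at: set = field(default_factory=set)  # scales/populations it represents
    underlying_model: str = ""                           # hidden/underlying theory, if known

def usable_sources(catalogue, scale, variable):
    """Names of sources representative at `scale` that provide `variable`."""
    return [s.name for s in catalogue
            if scale in s.representative_at and variable in s.variables]

# Illustrative entries only.
catalogue = [
    SourceRecord("Census",
                 variables={"dwelling_size": ["1 room", "2 rooms", "3+ rooms"]},
                 representative_at={"national", "municipality"}),
    SourceRecord("Labour Force Survey",
                 variables={"job_change": ["yes", "no"]},
                 representative_at={"national"}),
]
print(usable_sources(catalogue, "municipality", "dwelling_size"))
```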
2. Criteria: working with people and ideas
• In interdisciplinary work, the ones you don’t think of a priori:
• Understandable by the people involved (and comparable with other models)
• Working with research partners
• A compromise to decide on…
• … or whom you are going not to understand
Criteria: working with researchers and ideas. Why not use wages?
• The existing/chosen data were not collected under the hypotheses of their theories: misunderstanding, disagreement
• Some, especially modellers, do not usually use data
• Some, especially modellers, have difficulty understanding what individual-based modelling means
3. Criteria: building the various dynamics
• To build the various dynamics (and their couplings)
• Possible interconnection of various sources
Example: using the LFS and the Census jointly, since both give the “same” activity sectors and socio-professional categories, allows us to define the job offer at the municipality level (Census) and the way an individual chooses a job and changes it (LFS).
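A minimal sketch of this linkage, assuming both sources are coded with the same (activity sector, socio-professional category) pairs. The job offer per municipality stands for the Census side and the transition probabilities for the LFS side; all names and numbers are hypothetical placeholders, not values from either survey.

```python
import random

# Census side (hypothetical counts): vacant jobs per municipality,
# keyed by the (activity sector, socio-professional category) pair.
job_offer = {
    "municipality_A": {("services", "employee"): 3, ("industry", "worker"): 1},
    "municipality_B": {("agriculture", "farmer"): 2},
}

# LFS side (hypothetical probabilities): for someone changing job,
# the chance of moving from one (sector, category) pair to another.
transitions = {
    ("industry", "worker"): {("industry", "worker"): 0.6,
                             ("services", "employee"): 0.4},
}

def choose_job(current, municipality, rng):
    """Pick a vacant job in `municipality`, weighting offers by LFS transitions."""
    offers = job_offer[municipality]
    probs = transitions.get(current, {})
    candidates = [(job, probs[job]) for job, n in offers.items()
                  if n > 0 and probs.get(job, 0.0) > 0]
    if not candidates:
        return None  # no compatible offer in this municipality
    jobs, weights = zip(*candidates)
    job = rng.choices(jobs, weights=weights, k=1)[0]
    offers[job] -= 1  # the chosen job is no longer vacant
    return job
```

For example, `choose_job(("industry", "worker"), "municipality_A", random.Random(1))` returns one of the two compatible offers and removes it from the municipality's stock, while the same call for `"municipality_B"` returns `None`.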
Criteria: building the various dynamics
• Problem of statistical representation (e.g. low-density areas represent a small part of the population: 39% rural or peri-urban areas). Example in the Cantal: the number of farmers; no problem accessing housing, but a problem accessing services
• European Household Panel or national Census? The Census is one of the rare data sources at low level, and there are few theories and/or little knowledge at that level
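A back-of-the-envelope check of the representation problem. The figures are hypothetical: a national survey of 10,000 households, an area holding 0.25% of households, and a characteristic shared by 10% of them. Under simple random sampling, the expected number of respondents from the area and the relative standard error of the estimated proportion are:

```python
import math

def expected_respondents(sample_size: int, area_share: float) -> float:
    # Expected number of sampled units falling in the area.
    return sample_size * area_share

def relative_standard_error(p: float, n: float) -> float:
    # Standard error of a proportion estimated from n units, divided by p
    # (simple random sampling, no finite-population correction).
    return math.sqrt(p * (1 - p) / n) / p

n_area = expected_respondents(10_000, 0.0025)   # about 25 households
rse = relative_standard_error(0.10, n_area)     # about 0.6, i.e. 60%
print(round(n_area), round(rse, 2))
```

With so few respondents, a local estimate drawn from a national panel is very imprecise, whereas the Census covers the area directly.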
Criteria: building the various dynamics starting from wrong data. Wrong data in the sense of irrelevant or inconvenient, but chosen for their capacity to “reveal” a relevant dynamic. The number of in- and out-migrants has this property, since it links every process related to mobility, starting from the decision to move.
Choosing a decision to move: the “checking model”. Family reasons are the most cited reasons for the decision to move (impact on the needed housing size). Old people move too much for a decision based only on the size of the current housing.
Assessing the chosen decision to move. Literature (statistical analysis from data): Debrand and Taffin (2006) note that moving decreases with age, but also that moving to a larger dwelling is much more common than moving to a smaller one. Finally, we can also reproduce the critical values, and more simply, by deciding to move with a lower probability when the need is to decrease the size of the residence.
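The checking procedure can be sketched as follows, assuming a hypothetical housing norm (one room per person plus one) and made-up move probabilities. A size-only rule treats under- and over-housing alike, so over-housed elderly households move as often as young ones; making downsizing less likely lowers their mobility, in line with the literature cited above.

```python
def needed_rooms(household_size: int) -> int:
    # Hypothetical housing norm: one room per person plus one common room.
    return household_size + 1

def p_move_size_only(size: int, rooms: int, p: float = 0.3) -> float:
    # Checking model: any mismatch between need and dwelling triggers a move.
    return p if rooms != needed_rooms(size) else 0.0

def p_move_asymmetric(size: int, rooms: int,
                      p_up: float = 0.3, p_down: float = 0.05) -> float:
    # Corrected rule: moving to a smaller dwelling is much less likely.
    need = needed_rooms(size)
    if rooms < need:
        return p_up
    if rooms > need:
        return p_down
    return 0.0

def mobility_rate(households, rule) -> float:
    # Expected share of households moving, for (size, rooms) pairs.
    return sum(rule(size, rooms) for size, rooms in households) / len(households)

# Hypothetical (household size, rooms) pairs.
young = [(3, 3), (2, 3), (4, 4)]    # growing families, often under-housed
elderly = [(2, 5), (1, 4), (2, 3)]  # children gone, often over-housed

for rule in (p_move_size_only, p_move_asymmetric):
    print(rule.__name__, round(mobility_rate(young, rule), 3),
          round(mobility_rate(elderly, rule), 3))
```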
Choosing dynamics to ensure consistency (again, in the case of wrong data). Counterintuitive choices are needed to ensure consistency between endogenous submodels, parameterised through calibration, and exogenous submodels, parameterised from data. Example: in residential mobility modelling, people may migrate out of the region if and only if they have found a new place of residence inside the region! This is only because the data give the probability of leaving the region versus moving inside it (i.e. the decision to move itself is unknown).
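A minimal sketch of this counterintuitive ordering, with hypothetical names: `find_residence_in_region` stands for the calibrated (endogenous) search submodel and `p_leave_region` for the data-driven (exogenous) probability of leaving the region rather than moving within it. A household can only leave once the search inside the region has succeeded.

```python
import random

def relocate(household, find_residence_in_region, p_leave_region, rng):
    # Endogenous step: search for a new residence inside the region.
    place = find_residence_in_region(household)
    if place is None:
        return ("stay", None)            # no residence found: nobody leaves
    # Exogenous step: data only tell us leaving vs. moving inside the region.
    if rng.random() < p_leave_region:
        return ("leave_region", None)    # the found residence stays vacant
    return ("move_inside", place)

rng = random.Random(3)
print(relocate("h1", lambda h: "dwelling_42", 0.2, rng))
print(relocate("h2", lambda h: None, 0.2, rng))
```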
4. Criteria: calibration and validation
• To calibrate (find the parameters of the dynamics chosen through the checking-model procedure) and validate the model:
• Temporal continuity of definitions and availability, including the initial state (e.g. 1990, 1999, 2006, dwelling size…)
• Relevance of the spatial scale at which the data are available
• Critical indicators of the temporal evolution, especially for “initially” unknown dynamics
Example for the Cantal…
The Cantal: data for calibration (figure: 1999 and 2000-2006). A DECREASING POPULATION BUT AN INCREASING MIGRATORY BALANCE (switching during the period) AND A DECREASING NATURAL BALANCE
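These indicators are tied by the usual demographic accounting identity: population change equals the natural balance (births minus deaths) plus the migratory balance (in-migrants minus out-migrants). A small sketch with hypothetical figures shows how a positive migratory balance can coexist with a decreasing population:

```python
def population_change(births, deaths, in_migrants, out_migrants):
    natural = births - deaths
    migratory = in_migrants - out_migrants
    return {"natural": natural, "migratory": migratory, "total": natural + migratory}

# Hypothetical yearly figures for a rural area.
print(population_change(births=1200, deaths=1800, in_migrants=3000, out_migrants=2900))
# {'natural': -600, 'migratory': 100, 'total': -500}
```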
The Cantal: data for calibration, 2000-2006. WITH A LARGE HETEROGENEITY OF THE TRENDS AT LOCAL LEVEL (map: decreasing municipalities in red, increasing municipalities in blue)
The Cantal: data for calibration (figure: flows for 1990-1999 and 2000-2006; 1999). A LOT OF MOVES DESPITE A WEAK MIGRATORY BALANCE, WITH A STRONG SPATIAL CONSTRAINT
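Gross and net flows can tell very different stories: many moves may cancel out in the migratory balance. A hypothetical sketch computing, from a list of (origin, destination) moves, the number of moves involving a region, its net balance, and the share of moves that stay inside it (a crude indicator of spatial constraint):

```python
def flow_summary(moves, region):
    """moves: (origin, destination) pairs; region: set of municipality names."""
    inflow = sum(1 for o, d in moves if o not in region and d in region)
    outflow = sum(1 for o, d in moves if o in region and d not in region)
    internal = sum(1 for o, d in moves if o in region and d in region)
    touching = inflow + outflow + internal  # moves involving the region
    return {
        "moves": touching,
        "net": inflow - outflow,
        "internal_share": internal / touching if touching else 0.0,
    }

# Hypothetical moves between municipalities A, B, C (inside) and X, Y (outside).
region = {"A", "B", "C"}
moves = [("A", "B"), ("B", "A"), ("A", "C"), ("X", "A"), ("B", "Y"), ("C", "B")]
print(flow_summary(moves, region))
```

Here six moves produce a net balance of zero, and most of them stay inside the region.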
An almost impossible calibration, despite the data and because of the data. The aim is to respect the trends, not only to minimise the absolute difference to the various measures over time: what is a small overall distance worth if the trend is not the same? A combination of all the trends is almost impossible to obtain… It requires a quasi-continuous loop of rebuilding the model. (Figure: small distance but bad trend.)
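The “small distance but bad trend” situation is easy to reproduce. In this sketch (hypothetical series), a flat simulation sits closer to the observations than a shifted one, yet only the shifted one follows the observed decline. `rmse` measures distance; `trend_agreement` is an illustrative criterion, not the one used in the study: the share of steps where simulated and observed changes have the same sign.

```python
import math

def rmse(sim, obs):
    # Root mean squared difference between simulated and observed values.
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))

def sign(x):
    return (x > 0) - (x < 0)

def trend_agreement(sim, obs):
    # Share of time steps where simulated and observed changes share a sign.
    steps = range(1, len(obs))
    same = sum(sign(sim[t] - sim[t - 1]) == sign(obs[t] - obs[t - 1]) for t in steps)
    return same / len(steps)

observed = [100, 99, 98, 97]
flat = [98.5, 98.5, 98.5, 98.5]   # small distance, wrong trend
shifted = [103, 102, 101, 100]    # larger distance, right trend
for name, sim in [("flat", flat), ("shifted", shifted)]:
    print(name, round(rmse(sim, observed), 2), trend_agreement(sim, observed))
```

Ranking candidate parameter sets by `rmse` alone would pick `flat`.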
A never-ending validation. Too much data, in a way… how should we restrict the validation process? I don’t know at this stage. As with calibration, you can’t be satisfied, since you have a lot of data: almost all the data you have not retained for building the initialisation or the dynamics.
Synthesis at this point of my study: what data bring to the dynamic modelling of large systems at a low level. Finally, very difficult to use as a predictive tool, even though microsimulation models (built from data) are usually built for this purpose and considered reliable, since they propose a consistent theory extracted from data. Much more useful (probably even more than classical theoretical approaches or discrete choice models) for learning about composing dynamics, since they consider many coupled dynamics (instead of assuming they are negligible): the checking-dynamics procedure. Data challenge interdisciplinary work (instead of simplifying it)!