LIAM 2: A NEW OPEN SOURCE DEVELOPMENT TOOL FOR DISCRETE-TIME DYNAMIC MICROSIMULATION MODELS GIJS DEKKERS (FEDERAL PLANNING BUREAU, Brussels, CESO, K.U.LEUVEN AND CEPS/INSTEAD, Luxembourg) PHILIPPE LIÉGEOIS (CEPS/INSTEAD and DULBEA, ULB, Brussels) IMA 4th General Conference, Canberra Dec 9th, 2013
General motivation of the training session
• Introduce LIAM2, a free, open source, user-friendly modelling and simulation framework
• Basic functionalities, by examples and practice (“learning-by-doing”): once back home, you should be ready to use LIAM2 and to start developing your own microsimulation (MSM) model
• Other, more advanced topics and latest developments, by examples: making you aware of (new) possibilities and technical difficulties
• Documentation: slides (gathering only the essential information by topic), some examples and the “UserGuide” (release 0.7.0) included in the LIAM2 bundle (http://liam2.plan.be), plus the Google group
Gijs Dekkers and Philippe Liégeois - TRAINING LIAM2 - IMA Conference 2013 - Canberra
Why LIAM2 and Where Does It Come From?
Most existing microsimulation models have been developed by separate (teams of) researchers. The drawback of each team working on its own is that it has to put a lot of time and effort into developing its own simulation tools, which makes microsimulation models even more expensive than strictly necessary. Furthermore, as modellers are often not professional programmers, the result is not necessarily the most efficient in terms of simulation speed. This is why several partners joined their efforts to develop a dynamic microsimulation modelling toolbox (“LIAM2”).
Why LIAM2 and Where Does It Come From?
Initiated through a collaboration between the Federal Planning Bureau in Brussels (development), CEPS/INSTEAD and the General Inspectorate of Social Security in Luxembourg (testing and complementary funding) and Cathal O’Donoghue (LIAM and expertise), as well as other experts, under European funding (MiDaL Project 2009-2011, PROGRESS programme, Grant VS/2009/0569, CEPS/INSTEAD).
Most of the technical work for LIAM2 was done in Brussels (Gaëtan de Menten, Geert Bryon, Raphaël Desmet and Gijs Dekkers).
Open source, user-friendly and efficient:
• A clear separation between “modellers” (responsible for the modelling) and “programmers” (in charge of the critical methodological issues, including state-of-the-art methods for data handling and simulation optimization)
• A modelling language which is easy to use for the modellers
Contents of the Training Session
1. Getting started with LIAM2
2. A rudimentary model, creating objects and some output
3. Linking objects
4. Stochastic simulation
5. Marriage market (matching function)
6. Importing data towards the LIAM2 input format
7. Advanced topics
8. Conclusions, including building a model with LIAM2
1. Getting started with LIAM2
Just download and use it (demonstration)
2. A Rudimentary Model - 2.1 The bricks
LIAM2 involves several kinds of « bricks », among which…
• ENTITIES: objects (persons, households, firms, cells, …) with a unique identifier
• FIELDS: attributes of an entity (e.g. a person’s age)
• LINKS: relations between entities (e.g. a person’s children); links can be chained (e.g. spouse.mother.age)
• GLOBALS: parameters not related to a specific entity, which may vary through time (e.g. CPI)
• PROCESSES: assignments, which change the value of a variable using an expression (e.g. « age + 1 »), and actions, which do not (e.g. removing a dead person)
• MACROS: pieces of code, re-evaluated each time they are referenced (e.g. « WIDOW: civilstate == 5 »)
2.2 Overall structure of a Model
A model is typically composed of 3 main blocks…
• « globals »
• « entities »: including their fields, links and the definition of processes (order not meaningful) and macros available for that kind of entity
• « simulation »: including the general setup of the model, e.g. input, output, starting period of simulation, number of periods of simulation, etc.
Within the blocks, indentation is meaningful (cf. the YAML markup language).
LXG MIDAS MODEL (2012): a global view first…
globals:
    periodic:
        - CPI: float   # Consumer Price Index
        - ...
entities:
    household:
        ...
    person:
        fields:
            - age: int
            - gender: bool   # "0" if female, "1" if male
            - m_id: int      # mother's identifier
            - ...
        macros:
            FEMALE: gender == 0
            ...
        links:   # first one: Mother to Children
            mc: {type: one2many, target: person, field: m_id}
            ...
        processes:
            age: "age + 1"
            ...
            divorce: "..."   # the process is here DECLARED/SPECIFIED only
            ...
simulation:
    processes:
        - person: [
            year,
            age,
            ...
            divorce,   # the process now SIMULATED/used
            ...
        ]
    input:
        path: "INPUT_DATA"
        file: "INIT_MODEL_LXG_INPUT_2007.h5"   # "HDF5" format
    output:
        path: "OUTPUT_DATA"
        file: "FINAL_MIDAS_LXG_RESULTS.h5"   # "HDF5" format
    start_period: 2008   # first simulated period
    periods: 20
MIDAS_LU - 1st part: declaring objects…
#
# LUXEMBOURG MIDAS MODEL
# FINAL VERSION, AS ON 14 MAR 2012
#
globals:
    periodic:
        - CPI: float   # Consumer Price Index
        - ...
entities:
    household:
        ...
    person:
        fields:
            - age: int
            - gender: bool   # "0" if female, "1" if male
            - m_id: int      # mother's identifier
            - ...
        macros:
            FEMALE: gender == 0
            ...
        links:   # first one: Mother to Children
            mc: {type: one2many, target: person, field: m_id}
            ...
        processes:
            age: "age + 1"
            ...
            divorce: "..."   # the process is here DECLARED only
            ...
… then 2nd part: simulating
...
simulation:
    processes:
        - person: [
            year,
            age,
            ...
            divorce,   # the process now SIMULATED/used
            ...
        ]
    input:
        path: "INPUT_DATA"
        file: "INIT_MODEL_LXG_INPUT_2007.h5"   # "HDF5" format
    output:
        path: "OUTPUT_DATA"
        file: "FINAL_MIDAS_LXG_RESULTS_YEARS_2007_2009.h5"
    start_period: 2008   # first simulated period
    periods: 2
2.3 Input Data and Structure
D1 Demonstration
2.4 Outputting Outcomes (A)
dump(), groupby(), show(), qshow(), csv(), …
dump() ~ « list »
Syntax:
    dump([expr1, expr2, ..., filter=filterexpression, missing=value, header=True])
Example (run the model, e.g. with F6 in the bundled Notepad++ editor):
    dump(id, hh_id, age, gender, filter=id < 20)
2.4 Outputting Outcomes (B)
groupby() ~ summary tables
Syntax:
    groupby(expr1[, expr2, expr3, ...]
            [, expr=expression]   # « count() » by default
            [, filter=filterexpression]
            [, percent=True], …)
Example:
    groupby(trunc(age / 10), gender)
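To make the summary table concrete, the following Python sketch mimics what the groupby(trunc(age / 10), gender) example counts. It is not LIAM2 code: the `persons` sample and the `groupby_count` helper are made up for illustration, and only the default « count() » behaviour is reproduced.

```python
from collections import Counter

# Hypothetical sample population: (age, gender) pairs, gender True = male.
persons = [(3, True), (8, False), (15, False), (34, True), (37, True), (62, False)]

def groupby_count(rows):
    """Count rows per (age decade, gender) cell, like a count() summary table.

    For non-negative ages, age // 10 gives the same result as trunc(age / 10).
    """
    return Counter((age // 10, gender) for age, gender in rows)

table = groupby_count(persons)
for (decade, gender), n in sorted(table.items()):
    print(decade, gender, n)
```

Each key of `table` is one cell of the cross-tabulation; passing another aggregate through `expr` in LIAM2 would replace the count in each cell.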
2.4 Outputting Outcomes (C)
show(), qshow()
Syntax:
    show(expr1[, expr2, expr3, ...])
Example:
    show(count(age >= 18), avg(age, filter=age >= 18))
NB: “show” is implicit in the console environment itself
NB2: qshow() is equivalent to show() but also prints the textual form of each expression in the output
2.4 Outputting Outcomes (D)
csv() ~ to « .csv » files
Syntax:
    csv(expr1[, expr2, expr3, ...] [, suffix='file_suffix'] [, fname='filename'] [, mode='w'])   # mode: 'w' by default, or 'a'
NB: the default « fname » is « {entity}_{period} » (e.g. « person_2003.csv »)
Example:
    csv(avg(income), suffix='income')   # e.g. « person_2003_income.csv »
2.5 The "init" phase of simulation
NB: About variables in LIAM2 - Local/Temporary
Most often, variables are declared in the « fields » section of an entity: they may be assigned a value through processes and will be stored in the output file.
But often you need a variable only to store an intermediate result: simply make an assignment to an undeclared variable.
Example:
person:
    fields:   # period and id are implicit
        - age: int
        - agegroup: int
    processes:
        age: age + 1
        agediv10: trunc(age / 10)    # undeclared: temporary
        agegroup: agediv10 * 10
        agegroup2: agediv10 * 5      # undeclared: temporary
3. Linking Objects - (A) Many2One (D2)
Links are of « many2one » or « one2many » types.
« many2one »: linking many objects to only one (e.g. children to their mother, or members of a household to the household)
Syntax:
    link_name: {type: many2one, target: <entity>, field: <name of link field>}
NB: “field” logically refers to the “single” side of the link
Example (defined within entity « person »):
    household: {type: many2one, target: household, field: hh_id}
3. Linking Objects - (B) One2Many
« one2many »: linking one object to several others (e.g. a household to the members of the household)
Syntax (same as many2one):
    link_name: {type: <type>, target: <entity>, field: <name of link field>}
NB: “field” logically refers to the “single” side of the link
Example (defined within entity « household »):
    persons: {type: one2many, target: person, field: hh_id}
3. Linking Objects - (C) Getting Info out of Linkage
To access information through links, just give the route and « chain » the steps towards the target.
Syntax (basic):
    link_name.field_name
Examples:
    mother.age
    mother.mother.age
Aggregate information in a one2many context (in « household »):
    persons.avg(age)
    persons.count(age <= 17)
Aggregate information in a many2one context (in « person »):
    household.get(persons.avg(age))
    mother.get((mother.age + father.age) / 2)
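The chaining of links can be pictured as repeated identifier lookups. The Python sketch below is only an illustration of that idea, not how LIAM2 is implemented: the `persons` table, its values and the helper names are hypothetical, with `m_id` and `hh_id` playing the role of link fields.

```python
# Hypothetical data: persons keyed by id, with a mother id (m_id) and a household id (hh_id).
persons = {
    1: {"age": 70, "m_id": None, "hh_id": 10},
    2: {"age": 45, "m_id": 1, "hh_id": 11},
    3: {"age": 12, "m_id": 2, "hh_id": 11},
}

def follow(person_id, link_field):
    """Follow a many2one link: return the target id, or None if there is no target."""
    if person_id is None:
        return None
    return persons[person_id][link_field]

def mother_age(person_id):
    """In the spirit of mother.age: one hop through the m_id field."""
    m = follow(person_id, "m_id")
    return persons[m]["age"] if m is not None else None

def grandmother_age(person_id):
    """In the spirit of mother.mother.age: two chained hops."""
    return mother_age(follow(person_id, "m_id"))

def household_avg_age(hh_id):
    """In the spirit of persons.avg(age) in « household »: aggregate over the many side."""
    ages = [p["age"] for p in persons.values() if p["hh_id"] == hh_id]
    return sum(ages) / len(ages)

print(mother_age(3), grandmother_age(3), household_avg_age(11))
```

A missing link simply propagates as `None` here; how LIAM2 represents missing targets is not covered by this sketch.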
4. Introducing Stochastic Simulation (D2 & 3)
4.1 The “Choice” function (A)
Up to now, the approach has been deterministic: no random component.
Imagine, on the contrary, an event happening with exogenous probability p: the gender of a newborn, but also who is dying, marrying, getting a job, etc.
The simplest way to « decide », while simulating, what the outcome of the random event will be for a given entity is to draw a random number u from a uniform [0,1] distribution. If u < p, then the entity experiences the event (e.g. dying).
The same methodology can be applied to a choice between more than 2 options (e.g. assigning a household to a region within the country).
4.1 The “Choice” function (B)
This is exactly what the “choice” function in LIAM2 does. Suppose there are i = 1..n choice options, each with a probability p_i. A choice expression then has the following form:
Syntax:
    choice([option_1, option_2, ..., option_n], [prob_option_1, prob_option_2, ..., prob_option_n])
Examples:
    gender_just_born: choice([True, False], [0.51, 0.49])
    education_level: choice([1, 2, 3], [0.1, 0.4, 0.5])
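The mechanism of drawing a uniform number and comparing it with the option probabilities can be sketched in a few lines of Python. This is an illustration of the principle only, not LIAM2's own implementation; the function below merely borrows the name `choice`, and the `rng` parameter is a hypothetical addition used to make the draws reproducible.

```python
import random

def choice(options, probabilities, rng=random):
    """Pick one option by comparing a uniform draw with cumulative probabilities."""
    u = rng.random()  # u is uniform on [0, 1)
    cumulative = 0.0
    for option, p in zip(options, probabilities):
        cumulative += p
        if u < cumulative:
            return option
    return options[-1]  # guard against rounding when probabilities sum to ~1

# Simulate the gender of 10000 newborns with a fixed seed.
rng = random.Random(12345)
draws = [choice([True, False], [0.51, 0.49], rng) for _ in range(10000)]
share_true = sum(draws) / len(draws)
print(share_true)  # close to 0.51 in a large sample
```

With two options this reduces to the rule « the event happens if u < p »; with more options, each one owns a slice of the [0, 1] interval whose width is its probability.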
4.2 Logit Regression (A)
Sometimes we know more about the event: its probability may depend on the entity’s (personal) characteristics, which can be made explicit through e.g. a « logit regression » when the choice is qualitative (dichotomous). E.g. a married couple may be less at risk of divorce the longer the marriage has lasted up to the present period.
Formally, if « α + βX » is the combination of characteristics x_i which results from a logit regression and correctly « explains » the event under study, then:
• First, a « logit score » (which is a probability) can be computed:
    logit_score(α + βX) = logistic(α + βX - logit(u))
  where « u » is a random number from a uniform [0, 1] distribution, $\mathrm{logit}(u) = \log\frac{u}{1-u}$ and $\mathrm{logistic}(z) = \mathrm{logit}^{-1}(z) = \frac{1}{1+e^{-z}}$
• Second, the decision rule can be the following: if the logit score > 0.5, then the event happens (e.g. dying), otherwise not. Since the score exceeds 0.5 exactly when u < logistic(α + βX), the event happens with probability logistic(α + βX).
4.2 “logit_score” and “logit_regr” Functions (B)
In LIAM2, the first task, computing a score or risk, is performed by the logit_score(expression) function, which returns a probability p = logistic(expression - logit(u)).
NB: logit_score(0.0) is equivalent to uniform(), so it returns a value > 0.5 with probability ½ (and with probability < ½ if the expression is < 0.0)
Another function, logit_regr(), performs both tasks at once and returns a boolean (True if the event happens).
Syntax:
    logit_regr(expression[, filter=conditions], …)
Example:
    death: logit_regr(-0.5 + 0.02 * age, filter=age > 40)
NB: logit_regr(0.0) returns True with probability 0.5; logit_regr(<0) returns True with probability < 0.5
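The two steps (score, then the 0.5 threshold) can be checked numerically. The sketch below re-implements the formulas from the slides in plain Python; it is not LIAM2 code, the random draw `u` is passed explicitly so the behaviour can be inspected, and the age of 60 is an arbitrary value chosen for illustration.

```python
import math
import random

def logit(u):
    """Log-odds of u, for 0 < u < 1."""
    return math.log(u / (1.0 - u))

def logistic(z):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))

def logit_score(x, u):
    """Score as defined on the slides: logistic(x - logit(u))."""
    return logistic(x - logit(u))

def event_happens(x, u):
    """Decision rule without alignment: the event occurs when the score exceeds 0.5."""
    return logit_score(x, u) > 0.5

# The death example from the slides, evaluated at a hypothetical age of 60.
x = -0.5 + 0.02 * 60
p = logistic(x)

# Draw u away from the exact bounds 0 and 1, where logit is undefined.
rng = random.Random(0)
us = [rng.uniform(1e-9, 1 - 1e-9) for _ in range(10000)]
frequency = sum(event_happens(x, u) for u in us) / len(us)
print(p, frequency)  # the simulated frequency should be close to p
```

The simulated frequency approaches logistic(x), which is why the threshold rule on the score is equivalent to drawing the event with that probability.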
4.3 Logit and State Alignment
Alternatively, rather than grounding the decision on the threshold 0.5 for the probability p, we can decide to « select » a proportion of entities (by category) which must experience the event: this is « alignment ».
The logit_regr syntax encompasses alignment possibilities:
Syntax:
    logit_regr(expression[, filter=conditions][, align=proportions])
Example:
    divorce: logit_regr(0.6713593 * household.nb_children - 0.0785202 * dur_in_couple + 0.1429621 * agediff, filter=ISFEMALE and ISMARRIED, align='al_p_divorce.csv')
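The idea of alignment, selecting a fixed share of entities instead of applying the 0.5 threshold, can be sketched as follows. This Python function is a deliberately simplified stand-in, not LIAM2's align(): it handles a single category, takes the proportion as a plain number rather than a file, and its rounding and tie-breaking rules are assumptions made for this illustration.

```python
def align(scores, proportion):
    """Return booleans marking the round(n * proportion) highest scores."""
    n_selected = int(round(len(scores) * proportion))
    # Rank individuals from highest to lowest score; ties keep input order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:n_selected])
    return [i in selected for i in range(len(scores))]

# Hypothetical scores for five individuals of the same category.
scores = [0.91, 0.15, 0.62, 0.48, 0.77]
flags = align(scores, 0.4)  # 40% of 5 individuals: the 2 highest scores
print(flags)
```

Whatever the level of the scores, the number of selected individuals is fixed by the target proportion; the scores only determine who is selected.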
NB: (Stochastic Simulation) Logit Extensions
A simple use of the logit_regr() function is equivalent to a « choice » process:
Example:
    dead: if(ISMALE, logit_regr(0.0, align='al_p_dead_m.csv'), logit_regr(0.0, align='al_p_dead_f.csv'))
The logit_regr() function can be split into 2 steps in LIAM2, logit_score() and align(), which may make the whole process more flexible (e.g. through « take »):
Syntax:
    align(score, proportions [, filter=conditions] [, take=conditions] [, leave=conditions] …)
4.4 Other Regressions
Continuous
5. The “Marriage Market” (“Matching” function) (D5)
The marriage market matches individuals from set 1 with individuals from set 2. For each individual i in set 1, following a particular order (defined through an “orderby” parameter):
• a score is computed for all (unmatched) individuals in set 2, and
• the best scoring member of set 2 is chosen for the match with i
Syntax:
    matching(set1filter=boolean_expr,
             set2filter=boolean_expr,
             orderby=<expression>,
             score=coef1 * field1 + coef2 * other.field2 + ...)
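The procedure described above is a greedy matching, which the following Python sketch illustrates. It is not LIAM2's matching() function: the data, the score and the helper signature are hypothetical, and visiting set 1 by decreasing `orderby` value is an assumption of this sketch.

```python
def matching(set1, set2, orderby, score):
    """Greedy matching: visit set1 by decreasing `orderby`, pick the best free partner."""
    available = list(set2)
    pairs = []
    for person in sorted(set1, key=orderby, reverse=True):
        if not available:
            break  # nobody left to match with
        best = max(available, key=lambda other: score(person, other))
        pairs.append((person["id"], best["id"]))
        available.remove(best)  # a matched individual leaves the market
    return pairs

# Hypothetical candidates.
women = [{"id": 1, "age": 30}, {"id": 2, "age": 45}]
men = [{"id": 3, "age": 47}, {"id": 4, "age": 29}, {"id": 5, "age": 33}]

# Hypothetical score: the smaller the age gap, the better the match.
pairs = matching(
    women,
    men,
    orderby=lambda p: p["age"],
    score=lambda p, o: -abs(p["age"] - o["age"]),
)
print(pairs)
```

Because each individual of set 1 chooses among those still unmatched, the order given by `orderby` matters: individuals visited first get the best available partners.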
6. Importing Data Towards HDF5 Format
Demonstrating
7. Advanced Topics
• Arrays
• new & clone
• align abs
• Tips & tricks
• Common mistakes
A Note About “new”, “remove” and “clone” Functions
Entities (persons, households) may need to be created from scratch or removed during the simulation. For example:
• In case of a marriage, the 2 partners form a single household, which means that at least one of them leaves his/her former household
• In case of a birth, a new person is created; when dying, a person is removed from the population
• Sometimes, we may need to create a “clone” of an existing entity
Syntax and examples (NB: links, etc. need to be treated as well):
    new('entity_name'[, filter=expr][, number=value] *set initial values of selected variables*)
    clone(filter=new_born and is_twin, gender=choice([True, False], [0.51, 0.49]))
    remove(dead)
8. Conclusions
Including Implementing a model in LIAM2
Implementing a model in LIAM2: the full path (A-D)
A] Structuring the model, depending on OBJECTIVES, starting e.g. from MIDAS_BE: demography, activity status, tax-benefit => variables, parameters (« GLOBALS ») & alignments needed
B] Building micro data (cf. variables) & macro-based data (cf. alignments & parameters), e.g. from EUROMOD, survey and/or administrative data
C] Building and estimating behavioural equations (e.g. probability of divorce)
D] Running, debugging, outputting, validating