LIAM2*. Gijs Dekkers Federal Planning Bureau and Katholieke Universiteit Leuven Slides made by Gaëtan de Menten. A sneak preview. *Life-cycle Income Analysis Model.
LIAM2* Gijs Dekkers Federal Planning Bureau and Katholieke Universiteit Leuven Slides made by Gaëtan de Menten A sneak preview... *Life-cycle Income Analysis Model Paper presented at the « Treasury Brown Bag Lunch Meeting », Ministero dell'Economia e delle Finanze, Rome, February 14th, 2011
LIAM 2: the foundations • LIAM by Cathal O’Donoghue • Used in the AIM project, developing MIDAS for Belgium, Italy and Germany. • Updating, extending and considerable problem solving by Geert Bryon (FPB) • PROGRESS-MiDaL project (Grant VS/2009/0569) • FPB (Be): development, application and testing • Gaëtan de Menten: development • Geert Bryon: application and testing • Gijs Dekkers: management, data and some application • CEPS/INSTEAD (Lux): testing • IGSS (Lux): investment, testing • Cathal O’Donoghue (Teagasc, Ire), Howard Redway (Ministry of Work and Pensions, UK): comments and conceptual assistance
Overview of this sneak preview • Current features • Current performance • Demonstration • TODO(?)
Current features • Input • Simulation: a text file • Alignment: CSV files • Initial field data: an hdf5 file • Output • hdf5 file • Converters • Old data format (Tab-separated text files) <-> hdf5
The setup of a model
constants
    per_period:
        - PARAMETER-NAME: float
entities
    entity-name1 (e.g. Household):
        fields
        links
        processes
    entity-name2 (e.g. Person):
        fields
        links
        processes
simulation:
    init:
    processes:
        - entity: [list of processes, separated by commas]
    input:
        path: "path name"
        file: "file name.h5"
    output:
        path: "path name"
        file: "file name.h5"
    start_period:
    periods:
Current features • Language • Python • High level, concise, readable, easy interface with C • Lots of third-party libraries (especially scientific tools) • But uses some efficient (open source) libraries written mostly in C • NumPy • Numexpr • PyTables
Current features • Can declare “fields” with a type • float, int, bool • Evaluate simple expressions • Arithmetic operators: +, -, *, /, **, % • 0.51 * age + 0.023 * age ** 2 - 0.0012 * age ** 3 • Comparison operators: <, <=, ==, !=, >=, > • age < 20 • Boolean operators: and, or, not • not male and (age >= 15) and (age <= 50) • Conditional expressions: if(condition, iftrue, iffalse) • if(age < 65, earnings, pension)
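To make the expression examples above concrete, here is a minimal plain-Python sketch of how they could be evaluated per individual. It is not LIAM2 code: the sample data and the helper name `if_` are assumptions made for illustration only.

```python
# Hypothetical per-person data; the field names mirror the slide's examples.
ages = [12, 30, 70]
males = [False, False, True]
earnings = [0.0, 2500.0, 0.0]
pensions = [0.0, 0.0, 1200.0]


def if_(condition, iftrue, iffalse):
    """Element-wise stand-in for if(condition, iftrue, iffalse)."""
    return [t if c else f for c, t, f in zip(condition, iftrue, iffalse)]


# age < 20
is_young = [age < 20 for age in ages]

# not male and (age >= 15) and (age <= 50)
in_range = [(not m) and (a >= 15) and (a <= 50) for m, a in zip(males, ages)]

# if(age < 65, earnings, pension)
income = if_([a < 65 for a in ages], earnings, pensions)
```

Each expression yields one value per individual, which is why a condition and both branches of `if` must have the same length here.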
Current features • Store fields • for each period (if the field is declared) • age: "age + 1" • as temporaries (the value is lost after each period) • ischild: "age < 18" • Macros (re-evaluated wherever they appear) • ISCHILD: "age < 18" • difference with temporaries: • ischild: "age < 18" • before1: "if(ischild, 1, 2)" • before2: "if(ISCHILD, 1, 2)" # before1 == before2 • age: "age + 1" • after1: "if(ischild, 1, 2)" • after2: "if(ISCHILD, 1, 2)" # after1 != after2 !! # after1 == before1
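The difference between a temporary and a macro can be mimicked in plain Python: a temporary behaves like a stored value, a macro like a function that is called again at each use. This is only an analogy under that assumption, with one hypothetical person aged 17.

```python
# One hypothetical person aged 17 at the start of the period.
person = {"age": 17}

# Temporary: evaluated once, and the resulting value is stored.
ischild = person["age"] < 18


# Macro: the expression itself is kept and re-evaluated at every use.
def ISCHILD():
    return person["age"] < 18


before1 = 1 if ischild else 2
before2 = 1 if ISCHILD() else 2

person["age"] += 1  # age: "age + 1"

after1 = 1 if ischild else 2    # still based on the stored value
after2 = 1 if ISCHILD() else 2  # re-evaluated with the new age
```

After the age update, `after1` keeps the value computed before the birthday, while `after2` reflects the new age.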
Current features • Functions • Per individual • abs, log, exp • clip • 0.25 * clip(age ** 3, 0, 100000) • round • round(age / 10.0, 2) • min/max • min(age, 99) • max(pension, benefit)
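As a rough illustration of the per-individual functions, the sketch below reproduces the slide's examples in plain Python. The `clip` helper and the sample ages are assumptions; the real functions operate on whole fields rather than Python lists.

```python
def clip(value, lower, upper):
    """Limit value to the interval [lower, upper]."""
    return max(lower, min(value, upper))


ages = [10, 45, 104]  # hypothetical sample

# 0.25 * clip(age ** 3, 0, 100000)
clipped = [0.25 * clip(age ** 3, 0, 100000) for age in ages]

# round(age / 10.0, 2)
rounded = [round(age / 10.0, 2) for age in ages]

# min(age, 99)
capped = [min(age, 99) for age in ages]
```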
Current features • Functions • Aggregate functions • grpcount, grpsum, grpavg, grpstd, grpmax, grpmin • abs(age - grpavg(age)) • Normal: random numbers with a normal distribution • normal(loc=0.0, scale=grpstd(errsal)) • Some functions accept a filter argument • abs(age - grpavg(age, filter=male), filter=not male)
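The filter argument can be read as "compute the aggregate only over the selected individuals". The sketch below shows that reading in plain Python. The helper signature and the use of `None` for individuals outside the outer filter are assumptions, not LIAM2 behaviour taken from the slides.

```python
from statistics import mean

ages = [20, 30, 40, 50]            # hypothetical sample
males = [True, False, True, False]


def grpavg(values, filter_=None):
    """Average over everyone, or only over individuals where filter_ is True."""
    if filter_ is None:
        return mean(values)
    return mean(v for v, keep in zip(values, filter_) if keep)


# abs(age - grpavg(age))
overall_avg = grpavg(ages)
dev_all = [abs(a - overall_avg) for a in ages]

# abs(age - grpavg(age, filter=male), filter=not male)
male_avg = grpavg(ages, filter_=males)
dev_women = [abs(a - male_avg) if not m else None for a, m in zip(ages, males)]
```

The aggregate is a single number that is then combined with each individual's own value.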
Current features • lag/value_for_period • Only simple expressions and explicitly saved aggregates for now • value_for_period(inwork and not male, 2002) • lag(sum_twr) • matching: match two sets of individuals (aka Marriage market) • matches individuals from set 1 with individuals from set 2 • follows a particular order (given by an expression) • for each individual in set 1, computes the score of all (unmatched) individuals in set 2 and takes the best-scoring one • matching(set1filter=to_marry and not male, set2filter=to_marry and male, orderby=difficult_match)
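The matching procedure described above can be sketched as a greedy loop. This is a minimal illustration, not the LIAM2 implementation: the score function (negative age difference), the sample data, and the choice to process the highest `difficult_match` values first are all assumptions.

```python
def matching(set1, set2, orderby, score):
    """Greedy matching; returns {id of individual in set 1: id of match in set 2}."""
    unmatched = list(set2)
    pairs = {}
    # Process set 1 in the order given by `orderby` (largest value first, by assumption).
    for person in sorted(set1, key=orderby, reverse=True):
        if not unmatched:
            break
        # Score every still-unmatched candidate and keep the best one.
        best = max(unmatched, key=lambda candidate: score(person, candidate))
        pairs[person["id"]] = best["id"]
        unmatched.remove(best)
    return pairs


women = [
    {"id": 1, "age": 25, "difficult_match": 0.2},
    {"id": 2, "age": 40, "difficult_match": 0.9},
]
men = [
    {"id": 10, "age": 27},
    {"id": 11, "age": 38},
    {"id": 12, "age": 60},
]

pairs = matching(
    women,
    men,
    orderby=lambda p: p["difficult_match"],
    score=lambda a, b: -abs(a["age"] - b["age"]),  # hypothetical score
)
```

In this sketch every individual in set 1 scans all remaining candidates in set 2, so the work grows with the product of the two set sizes.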
Current features • Many-to-one links • partner.age • grpavg(partner.age - age) • partner.father.age • partner.get(earnings + benefits)
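A many-to-one link can be thought of as an id stored on each individual that points to another individual. The sketch below follows such ids in plain Python; the `*_id` field names, the `follow` helper, and the sample data are assumptions for illustration.

```python
# Hypothetical persons keyed by id; links are stored as ids of other persons.
persons = {
    1: {"age": 34, "partner_id": 2, "father_id": 3, "earnings": 2000.0, "benefits": 0.0},
    2: {"age": 36, "partner_id": 1, "father_id": 4, "earnings": 1500.0, "benefits": 300.0},
    3: {"age": 61, "partner_id": None, "father_id": None, "earnings": 0.0, "benefits": 900.0},
    4: {"age": 70, "partner_id": None, "father_id": None, "earnings": 0.0, "benefits": 1100.0},
}


def follow(person_id, *links):
    """Follow a chain of many-to-one links; returns the target person or None."""
    current = persons.get(person_id)
    for link in links:
        if current is None:
            return None
        current = persons.get(current[link + "_id"])
    return current


# partner.age and partner.father.age, seen from person 1
partner_age = follow(1, "partner")["age"]
partner_father_age = follow(1, "partner", "father")["age"]

# partner.get(earnings + benefits): evaluate the expression on the linked person
partner = follow(1, "partner")
partner_income = partner["earnings"] + partner["benefits"]

# grpavg(partner.age - age), restricted here to persons who have a partner
age_gaps = [
    follow(pid, "partner")["age"] - p["age"]
    for pid, p in persons.items()
    if p["partner_id"] is not None
]
avg_age_gap = sum(age_gaps) / len(age_gaps)
```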
Current features • One-to-many links • countlink(link, filter) • countlink(persons) • countlink(children, age < 18) • sumlink(link, expr, filter) • sumlink(persons, earnings, age >= 18) • avglink(link, expr, filter) • avglink(children, age, not male) • minlink/maxlink(link, expr, filter) • minlink(children, age, not male)
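One-to-many links aggregate over all individuals attached to a target, for example all persons in a household. The following plain-Python sketch assumes a hypothetical `hh_id` field on each person and uses the household's persons as the link in every call.

```python
# Hypothetical persons, each pointing to a household through hh_id.
persons = [
    {"id": 1, "hh_id": 1, "age": 40, "earnings": 3000.0},
    {"id": 2, "hh_id": 1, "age": 38, "earnings": 1000.0},
    {"id": 3, "hh_id": 1, "age": 10, "earnings": 0.0},
    {"id": 4, "hh_id": 2, "age": 70, "earnings": 0.0},
]


def linked(hh_id, filter_=None):
    """Persons of one household, optionally restricted by a filter."""
    return [
        p for p in persons
        if p["hh_id"] == hh_id and (filter_ is None or filter_(p))
    ]


def countlink(hh_id, filter_=None):
    return len(linked(hh_id, filter_))


def sumlink(hh_id, expr, filter_=None):
    return sum(expr(p) for p in linked(hh_id, filter_))


def avglink(hh_id, expr, filter_=None):
    members = linked(hh_id, filter_)
    if not members:
        return None  # no linked individuals pass the filter
    return sum(expr(p) for p in members) / len(members)


n_persons = countlink(1)                                   # countlink(persons)
n_minors = countlink(1, lambda p: p["age"] < 18)           # countlink(..., age < 18)
adult_earnings = sumlink(1, lambda p: p["earnings"],
                         lambda p: p["age"] >= 18)         # sumlink(persons, earnings, age >= 18)
avg_minor_age = avglink(1, lambda p: p["age"], lambda p: p["age"] < 18)
```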
Current features • Regressions • Logit • logit_regr(expr, filter, align) • Continuous (expr + normal(0, 1) * mult + error) • cont_regr(expr, filter, align, mult, error_var) • Clipped continuous (always positive) • clip_regr(expr, filter, align, mult, error_var) • Log continuous (exponential of continuous) • log_regr(expr, filter, align, mult, error_var) • Alignment • Fixed percentage or 2 dimensional table in a csv file
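One way to picture a logit regression combined with a fixed-percentage alignment is: compute a score per individual, then select the share of individuals with the highest scores. The sketch below follows that reading. It is an assumption about the general technique, not a description of how `logit_regr` is implemented, and the coefficients and sample ages are invented.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def align_fixed(scores, proportion):
    """Flag the `proportion` of individuals with the highest scores."""
    n_selected = round(len(scores) * proportion)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:n_selected])
    return [i in chosen for i in range(len(scores))]


ages = [20, 35, 50, 65, 80]                              # hypothetical sample
scores = [logistic(-4.0 + 0.05 * age) for age in ages]   # hypothetical coefficients
selected = align_fixed(scores, 0.4)                      # align on a fixed 40%
```

With alignment, the number of selected individuals is fixed by the target percentage, and the scores only decide who is selected.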
Current features • Lifecycle functions • new: create new individuals • new('person', filter=to_give_birth) • remove: remove individuals from the dataset • remove(dead) • remove(nb_persons == 0) • Miscellaneous functions • show: print anything to the console • show(grpcount(age >= 18)) • show(grpcount(not dead), grpavg(age, filter=not dead))
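The lifecycle functions can be pictured as operations that grow or shrink the list of individuals. The plain-Python sketch below is an illustration only: the initial values given to newborns and the `mother_id` field are assumptions.

```python
persons = [
    {"id": 1, "age": 30, "to_give_birth": True, "dead": False},
    {"id": 2, "age": 85, "to_give_birth": False, "dead": True},
    {"id": 3, "age": 28, "to_give_birth": False, "dead": False},
]


def new(population, filter_):
    """Add one new individual for every individual selected by filter_."""
    next_id = max(p["id"] for p in population) + 1
    newborns = []
    for parent in population:
        if filter_(parent):
            newborns.append({
                "id": next_id,
                "age": 0,
                "to_give_birth": False,
                "dead": False,
                "mother_id": parent["id"],  # hypothetical link to the parent
            })
            next_id += 1
    return population + newborns


def remove(population, filter_):
    """Drop every individual selected by filter_."""
    return [p for p in population if not filter_(p)]


persons = new(persons, lambda p: p["to_give_birth"])  # new('person', filter=to_give_birth)
persons = remove(persons, lambda p: p["dead"])        # remove(dead)
remaining_ids = [p["id"] for p in persons]
```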
Current features • Miscellaneous functions • dump: produce a table with the expressions given as argument • show(dump(age, age / 10, filter=id < 20)) • groupby (aka “pivot table”): group individuals by their value for the given expressions, and optionally compute an expression for each group • show(groupby((age / 10, gender))) • show(groupby((agegroup, gender, inwork), grpcount())) • show(groupby(agegroup, grpavg(income))) • show(groupby((inwork, gender), id, filter=age < 10)) • csv: write a table to a csv file • csv(dump(age, age / 10, gender), suffix='age') • Show: interactive assessment of results: command line
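The groupby function behaves like a pivot table: individuals are grouped by a key, and each group is either counted or summarised by an expression. A minimal plain-Python sketch, with a hypothetical sample and a hypothetical definition of `agegroup` as the decade of age:

```python
from collections import defaultdict

persons = [
    {"age": 5, "gender": "F", "income": 0.0},
    {"age": 34, "gender": "M", "income": 2000.0},
    {"age": 37, "gender": "F", "income": 3000.0},
    {"age": 62, "gender": "M", "income": 1500.0},
]


def agegroup(person):
    """Hypothetical age group: the decade the person is in."""
    return 10 * (person["age"] // 10)


def groupby(rows, key, expr=None):
    """Group rows by key; count each group, or apply expr to its members."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    if expr is None:
        return {k: len(members) for k, members in groups.items()}
    return {k: expr(members) for k, members in groups.items()}


# groupby((agegroup, gender)): number of individuals per cell
counts = groupby(persons, lambda p: (agegroup(p), p["gender"]))

# groupby(agegroup, grpavg(income)): average income per age group
avg_income = groupby(
    persons,
    agegroup,
    lambda members: sum(m["income"] for m in members) / len(members),
)
```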
Current Performance • For a simple model: • birth (using alignment data from MIDAS) • chronically ill (using a fixed percentage alignment) • marriage market • earnings (using macro alignment) • Or at least what I think macro alignment is... • death (using alignment data from MIDAS)
Current Performance • 10,000 persons, 20 periods • 2.65 s (on a Dell Latitude laptop computer) • 100,000 persons, 20 periods • 29 s • 1,000,000 persons, 20 periods • 16 minutes 31 s, of which approx. 83% is spent in the marriage market • ~180 MB RAM • 897 MB output file • could be compressed if needed • For a complete model with 100,000 persons • probably under 10 min
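The timings above can be put side by side with a little arithmetic. The sketch below only reuses the figures from the slide; the variable names are illustrative.

```python
# Run times in seconds for 20 periods, as reported on the slide.
timings = {
    10_000: 2.65,
    100_000: 29.0,
    1_000_000: 16 * 60 + 31,  # 16 minutes 31 s
}

total_large = timings[1_000_000]

# Approx. 83% of the largest run is spent in the marriage market.
marriage_seconds = 0.83 * total_large
other_seconds = total_large - marriage_seconds

# How much longer each run takes when the population grows tenfold.
growth_small = timings[100_000] / timings[10_000]
growth_large = timings[1_000_000] / timings[100_000]
```

The first tenfold increase in population multiplies the run time by about 11, the second by about 34, and roughly 823 of the 991 seconds of the largest run go to the marriage market. This is consistent with the marriage market scaling worse than linearly, though the slides do not state the cause.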
TODO • Automated tests (aka “unit tests”) • Documentation • User manual • Code • Speed optimizations • Clean up the code