simario A discrete-time dynamic micro-simulation framework for R
Glossary • Micro-unit: • The unit of analysis being simulated, e.g. a child or a patient.
simario features • Perform dynamic simulation, i.e. transform a single set of micro-units • transformations can occur at each iteration • transformations include results from • logistic, binomial, Poisson, negative binomial, and normal regression models with coefficients specified via a file • transformations according to discrete probabilities specified in code or a file • Generate descriptive statistics from the results of each iteration, including frequencies, means, quantiles, and summaries • Statistics can be generated for the whole population, or for subsets • Statistics can be grouped by base variables (i.e. variables that don’t change during the simulation) • Perform multiple simulation runs and average the tracked descriptive statistics across runs • Scenario testing via the modification of simulation variables so that the flow-on effects can be observed • continuous variables can be modified before the simulation begins • categorical variables can be modified before the simulation begins, or during the simulation for specific iterations • Available as an R package
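As a rough illustration of a model-based transformation, the sketch below applies a logistic regression with given coefficients to a set of micro-units. It uses base R only; the coefficient values, the variable names (`age`, `smoker`) and the data are made up for illustration, and in simario the coefficients would come from a file rather than being typed in.

```r
# Hypothetical coefficients (assumed values, not from simario).
coefs <- c("(Intercept)" = -2.0, age = 0.03, smoker = 0.8)

# Hypothetical micro-units.
units <- data.frame(age = c(30, 50, 70), smoker = c(0, 1, 1))

# Design matrix with an intercept column, then the linear predictor.
X <- cbind("(Intercept)" = 1, as.matrix(units[, c("age", "smoker")]))
eta <- as.vector(X %*% coefs[colnames(X)])

# The inverse logit gives each micro-unit's probability of the outcome.
p <- plogis(eta)

# Draw the binary outcome for this iteration.
set.seed(42)
outcome <- rbinom(nrow(units), size = 1, prob = p)
```

The same pattern would apply to the other model families by swapping the inverse link and the random draw (for example `exp()` with `rpois()` for a Poisson model).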
Global environment variables • A simulation will begin by initialising the R global environment • All simulations must have at least one simulation environment (Simenv) • If required, the simulation will also initialise in the global environment: • models • list of generalized linear models. These contain model equations, i.e.: variable names and their coefficients. • propensities • list of propensity arrays used for categorical adjustment
Simulation environment (Simenv) • A simulation environment contains everything required to perform a simulation. • Typically one Simenv will be created and used to run a base simulation, and additional Simenvs will be created to test different scenarios.
Simulation environment variables • name • num_runs_simulated • incremented during the simulation process • dict • the data dictionary for the whole simulation • simframe • A dataframe of variables (input, intermediate, and outcome) used during the simulation • cat.adjustments • adjustments applied to categorical variables before and during the simulation • presim.stats • stats generated after adjustment but before simulation begins. Typically these will be descriptive statistics of input variables that don’t change eg: gender, ethnicity • modules • one or more simulation modules (Simmodule) which contain code and results for a discrete part of the simulation
Data dictionary • Contains variable descriptions and codings, both loaded from a file
Data dictionary variables • descriptions • a vector of variable descriptions, named with variable names • eg: c("age"="Age", "health"="Health status", "earnings"="Earnings to date") • codings • a list of category names for categorical variables • eg: list(health=c("Good health"=1, "Bad health"=2), gender=c("F"=0, "M"=1))
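A minimal sketch of how these two structures can be used together, assuming a hand-built list called `dict` (simario loads the dictionary from a file; the lookup shown here is plain base R, not a simario function):

```r
# Hypothetical data dictionary built by hand.
dict <- list(
  descriptions = c(age = "Age", health = "Health status",
                   earnings = "Earnings to date"),
  codings = list(
    health = c("Good health" = 1, "Bad health" = 2),
    gender = c("F" = 0, "M" = 1)
  )
)

# Translate coded values into category names using the codings.
health_codes <- c(1, 2, 2, 1)
health_labels <- names(dict$codings$health)[match(health_codes, dict$codings$health)]

# Look up the description of a variable by name.
health_description <- dict$descriptions[["health"]]
```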
Categorical adjustments • The user may wish to specify the proportion of category values desired for a simframe variable – pre-simulation and/or for specific iterations • E.g. a user may wish the proportions of the home ownership categories in year 2 to be 0.4, 0.6 • Desired proportions can be specified in a categorical adjustment matrix
Categorical adjustments • In the home ownership example, the variable is not simulated in year 2; instead it is set to the desired proportions 0.4, 0.6 • A desired proportion of NA will leave the variable unchanged • If propensities are supplied they will be used to select which micro-units to adjust, otherwise the selection will be random • Propensities are specified via propensity arrays and stored in the global list variable propensities
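The behaviour described above can be sketched for a binary variable as follows. This is an illustration, not simario's implementation: the matrix layout (rows are iterations, columns are categories), the helper `adjust_binary`, and the reading of 0.6 as the proportion of owners are all assumptions.

```r
# Hypothetical categorical adjustment matrix: rows = iterations, cols = categories.
cat_adj <- matrix(NA_real_, nrow = 3, ncol = 2,
                  dimnames = list(paste("Year", 1:3), c("Not owned", "Owned")))
cat_adj["Year 2", ] <- c(0.4, 0.6)

# Hypothetical helper: flip just enough 0/1 values to reach the desired proportion of 1s.
adjust_binary <- function(x, desired_p1, propensity = NULL) {
  if (is.na(desired_p1)) return(x)  # NA leaves the variable unchanged
  n <- length(x)
  diff <- round(desired_p1 * n) - sum(x == 1)
  if (diff == 0) return(x)
  # Rank micro-units by propensity if supplied, otherwise at random.
  score <- if (is.null(propensity)) runif(n) else propensity
  if (diff > 0) {
    candidates <- which(x == 0)
    chosen <- candidates[order(score[candidates], decreasing = TRUE)][seq_len(diff)]
    x[chosen] <- 1   # highest-scoring 0s become 1
  } else {
    candidates <- which(x == 1)
    chosen <- candidates[order(score[candidates])][seq_len(-diff)]
    x[chosen] <- 0   # lowest-scoring 1s become 0
  }
  x
}

set.seed(1)
owner <- c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0)  # 80% owners before adjustment
adjusted <- adjust_binary(owner, cat_adj["Year 2", "Owned"])   # random selection
unchanged <- adjust_binary(owner, cat_adj["Year 1", "Owned"])  # NA row
by_propensity <- adjust_binary(c(0, 0, 0, 0), 0.5,
                               propensity = c(0.1, 0.9, 0.8, 0.2))
```

Only the micro-units needed to reach the target are changed, which keeps the adjusted variable as close as possible to its simulated values.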
Continuous adjustments • The user may wish to specify values desired for a continuous simframe variable – pre-simulation and/or for specific iterations • Eg: fixed simulation values for mother’s hours worked may be specified for micro-units • Desired final values can be specified in a continuous adjustment matrix; NA will leave the value unchanged
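A small sketch of the NA rule for continuous adjustments, assuming a matrix with one row per micro-unit and one column per iteration (the layout, the helper `apply_cont_adj` and the data are illustrative assumptions):

```r
# Hypothetical simframe variable: mother's hours worked for four micro-units.
hours <- c(20, 35, 0, 40)

# Hypothetical continuous adjustment matrix: rows = micro-units, cols = iterations.
cont_adj <- matrix(NA_real_, nrow = 4, ncol = 2,
                   dimnames = list(NULL, c("Year 1", "Year 2")))
cont_adj[2, "Year 1"] <- 10
cont_adj[4, "Year 1"] <- 25

# Use the desired value where one is given; NA keeps the existing value.
apply_cont_adj <- function(x, desired) ifelse(is.na(desired), x, desired)

hours_year1 <- apply_cont_adj(hours, cont_adj[, "Year 1"])
hours_year2 <- apply_cont_adj(hours, cont_adj[, "Year 2"])
```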
Propensity arrays • consist of: • rows - the individual micro-units • cols - categories, with one less column than the total number of categories • z dim - iterations/years
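The shape described above can be built with base R's `array()`. In this sketch the category names, the random values, and the choice to omit the last category are assumptions made for illustration.

```r
n_units <- 4
categories <- c("Good", "Fair", "Poor")
n_iter <- 3

set.seed(1)
propensity_array <- array(
  runif(n_units * (length(categories) - 1) * n_iter),
  dim = c(n_units, length(categories) - 1, n_iter),
  dimnames = list(NULL,
                  categories[-length(categories)],  # one less column than categories
                  paste("Year", seq_len(n_iter)))
)

# Propensities for all micro-units in year 2 (a units x categories matrix).
year2 <- propensity_array[, , "Year 2"]

# Stored in a named list, mirroring the global list variable propensities.
propensities <- list(health = propensity_array)
```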
Simulation modules (Simmodule) • A simulation module is the core of a simulation, where all the work is done • It contains the code and output for a distinct set of results, eg: health outcomes for years 1 to 10.
Simmodule functions • simulateRun • transforms the simframe through multiple iterations. Produces outcomes at each iteration. • appendRunStats • generates run stats, eg: frequencies, means, quantiles, and summaries for any outcome and iteration. Stores them with stats from previous runs. • collateRunStats • calculates the mean of run stats over multiple runs and prepares results for display by adding column names etc. • These functions will be explained in more detail later
Simmodule variables • outcomes • results of simulateRun for the most recently simulated run. • overwritten by subsequent runs. • a list of outcome matrices • runstats • results of appendRunStats • a list of run stats. Each element contains runstats for all runs. • runstats.collated • results of collateRunStats • Each element contains runstats averaged across all runs
Simframe • A dataframe of variables (input, intermediate, and outcome) used during the simulation • Input variables are those that are not transformed • Intermediate variables are transformed during simulation iterations, but not recorded for output • Outcome variables are transformed during simulation iterations, and are stored in outcome matrices for generation of run stats after each run • Each variable is a vector that contains values for all micro-units
The master simframe • The master simframe is initialised from a simframe definition file. • Each simframe variable is populated with initial values, as specified by the definition file. • Initial values may come from the global environment, or from a supplied dataframe, eg: a dataframe loaded from a base file • The master simframe is typically stored in the global variable simframe.master • After loading, the master simframe is not modified
Simframe definition file • Varname • the name of a variable in the simframe • Previous_var • the name of a variable in which to store the current value at the beginning of each iteration (i.e. before it's transformed). • Optional - for models that require previous state. • Initial_value • an expression that generates the initial value of the variable. Typically this expression will reference values in a previously loaded basefile. • Outcome_type • if specified, indicates this is an outcome variable and gives its type, which is one of “categorical” or “continuous” • Outcome_module • if specified, indicates the Simmodule this outcome variable belongs to
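To make the role of each column concrete, here is a sketch that builds a simframe from a small in-memory definition table. The base file contents, variable names and module names are invented, and evaluating `Initial_value` with `eval(parse(...))` is an assumed approach rather than a description of how simario does it.

```r
# Hypothetical base file.
basefile <- data.frame(sex = c(0, 1, 1), birth_weight = c(3.1, 2.9, 3.6))

# Hypothetical simframe definition, one row per simframe variable.
sfdef <- data.frame(
  Varname        = c("gender", "weight", "health"),
  Previous_var   = c("", "", "health_previous"),
  Initial_value  = c("sex", "birth_weight * 1000", "rep(1, length(sex))"),
  Outcome_type   = c("", "continuous", "categorical"),
  Outcome_module = c("", "growth", "health"),
  stringsAsFactors = FALSE
)

# Evaluate each Initial_value expression with the base file columns in scope.
initial <- lapply(sfdef$Initial_value,
                  function(expr) eval(parse(text = expr), envir = basefile))
names(initial) <- sfdef$Varname
simframe <- as.data.frame(initial)

# Add the requested "previous" variables, empty until the first iteration.
for (pv in sfdef$Previous_var[sfdef$Previous_var != ""]) {
  simframe[[pv]] <- NA
}

# Outcome variables are those with an Outcome_type.
outcome_vars <- sfdef$Varname[sfdef$Outcome_type != ""]
```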
One simframe per environment • When a simulation environment is created, it takes a copy of the master simframe • Before simulating, an environment’s simframe may be modified to test a particular scenario
The simulation process • Categorical adjustments for iteration 1 are applied to the simframe. The adjustments applied are specified in the cat.adjustments variable. • Other adjustments that might need to be performed, such as adjustments to continuous variables, can be done outside simario by the user before simulation. • Pre-simulation stats are generated • Run loop • During each run, the following functions are called on each module • simulateRun • appendRunStats • Final results are calculated by each module across all runs (using the calcFinalResults function)
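The run loop can be sketched in base R as below. Everything here is hypothetical scaffolding: the module is a plain list, the functions `simulate_run`, `append_run_stats` and `collate_run_stats` are stand-ins written for this example, and the "simulation" just adds noise to one variable.

```r
# Hypothetical module: a list holding outcomes and run stats.
make_module <- function() {
  list(outcomes = NULL, runstats = list(), runstats.collated = NULL)
}

# Stand-in simulation: three iterations of a noisy score per micro-unit.
simulate_run <- function(module, simframe) {
  module$outcomes <- list(
    score = sapply(1:3, function(i) simframe$score + rnorm(nrow(simframe)))
  )
  module
}

# Record the mean score per iteration for this run.
append_run_stats <- function(module) {
  module$runstats <- c(module$runstats, list(colMeans(module$outcomes$score)))
  module
}

# Average the recorded run stats across all runs.
collate_run_stats <- function(module) {
  module$runstats.collated <- Reduce(`+`, module$runstats) / length(module$runstats)
  module
}

set.seed(1)
simframe <- data.frame(score = c(10, 20, 30))
modules <- list(health = make_module())
num_runs <- 5

for (run in seq_len(num_runs)) {
  modules <- lapply(modules, function(m) append_run_stats(simulate_run(m, simframe)))
}
modules <- lapply(modules, collate_run_stats)
```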
simulateRun • Simulation involves calling simulateRun() for each Simmodule • The simulateRun() function uses a local copy of the simframe, copied from the environment’s simframe. The environment’s simframe is not modified. This is so that each run will start with the same simframe. • At the beginning of each iteration, the current values of specified simframe variables are stored in corresponding “previous” variables. (This optional feature can be used by models that rely on previous state information to generate the current state). • During the iteration, simframe variables are transformed by transition probabilities, models or set to desired categorical adjustment proportions • At the end of the iteration, simframe outcome variables are stored in outcome matrices • A list containing all outcome matrices is stored in the outcomes variable of the Simmodule
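The steps above can be sketched as a single function. This is an illustration under assumptions, not simario's `simulateRun`: the function name, its arguments, and the toy `step` transformation are all invented for the example.

```r
simulate_run_sketch <- function(simframe, n_iter, previous_vars, outcome_vars, step_fn) {
  sf <- simframe  # local copy; the caller's simframe is never modified

  # One outcome matrix per outcome variable: rows = micro-units, cols = iterations.
  outcomes <- lapply(outcome_vars, function(v) {
    matrix(NA_real_, nrow = nrow(sf), ncol = n_iter,
           dimnames = list(NULL, paste("Iteration", seq_len(n_iter))))
  })
  names(outcomes) <- outcome_vars

  for (i in seq_len(n_iter)) {
    # Store current values in the corresponding "previous" variables.
    for (v in names(previous_vars)) sf[[previous_vars[[v]]]] <- sf[[v]]
    # Transform the simframe for this iteration.
    sf <- step_fn(sf, i)
    # Record outcome variables at the end of the iteration.
    for (v in outcome_vars) outcomes[[v]][, i] <- sf[[v]]
  }
  outcomes
}

# Toy example: earnings grow by 100 per iteration from the previous value.
env_simframe <- data.frame(earnings = c(0, 50), earnings_previous = NA_real_)
step <- function(sf, i) {
  sf$earnings <- sf$earnings_previous + 100
  sf
}
outcomes <- simulate_run_sketch(env_simframe, n_iter = 2,
                                previous_vars = c(earnings = "earnings_previous"),
                                outcome_vars = "earnings", step_fn = step)
```

Because R copies data frames on modification, working on `sf` inside the function leaves `env_simframe` untouched, so every run starts from the same state.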
[Figure: simulateRun flow. Initial state: environment simframe; outcome matrices created. Iteration 1: store values in outcomes. Iteration 2: store values in outcomes.]
appendRunStats • At the end of each run a set of run stats is calculated for outcomes • A run stat is any value from any outcome that you wish to record and track across multiple runs • Typically a run stat is an aggregate value calculated across each iteration (eg: a mean for each iteration) although it could be a result from a specific iteration • Examples of run stats include: • a single value, e.g. mean • a vector, e.g. frequencies, quantiles, summary • a matrix, e.g. 2 way table • Run stat functions are available to produce • frequency tables (for both categorical and continuous variables) • means, summaries (i.e. the R summary function), quantiles
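As an illustration of run stats computed per iteration, the sketch below derives means and frequency tables from outcome matrices using base R. The helper names and the data are assumptions; simario's own run stat functions are not shown here.

```r
# Hypothetical outcome matrices: rows = micro-units, cols = iterations.
earnings <- matrix(c(100, 200, 300,
                     150, 250, 350), nrow = 3,
                   dimnames = list(NULL, c("Year 1", "Year 2")))
health <- matrix(c(1, 2, 2,
                   1, 1, 2), nrow = 3,
                 dimnames = list(NULL, c("Year 1", "Year 2")))

# Mean for each iteration (a vector with one value per iteration).
mean_by_iteration <- function(outcome) colMeans(outcome)

# Frequencies for each iteration (a matrix: iterations x categories).
freq_by_iteration <- function(outcome, categories) {
  t(apply(outcome, 2, function(col) table(factor(col, levels = categories))))
}

earnings_means <- mean_by_iteration(earnings)
health_freqs <- freq_by_iteration(health, categories = c(1, 2))

# Append this run's stats to those kept from previous runs.
runstats <- list(means = list())
runstats$means <- c(runstats$means, list(earnings_means))
```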
collateRunStats • Prepares run stats for averaging by first transforming, e.g.: • Turning frequencies into percentages • Removing unwanted results • Labelling columns • Takes the average of run stats across all runs
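A minimal sketch of this collation step, assuming frequency tables from two runs with iterations in rows and categories in columns (the numbers and the helper `to_percent` are made up for illustration):

```r
# Hypothetical frequency run stats from two runs.
run1 <- matrix(c(6, 5, 4, 5), nrow = 2,
               dimnames = list(c("Year 1", "Year 2"), c("Good health", "Bad health")))
run2 <- matrix(c(8, 7, 2, 3), nrow = 2, dimnames = dimnames(run1))
runstats <- list(run1, run2)

# Turn frequencies into row percentages.
to_percent <- function(freq) 100 * prop.table(freq, margin = 1)

# Average the transformed run stats element by element across runs.
collated <- Reduce(`+`, lapply(runstats, to_percent)) / length(runstats)
```

Converting to percentages before averaging means each run contributes equally, regardless of how many micro-units fall in each row.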