Do we want a Synthetic or Cross-sectional population approach to simulation modeling?
Overview
• Opening questions
• Simulation approaches
• Advantages and challenges
• Example: OA prevalence
• Example: BMI
• How to convert POHEM-OA from the CCHS approach to the synthetic approach?
• How to convert CRM cancer models from the synthetic approach to the CCHS approach?
• Discussion
Opening Questions
• Do we want a synthetic or a cross-sectional population approach to simulation modeling?
• Do we believe that a model must fit historical data to be useful? To be believable? (e.g., smoking and lung cancer)
• If synthetic:
  • Do we insist on reproducing the (joint) distributions found in cross-sectional surveys such as the CCHS 2001 population?
  • How high do we want to set the bar? How closely do we need to represent the individuals in the CCHS to claim success (e.g., age × sex × province × BMI × smoking × HUI × CC × …)?
  • How do we structure validation/calibration so that it succeeds? (incrementally)
• If the population is initialized cross-sectionally (CCHS):
  • Are we comfortable imputing histories (e.g., smoking)?
  • Do we feel more comfortable with our projections?
• Is one approach or the other more easily communicated to decision-makers? To other researchers?
• Which approach has more face validity, and so could lead to wider acceptance and use?
Simulation Approaches
• Synthetic birth cohort
  • Early POHEM cancer and HD models
• Synthetic multi-cohort from birth
  • Lifepaths, POHEM cancer screening
  • Cancer Risk Management Model
• CCHS multi-cohort from cross-section
  • POHEM-OA++
  • POHEM-PA++ (PHAC)
  • POHEM-OBESITY (PHAC) (proposal)
  • POHEM-HEBIC (PHAC) (proposal) STAR
Synthetic Birth Cohort
[Diagram: cohort followed from age 0, characterized by sex and province]
Models that use this approach: early POHEM lung and breast cancer models
• Captures realistic, recent distributions of risk factors
• No attempt to reproduce the joint percentages in CCHS 2001
Outcomes: lifetime costs, average duration in state, life expectancy (LE)
What-if scenario outcomes: change in LE
Sub-models:
• Initialize risk factors (e.g., at age 15)
• Risk factors change as the person ages
  • e.g., joint distribution of cholesterol × blood pressure × diabetes × BMI
• Incidence by age and sex
• Progression (survival)
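A minimal sketch of the synthetic birth-cohort idea: every person starts at age 0 with a sex and province, acquires a risk factor at age 15, and faces age-specific mortality; LE is the mean age at death, and a what-if scenario changes risk-factor prevalence. All parameters (Gompertz-style mortality, a relative risk of 1.8, 30% vs. 20% prevalence) are hypothetical, and this is not POHEM's implementation. Reusing the same seed for person i in both runs (common random numbers) makes the change in LE reflect the scenario rather than Monte Carlo noise.

```python
import random
from dataclasses import dataclass

# Hypothetical parameters for illustration only; they are not POHEM values.
BASE_MORTALITY = 0.0002   # annual death probability at age 0
MORTALITY_GROWTH = 1.09   # Gompertz-like increase in mortality per year of age
MALE_MULTIPLIER = 1.3     # extra mortality for males
RISK_AGE = 15             # age at which the risk factor is initialized
RISK_RELATIVE_RISK = 1.8  # mortality relative risk once the risk factor is present
MAX_AGE = 110
PROVINCES = ("NL", "PE", "NS", "NB", "QC", "ON", "MB", "SK", "AB", "BC")


@dataclass
class Person:
    sex: str
    province: str          # carried as an attribute; unused by this toy hazard
    at_risk: bool = False


def death_probability(age: int, person: Person) -> float:
    p = BASE_MORTALITY * MORTALITY_GROWTH ** age
    if person.sex == "M":
        p *= MALE_MULTIPLIER
    if person.at_risk:
        p *= RISK_RELATIVE_RISK
    return min(p, 1.0)


def simulate_life(person: Person, risk_prevalence: float, rng: random.Random) -> int:
    """Follow one synthetic person from age 0; return completed years of life."""
    risk_draw = rng.random()  # drawn up front so every scenario consumes the same numbers
    for age in range(MAX_AGE):
        if age == RISK_AGE:
            person.at_risk = risk_draw < risk_prevalence
        if rng.random() < death_probability(age, person):
            return age
    return MAX_AGE


def life_expectancy(n: int, risk_prevalence: float, seed: int = 2001) -> float:
    """Mean completed years of life in a synthetic birth cohort of size n."""
    total = 0
    for i in range(n):
        rng = random.Random(seed + i)  # common random numbers: person i is identical across scenarios
        person = Person(sex=rng.choice("MF"), province=rng.choice(PROVINCES))
        total += simulate_life(person, risk_prevalence, rng)
    return total / n


baseline = life_expectancy(5_000, risk_prevalence=0.30)
scenario = life_expectancy(5_000, risk_prevalence=0.20)  # what-if: fewer people acquire the risk factor
print(f"Baseline LE: {baseline:.1f}  Change in LE: {scenario - baseline:+.2f} years")
```

With common random numbers, lowering prevalence can only delay each individual's death in this toy model, so the estimated change in LE is never negative.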
Synthetic Multi-cohort
[Diagram: year of birth vs. age, with projected births and projected immigration]
Models that use this approach:
• Lifepaths
• Early POHEM-CRC, preventive tamoxifen
• Cancer Risk Management Model
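A minimal sketch of how a synthetic multi-cohort population is assembled: nobody is read from a survey; each person enters either by birth (age 0 in the current year) or by immigration (at an age drawn from an assumed distribution), so several birth cohorts coexist in any simulated year. The function name, the toy birth and immigration projections, and the flat 1% mortality are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Person:
    birth_year: int
    immigrant: bool


def build_multicohort(
    start_year: int,
    end_year: int,
    projected_births: Callable[[int], int],
    projected_immigration: Callable[[int], int],
    immigrant_ages: Sequence[int],
    death_prob: Callable[[int], float],
    seed: int = 1,
) -> Tuple[List[Person], Dict[int, int]]:
    """Grow a population in which every person enters by birth or by immigration.

    Returns the survivors at end_year and the population size at the end of each year.
    """
    rng = random.Random(seed)
    alive: List[Person] = []
    size_by_year: Dict[int, int] = {}
    for year in range(start_year, end_year + 1):
        # Births enter at age 0, so their year of birth is the current year.
        alive.extend(Person(year, False) for _ in range(projected_births(year)))
        # Immigrants enter at an age drawn from a (hypothetical) age distribution.
        for _ in range(projected_immigration(year)):
            alive.append(Person(year - rng.choice(immigrant_ages), True))
        # Apply one year of age-specific mortality.
        alive = [p for p in alive if rng.random() >= death_prob(year - p.birth_year)]
        size_by_year[year] = len(alive)
    return alive, size_by_year


# Toy inputs: 100 births and 20 immigrants (aged 20-40) per year, 1% annual mortality.
people, sizes = build_multicohort(
    1950, 2001,
    projected_births=lambda y: 100,
    projected_immigration=lambda y: 20,
    immigrant_ages=range(20, 41),
    death_prob=lambda age: 0.01,
)
```

In a real model the start year must be early enough that everyone alive in the first reporting year (e.g., 2001) has been born or has immigrated inside the simulation.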
CCHS
[Diagram: year of birth vs. age; the population starts from the CCHS cross-section, with projected births and projected immigration]
Coherent set of variables for each individual: age, sex, province, education, income, BMI value, smoking, HUI, SROA, …
Consider BMI:
• Initial value of BMI: self-reported in the CCHS
• Projecting BMI: models of change in BMI in individuals, estimated from the NPHS (1996–2004)
Models that use this approach:
• POHEM-OA (BMI, CCORT-AMI, DPORT-Diabetes)
• POHEM-PA (BMI, HD, Cancer, Diabetes, Mortality)
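A minimal sketch of the cross-sectional alternative, assuming each survey record carries a survey weight: simulated people are drawn from the records in proportion to their weights, so each one inherits the record's full, coherent set of variables, and BMI is then projected year by year. `project_bmi_one_year` is a hypothetical placeholder for the NPHS-based models of BMI change, and the three records are made up.

```python
import random
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Person:
    age: int
    sex: str
    province: str
    bmi: float      # self-reported BMI from the survey record
    weight: float   # number of real people this record represents


def initialize_from_survey(records: List[Person], scale: float, rng: random.Random) -> List[Person]:
    """Draw a simulated population from survey records in proportion to their survey weights.

    `scale` is simulated persons per real person, e.g. 0.001 for a 1-in-1000 sample.
    """
    n = round(sum(r.weight for r in records) * scale)
    drawn = rng.choices(records, weights=[r.weight for r in records], k=n)
    return [replace(r, weight=1.0 / scale) for r in drawn]


def project_bmi_one_year(person: Person, rng: random.Random) -> Person:
    """Age a person by one year and update BMI.

    Hypothetical stand-in for the NPHS-based models of individual BMI change:
    small gain before age 50, small loss after, plus individual noise.
    """
    drift = 0.15 if person.age < 50 else -0.05
    new_bmi = max(person.bmi + drift + rng.gauss(0.0, 0.5), 12.0)
    return replace(person, age=person.age + 1, bmi=new_bmi)


rng = random.Random(2001)
survey = [
    Person(34, "F", "ON", 24.1, 1200.0),
    Person(58, "M", "QC", 29.3, 950.0),
    Person(71, "F", "BC", 27.8, 850.0),
]
population = initialize_from_survey(survey, scale=0.01, rng=rng)
for year in range(2002, 2011):  # project forward from the 2001 survey year
    population = [project_bmi_one_year(p, rng) for p in population]
```

Because every simulated person is a copy of a real respondent, the joint distribution of age, sex, province, BMI, and so on matches the survey at the start by construction; the cost is that history before the survey year must be imputed.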
Synthetic Multi-cohort
[Diagram: year of birth vs. age, with projected births and projected immigration]
Consider BMI:
• Challenge: initialize BMI (at age 18)
  Solution: obtain the distribution of BMI at age 18 from various historical data sources (as we have done for smoking)
• Challenge: project BMI (after age 18)
  Solution: use the NPHS regression models
  Remark: not valid across all periods; relies on simplifying assumptions
• Challenge: not likely to match the BMI percentages in CCHS 2001
  Solution: calibration (no unique solution; arbitrary)
• Challenge: joint percentages of BMI with everything else (smoking, SROA, HUI, age, sex, …)
  Solution: calibration (no unique solution; very challenging and time-consuming)
Remark: Is a model useful if it does not meet this level of "correctness"?
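A minimal sketch of the calibration problem, using hypothetical distributions: the initial BMI at age 18 is shifted until the simulated obesity share (BMI ≥ 30) 20 years later matches a target such as a CCHS 2001 percentage. Running the calibration twice with different assumed annual BMI drifts shows why there is no unique solution: both runs hit the target, with initial shifts that differ by about 2 BMI units. `simulated_obesity_share` and `calibrate_shift` are illustrative names, and bisection is just one simple search method.

```python
import random


def simulated_obesity_share(shift: float, drift: float, n: int = 2000, years: int = 20, seed: int = 7) -> float:
    """Share with BMI >= 30 after `years` of projection from age 18.

    Initial BMI at age 18 = hypothetical historical distribution + `shift`.
    The same seed is reused so that calls differ only through the parameters.
    """
    rng = random.Random(seed)
    obese = 0
    for _ in range(n):
        bmi = rng.gauss(22.5, 3.5) + shift        # hypothetical age-18 distribution
        for _ in range(years):
            bmi += drift + rng.gauss(0.0, 0.4)    # hypothetical annual BMI change
        obese += bmi >= 30.0
    return obese / n


def calibrate_shift(target: float, drift: float, lo: float = -5.0, hi: float = 5.0, tol: float = 1e-3) -> float:
    """Bisection on `shift` so the simulated obesity share meets `target`.

    Relies on the share being non-decreasing in `shift`, which holds here
    because the random draws do not depend on the parameters.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if simulated_obesity_share(mid, drift) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


target = 0.15                                  # hypothetical obesity share to match
shift_a = calibrate_shift(target, drift=0.15)  # assumption A about BMI change after 18
shift_b = calibrate_shift(target, drift=0.25)  # assumption B: faster gain, lower start
```

Each additional margin to match (smoking, SROA, HUI, age, sex, …) adds parameters and targets, which is why joint calibration quickly becomes time-consuming.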
CCHS Multi-cohort
[Diagram: history (before the CCHS) and future (after the CCHS)]
Advantage: coherent set of variables for each individual (age, sex, province, education, income, BMI value, smoking, HUI, SROA, …)
• Challenge: no (or little) history available on the survey
  Solution: use survey information (e.g., age at smoking initiation), other data sources, or extrapolate/infer
  Example: what is the progression of OA in prevalent cases?
  Solution: effectively impute history by using incidence-to-progression survival curves and drawing a random point on the progression pathway
• Challenge: sharing the CCHS file
  Solution: limit access, or use the public-use file
• Challenge: missing children and youth
  Solution: use the NLSCY and NPHS (the CCHS covers ages 12+)
• Challenge: self-reported data are inconsistent with other data sources
  Example: the prevalence of self-reported OA in 2001 is inconsistent with the administrative definition of OA prevalence
  Solution: match (tag) administrative OA prevalence to SROA individuals as much as possible
Validation:
• Start with CCHS 2001 and validate against external datasets, including CCHS 2003, 2005, 2007, … and CHMS 2011
• Start the model with NPHS 1996… but do we expect the NPHS and CCHS to agree? (not a simulation question)
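One way to "draw a random point on the progression pathway", sketched under an explicit assumption: if OA incidence has been roughly constant over time, the years since onset among prevalent cases are distributed in proportion to the survival curve S(d), the probability of still being in the current stage d years after onset. If incidence has trended, the weights would instead be incidence in year (t - d) × S(d). The function name and the exponential curve with a 10% annual progression hazard are hypothetical.

```python
import math
import random
from typing import Sequence


def impute_years_since_onset(survival: Sequence[float], rng: random.Random) -> int:
    """Draw a random point on the progression pathway for one prevalent case.

    survival[d] is the probability that a case is still in its current stage
    d years after onset. Assuming incidence has been roughly constant over time,
    the years since onset among prevalent cases are distributed in proportion
    to survival[d].
    """
    return rng.choices(range(len(survival)), weights=survival, k=1)[0]


# Hypothetical curve: 10% annual hazard of progressing out of the stage, truncated at 40 years.
curve = [math.exp(-0.10 * d) for d in range(41)]
rng = random.Random(2001)
draws = [impute_years_since_onset(curve, rng) for _ in range(10_000)]
```

Short durations are drawn most often because most cases progress out of the stage before long durations accumulate; the imputed duration then sets where each prevalent case starts on its progression pathway.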
How would we convert POHEM-OA from the CCHS approach to the synthetic approach?
• OA incidence rates
  • calibrated to 2001 by age and sex, based on the distribution of BMI in CCHS 2001
  • "reference" rates associated with normal weight
  • assume the reference rates apply in all previous years, or apply a trend
  • easy
• OA prevalence rates
  • not required; generated from incidence rates
• Progression of OA
  • survival curves still work
• HUI
  • initialize HUI to 1.0 at age 0 and apply the existing model of change in HUI
  • unlikely to reproduce the distribution of HUI observed in the CCHS; calibrate?
• BMI
  • initialize historically (at age 18)
  • apply models of change in BMI starting at age 18
  • unlikely to reproduce the 2001 CCHS distribution of BMI
  • calibrate?
• Everything else
  • AMI, diabetes, smoking, joint distribution of cardiac risk factors…
  • similar issues
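The "reference rate" step amounts to a short calculation. For one age-sex group with observed 2001 incidence I, BMI-category shares p_k from CCHS 2001, and relative risks RR_k (normal weight = 1), I = r0 × Σ_k p_k RR_k, so the normal-weight reference rate is r0 = I / (Σ_k p_k RR_k). A person in category k then gets r0 × RR_k, and r0 can be carried back to earlier years unchanged or with an assumed trend. The relative risks and the example group below are hypothetical, not POHEM-OA inputs.

```python
from typing import Dict

# Hypothetical relative risks of OA incidence by BMI category (normal weight = reference).
RELATIVE_RISK: Dict[str, float] = {"normal": 1.0, "overweight": 1.5, "obese": 2.5}


def reference_rate(observed_rate: float, bmi_shares: Dict[str, float],
                   relative_risk: Dict[str, float] = RELATIVE_RISK) -> float:
    """Normal-weight ("reference") incidence rate for one age-sex group.

    observed_rate is the 2001 rate for the group; bmi_shares is that group's
    BMI distribution (e.g. from CCHS 2001), with shares summing to 1.
    """
    mean_rr = sum(share * relative_risk[cat] for cat, share in bmi_shares.items())
    return observed_rate / mean_rr


def rate_in_year(ref_rate_2001: float, year: int, annual_trend: float = 0.0) -> float:
    """Reference rate for another year: constant by default, or with an assumed trend."""
    return ref_rate_2001 * (1.0 + annual_trend) ** (year - 2001)


def individual_rate(ref_rate: float, bmi_category: str,
                    relative_risk: Dict[str, float] = RELATIVE_RISK) -> float:
    """Incidence rate for one simulated person, given their current BMI category."""
    return ref_rate * relative_risk[bmi_category]


# Hypothetical age-sex group: 1% annual incidence, 50/30/20 normal/overweight/obese.
shares_2001 = {"normal": 0.5, "overweight": 0.3, "obese": 0.2}
r0 = reference_rate(0.01, shares_2001)  # 0.01 / 1.45
```

In a synthetic run, applying r0 to a population whose simulated BMI distribution differs from CCHS 2001 yields a different population-level rate, which is exactly why the simulated BMI distribution needs to be close to the survey's.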
How would we convert the CRM cancer models to the CCHS approach?
• How do we assign the cumulative history of smoking (and radon) in 2001?
  • age at smoking initiation on the CCHS?
  • impute from another survey or data source?
  • impute from CRM (by current smoker status, sex, province, and other variables)
• Cancer incidence is calibrated to 2005
  • could recalibrate to 2001
• Assign cancer prevalence in 2001 (same challenges as for OA)
• Cancer screening should be transparent (?)
• Treatment is conditioned on incidence and progression, so it should be OK
• What about interprovincial migration, educational dynamics, and income dynamics?
  • needs investigation; probably feasible, possibly quite easy
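A minimal sketch of the third option (imputing from CRM) as a hot-deck: CRM-simulated people in 2001 serve as donors, grouped by current smoker status, sex, and province, and each ever-smoking CCHS respondent receives the age at smoking initiation of a random donor in the same cell who started no later than the respondent's current age. The record layout, field names (`smoker_status`, `age_started`), and fallback rule are hypothetical, and this is only one possible imputation scheme.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence

Record = Dict[str, object]  # hypothetical flat record layout


def impute_age_started(
    recipients: List[Record],
    donors: List[Record],
    rng: random.Random,
    keys: Sequence[str] = ("smoker_status", "sex", "province"),
) -> List[Record]:
    """Hot-deck imputation of age at smoking initiation from simulated donor records.

    Donors are grouped into cells by `keys`. Each ever-smoker recipient gets the
    value of a random donor from the same cell who started smoking no later than
    the recipient's current age; if there is none, match on smoker status alone.
    """
    cells: Dict[tuple, List[Record]] = defaultdict(list)
    by_status: Dict[object, List[Record]] = defaultdict(list)
    for d in donors:
        cells[tuple(d[k] for k in keys)].append(d)
        by_status[d["smoker_status"]].append(d)

    result = []
    for r in recipients:
        age_started: Optional[int] = None  # never-smokers keep None
        if r["smoker_status"] != "never":
            eligible = [d for d in cells.get(tuple(r[k] for k in keys), [])
                        if d["age_started"] <= r["age"]]
            if not eligible:  # empty cell: fall back to a coarser match
                eligible = [d for d in by_status[r["smoker_status"]]
                            if d["age_started"] <= r["age"]]
            if eligible:
                age_started = rng.choice(eligible)["age_started"]
        result.append({**r, "age_started": age_started})
    return result


rng = random.Random(2001)
crm_donors = [
    {"smoker_status": "current", "sex": "F", "province": "ON", "age_started": 16},
    {"smoker_status": "current", "sex": "F", "province": "ON", "age_started": 19},
    {"smoker_status": "former", "sex": "M", "province": "QC", "age_started": 14},
    {"smoker_status": "never", "sex": "M", "province": "QC", "age_started": None},
]
cchs_records = [
    {"id": 1, "age": 42, "smoker_status": "current", "sex": "F", "province": "ON"},
    {"id": 2, "age": 55, "smoker_status": "former", "sex": "F", "province": "BC"},  # no BC cell
    {"id": 3, "age": 30, "smoker_status": "never", "sex": "M", "province": "QC"},
]
imputed = impute_age_started(cchs_records, crm_donors, rng)
```

For current smokers, years smoked then follow as age minus `age_started`; former smokers would also need an imputed quit age before cumulative exposure can be computed.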
Discussion
• Do we want a synthetic or a cross-sectional population approach to simulation modeling?
• Do we believe that a model must fit historical data to be useful? To be believable? (e.g., smoking and lung cancer)
• If synthetic:
  • Do we insist on reproducing the (joint) distributions found in cross-sectional surveys such as the CCHS 2001 population?
  • How high do we want to set the bar? How closely do we need to represent the individuals in the CCHS to claim success (e.g., age × sex × province × BMI × smoking × HUI × CC × …)?
  • How do we structure validation/calibration so that it succeeds? (incrementally)
• If the population is initialized cross-sectionally (CCHS):
  • Are we comfortable imputing histories (e.g., smoking)?
  • Do we feel more comfortable with our projections?
• Is one approach or the other more easily communicated to decision-makers? To other researchers?
• Which approach has more face validity, and so could lead to wider acceptance and use?