Parametric modelling of cost data: some simulation evidence

Andrew Briggs, University of Oxford
Richard Nixon, MRC Biostatistics Unit, Cambridge
Simon Dixon, University of Sheffield
Simon Thompson, MRC Biostatistics Unit, Cambridge
2003 CHEBS Seminar, Friday 7th November

Parametric modelling of cost data: Background
• Cost data are typically non-normally distributed, with high skew and kurtosis
• Arithmetic mean cost is of interest to policy makers
• Central Limit Theorem ensures sample mean is consistent estimator
• Commentators have proposed parametric modelling of cost data to improve efficiency
• In particular, Lognormal distribution commonly advocated
• Alternatively, Gamma distribution is an increasingly popular choice

Parametric modelling of cost data: Choice of estimator
• If data are Lognormal, an efficient estimator of mean cost is exp(lm + lv/2), where lm and lv are the sample mean and variance of log cost
• If data are Gamma distributed, the maximum likelihood estimate of the population mean is the sample mean
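As a rough illustration of the two estimators on this slide, the sketch below computes exp(lm + lv/2) and the sample mean for a small set of costs. The function names and the cost values are made up for illustration, and the use of the n - 1 divisor for the log-scale variance is an assumption, since the slide does not say which divisor is intended.

```python
import math
import statistics


def lognormal_mean_estimate(costs):
    """Estimate mean cost as exp(lm + lv/2); costs must be strictly positive."""
    logs = [math.log(c) for c in costs]
    lm = statistics.mean(logs)       # sample mean of log cost
    lv = statistics.variance(logs)   # sample variance of log cost (n - 1 divisor, assumed)
    return math.exp(lm + lv / 2)


def gamma_mean_estimate(costs):
    """Under a Gamma model the ML estimate of the population mean is the sample mean."""
    return statistics.mean(costs)


# Hypothetical cost data, for illustration only.
costs = [120.0, 340.0, 95.0, 2100.0, 410.0, 780.0]
print("Lognormal-based estimate:", lognormal_mean_estimate(costs))
print("Sample mean:", gamma_mean_estimate(costs))
```

The two estimates can differ noticeably on skewed data, which is why the choice of distributional assumption matters for inference about the mean.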

Parametric distributions: Simulation experiment
• Lognormal / Gamma distributions
• Population mean was set to be 1000
• Five choices of coefficient of variation (CoV = 0.25, 0.5, 1.0, 1.5, 2.0) to define distribution parameters
• Samples of five different sizes (n = 20, 50, 200, 500, 2000) drawn from each distribution for each CoV
• 2 x 5 x 5 = 50 experiments
• Bias, coverage probability and RMSE all recorded
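One cell of this design might be sketched as follows. The parameterisation uses standard moment identities: for a Lognormal with mean m and CoV c, sigma^2 = ln(1 + c^2) and mu = ln(m) - sigma^2/2; for a Gamma, shape = 1/c^2 and scale = m c^2. Everything else is an assumption not stated on the slide: the function names, the number of replications, the seed, the choice to evaluate only the sample mean, and the normal-approximation 95% interval used for coverage.

```python
import math
import random
import statistics


def lognormal_params(mean, cov):
    """Return (mu, sigma) of log cost giving the requested mean and CoV."""
    sigma2 = math.log(1.0 + cov ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def gamma_params(mean, cov):
    """Return (shape, scale) giving the requested mean and CoV."""
    shape = 1.0 / cov ** 2
    return shape, mean / shape


def draw(dist, mean, cov, n, rng):
    """Draw n costs from the named distribution."""
    if dist == "lognormal":
        mu, sigma = lognormal_params(mean, cov)
        return [rng.lognormvariate(mu, sigma) for _ in range(n)]
    shape, scale = gamma_params(mean, cov)
    return [rng.gammavariate(shape, scale) for _ in range(n)]


def run_experiment(dist, cov, n, true_mean=1000.0, reps=500, seed=1):
    """Bias, RMSE and coverage of the sample mean (reps and seed are assumptions)."""
    rng = random.Random(seed)
    errors = []
    covered = 0
    for _ in range(reps):
        sample = draw(dist, true_mean, cov, n, rng)
        est = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        errors.append(est - true_mean)
        # Normal-approximation 95% interval (an assumed choice of interval).
        if est - 1.96 * se <= true_mean <= est + 1.96 * se:
            covered += 1
    bias = statistics.fmean(errors)
    rmse = math.sqrt(statistics.fmean([e * e for e in errors]))
    return bias, rmse, covered / reps


# One of the 50 cells: Gamma data, CoV = 0.5, n = 50.
print(run_experiment("gamma", cov=0.5, n=50, reps=200))
```

Looping `run_experiment` over both distributions, the five CoV values and the five sample sizes would reproduce the 2 x 5 x 5 grid; the Lognormal-based estimator could be evaluated in the same loop by swapping in a different `est` and a matching interval.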

Parametric distributions: Distribution sets

Parametric distributions: Estimated RMSE from simulations

Parametric distributions: Estimated coverage probabilities

Empirical cost distributions: Summary statistics for 3 data sets (raw cost and log transformed cost)

Empirical cost distributions: Data set 1: CPOU (panels: raw cost, log transformed cost)

Empirical cost distributions: Data set 2: IV Fluids (panels: raw cost, log transformed cost)

Empirical cost distributions: Data set 3: Paramedics (panels: raw cost, log transformed cost)

Empirical cost data sets: Simulation results

Parametric cost modelling: Comments & conclusions
• “All models are wrong” (Box 1976)
• “No data are normally distributed” (Nester 1996)
• Costs are estimated from resource use multiplied by unit cost
• Any parametric assumption relating to costs is at best an approximation
• Simulations confirm that there are efficiency gains if an appropriate distribution is chosen
• But incorrect assumptions can lead to very misleading conclusions
• Sample mean performs well and is unlikely to lead to inappropriate inference
• Only when there are sufficient data to permit detailed modelling is the choice of an alternative estimator warranted
