The problem with costs
Tony O’Hagan
CHEBS, University of Sheffield
The 2003 CHEBS Seminar
A simple problem
• Given a sample from a population, how can we estimate the mean of that population?
• Sample mean?
  • Unbiased and consistent
  • But sensitive to extreme observations
• In Health Economics
  • Costs are invariably very skewed
  • Can also arise with times to events and other kinds of data
  • And we really require inference about population means
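
The sensitivity of the sample mean to extreme observations can be seen in a small sketch. The cost figures below are invented purely for illustration (they are not the trial data), and the single "extreme" patient is a hypothetical value.

```python
import statistics

# Hypothetical costs, invented for illustration only (not the asthma trial data).
costs = [120, 150, 180, 210, 250, 300, 340, 400, 520, 700]
extreme = 25000  # one hypothetical very expensive patient

# Mean and median before and after adding the single extreme observation.
mean_without = statistics.mean(costs)
mean_with = statistics.mean(costs + [extreme])
median_without = statistics.median(costs)
median_with = statistics.median(costs + [extreme])

print("mean:  ", mean_without, "->", round(mean_with, 1))
print("median:", median_without, "->", median_with)
```

One added observation moves the mean by a large factor while the median barely changes. The median is not a substitute here, though, because the quantity of interest is the population mean.
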
A simple dataset
• Costs incurred by 26 asthma patients in a trial comparing two inhalers
• Patients who used pMDI and had no exacerbations
Some estimates & intervals
• Sample mean (use CLT to justify normality)
  • Estimate 2104, 95% CI (-411, 4619)
• Bayesian analysis assuming normality (weak prior)
  • Posterior mean 2104, 95% credible interval (-411, 4619)
• Nonparametric bootstrap
  • Estimate 2104, 95% CI (298, 4785)
• Bayesian bootstrap
  • Posterior mean 2104, 95% credible interval (575, 5049)
• Bayesian analysis assuming lognormality
  • Posterior median 1112, 95% credible interval (510, 3150)
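
Three of the intervals listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis behind the slide: the 26 costs are simulated from a lognormal distribution with arbitrary parameters (the trial data are not reproduced here), the CLT interval uses the normal quantile 1.96, and the bootstrap intervals are simple percentile intervals. All function names are hypothetical.

```python
import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def clt_interval(xs, z=1.96):
    """Normal-theory interval: sample mean +/- z standard errors."""
    n = len(xs)
    m = mean(xs)
    var = sum((x - m) ** 2 for x in xs) / (n - 1)
    se = math.sqrt(var / n)
    return m - z * se, m + z * se


def percentile_interval(draws, level=0.95):
    """Equal-tailed interval read off the sorted draws."""
    s = sorted(draws)
    lo_idx = int((1 - level) / 2 * (len(s) - 1))
    hi_idx = int((1 + level) / 2 * (len(s) - 1))
    return s[lo_idx], s[hi_idx]


def bootstrap_means(xs, rng, n_boot=2000):
    """Nonparametric bootstrap: resample with replacement, recompute the mean."""
    n = len(xs)
    return [mean(rng.choices(xs, k=n)) for _ in range(n_boot)]


def bayesian_bootstrap_means(xs, rng, n_draws=2000):
    """Bayesian bootstrap: Dirichlet(1, ..., 1) weights on the observed values,
    generated here as normalised standard exponential draws."""
    draws = []
    for _ in range(n_draws):
        g = [rng.expovariate(1.0) for _ in xs]
        total = sum(g)
        draws.append(sum(w * x for w, x in zip(g, xs)) / total)
    return draws


rng = random.Random(2003)
# Simulated stand-in data: 26 skewed "costs" (parameters are arbitrary assumptions).
costs = [rng.lognormvariate(6.5, 1.5) for _ in range(26)]

sample_mean = mean(costs)
clt_ci = clt_interval(costs)
boot_ci = percentile_interval(bootstrap_means(costs, rng))
bb_ci = percentile_interval(bayesian_bootstrap_means(costs, rng))

print("sample mean       ", round(sample_mean))
print("CLT interval      ", [round(v) for v in clt_ci])
print("bootstrap         ", [round(v) for v in boot_ci])
print("Bayesian bootstrap", [round(v) for v in bb_ci])
```

Both bootstrap variants only reweight the observed values, so their intervals cannot extend below the smallest observation, whereas the normal-theory interval is symmetric about the sample mean and can include negative costs.
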
Which is right?
• Data appear to fit lognormal much better than normal
  • But many other distributions might visually fit well yet give completely different results
• Results from analysis assuming normality are supported by bootstrap and by analysis based on CLT
  • These are well known to be robust methods
  • But bootstrapping the sample mean will always give the same estimate and will tend to back up the normal-theory analysis
  • And extreme skewness evident in the population suggests non-robustness of the sample mean
• Real cost distributions won’t follow any standard form
The problem
• The population mean depends critically on the shape of the tail
• How can we learn about that tail from a small sample?
  • Or even quite a large one?
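
One way to see how much the mean depends on the tail is to ask what share of the total is contributed by the largest few observations. The sketch below simulates lognormal samples with increasing log-scale standard deviation; the distribution, the parameter values, and the helper name `top_share` are illustrative assumptions, not taken from the talk.

```python
import random


def top_share(xs, fraction=0.05):
    """Share of the total contributed by the largest `fraction` of observations."""
    s = sorted(xs, reverse=True)
    k = max(1, int(len(s) * fraction))
    return sum(s[:k]) / sum(s)


rng = random.Random(1)
shares = {}
for sigma in (0.5, 1.0, 1.5, 2.0):
    # Large simulated sample; sigma controls how heavy the right tail is.
    sample = [rng.lognormvariate(0.0, sigma) for _ in range(10000)]
    shares[sigma] = top_share(sample)
    print("sigma =", sigma, " top 5% share of total =", round(shares[sigma], 3))
```

As the tail gets heavier, a larger part of the mean is determined by observations that a small sample may contain only once or not at all.
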
Bayesian model comparison
• Bayes factors for the example data
  • Lognormal versus normal, 10^28
  • Lognormal versus square-root normal, 10^12
  • Lognormal is favoured over any other power transformation to normality
  • Lognormal versus gamma, 10^3
• This is far from conclusive
  • Distributions we can’t distinguish could still have completely different tails
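
A rough feel for a lognormal-versus-normal comparison can be had from maximised log-likelihoods. This is only a sketch and not the calculation behind the Bayes factors above: it uses the BIC-style large-sample approximation, which, because both models have two parameters, reduces to the difference in maximised log-likelihoods. The data are hypothetical, and proper Bayes factors depend on the priors.

```python
import math


def normal_max_loglik(xs):
    """Maximised normal log-likelihood (MLE mean and variance plugged in)."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n  # MLE variance (divide by n)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def lognormal_max_loglik(xs):
    """Maximised lognormal log-likelihood: normal fit to log(x),
    plus the Jacobian term -sum(log x)."""
    logs = [math.log(x) for x in xs]
    return normal_max_loglik(logs) - sum(logs)


# Hypothetical skewed costs, invented for illustration only.
costs = [100, 200, 400, 800, 1600, 3200, 6400, 12800]

# Approximate log Bayes factor, lognormal versus normal (equal parameter counts).
log_bf = lognormal_max_loglik(costs) - normal_max_loglik(costs)
print("approximate log Bayes factor:", round(log_bf, 2))
print("approximate Bayes factor:    ", round(math.exp(log_bf)))
```

A large value in favour of one model says nothing about models that were not compared, which is the point made on the slide about indistinguishable distributions with different tails.
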
Possible distributions
• Normal – unrealistic, very thin tailed
• Gamma – thin tailed (exponential)
  • Sample mean is MVUE
• Lognormal – heavier tailed
  • Population mean exists but its posterior mean may not
• Inverse gamma – heavy tailed (polynomial)
  • Population mean exists if enough degrees of freedom
• Log-gamma, log-logistic – too heavy tailed?
  • Population mean never exists
• Generalised Pareto – range of tail weights
  • Used in extreme value theory
• Mixtures and chimeras
  • More flexible and realistic
  • Harder to fit
  • Bayesian methods essential
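
For reference, the population means of some of the families above have simple closed forms, which makes the role of the tail parameters explicit. The parametrisations are assumptions chosen here for illustration (rate form for the gamma, shape and scale for the inverse gamma, location, scale and shape for the generalised Pareto); the slides do not specify them.

```latex
\begin{align*}
\text{Gamma}(\alpha,\beta): \quad
  & E[X] = \frac{\alpha}{\beta} \\
\text{Lognormal}(\mu,\sigma^2): \quad
  & E[X] = \exp\!\left(\mu + \tfrac{1}{2}\sigma^2\right) \\
\text{Inverse gamma}(\alpha,\beta): \quad
  & E[X] = \frac{\beta}{\alpha - 1}, \qquad \text{finite only if } \alpha > 1 \\
\text{Generalised Pareto}(\mu,\sigma,\xi): \quad
  & E[X] = \mu + \frac{\sigma}{1 - \xi}, \qquad \text{finite only if } \xi < 1
\end{align*}
```

In the lognormal case the mean grows exponentially in $\sigma^2$, so modest uncertainty about the log-scale variance translates into large uncertainty about the mean.
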
More complex structures
• We nearly always wish to compare means
  • Extreme data can heavily influence comparison
  • Asthma dataset
• We also often need to model costs in more complex ways
  • Components of costs
  • Covariates
  • Tail shape can again be very influential
Recommendations
• Try a variety of models
  • If sample size is large enough, answers may be robust to modelling assumptions
• Use prior information
  • We need evidence of what kinds of distributions can arise in different situations
  • And of how different they can be between different groups
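
As a small illustration of trying a variety of models, the sketch below compares plug-in estimates of the population mean under three assumed models. It uses maximum-likelihood plug-in values rather than the Bayesian analyses discussed in the talk, and the data are hypothetical. Under the normal and gamma models the ML estimate of the mean is the sample mean; under the lognormal model it is exp(mu + sigma^2 / 2) evaluated at the fitted log-scale parameters.

```python
import math

# Hypothetical skewed costs, invented for illustration only.
costs = [100, 200, 400, 800, 1600, 3200, 6400, 12800]
n = len(costs)

sample_mean = sum(costs) / n

# Lognormal fit: mean and (MLE) variance of the log costs.
logs = [math.log(x) for x in costs]
mu_hat = sum(logs) / n
sigma2_hat = sum((v - mu_hat) ** 2 for v in logs) / n
lognormal_mean = math.exp(mu_hat + sigma2_hat / 2)

estimates = {
    "normal": sample_mean,      # ML estimate of the mean is the sample mean
    "gamma": sample_mean,       # same under the gamma model
    "lognormal": lognormal_mean,
}
for model, est in estimates.items():
    print(model.ljust(10), round(est, 1))
```

With a sample this small the estimates disagree noticeably; whether they converge as the sample grows is the kind of robustness check the first recommendation has in mind.
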