Using Resampling Techniques to Measure the Effectiveness of Providers in Workers’ Compensation Insurance. David Speights Senior Research Statistician HNC Insurance Solutions Irvine, California. Presentation Outline. Introduction to the problem Introduction to Bootstrap Resampling
Using Resampling Techniques to Measure the Effectiveness of Providers in Workers’ Compensation Insurance David Speights Senior Research Statistician HNC Insurance Solutions Irvine, California
Presentation Outline • Introduction to the problem • Introduction to Bootstrap Resampling • Two resampling approaches for comparing two groups • Examples • Conclusions
Introduction to the Problem • Compare two groups from observational data • Outcome (Y) {e.g. Claim Cost} • Characteristics (X) have distributions F1 and F2 • Difficulties • F1 ≠ F2 • X is associated with Y (i.e. X is a confounder) • example: claim severity associated with claim cost
Introduction to the Problem • Ideal solution • Randomize subjects into the two groups • Ideal solution not usually possible • Alternate solution {Topic of the paper} • Identify characteristics where F1 ≠ F2 • Adjust the distribution of Y to account for the differing distributions of X
Introduction to Bootstrap Resampling • Purpose • Obtain the distribution of a parameter estimate (i.e. sampling distribution) • Avoid relying on assumptions about the underlying distribution • Often used when the parameter estimate • has a distribution that is difficult to obtain • relies heavily on unrealistic assumptions
Introduction to Bootstrap Resampling • Given Data • {X1, X2, …, Xn} where Xi is a p × 1 vector • X has unspecified distribution F • Parameter of interest θ • θ = T(F) is the parameter of interest • We want the distribution of the estimate of θ
Introduction to Bootstrap Resampling • Distribution of the estimate of θ • usually obtained through theoretical properties if repeated sampling is performed on a population with a known distribution of X • bootstrap techniques resample from the data to simulate repeated sampling from the population with unknown distribution of X
Introduction to Bootstrap Resampling: Example -- Population Mean • Resample with replacement from data • Data is (X1, …, Xn). • Each data point equally likely to be selected • Resampled data is (X(b)1, …, X(b)n). • The mean of the resampled data is the bth bootstrap estimate of μ
Introduction to Bootstrap Resampling: Example -- Population Mean • B bootstrap samples are drawn • Distribution of the sample mean is estimated with the empirical distribution function of the B bootstrap estimates • Mean and variance of this distribution used to estimate mean and variance of the sample mean
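The population-mean example above can be sketched in a few lines of NumPy. This is a minimal illustration, not the presenter's code: the function name `bootstrap_mean`, the simulated lognormal "claim cost" data, and the seeds are all assumptions made for the example.

```python
import numpy as np

def bootstrap_mean(x, n_boot=500, seed=0):
    """Return n_boot bootstrap estimates of the mean of x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    # Each row holds the indices of one resample of size n, drawn with
    # replacement so every data point is equally likely to be selected.
    idx = rng.integers(0, n, size=(n_boot, n))
    return x[idx].mean(axis=1)

# Hypothetical skewed "claim cost" data (simulated, not from the presentation).
data_rng = np.random.default_rng(1)
costs = data_rng.lognormal(mean=8.0, sigma=1.0, size=200)

boot = bootstrap_mean(costs, n_boot=500)
print("sample mean:", costs.mean())
print("bootstrap mean and variance:", boot.mean(), boot.var(ddof=1))
print("95% percentile interval:", np.percentile(boot, [2.5, 97.5]))
```

The empirical distribution of `boot` stands in for the sampling distribution of the sample mean, so its mean, variance, and percentiles can be read off directly.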
Two Resampling Methods for Comparing Two Groups • Method 1: Normalized comparisons • Y is a response of interest • X is a categorical variable, a confounder • Z=1 for group 1, Z=2 for group 2 • F(Y|Z=1) normalized for distribution of X in group 2 • F(Y|Z=2) non-normalized
Two Resampling Methods for Comparing Two Groups • Method 1: Normalized comparisons • Resample from (Yi,Xi) separately for groups 1 and 2 • Construct estimates of F(Y|X=xj) and P(X=xj) for the two groups • Construct estimates of the normalized distribution functions on the previous slide • Parameter estimates can be obtained from these
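The formula for the normalized distribution did not survive in this transcript. One standard way to write it, offered here as an assumed reconstruction rather than the presenter's exact notation, weights the group 1 conditional distributions by the group 2 distribution of X over its levels x_1, …, x_J:

```latex
% Assumed reconstruction: group 1 outcome distribution normalized
% to the distribution of X observed in group 2.
F_{\mathrm{norm}}(y \mid Z=1) \;=\; \sum_{j=1}^{J} F(y \mid X=x_j,\, Z=1)\, P(X=x_j \mid Z=2)

% Group 2 is left non-normalized, which is the same sum with its own weights:
F(y \mid Z=2) \;=\; \sum_{j=1}^{J} F(y \mid X=x_j,\, Z=2)\, P(X=x_j \mid Z=2)
```

Under this form, any remaining difference between the two distribution functions cannot be attributed to differing distributions of X.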
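The steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the normalized distribution is taken to be the group 1 conditional distributions reweighted by the group 2 distribution of X, quantiles are read from the resulting weighted empirical distribution function, and levels of X missing from a resample are handled by renormalizing the weights. The function names and the simulated data are hypothetical.

```python
import numpy as np

def level_probs(x):
    """Estimate P(X = x_j) for each observed level of x."""
    levels, counts = np.unique(x, return_counts=True)
    return dict(zip(levels.tolist(), counts / counts.sum()))

def normalized_quantile(y, x, target_probs, q):
    """Quantile q of Y after reweighting so X follows target_probs."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x)
    w = np.zeros_like(y)
    for level, p in target_probs.items():
        mask = x == level
        if mask.any():
            # Each observation at this level shares the target probability.
            w[mask] = p / mask.sum()
    order = np.argsort(y)
    # Weighted empirical CDF; renormalize in case a level is absent (assumption).
    cdf = np.cumsum(w[order]) / w.sum()
    pos = min(np.searchsorted(cdf, q), len(y) - 1)
    return y[order][pos]

def bootstrap_normalized_diff(y1, x1, y2, x2, q=0.5, n_boot=500, seed=0):
    """Bootstrap the difference (normalized group 1) - (group 2) in quantile q."""
    rng = np.random.default_rng(seed)
    y1, x1, y2, x2 = map(np.asarray, (y1, x1, y2, x2))
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        # Resample (Y, X) pairs separately within each group.
        i1 = rng.integers(0, len(y1), len(y1))
        i2 = rng.integers(0, len(y2), len(y2))
        probs2 = level_probs(x2[i2])
        q1 = normalized_quantile(y1[i1], x1[i1], probs2, q)
        q2 = normalized_quantile(y2[i2], x2[i2], probs2, q)
        diffs[b] = q1 - q2
    return diffs

# Simulated data: cost depends only on severity, but group 1 sees more severe claims.
rng = np.random.default_rng(42)
n = 2000
sev1 = rng.choice([1, 2, 3], size=n, p=[0.2, 0.3, 0.5])
sev2 = rng.choice([1, 2, 3], size=n, p=[0.5, 0.3, 0.2])
cost1 = rng.lognormal(mean=7.0 + 0.5 * sev1, sigma=0.5)
cost2 = rng.lognormal(mean=7.0 + 0.5 * sev2, sigma=0.5)

raw_diff = np.median(cost1) - np.median(cost2)
diffs = bootstrap_normalized_diff(cost1, sev1, cost2, sev2, q=0.5, n_boot=200)
print("raw median difference:", raw_diff)
print("normalized difference, bootstrap mean:", diffs.mean())
print("95% percentile interval:", np.percentile(diffs, [2.5, 97.5]))
```

In this simulation the raw medians differ only because the severity mix differs, so the normalized difference should sit close to zero while the raw difference does not.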
Two Resampling Methods for Comparing Two Groups • Method 2: Bootstrapping linear regression • Y is a response of interest • X is a vector of variables, confounders • Z=1 for group 1, Z=2 for group 2 • Use a linear regression model of Y on Z and X
Two Resampling Methods for Comparing Two Groups • Method 2: Bootstrapping linear regression • Estimate (α, γ, β) with the least squares estimates on original data • Resample with replacement from the residuals • Construct the bth bootstrap value of Y as the fitted value plus a resampled residual • The bth bootstrap sample pairs these bootstrap values of Y with the original X and Z
Two Resampling Methods for Comparing Two Groups • Method 2: Bootstrapping linear regression • Construct estimates of (α, γ, β) with the least squares estimates on each bootstrap sample • Using the B bootstrap estimates of (α, γ, β), construct the distribution of the parameters of interest
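The model and bootstrap equations were images on the original slides and are missing from this transcript. A plausible reconstruction is given below. It assumes that the slide's three parameters are an intercept α, a group effect γ, and a coefficient vector β for X, and that the group enters through an indicator for Z = 2; the original coding may have differed.

```latex
% Assumed form of the regression model
Y_i = \alpha + \gamma\,\mathbf{1}\{Z_i = 2\} + X_i^{\top}\beta + \varepsilon_i,
\qquad i = 1, \dots, n

% Residuals from the least squares fit on the original data
\hat{\varepsilon}_i = Y_i - \hat{\alpha} - \hat{\gamma}\,\mathbf{1}\{Z_i = 2\} - X_i^{\top}\hat{\beta}

% b-th bootstrap response, with \varepsilon_i^{(b)} drawn with replacement
% from \{\hat{\varepsilon}_1, \dots, \hat{\varepsilon}_n\}
Y_i^{(b)} = \hat{\alpha} + \hat{\gamma}\,\mathbf{1}\{Z_i = 2\} + X_i^{\top}\hat{\beta} + \varepsilon_i^{(b)}

% b-th bootstrap sample
\bigl\{ (Y_i^{(b)},\, X_i,\, Z_i) : i = 1, \dots, n \bigr\}
```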
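A residual bootstrap of this kind can be sketched with NumPy's least squares solver. The sketch assumes a design matrix made of an intercept, an indicator for Z = 2, and the confounders X; that coding, the function name `residual_bootstrap`, and the simulated log-cost data are illustrative assumptions, not taken from the presentation.

```python
import numpy as np

def residual_bootstrap(y, X, z, n_boot=500, seed=0):
    """Least squares fit plus n_boot residual-bootstrap refits.

    Returns (coef, boot): coef is the original estimate ordered as
    [intercept, group effect, confounder coefficients...]; boot has one
    row of coefficients per bootstrap sample.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    group2 = (np.asarray(z) == 2).astype(float)
    design = np.column_stack([np.ones(len(y)), group2, X])

    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    fitted = design @ coef
    resid = y - fitted

    boot = np.empty((n_boot, design.shape[1]))
    for b in range(n_boot):
        # Resample residuals with replacement and rebuild the response.
        e_star = rng.choice(resid, size=len(y), replace=True)
        y_star = fitted + e_star
        boot[b] = np.linalg.lstsq(design, y_star, rcond=None)[0]
    return coef, boot

# Simulated log costs: true group effect -0.2, severity effect 0.3.
rng = np.random.default_rng(7)
n = 4000
z = rng.integers(1, 3, size=n)  # group labels 1 or 2
severity = np.clip(np.round(rng.normal(5 + (z == 2), 2)), 1, 10)
log_cost = 8.0 - 0.2 * (z == 2) + 0.3 * severity + rng.normal(0, 0.5, size=n)

coef, boot = residual_bootstrap(log_cost, severity, z, n_boot=500)
print("estimated group effect:", coef[1])
print("bootstrap standard error:", boot[:, 1].std(ddof=1))
print("95% percentile interval:", np.percentile(boot[:, 1], [2.5, 97.5]))
```

The column `boot[:, 1]` is the bootstrap distribution of the group effect, so intervals or tail probabilities for the provider comparison can be read from it without a normality assumption on the errors.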
Examples Using Data from a Nationwide Data Base of Workers Compensation Claims • Normalized comparisons of percentiles • Y = Total claim cost • Group 1: Providers in network A • Group 2: Providers not in network A • X is a 10-level variable representing claim severity, derived from the ICD9 code on a claim • B = 500 bootstrap samples drawn • median, 75th, and 95th percentiles compared • Normalization relative to group 1
Examples Using Data from a Nationwide Data Base of Workers Compensation Claims • Normalized comparisons of percentiles
Examples Using Data from a Nationwide Data Base of Workers Compensation Claims • Bootstrapping linear regression • Y = log(Total Indemnity Costs) • X consists of several variables • NCCI body part designation, nature of injury designation, accident cause, industry class code, and injury type • 10-level claim severity measure derived with ICD9 code • Age and gender • Group 1: Specific provider of interest (Provider Z) • Group 2: All other providers • B = 500 bootstrap samples
Examples Using Data from a Nationwide Data Base of Workers Compensation Claims
Conclusions • Bootstrap methodology is a flexible, robust method for deriving sampling distributions • Can be used to compare two groups while accounting for possible confounding variables • Useful method for observational studies • Only a few examples shown in this paper/presentation; the method has much more potential