Using Neural Networks to Predict Claim Duration in the Presence of Right Censoring and Covariates • David Speights, Senior Research Statistician, HNC Insurance Solutions, Irvine, California • Session CPP-53
Presentation Outline • Introduction to Neural Networks • Introduction to Survival Analysis • Neural Networks with Right Censoring • Simulated Example • Predicting Claim Duration
Introduction to Neural Networks: Motivation • Complex classification problems • Character recognition • Voice recognition • Humans have no trouble with these tasks • We can read even distorted documents • We can recognize voices over poor telephone lines • Neural networks attempt to model the human brain
Introduction to Neural Networks: Connection to Brain Functionality • Brain • made up of billions of neurons sending signals to the body and to each other • Neural Networks • a collection of “neurons” that send “signals” to produce an output
Introduction to Neural Networks: Common Representation • P predictors (inputs): X1, X2, ..., XP • 1 hidden layer with M neurons: 1, 2, ..., M • 1 output: Y
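Introduction to Neural Networks: Architecture of the ith Neuron • Represents a neuron in the brain • Inputs: X1, X2, ..., XP • Linear combination: O = bi0 + bi1X1 + … + bipXP • Activation function: s is a function taking values in the interval (0,1), representing the strength of the output • [Figure: s(O) plotted against O, with the vertical axis running from 0 to 1]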
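The neuron described above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the slide only says that s maps into (0,1), so the logistic function is an assumption here, and the names `logistic` and `neuron_output` are hypothetical.

```python
import math

def logistic(o):
    """Assumed activation: the logistic function, which maps any real o into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-o))

def neuron_output(x, b):
    """Output of the ith hidden neuron.

    x : list of p predictor values X1..Xp
    b : list of p + 1 weights [bi0, bi1, ..., bip], where bi0 is the intercept
    """
    o = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))  # linear combination O
    return logistic(o)                                   # strength of the output, s(O)

# Hypothetical inputs and weights chosen so that O = 0.5 - 1.0 + 0.5 = 0
print(neuron_output([1.0, 2.0], [0.5, -1.0, 0.25]))  # 0.5
```

Any other smooth function bounded between 0 and 1 could be substituted for `logistic` without changing the structure.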
Introduction to Neural Networks: Connection to Multiple Regression • Similarities • Both describe relationships between variables • Both can create predictions • Differences • The function describing the relationships is more complex in a neural network • Response variables are typically called outputs • Predictor variables are typically called inputs • Estimating the parameters is usually called training
Introduction to Neural Networks: Functional Representation • Y = f(X1, …, Xp) + error • Multiple Linear Regression • f() = linear combination of regressors • Can model only the relationships specified in advance • Neural Network • f() = nonlinear combination of regressors • Can deal with nonlinearities and interactions without special designation
Introduction to Neural Networks: Functional Specification • For a neural network, f() is built from two transformation functions, g and s, which are specified in advance
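The formula on this slide is not available in the text. One form consistent with the rest of the deck, offered as an assumption rather than the presenter's exact expression, combines the M hidden neurons (each a linear combination passed through s) with output weights a0, ..., aM and an output transformation g:

```latex
f(X_1,\dots,X_p)
  = g\!\left( a_0 + \sum_{i=1}^{M} a_i \,
      s\!\left( b_{i0} + b_{i1}X_1 + \dots + b_{ip}X_p \right) \right)
```

With g taken as the identity and s as a bounded S-shaped function, this reduces to the usual single-hidden-layer regression network.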
Introduction to Survival Analysis: What is Survival Analysis? • Used to model time-to-event data (example: time until a claim ends) • Usually characterized by (1) right-skewed data, (2) a multiplicative error structure, and (3) right censoring • Common in cancer clinical trials, component failure analysis, and AIDS data analysis, among other examples
Introduction to Survival Analysis: Notation • T1, ..., Tn - independent failure times with distribution F and density function f • C1, ..., Cn - independent censoring times with distribution G and density function g • Yi = min(Ti, Ci) - observed time • δi = I(Yi = Ti) - censoring indicator (1 if the failure time is observed, 0 if it is censored) • Xi = (Xi1, ..., Xip) - vector of known covariates associated with the ith individual
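As a small illustration of this notation, the observed time and censoring indicator can be built from a failure time and a censoring time as follows. The function name `observe` and the numeric values are hypothetical.

```python
def observe(t, c):
    """Return (y, delta) for one subject: observed time and censoring indicator."""
    y = min(t, c)                 # Yi = min(Ti, Ci)
    delta = 1 if t <= c else 0    # I(Yi = Ti): 1 = failure observed, 0 = right censored
    return y, delta

failure_times = [2.0, 5.0, 1.5]     # Ti (hypothetical values)
censoring_times = [3.0, 4.0, 1.5]   # Ci (hypothetical values)
data = [observe(t, c) for t, c in zip(failure_times, censoring_times)]
print(data)  # [(2.0, 1), (4.0, 0), (1.5, 1)]
```

In real claim data only the pair (y, delta) is available; the separate failure and censoring times appear here only to show how the pair arises.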
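Introduction to Survival Analysis: Likelihood Analysis (Parametric Models) • (Yi, δi, Xi), i = 1, …, n, independent observations • The likelihood is written as the product of two factors, L1 and L2, using the joint density • $f_\Theta(Y, \delta \mid X) = [f_\Theta(Y \mid X)(1 - G(Y \mid X))]^{\delta}\,[g(Y \mid X)(1 - F_\Theta(Y \mid X))]^{1-\delta}$ • Here L2 does not depend on Θ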
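Regrouping the terms of the joint density makes the role of L2 explicit. The labels L1 and L2 below are assumed to match the slide's usage, with Θ denoting the parameters of the failure-time distribution:

```latex
\begin{aligned}
L(\Theta) &= \prod_{i=1}^{n} f_\Theta(Y_i, \delta_i \mid X_i) = L_1(\Theta)\, L_2, \\
L_1(\Theta) &= \prod_{i=1}^{n} f_\Theta(Y_i \mid X_i)^{\delta_i}
               \bigl[1 - F_\Theta(Y_i \mid X_i)\bigr]^{1-\delta_i}, \\
L_2 &= \prod_{i=1}^{n} \bigl[1 - G(Y_i \mid X_i)\bigr]^{\delta_i}\,
       g(Y_i \mid X_i)^{1-\delta_i}.
\end{aligned}
```

Because L2 involves only the censoring distribution, maximizing L over Θ is the same as maximizing L1: an observed failure contributes its density and a censored observation contributes its survival probability.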
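Neural Networks with Right Censoring: Model Specification • Neural network model with error term ε • Here ε has distribution function Fε and density fε • Θ = {a0, …, ap, b1, …, bp} is the set of network weights • The likelihood follows the censored form given for parametric models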
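The model and likelihood formulas for this slide are not available in the text, so the following is only a sketch under stated assumptions: the response (possibly a log duration) equals the network output plus a scale `sigma` times a standard normal error, and `mu` holds the network outputs for each observation. The function names are hypothetical.

```python
import math

def norm_logpdf(z):
    """Log density of the standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def norm_logsf(z):
    """log(1 - Phi(z)) for the standard normal (underflows for very large z)."""
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))

def censored_nll(y, delta, mu, sigma):
    """Negative log-likelihood with right censoring (assumed normal errors).

    y     : observed times (or log times)
    delta : 1 if the failure was observed, 0 if censored
    mu    : model predictions, e.g. neural network outputs
    sigma : error scale
    """
    total = 0.0
    for yi, di, mi in zip(y, delta, mu):
        z = (yi - mi) / sigma
        if di == 1:
            total -= norm_logpdf(z) - math.log(sigma)  # density contribution
        else:
            total -= norm_logsf(z)                     # survival contribution
    return total

# Hypothetical data: two observed failures and one censored observation
print(censored_nll([1.0, 2.5, 2.0], [1, 1, 0], [1.2, 2.0, 2.4], 0.5))
```

Other error distributions could be used by swapping the two helper functions for the corresponding log density and log survival function.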
Neural Networks with Right Censoring: Fitting Neural Networks without Censoring • The parameters Θ are estimated by minimizing the squared error • If ε is normal, minimizing the squared error is the same as maximizing the likelihood
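The equivalence can be seen by writing out the normal log-likelihood. This short derivation assumes additive errors with constant variance σ² and no censoring, with f(Xi; Θ) denoting the network output:

```latex
\log L(\Theta)
  = -\frac{n}{2}\log\!\left(2\pi\sigma^{2}\right)
    - \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \bigl(Y_i - f(X_i;\Theta)\bigr)^{2}
```

For any fixed σ the first term is constant in Θ, so maximizing the log-likelihood over Θ is the same as minimizing the sum of squared errors.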
Neural Networks with Right Censoring: Fitting Neural Networks without Censoring • Gradient descent algorithm for estimating Θ • The estimate is updated at each observation • λ is known as the learning rate • Θ_{j:0} = Θ_{j-1:n} (each pass j through the data starts from the estimate at the end of pass j-1) • Known as the back-propagation algorithm • To generalize to right-censored data, replace the criterion C(Θ) with the negative of the likelihood for censored neural networks
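A minimal sketch of per-observation gradient descent for a small network is shown below. It assumes a squared-error criterion, a logistic activation, an identity output function, and one input; the starting weights, the learning rate `lam`, and the toy data are hypothetical.

```python
import math

def logistic(o):
    return 1.0 / (1.0 + math.exp(-o))

def predict(x, a, b):
    """Network output and hidden-neuron outputs for a single input x."""
    hidden = [logistic(bi[0] + bi[1] * x) for bi in b]
    return a[0] + sum(ai * hi for ai, hi in zip(a[1:], hidden)), hidden

def sgd_pass(xs, ys, a, b, lam):
    """One pass through the data, updating the weights after each observation."""
    for x, y in zip(xs, ys):
        f, hidden = predict(x, a, b)
        r = f - y  # derivative of 0.5 * (y - f)^2 with respect to f
        for i, h in enumerate(hidden):
            back = r * a[i + 1] * h * (1.0 - h)  # error propagated back to neuron i
            a[i + 1] -= lam * r * h
            b[i][0] -= lam * back
            b[i][1] -= lam * back * x
        a[0] -= lam * r

def mse(xs, ys, a, b):
    return sum((predict(x, a, b)[0] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [k / 10.0 for k in range(-10, 11)]      # toy inputs on [-1, 1]
ys = [x * x for x in xs]                     # toy target
a = [0.0, 0.1, -0.1, 0.1]                    # output weights a0..a3 (hypothetical start)
b = [[0.1, 0.5], [-0.1, -0.5], [0.2, 1.0]]   # hidden weights for 3 neurons
before = mse(xs, ys, a, b)
for _ in range(200):                         # each pass starts where the last one ended
    sgd_pass(xs, ys, a, b, lam=0.1)
after = mse(xs, ys, a, b)
print(before, after)
```

For censored data the same loop would apply with the per-observation gradient of the negative likelihood in place of the squared-error gradient.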
Neural Networks with Right Censoring: Fitting Neural Networks with Censoring • Step 1 - Estimating Θ • Fix σ and pass through the data once using the gradient descent update • Step 2 - Estimating σ • Fix Θ at its value at the end of the pass through the data • Iterate until |σ_j - σ_{j-1}| < ε using the Newton-Raphson algorithm
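Step 2 can be sketched as a one-dimensional Newton-Raphson search for the scale with the network weights held fixed. The sketch assumes normal errors, works on fixed residuals (observed value minus network output), and uses finite-difference derivatives for brevity; the function names, the lower bound on each step, and the data are hypothetical additions.

```python
import math

def nll_sigma(sigma, resid, delta):
    """Negative log-likelihood in sigma for fixed residuals (constants dropped)."""
    total = 0.0
    for r, d in zip(resid, delta):
        z = r / sigma
        if d == 1:   # observed: minus the log density
            total += math.log(sigma) + 0.5 * z * z
        else:        # censored: minus the log survival function
            total -= math.log(0.5 * math.erfc(z / math.sqrt(2.0)))
    return total

def newton_sigma(resid, delta, sigma=1.0, eps=1e-8, h=1e-4, max_iter=100):
    """Newton-Raphson iterations until successive estimates differ by less than eps."""
    for _ in range(max_iter):
        f0 = nll_sigma(sigma, resid, delta)
        fp = nll_sigma(sigma + h, resid, delta)
        fm = nll_sigma(sigma - h, resid, delta)
        grad = (fp - fm) / (2.0 * h)             # central-difference first derivative
        hess = (fp - 2.0 * f0 + fm) / (h * h)    # central-difference second derivative
        new = sigma - grad / hess
        new = max(new, 0.5 * sigma)              # safeguard (an addition): keep sigma positive
        if abs(new - sigma) < eps:
            return new
        sigma = new
    return sigma

# Without censoring the answer is the root mean square of the residuals
print(newton_sigma([1.0, -1.0, 2.0, -2.0], [1, 1, 1, 1]))
# With one censored residual the estimate has no closed form
print(newton_sigma([1.0, -1.0, 2.0, 0.5], [1, 1, 1, 0]))
```

A full fit would alternate this step with one gradient pass over the data for the weights, repeating both until the estimates settle.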
Neural Networks with Right Censoring: Fitting Neural Networks with Censoring • With highly parameterized neural networks we risk overfitting
Neural Networks with Right Censoring: Fitting Neural Networks with Censoring • We need to design the fitting procedure to find a good fit to the data
Neural Networks with Right Censoring: Fitting Neural Networks with Censoring • The data are split into 75% training data and 25% testing data • The negative of the likelihood is calculated on both sets of data at the same time • [Figure: parameter estimates and negative likelihood, each plotted against training cycles]
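One common way to use the two curves, assumed here rather than stated on the slide, is to keep the parameter estimates from the training cycle at which the testing-data negative likelihood is smallest. The helper names and the curve values below are hypothetical.

```python
import random

def split_train_test(n, train_frac=0.75, seed=0):
    """Randomly split the indices 0..n-1 into training and testing sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]

def best_cycle(test_nll):
    """Index of the training cycle with the smallest testing-data negative likelihood."""
    return min(range(len(test_nll)), key=test_nll.__getitem__)

train_idx, test_idx = split_train_test(100)
test_curve = [5.0, 4.1, 3.6, 3.4, 3.5, 3.9]  # hypothetical values, one per cycle
print(len(train_idx), len(test_idx), best_cycle(test_curve))  # 75 25 3
```

The training curve usually keeps falling as cycles accumulate, so a testing curve that turns upward is the signal that further training is fitting noise.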
Neural Networks with Right Censoring: Fitting Neural Networks with Censoring • Potential drawbacks of neural networks • Hard to tell the individual effect of each predictor variable on the response • Can have poor extrapolation properties • Potential gains from neural networks • Can reduce preliminary analysis in modeling • Discovery of interactions and nonlinear relationships becomes automatic • Can increase the predictive power of models
Simulated Example • True time model: log(t) = x^2 + 0.5 ε1 • Censoring model: log(c) = 0.25 + x^2 + 0.5 ε2 • x ~ U(-3,3) • ε1, ε2 ~ N(0,1) • Censored if c < t • ~35% censoring • 3-node neural network fit
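The simulation setup can be reproduced with a short script. This sketch reads the covariate term as x squared, which is an assumption about the slide's notation; the censoring fraction does not depend on that choice because the same term appears in both models and cancels in the comparison of c and t. The function name, sample size, and seed are hypothetical.

```python
import random

def simulate(n, seed=0):
    """Return rows of (x, observed log time, delta) from the two models."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-3.0, 3.0)
        log_t = x * x + 0.5 * rng.gauss(0.0, 1.0)          # true time model
        log_c = 0.25 + x * x + 0.5 * rng.gauss(0.0, 1.0)   # censoring model
        delta = 0 if log_c < log_t else 1                  # censored if c < t
        rows.append((x, min(log_t, log_c), delta))
    return rows

rows = simulate(20000)
censored = sum(1 for _, _, d in rows if d == 0) / len(rows)
print(round(censored, 3))  # censoring fraction, expected to be near 0.36
```

The theoretical fraction is the probability that ε1 - ε2 exceeds 0.5, where ε1 - ε2 is normal with variance 2, which is about 0.36 and agrees with the roughly 35% quoted on the slide.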
Simulated Example • Scatter points are the true times versus x • The solid line represents the NN fit to the data
Predicting Claim Duration • Predictor Variables • NCCI Codes • Body Part Code • Injury Type • Nature of Injury • Industry Class Code • Demographic Information • Age • Gender • Weekly Wage • Zip Code • Response Variable • Time from report until the claim is closed
Predicting Claim Duration • Results are shown as the ratio of predicted to actual duration on a log10 scale • Results for open claims are difficult to represent because their final durations are not yet known
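The accuracy measure can be written as a one-line function. The function name and the durations are hypothetical; the units only need to match between prediction and actual value.

```python
import math

def log10_ratio(predicted, actual):
    """log10 of predicted over actual duration: 0 is exact, +1 is ten times too long."""
    return math.log10(predicted / actual)

# Hypothetical closed claim: predicted 100 days, actually lasted 10 days
print(log10_ratio(100.0, 10.0))
# Under-prediction gives a negative value
print(log10_ratio(10.0, 100.0))
```

On this scale, over-prediction and under-prediction by the same factor sit at equal distances on either side of zero.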
Conclusions • The approach provides an intuitive method to address right-censored data with a neural network • It allows for a more flexible mean function • It can be used in many time-to-event data settings