Health Insurance Conference 2012
Predictive Modelling "GLMs and beyond GLMs"
Singapore – May 2012
Xavier Conort
AGENDA
• GLMs: The Good, the Bad and the Ugly
• Trees, or how to detect interactions
• GLM(ixed)M, or how to handle variables with a large number of categories
• Regularized GLMs, or how to handle texts or data with a large number of predictors
• The PRIDIT method, or how to proceed with no or little information on the response
Gear Analytics
GLMs are a standard, but be aware of their limitations
How smart actuaries detect potential interactions
• luck
• intuition
• descriptive analysis
• experience
• market practices
• Machine Learning techniques based on decision trees
Regression trees are known to detect interactions… but usually have lower predictive power than GLMs and are unstable.
• By construction, regression trees partition the feature space into a set of rectangles, producing a multitude of local interactions
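A minimal sketch of the idea: even a depth-one tree (a "stump") partitions the predictor axis into rectangles and fits a constant in each. The data and thresholds below are made up purely for illustration.

```python
import numpy as np

def fit_stump(x, y):
    """Greedy one-split regression tree: pick the threshold that
    minimises the total squared error of the two resulting pieces."""
    best = (np.inf, None, None, None)
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]  # threshold, left-rectangle value, right-rectangle value

def predict_stump(stump, x):
    """Piecewise-constant prediction: one value per rectangle."""
    t, lo, hi = stump
    return np.where(x <= t, lo, hi)

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0.0, 0.0, 0.0, 5.0, 5.0, 5.0])
stump = fit_stump(x, y)  # splits at x <= 3
```

A full regression tree simply applies this split search recursively inside each rectangle, which is how nested splits on different predictors become local interactions.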
Random Forests will provide you with higher predictive power… but less interpretability. A Random Forest is a collection of weak, independent decision trees, where each tree has been trained on a bootstrapped dataset with a random selection of predictors (think of the wisdom of crowds)
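The training loop can be sketched as follows. This is a hypothetical skeleton: `fit_tree` stands in for any weak tree learner you supply, and the function names are mine, not from the slides.

```python
import random

def train_forest(data, fit_tree, n_trees=100, n_sub_features=1):
    """Random-forest training skeleton: each tree sees a bootstrap
    sample of the rows and a random subset of the predictors."""
    n, n_features = len(data), len(data[0]) - 1  # last column is the response
    forest = []
    for _ in range(n_trees):
        bootstrap = [random.choice(data) for _ in range(n)]  # rows sampled with replacement
        features = random.sample(range(n_features), n_sub_features)
        forest.append((features, fit_tree(bootstrap, features)))
    return forest

def forest_predict(forest, row):
    """Average the independent trees' predictions (wisdom of crowds)."""
    preds = [tree(row, features) for features, tree in forest]
    return sum(preds) / len(preds)
```

Bootstrapping the rows and subsetting the predictors is what de-correlates the trees, so averaging them reduces variance without adding much bias.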
Boosted Regression Trees, or learn step by step, slowly
BRTs (also called Gradient Boosting Machines) combine boosting and decision-tree techniques:
• The boosting algorithm gradually increases emphasis on poorly modelled observations. It minimizes a loss function (the deviance, as in GLMs) by adding, at each step, a new simple tree that focuses only on the residuals
• The contribution of each tree is shrunk by a small learning rate (< 1) to give more stable fitted values for the final model
• To further improve predictive performance, the process fits each new tree on a random subset of the data (bagging)
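The steps above can be sketched for squared-error loss with a one-split tree as the weak learner (the random-subsampling step is omitted here for brevity; data and settings are made up for illustration):

```python
import numpy as np

def fit_stump(x, y):
    """One-split regression tree used as the weak learner."""
    best = (np.inf, None)
    for t in np.unique(x)[:-1]:
        l, r = y[x <= t], y[x > t]
        sse = ((l - l.mean()) ** 2).sum() + ((r - r.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, (t, l.mean(), r.mean()))
    return best[1]

def boost(x, y, n_steps=50, learning_rate=0.1):
    """Boosting loop: each step fits a new stump to the current
    residuals and adds a contribution shrunk by the learning rate."""
    pred = np.full_like(y, y.mean())
    for _ in range(n_steps):
        residuals = y - pred  # emphasis moves to what is still poorly modelled
        t, lo, hi = fit_stump(x, residuals)
        pred = pred + learning_rate * np.where(x <= t, lo, hi)
    return pred

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0.0, 0.0, 0.0, 6.0, 6.0, 6.0])
pred = boost(x, y)  # converges slowly toward y
```

Because each contribution is shrunk, the residuals decay geometrically rather than being fitted in one greedy step, which is exactly the "learn slowly" idea.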
Why do I love BRTs?
• BRTs can be fitted to a variety of response types (Gaussian, Poisson, Binomial)
• BRTs' best fit (interactions included) is detected automatically by the machine
• BRTs learn non-linear functions without the need to specify them
• BRT outputs have some GLM flavour and provide insight into the relationship between the response and the predictors
• BRTs avoid much data cleaning, thanks to their ability to accommodate missing values and their immunity to monotone transformations of predictors, extreme outliers and irrelevant predictors
BRTs' partial dependence plots represent the effect of each predictor after accounting for the effects of the other predictors; non-linear relationships are detected automatically
Plot of interactions fitted by BRTs
BRTs' prediction formula
Consider one numerical predictor Xnum and one categorical predictor Xcat (with two levels).
A GLM's prediction formula is
Yhat = g⁻¹(β0 + βnum·Xnum + βcat·I(Xcat = 1)), with g the link function.
A BRT's prediction formula is more complex and less easily implementable:
Yhat = g⁻¹(β0 + βnum1·I(Xnum < α1) + βnum2·I(Xnum < α2) + …
  + βcat·I(Xcat = 1)
  + βint1·I(Xnum < γ1 & Xcat = 0) + βint2·I(γ2 < Xnum < γ3 & Xcat = 1) + …)
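A direct transcription of the two formulas (with an identity link, and with coefficients and thresholds invented purely for illustration) makes the contrast concrete:

```python
# GLM: one global coefficient per predictor.
def glm_predict(x_num, x_cat, b0=1.0, b_num=0.5, b_cat=2.0):
    return b0 + b_num * x_num + b_cat * (x_cat == 1)

# BRT: a sum of local indicator terms, including interaction "rectangles".
# All thresholds/coefficients below are hypothetical.
def brt_predict(x_num, x_cat):
    return (1.0
            + 0.4 * (x_num < 3.0)                          # main-effect splits on x_num
            + 0.2 * (x_num < 7.0)
            + 2.0 * (x_cat == 1)                           # split on x_cat
            + 0.3 * ((x_num < 2.0) & (x_cat == 0))         # interaction terms
            + 0.1 * ((4.0 < x_num) & (x_num < 6.0) & (x_cat == 1)))
```

The GLM formula is two coefficients away from a rating table; the BRT formula is hundreds of such indicator terms (one set per tree), which is why deployment usually goes through exported scoring code rather than hand-written relativities.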
How smart actuaries handle factors with a large number of categories
• In GLMs, estimates for predictors with many levels (e.g. territory, car model) and little statistical material are not credibility adjusted.
• GLM diagnostics will only alert you: relativities of levels with little exposure sit squarely in the middle of wide confidence intervals driven by large standard errors.
• In practice, actuaries apply ad hoc credibility adjustments before deploying the model.
• Generalized Linear Mixed Models (GLMMs) can accomplish this credibility adjustment automatically, by modelling both fixed and random effects and providing credibility estimates directly.
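A minimal sketch of the shrinkage effect a GLMM's random effect produces, in the style of a Bühlmann credibility formula. The credibility constant `k` is a hypothetical tuning value here; a GLMM estimates the equivalent quantity from the variance components.

```python
import numpy as np

def credibility_estimates(level_means, level_counts, overall_mean, k=50.0):
    """Shrink each level's raw estimate toward the overall mean,
    with weight growing in the level's exposure."""
    z = level_counts / (level_counts + k)  # credibility weight per level
    return z * level_means + (1.0 - z) * overall_mean

means = np.array([2.0, 0.2])     # raw relativities for two territories
counts = np.array([5000.0, 10.0])  # exposure behind each estimate
shrunk = credibility_estimates(means, counts, overall_mean=1.0)
# the well-observed level keeps its estimate; the sparse one is pulled toward 1.0
```

This is the behaviour you want from the model itself rather than as a post-hoc patch: heavily exposed levels keep their data-driven relativity, sparse levels borrow strength from the portfolio mean.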
How smart actuaries handle data with a large number of variables
• GLMs are sensitive to multicollinearity and provide an estimate for every single predictor, which leads to over-fitting and unrobust results.
• By fitting regularized GLMs, you can automatically select the most relevant predictors and accommodate multicollinearity, by adding a penalty to the loss function (the deviance) to be minimized. For a gaussian error, the penalized loss takes the form
minβ Σi (yi − β0 − Σj xij·βj)² + λ·P(β), with P(β) = Σj |βj| (lasso) or Σj βj² (ridge)
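A sketch of the ridge case, which has a closed form, on deliberately collinear made-up data. The penalty stabilises the coefficients of two near-duplicate predictors that an unpenalised fit struggles to separate.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge solution: minimise ||y - X b||^2 + lam * ||b||^2."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=1e-3, size=200)   # near-duplicate predictor
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.1, size=200)

unpenalised = ridge(X, y, lam=0.0)  # unstable, offsetting coefficients
penalised = ridge(X, y, lam=1.0)    # the weight is shared, roughly 0.5 each
```

With an L1 penalty (lasso) instead, coefficients are driven exactly to zero, which is what performs the automatic predictor selection mentioned above.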
The penalty effect in a regularized GLM
How to make use of texts
• Usually, punctuation and numbers are removed and words are stemmed, but this varies with the application.
• Rare and very frequent words are discarded.
• A document-term matrix is produced: an incidence or frequency matrix.
• The matrix is sometimes scaled to put more emphasis on rare but predictive words.
• Regularized GLMs are applied to the matrix (sometimes with 5,000 columns!).
• Alternative: Support Vector Machines.
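The document-term matrix step can be sketched with the standard library alone (the claim descriptions are invented, and the crude whitespace tokenisation stands in for real stemming and stop-word removal):

```python
from collections import Counter

docs = ["rear collision at junction",
        "minor collision rear bumper",
        "theft of vehicle at night"]

tokens = [d.lower().split() for d in docs]        # crude tokenisation
vocab = sorted({w for t in tokens for w in t})    # one column per term
counts = [Counter(t) for t in tokens]
dtm = [[c[w] for w in vocab] for c in counts]     # frequency matrix: rows = docs
```

Each row of `dtm` becomes one observation and each column one predictor; the regularized GLM then decides which of the thousands of term columns carry signal.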
How smart insurance companies handle fraud
• GLMs are sometimes presented as a potential technique for fraud detection, but in practice they fail because:
  – histories of fraud cases are insufficient and incomplete
  – they do not detect previously undetected fraud cases
• In practice, companies use a series of red flags (based on categorical and numerical variables) but lack a single indicator.
• Numerous actuarial articles in recent years have presented an unsupervised technique (no label to train on) called PRIDIT as an efficient actuarial way to make use of those operational red flags.
PRIDIT technique: basic ideas
• Transform all numerical and ordinal red flags to a common scale (values between -1 and 1) using RIDIT statistics (based on the cumulative distribution)
• Apply Principal Component Analysis (PCA) to the RIDIT scores to produce a single indicator
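The first step can be sketched as follows, using the (-1, 1) RIDIT convention of the Brockett et al. paper cited later: each ordered category scores the proportion of observations below it minus the proportion above it. The example flag values are made up.

```python
import numpy as np

def ridit_scores(values):
    """Map each ordered category to P(lower category) - P(higher category),
    so every red flag lands on a common (-1, 1) scale."""
    values = np.asarray(values)
    cats = np.unique(values)                              # sorted category levels
    p = np.array([(values == c).mean() for c in cats])    # empirical frequencies
    below = np.concatenate([[0.0], np.cumsum(p)[:-1]])    # mass strictly below
    above = 1.0 - below - p                               # mass strictly above
    return dict(zip(cats, below - above))

# an ordinal red flag observed as 1, 1, 2, 3 across four claims
scores = ridit_scores([1, 1, 2, 3])
```

Because the score depends only on ranks, numeric and ordinal flags with very different raw scales end up directly comparable before PCA is applied.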
But what is PCA's basic idea? Maximize the variance of the data projected onto a few axes
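A minimal sketch of that idea via the SVD: the first principal component is the unit direction along which the centred data has maximal projected variance. The two correlated columns below are synthetic stand-ins for RIDIT-scored red flags.

```python
import numpy as np

def first_component(X):
    """First principal component of X: the direction maximising
    the variance of the projected, centred data."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]  # unit vector; singular values come sorted descending

rng = np.random.default_rng(1)
z = rng.normal(size=300)
X = np.column_stack([z, z + 0.05 * rng.normal(size=300)])  # two correlated flags
w = first_component(X)
score = (X - X.mean(axis=0)) @ w  # one indicator combining both columns
```

With two nearly identical columns, the component weights them almost equally, which is exactly the "weighted average of the variables" described in Example (2/4).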
Example (1/4)
• Suppose we want to combine all the information related to fraud. We compute the RIDIT scores to put all the variables on the same scale (including the numeric ones).
From "Fraud Classification Using Principal Component Analysis of RIDITs" – The Journal of Risk and Insurance, 2002, Vol. 69, No. 3, 341–371
Example (2/4)
• We apply PCA to replace many variables with a single score
• We look for the factor that explains the most variance (captures most of the correlation) in the set of variables
• The factor extracted is a weighted average of the variables (a score)
• That score can be used to sort claims
• More effort can then be spent on claims more likely to be fraudulent or abusive
Example (3/4)
• One can decide to investigate claim 3 first, claim 7 next, and pay the rest of the 10 claims (or, if resources allow, investigate in increasing PRIDIT score order until resources are exhausted).
Example (4/4)
Factor loadings are also sometimes used to explain the importance of the variables
Does it work?
• Based on the US actuarial papers: Yes!
• There appears to be a strong relationship between the PRIDIT score and the suspicion that a claim is fraudulent or abusive
• The Society of Actuaries even funded a study extending the PRIDIT technique to the measurement of hospital quality