Predictive Modeling
CAS Reinsurance Seminar, May 7, 2007
Louise Francis, FCAS, MAAA
Louise.francis@data-mines.com
Francis Analytics and Actuarial Data Mining, Inc.
www.data-mines.com
Why Predictive Modeling?
• Better use of data than traditional methods
• Advanced methods for dealing with messy data now available
Data Mining Goes Prime Time
Becoming A Popular Tool In All Industries
Real Life Insurance Application – The “Boris Gang”
Predictive Modeling Family
Data Quality: A Data Mining Problem
• Actuary reviewing a database
A Problem: Nonlinear Functions
An Insurance Nonlinear Function: Provider Bill vs. Probability of Independent Medical Exam
Classical Statistics: Regression
• Estimation of parameters: fit the line that minimizes the squared deviation between actual and fitted values
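A minimal sketch of the fitting idea, using ordinary least squares on made-up data (not the study's data):

```python
import numpy as np

# Illustrative data: provider bill (x) and paid loss (y); the values are invented.
x = np.array([500.0, 1200.0, 2500.0, 4000.0, 6500.0])
y = np.array([800.0, 1500.0, 2900.0, 4100.0, 7000.0])

# Fit a line y = a + b*x by minimizing the sum of squared deviations.
b, a = np.polyfit(x, y, deg=1)
fitted = a + b * x
sse = np.sum((y - fitted) ** 2)
print(f"intercept={a:.2f}, slope={b:.3f}, SSE={sse:.1f}")
```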
Generalized Linear Models: Common Links for GLMs
• The identity link: h(Y) = Y
• The log link: h(Y) = ln(Y)
• The inverse link: h(Y) = 1/Y
• The logit link: h(Y) = ln(Y / (1 − Y))
• The probit link: h(Y) = Φ⁻¹(Y), the inverse of the standard normal CDF
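For concreteness, a small sketch of the listed link functions in Python (the probit uses the inverse standard normal CDF from SciPy); the example value is arbitrary:

```python
import numpy as np
from scipy.stats import norm

# The link functions listed above; logit and probit require a value in (0, 1).
identity = lambda y: y                    # h(Y) = Y
log_link = lambda y: np.log(y)            # h(Y) = ln(Y)
inverse  = lambda y: 1.0 / y              # h(Y) = 1/Y
logit    = lambda y: np.log(y / (1 - y))  # h(Y) = ln(Y / (1 - Y))
probit   = lambda y: norm.ppf(y)          # h(Y) = inverse standard normal CDF

y = 0.25
for name, h in [("identity", identity), ("log", log_link), ("inverse", inverse),
                ("logit", logit), ("probit", probit)]:
    print(f"{name:8s} h({y}) = {h(y):.4f}")
```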
Major Kinds of Data Mining
Supervised learning
• Most common situation
• A dependent variable: frequency, loss ratio, fraud/no fraud
• Some methods: regression, CART, some neural networks
Unsupervised learning
• No dependent variable; group like records together
• A group of claims with similar characteristics might be more likely to be fraudulent
• Examples: territory assignment, text mining
• Some methods: association rules, k-means clustering, Kohonen neural networks
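As a small illustration of the unsupervised case, here is a k-means sketch that groups claims on two invented features (the feature names and values are hypothetical, not from the study):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical claim features: provider bill and attorneys per zip code.
rng = np.random.default_rng(0)
claims = np.vstack([
    rng.normal([1000, 5], [300, 2], size=(50, 2)),    # one group of similar claims
    rng.normal([8000, 20], [1500, 5], size=(50, 2)),  # a second, dissimilar group
])

# No dependent variable: k-means simply puts like records together.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(claims)
print(kmeans.cluster_centers_)
```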
Desirable Features of a Data Mining Method
• Any nonlinear relationship can be approximated
• A method that works when the form of the nonlinearity is unknown
• The effect of interactions can be easily determined and incorporated into the model
• The method generalizes well on out-of-sample data
The Fraud Surrogates Used as Dependent Variables
• Independent Medical Exam (IME) requested
• Special Investigation Unit (SIU) referral
• (IME successful)
• (SIU successful)
• Data: Detailed Auto Injury Claim Database for Massachusetts
• Accident years 1995–1997
Predictor Variables
• Claim file variables
  – Provider bill, provider type
  – Injury
• Derived from claim file variables
  – Attorneys per zip code
  – Docs per zip code
• Using external data
  – Average household income
  – Households per zip
Different Kinds of Decision Trees
• Single trees (CART, CHAID)
• Ensemble trees, a more recent development (TREENET, RANDOM FOREST)
  – A composite or weighted average of many trees (perhaps 100 or more)
Non-Tree Methods
• MARS – Multivariate Adaptive Regression Splines
• Neural networks
• Naïve Bayes (baseline)
• Logistic regression (baseline)
Classification and Regression Trees (CART)
• Tree splits are binary
• If the variable is numeric, the split is based on R² or the sum of squared errors
  – For any variable, choose the two-way split of the data that reduces the MSE the most
  – Do this for all independent variables
  – Choose the variable that reduces the squared errors the most
• When the dependent variable is categorical, other goodness-of-fit measures (Gini index, deviance) are used
CART – Example of 1st Split on Provider 2 Bill, with Paid as Dependent
• For the entire database, the total squared deviation of paid losses around the predicted value (i.e., the mean) is 4.95 × 10¹³. The SSE declines to 4.66 × 10¹³ after the data are partitioned using $5,021 as the cutpoint.
• Any other partition of the provider bill produces a larger SSE than 4.66 × 10¹³. For instance, if a cutpoint of $10,000 is selected, the SSE is 4.76 × 10¹³.
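A minimal sketch of the split search described above, on simulated data rather than the Massachusetts database: for one numeric predictor, try each candidate cutpoint and keep the one whose two groups have the smallest combined SSE.

```python
import numpy as np

def best_split(x, y):
    """Find the cutpoint on a numeric variable that minimizes the total squared
    error of the two resulting groups, each predicted by its own mean."""
    best_cut, best_sse = None, np.inf
    for cut in np.unique(x)[1:]:          # candidate two-way partitions
        left, right = y[x < cut], y[x >= cut]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_cut, best_sse = cut, sse
    return best_cut, best_sse

# Simulated provider bill and paid loss (illustrative only).
rng = np.random.default_rng(1)
bill = rng.uniform(0, 20000, 500)
paid = np.where(bill > 5000, 12000, 3000) + rng.normal(0, 1000, 500)
cut, sse = best_split(bill, paid)
print(f"best cutpoint = {cut:,.0f}, SSE = {sse:,.0f}")
```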
Continue splitting to get more homogeneous groups at terminal nodes
Ensemble Trees: Fit More Than One Tree
• Fit a series of trees; each tree added improves the fit of the model
• Average or sum the results of the fits
• There are many methods to fit the trees and prevent overfitting
  – Boosting: Iminer Ensemble and Treenet
  – Bagging: Random Forest
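A rough sketch of the boosting and bagging ideas, using scikit-learn's generic implementations as stand-ins for the commercial TREENET and Random Forest products, on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the claim data (the real study used IME/SIU surrogates).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

boosted = GradientBoostingClassifier(n_estimators=100).fit(X_tr, y_tr)  # boosting
bagged = RandomForestClassifier(n_estimators=100).fit(X_tr, y_tr)       # bagging

for name, model in [("boosting", boosted), ("bagging", bagged)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUROC = {auc:.3f}")
```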
Treenet Prediction of IME Requested
Neural Networks
Neural Networks
• Also minimize the squared deviation between fitted and actual values
• Can be viewed as a non-parametric, non-linear regression
Hidden Layer of Neural Network (Input Transfer Function)
The Activation Function (Transfer Function)
• The sigmoid (logistic) function: f(x) = 1 / (1 + e^(−x))
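A small sketch of the sigmoid activation and a one-hidden-layer forward pass; all weights and inputs below are hypothetical, chosen only to show the mechanics:

```python
import numpy as np

def sigmoid(x):
    """Logistic activation: f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

# Inputs are combined linearly, squashed by the sigmoid in the hidden layer,
# then combined again at the output node.
x = np.array([0.5, -1.2, 3.0])          # scaled predictor values (hypothetical)
W1 = np.array([[0.2, -0.4, 0.1],
               [0.7, 0.3, -0.6]])       # hidden-layer weights (hypothetical)
b1 = np.array([0.1, -0.2])
W2 = np.array([1.5, -0.8])              # output weights (hypothetical)
b2 = 0.05

hidden = sigmoid(W1 @ x + b1)
output = sigmoid(W2 @ hidden + b2)      # e.g., a predicted probability of an IME
print(output)
```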
Neural Network: Provider 2 Bill vs. IME Requested
MARS: Provider 2 Bill vs. IME Requested
How MARS Fits a Nonlinear Function
• MARS fits a piecewise regression
  – BF1 = max(0, X − 1,401.00)
  – BF2 = max(0, 1,401.00 − X)
  – BF3 = max(0, X − 70.00)
  – Y = 0.336 + 0.145626E-03 * BF1 − 0.199072E-03 * BF2 − 0.145947E-03 * BF3
• BF1, BF2, and BF3 are basis functions
• MARS uses statistical optimization to find the best basis function(s)
• A basis function is similar to a dummy variable in regression: like a combination of a dummy indicator and a linear independent variable
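The fitted function above can be evaluated directly; this sketch simply codes the three hinge basis functions and the coefficients quoted on the slide:

```python
def mars_predict(x):
    """Evaluate the piecewise-linear MARS fit quoted above (slide coefficients)."""
    bf1 = max(0.0, x - 1401.00)
    bf2 = max(0.0, 1401.00 - x)
    bf3 = max(0.0, x - 70.00)
    return 0.336 + 0.145626e-03 * bf1 - 0.199072e-03 * bf2 - 0.145947e-03 * bf3

# Predicted value at a few provider bill amounts.
for bill in (0, 70, 1401, 5000, 10000):
    print(bill, round(mars_predict(bill), 3))
```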
Baseline Method: Naïve Bayes Classifier
• Naïve Bayes assumes feature (predictor variable) independence conditional on each category
• The probability that an observation X will have a specific set of values for the independent variables is the product of the conditional probabilities of observing each of the values given target category c_j, j = 1 to m (m typically 2)
Naïve Bayes Formula
P(c_j | X) = P(c_j) · Π_i P(x_i | c_j) / P(X)
The denominator P(X) is a constant: it is the same for every category, so it can be ignored when comparing categories.
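A hand-computed sketch of the formula on hypothetical categorical data; the predictor names and records are invented for illustration only:

```python
# Hypothetical training records: (attorney involved?, soft-tissue injury?, IME requested?)
records = [
    ("yes", "yes", 1), ("yes", "yes", 1), ("yes", "no", 1), ("no", "yes", 0),
    ("no", "no", 0), ("no", "no", 0), ("yes", "no", 0), ("no", "yes", 1),
]

def naive_bayes_score(attorney, soft_tissue):
    """P(c) * prod_i P(x_i | c) for each class c; the evidence P(X) is the same
    constant for both classes, so it is dropped when comparing them."""
    scores = {}
    for c in (0, 1):
        rows = [r for r in records if r[2] == c]
        prior = len(rows) / len(records)
        p_att = sum(r[0] == attorney for r in rows) / len(rows)
        p_inj = sum(r[1] == soft_tissue for r in rows) / len(rows)
        scores[c] = prior * p_att * p_inj
    return scores

print(naive_bayes_score("yes", "yes"))  # higher score -> predicted class
```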
Advantages/Disadvantages
• Computationally efficient
• Has performed well under many circumstances
• The assumption of conditional independence often does not hold
• Can't be used directly for numeric variables (they must first be binned into categories)
Naïve Bayes Predicted IME vs. Provider 2 Bill
True/False Positives and True/False Negatives (Type I and Type II Errors): The “Confusion” Matrix
• Choose a “cut point” in the model score
• Claims scoring above the cut point are classified “yes”
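A sketch of the confusion-matrix counts and the resulting sensitivity and specificity at a single cut point, using hypothetical scores and outcomes:

```python
import numpy as np

# Hypothetical model scores and actual outcomes (1 = IME requested).
scores = np.array([0.10, 0.35, 0.62, 0.80, 0.22, 0.55, 0.91, 0.40])
actual = np.array([0,    0,    1,    1,    0,    1,    1,    0])

cut = 0.5
pred = (scores > cut).astype(int)       # claims above the cut point classified "yes"

tp = np.sum((pred == 1) & (actual == 1))   # true positives
fp = np.sum((pred == 1) & (actual == 0))   # false positives
tn = np.sum((pred == 0) & (actual == 0))   # true negatives
fn = np.sum((pred == 0) & (actual == 1))   # false negatives

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(tp, fp, tn, fn, sensitivity, specificity)
```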
ROC Curves and Area Under the ROC Curve
• Want good performance on both sensitivity and specificity
• Sensitivity and specificity depend on the cut point chosen
• Choose a series of different cut points and compute sensitivity and specificity for each of them
• Graph the results: plot sensitivity vs. 1 − specificity
• Compute an overall measure of “lift”: the area under the curve
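A sketch of sweeping the cut points and computing the area under the curve with scikit-learn, reusing the hypothetical scores from the confusion-matrix example:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical scores and outcomes; roc_curve sweeps the cut points and returns
# (1 - specificity, sensitivity) pairs, and roc_auc_score integrates the curve.
scores = np.array([0.10, 0.35, 0.62, 0.80, 0.22, 0.55, 0.91, 0.40])
actual = np.array([0, 0, 1, 1, 0, 1, 1, 0])

fpr, tpr, thresholds = roc_curve(actual, scores)   # fpr = 1 - specificity
print(np.column_stack([thresholds, fpr, tpr]))
print("AUROC =", roc_auc_score(actual, scores))
```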
TREENET ROC Curve – IME (AUROC = 0.701)
Ranking of Methods/Software – IME Requested
Some Software Packages That Can Be Used
• Excel
• Access
• Free software: R
• Web-based software
• S-Plus (a commercial package similar to R)
• SPSS
• CART/MARS
• Data mining suites (SAS Enterprise Miner, SPSS Clementine)
References
• Derrig, R., and Francis, L., “Distinguishing the Forest from the Trees: A Comparison of Tree-Based Data Mining Methods,” CAS Winter Forum, March 2006, www.casact.org
• Derrig, R., and Francis, L., “A Comparison of Methods for Predicting Fraud,” Risk Theory Seminar, April 2006
• Francis, L., “Taming Text: An Introduction to Text Mining,” CAS Winter Forum, March 2006, www.casact.org
• Francis, L., “Neural Networks Demystified,” Casualty Actuarial Society Forum, Winter 2001, pp. 254–319
• Francis, L., “Martian Chronicles: Is MARS Better than Neural Networks?” Casualty Actuarial Society Forum, Winter 2003, pp. 253–320
• Dhar, V., Seven Methods for Transforming Corporate Data into Business Intelligence, Prentice Hall, 1997
• The web site www.data-mines.com has some tutorials and presentations
Predictive Modeling
CAS Reinsurance Seminar, May 7, 2007
Louise.francis@data-mines.com
www.data-mines.com