Data Mining Applications in P&C Insurance. CASE Spring Meeting April 12, 2005 Lijia Guo, PhD, ASA, MAAA University of Central Florida. Agenda. Introductions to data mining modeling Understanding the data mining process Data mining (DM) techniques Applications in P&C Insurance Case Study.
Data Mining Applications in P&C Insurance CASE Spring Meeting April 12, 2005 Lijia Guo, PhD, ASA, MAAA University of Central Florida
Agenda
• Introduction to data mining modeling
• Understanding the data mining process
• Data mining (DM) techniques
• Applications in P&C Insurance
• Case Study
Guo

Introduction – What is Data Mining?
• Process of exploration and analysis of large quantities of data in order to discover meaningful patterns and rules.
• Uses a variety of data analysis tools to discover relationships that may be used to make valid predictions.
• It is not a magic wand:
  • Must know your business
  • Understand your data
  • Understand the analytical methods

Introduction - DM Modeling
• An information discovery process.
• Knowing your goals
• Understanding your data
• Choosing the right methods
• Understanding the limitations
• Validation and testing
• Make crucial business decisions

Introduction – DM Process
• Define the goal
• Understand the economics
• Identify data sources
• Prepare data
• Transform data
• Apply DM models
• Validate DM models
• Implement

Introduction – DM Goals
• Identifying responsive potential customers
• Identifying existing customers who are more likely to terminate
• Identifying high-risk purchasers
• Identifying the factors that cause large claims
• Identifying interactions among risk factors

DM Techniques
• Decision trees
• Logistic regression
• Neural networks
• Fuzzy logic
• Genetic algorithms
• Clustering
• Association discovery
• Sequence discovery
• Bayesian analysis
• Visualization
• Hybrid algorithms

DM Techniques -- Decision Trees
• What are decision trees
  • Classify observations based on the values of nominal, binary, or ordinal targets
  • Predict outcomes for interval targets
  • Predict the appropriate decision when you specify decision alternatives

DM Techniques -- Decision Trees
• Strengths and weaknesses
  • Give insight into the decision-making process
  • Efficient, and thus suitable for large data sets
  • Relatively unstable
  • Have difficulty detecting linear or quadratic relationships

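To make the splitting idea behind a decision tree concrete, here is a minimal sketch of how a single split on one variable could be chosen for a binary target. It assumes Gini impurity as the split criterion, which the slides do not specify; the function names and the toy driver data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`."""
    n = len(values)
    best = (None, gini(labels))  # no split yet: impurity of the parent node
    # Skip the largest value: it would leave the right-hand side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: driver age and whether a claim occurred (1 = claim).
ages = [19, 22, 24, 31, 38, 45, 52, 60]
claims = [1, 1, 1, 0, 0, 0, 1, 0]
threshold, impurity = best_split(ages, claims)
print(threshold, round(impurity, 3))
```

A full tree repeats this search recursively inside each resulting node. Because a small change in the data can move the chosen threshold, the sketch also hints at why trees are described as relatively unstable.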
DM Techniques -- Logistic Regression
• What is logistic regression
• How logistic regression works
  • Odds ratios
  • Each independent variable affects the logit linearly

DM Techniques - Logistic Regression
• Strengths and weaknesses
• Maximum likelihood curve fitting
• Multiple logistic regression model
• Interaction-effect modifier
• Multinomial logistic regression model

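The link between the logit and odds ratios can be checked numerically. The sketch below assumes a two-variable model with made-up coefficients (none of these numbers come from the presentation) and shows that raising one input by one unit multiplies the odds by the exponential of its coefficient.

```python
import math


def sigmoid(z):
    """Map a logit (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def claim_probability(x, intercept, coefs):
    """Probability from a logit that is linear in the inputs."""
    logit = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return sigmoid(logit)


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


# Hypothetical fitted coefficients for two risk factors.
intercept = -2.0
coefs = [0.7, -0.3]

base = [1.0, 2.0]
bumped = [2.0, 2.0]  # first risk factor increased by one unit

p0 = claim_probability(base, intercept, coefs)
p1 = claim_probability(bumped, intercept, coefs)
odds_ratio = odds(p1) / odds(p0)

# The two printed numbers agree: the odds ratio equals exp(coefficient).
print(round(odds_ratio, 4), round(math.exp(coefs[0]), 4))
```

In practice the coefficients would come from maximum likelihood fitting; the sketch only illustrates how they are read once fitted.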
DM Techniques -- Neural Networks
• What are neural networks
  • Input layer - a unit for each input variable
  • Output layer - the target
  • Hidden layer - hidden units (neurons)
[Figure: network architecture with two hidden layers]

DM Techniques – Neural Networks
• Output activation function
• Hidden-layer activation functions: nonlinear transformations
• Weights
• Bias

DM Techniques – Neural Networks
• How neural networks work
  • Processing elements
  • Training
  • Predicting
• Activation functions
  • Logistic function
  • Hyperbolic tangent

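The prediction step of a small network can be sketched in a few lines. This is an illustration under stated assumptions, not the architecture from the slides: it uses a single hidden layer with hyperbolic tangent units and a logistic output, and all weights and biases are invented values.

```python
import math


def logistic(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: inputs -> tanh hidden units -> logistic output."""
    hidden = [
        math.tanh(b + sum(w * xi for w, xi in zip(ws, x)))
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    return logistic(output_bias + sum(w * h for w, h in zip(output_weights, hidden)))


# Hypothetical network: 2 inputs, 2 hidden units, 1 output.
x = [0.5, -1.0]
hidden_weights = [[0.4, -0.2], [0.1, 0.3]]
hidden_biases = [0.0, -0.1]
output_weights = [1.2, -0.7]
output_bias = 0.05

y = forward(x, hidden_weights, hidden_biases, output_weights, output_bias)
print(round(y, 4))
```

Training consists of adjusting the weights and biases so that outputs like `y` match the observed targets; that step is omitted here.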
DM Techniques -- Neural Networks
• Strengths and weaknesses
  • Accurate prediction for complex problems
  • Black-box prediction engine
  • Overtraining
  • Training speed

DM Techniques -- Hybrid Algorithms
• Problems with standard algorithms
• Advanced algorithms
• Discovery-driven approaches
• Mixture of algorithms

DM Applications in P&C Insurance
• Data Warehouse
• Underwriting
• Pricing/Rate Making
• Claim Scoring
• Risk Management
• Policy Level Analysis
• Variable Selection

Data Warehousing Example
[Figure: data warehouse flow. Sources: transactions, surveys, demographics, and pharmacy, physician, hospital, and medical claims, held in an operational data store. Primary selection (WHO?): unique patient list. Secondary selection (WHAT DATA?): service-level table with derived variables and flags. Tertiary selection (WHAT DOES THE TRANSACTION DATA TELL US?): group by patient. Summary (WHAT DO WE KNOW ABOUT THIS PATIENT?): summary-level table built from service-level and summary-level variables.]

DM in Insurance Underwriting
• Improving profit margin
• Gaining a competitive edge
• Risk evaluation process
  • Lots of variables
  • Lots of interactions
• Easy-to-follow procedure
  • Decision trees can be used

DM in Insurance Underwriting - Auto Driver’s Claim Information Guo
DM in Pricing/Rate Making
• Data: Auto Driver’s Claim Information
• Decision tree analysis to identify risk factors that predict profits, claims, and losses
• Logistic regression applied to model
  • Claim frequency
  • Effect of each risk factor

DM in Pricing/Rate Making
[Figure: effect T-scores from the logistic regression]

DM in Pricing/Rate Making - Assessment
• Cross-model comparisons of expected to actual profits/losses
  • Independent of all other factors (sample size, ...)
• Lift charts
  • Compare the % claim-occurrence value to a random baseline model
  • Performance quality is shown by how far the lift chart curve pushes upward and to the left

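The numbers behind a lift chart can be sketched as follows. The sketch assumes the cumulative form of lift (claim rate among the highest-scored records divided by the overall claim rate, which is what a random baseline would achieve); the function name, the scores, and the outcomes are hypothetical.

```python
def cumulative_lift(scores, outcomes, n_bins=10):
    """Cumulative lift per bin: claim rate in the top-scored records / overall claim rate.

    Assumes at least one positive outcome, so the overall rate is non-zero.
    """
    # Rank records from highest to lowest model score.
    ranked = [y for _, y in sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)]
    n = len(ranked)
    overall_rate = sum(ranked) / n
    lifts = []
    for k in range(1, n_bins + 1):
        top = ranked[: max(1, round(n * k / n_bins))]
        lifts.append((sum(top) / len(top)) / overall_rate)
    return lifts


# Hypothetical model scores and observed claim indicators (1 = claim occurred).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lifts = cumulative_lift(scores, outcomes, n_bins=5)
print([round(v, 2) for v in lifts])
```

A lift above 1 in the first bins means the model concentrates claims among its top-scored records; by construction the final bin always has a lift of 1.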
DM in Pricing/Rate Making - Lift Chart for Logistic Regression
[Figure: lift chart for logistic regression]
• Logistic regression captured 30% of the drivers in the 10th percentile
• Better predictive power from about the 20th to the 80th percentiles

DM in Risk Management
• Reinsurance
  • To structure more effectively by segmentation
• Hedging
• Target retention and building loyalty

DM in Policy Level Analysis
• Retention analysis
• Profitability analysis
• Policyholder behavior
• DM methods used
  • Neural networks
  • Decision trees
  • Logistic regression

Applications – Variable Selection • Problem -- Given {Y,X} where • Find F, such that • Find and F*, such that • Improving model accuracy and efficiency • Making crucial business decisions Guo
Case Study - Group Insurance
• Identify ways to build upon the current manual rating structure, using existing rating variables to develop a practical tool to guide underwriting in rate adjustments
• Identify any new rating variables with significant predictive power
  • Data currently gathered but not used
  • Transformations of existing variables
  • New rating variables (e.g. external financial data)

Case Study – Group Insurance
• Profit margin over an x-year period
• 128 input variables
• Principal Components Analysis applied
• 42 variables remain
• How to improve business profit?

Case Study - Goals
• Developing a practical underwriting tool
• Detecting deviations
• Identifying key drivers
• Improving model predictive power
• Risk selection

Function Approximation
• Start from an initial guess
• Stagewise approximation
• Each stage is added to reduce the errors
• Each stage is a weak learner: a small tree
• Sequential adjustment

Regression Tree Example
Profit = 6.5%
  + 1.2% if male younger than 30; - 1.1% otherwise
  + 0.8% if AS > 421; - 0.5% otherwise

Function Approximation • GIVEN • Y: Output and X: Inputs or Predictors • L(Y, F): Loss Function • ESTIMATE Guo
Classical Function Approximation • Solve from Guo
Nonparametric Function Approximation • Compute • Initial guess • Take a step in the steepest descent direction Guo
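For reference, the function approximation setup described on these slides is commonly written as below. This is the standard textbook formulation of gradient-based function approximation, not a reconstruction of the slide formulas; the symbols $F^{*}$, $\beta$, $g_m$, and $\rho_m$ are assumptions introduced here.

```latex
% Goal: the function that minimizes the expected loss
F^{*} = \arg\min_{F} \; \mathbb{E}_{Y,X}\, L\bigl(Y, F(X)\bigr)

% Classical (parametric) approach: restrict F to a family F(X; \beta) and solve for \beta
\beta^{*} = \arg\min_{\beta} \; \mathbb{E}_{Y,X}\, L\bigl(Y, F(X; \beta)\bigr)

% Nonparametric approach: start from an initial guess F_0, then at stage m
% compute the gradient of the loss with respect to the current fit
g_m(X) = \left[ \frac{\partial L\bigl(Y, F(X)\bigr)}{\partial F(X)} \right]_{F = F_{m-1}}

% and take a step of size \rho_m in the steepest descent direction
F_m(X) = F_{m-1}(X) - \rho_m \, g_m(X)
```

With squared-error loss $L(Y, F) = \tfrac{1}{2}(Y - F)^2$, the negative gradient $-g_m(X)$ is the residual $Y - F_{m-1}(X)$, which is why boosting with this loss fits each new stage to the current residuals.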
Gradient Boosting
• Initial guess
• FOR m = 1 TO M
  • Fit an L-node regression tree to the current residuals
  • For each given node, calculate node average residual
  • Update:
• END

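The loop above can be sketched in plain Python. This is a minimal illustration under several assumptions that are not in the slides: squared-error loss, a single input variable, two-leaf trees (stumps) instead of general L-node trees, and a fixed learning rate. The function names and the data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a two-leaf regression tree on one variable.

    Returns (threshold, left_mean, right_mean), where each leaf value is the
    average residual of the records falling in that leaf.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so both leaves are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]


def gradient_boost(xs, ys, n_stages=20, learning_rate=0.5):
    """Stagewise fit: start from the mean, then repeatedly fit stumps to the residuals."""
    f0 = sum(ys) / len(ys)  # initial guess
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_stages):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((t, left_mean, right_mean))
        # Update: add a damped version of the new stage to the current fit.
        preds = [p + learning_rate * (left_mean if x <= t else right_mean)
                 for x, p in zip(xs, preds)]
    return f0, stumps, preds


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical data: one rating variable and an observed profit margin (%).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.0, 2.5, 3.0, 6.0, 6.5, 7.0, 4.0, 4.5]
f0, stumps, preds = gradient_boost(xs, ys)
print(round(mse(ys, [f0] * len(ys)), 4), round(mse(ys, preds), 4))
```

The first printed value is the error of the initial guess alone and the second is the error after all stages. Each stage is weak on its own; the accuracy comes from the sequential adjustment.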
Case Study - Single Stats and Variable Importance
Input        Additive   Multiplicative   Importance
Variable 1   0.2679     0.2690           100.00
Variable 2   0.2779     0.3203            75.23
Variable 3   0.1456     0.1771            54.65
Variable 4   0.2263     0.2469            47.41
Variable 5   0.1059     0.1425            42.81
Variable 6   0.2741     0.2847            34.81
Variable 7   0.1289     0.1306            34.27
Variable 8   0.0797     0.0864            25.35
Variable 9   0.1129     0.1148            23.37

Case Study - Pair Stats and Variable Importance
Variables                 Additive   Multiplicative
Variable 1 & Variable 2   0.3714     0.3847
Variable 2 & Variable 3   0.3704     0.4066
Variable 2 & Variable 4   0.3686     0.4010
Variable 2 & Variable 7   0.3401     0.3856
Variable 3 & Variable 4   0.2795     0.3137
Variable 3 & Variable 6   0.2895     0.3082
Variable 4 & Variable 7   0.2417     0.2592
Variable 5 & Variable 6   0.2622     0.2766
Variable 6 & Variable 7   0.2904     0.3066

Predictive Modeling
• Predicts deviations from expected profitability (used 9 variables)
• Practical guide for underwriters to use for rate adjustments
• New variables identified to have strong predictive power
• Improve business profit (20% profit margin)

Importance of Multiple Techniques
• Robust model with high predictive accuracy
• Practical constraints
• Algorithm complexity
• Ease of understanding of results

Is Data Mining for you?
• Defining the goals
• Understanding your data
• Using multiple techniques
• Improving your decision-making process
• Gaining a competitive edge!
Thank you!