310 likes | 333 Views
This presentation covers data validation, hypothesis building, model building, model testing, and monitoring in predictive modeling. It focuses on various visualization techniques as a diagnostic tool, including histogram, mosaic plot, missing data plot, time series plots, quantile-quantile plots, correlation web, and partial plots.
E N D
CANE 2007 Spring MeetingVisualizing Predictive Modeling Results Chuck Boucek (312) 879-3859
Agenda • Data Validation • Hypothesis Building • Model Building • Model Testing • Monitoring • Visualization as a Diagnostic Tool
Data Validation • Goals • Validate reasonableness of data • Understand key patterns in data • Understand changes in data and underlying business through time
Data Validation • Histogram is a simple tool to for reasonability testing of modeling database
Data Validation • Mosaic Plot shows the distribution of predictors in two dimensions
Data Validation • Missing Data plot shows the relationship of missing data elements
Data Validation • Time series plots identify consistency of data over time 1.0 0.9 0.8 0.7 0.6 Claims Match to Exposure 0.5 0.4 0.3 0.2 0.1 Company 1 Company 2 0.0 Company 3 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004
Hypothesis Building • Goals • Perform initial analysis of potential predictor variables • Limit the list of predictor variables to be employed in subsequent phases of model building • Further reasonability testing of data
Demographic Variable 1 1.500 0.750 15000 1.125 0.625 12500 0.500 0.750 10000 Frequency Loss Ratio Pure Premium 0.375 7500 0.375 0.250 5000 0.0 0 0 0.3 0.3 0 0.01 0.02 0.01 0.02 0.3 0.01 0.02 0.001 0.001 0.001 200 10000 40000 150 7500 32500 100 5000 25000 Severity Premium ($MM) Exposure ($MM) 50 2500 17500 0 0 10000 0 0.3 0.3 0 0.3 0 0.01 0.02 0.01 0.02 0.01 0.02 0.001 0.001 0.001
Hypothesis Building • Quantile-Quantile plots help identify needed transformations of data
Hypothesis Building • Correlation Web concisely summarizes a correlation matrix
Model Building • Model building is an iterative process • Understanding patterns and relationships throughout this process is critical
Model Building • Partial Plots are a key tool to visualize predictor variables throughout the model building process • What is a “Partial Plot?” Linear Predictor = k + b1X1 + b2X2 + b3X3 + b4X4 Predicted value = (ek) x (eb1X1) x (eb2X2) x (eb3X3) x (eb4X4) • Partial Plot demonstrates an individual predictor variables contribution to final prediction
1.25 8000 1.00 6000 0.75 4000 2000 0.50 0 0 5 10 15 20 25 30 0 5 10 15 20 25 30 Model Building • Partial Plot demonstrates an individual predictor variables contribution to final prediction
1.25 1.000 0.75 0.50 0.25 0 10 20 30 Model Building • Partial Plot with modified scatter plot of variable
0 5 10 15 20 25 1997 1998 1999 2000 0 5 10 15 20 25 Model Building • Time Consistency plot is a critical tool for numeric predictors
1.30 1.20 1.10 1.00 Credit Level 2 0.90 0.80 Credit Level 1 0.70 Model Building • Partial Plot for a factor variable
Credit Variable 1 1.500 0.750 15000 1.125 0.625 12500 0.500 0.750 10000 Frequency Loss Ratio Pure Premium 0.375 7500 0.375 0.250 0.0 5000 No No No Yes Yes Yes 200 10000 40000 150 7500 32500 Exposure (Pred. Count) 100 5000 25000 Severity Premium ($MM) 50 2500 17500 0 0 10000 No No Yes Yes No Yes
Model Testing • Likely the most critical visualizations in predictive modeling work • Management’s perception of a project’s success will likely depend on these visualizations • Holdout tests • Cross validation tests
Model Testing • Lift Chart shows overall model performance Loss Ratio Lift Chart - Holdout Sample 1.0 1.0 Predicted Actual 0.9 0.7 0.8 0.8 0.7 0.7 0.6 0.6 0.5 0.5 0.4 0.4
1.0 0.8 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 1.0 Model Testing • ROC Curve shows overall model performance Holdout Sample ROC Curve Null, 0 Perfect, 1 prem, 0.51 pred.loss, 0.56
Out of Sample Error Number of variables in final model Prediction Error In Sample Error 5 10 15 20 25 Number of Predictors Model Testing • Classical Cross Validation exhibit
Monitoring Model Results • The work does not end when the lift chart looks good • Monitoring tools • Decile management • Exception analysis • Model vs. Actual Results
Monitoring Model Results • Decile Management • Retention • Loss Ratio • Rate Action • Tier/Schedule Mod
Monitoring Model Results • Average score over time
Monitoring Model Results • Loss ratio of model exceptions
Visualization as Diagnostic Tool • Frequency and severity models have been developed • Model is underperforming in predicting loss ratio • Likely cause of underperformance is severity model
Visualization as Diagnostic Tool • Two different visualizations of the same model tell a very different story!