1 / 31

CANE 2007 Spring Meeting Visualizing Predictive Modeling Results

This presentation covers data validation, hypothesis building, model building, model testing, and monitoring in predictive modeling. It focuses on various visualization techniques as a diagnostic tool, including histogram, mosaic plot, missing data plot, time series plots, quantile-quantile plots, correlation web, and partial plots.

sabrinam
Download Presentation

CANE 2007 Spring Meeting Visualizing Predictive Modeling Results

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CANE 2007 Spring MeetingVisualizing Predictive Modeling Results Chuck Boucek (312) 879-3859

  2. Agenda • Data Validation • Hypothesis Building • Model Building • Model Testing • Monitoring • Visualization as a Diagnostic Tool

  3. Data Validation • Goals • Validate reasonableness of data • Understand key patterns in data • Understand changes in data and underlying business through time

  4. Data Validation • Histogram is a simple tool to for reasonability testing of modeling database

  5. Data Validation • Mosaic Plot shows the distribution of predictors in two dimensions

  6. Data Validation • Missing Data plot shows the relationship of missing data elements

  7. Data Validation • Time series plots identify consistency of data over time 1.0 0.9 0.8 0.7 0.6 Claims Match to Exposure 0.5 0.4 0.3 0.2 0.1 Company 1 Company 2 0.0 Company 3 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004

  8. Hypothesis Building • Goals • Perform initial analysis of potential predictor variables • Limit the list of predictor variables to be employed in subsequent phases of model building • Further reasonability testing of data

  9. Demographic Variable 1 1.500 0.750 15000 1.125 0.625 12500 0.500 0.750 10000 Frequency Loss Ratio Pure Premium 0.375 7500 0.375 0.250 5000 0.0 0 0 0.3 0.3 0 0.01 0.02 0.01 0.02 0.3 0.01 0.02 0.001 0.001 0.001 200 10000 40000 150 7500 32500 100 5000 25000 Severity Premium ($MM) Exposure ($MM) 50 2500 17500 0 0 10000 0 0.3 0.3 0 0.3 0 0.01 0.02 0.01 0.02 0.01 0.02 0.001 0.001 0.001

  10. Hypothesis Building • Quantile-Quantile plots help identify needed transformations of data

  11. Hypothesis Building • Correlation Web concisely summarizes a correlation matrix

  12. Model Building • Model building is an iterative process • Understanding patterns and relationships throughout this process is critical

  13. Model Building • Partial Plots are a key tool to visualize predictor variables throughout the model building process • What is a “Partial Plot?” Linear Predictor = k + b1X1 + b2X2 + b3X3 + b4X4 Predicted value = (ek) x (eb1X1) x (eb2X2) x (eb3X3) x (eb4X4) • Partial Plot demonstrates an individual predictor variables contribution to final prediction

  14. 1.25 8000 1.00 6000 0.75 4000 2000 0.50 0 0 5 10 15 20 25 30 0 5 10 15 20 25 30 Model Building • Partial Plot demonstrates an individual predictor variables contribution to final prediction

  15. 1.25 1.000 0.75 0.50 0.25 0 10 20 30 Model Building • Partial Plot with modified scatter plot of variable

  16. 0 5 10 15 20 25 1997 1998 1999 2000 0 5 10 15 20 25 Model Building • Time Consistency plot is a critical tool for numeric predictors

  17. 1.30 1.20 1.10 1.00 Credit Level 2 0.90 0.80 Credit Level 1 0.70 Model Building • Partial Plot for a factor variable

  18. Credit Variable 1 1.500 0.750 15000 1.125 0.625 12500 0.500 0.750 10000 Frequency Loss Ratio Pure Premium 0.375 7500 0.375 0.250 0.0 5000 No No No Yes Yes Yes 200 10000 40000 150 7500 32500 Exposure (Pred. Count) 100 5000 25000 Severity Premium ($MM) 50 2500 17500 0 0 10000 No No Yes Yes No Yes

  19. Model Testing • Likely the most critical visualizations in predictive modeling work • Management’s perception of a project’s success will likely depend on these visualizations • Holdout tests • Cross validation tests

  20. Model Testing • Lift Chart shows overall model performance Loss Ratio Lift Chart - Holdout Sample 1.0 1.0 Predicted Actual 0.9 0.7 0.8 0.8 0.7 0.7 0.6 0.6 0.5 0.5 0.4 0.4

  21. 1.0 0.8 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 1.0 Model Testing • ROC Curve shows overall model performance Holdout Sample ROC Curve Null, 0 Perfect, 1 prem, 0.51 pred.loss, 0.56

  22. Out of Sample Error Number of variables in final model Prediction Error In Sample Error 5 10 15 20 25 Number of Predictors Model Testing • Classical Cross Validation exhibit

  23. Monitoring Model Results • The work does not end when the lift chart looks good • Monitoring tools • Decile management • Exception analysis • Model vs. Actual Results

  24. Monitoring Model Results • Decile Management • Retention • Loss Ratio • Rate Action • Tier/Schedule Mod

  25. Monitoring Model Results • Average score over time

  26. Monitoring Model Results • Loss ratio of model exceptions

  27. Visualization as Diagnostic Tool • Frequency and severity models have been developed • Model is underperforming in predicting loss ratio • Likely cause of underperformance is severity model

  28. Visualization as Diagnostic Tool

  29. Visualization as Diagnostic Tool

  30. Visualization as Diagnostic Tool

  31. Visualization as Diagnostic Tool • Two different visualizations of the same model tell a very different story!

More Related