Perspective

Explore the challenges and strategies in building robust custom models for predictive modeling in commercial contexts. Learn about data sources, modeling techniques, performance validation, and more.

Presentation Transcript


  1. Perspective • Valen • Commercial context • Custom models • Compute intensive, multivariate, non-linear • My Background • ML at Stanford and NASA • Predictive modeling at Fair Isaac • My take on the topic • Credit scores can be useful • Custom models can be much more useful • Building a robust custom model is hard

  2. Overview • Data: • Dataset building/validation • Data exploration/selection • Modeling: • Modeling strategy • Model optimization • Performance Validation

  3. Overview • Data: • Dataset building/validation • Data exploration/selection • Modeling: • Modeling strategy • Model optimization • Performance Validation

  4. Data Sources [Figure: Your Data, Geopolitical Data, Industry Data, Policy Data, Public Records, Claims Data, and Weather Data feeding a Predictive Model]

  5. Data Work • Data Normalization/ETL • Data Validation/Cleaning • Client data is noisy (e.g. negative premiums) • Data Understanding • Data time course & cheating data • Historical data vs. Production Data • Data Preprocessing • On-leveling, trending, etc… • Time-series analysis & Derived variables

  6. Data Validation: It pays to be careful
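A minimal sketch of the kind of check the "negative premiums" example on slide 5 suggests. The record layout (`policy_id`, `premium`) and the function name are hypothetical, chosen only for illustration; the deck does not describe the actual validation rules used.

```python
import math
from dataclasses import dataclass


@dataclass
class PolicyRecord:
    # Hypothetical fields; real client data will have many more.
    policy_id: str
    premium: float


def validate_premiums(records):
    """Split records into (clean, rejected); rejected items carry a reason."""
    clean, rejected = [], []
    for rec in records:
        if not math.isfinite(rec.premium):
            rejected.append((rec, "premium is missing or not finite"))
        elif rec.premium < 0:
            rejected.append((rec, "negative premium"))
        else:
            clean.append(rec)
    return clean, rejected


sample = [
    PolicyRecord("P-1", 1200.0),
    PolicyRecord("P-2", -350.0),        # the noisy case named on the slide
    PolicyRecord("P-3", float("nan")),
]
clean, rejected = validate_premiums(sample)
for rec, reason in rejected:
    print(f"{rec.policy_id}: {reason}")
```

Keeping rejected rows with a reason, rather than silently dropping them, makes it easier to go back to the client and ask whether a negative premium is an error or, for instance, a refund.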

  7. Renewal Policy ‘T+1’ Issue

  8. Overview • Data: • Dataset building/validation • Data exploration/selection • Modeling: • Modeling strategy • Model optimization • Performance Validation

  9. Multivariate & Nonlinear [Figure: plot with axes Risk Variable A, Risk Variable B, and Risk Variable C]

  10. Challenges • Curse of dimensionality • Local optima • Signal v. noise • Over-fitting & Under-fitting • Representative sample distributions • Non-linearity • Non-stationarity • Credibility • Sparsity • Explainability • Regulatory • Shock Losses • Interaction Effects • 1 risk factor with 10 possible values = 10 combinations • 10 risk factors with 10 possible values each = 10,000,000,000 (10^10) combinations • 50 risk factors with 10 possible values each = 10^50 combinations • Grid Computing
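The arithmetic behind the interaction-effect counts on slide 10 can be checked directly: with k risk factors that each take one of v values, there are v^k distinct value combinations. A small sketch (the function name is illustrative):

```python
def count_combinations(n_factors: int, n_values: int = 10) -> int:
    """Number of distinct value combinations for n_factors risk factors."""
    return n_values ** n_factors


for k in (1, 10, 50):
    print(f"{k:>2} risk factors: {count_combinations(k):,} combinations")
```

Python integers are arbitrary precision, so the 51-digit result for 50 factors is exact rather than a floating-point approximation.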

  11. Challenges of Predictive Modeling • Curse of dimensionality • Local optima • Signal v. noise • Over-fitting & Under-fitting • Representative sample distributions • Non-linearity • Non-stationarity • Credibility • Sparsity • Explainability • Regulatory • Shock Losses • Interaction Effects

  12-15. Challenges of Predictive Modeling (same list as slide 11) [Figure on each slide: Losses plotted against a risk variable]

  16. Challenges of Predictive Modeling • Curse of dimensionality • Local optima • Signal v. noise • Over-fitting & Under-fitting • Representative sample distributions • Non-linearity • Non-stationarity • Credibility • Sparsity • Explainability • Regulatory • Shock Losses • Interaction Effects [Figure: training error and testing error plotted against model complexity, with under-fit and over-fit regions marked]
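The training-versus-testing error picture on slide 16 can be reproduced on synthetic data. The sketch below is an illustration only, not the deck's modeling method: it assumes a made-up target `sin(3x)` plus noise and uses polynomial degree as a stand-in for model complexity.

```python
import numpy as np


def fit_errors(degrees, n_train=30, n_test=200, noise=0.3, seed=0):
    """Return (degree, train_mse, test_mse) for polynomial fits of each degree."""
    rng = np.random.default_rng(seed)

    def sample(n):
        # Synthetic data: an assumed non-linear signal plus Gaussian noise.
        x = rng.uniform(-1.0, 1.0, n)
        y = np.sin(3.0 * x) + rng.normal(0.0, noise, n)
        return x, y

    x_tr, y_tr = sample(n_train)
    x_te, y_te = sample(n_test)

    results = []
    for d in degrees:
        coeffs = np.polyfit(x_tr, y_tr, d)  # least-squares fit on training data only
        train_mse = float(np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2))
        test_mse = float(np.mean((np.polyval(coeffs, x_te) - y_te) ** 2))
        results.append((d, train_mse, test_mse))
    return results


results = fit_errors([0, 1, 3, 6, 9])
for degree, train_mse, test_mse in results:
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Training error can only fall as the degree grows, because each higher-degree polynomial contains the lower-degree ones as special cases. Testing error has no such guarantee, which is why it is the curve to watch when choosing complexity.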

  17. Challenges of Predictive Modeling • Curse of dimensionality • Local optima • Signal v. noise • Over-fitting & Under-fitting • Representative sample distributions • Non-linearity • Non-stationarity • Credibility • Sparsity • Explainability • Regulatory • Shock Losses • Interaction Effects • Time/process shifts • Retraining or validity checking
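Slide 17 names "validity checking" for time/process shifts without saying how it is done. One common option, offered here as an assumption rather than as the deck's method, is the Population Stability Index (PSI), which compares the distribution of a variable or score in a reference sample against a more recent sample.

```python
import numpy as np


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample (expected) and a recent sample (actual)."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Interior bin edges taken from quantiles of the reference sample.
    edges = np.quantile(expected, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])

    def bin_fractions(values):
        idx = np.searchsorted(edges, values, side="right")  # bin index 0..n_bins-1
        counts = np.bincount(idx, minlength=n_bins)
        # Clip so empty bins do not produce log(0) or division by zero.
        return np.clip(counts / counts.sum(), eps, None)

    e_frac = bin_fractions(expected)
    a_frac = bin_fractions(actual)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)
shifted = reference + 1.0  # simulated process shift

psi_same = population_stability_index(reference, reference)
psi_shift = population_stability_index(reference, shifted)
print(f"PSI vs. itself: {psi_same:.4f}, PSI vs. shifted sample: {psi_shift:.4f}")
```

A PSI near zero means the two samples fill the bins in about the same proportions; a growing PSI is a signal to investigate and possibly retrain.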

  18. Methodology: Blind Validation

  19. Overview • Data: • Dataset building/validation • Data exploration/selection • Modeling: • Modeling strategy • Model optimization • Performance Validation

  20. Blind Validation [Figure: previously unseen policy terms (Policy 1, Policy 2, Policy 3, ...) are scored by the Predictive Model and sorted by model prediction into Decile 1 (score 0-10), Decile 2 (score 10-20), ..., Decile 10 (score 90-100)]
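The binning step in the blind-validation figure can be sketched as follows. The score ranges (0-10 for decile 1 up to 90-100 for decile 10) come from the slide; the treatment of boundary scores (lower bound inclusive, 100 placed in decile 10) and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def score_to_decile(score: float) -> int:
    """Map a model score in [0, 100] to a decile number 1..10."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    # Assumed convention: 10 falls in decile 2, and 100 is kept in decile 10.
    return min(int(score // 10) + 1, 10)


def bin_policies(scored_policies):
    """Group (policy_id, score) pairs by decile."""
    deciles = defaultdict(list)
    for policy_id, score in scored_policies:
        deciles[score_to_decile(score)].append(policy_id)
    return dict(deciles)


# Hypothetical held-out policy terms with made-up scores.
held_out = [("Policy 1", 4.2), ("Policy 2", 8.9), ("Policy 3", 12.5), ("Policy 4", 97.0)]
binned = bin_policies(held_out)
for decile in sorted(binned):
    print(decile, binned[decile])
```

The point of the "blind" part is that the policy terms being binned were never seen during model training, so the scores are genuine predictions.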

  21. Workers' Compensation: Blind Validation [Figure: loss ratio with IBNR (y-axis, 0% to 140%) by decile (x-axis, 1 to 10, from worst risk to best risk), with regions labeled Inadequate, Adequate, and Discountable]
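A chart like the one on slide 21 needs one loss ratio per decile: total losses divided by total premium for the policy terms in that decile. The sketch below assumes each held-out term is a `(decile, premium, loss)` tuple and that the loss figure already includes IBNR; both the layout and the sample numbers are hypothetical.

```python
from collections import defaultdict


def loss_ratio_by_decile(terms):
    """Aggregate (decile, premium, loss) tuples into {decile: loss ratio}."""
    premium = defaultdict(float)
    loss = defaultdict(float)
    for decile, term_premium, term_loss in terms:
        premium[decile] += term_premium
        loss[decile] += term_loss
    # Skip deciles with no premium to avoid dividing by zero.
    return {d: loss[d] / premium[d] for d in sorted(premium) if premium[d] > 0}


# Made-up policy terms for illustration only.
terms = [
    (1, 100.0, 120.0),
    (1, 100.0, 140.0),
    (10, 200.0, 60.0),
]
ratios = loss_ratio_by_decile(terms)
for decile, ratio in ratios.items():
    print(f"decile {decile}: loss ratio {ratio:.0%}")
```

Summing losses and premium before dividing weights each policy term by its premium, so a few small policies with extreme ratios do not dominate a decile.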

  22. Commercial Auto Limos: Blind Validation with Confidence Intervals

  23. Commercial Auto Tankers: Risk Signature

  24. Commercial Credit on Commercial Auto

  25. THANK YOU Valen Technologies, Inc. 720.570.3333 www.valentech.com

  26. Business Intelligence Hierarchy [Figure, provided by the Tower Group: Data Insight Required (low to high) plotted against Business Value Derived (low to high); levels from low to high: Data Warehouse and Reporting, Data Mining and Analysis, Dashboards and Scorecards, Predictive Modeling and Automated Decisions]
