Explore the challenges and strategies in building robust custom models for predictive modeling in commercial contexts. Learn about data sources, modeling techniques, performance validation, and more.
Perspective • Valen • Commercial context • Custom models • Compute intensive, multivariate, non-linear • My Background • ML at Stanford and NASA • Predictive modeling at Fair Isaac • My take on the topic • Credit scores can be useful • Custom models can be much more useful • Building a robust custom model is hard
Overview • Data: • Dataset building/validation • Data exploration/selection • Modeling: • Modeling strategy • Model optimization • Performance Validation
Data Sources • Your Data • Geopolitical Data • Industry Data • Policy Data • Public Records • Claims Data • Weather Data (all feeding the Predictive Model)
Data Work • Data Normalization/ETL • Data Validation/Cleaning • Client data is noisy (e.g. negative premiums) • Data Understanding • Data time course & cheating data • Historical data vs. Production Data • Data Preprocessing • On-leveling, trending, etc… • Time-series analysis & Derived variables
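The validation step above (noisy client data such as negative premiums) can be sketched as a simple sanity check. This is a minimal illustration, not the presenter's pipeline: the `PolicyRecord` fields and the `find_invalid_records` helper are hypothetical names, and the only rules checked are a missing premium and a negative premium.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PolicyRecord:
    # Hypothetical record layout, for illustration only.
    policy_id: str
    premium: Optional[float]


def find_invalid_records(records: List[PolicyRecord]) -> List[Tuple[str, str]]:
    """Return (policy_id, reason) pairs for records that fail basic checks."""
    problems = []
    for rec in records:
        if rec.premium is None:
            problems.append((rec.policy_id, "missing premium"))
        elif rec.premium < 0:
            problems.append((rec.policy_id, "negative premium"))
    return problems


sample = [
    PolicyRecord("P1", 1200.0),
    PolicyRecord("P2", -50.0),
    PolicyRecord("P3", None),
]
issues = find_invalid_records(sample)
for policy_id, reason in issues:
    print(policy_id, reason)
```

Flagging records rather than silently dropping them keeps the decision (correct, exclude, or ask the client) with the analyst.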
Multivariate & Nonlinear (figure: three-axis plot of Risk Variable A, Risk Variable B, and Risk Variable C)
Challenges • Curse of dimensionality • Local optima • Signal vs. noise • Over-fitting & Under-fitting • Representative sample distributions • Non-linearity • Non-stationarity • Credibility • Sparsity • Explainability • Regulatory • Shock Losses • Interaction Effects • 1 risk factor with 10 possible values = 10 permutations • 10 risk factors with 10 possible values = 10,000,000,000 permutations • 50 risk factors with 10 possible values = 100,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 (10^50) permutations • Grid Computing
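The counts on this slide follow from simple arithmetic: with k risk factors that each take v values, there are v^k distinct value combinations. A short sketch that reproduces the three figures (the function name `combination_count` is chosen here for illustration):

```python
def combination_count(num_factors: int, values_per_factor: int = 10) -> int:
    """Number of distinct value combinations across all risk factors."""
    return values_per_factor ** num_factors


for k in (1, 10, 50):
    print(f"{k} risk factors: {combination_count(k):,} combinations")
```

Python integers have arbitrary precision, so the 10^50 case is computed exactly.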
Challenges of Predictive Modeling (figure: Losses plotted against Risk Variable A)
Challenges of Predictive Modeling: Over-fitting & Under-fitting (figure: Training and Testing Error plotted against Model Complexity, with Under-fit and Over-fit regions marked)
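The relationship between model complexity and training versus testing error can be reproduced on synthetic data. The sketch below is an assumption-laden toy, not the modeling technique used in the talk: it uses k-nearest-neighbour regression (small k means a more complex model), a made-up linear signal, and Gaussian noise.

```python
import random


def true_signal(x):
    # Hypothetical underlying relationship, for illustration only.
    return 2.0 * x


def make_data(n, rng):
    """Draw n noisy (x, y) observations of the hypothetical signal."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, true_signal(x) + rng.gauss(0.0, 0.5)) for x in xs]


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of k-NN predictions on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
train = make_data(200, rng)
test = make_data(200, rng)

results = {}
for k in (1, 5, 25, 200):  # k=1 is most complex, k=200 predicts the global mean
    results[k] = (mse(train, train, k), mse(train, test, k))
    print(f"k={k:3d}  training MSE={results[k][0]:.3f}  testing MSE={results[k][1]:.3f}")
```

With k=1 each training point is its own nearest neighbour, so training error is zero while testing error is not: the over-fit end of the curve. With k=200 the model ignores x entirely, which is the under-fit end.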
Challenges of Predictive Modeling: Non-stationarity • Time/process shifts • Retraining or validity checking
Methodology • Blind Validation
Blind Validation • Previously unseen policy terms are scored by the Predictive Model • Each policy is placed in a decile by its model prediction: Decile 1 (score 0-10), Decile 2 (score 10-20), …, Decile 10 (score 90-100)
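The bucketing step on this slide can be sketched as follows. It assumes scores run from 0 to 100 and that a score on a band boundary (for example exactly 10) falls into the higher decile; the slide does not say how boundaries are handled, and the function names and sample scores are hypothetical.

```python
from collections import defaultdict


def score_to_decile(score: float) -> int:
    """Map a 0-100 model score to a decile from 1 to 10."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    # Boundary scores go to the higher band (assumption); 100 stays in decile 10.
    return min(int(score // 10) + 1, 10)


def group_by_decile(scores: dict) -> dict:
    """Group policy ids by the decile of their score."""
    groups = defaultdict(list)
    for policy_id, score in scores.items():
        groups[score_to_decile(score)].append(policy_id)
    return dict(groups)


# Hypothetical scores for previously unseen policy terms.
scored = {"Policy 1": 4.2, "Policy 2": 8.9, "Policy 3": 97.0, "Policy 4": 91.5, "Policy 5": 15.0}
deciles = group_by_decile(scored)
for decile in sorted(deciles):
    print(decile, deciles[decile])
```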
Workers' Compensation Blind Validation (figure: Loss Ratio w/ IBNR, 0% to 140%, by decile 1 to 10, running from worst risk to best risk; bands labeled Inadequate, Adequate, and Discountable)
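A chart like this one can be built by aggregating loss ratio within each decile. The sketch below assumes each policy term carries a decile, an earned premium, and an incurred loss that already includes IBNR; the tuple layout and the figures are invented for illustration, and the thresholds for the Inadequate, Adequate, and Discountable bands are not given in the slides, so none are applied.

```python
from collections import defaultdict


def loss_ratio_by_decile(terms):
    """Aggregate loss ratio (total loss / total premium) per decile.

    `terms` is an iterable of (decile, premium, loss_with_ibnr) tuples.
    """
    premium = defaultdict(float)
    loss = defaultdict(float)
    for decile, prem, incurred in terms:
        premium[decile] += prem
        loss[decile] += incurred
    # Skip deciles with no premium to avoid dividing by zero.
    return {d: loss[d] / premium[d] for d in sorted(premium) if premium[d] > 0}


# Made-up policy terms: (decile, premium, loss including IBNR).
terms = [
    (1, 100.0, 150.0),
    (1, 100.0, 90.0),
    (10, 100.0, 20.0),
    (10, 300.0, 60.0),
]
ratios = loss_ratio_by_decile(terms)
for decile, ratio in ratios.items():
    print(f"Decile {decile}: {ratio:.0%}")
```

Summing losses and premiums before dividing weights each policy by its premium, which is usually preferable to averaging per-policy ratios.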
Commercial Auto Limos • Blind Validation with Confidence Intervals
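The slides do not say how the confidence intervals were computed. One common option, shown here purely as an assumption, is a percentile bootstrap over the policy terms within a decile. The function name, parameters, and data are hypothetical.

```python
import random


def bootstrap_loss_ratio_ci(terms, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap interval for an aggregate loss ratio.

    `terms` is a list of (premium, loss) pairs for one decile.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_resamples):
        # Resample policy terms with replacement, same size as the original.
        sample = [rng.choice(terms) for _ in terms]
        total_premium = sum(p for p, _ in sample)
        if total_premium > 0:
            ratios.append(sum(l for _, l in sample) / total_premium)
    if not ratios:
        raise ValueError("no resample had positive premium")
    ratios.sort()
    tail = (1.0 - level) / 2.0
    lower = ratios[int(tail * (len(ratios) - 1))]
    upper = ratios[int((1.0 - tail) * (len(ratios) - 1))]
    return lower, upper


# Made-up (premium, loss) pairs for a single decile.
decile_terms = [(100.0, 0.0), (120.0, 30.0), (80.0, 200.0), (150.0, 10.0),
                (90.0, 0.0), (110.0, 95.0), (130.0, 40.0), (100.0, 0.0)]
low, high = bootstrap_loss_ratio_ci(decile_terms)
print(f"95% interval for loss ratio: {low:.2f} to {high:.2f}")
```

Small books of business, such as a niche commercial auto segment, tend to produce wide intervals, which is why showing them alongside the decile chart matters.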
THANK YOU Valen Technologies, Inc. 720.570.3333 www.valentech.com
Business Intelligence Hierarchy (figure provided by the Tower Group: Data Insight Required, low to high, against Business Value Derived, low to high) • Predictive Modeling and Automated Decisions (highest) • Dashboards and Scorecards • Data Mining and Analysis • Data Warehouse and Reporting (lowest)