FROM CLASS TO INDIVIDUAL RATING
CAS Predictive Modeling Seminar, October 4th and 5th, 2006
Data Challenges and Considerations in Building a Modeling Dataset
Catherine Eska, The Hanover Insurance Group
Main Topics
• Data Sources
• Dealing with Product Changes
• How Detailed Are Your Losses?
• Target Variable Considerations
Internal Data Sources: Considerations
• How much history do you need?
• What is the most complete and accurate data source?
• Where will the model obtain data once you implement it?
• Plan to capture and store model results post-implementation.
Internal Data Sources: Policy / Premium
• Policy Processing System
  – Generally only has written premium
  – Most complete data source
  – Where real-time scoring would happen
  – Available at the time the policy is quoted / issued
• Statistical Record
  – Premium and loss data
  – Only updated at certain points in time (month-end)
  – Some codes converted from what is entered
  – Some data elements may be dropped
  – May include manual policy data
Internal Data Sources: Claims / Losses
• Statistical Record
  – Premium and loss share the same coding structure (major / minor line, major peril)
  – Assignment of losses to building / location / vehicle may be suspect
• Claim System
  – Codes most likely follow the Policy Processing System
  – History may not be readily available
  – More accurate assignment of losses
  – Additional data elements may be available
Internal Data Sources: Balance and Verify
• You will most likely get data from multiple sources
• Make sure you balance the data: Premium $, Loss $, Counts (Policy and Claim)
• Statistical Records are usually the most complete and accurate, so balance data from other sources back to them
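As a practical illustration of the balancing step, the sketch below compares premium, loss, and count totals from an assembled modeling dataset against the statistical record used as the control. It assumes pandas DataFrames with hypothetical column names (written_premium, incurred_loss, policy_id, claim_count); the actual field names, keys, and tolerance would depend on the company's systems.

```python
import pandas as pd

def reconcile(modeling_df: pd.DataFrame, stat_df: pd.DataFrame,
              keys=("policy_year",), tolerance=0.005) -> pd.DataFrame:
    """Compare premium, loss, and count totals in the modeling dataset
    against the statistical record (treated as the control totals)."""
    agg = {"written_premium": "sum", "incurred_loss": "sum",
           "policy_id": "nunique", "claim_count": "sum"}
    left = modeling_df.groupby(list(keys)).agg(agg).add_suffix("_model")
    right = stat_df.groupby(list(keys)).agg(agg).add_suffix("_stat")
    out = left.join(right, how="outer")
    # Flag any total that is off by more than the tolerance (0.5% here).
    for col in agg:
        out[f"{col}_pct_diff"] = out[f"{col}_model"] / out[f"{col}_stat"] - 1.0
        out[f"{col}_ok"] = out[f"{col}_pct_diff"].abs() <= tolerance
    return out
```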
Internal Data Sources: Leave No Stone Unturned
• Billing
  – May be a separate system
  – Some billing attributes may be captured in the Policy Processing System
• Agency Data
  – Name and address
  – Year agent appointed
  – Agent status
• Fraud / SIU
• Litigation
External Data Sources
• Vendors (Experian, ChoicePoint, D&B, …)
• NOAA – weather data
• Census – demographic data
• WCRI / HLDI
• State rating bureaus
• NCCI / ISO
• Considerations: cost / appropriateness / regulatory
Product Changes: Coverage Examples
• Rental Reimbursement
• Medical Payments limit
• Embedded limits:
  – Jewelry, Watches and Furs
  – Accounts Receivable
  – Employee Dishonesty
  – Building and Ordinance
• Broadening endorsements
Product Changes: Coverage Treatment
• Create an indicator for whether the limit of coverage was selected by the insured or embedded in the base policy
• Create a variable to represent which version of a broadening endorsement was on the policy
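A minimal sketch of the indicator idea, assuming a lookup of embedded (base-policy) limits by coverage and product version. The limit values, coverage keys, and column names below are illustrative placeholders, not actual product specifications.

```python
import pandas as pd

# Hypothetical embedded limits by coverage and product version; the real
# values would come from the product / underwriting manuals.
EMBEDDED_LIMITS = {
    ("jewelry_watches_furs", "v1"): 1000,
    ("jewelry_watches_furs", "v2"): 2500,
    ("employee_dishonesty",  "v1"): 0,
    ("employee_dishonesty",  "v2"): 10000,
}

def add_coverage_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Flag whether each record's limit was insured-selected or merely the
    embedded base-policy limit, and keep the endorsement version as a factor."""
    base = df.apply(
        lambda r: EMBEDDED_LIMITS.get((r["coverage"], r["product_version"]), 0),
        axis=1,
    )
    df = df.copy()
    df["limit_selected_ind"] = (df["limit"] > base).astype(int)
    df["broadening_version"] = df["product_version"]
    return df
```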
Product Changes: Other
• Codes change over time – keep the data dictionary and statistical manual at hand
• Definitions change – e.g., Age of Building versus Year Built
• Indivisible premium split into separate coverage premiums – summarize at the policy level (the lowest common denominator)
How Detailed Are Your Losses?
• It is generally desirable to build the model at the lowest level of detail that is accurate:
  – Personal Automobile – by vehicle
  – Business Owners – by building / location
  – Workers Compensation – by state / class
• The achievable level is heavily dependent on the quality of the individual company's data
• When the detail is missing, you can get creative in your variable definitions
Losses Only Accurate at the Policy Level
• You cannot tie losses accurately to location / building / vehicle
• Create pseudo variables at the policy level:
  – Highest building value
  – Lowest building value
  – Number of buildings
  – Deductible associated with the highest building value
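One way to build these pseudo variables, assuming a building-level pandas DataFrame with hypothetical columns policy_id, building_value, and deductible:

```python
import pandas as pd

def building_pseudo_vars(building_df: pd.DataFrame) -> pd.DataFrame:
    """Collapse building/location detail into policy-level pseudo variables
    when losses can only be trusted at the policy level."""
    g = building_df.sort_values("building_value").groupby("policy_id")
    out = g.agg(
        highest_building_value=("building_value", "max"),
        lowest_building_value=("building_value", "min"),
        building_count=("building_value", "size"),
    )
    # Deductible on the highest-valued building: after sorting by building
    # value, the last row in each policy group is the highest-valued one.
    out["deductible_at_highest_value"] = g["deductible"].last()
    return out.reset_index()
```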
Alternatives to Policy Level
• State / class code level
  – Accuracy of loss assignment to state is typically very good
  – Accuracy of loss assignment to class should be explored with Claims
• It is very common for commercial policies to have multiple states and classes on them
Coverage Treatment in Modeling
• Pure Premium models are generally built by coverage
• Loss Ratio models are restricted to the level at which premiums are calculated
• You can still create variables that are specific to a coverage within the modeling dataset: liability limit, general liability class code, industry (SIC or NAICS code), dogs (Y/N), toys (jet ski), etc.
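A sketch of attaching coverage-specific predictors to a policy-level loss ratio dataset. The column names (liability_limit, gl_class_code, naics_code, dog_ind, recreational_toy_ind) are placeholders for whatever the coverage records actually carry.

```python
import pandas as pd

def add_coverage_predictors(policy_df: pd.DataFrame,
                            liability_df: pd.DataFrame) -> pd.DataFrame:
    """Attach coverage-specific predictors (here, liability attributes)
    onto the policy-level record used for a loss ratio model."""
    liab = liability_df[["policy_id", "liability_limit", "gl_class_code",
                         "naics_code", "dog_ind", "recreational_toy_ind"]]
    merged = policy_df.merge(liab, on="policy_id", how="left")
    # Policies without the coverage get explicit "not present" values
    # rather than NaN, so the model can distinguish them.
    merged["liability_limit"] = merged["liability_limit"].fillna(0)
    for col in ("dog_ind", "recreational_toy_ind"):
        merged[col] = merged[col].fillna(0).astype(int)
    return merged
```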
Target Variable Considerations: Product Changes
• Adjust premiums to the new rate structure:
  – Re-rate historical policies using current rates, or
  – Apply on-level factors
• If you adjust the historical premiums, you must also adjust historical losses:
  – Trend losses in the history to reflect broader coverage levels as well as inflation
  – Cap losses in the past to reflect more restrictive coverage levels (e.g., caps on replacement cost)
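The adjustments above might look roughly like the sketch below. The on-level factors, trend rate, and replacement-cost cap shown are purely illustrative assumptions; in practice they come from the rate change history and trend / coverage analyses.

```python
import pandas as pd

# Illustrative assumptions only.
ON_LEVEL = {2003: 1.12, 2004: 1.07, 2005: 1.03, 2006: 1.00}  # hypothetical factors
LOSS_TREND = 0.04                 # assumed annual loss trend
CURRENT_YEAR = 2006
REPLACEMENT_COST_CAP = 250_000    # illustrative cap for older, narrower coverage

def adjust_target_inputs(df: pd.DataFrame) -> pd.DataFrame:
    """Bring historical premium to the current rate level and restate
    historical losses on a comparable coverage / cost basis."""
    df = df.copy()
    df["premium_on_level"] = df["earned_premium"] * df["policy_year"].map(ON_LEVEL)
    years_to_trend = CURRENT_YEAR - df["policy_year"]
    df["loss_trended"] = df["incurred_loss"] * (1 + LOSS_TREND) ** years_to_trend
    # Cap older losses so the history is not broader than what the
    # current product would have paid.
    df["loss_adjusted"] = df["loss_trended"].clip(upper=REPLACEMENT_COST_CAP)
    df["loss_ratio"] = df["loss_adjusted"] / df["premium_on_level"]
    return df
```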
Target Variable Considerations: Trend
• When do you adjust for trend? Loss Ratio approach:
  – Use premium at the current rate level
  – Select an appropriate trend for losses by coverage
• When don't you adjust for trend? Frequency / Severity approach:
  – Include policy year as an explanatory variable
  – Is the trend implied by the model reasonable?
Target Variable Considerations: Loss Development
Claim data should be of sufficient maturity:
• Can you limit the dataset to closed claims?
• Choose an age at which "pure" IBNR claims are no longer expected
• Future development on known claims should be minimal
• Balance responsiveness against stability
Target Variable Considerations: Loss Development
Build a GLM to model loss development:
• Consider using the duration of the claim as an explanatory variable
• Extrapolate the duration of open claims using a survival model
• Investigate other data elements available in the claims system: claimant age, gender, litigation status
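A possible shape for such a development model, using statsmodels' formula interface with a Gamma log-link GLM. The response and predictor names (ultimate_loss, reported_loss, duration_days, litigated, claimant_age) are hypothetical stand-ins for the claim fields described above, and the family/link choice is one reasonable option rather than the presenter's specification.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

def fit_development_glm(closed_claims: pd.DataFrame):
    """Fit a Gamma GLM (log link) relating ultimate loss on closed claims
    to reported loss, claim duration, and other claim attributes."""
    model = smf.glm(
        "ultimate_loss ~ reported_loss + duration_days + C(litigated) + claimant_age",
        data=closed_claims,
        family=sm.families.Gamma(link=sm.families.links.Log()),
    )
    return model.fit()

# For open claims, duration is censored at the evaluation date; a survival
# model (e.g. Kaplan-Meier or Cox regression from the lifelines package)
# can extrapolate an expected duration before scoring the GLM.
```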
Target Variable Considerations: Loss Development Options
Broad loss development assumptions:
• Not great at predicting actual ultimate loss at the policy level
• Try to build assumptions for homogeneous groups of claims: coverage, program, industry, state, …
• If claim emergence is not complete: Earned Premium * Expected Loss Ratio * % Unreported
• If claims are fully reported: Loss Development Factor * Reported Loss
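Reading the last two bullets as a Bornhuetter-Ferguson-style unreported provision when emergence is incomplete, and a straight development-factor application when all claims are reported, a group-level helper might look like the sketch below. This is one interpretation of the slide, applied per homogeneous group, not a prescribed method.

```python
def estimated_ultimate(reported_loss: float,
                       earned_premium: float,
                       expected_loss_ratio: float,
                       pct_unreported: float,
                       ldf: float,
                       emergence_complete: bool) -> float:
    """Apply a broad, group-level development assumption to one cell
    (e.g. a coverage / program / industry / state group of claims)."""
    if emergence_complete:
        # All claims reported: develop the known losses to ultimate.
        return ldf * reported_loss
    # Claims still emerging: add an expected provision for the unreported
    # piece (Bornhuetter-Ferguson style load on top of reported losses).
    return reported_loss + earned_premium * expected_loss_ratio * pct_unreported
```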
Target Variable Considerations: Loss Development Options
Implicit approach – no adjustment:
• Include an open / closed indicator in the severity model
• Include policy year in the model and observe the implied policy year trend
• Project the trend coefficient for newer years when setting up the final model for implementation
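A sketch of this implicit approach, again with statsmodels and hypothetical field names (incurred_loss, open_ind, policy_year, liability_limit): the open/closed flag and policy year enter the severity GLM directly, and the fitted policy-year coefficient gives the implied trend to sanity-check and project.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

def fit_severity_implicit(claims: pd.DataFrame):
    """Severity GLM with no explicit development adjustment: an open/closed
    indicator absorbs open-claim differences and policy year picks up trend."""
    fit = smf.glm(
        "incurred_loss ~ C(open_ind) + policy_year + liability_limit",
        data=claims,
        family=sm.families.Gamma(link=sm.families.links.Log()),
    ).fit()
    # Under the log link, exp(coefficient) - 1 is the implied annual trend;
    # compare it with external trend selections and use it to project the
    # newest, least mature years when building the implementation model.
    implied_log_trend = fit.params["policy_year"]
    return fit, implied_log_trend
```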
Contact Information
Catherine E. Eska, FCAS, MAAA
Vice President, Underwriting Analytics – Corporate Actuarial
The Hanover Insurance Group
440 Lincoln Street, S457
Worcester, MA 01653
ceska@Hanover.com
(508) 855-2493