Exposure Data Quality and Catastrophe Modeling
Rick Anderson
February 28, 2002
Data Quality Issues
• Is the insurance-to-value being accurately reflected?
• Does my data capture my actual exposure on a regional and peril basis?
• Do I understand the default assumptions in my data?
• Do I know that the information my brokers and agents are providing me is correct?
• Am I capturing my aggregate information correctly?
Statement of the Problem
• What is the impact of poor data quality on:
  • Exposure data values
  • Uncertainty in modeled losses
  • Business decisions (external and internal)
• How do I quantify / score data quality:
  • On a location basis
  • On a policy basis
  • On an aggregate portfolio basis
• How do I optimize data quality given my current business constraints?
• What improvements should I be making?
Tackling the Problem
• Close working relationship with business partners:
  • Agents
  • Reinsurers
  • Modeler
• Development of a structured data quality assessment:
  • Ability to identify specific data quality issues and their impact on portfolio risk assessment at all levels
• Development of a consistent, independent data quality measure:
  • Data Quality Index (DQI)
Data Quality in the Context of Data Flow – Primary Insurance Company Perspective
[Diagram: data flows from data acquisition (source data), through the production stream, into the exposure database, and on to cat model analysis, which feeds pricing, reinsurance, capital allocation, etc. Data quality is assessed along the same flow through a data acquisition accuracy analysis, a process/operational accuracy analysis, and a data resolution analysis, asking: what does it mean, and what matters?]
Components of Data Quality
• Accuracy component
• Resolution component
Examining the Components of Exposure Data Quality: Data Accuracy
• How accurately is my data being captured and processed?
• Examination of processes through interviews and exposure data queries:
  • Data acquisition
  • Data processing
  • Operations / systems
• Market dependent
• Logic tree assessment framework
Data Accuracy Components
• Data acquisition (source of data origination)
  • Conditional on type of source and line of business
  • Source reputation / bias
  • Source data vintage / validity / consistency / interpretability
• Data processing
  • Conditional on line of business
  • Bias / vintage / validity / consistency / interpretability
• Operations / systems
  • Data accessibility / data integration / systems process / operations value to cost
Accuracy Component of the Data Quality Assessment Framework
[Diagram: the three components of the data flow — 1. Data Acquisition, 2. Data Processing, and 3. Operations — are each assessed with a peril- and LOB-specific on-site questionnaire (Questionnaires 1–3). The questionnaire scores are combined with relative weights W1, W2, and W3 to form the accuracy component of data quality; warning flags from queries of the exposure database support the assessment.]
Example Logic Tree with Data Accuracy Criteria
[Diagram: example logic tree for the data acquisition accuracy score, conditioned on the data source (direct, independent agent, wholesale broker, retail broker, risk retention group) and submission format (integrated data submission, catastrophe model EDM, digital spreadsheet or Word document, paper submission). Questionnaire responses (Questions 1, 2, …; 11, 12, …; 29, 30, …) roll up into two weighted branches: Reputation/Bias (weight 0.3, split 0.5/0.5 between reputation and bias) and Data (weight 0.7, split into vintage 0.3, validity 0.4, consistency 0.2, and interpretability 0.1).]
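A minimal sketch of how such a logic tree could be scored, using the example weights from the figure; the criterion values and the 0–1 question scoring are illustrative assumptions, not the presented implementation:

```python
# Sketch: rolling questionnaire scores up a weighted logic tree.
# Weights come from the example figure; criterion scores are assumed
# to be normalized to [0, 1].

def weighted_score(scores_and_weights):
    """Weighted average of (score, weight) pairs; weights sum to 1."""
    return sum(score * weight for score, weight in scores_and_weights)

# Criterion scores derived from questionnaire answers (illustrative values).
reputation = 0.9        # e.g. from Questions 1, 2, ...
bias = 0.7
vintage = 0.8           # e.g. from Questions 11, 12, ...
validity = 0.6
consistency = 0.75
interpretability = 0.5  # e.g. from Questions 29, 30, ...

reputation_bias = weighted_score([(reputation, 0.5), (bias, 0.5)])
data = weighted_score([(vintage, 0.3), (validity, 0.4),
                       (consistency, 0.2), (interpretability, 0.1)])

acquisition_accuracy = weighted_score([(reputation_bias, 0.3), (data, 0.7)])
print(f"Data acquisition accuracy score: {acquisition_accuracy:.3f}")
```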
Development of Data Accuracy Criteria Relative Importance Weights
• Assessed as relative impact on modeled losses and key data quality issues
• Based on:
  • Extensive interviews with Cat managers, underwriters, and systems personnel
  • Results of relative parameter impact analyses on AAL (average annual loss) for the data validity criteria
  • Availability of other information from which to draw assumptions
• Line of business, peril, and region dependent
Data Accuracy Criteria – Development of Questionnaire
• Questionnaire is administered through an interview process
• Questions are multiple choice:
  • Yes / No
  • Always / Most of the Time / Occasionally / Never
• Number and content of questions designed to adequately assess how the criteria are addressed at the company
• Normalized relative importance weighting applied to the questions within each criterion (see the sketch below)
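A minimal sketch of one way the question-level scoring could work; the answer-to-score mapping and the raw importance weights are illustrative assumptions:

```python
# Sketch: scoring multiple-choice answers within a criterion, with
# question weights normalized to sum to 1. The answer-to-score mapping
# below is an assumption, not the published methodology.

ANSWER_SCORES = {
    "Yes": 1.0, "No": 0.0,
    "Always": 1.0, "Most of the Time": 0.67,
    "Occasionally": 0.33, "Never": 0.0,
}

def criterion_score(questions):
    """questions: list of (answer, raw importance weight) tuples."""
    total_weight = sum(w for _, w in questions)
    return sum(ANSWER_SCORES[answer] * (w / total_weight)
               for answer, w in questions)

# Example: three questions under the "vintage" criterion.
vintage = criterion_score([("Always", 3.0),
                           ("Most of the Time", 2.0),
                           ("No", 1.0)])
print(f"Vintage criterion score: {vintage:.3f}")
```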
Warning Flags – Summaries from DB Queries
• Used as supporting information in answering the questionnaire
• Warning flags:
  • Data consistency
    • Address entry
    • Values
    • Construction and occupancy class/schema
  • Data vintage
  • Data bias
    • Secondary characteristics
    • Primary characteristics
Warning Flags – Summaries from DB Queries: Sample Results

Data Vintage – Use of Policy Status Flag

Status      # of Policies   % of Total Policies
BOOK        1               16.7%
No Status   5               83.3%

Data Consistency – Value Entry Check

Address Match   Total Locations   Total Value   Average Value
Street Level    5                 $1,270,000    $254,020
Zip Level       1                 $147,500      $147,500
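A minimal sketch of how such summaries could be produced from an exposure table; the table and its column names (policy_status, address_match_level, value) are hypothetical, and any real exposure database schema would differ:

```python
# Sketch: producing warning-flag summaries from an exposure table
# with pandas. Data values are illustrative.
import pandas as pd

exposures = pd.DataFrame({
    "policy_status": ["BOOK", None, None, None, None, None],
    "address_match_level": ["Street Level"] * 5 + ["Zip Level"],
    "value": [250_000, 260_000, 240_000, 270_000, 250_000, 147_500],
})

# Data vintage: how often is the policy status flag actually used?
status = exposures["policy_status"].fillna("No Status")
vintage = status.value_counts().to_frame("# of Policies")
vintage["% of Total"] = (vintage["# of Policies"] / len(exposures) * 100).round(1)
print(vintage)

# Data consistency: value entry check by address match level.
consistency = exposures.groupby("address_match_level")["value"].agg(
    total_locations="count", total_value="sum", average_value="mean")
print(consistency)
```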
1. Data Acquisition Accuracy
• Conditional on type of data provider, data format, submission process, and line of business
• Data acquisition accuracy components:
  • Validity
  • Vintage
  • Data provider bias
  • Data provider reputation
  • Consistency
  • Interpretability
Acquisition Criteria: Relative Importance (listed from high to low)
• Data vintage
• Location validity checks
• Default value treatment
• Data acquisition bias
• Data validity checks
• Use of data alteration flags
• Data aggregation
• Location entry consistency
• Reputation of data provider
• Secondary construction characteristics treatment
2. Data Processing
• Conditional on database format, platform, and line of business
• Incorporates results from queries of the exposure database
• Data processing accuracy components:
  • Bias
  • Validity
  • Interpretability
  • Vintage
  • Consistency
3. Systems / Operations Accuracy
• Processing / operations data quality components:
  • Data accessibility and storage
  • Data integration and linking
  • Technology systems process/flow
  • Operational value-to-cost
Data Accuracy – Summary
• Assessment of how closely processes arrive at the true and accepted value
• Structured and consistent approach
• Ability to assess the contribution of individual components to the overall data accuracy score
• Periodic assessment is valuable for internal process review
• Integral component of overall data quality
Examining the Components of Exposure Data Quality: Data Resolution
• What data am I capturing, and at what level?
• Direct query of exposure data parameters (see the sketch below):
  • Geocoding
  • Construction
  • Occupancy
  • Year built
  • Building height
  • Construction modifiers
• Peril, region, and market dependent
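A minimal sketch of such a direct query, grading each location on which parameters are populated; the field names and the three-level grading rules are illustrative assumptions:

```python
# Sketch: grading the resolution of an exposure record by checking
# which parameters are actually populated. Field names and the
# high/medium/low rules are assumptions for illustration only.

def resolution_grade(location: dict) -> str:
    geocode = location.get("geocode_level")  # "coordinate", "zip", "county"
    has_detail = all(location.get(f) is not None
                     for f in ("construction_class", "occupancy_class",
                               "year_built", "building_height"))
    if geocode == "coordinate" and has_detail:
        return "high"
    if geocode in ("coordinate", "zip"):
        return "medium"
    return "low"

loc = {"geocode_level": "zip", "construction_class": "wood frame",
       "occupancy_class": "single family", "year_built": None,
       "building_height": 1}
print(resolution_grade(loc))  # -> "medium"
```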
Data Resolution Analysis Tree – California Earthquake Residential
[Diagram: resolution analysis tree. Hazard side: location resolution (unknown, zip code, county, coordinate). Vulnerability side: construction scheme (unknown, RMS, ATC, ISO Fire), construction class (unknown, MFW frame, SFW frame, URM), occupancy class (unknown, SF house, MF housing), year built (unknown / known), number of stories (unknown / known), and secondary characteristics (inventory, soft story, frame bolted down, chimney, cripple walls, cladding).]
Development of Category Weights
• Weights for individual categories are determined through numerical simulation (analysis) of the impact of a given category on losses for the geography, peril, and LOB under consideration
• Final weights are normalized across the applicable categories (a normalization sketch follows below):

Category       High   Med.   Low
Geocoding      w1a    w1b    w1c
Cons. Scheme   w2a    w2b    w2c
Year Built     w5a    w5b    w5c
2nd. Char.     w6a    w6b    w6c

• Extensive testing, validation, and benchmarking performed
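A minimal sketch of the normalization step; the raw impact values are placeholders standing in for the simulation-derived w1a, w1b, … in the table above:

```python
# Sketch: normalizing simulation-derived category weights so they sum
# to 1 within each hazard level. Raw values are illustrative.

raw_weights = {  # category -> {hazard level -> raw impact from simulation}
    "Geocoding":    {"High": 4.0, "Med.": 2.5, "Low": 1.0},
    "Cons. Scheme": {"High": 2.0, "Med.": 1.5, "Low": 0.8},
    "Year Built":   {"High": 1.5, "Med.": 1.0, "Low": 0.5},
    "2nd. Char.":   {"High": 1.0, "Med.": 0.6, "Low": 0.2},
}

for level in ("High", "Med.", "Low"):
    total = sum(w[level] for w in raw_weights.values())
    normalized = {cat: w[level] / total for cat, w in raw_weights.items()}
    print(level, {cat: round(v, 2) for cat, v in normalized.items()})
```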
Resolution Geocoding Scores by Hazard Level – California Earthquake Residential
[Chart not reproduced.]
Data Resolution Category Score Summary – California Earthquake Residential, High Hazard Region
[Chart not reproduced.]
Data Resolution Category Weights – California Earthquake Residential
[Chart: category weights, as score (%) from 0% to 100%, for geocoding, construction scheme, construction class, occupancy class, year built, and secondary characteristics, shown by seismic region (low, medium, high).]
Data Resolution Category Weights – Florida Hurricane Commercial
[Chart: category weights, as score (%) from 0% to 100%, for geocoding, construction scheme, construction class, occupancy class, number of stories, and secondary characteristics, shown by hazard region (low, medium, high, very high).]
Data Resolution – Aggregation Methodology: Progression of Data Resolution Scoring
[Diagram: location-level scores (L1, L2, …, Ln) roll up to account-level scores (A1, A2, A3 for Accounts 1–3), which roll up to a portfolio-level score (P1) for the commercial portfolio.]
Data Resolution – Development of Relative "Importance" Factors
• Relative importance is an approximation of the AAL
• Ground-up AAL is approximated at the ZIP code level based on insurance industry exposure
• Gross AAL is approximated by the average ratio of gross to ground-up AAL per attachment point
• (A sketch of importance-weighted score aggregation follows below)
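A minimal sketch of how importance-weighted aggregation from locations to accounts to the portfolio could look; the scores and AAL-based importance values are illustrative stand-ins:

```python
# Sketch: rolling resolution scores up from locations to accounts to
# the portfolio, weighting by an AAL-based importance factor. In
# practice the importance would come from the approximated AAL.

def aggregate(items):
    """items: list of (score, importance) pairs. Returns the
    importance-weighted average score and the total importance."""
    total_importance = sum(imp for _, imp in items)
    score = sum(s * imp for s, imp in items) / total_importance
    return score, total_importance

# Location-level (score, approximated ground-up AAL) per account.
account1 = aggregate([(0.9, 120.0), (0.6, 80.0)])
account2 = aggregate([(0.4, 300.0), (0.7, 50.0)])
account3 = aggregate([(0.8, 40.0)])

portfolio_score, _ = aggregate([account1, account2, account3])
print(f"Portfolio resolution score: {portfolio_score:.3f}")
```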
Sample Resolution Scores
[Chart: sample resolution scores, from 0 to 100 (score %), for Companies A, B, and C across geocoding, construction scheme, construction class, occupancy class, number of stories, and secondary characteristics.]
Improving Data Resolution – Leveraging Account Data Resolution Scores
• Identify accounts with a score less than the target score
• Determine each account's potential for improvement as: (Score Difference) × (Account Importance)
• Identify the accounts with the biggest improvement potential and decide on a strategy for data improvement
Improving Data Resolution – Targeting Accounts
• Score Difference = Target Score − Account Score
• Potential Improvement = (Score Difference) × (Importance)
• (A ranking sketch follows below)
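A minimal sketch of ranking accounts by improvement potential using the formulas above; the target, account scores, and importance values are illustrative:

```python
# Sketch: ranking accounts by (Score Difference) * (Importance).
# All numbers are illustrative.

TARGET_SCORE = 0.80

accounts = {  # name -> (resolution score, importance)
    "Account 1": (0.75, 120.0),
    "Account 2": (0.50, 300.0),
    "Account 3": (0.78, 40.0),
}

potential = {
    name: max(TARGET_SCORE - score, 0.0) * importance
    for name, (score, importance) in accounts.items()
}

for name, p in sorted(potential.items(), key=lambda kv: -kv[1]):
    print(f"{name}: improvement potential {p:.1f}")
# Account 2 dominates: a low score on a high-importance account.
```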
Options for Combining Accuracy and Resolution Components
• Keep separate
• Additive
• Multiplicative
• Minimum
• (The sketch below illustrates the last three options)
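A minimal sketch of the three combining rules applied to a pair of component scores (the fourth option is simply keeping them separate); the 50/50 additive weighting is an assumption:

```python
# Sketch: combining the accuracy and resolution components into a
# single index. Component scores and the additive weights are
# illustrative.

accuracy, resolution = 0.7, 0.9

additive = 0.5 * accuracy + 0.5 * resolution  # weighted sum
multiplicative = accuracy * resolution        # penalizes weakness in either
minimum = min(accuracy, resolution)           # worst component dominates

print(f"additive={additive:.2f}, multiplicative={multiplicative:.2f}, "
      f"minimum={minimum:.2f}")
```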