COMMERCIAL RISK
Watch Your TPA: A Practical Introduction to Actuarial Data Quality Management
Aleksey Popelyukhin, Ph.D.
Vice President, Information Systems & Technology
aleksey@commrisk.com
Epigraph
“Dear Cardmember, the 1997 Year End Summary of your account regretfully contained an error: we discovered that one or more of your transactions were ‘double counted’ – please accept our sincerest apologies for the error and for any inconvenience it may have caused you.”
– The largest US credit card issuer
Main Logic
• Data Issues are everywhere: the need for Quality Testing
• External Solution: TPA systems certification
• Internal Solution: Data Quality Shield
  • New Data Integrity tests
  • Time-Variant Business Rules
• Main source of T-VBR: Actuarial Methods’ Assumptions!
Data Problems surround us
• Data Elements (un)availability
  • Counts (claim details)
  • Large Losses’ historical evaluations
  • Premiums, Recoveries, etc.
• (Ever-changing) Industry Statistics
  • NCCI corrects posted WC LDFs every year
• TPAs’ monthly summaries (Loss Runs)
  • Yet to see a single TPA without a problem
Typical Errors
1. Absence of required fields
  • needed for policy conditions checks
    • Location (when the deductible differs by Location)
    • Report Date (when coverage is “claims-made”)
  • needed for actuarial analysis
    • Closed Date (for Berquist-Sherman adjustments)
  • needed for unique record identification
    • Coverage Type (if the same accident is covered by both WC and Employer’s Liability)
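For illustration, a completeness check of the kind implied above might look like the following SQL sketch; the ClaimDetail table and its columns (CoverageBasis, ReportDate) are hypothetical names, not taken from any particular TPA system.

    -- Claims-made records missing the Report Date needed for coverage checks
    SELECT ClaimID
    FROM ClaimDetail
    WHERE CoverageBasis = 'CLAIMS-MADE'
      AND ReportDate IS NULL;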
More Errors
2. Duplicates (“double counting”)
  • Examples
    • True duplicates (same Claim ID)
    • Duplicate files (same accident, multiple Claim IDs)
    • Missing key fields (Claim Suffix or Coverage Type)
  • Detection
    • SQL aggregation query with a HAVING clause (see the sketch below)
  • Correction
    • Flagging the duplicate file as VOIDED
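A minimal sketch of such an aggregation query, assuming a hypothetical ClaimDetail table in which an accident is identified by Insured, Accident Date and Claimant:

    -- Possible duplicate files: the same accident filed under more than one Claim ID
    SELECT InsuredID, AccidentDate, ClaimantName,
           COUNT(DISTINCT ClaimID) AS ClaimIDCount
    FROM ClaimDetail
    GROUP BY InsuredID, AccidentDate, ClaimantName
    HAVING COUNT(DISTINCT ClaimID) > 1;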
More Errors
3. Unidentified “Occurrences”
  • SQL query with a User-Defined Function
4. Recoveries (SIF, S&S)
  • Outer Join with pre-aggregated sub-queries
5. Redundant fields consistency (see the sketch below)
  • Examples
    • Closed claims with an empty Closed Date
    • Incurred less than Paid + Outstanding
  • Correction
    • SQL UPDATE query
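The redundant-field tests and the UPDATE correction could be sketched as follows; the ClaimDetail schema and column names are assumptions for illustration only.

    -- Detect closed claims with an empty Closed Date
    SELECT ClaimID
    FROM ClaimDetail
    WHERE ClaimStatus = 'CLOSED'
      AND ClosedDate IS NULL;

    -- Correct Incurred where it disagrees with Paid + Outstanding
    UPDATE ClaimDetail
    SET Incurred = Paid + Outstanding
    WHERE Incurred <> Paid + Outstanding;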
More Errors
6. Dummy Records
  • Fake claims for unallocated outsourced expenses (these “claims” distort XS amounts calculations)
  • Unidentified subtotals
7. Y2K and other date-related issues
  • 8 out of 43 TPAs still not Y2K compliant
  • NULL implementations
    • 01/01/01
    • 0 or another “magic” number or date
    • 1/0/1900
    • 11/01/1901
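A sketch of a test for such “magic” placeholder dates; the list of values and the ClaimDetail schema are assumptions, and the date-literal syntax varies by SQL dialect.

    -- Claims whose Closed Date is a placeholder masquerading as NULL
    SELECT ClaimID, ClosedDate
    FROM ClaimDetail
    WHERE ClosedDate IN ('1900-01-01', '1901-11-01', '2001-01-01');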
More Errors
8. Disappearing claims
  • SQL query with a NOT IN sub-query (see the sketch below)
9. Non-monotonic gross losses
  • Self-Join SQL query (see the sketch below)
10. Consistent field definitions
  • Statutory Page 14 Data or ISO statistical plan
  • Reasonable expectations
    • Recoveries are negative
    • Accident Date <= Report Date <= Closed Date (if any) <= Evaluation Date
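Sketches of both tests, assuming each monthly Loss Run is stored in a LossRun table keyed by Claim ID and Evaluation Date (table, column names and the sample evaluation dates are hypothetical):

    -- 8. Disappearing claims: present at the prior evaluation, missing from the current one
    SELECT ClaimID
    FROM LossRun
    WHERE EvaluationDate = '1997-11-30'
      AND ClaimID NOT IN
          (SELECT ClaimID FROM LossRun WHERE EvaluationDate = '1997-12-31');

    -- 9. Non-monotonic gross losses: paid amounts that decreased between evaluations
    SELECT cur_run.ClaimID,
           prev_run.PaidLoss AS PriorPaid,
           cur_run.PaidLoss  AS CurrentPaid
    FROM LossRun AS cur_run
    JOIN LossRun AS prev_run
      ON prev_run.ClaimID = cur_run.ClaimID
     AND prev_run.EvaluationDate < cur_run.EvaluationDate
    WHERE cur_run.PaidLoss < prev_run.PaidLoss;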
More and More and More Errors…
11. Online access & digital exchange
  • Incomplete downloads
  • Improper filtering
12. Human errors on data entry
  • Manual data entry can easily defeat any (even the most sophisticated) validation system
13. Error propagation
  • Errors propagate to the future
  • Corrections should propagate to the past
Data Quality defined
Quality data has to satisfy the following characteristics:
• Accuracy (violated in 6, 8, 9, 10, 12): the degree of agreement between a data value and a source assumed to be correct.
• Completeness (1, 2, 3, 7, 8, 11): the degree to which values are present in the attributes that require them.
• Consistency (5, 13): the requirement that data be free from variation or contradiction and satisfy a set of constraints.
• Timeliness (4): the extent to which a data item or multiple items are provided at the time required or specified (the degree to which specified values are up to date).
• Uniqueness (1, 2): the need for precise identification of a data record.
• Validity (1, 2, 3, 6, 7, 9): the property of maintained data to satisfy the acceptance requirements of classification criteria, and the ability of the data values to pass tests for acceptability, producing desired results.
Addressing Data Quality issues
• External Solution: Data Sources (TPAs) certification
  1. Data collection and entry (validation)
  2. Data storage (database structure)
  3. Data manipulations (triggers)
  4. Data exchange (report generators)
• Internal Solution: Data Quality Shield
Data Quality Shield – definition
The Data Quality Shield is
• an integrated set of standardized routines,
• optimized for every external data source, and
• comprised of pre-load data filters and translators,
• along with post-load data analysis tools, statistical diagnostics and quality alarms.
This type of integration addresses two specific distinctions of actuarial data:
• multiple external sources of data (TPAs) and
• the time-variant nature of the intended applications (actuarial methods).
Data Quality Shield – purpose
• Establish standards (discovering and enforcing business rules, including time-variant business rules)
• Validate input (checking that data values satisfy data definitions)
• Eliminate redundant data
• Resolve data conflicts (determining which piece of redundant, but non-matching, data is the correct one)
• Propagate corrections and adjustments to prior evaluations of the time-variant data
Data Quality Shield – possible interface
[Screenshot of a prototype interface] © 1998, Aleksey Popelyukhin. Screen design cannot be used in commercial packages without written permission.
Data Quality Shield – impossible without actuaries
• Actuaries are the last line of defense
  • even with FDA certification of food quality, one should not give up one’s immune system
• Actuaries are well positioned to discover Insurance Data Business Rules
• Actuaries are best positioned to discover Time-Variant Business Rules
Time-Variant Business Rules
Typical Data Errors found in TPAs’ Loss Runs can be sharply divided into two major categories:
• Violations of static business rules (those which need only a single Loss Run to be identified and fixed), and
• Violations of time-variant business rules (those which track changes over time and need multiple Loss Runs for identification).
Actuarial assumptions testing – a necessary part of the Actuarial Process
• Formulae don’t work if their assumptions are violated
  • 1 + x + x² + x³ + … = 1/(1 − x)
  • with x = 2 it produces nonsense: 1 + 2 + 4 + 8 + … = −1
  • because x = 2 is outside the domain |x| < 1
• The Chain-Ladder algorithm, for example, produces similar nonsense if a significant diagonal effect is present or if columns of factors correlate
Actuarial assumptions testing – main source of Time-Variant Business Rules
• Assumptions testing fails if
  • Actual loss emergence differs from the hypothetical one
  • Analyzed data contain significant Data Errors
    • non-monotonic number of Claims or amount of Losses
    • unexpected seasonality effects
    • outliers
Actuarial assumptions testing – outliers
• Every regression and hypothetical distribution may generate outliers
• Outliers are indicators of potential Data problems
• Ideally, every outlier has to be investigated
• Current technology makes it easy to
  • identify outliers (Excel, statistical packages)
  • perform detailed analysis (OLAP with drill-down)
• Outlier elimination is an iterative process
Conclusion
• Actuarial Data Quality Testing is an integral part of the overall Actuarial Process
• Actuaries are best positioned for the discovery of business rules, both static and time-variant
• Breakthroughs in Actuarial Assumptions Testing will make Data Quality Testing as sophisticated as Actuarial Analysis itself
• Technology breakthroughs will allow actuaries to perform Data Quality Testing themselves, without delegating it to other professionals
Epilogue
“In going over the 1998 year-end summary, you will notice that the category ‘detail’, ‘at a glance’ and ‘in detail’ sections refer to the year 1999 in error. This summary actually reflects all the activity in your account for the calendar year 1998. We apologize for any confusion this may cause.”
– The largest US credit card issuer
Recommended Reading
• Thomas C. Redman, Data Quality for the Information Age (Artech, 1995)
• Richard Marr, ed., Insurance Data Quality (IDMA, 1995)
• Thomas Mack, Measuring the Variability of Chain Ladder Reserve Estimates (CAS, 1993)
• Gary G. Venter, Checking Assumptions of Age-to-Age Factors (CLRS, 1994)
• Ben Zehnwirth, Probabilistic Development Factor Models with Applications to Loss Reserve Variability, Prediction Intervals, and Risk Based Capital
The Whole Picture
• On Hierarchy of Actuarial Objects: Data Processing from the Actuarial Point of View, 1998
• The Big Picture: Actuarial Process from the Data Management Point of View, 1996
• Let Me See: Visualization and Presentation of Actuarial Results, 1999
• Watch Your TPA: A Practical Introduction to Actuarial Data Quality Management, 1997