New Applications of Statistics and Data Mining Techniques to Classification and Fraud Detection in Insurance. Richard A. Derrig, Ph.D. OPAL Consulting LLC. Visiting Scholar, Wharton School, University of Pennsylvania. Bogotá, Colombia, November 3, 2005. Insurance Fraud Bureau of Massachusetts.
Insurance Fraud Bureau of Massachusetts: You can steal more with a ball point than by gun point.
Insurance Fraud Bureau of Massachusetts: Insurance fraud is known as a high reward, low risk crime.
GENERAL INSURANCE PROBLEMS • WHAT: Product Design • WHERE: Market Characteristics • WHO: Classification & Sale • HOW: Claims Paid • WHEN: Forecasting • WHY: Profit (Expected)
TRADITIONAL MATHEMATICAL TECHNIQUES • Arithmetic (Spreadsheets) • Probability & Statistics (Range of Outcomes) • Curve Fitting (Interpolation & Extrapolation) • Model Building (Equations for Processes) • Valuation (Risk, Investments, Catastrophes) • Numerical Methods (Analytic Solution Rare)
NON-TRADITIONAL MATHEMATICS • Fuzzy Sets & Fuzzy Logic • Elements: “in/out/partially both” • Logic: “true/false/maybe” • Decisions: “incompatible criteria” • Artificial Intelligence: “data mining” • Neural Networks: “learning algorithms”
CLASSIFICATION • Segmentation: A major exercise for insurance underwriting and claims • Underwriting: Find profitable risks from among the available market • Claims: Sort claims into easy pay and claims needing investigation
Fuzzy Logic Compared with Probability • Probability: • Measures randomness; • Measures whether or not event occurs; and • Randomness dissipates over time or with further knowledge. • Fuzziness: • Measures vagueness in language; • Measures extent to which event occurs; and • Vagueness does not dissipate with time or further knowledge.
Fuzzy Logic Clusters • The field of Pattern Recognition is a search for structure in data. • Old View: Given N objects, divide them into 2 < C < N clusters of homogeneous or similar types. Similarity can be based upon multiple features or criteria but each object is in one and only one cluster. • New View: Objects can be members of one or several clusters with varying strengths of membership; i.e. fuzzy clusters are Fuzzy Sets of clusters. • Example A: Classification of Individual Risks • Example B: Classification of Injury Claims
FUZZY SETS TOWN RATING CLASSIFICATION • When is one Town near another for Auto Insurance Rating? - Geographic Proximity (Traditional) - Overall Cost Index (Massachusetts) • Geographically close Towns do not have the same Expected Losses. • Clusters by Cost Produce Border Problems: Towns between Territories. Fuzzy Clusters acknowledge the Borders. • Are Overall Clusters correct for each Insurance Coverage? • Fuzzy Clustering on Five Auto Coverage Indices is better and demonstrates a weakness in Overall Crisp Clustering.
Figure: Fuzzy Clustering of Fraud Study Claims by Assessment Data, Membership Value Cut at 0.2 (Alpha = .2). Horizontal axis: claim feature vector ID (0 to 130). Vertical axis: final clusters, labeled Valid, Build-up (Inj. Sus. Level < 5), Build-up (Inj. Sus. Level = to or > 5), Opportunistic Fraud, and Planned Fraud. Suspicion centers (A, C, Is, Ij, T, LN) shown: (0,0,0,0,0,0), (0,1,0,1,3,0), (1,4,0,4,6,0), (1,7,0,7,7,0), (7,8,7,8,8,0). Markers distinguish full members from partial members.
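The fuzzy-cluster idea above (objects belonging to several clusters with varying strength, then cut at a membership level such as 0.2) can be sketched with the standard fuzzy c-means update rules. This is a minimal illustration, not the procedure used in the fraud study: the function names `fuzzy_c_means` and `alpha_cut`, the fuzzifier `m = 2`, and the toy one-feature data are all assumptions made for the example.

```python
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Return (centers, memberships); each membership row sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Start from random memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centers are membership-weighted means of the points.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            wsum = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / wsum
                            for d in range(dim)])
        # Memberships come from relative distances to every center.
        for i in range(n):
            dist = [max(1e-12, sum((points[i][d] - centers[k][d]) ** 2
                                   for d in range(dim)) ** 0.5)
                    for k in range(c)]
            for k in range(c):
                u[i][k] = 1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0))
                                    for j in range(c))
    return centers, u

def alpha_cut(memberships, alpha=0.2):
    """For each object, list the clusters with membership of at least alpha."""
    return [[k for k, v in enumerate(row) if v >= alpha] for row in memberships]

# Hypothetical data: two tight groups plus one "border" object in between.
points = [(0.0,), (0.2,), (0.1,), (5.0,), (5.2,), (4.9,), (2.6,)]
centers, memberships = fuzzy_c_means(points, c=2)
for point, clusters in zip(points, alpha_cut(memberships, alpha=0.2)):
    print(point, "belongs to clusters", clusters)
```

In this toy run the border object should appear in both clusters after the cut, which is the behavior the slides describe for towns between territories and for partially suspicious claims.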
FRAUD • The Major Questions • What Is Fraud? • How Much Fraud is There? • What Companies Do about Fraud? • How Can We Identify a Fraudulent Claim?
FRAUD DEFINITION Principles • Clear and willful act • Proscribed by law • Obtaining money or value • Under false pretenses Abuse: Fails one or more Principles
Fraud Definition COSTS • Fraud (Criminal, Hard): Small; Mass. Auto & WC < 1% • Abuse (Not Criminal, Soft Fraud): BIG Bucks, Depends on Line • “Abuse” is (legally) a gray area, unethical behavior • “Abuse” Containment is a Matter for Company/Industry/Regulator
10% Fraud
FRAUD TYPES • Insurer Fraud • Fraudulent Company • Fraudulent Management • Agent Fraud • No Policy • False Premium • Company Fraud • Embezzlement • Inside/Outside Arrangements • Claim Fraud • Claimant/Insured • Providers/Rings
CLAIM FRAUD INDICATORS VALIDATION PROCEDURES • Canadian Coalition Against Insurance Fraud (1997): 305 Fraud Indicators (45 vehicle theft) • “No one indicator by itself is necessarily suspicious.” • Problem: How to validate the systematic use of Fraud Indicators?
AIB FRAUD INDICATORS 1989 Examples • Accident Characteristics (19) • No report by police officer at scene • No witnesses to accident • Claimant Characteristics (11) • Retained an attorney very quickly • Had a history of previous claims • Insured Driver Characteristics (8) • Had a history of previous claims • Gave address as hotel or P.O. Box
AIB FRAUD INDICATORS 1989 Examples • Injury Characteristics (12) • Injury consisted of strain/sprain only • No objective evidence of injury • Treatment Characteristics (9) • Large number of visits to a chiropractor • DC provided 3 or more modalities on most visits • Lost Wages Characteristics (6) • Claimant worked for self or family member • Employer wage differs from claimed wage loss
REAL PROBLEM • Classify all claims • Identify valid classes • Pay the claim • No hassle • Visa Example • Identify (possible) fraud • Investigation needed • Identify “gray” classes • Minimize with “learning” algorithms
FRAUDULENT CLAIM IDENTIFICATION • Experience and Judgment • Artificial Intelligence Systems • Regression Models • Fuzzy Clusters • Neural Networks • Expert Systems • Genetic Algorithms • All of the Above
POTENTIAL VALUE OF AN ARTIFICIAL INTELLIGENCE SCORING SYSTEM • Screening to Detect Fraud Early • Auditing of Closed Claims to Measure Fraud • Sorting to Select Efficiently among Special Investigative Unit Referrals • Providing Evidence to Support a Denial • Protecting against Bad-Faith
Using Kohonen’s Self-Organizing Feature Map to Uncover Automobile Bodily Injury Claims Fraud PATRICK L. BROCKETT Gus S. Wortham Chaired Prof. of Risk Management University of Texas at Austin XIAOHUA XIA University of Texas at Austin RICHARD A. DERRIG Senior Vice President Automobile Insurers Bureau of Massachusetts Vice President of Research Insurance Fraud Bureau of Massachusetts JOURNAL OF RISK AND INSURANCE, 65:2, 245-274, 1998.
NEURAL NETWORKS • Self-Organizing Feature Maps • T. Kohonen 1982-1990 (Cybernetics) • Reference vectors map to OUTPUT format in topologically faithful way. Example: Map onto 40x40 2-dimensional square. • Iterative Process Adjusts All Reference Vectors in a “Neighborhood” of the Nearest One. Neighborhood Size Shrinks over Iterations
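The iterative process described above (find the nearest reference vector, adjust every reference vector in its neighborhood, shrink the neighborhood over iterations) can be sketched as follows. This is a rough illustration under stated assumptions, not the configuration of the 1998 paper: the 5x5 grid (instead of 40x40), the linear decay schedules, the Gaussian neighborhood weighting, and the names `train_som` and `best_unit` are all choices made for the example.

```python
import math
import random

def best_unit(grid, x):
    """Grid coordinates of the reference vector nearest to pattern x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(grid):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(x, w))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(data, rows=5, cols=5, iters=500, lr0=0.5, seed=0):
    """Train a small self-organizing feature map; returns the grid of reference vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One reference vector per unit of the 2-dimensional output grid.
    grid = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]
    radius0 = max(rows, cols) / 2.0
    for t in range(iters):
        frac = t / iters
        lr = lr0 * (1.0 - frac)                    # learning rate decays
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks
        x = rng.choice(data)
        br, bc = best_unit(grid, x)
        for r in range(rows):
            for c in range(cols):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                if d2 <= radius ** 2:
                    # Units closer to the winner on the grid move further.
                    h = math.exp(-d2 / (2.0 * radius ** 2))
                    w = grid[r][c]
                    for d in range(dim):
                        w[d] += lr * h * (x[d] - w[d])
    return grid

# Hypothetical claim patterns: three indicator scores per claim.
low = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.0, 0.1]]
high = [[1.0, 0.9, 1.0], [0.9, 1.0, 0.9], [1.0, 1.0, 0.9]]
som = train_som(low + high)
print("low-suspicion pattern maps to unit", best_unit(som, [0.0, 0.0, 0.0]))
print("high-suspicion pattern maps to unit", best_unit(som, [1.0, 1.0, 1.0]))
```

Because the map is unsupervised, the units only group similar patterns; deciding which regions of the map correspond to suspicious claims still needs labeled or expert-assessed cases.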
Patterns MAPPING: PATTERNS-TO-UNITS
DATA MODELING EXAMPLE: CLUSTERING • Data on 16,000 Medicaid providers analyzed by unsupervised neural net • Neural network clustered Medicaid providers based on 100+ features • Investigators validated a small set of known fraudulent providers • Visualization tool displays clustering, showing known fraud and abuse • Subset of 100 providers with similar patterns investigated: Hit rate > 70% Cube size proportional to annual Medicaid revenues © 1999 Intelligent Technologies Corporation
Modeling Hidden Exposures in Claim Severity via the EM Algorithm Grzegorz A. Rempala Department of Mathematics University of Louisville and Richard A. Derrig OPAL Consulting LLC & Wharton School, University of Pennsylvania
Hidden Exposures - Overview • Modeling hidden risk exposures as additional dimension(s) of the loss severity distribution • Considering mixtures of probability distributions as the model for losses affected by hidden exposures, with some parameters of the mixtures considered missing (i.e., unobservable in practice) • Approach is feasible due to advancements in computer-driven methodologies dealing with partially hidden or incomplete data models • Empirical data imputation has become more sophisticated, and the availability of ever faster computing power has made it increasingly possible to solve these problems via iterative algorithms
Figure 1: Overall distribution of the 348 BI medical bill amounts from Appendix B compared with that submitted by provider A. Left panel: frequency histograms (provider A’s histogram in filled bars). Right panel: density estimators (provider A’s density in dashed line) Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm, Grzegorz A. Rempala, Richard A. Derrig, pg. 9, 11/18/02
Figure 2: EM Fit Left panel: mixture of normal distributions fitted via the EM algorithm to BI data Right panel: Three normal components of the mixture. Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm, Grzegorz A. Rempala, Richard A. Derrig, pg. 13, 11/18/02
Figure 3: Latent risk in BI data modeled by the EM Algorithm with m = 3. Left panel: set of responsibilities δ_{j3}. Right panel: the third component of the normal mixture compared with the distribution of provider A’s claims (“A” claims density estimator is a solid curve). Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm, Grzegorz A. Rempala, Richard A. Derrig, pg. 14, 11/18/02
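The mixture-of-normals fit and the responsibilities shown in the figures above can be illustrated with a textbook EM loop for a one-dimensional normal mixture. This is a generic sketch, not the authors' implementation: the starting values, the function name `em_normal_mixture`, and the synthetic amounts are assumptions, and the real BI data from the paper are not reproduced here. In the returned `resp` table, `resp[j][2]` plays the role of the responsibility of the third component for observation j.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def em_normal_mixture(xs, m=3, iters=200):
    """Fit an m-component normal mixture; return (weights, means, sigmas, resp)."""
    n = len(xs)
    srt = sorted(xs)
    # Crude start: means at evenly spaced quantiles, equal weights,
    # and a common spread equal to the overall spread divided by m.
    means = [srt[int((k + 0.5) * n / m)] for k in range(m)]
    overall = sum(xs) / n
    spread = math.sqrt(sum((x - overall) ** 2 for x in xs) / n) or 1.0
    sigmas = [spread / m] * m
    weights = [1.0 / m] * m
    resp = []
    for _ in range(iters):
        # E-step: responsibility of component k for observation j.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], sigmas[k]) for k in range(m)]
            total = sum(p) or 1e-300
            resp.append([v / total for v in p])
        # M-step: re-estimate weights, means and spreads from responsibilities.
        for k in range(m):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
    return weights, means, sigmas, resp

# Synthetic amounts (hypothetical units) drawn around three levels.
amounts = [0.8, 1.0, 1.2, 0.9, 1.1,
           4.8, 5.0, 5.2, 4.9, 5.1,
           8.8, 9.0, 9.2, 8.9, 9.1]
weights, means, sigmas, resp = em_normal_mixture(amounts, m=3)
print("weights:", [round(w, 3) for w in weights])
print("means:", [round(mu, 3) for mu in means])
print("responsibility of component 3 for last amount:", round(resp[-1][2], 3))
```

The component membership is never observed; it is the "hidden exposure" that the E-step imputes and the M-step then uses as if it were known.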
Fraud Classification Using Principal Component Analysis of RIDITs PATRICK L. BROCKETT Gus S. Wortham Chaired Prof. of Risk Management University of Texas at Austin RICHARD A. DERRIG Senior Vice President Automobile Insurers Bureau of Massachusetts Vice President of Research Insurance Fraud Bureau of Massachusetts LINDA L. GOLDEN Marlene & Morton Meyerson Centennial Professor in Business University of Texas Austin, Texas ARNOLD LEVINE Professor Emeritus Department of Mathematics Tulane University New Orleans LA MARK ALPERT Professor of Marketing University of Texas Austin, Texas JOURNAL OF RISK AND INSURANCE, 69:3, SEPT. 2002
THE PROBLEM • Data: Features have no natural metric-scale • Model: Stochastic process has no parametric form • Classification: Inverse image of one dimensional scoring function and decision rule • Feature Value: Identify which features are “important”
PRIDIT METHOD OVERVIEW • 1. DATA: N Claims, T Features, K_t Responses for Feature t, Monotone in “Fraud” • 2. RIDIT: Score each possible response: proportion below minus proportion above, score centered at zero. • 3. RESPONSE WEIGHTS: Principal Component of Claims x Features with RIDIT in Cells. • 4. SCORE: Sum weights x claim ridit score. • 5. PARTITION: above and below zero.
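The five steps above can be sketched in a few lines. This is a simplified reading of the overview, not the published procedure: the function names, the use of power iteration to find the first principal component, the coding of responses as integers where a higher code is assumed to be more suspicious, and the toy responses are all assumptions. The sign of a principal component is arbitrary, so which side of zero counts as "suspicious" depends on that coding convention.

```python
def ridit_scores(column, n_levels):
    """Score each ordinal response: proportion below minus proportion above."""
    n = len(column)
    p = [column.count(level) / n for level in range(n_levels)]
    return [sum(p[:level]) - sum(p[level + 1:]) for level in range(n_levels)]

def first_principal_component(matrix, iters=500):
    """Leading eigenvector of matrix^T * matrix, found by power iteration."""
    t = len(matrix[0])
    gram = [[sum(row[a] * row[b] for row in matrix) for b in range(t)]
            for a in range(t)]
    w = [1.0] * t
    for _ in range(iters):
        w = [sum(gram[a][b] * w[b] for b in range(t)) for a in range(t)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
    return w

def pridit(responses, levels):
    """responses[i][t] is claim i's response on feature t, coded 0..levels[t]-1."""
    n_features = len(levels)
    # Step 2: one RIDIT lookup table per feature.
    tables = [ridit_scores([row[t] for row in responses], levels[t])
              for t in range(n_features)]
    # Claims x features matrix with RIDIT scores in the cells.
    f = [[tables[t][row[t]] for t in range(n_features)] for row in responses]
    # Step 3: feature weights from the first principal component.
    weights = first_principal_component(f)
    # Step 4: weighted sum of each claim's RIDIT scores.
    scores = [sum(w * v for w, v in zip(weights, row)) for row in f]
    return weights, scores

# Hypothetical data: 8 claims, 3 binary indicators (1 = indicator present).
responses = [[1, 1, 1], [1, 1, 1], [1, 1, 0], [0, 0, 0],
             [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 1, 0]]
weights, scores = pridit(responses, levels=[2, 2, 2])
# Step 5: partition the claims above and below zero.
for row, score in zip(responses, scores):
    print(row, round(score, 3), "above zero" if score > 0 else "below zero")
```

No fraud label is used anywhere in the sketch, which matches the setting of the preceding slide: features without a natural metric scale and no parametric model for the outcome.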
REFERENCES
Brockett, Patrick L., Derrig, Richard A., Golden, Linda L., Levine, Arnold and Alpert, Mark, (2002), Fraud Classification Using Principal Component Analysis of RIDITs, Journal of Risk and Insurance, 69:3, 341-373.
Brockett, Patrick L., Xia, Xiaohua and Derrig, Richard A., (1998), Using Kohonen’s Self-Organizing Feature Map to Uncover Automobile Bodily Injury Claims Fraud, Journal of Risk and Insurance, 65:2, 245-274.
Derrig, R.A. and H.I. Weisberg, (2004), Determinants of Total Compensation for Auto Bodily Injury Liability Under No-Fault: Investigation, Negotiation and the Suspicion of Fraud, Insurance and Risk Management, Volume 71, (4), pp. 633-662.
Derrig, R.A., H.I. Weisberg and Xiu Chen, (1994), Behavioral Factors and Lotteries Under No-Fault with a Monetary Threshold: A Study of Massachusetts Automobile Claims, Journal of Risk and Insurance, 61:2, 245-275.
Rempala, G. and R.A. Derrig, (2005), Modeling Hidden Exposures in Claim Severity via the EM Algorithm, NAAJ, v9, n2, 108-128.
Viaene, Stijn, Derrig, Richard A., Dedene, Guido, (2004), A Case Study of Applying Boosting Naïve Bayes to Claim Fraud Diagnosis, IEEE Transactions on Knowledge and Data Engineering, v16, n5, May.
Viaene, Stijn, Derrig, Richard A., Baesens, Bart, and Dedene, Guido, (2002), A Comparison of State-of-the-Art Classification Techniques for Expert Automobile Insurance Fraud Detection, Journal of Risk and Insurance, 69:3, 373-423.
Fuzzy References: Insurance Related Material
Brockett, P.L., Cooper, W.W., Golden, L.L. and Pitaktong, V. (1994), A neural network method for obtaining an early warning of insurer solvency, Journal of Risk and Insurance, 61, pp. 402-424.
Cummins, J.D. and Derrig, R.A. (1997), Fuzzy financial pricing of property-liability insurance, North American Actuarial Journal, 1:4, pp. 21-44.
Cummins, J.D. and Derrig, R.A. (1993), Fuzzy trends in property-liability insurance claim costs, Journal of Risk and Insurance, September, 60, pp. 429-465.
Derrig, R.A. and Ostaszewski, K.M. (1995), Fuzzy techniques of pattern recognition in risk and claim classification, Journal of Risk and Insurance, 62:3, pp. 447-482.
Derrig, R.A. and Ostaszewski, K.M. (1997), Managing the tax liability of a property-liability insurance company, Journal of Risk and Insurance, 64:4, pp. 695-711.
DeWit, G.W. (1982), Underwriting and uncertainty, Insurance: Mathematics and Economics, 1, pp. 277-285.
Jablonowski, M. (1993), An expert system for retention selection, CPCU Journal, pp. 214-221.
Lemaire, Jean (1990), Fuzzy insurance, Astin Bulletin, 20(1), pp. 33-55.
Ostaszewski, K.M. (1993), An Investigation into Possible Applications of Fuzzy Sets Methods in Actuarial Science, Society of Actuaries, Schaumburg, Illinois.
Young, V. (1994), Application of fuzzy sets to group health underwriting, Transactions of the Society of Actuaries, 45, pp. 551-590.
Young, V. (1996), Rate changing: A fuzzy logic approach, Journal of Risk and Insurance, 63:3, pp. 461-484.