Data mining in Health Insurance
Introduction • Rob Konijn, rob.konijn@achmea.nl • VU University Amsterdam • Leiden Institute of Advanced Computer Science (LIACS) • Achmea Health Insurance • Currently working here • Delivering leads for other departments to follow up • Fraud, abuse • Research topic keywords: data mining/ unsupervised learning / fraud detection
Outline • Intro Application • Health Insurance • Fraud detection • Part 1: Subgroup discovery • Part 2: Anomaly detection (slides partly by Z. Slavik, VU)
Intro Application • Health Insurance Data • Health Insurance in NL • Obligatory • Only private insurance companies • About 100 euro/month (everyone) + 170 euro (income-dependent) • Premium increase of 5-12% each year • Achmea: about 6 million customers
Funding of Health Insurance Costs in the Netherlands • [Figure: money flows between the insured (verzekerde), the equalization fund (vereveningsfonds) and the health insurer (zorgverzekeraar)] • State contribution for insured under 18 (rijksbijdrage verzekerden 18-): 2 bln • Income-dependent contribution via employers (inkomensafh. bijdrage werkgevers): 17 bln • Equalization contribution (vereveningsbijdrage) from fund to insurer: 18 bln • Nominal premium 18+ (nominale premie): calculation premium (rekenpremie, ~€947 per insured): 12 bln; surcharge (opslag, ~€150 per insured): 2 bln • Health care expenditure (zorguitgaven): 30 bln
Risk equalization model (vereveningsmodel) • By population characteristics • Age • Gender • Income, social class • Type of work • Calculation afterwards • High costs compensation (> 15,000 euro)
Age group        Men      Women
0 - 4 yr         1,400    1,210
5 - 9 yr         1,026      936
10 - 14 yr         907      918
15 - 17 yr         964    1,062
18 - 24 yr         892    1,214
25 - 29 yr         870    1,768
30 - 34 yr         905    1,876
35 - 39 yr         980    1,476
40 - 44 yr       1,044    1,232
45 - 49 yr       1,183    1,366
50 - 54 yr       1,354    1,532
55 - 59 yr       1,639    1,713
60 - 64 yr       1,885    1,905
65 - 69 yr       2,394    2,201
70 - 74 yr       2,826    2,560
75 - 79 yr       3,244    2,886
80 - 84 yr       3,349    3,018
85 - 89 yr       3,424    3,034
90 yr and older  3,464    3,014
Introduction Application: The Data • Transactional data • Records of an event • Visit to a medical practitioner • Charged directly by the medical practitioner • Patient is not involved • Risk of fraud
Transactional Data • Transactions: Facts • Achmea: About 200 mln transactions per year • Info of customers and practitioners: dimensions
Different levels of hierarchy • Records represent events • However, for fraud detection, for example, we are interested in customers or medical practitioners • See examples next pages • Groups of records: Subgroup Discovery • Individual patients/practitioners: outlier detection
Different types of fraud hierarchy • On a patient level, or on a hospital level:
Handling different hierarchy • Creating profiles from transactional data • Aggregating costs over a time period • Each record: patient • Each attribute i = 1 to n: cost spent on treatment i • Feature construction, for example • The ratio of long/short consults (G.P.) • The ratio of 3-way and 2-way fillings (Dentist) • Usually used for one-way analysis
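The aggregation step on this slide can be sketched as follows. This is a minimal illustration, not the pipeline used at Achmea: the record layout `(patient_id, treatment_code, cost)`, the sample values, and the helper names `build_profiles` and `cost_ratio` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical transaction records: (patient_id, treatment_code, cost in euro).
transactions = [
    ("p1", "V11", 45.0),
    ("p1", "V11", 45.0),
    ("p1", "M55", 30.0),
    ("p2", "V12", 60.0),
    ("p2", "M55", 30.0),
]

def build_profiles(records):
    """Aggregate transaction costs into one profile (row) per patient.

    Each profile maps a treatment code to the total cost over the period
    covered by the records.
    """
    profiles = defaultdict(lambda: defaultdict(float))
    for patient_id, treatment, cost in records:
        profiles[patient_id][treatment] += cost
    return {patient: dict(costs) for patient, costs in profiles.items()}

def cost_ratio(profile, numerator, denominator):
    """Constructed feature: ratio of two cost attributes (None if undefined)."""
    denom = profile.get(denominator, 0.0)
    if denom == 0:
        return None
    return profile.get(numerator, 0.0) / denom

profiles = build_profiles(transactions)
print(profiles["p1"])
print(cost_ratio(profiles["p1"], "V11", "M55"))
```

The same pattern works with practitioners as the key instead of patients, which gives one profile per practitioner.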
Different types of fraud detection • Supervised • A labeled fraud set • A labeled non-fraud set • Credit cards, debit cards • Unsupervised • No labels • Health Insurance, Cargo, telecom, tax etc.
Unsupervised learning in Health Insurance Data • Anomaly Detection (outlier detection) • Finding individual deviating points • Subgroup Discovery • Finding (descriptions of) deviating groups • Focus on differences and uncommon behavior • In contrast to other unsupervised learning methods • Clustering • Frequent Pattern mining
Subgroup Discovery • Goal: Find differences in claim behavior of medical practitioners • To detect inefficient claim behavior • Actions: • A visit from the account manager • To include in contract negotiations • In the extreme case: fraud • Investigation by the fraud detection department • By describing deviations of a practitioner from its peers • Subgroups
Patient-level, Subgroup Discovery • Subgroup (orange): group of patients • Target (red) • Indicates whether a patient visited a practitioner (1), or not (0)
Subgroup Discovery: Quality Measures • Target dentist: 1,672 patients • Compare with peer group, 100,000 patients in total • Subgroup V11 > 42 euro: 10,347 patients • V11: one-sided filling • Cross table
The cross table • Cross table in data • Cross table expected: • Assuming independence
Calculating WRAcc and Lift • Size subgroup = P(S) = 0.10347, size target dentist = P(T) = 0.01672 • Expected overlap under independence: P(S)P(T) × 100,000 ≈ 173 patients; observed overlap: 871 patients • Weighted Relative ACCuracy (WRAcc) = P(ST) – P(S)P(T) = (871 – 173)/100,000 = 698/100,000 • Lift = P(ST)/(P(S)P(T)) = 871/173 = 5.03
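The two measures on this slide can be recomputed from the four counts given in the deck (100,000 patients in total, 10,347 in the subgroup, 1,672 with the target dentist, 871 in both). The function name `wracc_and_lift` is just a label chosen for this sketch.

```python
def wracc_and_lift(n_total, n_subgroup, n_target, n_both):
    """Return (WRAcc, lift) for a subgroup S and a binary target T."""
    p_s = n_subgroup / n_total   # P(S)
    p_t = n_target / n_total     # P(T)
    p_st = n_both / n_total      # P(ST)
    wracc = p_st - p_s * p_t     # difference between observed and expected
    lift = p_st / (p_s * p_t)    # ratio of observed to expected
    return wracc, lift

wracc, lift = wracc_and_lift(100_000, 10_347, 1_672, 871)
print(f"WRAcc = {wracc:.5f}, lift = {lift:.2f}")
```

WRAcc rewards subgroups that are both large and deviating, while lift can be high for very small subgroups, which is why the two measures can rank subgroups differently.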
Making SD more useful: adding prior knowledge • Adding prior knowledge • Background variables patient (age, gender, etc.) • Specialism practitioner • For dentistry: choice of insurance • Adding already known differences • Already detected by domain experts themselves • Already detected during a previous data mining run
The idea: create an expected cross table using prior knowledge
Quality Measures • Ratio (Lift) • Difference (WRAcc) • Squared sum (Chi-square statistic)
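A minimal sketch of the three measures, given an observed and an expected 2x2 cross table. The cell values below are derived by subtraction from the dentist counts given earlier (871, 10,347, 1,672 and 100,000). The expected table here assumes plain independence; a table built from prior knowledge could be passed in instead. The function names are hypothetical.

```python
def expected_under_independence(observed):
    """Expected 2x2 table from the row and column totals of the observed one."""
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    return [[r * c / n for c in col_totals] for r in row_totals]

def quality_measures(observed, expected):
    """Return (ratio, difference, chi_square) for two 2x2 tables.

    Tables are laid out as [[S and T, S and not T],
                            [not S and T, not S and not T]].
    """
    n = sum(sum(row) for row in observed)
    ratio = observed[0][0] / expected[0][0]              # lift
    difference = (observed[0][0] - expected[0][0]) / n   # WRAcc
    chi_square = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    return ratio, difference, chi_square

# Subgroup "V11 > 42 euro" versus the target dentist.
observed = [[871, 9_476], [801, 88_852]]
expected = expected_under_independence(observed)
ratio, difference, chi_square = quality_measures(observed, expected)
print(ratio, difference, chi_square)
```

Unlike the ratio and the difference, the chi-square statistic sums over all four cells, so it also reacts to subgroups in which the target is under-represented.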
Example, iterative approach • Idea: add subgroup to prior knowledge iteratively • Target = single pharmacy • Patients that visited the hospital in last 3 years removed from data • Compare with peer group (400,000 patients), 2,929 patients of target pharmacy • Top subgroup: “B03XA01 (Erythropoietin) > 0 euro” • [Cross table: subgroup “B03XA01 > 0” vs. rest, for the target pharmacy vs. rest]
Next iteration • Add “B03XA01 (EPO) >0 euro” to prior knowledge • Next best subgroup: “N05AX08 (Risperdal)>= 500 euro”
Figure describing subgroup: N05AX08 > 500 • Left: target pharmacy, right: other pharmacies
Addition: adding costs to quality measure • M55: dental cleaning • V11: 1-way filling • V21: polishing • Cost of treatments in subgroup: about 370 euro (average, rounded) • 791 more patients than expected • Total quality: 791 × average cost = 292,469 euro
Iterative approach, top 3 subgroups • V12: 2-sided filling • V21: polishing • V60: indirect pulp capping • V21 and V60 are not allowed on the same day • Claim back (from all dentists): 1.3 million euro
Other target types: double binary target • Target 1: year: 2009 or 2008 • Target 2: target practitioner • Pattern: • M59: extensive (expensive) dental cleaning • C12: second consult in one year • Crosstable:
Other target types: Multiclass target • Subgroup (orange): group of patients • Target (red), now is a multi-value column, one value per dentist
Anomaly Detection • The example above contains a contextual anomaly...
Outline Anomaly Detection • Anomalies • Definition • Types • Technique categories • Examples • Lecture based on • Chandola et al. (2009). Anomaly Detection: A Survey • Paper in BB 38
Definition • “Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior” • Anomalies, a.k.a. • Outliers • Discordant observations • Exceptions • Aberrations • Surprises • Peculiarities • Contaminants
Anomaly types Point anomalies • A data point is anomalous with respect to the rest of the data
Not covered today • Other types of anomalies: • Collective anomalies • Contextual anomalies • Other detection approaches: • Supervised learning • Semi-supervised learning • Assume training data is from normal class • Use to detect anomalies in the future
We focus on outlier scores • Scores • You get a ranked list of anomalies • “We investigate the top 10” • “An anomaly has a score of at least 134” • Leads followed by fraud investigators • Labels ANOMALY
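Both ways of using scores quoted on this slide, a top-k list and a score threshold, can be sketched as below. The score values and practitioner names are made up for illustration, and the threshold of 134 simply mirrors the quote on the slide.

```python
# Hypothetical outlier scores per practitioner (higher = more anomalous).
scores = {
    "pharmacy_a": 12.0,
    "pharmacy_b": 140.5,
    "pharmacy_c": 134.0,
    "pharmacy_d": 3.2,
}

def top_k(score_by_id, k):
    """Ranked list of the k highest-scoring items ("we investigate the top k")."""
    ranked = sorted(score_by_id.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

def flag_by_threshold(score_by_id, threshold):
    """Turn scores into labels: everything at or above the threshold is flagged."""
    return {key for key, score in score_by_id.items() if score >= threshold}

leads = top_k(scores, 2)
flagged = flag_by_threshold(scores, 134)
print(leads)
print(sorted(flagged))
```

A top-k list fits a fixed investigation capacity, while a threshold lets the number of leads vary with the data.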
Detection method categorisation • Model based • Depth based • Distance based • Information theory related (not covered) • Spectral theory related (not covered)
Model based • Build a (statistical) model of the data • Normal data instances occur in high probability regions of a stochastic model, while anomalies occur in low probability regions • Or: data instances that have a high distance to the model are outliers • Or: data instances that have a high influence on the model are outliers
Example: one-way outlier detection • Pharmacy records • Records represent patients • One attribute at a time: • This example: attribute describing the costs spent on fertility medication (gonadotropin) in a year • We could use such one-way detection for each attribute in the data
Example, model = non-parametric distribution • Left: kernel density estimate • Right: boxplot
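Both views of a single attribute can be turned into an outlier test. The sketch below assumes the common 1.5 × IQR whisker rule for the boxplot and a Gaussian kernel with a fixed, hand-picked bandwidth for the density estimate; the slide does not state which settings were used, and the cost values are invented.

```python
import math
import statistics

# Hypothetical yearly costs (euro) per patient for one attribute.
costs = [100, 120, 130, 150, 160, 180, 200, 5000]

def boxplot_outliers(values, whisker=1.5):
    """Return the values outside the boxplot whiskers (1.5 * IQR rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]

def kde_density(x, values, bandwidth):
    """Gaussian kernel density estimate at x with a fixed bandwidth."""
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm

def kde_scores(values, bandwidth):
    """Outlier score = negative log density; higher means more anomalous."""
    return [-math.log(kde_density(v, values, bandwidth)) for v in values]

outliers = boxplot_outliers(costs)
scores = kde_scores(costs, bandwidth=50.0)
print(outliers)
print(max(zip(scores, costs)))
```

The boxplot rule yields labels, whereas the density estimate yields a score per patient that can be ranked.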
Other models possible • Probabilistic • Bayesian networks • Regression models • Regression trees/ random forests • Neural networks • Outlier score = prediction error (residual)
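As one instance of "outlier score = prediction error", the sketch below fits a simple least-squares line and scores each record by its absolute residual. Linear regression is chosen only because it is the smallest of the models listed; the attribute names and values are hypothetical.

```python
# Hypothetical per-patient data: number of consults and total cost (euro).
consults = [1, 2, 3, 4, 5, 6]
total_cost = [2.0, 4.0, 6.0, 8.0, 30.0, 12.0]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residual_scores(xs, ys):
    """Outlier score per record: absolute prediction error of the fitted line."""
    intercept, slope = fit_line(xs, ys)
    return [abs(y - (intercept + slope * x)) for x, y in zip(xs, ys)]

scores = residual_scores(consults, total_cost)
most_anomalous = scores.index(max(scores))
print(most_anomalous, round(scores[most_anomalous], 2))
```

Note that the outlier itself pulls the fitted line towards it, which ties in with the "high influence on the model" view of outliers; a robust fit would reduce that effect.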
Depth based methods • Applied on 1-4 dimensional datasets • Or 1-4 attributes at a time • Objects that have a high distance to the “center of the data” are considered outliers • Example Pharmacy: • Records represent patients • 2 attributes: • Costs spent on diabetes medication • Costs spent on diabetes testing material
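The slide does not say which notion of depth was used. One common choice for two attributes is convex hull peeling: points on the outermost hull get depth 1, the hull is removed, and the procedure repeats, so low depth means far from the center of the data. The sketch below assumes that choice; the points stand in for (medication cost, testing material cost) pairs and are invented.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Vertices of the convex hull (Andrew's monotone chain algorithm)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def peeling_depth(points):
    """Map each point to its hull layer: 1 = outermost, higher = deeper."""
    remaining = set(points)
    depth = {}
    layer = 1
    while remaining:
        hull = convex_hull(list(remaining))
        for p in hull:
            depth[p] = layer
        remaining -= set(hull)
        layer += 1
    return depth

# Hypothetical (diabetes medication cost, testing material cost) per patient.
patients = [(0, 0), (10, 0), (10, 10), (0, 10),
            (4, 4), (6, 4), (6, 6), (4, 6), (5, 5)]
depth = peeling_depth(patients)
outliers = sorted(p for p, d in depth.items() if d == 1)
print(outliers)
```

Hull computation gets expensive as the number of attributes grows, which is in line with the slide's restriction to 1-4 attributes at a time.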