This presentation by Anatoli Shein discusses the use of big data analytics in healthcare to improve the cost-care ratio and reduce fraud, waste, and abuse. It explores the challenges and opportunities in using healthcare claims data to analyze and detect fraudulent activity.
Knowledge Discovery From Massive Healthcare Claims Data Presented by Anatoli Shein (aus4@pitt.edu) Varun Chandola, Sreenivas Sukumar, Jack Schryver
Motivation: US health care • 2008: 15.2% of GDP • 2017: 19.5% of GDP
Goal: Improve cost-care ratio • Improve healthcare operations • Reduce fraud, waste, and abuse
Big Data Analytics in HealthCare
Big Data in HealthCare Categorized
Data quality and availability • Clinical data, behavior data, and pharmaceutical data: useful but unavailable
Data quality and availability • Health insurance data: available but needs preparation
State-of-the-Art Analytics for Massive HealthCare Data • Network analysis • Text mining • Temporal analysis • Higher-order feature construction
Health Insurance • 85% of Americans have it • Its data is stored to: • Track payments • Address fraud
Health Insurance Data Model • Fee-for-service model • Provider -> Service -> Patient -> Cost -> Justification -> Payor
Data Maintained for Operation • Claims information • Patient enrollment and eligibility • Provider enrollment
Challenges and Opportunities • Fraud • Waste • Abuse
Fraud • Billing for services not provided • Large-scale fraud
Waste • Improper payments • Double payments • Duplicate claims • Outdated fee schedule
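Duplicate claims are the easiest of these waste categories to illustrate. The sketch below is not the presenter's method; it assumes each claim is a dictionary with hypothetical fields `claim_id`, `provider`, `patient`, `procedure`, and `date`, and treats two claims as duplicates when all four billing fields match.

```python
from collections import defaultdict


def find_duplicate_claims(claims):
    """Group claim IDs that share provider, patient, procedure, and date."""
    groups = defaultdict(list)
    for claim in claims:
        # The field names here are hypothetical, chosen for illustration.
        key = (claim["provider"], claim["patient"],
               claim["procedure"], claim["date"])
        groups[key].append(claim["claim_id"])
    # Keep only the keys that were billed more than once.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


claims = [
    {"claim_id": 1, "provider": "P1", "patient": "B1",
     "procedure": "C10", "date": "2012-03-01"},
    {"claim_id": 2, "provider": "P1", "patient": "B1",
     "procedure": "C10", "date": "2012-03-01"},
    {"claim_id": 3, "provider": "P2", "patient": "B1",
     "procedure": "C10", "date": "2012-03-01"},
]
duplicates = find_duplicate_claims(claims)
```

Exact matching on a composite key is only a first pass; real claims data would likely need fuzzier matching, for example on resubmitted claims with slightly different dates.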
Abuse • Prospective payment system • Upcoding
Data Used • Claims data (48 million beneficiaries in the US) from transactional data warehouses • Provider enrollment data (from private organizations) • Fraudulent providers (from the Office of Inspector General’s exclusion list) • All other providers are treated as non-fraudulent
Claims Data
Analysis • Identification of typical treatment profiles • Identification of costly areas
Text Analysis, Profile Building • Apache Mahout • Hadoop-based technology • MapReduce
Entities as Documents • Document-term matrices • P (providers) • B (beneficiaries) • C (procedures) • G (diagnoses) • D (drugs) • Ex: PG (providers/diagnoses)
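To make the "entities as documents" idea concrete, the following minimal sketch builds a PG-style count matrix in which each provider is a document and each diagnosis code is a term. The talk cites Apache Mahout on Hadoop for this step; the pure-Python version here is only an illustration, and the field names `provider` and `diagnosis` and the codes `G1` and `G2` are hypothetical.

```python
from collections import Counter, defaultdict


def build_entity_term_counts(claims, entity_field, term_field):
    """Count how often each term appears in each entity's claims."""
    counts = defaultdict(Counter)
    for claim in claims:
        counts[claim[entity_field]][claim[term_field]] += 1
    return counts


def to_matrix(counts):
    """Turn nested counts into sorted row labels, column labels, and a dense matrix."""
    rows = sorted(counts)
    cols = sorted({term for terms in counts.values() for term in terms})
    # A Counter returns 0 for terms the entity never used.
    matrix = [[counts[row][col] for col in cols] for row in rows]
    return rows, cols, matrix


claims = [
    {"provider": "P1", "diagnosis": "G1"},
    {"provider": "P1", "diagnosis": "G1"},
    {"provider": "P1", "diagnosis": "G2"},
    {"provider": "P2", "diagnosis": "G2"},
]
pg_counts = build_entity_term_counts(claims, "provider", "diagnosis")
rows, cols, matrix = to_matrix(pg_counts)
```

Swapping the two field arguments gives the other pairings listed on the slide (for example beneficiaries against drugs), and a matrix of this shape is the usual input to a topic model.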
Interesting find • Some seemingly different diagnosis codes were grouped into the same topics • Ex: Diabetes and Dermatoses
Social Network Analysis • Estimate a provider's fraud risk before the provider makes any claims by constructing a social network
Provider Network
Texas Provider Network
Extracting Features from Provider Network
Information complexity measure • The most distinguishing features proved to be: • Node degree • Number of fraudulent providers in 2-hop network • Eigenvector centrality • Current-flow closeness centrality
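Three of the four features listed above can be sketched in a few lines. The example assumes the provider network is an undirected graph stored as a dictionary of neighbor sets; the toy graph, the provider labels, and the power-iteration routine are illustrative assumptions rather than the presenter's implementation. Current-flow closeness centrality is left out because it needs a linear solver.

```python
import math


def degree(graph, node):
    """Number of direct neighbors of a node."""
    return len(graph[node])


def fraud_within_two_hops(graph, node, fraudulent):
    """Count known fraudulent providers at most two hops from a node."""
    one_hop = set(graph[node])
    two_hop = set()
    for neighbor in one_hop:
        two_hop.update(graph[neighbor])
    reachable = (one_hop | two_hop) - {node}
    return len(reachable & fraudulent)


def eigenvector_centrality(graph, iterations=100):
    """Approximate eigenvector centrality by power iteration."""
    scores = {n: 1.0 for n in graph}
    for _ in range(iterations):
        # Adding the node's own score (iterating with A + I) keeps the same
        # eigenvectors but lets the iteration converge on bipartite graphs.
        new = {n: scores[n] + sum(scores[m] for m in graph[n]) for n in graph}
        norm = math.sqrt(sum(v * v for v in new.values())) or 1.0
        scores = {n: v / norm for n, v in new.items()}
    return scores


# Hypothetical toy network: a triangle P1-P2-P3 with a chain P3-P4-P5.
graph = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2", "P4"},
    "P4": {"P3", "P5"},
    "P5": {"P4"},
}
fraudulent = {"P5"}
centrality = eigenvector_centrality(graph)
```

In this toy graph, P3 is two hops from the flagged provider P5 while P1 is three hops away, so only P3 picks up a nonzero 2-hop fraud count.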
Temporal Feature Construction • By looking at provider data over time we can find anomalies: • Increase in number of patients • Taking on patients with conditions different from their past profiles
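One simple way to detect the first anomaly, a sudden increase in patients, is to compare a provider's latest count against its own history. The z-score rule and the threshold of 3.0 below are assumptions made for illustration; the slides do not say which test was used.

```python
from statistics import mean, pstdev


def flag_patient_count_spike(monthly_counts, threshold=3.0):
    """Flag the latest month if it sits far above the provider's own history."""
    history, latest = monthly_counts[:-1], monthly_counts[-1]
    if len(history) < 2:
        return False  # Not enough history to judge.
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # A perfectly flat history: any increase counts as a spike.
        return latest > baseline
    return (latest - baseline) / spread > threshold


# Hypothetical monthly patient counts for two providers.
steady = [20, 22, 19, 21, 20, 22]
spiking = [20, 22, 19, 21, 20, 60]
steady_flag = flag_patient_count_spike(steady)
spiking_flag = flag_patient_count_spike(spiking)
```

The second anomaly on the slide, a shift in the conditions a provider treats, could be handled the same way by comparing diagnosis distributions across time windows rather than raw counts.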
Fraudulent Provider Detection
Conclusions • Introduced the domain of “big” healthcare claims data • Analyzed health care claims data at a country level using state-of-the-art analytics for massive data • Transformed the problem into well-known analysis problems in the data mining community • Presented several approaches for identifying fraud, waste, and abuse
Thank you. Questions?