
Knowledge Discovery From Massive Healthcare Claims Data

This presentation by Anatoli Shein discusses the use of big data analytics in healthcare to improve the cost-care ratio and to reduce fraud, waste, and abuse. It explores the challenges and opportunities in using healthcare claims data to analyze costs and detect fraudulent activity.


Presentation Transcript


  1. Knowledge Discovery From Massive Healthcare Claims Data. Presented by Anatoli Shein (aus4@pitt.edu). Varun Chandola, Sreenivas Sukumar, Jack Schryver

  2. Motivation: US health care spending. 2008: 15.2% of GDP; 2017: 19.5% of GDP.

  3. Goal: Improve the cost-care ratio. Improve healthcare operations. Reduce fraud, waste, and abuse.

  4. Big Data Analytics in HealthCare

  5. Big Data in HealthCare Categorized

  6. Data quality and availability: Clinical data, behavior data, and pharmaceutical data are useful but unavailable.

  7. Data quality and availability: Health insurance data is available but needs preparation.

  8. State of the Art Analytics for Massive HealthCare Data: network analysis; text mining; temporal analysis; higher-order feature construction.

  9. Health Insurance: 85% of Americans have it. Its data is stored to track payments and address fraud.

  10. Health Insurance Data Model: Fee-for-service model. Provider -> Service -> Patient -> Cost -> Justification -> Payor

  11. Data Maintained for Operation: claims information; patient enrollment and eligibility; provider enrollment.

  12. Challenges and Opportunities: fraud; waste; abuse.

  13. Fraud: billing for services not provided; large-scale fraud.

  14. Waste: improper payments; double payments; duplicate claims; outdated fee schedule.
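
Duplicate claims, one of the waste categories listed on slide 14, lend themselves to a small illustration. The sketch below is not the presenters' method: the claim fields, the sample records, and the rule that two claims are duplicates when provider, beneficiary, procedure, and service date all match are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical claim records; field names and values are illustrative only.
claims = [
    {"claim_id": "c1", "provider": "P1", "beneficiary": "B1",
     "procedure": "proc_a", "date": "2012-01-05"},
    {"claim_id": "c2", "provider": "P1", "beneficiary": "B1",
     "procedure": "proc_a", "date": "2012-01-05"},
    {"claim_id": "c3", "provider": "P2", "beneficiary": "B1",
     "procedure": "proc_a", "date": "2012-01-05"},
]


def find_duplicate_claims(records):
    """Return lists of claim ids that share the same identifying key."""
    groups = defaultdict(list)
    for record in records:
        # Assumed duplicate rule: identical provider, beneficiary,
        # procedure, and service date.
        key = (record["provider"], record["beneficiary"],
               record["procedure"], record["date"])
        groups[key].append(record["claim_id"])
    # Keep only keys that were billed more than once.
    return [ids for ids in groups.values() if len(ids) > 1]


duplicates = find_duplicate_claims(claims)
print(duplicates)
```

In practice the matching key would depend on the payer's billing rules; near-duplicates (for example, the same service billed a day apart) would need a looser rule than the exact match assumed here.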

  15. Abuse: prospective payment system; upcoding.

  16. Data Used: Claims data (48 million beneficiaries in the US) from transactional data warehouses. Provider enrollment data (from private organizations). Fraudulent providers (from the Office of Inspector General’s exclusion list). All remaining providers are treated as non-fraudulent.

  17. Claims Data

  18. Analysis: identification of typical treatment profiles; identification of costly areas.

  19. Text Analysis, profile building: Apache Mahout; Hadoop-based technology; MapReduce.

  20. Entities as Documents: Document-term matrices. Entity types: P (providers), B (beneficiaries), C (procedures), G (diagnoses), D (drugs). Ex: PG (providers/diagnoses).
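
The "entities as documents" idea on slide 20 can be sketched as follows: each provider is treated as a document and each diagnosis code as a term, giving the PG matrix. This is a minimal illustration, assuming the input is a list of (provider, diagnosis) pairs taken from claims; the identifiers and the function name `build_document_term_matrix` are hypothetical, and the slides do not describe how the presenters built their matrices.

```python
from collections import Counter, defaultdict

# Hypothetical (provider, diagnosis) pairs, one per claim line.
claim_pairs = [
    ("P1", "dx_a"), ("P1", "dx_a"), ("P1", "dx_b"),
    ("P2", "dx_b"), ("P2", "dx_c"),
]


def build_document_term_matrix(pairs):
    """Count how often each term (column) occurs for each entity (row)."""
    counts = defaultdict(Counter)
    for entity, term in pairs:
        counts[entity][term] += 1
    entities = sorted(counts)
    terms = sorted({term for counter in counts.values() for term in counter})
    # A Counter returns 0 for terms an entity never used.
    matrix = [[counts[entity][term] for term in terms] for entity in entities]
    return entities, terms, matrix


providers, diagnoses, pg = build_document_term_matrix(claim_pairs)
for provider, row in zip(providers, pg):
    print(provider, dict(zip(diagnoses, row)))
```

The same function would produce the other pairings (for example beneficiaries/drugs) by changing which two claim fields are passed in as pairs.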

  21. Anatoli Shein

  22. Interesting find: Some seemingly different diagnosis codes were grouped into the same topics. Ex: diabetes and dermatoses.

  23. Social Network Analysis: Estimate a provider's fraud risk before it makes any claims by constructing a social network.

  24. Provider Network

  25. Texas Provider Network

  26. Extracting Features from Provider Network

  27. Information complexity measure: The most distinguishing features proved to be node degree, number of fraudulent providers in the 2-hop network, eigenvector centrality, and current-flow closeness centrality.
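
Two of the network features named on slide 27, node degree and the number of fraudulent providers within two hops, can be computed with a short breadth-first search. The sketch below assumes an undirected provider graph given as an edge list and a known set of fraudulent providers; the toy graph, labels, and function names are hypothetical. The two centrality measures are left out to keep the example dependency-free; graph libraries such as NetworkX provide implementations of both.

```python
from collections import deque

# Hypothetical undirected provider network and known fraudulent providers.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
fraudulent = {"C", "E"}


def build_adjacency(edge_list):
    """Build an undirected adjacency map from an edge list."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def nodes_within_hops(adjacency, start, max_hops):
    """Breadth-first search: nodes reachable in at most max_hops steps."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    seen.discard(start)  # the provider itself is not counted
    return seen


adjacency = build_adjacency(edges)
features = {
    node: {
        "degree": len(adjacency[node]),
        "fraud_in_2_hops": len(nodes_within_hops(adjacency, node, 2) & fraudulent),
    }
    for node in adjacency
}
print(features["A"])
```

Excluding the provider itself from its own 2-hop count is a design choice made here so the feature does not leak the label being predicted.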

  28. Anatoli Shein

  29. Temporal Feature Construction: By looking at provider data over time we can find anomalies, such as an increase in the number of patients, or taking patients with conditions different from their past profiles.
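
The first anomaly type on slide 29, a jump in a provider's patient count, can be illustrated with a simple z-score against the provider's own history. This is a sketch under stated assumptions, not the presenters' technique: the monthly counts are made up, and both the z-score rule and the threshold of 3.0 are choices made for the example.

```python
from statistics import mean, stdev

# Hypothetical monthly patient counts per provider (oldest first).
monthly_patients = {
    "P1": [40, 42, 39, 41, 40, 95],
    "P2": [30, 31, 29, 30, 32, 31],
}

Z_THRESHOLD = 3.0  # assumed cutoff, not taken from the slides


def patient_count_zscore(history, latest):
    """Standard deviations by which the latest count exceeds the history."""
    sigma = stdev(history)
    if sigma == 0:
        return 0.0  # flat history: no scale to compare against
    return (latest - mean(history)) / sigma


flagged = [
    provider
    for provider, counts in monthly_patients.items()
    if patient_count_zscore(counts[:-1], counts[-1]) > Z_THRESHOLD
]
print(flagged)
```

Comparing each provider to its own past, rather than to other providers, keeps a large but stable practice from being flagged simply for its size.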

  30. Fraudulent Provider Detection

  31. Conclusions: Introduced the domain of “big” healthcare claims data. Analyzed health care claims data at a country level using state-of-the-art analytics for massive data. Transformed the problem into well-known analysis problems in the data mining community. Presented several approaches for identifying fraud, waste, and abuse.

  32. Thank you. Questions?
