Learn how IMS Health utilizes the largest US healthcare dataset in Hadoop to enable patient-level analytics in near real time. Discover the opportunities, challenges, and lessons learned in making a greater difference in patient healthcare.
Largest US Healthcare Dataset in Hadoop Enables Patient-Level Analytics in Near Real Time • September 28, 2016 • Navdeep Alam, Director of Data Warehousing • nalam@us.imshealth.com
Agenda • Who is IMS Health • Health care data ecosystem at IMS • Opportunity and Challenges: Make a Greater Difference in Patient Healthcare • Solution – Anonymous Patient Longitudinal Analysis • Lessons Learned
Health Care Data Ecosystem IMS Health – Where Does Our Data Come From
Future Data Growth is Exponential • Social Media, IoT, Genomics • Billions More Transactions • Billions of Anonymous Patients
Make a Greater Difference in Patient Healthcare Precision Medicine, Better Outcomes, Propel Research towards Cures • Longitudinal Studies • Find Patterns Across All Patients • Predict and Influence Outcomes • Help Reduce Healthcare Costs • Clinical Trials and Drug Research Improvements • Improve Provider Care
Challenges Obstacles to Realizing the Greater Opportunity • Data Silos • Reduced Data Currency • Analytics Away from the Data • Analytics Too Time Consuming and Expensive • Cost High on Current Systems
Solution - Patient Longitudinal Records: Organized for Fast Access and Reduced Data Shuffle • Figure panels: Traditional Warehoused Data; Big Data Factory; Complex Nested Data Type • Figure legend: each color = a unique de-identified patient ID; each shape = a type of patient data; filled shapes = data of interest
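The idea on this slide, keeping all of a patient's records together in one nested structure so a longitudinal query does not have to shuffle rows between nodes, can be illustrated with a minimal Python sketch. The row layout, field names, and the `nest_by_patient` helper are hypothetical and chosen only for illustration; the slides do not show the actual schema or implementation.

```python
from collections import defaultdict

# Hypothetical flat rows, as a traditional warehouse might hold them:
# (de-identified patient ID, record type, service date, code)
flat_rows = [
    ("p2", "Rx", "2016-03-01", "drug-B"),
    ("p1", "Dx", "2016-01-15", "diag-X"),
    ("p1", "Rx", "2016-01-20", "drug-A"),
    ("p2", "Dx", "2016-02-10", "diag-Y"),
]


def nest_by_patient(rows):
    """Group flat rows into one nested, date-ordered record per patient."""
    nested = defaultdict(lambda: defaultdict(list))
    # ISO-formatted dates sort correctly as plain strings.
    for patient_id, record_type, service_date, code in sorted(rows, key=lambda r: r[2]):
        nested[patient_id][record_type].append({"date": service_date, "code": code})
    # Convert the defaultdicts to plain dicts for a stable, printable result.
    return {pid: dict(by_type) for pid, by_type in nested.items()}


records = nest_by_patient(flat_rows)
```

After nesting, everything known about patient `p1` is reachable through a single key (`records["p1"]`), which is the access pattern the nested layout is meant to serve.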
Solution - Different Storage Engines: Storage to Match the Access Pattern • Solr: Aggregates/Counts, Web Speed (ms), Faceted Search • HBase: Fast Lookup of Longitudinal Entity (i.e., Patient) • Hive Nested Bucketed: Deep Learning Analytics, Longer Running Queries (min vs. days) • Hive Partitioned: BI/DW Workloads, SQL • Other diagram labels: Complex Nested Type, RDBMS, ETL Process, Web Applications, HUE, REST, JDBC/SQL
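"Storage to match the access pattern" can be read as a simple routing rule: each kind of workload is sent to the engine suited to it. The sketch below encodes that reading of the slide as a lookup table. The pattern keys and the `choose_engine` function are hypothetical names introduced here for illustration, and the pairing of workloads to engines is inferred from the slide's layout rather than stated by the presenter.

```python
# Workload-to-engine pairing as read from the slide; the key names are
# illustrative labels, not identifiers used by IMS Health.
ENGINE_BY_ACCESS_PATTERN = {
    "aggregates_counts": "Solr",
    "faceted_search": "Solr",
    "entity_lookup": "HBase",
    "long_running_analytics": "Hive (nested, bucketed)",
    "bi_dw_sql": "Hive (partitioned)",
}


def choose_engine(access_pattern):
    """Return the storage engine assumed to serve the given access pattern."""
    try:
        return ENGINE_BY_ACCESS_PATTERN[access_pattern]
    except KeyError:
        raise ValueError(f"unknown access pattern: {access_pattern!r}") from None
```

For example, `choose_engine("entity_lookup")` returns `"HBase"`, matching the slide's pairing of fast patient lookups with HBase.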
Hadoop Storage Engines Parquet/Hive vs. HBase vs. Kudu
Evolution of Different Storage Engines: Storage to Match the Access Pattern with Kudu • Solr: Aggregates/Counts, Web Speed (ms), Faceted Search • Kudu: Fast Lookup of Longitudinal Entity (i.e., Patient); Deep Learning Analytics; Longer Running Queries (min vs. days); BI/DW Workloads; SQL • Other diagram labels: Complex Nested Type, RDBMS, ETL Process, Web Applications, HUE, REST, JDBC/SQL
Anonymous Patient Longitudinal Analysis Rx (Prescriptions) and Dx (Medical Claims) Longitudinal Analysis
What Does This Do for Us? Value Proposition • See Patterns in Data • Explore the Data Before Analysis • Variety of Analysis in Parallel • Time-to-Value Greatly Improved • Reduced Cost • Innovation
Lessons Learned Technology, Cultural, and Process Management Changes • Rethink Everything!
Thank You • Navdeep Alam, Director of Data Warehousing • nalam@us.imshealth.com