Herding Ponies: How big data methods facilitate collaborative analytics

luke-rivera

Presentation Transcript


  1. Herding Ponies: How big data methods facilitate collaborative analytics

  2. Changes in Outcomes Research
     • New monikers…
       • Patient Centered Outcomes Research
       • Health Services Research
       • Comparative Effectiveness Research
       • Safety and Surveillance
     • Changes in funding agencies
       • PCORI - AHRQ
       • FDA – CMS
       • NIH
     • Changes in research models
       • More multi-site studies
       • Larger “center-based” studies
       • Greater interest in Patient Generated Data
       • Greater interest in EHR-based data
       • Less interest in claims

  3. Collaboration Frameworks From Other Disciplines
     • Open Science Grid (physics, nanotechnology, structural biology)
       • OSG: 1.4M CPU-hours/day, >90 sites, >3000 users, >260 pubs in 2010
     • LIGO (physics/astrophysics)
       • Established practices and metadata standards
       • 1 PB data in last science run, distributed worldwide
     • ESGF
       • 1.2 PB climate data delivered to 23,000 users; 600+ pubs
     • Collage – Executable papers (computer science)

  4. “Why hasn’t Outcomes Research adopted collaborative methods used in physics, climate science, and genomics?” - Everyone in data-driven research

  5. Adapting to Collaborative Science
     • Healthcare data are not collected for research
       • Not standardized
       • Not complete
     • Privacy protection has legal and ethical implications
     • Data is an asset
     • Data sharing is not incentivized or supported by journals, funding agencies, or the business of healthcare
     • Obtaining consent is expensive
     • Data hoarding is rewarded and is the conservative choice

  6. Are Federated Research Networks the solution? In federated models, data are not centralized. AHRQ and PCORI have invested heavily in this approach.
     • Each data holder independently assumes responsibility for “data wrangling” and standardization
     • Requires distributed analysis, as opposed to traditional central data pooling and analysis
     • If data are simply used to independently estimate one model per site, the added value for causal inference is similar to that of a meta-analysis
     • Requires greater levels of coordination of governance, standards, software, and policies
     • High barriers to entry: what is the ROI?

  7. Federated Meta-Analysis vs. Distributed Analysis
     • Meta-analysis
       • One independently estimated model for each node in the network
       • Not iterative
     • Distributed Analysis
       • One jointly estimated model using data from all sites
       • Typically iterative
       • Leverages computational power of the entire network
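The contrast on this slide can be sketched with a simple linear regression. This is a minimal illustration and not the presenters' software: the `SiteSummary` structure, the function names, and the three toy sites are all hypothetical. In the meta-analysis path, each site fits its own slope and a coordinator combines the slopes with inverse-variance weights. In the distributed path, each site shares only aggregate sums and the coordinator estimates one joint model. For ordinary least squares the joint fit needs a single pass; models such as logistic regression would typically need several rounds.

```python
from dataclasses import dataclass, astuple


@dataclass
class SiteSummary:
    """Aggregate sums a site shares (hypothetical structure); no patient rows."""
    n: float
    sx: float
    sy: float
    sxx: float
    sxy: float
    syy: float


def summarize(xs, ys):
    """Runs inside a site: reduce local rows to sufficient statistics."""
    return SiteSummary(
        n=float(len(xs)),
        sx=sum(xs),
        sy=sum(ys),
        sxx=sum(x * x for x in xs),
        sxy=sum(x * y for x, y in zip(xs, ys)),
        syy=sum(y * y for y in ys),
    )


def combine(summaries):
    """Runs at the coordinator: add the per-site sums field by field."""
    totals = [sum(column) for column in zip(*(astuple(s) for s in summaries))]
    return SiteSummary(*totals)


def fit(s):
    """OLS fit of y = intercept + slope * x from sums alone.

    Returns (intercept, slope, variance of the slope estimate).
    """
    sxx_centered = s.sxx - s.sx ** 2 / s.n
    sxy_centered = s.sxy - s.sx * s.sy / s.n
    slope = sxy_centered / sxx_centered
    intercept = (s.sy - slope * s.sx) / s.n
    # Residual sum of squares, using the OLS orthogonality conditions.
    rss = s.syy - intercept * s.sy - slope * s.sxy
    slope_var = rss / (s.n - 2) / sxx_centered
    return intercept, slope, slope_var


def meta_analysis_slope(summaries):
    """One model per site, combined by inverse-variance weighting."""
    fits = [fit(s) for s in summaries]
    weights = [1.0 / var for _, _, var in fits]
    return sum(w * b for w, (_, b, _) in zip(weights, fits)) / sum(weights)


def distributed_slope(summaries):
    """One jointly estimated model from the combined sums."""
    return fit(combine(summaries))[1]


# Toy data for three hypothetical sites (x values, y values).
sites = [
    ([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]),
    ([2, 4, 6, 8], [4.5, 7.9, 12.2, 15.8]),
    ([0, 1, 2, 3, 4, 5], [1.0, 2.8, 5.1, 7.2, 8.9, 11.1]),
]
summaries = [summarize(xs, ys) for xs, ys in sites]
meta_slope = meta_analysis_slope(summaries)
joint_slope = distributed_slope(summaries)
```

In this sketch `joint_slope` matches what a fit on the centrally pooled rows would give, while `meta_slope` is a weighted average of the per-site slopes and generally differs from it.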

  8. What does this have to do with “big data?”

  9. Two (of 8) barriers to collaborative data science solved with “Big Data” methods
     • Privacy protection has legal and ethical implications
     • If data are simply used to independently estimate one model per site, value-added for causal inference is similar to a meta-analysis
     • Bonus: specialized software or hardware like SAS and CMS repositories can be replaced with parallelized systems

  10. Parallel Evolution of Distributed Computing and Federated Research Networks

  11. “Big Data” Analytics vs. Outcomes Research Analytics

  12. “Big-Data” Methods are Incidentally “Privacy Preserving”
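The slide body is not preserved in this transcript, so the following is only one plausible reading of its title: in a map-reduce style computation each node returns partial aggregates, so row-level records never leave the site. The sketch below fits a logistic regression by Newton's method across sites that share only summed gradients and Hessian entries. The function names and toy data are hypothetical, and sharing aggregates is not by itself a formal privacy guarantee.

```python
import math


def site_gradient_hessian(beta, xs, ys):
    """Runs inside a site: return aggregate gradient and Hessian terms only."""
    b0, b1 = beta
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # predicted probability
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)


def fit_distributed_logistic(sites, max_iter=25, tol=1e-10):
    """Runs at the coordinator: iterate Newton steps on summed aggregates."""
    beta = (0.0, 0.0)
    for _ in range(max_iter):
        parts = [site_gradient_hessian(beta, xs, ys) for xs, ys in sites]
        g0 = sum(g[0] for g, _ in parts)
        g1 = sum(g[1] for g, _ in parts)
        h00 = sum(h[0] for _, h in parts)
        h01 = sum(h[1] for _, h in parts)
        h11 = sum(h[2] for _, h in parts)
        # Solve the 2x2 system H * step = g in closed form.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        beta = (beta[0] + step0, beta[1] + step1)
        if max(abs(step0), abs(step1)) < tol:
            break
    return beta


# Toy data for two hypothetical sites (covariate values, binary outcomes).
logit_sites = [
    ([0.5, 1.0, 1.5, 2.0, 2.5, 3.0], [0, 0, 1, 0, 1, 1]),
    ([0.2, 0.8, 1.7, 2.2, 2.9, 3.5], [0, 1, 0, 1, 1, 1]),
]
joint_beta = fit_distributed_logistic(logit_sites)
```

Each round moves only five numbers per site to the coordinator, which is also why the distributed approach is typically iterative: the coordinator has to send the updated coefficients back out before the next round of aggregates can be computed.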

  13. Distributed Computing Frameworks
      • Grid Computing Architectures
      • Statistical Query Oracle (mostly an academic effort)
      • Hadoop (modeled on Google's MapReduce and GFS papers)
        • Hundreds of developers
        • 591 active projects and organizations
      • Apache Spark (Berkeley Computer Science's answer to Hadoop)
        • Most rapidly growing user base
        • 99 active projects and organizations

  14. Collaboration Frameworks In Outcomes Research
      • SHRINE for I2B2
      • PopMedNet for MiniSentinel, PCORnet
      • TRIAD for CAGrid, SAFTINet DRN

  15. What distributed methods in the standard biostats toolbox are already supported in “Big Data” vs. Clinical Frameworks?

  16. No Longer a Technical Challenge We have the tools we need to overcome privacy and liability concerns. Now we “only” need to change culture.

  17. Moving Collaborative Outcomes Science Forward
      • Policies (aka incentives)
        • Payer-driven incentives for better data hygiene and standardization
        • Payer incentives for sharing
        • Funding agency incentives for collaborative data management vs. data hoarding
        • Journal incentives
        • HIPAA clarification
      • Infrastructure
        • As a community, adopt existing easy-to-use, flexible platforms for sharing code and data
        • Link clinical data and patient device infrastructure to research infrastructure
      • Culture
        • Clinician demand
        • Patient demand
        • Tenure and promotion transformation
        • Replace “not invented here syndrome” with collective credit and shared efficiencies
