
Information Extraction Group Health

David Carrell, PhD, Group Health Research Institute. June 29, 2010.



Presentation Transcript


  1. Information Extraction Group Health. David Carrell, PhD. Group Health Research Institute. June 29, 2010

  2. David’s background ...

  3. Group Health Research Institute (GHRI) • Group Health (www.ghc.org) • Founded 1947, Seattle, WA • Integrated delivery system (“HMO”) • ~600K patients in WA (some OR, ID) • Comprehensive EMR & patient portal (2004+) • GHRI (www.grouphealthresearch.org) • Founded 1983 • 300 staff (50 investigators) • 2009: >250 active grants ($39M)

  4. Group Health Research Institute (GHRI) • Applied research • Epidemiology, health systems, clinical trials, economics ... • Limited bio-informatics expertise • Collaborative • HMO-Research Network, Cancer-RN, ... MH-RN • Federated data systems • NLP vision • NLP expertise through collaboration • Bring NLP to the text—locally ... other network sites

  5. GHRI & Research Consortia HMO Research Network • Large data repositories • Common EMR platforms • Virtual Data Warehouse (VDW)

  6. GHRI & Virtual Data Warehouse (VDW) • Structured data (legacy + Epic/EMR) • Minimum 1990+ • Integrated care delivery (some claims) • Diagnoses, procedures, pharmacy, tumor, vitals, census/geocode, etc. [Diagram: VDW tables, each carrying an MRN column: Census, Demographics, Tumor, Pharmacy, Vital Signs, Enrollment, Encounters, Procedures, Diagnoses; plus Provider (provider, Specialty) and NDC (ndc, GenericName, BrandName) tables]

  7. GHRI & Virtual Data Warehouse (VDW) HMO Research Network

  8. GHRI & NLP Adoption [Diagram: the VDW tables (Census, Demographics, Tumor, Pharmacy, Vital Signs, Enrollment, Encounters, Procedures, Diagnoses, Provider, NDC) extended with "Structured Information from Text": Pathology (MRN, accession_number, collection_date, coding_date, thesaurus_version); Imaging (MRN, image_number, image_date, provider, coding_date, thesaurus_version); Pathology Concepts (accession_number, concept_code, concept_type, negated); Imaging Concepts (image_number, concept_code, concept_type, negated); Clinical Notes Concepts (MRN, provider, adate, enctype, concept_code, concept_type, negated)]
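The concept tables named on this slide (concept_code, concept_type, negated) suggest one row per concept mention found in a document. The sketch below is a hypothetical illustration of that shape, not the actual GHRI schema: the class name `NoteConcept`, the helper `asserted_codes`, and every sample value are placeholders invented for the example. Only the field names come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteConcept:
    """One concept mention extracted from a clinical note (illustrative only)."""
    mrn: str            # patient identifier, as in the other VDW tables
    adate: str          # encounter date of the note
    concept_code: str   # code from whatever thesaurus the NLP system uses
    concept_type: str
    negated: bool       # True when the text negates the concept ("no evidence of ...")


def asserted_codes(rows, mrn):
    """Return the sorted concept codes that are mentioned and NOT negated for one patient."""
    return sorted({r.concept_code for r in rows if r.mrn == mrn and not r.negated})


# Placeholder rows; the codes and identifiers are made up.
rows = [
    NoteConcept("000001", "2009-03-02", "C0000001", "finding", False),
    NoteConcept("000001", "2009-03-02", "C0000002", "finding", True),
    NoteConcept("000002", "2009-04-10", "C0000001", "finding", False),
]

if __name__ == "__main__":
    print(asserted_codes(rows, "000001"))
```

Keeping `negated` as its own column, rather than dropping negated mentions at load time, lets each study decide how to treat them.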

  9. GHRI & NLP Adoption HMO Research Network

  10. GHRI & NLP Adoption • caBIG TBPT adoption proposal, Jun 2006 • caTIES for pathology & radiology text, ~2007 • Chart note text, May 2007 • GWAS (eMERGE) proposal, Aug 2007 • GATE experimentation, Feb 2008 • Strategic planning conference, Dec 2008 • ARRA Challenge Grant, Apr 2009 • UIMA/cTAKES adoption, Aug 2009 • Proposals... e.g., HMORN multi-site, Jan 2010

  11. GHRI & NLP Adoption • How to bring NLP capacity to clinical text? • “Cookbooks” (SAS → Java programmers) • “Parachuted” hardware • Parachuted virtual machine (?) • Cloud-based processing • Security issues • Other?

  12. GHRI & NLP Adoption

  13. GHRI & NLP Adoption • Challenges of Cloud-based Solutions: • Unfamiliar technologies • Responsibility sharing (e.g., security) • Patient privacy • Institutional risk • De-identification • Graduated adoption?

  14. SHARP -- Exploring deployment strategies • SHARP Cloud Security Workshop • Spring 2011 • Educational focus • Challenges of processing clinical text in a novel security space (virtual firewall?) • Security best practices • IRB engagement • Graduated adoption strategies

  15. NLP Challenge Grant: Natural Language Processing for Cancer Research Network Surveillance Studies • Aim 1: Deploy open-source NLP software; develop ETL connective tissue; build “human capital” (Java, NLP) • Aim 2: NLP algorithm boot camp: recurrent breast cancer diagnoses; >3000 existing gold standard cases (human reviewed) • Approach: Local deployment/programming support; high-level NLP/bioinformatics expertise via external collaboration • Participants: GHRI (Carrell, Buist, Chubak), Mayo Clinic/Harvard (Savova), Pittsburgh (Chapman), Vanderbilt (Xu).

  16. NLP Challenge Grant – Aim 1

  17. NLP Challenge Grant – Aim 1

  18. NLP Challenge Grant – Aim 2

  19. NLP Challenge Grant – Aim 2

  20. NLP Challenge Grant – Aim 2 [Diagram: components labeled AE1, AE2, AE3 applied to Progress Notes, Oncology Notes, Radiology Reports, and Pathology Reports, leading to the question "Rec Br Ca?"]

  21. eMERGE consortium • Vanderbilt, Mayo, Northwestern, Marshfield, Group Health • Can EMRs from multiple institutions provide comparable phenotype data for GWAS? • 14 phenotypes • Group Health • structured data • Adoption of NLP algorithms developed by others • “Low-tech” NLP • Text explorer, Assisted chart abstraction

  22. Clinical Text Explorer • Select text source (chart notes, radiology, pathology, etc.) • Search: recurrent NEAR breast NEAR cancer • Date range • Sample spec’s • N documents, N patients found • Search terms highlighted
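The example query on this slide, recurrent NEAR breast NEAR cancer, is a proximity search. The slides do not say how the Clinical Text Explorer defines NEAR, so the sketch below assumes one common reading: every term must occur, in any order, within a small window of tokens. The function names `tokenize` and `near` and the `max_span` default are hypothetical.

```python
import re
from itertools import product


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near(text, terms, max_span=6):
    """Return True if every term occurs and some set of occurrences
    (one per term, any order) lies within max_span token positions."""
    tokens = tokenize(text)
    positions = []
    for term in terms:
        hits = [i for i, tok in enumerate(tokens) if tok == term.lower()]
        if not hits:
            return False  # a required term is missing entirely
        positions.append(hits)
    # Try each combination of one occurrence per term; fine for short notes.
    return any(max(combo) - min(combo) <= max_span for combo in product(*positions))


if __name__ == "__main__":
    query = ["recurrent", "breast", "cancer"]
    print(near("Findings consistent with recurrent cancer of the left breast.", query))
    print(near("Breast exam normal. No cough. Patient reports recurrent headaches; "
               "family history of colon cancer in a distant cousin.", query))
```

A plain proximity match like this ignores negation and context ("no evidence of recurrent breast cancer" still matches), so it works better for finding candidate documents than for classifying them.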

  23. Assisted Chart Abstraction

  24. Assisted Chart Abstraction • Outside EMR • Pre-processed • Point-and-click • Text capture [Diagram components: SQL Server store of chart notes (550K pts, 17M notes, 0.8B lines) with indexes (full-text, ID, date, etc.); Cohort Lists; Data Warehouse; NLP Concept Codes; Assisted Chart Abstraction GUI]

  25. Assisted Chart Abstraction [Diagram contrasting traditional chart abstraction with assisted chart abstraction: identify cohort; selection criteria applied to the patient (Pt Demog, Pt Visits, Pt Dx/Px/Rx); selection criteria applied to the notes (Note Date, Note By, Note Type, Note Text); assign note priority]

  26. Assisted Chart Abstraction • Text: “CATARACT” • Note: Op/Ophthal exam • Near: Cataract procedure
      Stage                                        Patients        Chart Notes
      Initial cohort identification                2,903 (100%)    137,019 (100%)
      Inclusion criteria (demog., dx, px, etc.)    671 (23%)       70,119 (51%)
      Electronic text                              228 (8%)        28,186 (21%)
      Pre-processed text                           122 (4%)        284 (0.2%)
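The percentages on this slide appear to be each count divided by the initial cohort count. The short check below recomputes them under that assumption. The counts are taken from the slide, listed in descending order; the helper name `pct_of_baseline` is hypothetical.

```python
def pct_of_baseline(counts):
    """Express each count as a percentage of the first (baseline) count."""
    base = counts[0]
    return [100.0 * c / base for c in counts]


# Counts from the slide, largest to smallest.
patients = [2903, 671, 228, 122]
notes = [137019, 70119, 28186, 284]

if __name__ == "__main__":
    for label, counts in (("patients", patients), ("chart notes", notes)):
        for count, pct in zip(counts, pct_of_baseline(counts)):
            print(f"{label:12s} {count:>8,d}  {pct:6.1f}%")
```

On that reading, the pre-processing steps leave about 0.2% of the initial chart notes for an abstractor to review.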

  27. Potential SHARP synergy ... National Cancer Institute FOA: Tools for Electronic Data Extraction • Funding: NCI Contract for software development • Aim: Enhance/automate existing SEER cancer case identification (largely manual abstraction of EHR/paper charts) • Approach: Assess, propose, test, modify, develop, deploy technologies that leverage NLP to automate some aspects of SEER workflow • Participants: IMS, Inc., SEER sites (4), Group Health, Harvard

  28. SHARP – NLP research lab

  29. Questions – Discussion
