1 / 35

Identifying RA patients from the electronic medical records at Partners HealthCare

Identifying RA patients from the electronic medical records at Partners HealthCare. Robert Plenge, M.D., Ph.D. VA Hospital July 20, 2010. HARVARD MEDICAL SCHOOL. genotype. phenotype. clinical care. genotype. bottleneck. phenotype. clinical care. July 2010: > 30 RA risk loci.

Download Presentation

Identifying RA patients from the electronic medical records at Partners HealthCare

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Identifying RA patients from the electronic medical records at Partners HealthCare Robert Plenge, M.D., Ph.D. VA Hospital July 20, 2010 HARVARD MEDICAL SCHOOL

  2. genotype phenotype clinical care

  3. genotype bottleneck phenotype clinical care

  4. July 2010: >30 RA risk loci Together explain ~35% of the genetic burden of disease CD40 CCL21 CD244 IL2RB TNFRSF14 PRKCQ PIP4K2C IL2RAAFF3 REL BLK TAGAP CD28 TRAF6 PTPRC FCGR2A PRDM1 CD2-CD58 IL6ST SPRED2 5q21 RBPJ IRF5 CCR6 PXK TNFAIP3 STAT4 TRAF1-C5 IL2-IL21 HLA DR4 “shared epitope” hypothesis PADI4 PTPN22 CTLA4 2009 2010 (Q2) 1978 1987 2003 2004 2005 2007 2008 Latest GWAS in 25,000 case-control samples with replication in 20,000 additional samples

  5. genotype phenotype bottleneck clinical care

  6. Genetic predictors of response to anti-TNF therapy in RA PTPRC/CD45 allele n=1,283 patients P=0.0001 Cui et al (2010) Arth & Rheum

  7. How can we collect DNA and detailed clinical data on >20,000 RA patients?

  8. What are the options for collecting clinical data and DNA for genetic studies?

  9. Options for clinical + DNA

  10. Content of EMRs EMRs are increasingly utilized! • Narrative data = free-form written text • info about symptoms, medical history, medications, exam, impression/plan • Codified data = structured format • age, demographics, and billing codes

  11. This is not a new idea… Sens: 89% PPV: 57% Gabriel (1994) Arthritis and Rheumatism

  12. …but EMR data are “dirty” Conclusion: The sole reliance on such databases for the diagnosis of RA can result in substantial misdiagnosis. Gabriel (1994) Arthritis and Rheumatism

  13. Partners HealthCare: 4 million patients

  14. Partners HealthCare: linked by EMR

  15. Partners HealthCare: organized by i2b2

  16. 4 million patients ICD9 RA and/or CCP checked (goal = high sensitivity) 31,171 patients Classification algorithm (goal = high PPV) 3,585 RA patients

  17. Our library of RA phenotypes Qing Zeng Guergana Savova • Natural language processing (NLP) • disease terms (e.g., RA, lupus) • medications (e.g., methotrexate) • autoantibodies (e.g., CCP, RF) • radiographic erosions • Codified data • ICD9 disease codes • prescription medications • laboratory autoantibodies

  18. Our library of RA phenotypes Shawn Murphy • Natural language processing (NLP) • disease terms (e.g., RA, lupus) • medications (e.g., methotrexate) • autoantibodies (e.g., CCP, RF) • radiographic erosions • Codified data • ICD9 disease codes • prescription medications • laboratory autoantibodies

  19. ‘Optimal’ algorithm to classify RA: NLP + codified data Codified data NLP data Regression model with a penalty parameter (to avoid over-fitting) Tianxi Cai, Kat Liao

  20. High PPV with adequate sensitivity ✪392 out of 400 (98%) had definite or possible RA!

  21. This means more patients! ~25% more subjects with the complete algorithm: 3,585 subjects (3,334 with true RA) 3,046 subjects (2,680 with true RA)

  22. Clinical features of patients CCP has an OR = 1.5 for predicting erosions Liao et (2010) Arth. Care Research

  23. 4 million patients ICD9 RA and/or CCP checked (goal = high sensitivity) 31,171 patients Classification algorithm (goal = high PPV) 3,585 RA patients Discarded blood for DNA

  24. Linking the Datamart-Crimson NLP data Codified data

  25. OR similar in EMR cohort 1,500 RA multi-ethnic RA cases and 1,500 matched controls

  26. Genetic risk score also similar

  27. 4 million patients ICD9 RA and/or CCP checked (goal = high sensitivity) 31,171 patients Classification algorithm (goal = high PPV) 3,585 RA patients Discarded blood for DNA Clinical subsets

  28. Response to therapy

  29. Non-responder to anti-TNF therapy NLP+codified data, together with statistical modeling, to define treatment response

  30. Responder to anti-TNF therapy NLP+codified data, together with statistical modeling, to define treatment response

  31. Responder to anti-TNF therapy 5-year NIH grant as part of the PharmacoGenomics Research Network (PGRN)

  32. Conclusions

  33. Options for clinical + DNA Conclusion: NLP + codified data, together with appropriate statistical modeling, can yield accurate clinical data.

  34. Options for clinical + DNA Conclusion: Genetic studies in our EMR cohort yield effect sizes similar to traditional cohorts.

  35. Options for clinical + DNA Conclusion: It should be possible to extend this same framework to classify response vs non-response to drugs used to treat RA.

More Related