Square wheels: electronic medical records for discovery research in rheumatoid arthritis

Robert M. Plenge, M.D., Ph.D., Harvard Medical School. October 30, 2009. NCRR-sponsored "Using EHR Data for Discovery Research".

Presentation Transcript


  1. Square wheels: electronic medical records for discovery research in rheumatoid arthritis (the slide adds "genetic" with an insertion caret). Robert M. Plenge, M.D., Ph.D. October 30, 2009. NCRR-sponsored "Using EHR Data for Discovery Research". Harvard Medical School.

  2. Key questions What are the regulatory obstacles impacting your work? What are the resource needs required to replicate your work at other institutions? What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?

  3. Key questions How can I implement your approach, and how much better is it?

  4. genotype → phenotype → clinical care

  5. genotype → [bottleneck] → phenotype → clinical care

  6. October 2009: >30 RA risk loci. Together explain ~35% of the genetic burden of disease. Loci shown: REL, BLK, TAGAP, CD28, TRAF6, PTPRC, FCGR2A, PRDM1, CD2-CD58, CD40, CCL21, CD244, IL2RB, TNFRSF14, PRKCQ, PIP4K2C, IL2RA, AFF3, TNFAIP3, STAT4, TRAF1-C5, IL2-IL21, HLA DR4, “shared epitope” hypothesis, PADI4, PTPN22, CTLA4. Timeline axis: 1978, 1987, 2003, 2004, 2005, 2007, 2008, 2009. Latest GWAS in 25,000 case-control samples with replication in 20,000 additional samples: >10 new loci (Raychaudhuri et al., in press, Nature Genetics).

  7. genotype → phenotype → [bottleneck] → clinical care

  8. Genetic predictors of response to anti-TNF therapy in RA: PTPRC/CD45 allele, n=1,283 patients, P=0.0001. Submitted to Arth & Rheum.

  9. How can we collect DNA and detailed clinical data on >20,000 RA patients?

  10. What are the options for collecting clinical data and DNA for genetic studies?

  11. Options for clinical + DNA

  12. Content of EMRs EMRs are increasingly utilized! • Narrative data = free-form written text • info about symptoms, medical history, medications, exam, impression/plan • Codified data = structured format • age, demographics, and billing codes

  13. This is not a new idea… Sensitivity: 89%; PPV: 57%. Gabriel (1994) Arthritis and Rheumatism
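The two metrics quoted on this slide can be sketched as follows. This is a minimal illustration: the counts below are hypothetical (they are not taken from Gabriel 1994) and were chosen only so that the results land near the slide's 89% and 57%.

```python
def sensitivity(tp, fn):
    """Fraction of true RA cases that the database criteria flag."""
    return tp / (tp + fn)


def ppv(tp, fp):
    """Positive predictive value: fraction of flagged patients who truly have RA."""
    return tp / (tp + fp)


# Hypothetical counts, for illustration only (not from the cited study).
tp, fn, fp = 89, 11, 67

sens = sensitivity(tp, fn)
pos_pred = ppv(tp, fp)
print(f"sensitivity = {sens:.0%}, PPV = {pos_pred:.0%}")
```

A high sensitivity with a low PPV means most true cases are found, but many flagged patients do not have RA, which is the "dirty data" problem raised on the next slide.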

  14. …but EMR data are “dirty” Conclusion: The sole reliance on such databases for the diagnosis of RA can result in substantial misdiagnosis. Gabriel (1994) Arthritis and Rheumatism

  15. Partners HealthCare: 4 million patients

  16. Partners HealthCare: linked by EMR

  17. Partners HealthCare: organized by i2b2

  18. 4 million patients → ICD9 RA and/or CCP checked (goal = high sensitivity) → 31,171 patients → classification algorithm (goal = high PPV) → 3,585 RA patients → discarded blood for DNA; clinical subsets

  19. Our library of RA phenotypes Qing Zeng • Natural language processing (NLP) • disease terms (e.g., RA, lupus) • medications (e.g., methotrexate) • autoantibodies (e.g., CCP, RF) • radiographic erosions • Codified data • ICD9 disease codes • prescription medications • laboratory autoantibodies

  20. Our library of RA phenotypes Shawn Murphy • Natural language processing (NLP) • disease terms (e.g., RA, lupus) • medications (e.g., methotrexate) • autoantibodies (e.g., CCP, RF) • radiographic erosions • Codified data • ICD9 disease codes • prescription medications • laboratory autoantibodies

  21. ‘Optimal’ algorithm to classify RA: NLP + codified data. Inputs: codified data and NLP data. Regression model with a penalty parameter (to avoid over-fitting). Tianxi Cai, Kat Liao
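A penalized regression classifier of the kind this slide describes could look roughly like the sketch below. It is not the authors' model: the slide does not say which penalty was used, so an L2 (ridge) penalty on a logistic regression is assumed here, and the three binary features and the eight toy patients are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_penalized_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """Logistic regression by gradient descent with an L2 penalty lam on the weights.

    The intercept b is left unpenalized. The L2 penalty is an assumption;
    the slide only says "a penalty parameter".
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Predicted probability of RA for one feature vector."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical features: [NLP mention of RA, ICD9 RA code, CCP positive].
X = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1],
     [0, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical chart-review labels

w_weak, b_weak = fit_penalized_logistic(X, y, lam=0.0)
w_strong, b_strong = fit_penalized_logistic(X, y, lam=1.0)

norm_weak = math.sqrt(sum(v * v for v in w_weak))
norm_strong = math.sqrt(sum(v * v for v in w_strong))
print(f"weight norm without penalty: {norm_weak:.2f}, with penalty: {norm_strong:.2f}")
```

The penalty shrinks the weights toward zero, which is the mechanism that limits over-fitting when many NLP and codified variables are offered to the model.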

  22. High PPV with adequate sensitivity: 392 out of 400 (98%) had definite or possible RA!

  23. This means more patients! ~25% more subjects with the complete algorithm: 3,585 subjects (3,334 with true RA) vs 3,046 subjects (2,680 with true RA)
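The arithmetic behind this slide can be checked directly from the four counts it gives. In the sketch below, the label "comparison" for the second row is an assumption, since the slide does not name that algorithm.

```python
# Counts as given on the slide.
complete = {"flagged": 3585, "true_ra": 3334}
# The slide does not name the second algorithm; "comparison" is a placeholder label.
comparison = {"flagged": 3046, "true_ra": 2680}


def ppv(group):
    """Share of flagged subjects who have true RA."""
    return group["true_ra"] / group["flagged"]


gain_true_ra = complete["true_ra"] / comparison["true_ra"] - 1
gain_flagged = complete["flagged"] / comparison["flagged"] - 1

print(f"PPV, complete algorithm:   {ppv(complete):.1%}")
print(f"PPV, comparison algorithm: {ppv(comparison):.1%}")
print(f"extra true-RA subjects:    {gain_true_ra:.1%}")
print(f"extra flagged subjects:    {gain_flagged:.1%}")
```

On these numbers the "~25% more" figure matches the gain in subjects with true RA (about 24%), while the gain in flagged subjects overall is smaller (about 18%).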

  24. 4 million patients → ICD9 RA and/or CCP checked (goal = high sensitivity) → 31,171 patients → classification algorithm (goal = high PPV) → 3,585 RA patients → discarded blood for DNA

  25. Linking the Datamart-Crimson: NLP data, codified data

  26. Status of i2b2 Crimson collection • Over 3,000 samples collected to date • cost = $10 per sample • DNA extracted on >2,400 buffy coats • cost = $20 per sample • >90% had ≥1 µg of DNA • >99% had ≥5 µg of DNA after WGA • Genotyping of 384 SNPs (RA risk alleles, AIMs, other) is ongoing at the Broad Institute
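A back-of-the-envelope cost estimate from the two per-sample figures on this slide is sketched below. It assumes both costs apply to every sample and that they simply add; genotyping, personnel, and DataMart costs are not on the slide and are left out.

```python
COLLECTION_COST_USD = 10  # per sample, from the slide
EXTRACTION_COST_USD = 20  # per sample, from the slide


def biospecimen_cost(n_samples):
    """Collection plus DNA extraction cost, assuming every sample incurs both."""
    return n_samples * (COLLECTION_COST_USD + EXTRACTION_COST_USD)


for n in (3_000, 20_000):
    print(f"{n:>6,} samples: ${biospecimen_cost(n):,}")
```

Under those assumptions, scaling to 20,000 patients costs $600,000 for collection and extraction.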

  27. Status of i2b2 Crimson collection • Measured autoantibodies from plasma • 5 autoantibodies in ~380 RA patients • ~85% are CCP+, ~35% ANA+, ~15% TPO+ • Question: are non-RA autoantibodies present at increased frequency in RA patients vs matched controls? • Stay tuned…more data soon!

  28. Key questions How can I implement your approach, and how much better is it?

  29. Key questions What are the regulatory obstacles impacting your work? What are the resource needs required to replicate your work at other institutions? What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?

  30. Key questions What are the regulatory obstacles impacting your work? What are the resource needs required to replicate your work at other institutions? What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?

  31. Regulatory obstacles • IRB approval • De-identified vs truly anonymous • Open question: sharing of genetic data

  32. Key questions What are the regulatory obstacles impacting your work? What are the resource needs required to replicate your work at other institutions? What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?

  33. Resources required • Building a research DataMart • clinical EMR ≠ research EMR • multiple FTEs to build/maintain • NLP expertise • open-source software available • iterative process for fine-tuning • Clinical expertise • understand nature of clinical data

  34. Resources required (cont.) • Statistical expertise • simple algorithm is not sufficient • prepare for the unexpected! • true for narrative and codified • Biospecimen collection, DNA extraction • varies by institution • Crimson • Broad Institute

  35. Key questions What are the regulatory obstacles impacting your work? What are the resource needs required to replicate your work at other institutions? What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?

  36. 4 million patients → ICD9 RA and/or CCP checked (goal = high sensitivity) → 31,171 patients → classification algorithm (goal = high PPV) → 3,585 RA patients → discarded blood for DNA; clinical subsets

  37. Clinical features of patients CCP has an OR = 1.5 for predicting erosions

  38. Subset patients in clinically meaningful ways: causes of mortality NLP+codified data, together with statistical modeling, to define cardiovascular disease

  39. Non-responder to anti-TNF therapy NLP+codified data, together with statistical modeling, to define treatment response

  40. Responder to anti-TNF therapy NLP+codified data, together with statistical modeling, to define treatment response

  41. Post-marketing surveillance of adverse events (pharmacovigilance): NLP+codified data, together with statistical modeling, to define treatment response

  42. Conclusions

  43. Options for clinical + DNA Conclusion: NLP + codified data, together with appropriate statistical modeling, can yield accurate clinical data.

  44. Options for clinical + DNA Conclusion: We can collect DNA and plasma in a high-throughput manner.

  45. Options for clinical + DNA Conclusion: The cost is reasonable...even for >20,000 RA patients!

  46. genotype → phenotype → clinical care

  47. Acknowledgments: Zak Kohane, Susanne Churchill, Vivian Gainer, Kat Liao, Tianxi Cai, Shawn Murphy, Qing Zeng, Soumya Raychaudhuri, Beth Karlson, Pete Szolovits, Lee-Jen Wei, Lynn Bry (Crimson), Sergey Goryachev, Barbara Mawn & many others! Namaste!

  48. Narrative data (NLP text extractions); codified data (ICD9 codes, etc.)
