350 likes | 478 Views
Identifying RA patients from the electronic medical records at Partners HealthCare. Robert Plenge, M.D., Ph.D. VA Hospital July 20, 2010. HARVARD MEDICAL SCHOOL. genotype. phenotype. clinical care. genotype. bottleneck. phenotype. clinical care. July 2010: > 30 RA risk loci.
E N D
Identifying RA patients from the electronic medical records at Partners HealthCare Robert Plenge, M.D., Ph.D. VA Hospital July 20, 2010 HARVARD MEDICAL SCHOOL
genotype phenotype clinical care
genotype bottleneck phenotype clinical care
July 2010: >30 RA risk loci Together explain ~35% of the genetic burden of disease CD40 CCL21 CD244 IL2RB TNFRSF14 PRKCQ PIP4K2C IL2RAAFF3 REL BLK TAGAP CD28 TRAF6 PTPRC FCGR2A PRDM1 CD2-CD58 IL6ST SPRED2 5q21 RBPJ IRF5 CCR6 PXK TNFAIP3 STAT4 TRAF1-C5 IL2-IL21 HLA DR4 “shared epitope” hypothesis PADI4 PTPN22 CTLA4 2009 2010 (Q2) 1978 1987 2003 2004 2005 2007 2008 Latest GWAS in 25,000 case-control samples with replication in 20,000 additional samples
genotype phenotype bottleneck clinical care
Genetic predictors of response to anti-TNF therapy in RA PTPRC/CD45 allele n=1,283 patients P=0.0001 Cui et al (2010) Arth & Rheum
How can we collect DNA and detailed clinical data on >20,000 RA patients?
What are the options for collecting clinical data and DNA for genetic studies?
Content of EMRs EMRs are increasingly utilized! • Narrative data = free-form written text • info about symptoms, medical history, medications, exam, impression/plan • Codified data = structured format • age, demographics, and billing codes
This is not a new idea… Sens: 89% PPV: 57% Gabriel (1994) Arthritis and Rheumatism
…but EMR data are “dirty” Conclusion: The sole reliance on such databases for the diagnosis of RA can result in substantial misdiagnosis. Gabriel (1994) Arthritis and Rheumatism
4 million patients ICD9 RA and/or CCP checked (goal = high sensitivity) 31,171 patients Classification algorithm (goal = high PPV) 3,585 RA patients
Our library of RA phenotypes Qing Zeng Guergana Savova • Natural language processing (NLP) • disease terms (e.g., RA, lupus) • medications (e.g., methotrexate) • autoantibodies (e.g., CCP, RF) • radiographic erosions • Codified data • ICD9 disease codes • prescription medications • laboratory autoantibodies
Our library of RA phenotypes Shawn Murphy • Natural language processing (NLP) • disease terms (e.g., RA, lupus) • medications (e.g., methotrexate) • autoantibodies (e.g., CCP, RF) • radiographic erosions • Codified data • ICD9 disease codes • prescription medications • laboratory autoantibodies
‘Optimal’ algorithm to classify RA: NLP + codified data Codified data NLP data Regression model with a penalty parameter (to avoid over-fitting) Tianxi Cai, Kat Liao
High PPV with adequate sensitivity ✪392 out of 400 (98%) had definite or possible RA!
This means more patients! ~25% more subjects with the complete algorithm: 3,585 subjects (3,334 with true RA) 3,046 subjects (2,680 with true RA)
Clinical features of patients CCP has an OR = 1.5 for predicting erosions Liao et (2010) Arth. Care Research
4 million patients ICD9 RA and/or CCP checked (goal = high sensitivity) 31,171 patients Classification algorithm (goal = high PPV) 3,585 RA patients Discarded blood for DNA
Linking the Datamart-Crimson NLP data Codified data
OR similar in EMR cohort 1,500 RA multi-ethnic RA cases and 1,500 matched controls
4 million patients ICD9 RA and/or CCP checked (goal = high sensitivity) 31,171 patients Classification algorithm (goal = high PPV) 3,585 RA patients Discarded blood for DNA Clinical subsets
Non-responder to anti-TNF therapy NLP+codified data, together with statistical modeling, to define treatment response
Responder to anti-TNF therapy NLP+codified data, together with statistical modeling, to define treatment response
Responder to anti-TNF therapy 5-year NIH grant as part of the PharmacoGenomics Research Network (PGRN)
Options for clinical + DNA Conclusion: NLP + codified data, together with appropriate statistical modeling, can yield accurate clinical data.
Options for clinical + DNA Conclusion: Genetic studies in our EMR cohort yield effect sizes similar to traditional cohorts.
Options for clinical + DNA Conclusion: It should be possible to extend this same framework to classify response vs non-response to drugs used to treat RA.