Automating the Inpatient Chronic Heart Failure Quality Measures in VA Jennifer Garvin, PhD, MBA, RHIA, CPHQ, CCS, CTR, FAHIMA Salt Lake City VA Healthcare System IDEAS Center University of Utah 5/16/14 This study is supported by VA HSR&D grant IBE 09-069-1 (word cloud: www.wordle.net)
Overview • Describe the development of natural language processing (NLP) tools • Describe the inpatient chronic heart failure (CHF) quality measures • Demonstrate NLP tool development using CHF as a case study
Purpose & Background: Applied Use Case (diagram) • Evidence informs Clinical Guidelines, Decision Support, Performance Measures, Appropriateness Measures, Clinical Processes, Formulary, and Clinical Reminders • http://www.healthquality.va.gov/chf/
VA Informatics and Computing Infrastructure (VINCI) • Research approvals • Assigned a research folder on VINCI • Within VINCI, all approved investigators and staff can access the data • http://www.hsrd.research.va.gov/for_researchers/vinci/
Preparation for Research • Workflow analysis: visitation and discussion with 2 echo laboratories • Understanding how the documents are developed: a variety of approaches • Initial review of document structure to inform our sampling strategy: we planned to oversample free text and semi-structured text • Paragraph with no outline structure (free text) • Outline with some free text (semi-structured text) • Outline (highly structured text)
EF Sampling Strategy • We had a total of 765 documents available; we needed a minimum of 367 documents for the test set, and the remaining 398 documents were available for training. • However, if during system training the system reaches the pre-specified level of accuracy without using all available training documents, the remaining unused training documents can be added to the test set. • The 765 documents were randomly assigned to the training and test sets in preparation for annotation.
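The random assignment described above can be sketched as a simple shuffle-and-split. This is an illustrative sketch, not the study's actual code; the function name, seed, and use of document indices are assumptions.

```python
import random

def split_documents(doc_ids, test_size=367, seed=42):
    """Randomly assign documents to test and training sets.
    Hypothetical sketch; the study's actual assignment procedure may differ."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(doc_ids)
    rng.shuffle(shuffled)
    test_set = shuffled[:test_size]       # first 367 go to the test set
    training_set = shuffled[test_size:]   # remaining 398 go to training
    return training_set, test_set

train, test = split_documents(range(765))
print(len(train), len(test))  # 398 367
```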
NLP Development Methods: Reference (Gold) Standard Development • All documents in the training and test sets must have an accompanying reference (gold) standard so that the accuracy of the system can be measured during training and testing • A software program called Knowtator was used for annotation • Two independent reviewers, with a third adjudicator when disagreement occurred
Ejection Fraction Annotation Schema Classes 1. Ejection Fraction – annotate all mentions of left ventricular ejection fraction. 2. Value – annotate all mentions of the quantitative value associated with left ventricular ejection fraction. 3. Qualitative assessment – annotate all mentions of qualitative assessment of LV ejection fraction and LV systolic function. 4. LV systolic function – annotate all mentions of left ventricular systolic function. • Document level • EF Range (<40%, >=40%, undetermined) • Informativeness: The format of the document was sufficiently predictable that I could skim it rapidly to find what I wanted • Consistency: In order to verify that the information in the document was internally consistent, I found that I had to go “back and forth”
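The schema above can be summarized as a simple data structure. This is a hypothetical encoding for illustration only; the names mirror the slide, not the actual Knowtator project configuration.

```python
# Hypothetical encoding of the EF annotation schema described on the slide.
# The real Knowtator schema file would look different.
EF_SCHEMA = {
    "mention_classes": [
        "Ejection Fraction",        # all mentions of LV ejection fraction
        "Value",                    # quantitative values associated with LVEF
        "Qualitative assessment",   # qualitative assessments of LVEF / LV systolic function
        "LV systolic function",     # all mentions of LV systolic function
    ],
    "document_level": {
        "EF Range": ["<40%", ">=40%", "undetermined"],
        "Informativeness": "document format was predictable enough to skim",
        "Consistency": "had to go back and forth to verify internal consistency",
    },
}
```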
NLP Development Methods: Training and Testing • Training • The system we used already existed but was not "trained" for this specific use case • Separated the training documents into batches and ran the system on each batch in turn • Evaluated false positives and false negatives by comparison to the reference (gold) standard • Reprogrammed the system and ran it against the next batch • When the pre-specified level of accuracy was reached, measured accuracy at the last iteration • Testing • The system is run on the sequestered test documents and the output is delivered to the statistician
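The batch-wise error-analysis loop above can be sketched as follows. This is a minimal sketch under stated assumptions: `system` and `gold` are hypothetical stand-ins for the real NLP pipeline and reference standard, and simple accuracy stands in for the study's pre-specified accuracy criterion.

```python
def accuracy(preds, gold):
    """Fraction of documents where system output matches the gold standard."""
    return sum(preds[d] == gold[d] for d in preds) / len(preds)

def train_iteratively(batches, system, gold, threshold=0.95):
    """Sketch of the iterative training loop: run a batch, compare to the
    reference standard, revise, repeat until accuracy meets the threshold.
    `system` (doc -> label) and `gold` (doc -> label) are hypothetical."""
    score = 0.0
    for batch in batches:
        preds = {doc: system(doc) for doc in batch}
        errors = [d for d in batch if preds[d] != gold[d]]  # FPs/FNs to review
        score = accuracy(preds, gold)
        if score >= threshold:
            break  # pre-specified accuracy reached; remaining docs stay sequestered
        # otherwise: inspect `errors`, reprogram the system, continue with next batch
    return score
```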
Automated Data Acquisition for Heart Failure (ADAHF) Diagram of Overall Classification and Sub-classifications
Summary of Development Steps • Determine a use case • Investigate whether existing tools have been used for that use case • If none exist, an existing tool may need to be generalized or a new one developed • Determine data elements • Determine the development environment • Train and test • Assess accuracy via sensitivity (recall), specificity, and positive predictive value (precision)
Stakeholder Engagement: Theoretic Framework and Model • The Promoting Action on Research Implementation in Health Services (PARIHS) framework1-2 • Evidence • Context • Facilitation • Socio-Technical Model (STM)3: eight dimensions, of which we are using four: • hardware and software • clinical content • workflow and communication • internal organizational features 1 Stetler, 2011 http://www.implementationscience.com/content/6/1/99 2 Kitson, 2008 http://www.implementationscience.com/content/3/1/1 3 Sittig and Singh, A new sociotechnical model for studying health information technology in complex adaptive healthcare systems, Qual Saf Health Care 2010;19
Stakeholder Engagement: Semi-structured Interview and Thematic Analysis • The approach is "applied" (aimed at solving a problem)6, using a theoretical thematic analysis7 • Two independent reviewers each created a summary and identified themes to answer the research questions • The research group met to develop consensus codes for the master themes • Three documents resulted: the two summaries, the group-consensus codes, and the consensus codes with highlighted text 6 Guest et al., Applied Thematic Analysis, 2012 7 Braun et al., Using Thematic Analysis in Psychology, 2006
Stakeholder Interview Results: Respondent Characteristics • We interviewed 13 stakeholders. The interviewees included, among others: clinical quality specialists; directors of quality management, clinical analysis and reporting; epidemiologists; clinicians and pharmacists; and program analysts • Respondents' years in the VA ranged from 2 to 35 • Similarly, their years in quality/patient safety ranged from 2 to 33
Stakeholder Engagement Preliminary Results - Internal Factors • Internal factors that facilitate implementation of an automated system include: • Use of evidence-based care • A culture of continuous quality improvement coupled with measurement and accountability processes • Quality control reporting both within and external to the VA.
Stakeholder Engagement (cont.): Hardware and Software, Preliminary Findings • We have an informatics-rich environment in VA • Informatics is used for: • Communication between providers and patients • Secure messaging • Blue Button download • Kiosks • Mobile technology • MyHealtheVet • Informatics is used for (cont.): • Clinical care • CPRS • Clinical decision support • Templates designed to facilitate clinically relevant content • Smart forms • CART-CL • Primary Care Almanac
Stakeholder Engagement, Preliminary Results: Informatics Applications Used with Quality Metrics • Quality improvement functionality (current) • Performance Integrated Tracking Application (PITA) • Measure Master Report • System extraction and output specifics: • Capture the concepts and values as well as the words around the concepts • Have the ability to adjust the EF value captured • In development: a growing number of informatics tools • Process chart notes using NLP • Surveillance tools • Analytic tools • Meaningful Use • Determine how we could provide data for meaningful use
Formative Evaluation Process • Initial stakeholder engagement: went through a couple of cycles of: • Develop a prototype of the tool with a report • Revise based on feedback • Developed a final prototype of the tool • Develop an initial functional HMP module • User-centered design analysis
Thank you! Questions or Comments? • Please contact me at Jennifer.garvin@va.gov • This study is undertaken as part of the VA HSR&D IBE- 09-069-1 grant. The views expressed in this article are those of the authors and do not necessarily represent the views of the Department of Veterans Affairs or the University of Utah School of Medicine. • I thank the Department of Veterans Affairs for my fellowship and Gail Graham RHIA and Mark Weiner MD for being my mentors • ADAHF Team: • Julia Heavirland/ Jenifer Williams (Annotators) • Youngjun Kim (Application Specialist) • Stephane Meystre MD, PhD (Faculty System Developer) • Drs. Bruce Bray, Paul Heidenreich, Mary Goldstein, Wendy Chapman, Michael Matheny, Gobbel (Co-investigators) • Andrew Redd PhD and Dan Bolton MS (Statisticians) • Megha Kalsy MS and Natalie Kelly MBA (Stakeholder Engagement) • Jennifer Garvin PhD, MBA (Principal Investigator)
EF Sampling Strategy • To account for clustering, the sample size was increased by the design effect. For ICC = 0.005, Deff = 1 + 25(0.005) = 1.125, so we require 179(1.125) = 201.375, rounded up to 202 positive cases, or 29 positive cases per facility • Dividing by the prevalence estimate of EF in documents, 29/0.80 = 36.25, rounded up to 37 documents required per facility. Multiplying by the number of sites gives 37(7) = 259 • Doubling the sample size for the three facilities with free- or semi-structured text resulted in a minimum of 367 documents in the test set.
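The arithmetic above can be reproduced step by step. The ICC, cluster size, base sample size, site count, and prevalence estimate all come from the slide; everything else is just rounding up at each stage.

```python
import math

# Values from the slide: ICC = 0.005, cluster size 25, 179 base positive cases,
# 7 facilities, and an estimated EF prevalence of 0.80 in documents.
icc = 0.005
deff = 1 + 25 * icc                                     # design effect = 1.125
n_pos = math.ceil(179 * deff)                           # 201.375 -> 202 positive cases
per_facility_pos = math.ceil(n_pos / 7)                 # 28.86 -> 29 per facility
per_facility_docs = math.ceil(per_facility_pos / 0.80)  # 36.25 -> 37 documents per facility
total_docs = per_facility_docs * 7                      # 259 documents across 7 sites
print(n_pos, per_facility_docs, total_docs)  # 202 37 259
```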
Definitions • Sensitivity is the proportion of patients with disease who test positive. In probability notation: P(T+|D+) = TP / (TP + FN). • Specificity is the proportion of patients without disease who test negative. In probability notation: P(T-|D-) = TN / (TN + FP). • Sensitivity and specificity describe how well the test discriminates between patients with and without disease. However, they address a different question than the one we want answered when evaluating a patient. What we usually want to know is: given a certain test result, what is the probability of disease? This is the predictive value of the test. The predictive value of a positive test (PPV) is the proportion of patients with positive tests who have disease. In probability notation: P(D+|T+) = TP / (TP + FP).
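These definitions translate directly from counts of true/false positives and negatives. The counts in the usage line are illustrative only, not study data.

```python
def sensitivity(tp, fn):
    """P(T+|D+): proportion of diseased patients who test positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """P(T-|D-): proportion of non-diseased patients who test negative."""
    return tn / (tn + fp)

def ppv(tp, fp):
    """P(D+|T+): proportion of positive tests belonging to diseased patients."""
    return tp / (tp + fp)

# Illustrative counts, not study data:
print(sensitivity(tp=90, fn=10))   # 0.9
print(specificity(tn=80, fp=20))   # 0.8
print(ppv(tp=90, fp=30))           # 0.75
```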
Definitions • The F-measure is the harmonic mean of precision and recall • F = 2 × Precision × Recall / (Precision + Recall) • Kappa is more accurate than percent agreement because it accounts for chance agreement • K = (observed agreement − hypothetical probability of chance agreement) / (1 − hypothetical probability of chance agreement)
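Both formulas are one-liners in code. The example values are illustrative, not results from the study.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)

def cohens_kappa(observed, chance):
    """Agreement corrected for chance: (P_o - P_c) / (1 - P_c)."""
    return (observed - chance) / (1 - chance)

# Illustrative values: perfect recall with 50% precision gives F = 2/3;
# 90% observed agreement over a 50% chance baseline gives kappa = 0.8.
print(f_measure(0.5, 1.0))
print(cohens_kappa(0.9, 0.5))
```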
Definitions • Regular expressions: • A regular expression (regex or regexp for short) is a special text string for describing a search pattern. • www.regular-expressions.info
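As a concrete illustration in the spirit of this project, a regular expression can pull a numeric ejection fraction out of clinical text. The pattern below is a simplified, hypothetical example; the actual ADAHF system's extraction logic is far more elaborate.

```python
import re

# Illustrative pattern only, not the ADAHF system's actual expression:
# matches phrases like "ejection fraction is 35%" or "LVEF: 55%".
EF_PATTERN = re.compile(
    r"(?:ejection fraction|LVEF|EF)\s*(?:is|of|:)?\s*(\d{1,2})\s*%",
    re.IGNORECASE,
)

def extract_ef(text):
    """Return the first EF percentage found in `text`, or None."""
    match = EF_PATTERN.search(text)
    return int(match.group(1)) if match else None

print(extract_ef("The left ventricular ejection fraction is 35%."))  # 35
```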