Automatic Formalization of Clinical Practice Guidelines Matthew S. Gerber and Donald E. Brown Department of Systems and Information Engineering University of Virginia James H. Harrison Department of Public Health Sciences University of Virginia
Clinical Practice Guidelines • Many treatment options – what to do? • [Figure labels: Strength; Benefits / costs; Evidence quality] • Recommended: randomized clinical trial, beneficial • Should consider: meta-analysis, usually beneficial • Might consider: expert opinion, might be beneficial
Clinical Practice Guidelines • Development • Expert synthesis of current evidence • Example from heart failure:
Clinical Practice Guidelines • Expected outcomes • Evidence-based clinical decision aid • Reduction in cost and treatment/outcome variation • Improvement in patient health • Challenges • A guideline for any occasion • Guidelines change periodically • Lengthy (HFSA CPG is 259 pages)
Clinical Decision Support Systems • Goal: deliver CPG knowledge at point of care • Alleviate burden on clinician • Problem: CPGs contain minimally structured text • Formalization is required
Traditional CPG Formalization • [Diagram labels: CPG; knowledge engineers; medical experts; knowledge management software (e.g., Protégé); knowledge representation; CDSS; automatic formalization]
The Big Picture • [Diagram labels: Endocrine, Infections, …, Cardiovascular; NLP; ?; structured knowledge; medical decision support; retrospective analyses; …]
Data Collection • Yale Guideline Recommendation Corpus (YGRC) • Hussain et al. (2009) • 1,275 recommendations • Representative sample of domains and recommendation types • Example recommendation: “Oral antiviral drugs are indicated within 5 days of the start of the episode and while new lesions are still forming.” • Simplifications • Delimited recommendations • No inter-recommendation dependencies • Random sub-sample of YGRC (n=200)
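Recommendation Representation • [Figure: spectrum of representations. Fidelity runs from low to high; automation runs from trivial to impossible. Keywords sit at the low-fidelity end, Asbru, etc. at the high-fidelity end, with "?" in between] • SNOMED-CT • Medical concept ontology • Broad coverage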
Recommendation Representation (Sundvall et al., 2006)
Recommendation Representation SNOMED-CT CONCEPT: 129265001
Recommendation Annotation • Task: manually identify representational elements within recommendations • Example: “Diuretics are recommended for patients with heart failure.” is annotated as [DRUG Diuretics] are recommended for [POPULATION patients with [MORBIDITY heart failure]].
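The bracket notation in the annotation example can be read mechanically. The following is a minimal sketch, assuming the markup is always "[LABEL text]" with a single space after the label and properly nested brackets; the `Span` class and `parse_annotation` function are hypothetical names introduced here for illustration, not part of the authors' tooling.

```python
from dataclasses import dataclass


@dataclass
class Span:
    label: str   # e.g. "DRUG"
    start: int   # character offset into the plain text (inclusive)
    end: int     # character offset into the plain text (exclusive)
    text: str


def parse_annotation(annotated: str):
    """Split "[LABEL text]" markup into plain text plus labeled spans."""
    plain = []   # characters of the recommendation without markup
    stack = []   # (label, start offset) for brackets that are still open
    spans = []
    i = 0
    while i < len(annotated):
        ch = annotated[i]
        if ch == "[":
            # Assumption: the label runs from the bracket to the next space.
            label_end = annotated.index(" ", i)
            stack.append((annotated[i + 1:label_end], len(plain)))
            i = label_end + 1
        elif ch == "]":
            label, start = stack.pop()
            spans.append(Span(label, start, len(plain), "".join(plain[start:])))
            i += 1
        else:
            plain.append(ch)
            i += 1
    return "".join(plain), spans


text, spans = parse_annotation(
    "[DRUG Diuretics] are recommended for "
    "[POPULATION patients with [MORBIDITY heart failure]]."
)
for span in spans:
    print(span.label, repr(span.text))
```

Spans are emitted when their closing bracket is reached, so an embedded element such as MORBIDITY appears before the POPULATION element that contains it.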
Methods • Natural language processing • Supervised classification • Per-recommendation pipeline: (1) syntactic parsing, (2) parse node classification, (3) post-processing
Methods: (1) Syntactic Parsing • Constituency parser (Charniak and Johnson, 2005)
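Constituency parsers of this kind typically emit Penn Treebank-style bracketed trees. As a rough sketch of how such output could be loaded so that every node becomes available for classification, the code below reads a bracketed string into a small tree structure. The parse shown is hand-written for illustration and is not actual parser output; `Node` and `read_tree` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None   # set only on pre-terminal (POS tag) nodes

    def leaves(self):
        """Words under this node, left to right."""
        if self.word is not None:
            return [self.word]
        return [w for child in self.children for w in child.leaves()]

    def nodes(self):
        """This node and all descendants, in pre-order."""
        yield self
        for child in self.children:
            yield from child.nodes()


def read_tree(bracketed: str) -> Node:
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse() -> Node:
        nonlocal pos
        assert tokens[pos] == "("
        node = Node(tokens[pos + 1])
        pos += 2
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse())
            else:
                node.word = tokens[pos]
                pos += 1
        pos += 1   # consume the closing parenthesis
        return node

    return parse()


# Hand-written parse of the example recommendation (an assumption).
tree = read_tree(
    "(S (NP (NNS Diuretics))"
    " (VP (VBP are) (VP (VBN recommended)"
    " (PP (IN for) (NP (NP (NNS patients))"
    " (PP (IN with) (NP (NN heart) (NN failure)))))))"
    " (. .))"
)
for node in tree.nodes():
    print(node.label, " ".join(node.leaves()))
```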
Methods: (2) Parse Node Classification • Unit of classification: parse tree node • Multi-class logistic regression • Example: 1 positive node, 17 negative nodes • Actual data: 12K nodes, 10 classes (primary)
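One common way to turn per-class linear scores into class probabilities for a node is the softmax (multinomial) form of logistic regression. The sketch below assumes that formulation and binary features; a one-vs-rest scheme would be an equally plausible reading of the slide. The weights, class names, and feature strings are invented for illustration.

```python
import math


def class_probabilities(weights, features):
    """Softmax over per-class linear scores for a set of active binary features."""
    scores = {
        cls: sum(w.get(f, 0.0) for f in features)
        for cls, w in weights.items()
    }
    top = max(scores.values())   # subtract the max for numerical stability
    exps = {cls: math.exp(s - top) for cls, s in scores.items()}
    total = sum(exps.values())
    return {cls: e / total for cls, e in exps.items()}


# Hypothetical weights; a trained model would supply these.
weights = {
    "DRUG": {"stem=diuretic": 2.0},
    "MORBIDITY": {"stem=failure": 2.0},
    "NONE": {},
}
probs = class_probabilities(weights, {"stem=diuretic", "label=NP"})
predicted = max(probs, key=probs.get)
print(predicted, probs)
```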
Methods: (2) Parse Node Classification • Linguistic features • Word stems under node • Syntactic configuration of node • …
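As an illustration of the two feature types listed, the sketch below derives stem features and a simple structural description for one parse node. The slide does not define "syntactic configuration", so the choice of node label, parent label, and child label sequence is an assumption, and `crude_stem` is a toy suffix stripper standing in for a real stemmer.

```python
def crude_stem(word):
    """Toy stemmer: lowercases and strips a few common suffixes."""
    w = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def leaves(node):
    """Words under a (label, body) node; body is a word or a list of child nodes."""
    _, body = node
    if isinstance(body, str):
        return [body]
    return [w for child in body for w in leaves(child)]


def node_features(node, parent_label):
    label, body = node
    feats = {f"stem={crude_stem(w)}" for w in leaves(node)}
    feats.add(f"label={label}")
    feats.add(f"parent={parent_label}")
    if not isinstance(body, str):
        # Sequence of child labels as a rough description of local structure.
        feats.add("children=" + "_".join(child[0] for child in body))
    return feats


# Hand-built node for "patients with heart failure" (an assumption).
population_np = (
    "NP",
    [
        ("NP", [("NNS", "patients")]),
        ("PP", [("IN", "with"), ("NP", [("NN", "heart"), ("NN", "failure")])]),
    ],
)
print(sorted(node_features(population_np, parent_label="PP")))
```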
Methods: (2) Parse Node Classification • Learning • Forward feature selection • Per-class costs (LibLinear)
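Forward feature selection can be sketched as a greedy loop that adds, at each step, the feature group giving the largest improvement and stops when nothing helps. The version below is a generic illustration, not the authors' procedure; `toy_score` is a made-up stand-in for what would normally be a cross-validated evaluation of the classifier.

```python
def forward_select(candidates, score, min_gain=1e-9):
    """Greedily add the candidate that most improves score(selected)."""
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        trial_score, trial_feat = max(
            ((score(selected + [f]), f) for f in remaining),
            key=lambda pair: pair[0],
        )
        if trial_score - best <= min_gain:
            break   # no remaining candidate improves the score
        selected.append(trial_feat)
        remaining.remove(trial_feat)
        best = trial_score
    return selected, best


# Hypothetical per-group usefulness with a small penalty per added group.
usefulness = {"stems": 0.5, "label": 0.3, "parent": 0.02}


def toy_score(feature_groups):
    return sum(usefulness[f] for f in feature_groups) - 0.05 * len(feature_groups)


chosen, final_score = forward_select(usefulness, toy_score)
print(chosen, round(final_score, 3))
```

In this toy setting the "parent" group is left out because its contribution does not outweigh the penalty for adding it.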
Methods: (3) Post-processing • Remove duplicates • Other possible issues • Conflicts • Embedding
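The slide does not say how duplicates arise. One plausible case is that several parse nodes covering the same words receive the same label. Under that assumption, duplicate removal could look like the sketch below, which keeps the highest-probability prediction per (label, start, end) token span; the tuple layout and probabilities are invented for illustration.

```python
def remove_duplicates(predictions):
    """Keep one prediction per (label, start, end), preferring higher probability."""
    best = {}
    for label, start, end, prob in predictions:
        key = (label, start, end)
        if key not in best or prob > best[key]:
            best[key] = prob
    return sorted(
        ((label, start, end, prob) for (label, start, end), prob in best.items()),
        key=lambda p: (p[1], p[2], p[0]),
    )


# Token offsets are end-exclusive; values are hypothetical classifier outputs.
predictions = [
    ("DRUG", 0, 1, 0.91),        # NP node over "Diuretics"
    ("DRUG", 0, 1, 0.84),        # NNS node over the same word
    ("MORBIDITY", 6, 8, 0.77),   # NP node over "heart failure"
]
deduped = remove_duplicates(predictions)
print(deduped)
```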
Evaluation Results • 10-fold cross-validation
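For reference, a 10-fold split can be sketched as follows. The code assumes folds are formed over whole recommendations (200 in the sub-sample) so that nodes from one recommendation never appear in both training and test data; the slides do not state the splitting unit, and `k_fold_indices` is a hypothetical helper.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


splits = list(k_fold_indices(200, k=10))
for train, test in splits:
    print(len(train), len(test))
```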
Discussion • High variance across classes • Alternative strategies • Identify more informative features • Change the model formulation • Annotate more data
Conclusions • CPGs are an important knowledge source • Difficult to use within CDSS • Prior CPG formalization • Manual • Automatic for specific domains / recommendations • Our contributions • SNOMED-CT representation • Manually annotated recommendation sample • Statistical NLP model / evaluation
Future Work • Refined representation • Model formulation • Feature engineering • Controlled natural language
Questions?
References
• Charniak, E. & Johnson, M. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, 2005, 173-180.
• Hussain, T.; Michel, G. & Shiffman, R. N. The Yale Guideline Recommendation Corpus: A representative sample of the knowledge content of guidelines. International Journal of Medical Informatics, 2009, 78, 354-363.
• Fan, R.-E.; Chang, K.-W.; Hsieh, C.-J.; Wang, X.-R. & Lin, C.-J. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research, 2008, 9, 1871-1874.
• Sundvall, E.; Nystrom, M.; Petersson, H. & Ahlfeldt, H. Interactive visualization and navigation of complex terminology systems, exemplified by SNOMED CT. Studies in Health Technology and Informatics, IOS Press, 2006, 124, 851.