Knowledge Management for Disease Coding (KMDC): Background & Introduction

Timothy Hays, Ph.D.
Project Manager, Knowledge Management for Disease Coding
Office of Extramural Research (OER)
Office of Portfolio Analysis and Strategic Initiatives (OPASI)

Overview of Today’s Presentation
• Why is NIH Pursuing Knowledge Management for Disease Coding (KMDC)?
• How Does KMDC Work Conceptually?
• What Insights Were Gained?
• Next Steps

Why is NIH Pursuing Knowledge Management for Disease Coding?

External Drivers
• The public and the Congress have a right to know how NIH money is spent
• Efforts to find out how money is spent often begin with questions about the amount of money devoted to specific diseases or research topics
• Currently, all 27 institutes and centers apply different definitions, methods and business rules when coding diseases & topics and determining the dollar amounts

Why is NIH Pursuing KMDC?
Two recent National Academy of Sciences reports recommended that NIH improve data on funding by disease:
• “... the Committee concludes that the current lack of an information management method and infrastructure to collect, analyze, and report investment data in a timely fashion must be addressed…”

How Does KM for Disease Coding Work Conceptually?

New KMDC Coding Process
• Coding source is the grant document ...
• + KMDC System mines the document for relevant concepts using the electronic KMDC thesaurus ...
• + Automated process associates the grant concepts with the disease categories ...
• = NIH Disease Category Reporting
Biggest Challenge: How to define the Disease Categories in the new system?

Document Fingerprint Creation
Source Document (Grant/Project: title, abstract and specific aims) → Text Mining (using the Thesaurus) → Document Fingerprint
• Fingerprint is a list of concepts such as fever, fatigue, breast neoplasm, etc. Each concept is assigned a relative weight based on frequency.
• Fingerprint is a small but unique representation of the source document.
Thesaurus: Specialized vocabulary of a particular domain, including NLM’s MeSH (Medical Subject Headings) thesaurus, CRISP thesaurus, NCI’s thesaurus, Metathesaurus, plus the addition of various concepts (acquired from various ICs, ICD-10, and the disease category fingerprint process).
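The fingerprint idea on this slide can be illustrated with a minimal sketch. It is not the KMDC system's actual text-mining method, which the slides do not describe. The thesaurus below is a hypothetical toy mapping (`TOY_THESAURUS`), the function name `make_fingerprint` is invented for illustration, and "relative weight based on frequency" is assumed to mean each concept's share of all concept mentions.

```python
import re
from collections import Counter

# Hypothetical toy thesaurus (an assumption, not the KMDC thesaurus):
# maps surface terms found in text to a preferred concept name.
TOY_THESAURUS = {
    "fever": "fever",
    "fatigue": "fatigue",
    "breast cancer": "breast neoplasm",
    "breast tumor": "breast neoplasm",
    "breast neoplasm": "breast neoplasm",
}


def make_fingerprint(text, thesaurus=TOY_THESAURUS):
    """Return {concept: relative weight} for the thesaurus concepts found in text.

    Assumption: a concept's weight is its mention count divided by the
    total number of concept mentions in the document.
    """
    # Lowercase and collapse whitespace so multi-word terms match reliably.
    normalized = re.sub(r"\s+", " ", text.lower())
    counts = Counter()
    for term, concept in thesaurus.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[concept] += len(re.findall(pattern, normalized))
    total = sum(counts.values())
    if total == 0:
        return {}  # no thesaurus concepts found
    return {concept: n / total for concept, n in counts.items() if n > 0}


if __name__ == "__main__":
    abstract = (
        "Breast cancer patients often report fatigue. "
        "Fatigue and fever were tracked in breast tumor cases."
    )
    print(make_fingerprint(abstract))
```

Mapping several surface terms to one preferred concept is what lets a small fingerprint stand in for a longer document. A production system would need far more careful term matching than this whole-word lookup.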

Disease Category Fingerprint Creation in the New System
In the new system, a disease category definition is called a Fingerprint.
• A Fingerprint is a list of concepts from the thesaurus.
• Concepts are selected by NIH Scientific Experts to define that disease category.
• Concepts can be weighted to fine-tune the system.
• The Disease Category fingerprints are matched to the grant concepts to produce disease reporting.

Comparing a Document’s Fingerprint to the Disease Category’s Fingerprint
Project Fingerprints + Disease Category Fingerprints (to be determined by NIH) → Matching Process → Projects with matching disease categories
Matching compares individual project fingerprints to the disease category fingerprints. The degree of match is the matching score, which is a function of how closely related the two fingerprints are.
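The slide does not say which function produces the matching score. As a sketch only, the example below assumes cosine similarity between two weighted concept lists, a common choice for comparing sparse term vectors. The names `match_score` and `categorize`, the sample category fingerprints, and the 0.3 threshold are all hypothetical.

```python
import math


def match_score(project_fp, category_fp):
    """Cosine similarity between two {concept: weight} fingerprints.

    Assumption: cosine similarity stands in for the unspecified
    KMDC matching function. Returns a value in [0, 1] for
    non-negative weights.
    """
    shared = set(project_fp) & set(category_fp)
    dot = sum(project_fp[c] * category_fp[c] for c in shared)
    norm_p = math.sqrt(sum(w * w for w in project_fp.values()))
    norm_c = math.sqrt(sum(w * w for w in category_fp.values()))
    if norm_p == 0 or norm_c == 0:
        return 0.0  # an empty fingerprint matches nothing
    return dot / (norm_p * norm_c)


def categorize(project_fp, category_fps, threshold=0.3):
    """Return {category: score} for categories scoring at or above threshold.

    The threshold value is an illustrative assumption.
    """
    scores = {name: match_score(project_fp, fp) for name, fp in category_fps.items()}
    return {name: s for name, s in scores.items() if s >= threshold}


if __name__ == "__main__":
    project = {"breast neoplasm": 0.4, "fatigue": 0.4, "fever": 0.2}
    # Hypothetical expert-defined category fingerprints with tuned weights.
    categories = {
        "Breast Cancer": {"breast neoplasm": 1.0, "mammography": 0.5},
        "Influenza": {"influenza": 1.0, "fever": 0.3},
    }
    print(categorize(project, categories))
```

Because a project can clear the threshold for more than one category, this sketch allows a single grant to be reported under several disease categories. Raising a concept's weight in a category fingerprint raises the score of projects that mention it, which is one way the expert weighting described on the previous slide could fine-tune results.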

What Insights Were Gained… a work in progress

What Insights Were Gained?
• Clear direction and support from the NIH Director and Senior Leadership have been crucial
• Sufficient resources
• Build cross-agency teams to address key issues
• Be open to feedback: customer service focus
• Pay careful attention to building a process that capitalizes on what NIH experts do best and minimizes “time burden”
• Allow time for business process change
• Keep the train moving…

Next Steps
• Define taxonomy/thesaurus
• Define disease/reporting category definitions through use of the thesaurus
• Develop customer-friendly interface to access and operate the tool
• Use clear communication
• Be transparent

Thank You

NATIONAL LIBRARY OF MEDICINE (NLM) SBIR-STTR
The NLM supports research and development projects in biomedical informatics and bioinformatics. NLM defines biomedical informatics as the intersection of basic informational and computing sciences with a wide range of application domains in biomedicine and public health. Bioinformatics is the intersection of basic informational and computing sciences with the biological sciences. For additional information about areas of interest to the NLM, please visit our home page at http://www.nlm.nih.gov/ep.
For additional information on research topics, contact:
Hua-Chuan Sim, M.D.
Program Officer, Division of Extramural Programs
National Library of Medicine
(301) 496-4253, Fax: (301) 402-2952
Email: simh@mail.nih.gov
For administrative and business management questions, contact:
Mr. Dwight Mowery
Grants Management Officer, Extramural Programs Division
National Library of Medicine
(301) 496-4221, Fax: (301) 402-0421

NATIONAL LIBRARY OF MEDICINE (NLM) SBIR-STTR Biomedical Informatics
There are broad needs for informatics concepts, tools and systems to manage the information of health care delivery, reduce medical errors, provide decision support for clinicians, extract outcome and public health information from large datasets, and predict health events. To support such projects, NLM is interested in:
A. Mechanisms to capture and integrate new information into existing knowledge bases, including integration of heterogeneous data sets.
B. Approaches for retrieving, extracting and analyzing data from large health-related and heterogeneous databases, such as patient data, population health data, or image databases.
C. Systems, devices, or programs that facilitate utilization of electronic medical record systems in clinical practice, for such functions as chart entry, decision support, and reduction of errors.
D. Systems, devices, or programs that facilitate utilization of electronic record systems and tools in public health, for such functions as syndromic surveillance and trend analysis.
E. Projects relevant to the informatics of disaster management, including management and delivery of information in disaster settings.

NATIONAL LIBRARY OF MEDICINE (NLM) SBIR-STTR Bioinformatics
High-throughput scientific research has greatly increased the volume of research data and has magnified the problem of information management and interpretation. To help manage such data, NLM is interested in:
A. Software algorithms and database query methods capable of capturing scientific data from published knowledge sources or multiple related factual databases.
B. Tools for data management and analysis for genetic linkage mapping, physical mapping, DNA sequencing, and proteomics.
C. Tools and systems for bringing "bench to bedside," applying research data to clinical problems.
D. Algorithms capable of predicting structure and/or function in model biological systems.
E. Algorithms capable of enhanced computational modeling of biological, biomedical and behavioral sciences data at multiple scales of research, ranging from molecular to population.
