
Codifying Semantic Information in Medical Questions Using Lexical Sources


Presentation Transcript


  1. Codifying Semantic Information in Medical Questions Using Lexical Sources Paul E. Pancoast Arthur B. Smith Chi-Ren Shyu

  2. Research Purpose • To find a method for classifying medical questions that are asked by clinicians • Hypothesis - Simply indexing by keywords isn’t enough to • distinguish questions with different meanings but similar wording, or to • group questions with similar meanings but different words.

  3. Definitions • Semantic Information – the meaning of the words • Syntactic Information – the parts of speech of the words (word type, sentence part) • Medical Questions – questions asked by clinicians • Lexical Sources – sources of words and vocabularies • UMLS – Unified Medical Language System

  4. UMLS • An ambitious project of the National Library of Medicine, begun in 1986 • Helps researchers retrieve and integrate electronic biomedical information from a variety of sources • Links over 100 controlled vocabularies • Assigns unique identifiers to medical concepts and strings • Maps the hierarchical relationships between the medical concepts
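As a rough illustration of the structures just described, the sketch below models surface strings mapped to concept unique identifiers (CUIs) and a parent-child relation between concepts in plain Python. All CUI values and relations here are invented for the example; they are not real Metathesaurus entries.

```python
# Hypothetical sketch of the UMLS structures described above: surface strings
# mapped to concept unique identifiers (CUIs), plus parent-child relations
# between concepts. All CUI values below are invented, not real Metathesaurus IDs.

string_to_cuis = {
    "cold": {"C0000001", "C0000002", "C0000003"},  # temperature / common cold / COPD
    "common cold": {"C0000002"},
    "chronic obstructive lung disease": {"C0000003"},
}

# Invented hierarchical (parent -> children) relations between concepts.
is_a_children = {
    "C0000003": {"C0000004"},  # e.g. a narrower lung-disease concept
}

def concepts_for(string: str) -> set:
    """Return every CUI associated with a surface string (case-insensitive)."""
    return string_to_cuis.get(string.lower(), set())

print(concepts_for("COLD"))         # ambiguous: three concepts
print(concepts_for("Common Cold"))  # unique: one concept
```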

  5. Why Bother? (To classify medical questions?) • Clinicians have questions when treating patients • Researchers have gathered collections of these questions • No good method exists to classify the questions • How many times has a particular question been asked? • Which questions should receive priority for evidence-based answers?

  6. Examples • What is the best way to treat acute pharyngitis? • How should I approach a patient with a sore throat? • What should I do with a patient with diabetes and insulin resistance? • What should I do with a patient with diabetes who is resistant to taking insulin?

  7. Methods – Source Questions • American researcher – observed clinicians at work • British researchers – questions sent in by clinicians, answered by researchers • Australian researchers – questions sent in by clinicians, answered by researchers • 4083 total questions

  8. Methods – Source Vocabulary • MRCON – a table from the Metathesaurus • Lists the medical concepts by unique identifiers (CUIs) and each string associated with a concept • unique (string => 1 concept) • ambiguous (string => 2+ concepts) • COLD – ambient temperature, viral respiratory infection, chronic obstructive lung disease • 2,247,454 strings associated with concepts • Non-medical lexicon – from Roget’s Thesaurus • Query objects (why, when, how), identifiers (I, you, he), modifiers (soon, frequently) • 749 terms in this lexicon
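The unique/ambiguous split can be pictured as a classification over a string-to-CUI table, as in the sketch below. The rows, the tiny non-medical word list, and the CUIs are invented stand-ins for MRCON and the 749-term lexicon.

```python
from collections import defaultdict

# Invented (string, CUI) rows standing in for MRCON entries.
mrcon_rows = [
    ("pharyngitis", "C0000001"),
    ("sore throat", "C0000002"),
    ("cold", "C0000003"),   # ambient temperature
    ("cold", "C0000004"),   # viral respiratory infection
    ("cold", "C0000005"),   # chronic obstructive lung disease
]

# Tiny non-medical lexicon of query objects, identifiers, and modifiers.
non_medical = {"why", "when", "how", "i", "you", "he", "soon", "frequently"}

string_to_cuis = defaultdict(set)
for string, cui in mrcon_rows:
    string_to_cuis[string].add(cui)

unique = {s for s, cuis in string_to_cuis.items() if len(cuis) == 1}
ambiguous = {s for s, cuis in string_to_cuis.items() if len(cuis) > 1}
print(unique)      # {'pharyngitis', 'sore throat'}
print(ambiguous)   # {'cold'}
```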

  9. String Matching • Parsing program (written in C) • Separates individual questions into 3-word, 2-word, 1-word windows • Matches the window against MRCON and our lexicon • Generates a report of: • Total number of words parsed • Number of matches from unique, ambiguous, non-medical lists • Strings that didn’t match any of the lists
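The original parser was written in C; the Python sketch below only illustrates the 3-, 2-, and 1-word windowing and the report counts, using tiny invented lookup sets in place of MRCON and the non-medical lexicon.

```python
import re
from collections import Counter

# Invented lookup sets standing in for MRCON (unique / ambiguous strings)
# and the 749-term non-medical lexicon.
UNIQUE = {"acute pharyngitis", "sore throat", "insulin"}
AMBIGUOUS = {"cold"}
NON_MEDICAL = {"what", "is", "the", "best", "way", "to", "how", "should", "i"}

def match_question(question: str) -> dict:
    """Slide 3-, 2-, and 1-word windows over a question and tally matches."""
    words = re.findall(r"[a-z0-9']+", question.lower())
    counts, matched_words, unmatched = Counter(), 0, []
    i = 0
    while i < len(words):
        for size in (3, 2, 1):                 # prefer the longest window
            if i + size > len(words):
                continue
            window = " ".join(words[i:i + size])
            if window in UNIQUE:
                counts["unique"] += 1
            elif window in AMBIGUOUS:
                counts["ambiguous"] += 1
            elif window in NON_MEDICAL:
                counts["non_medical"] += 1
            else:
                continue
            matched_words += size
            i += size
            break
        else:                                  # no window matched at this position
            unmatched.append(words[i])
            i += 1
    return {"total_words": len(words), "matched_words": matched_words,
            "matches": dict(counts), "unmatched": unmatched}

print(match_question("What is the best way to treat acute pharyngitis?"))
# Note how the verb "treat" ends up in the unmatched list.
```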

  10. Results • String – individual word or words that matched • Hits – how often the string was found • Words – total number of matching words (some strings have more than one word in them)
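For example, the hits and words columns can be tallied from the list of matched windows as follows; the matched strings here are hypothetical.

```python
from collections import Counter

# Hypothetical list of the windows matched across all questions.
matched_strings = ["sore throat", "how", "sore throat", "insulin", "how", "how"]

hits = Counter(matched_strings)               # how often each string was found
for string, count in hits.most_common():
    words = count * len(string.split())       # total matching words for that string
    print(f"{string!r}: hits={count}, words={words}")
```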

  11. Results • 100 strings occurred 7850 times – 57.6% of the total matches • 712 strings had 3+ hits, accounting for 85% of all hits • Our focus was on strings that didn’t match any of the source vocabularies • 19.1% of strings didn’t match • We hypothesize that these additional terms, not found in MRCON, will be important for indexing

  12. Results • Unmatched words with 2+ occurrences • *A word can be more than one word type, depending on the context: attacks, step, and process can all be nouns or verbs

  13. Discussion • MRCON – selected because of its low rate of ambiguous string-CUI combinations • 89% of string matches were unique • 11% were ambiguous • Other tables have greater word coverage but more ambiguity per word

  14. Discussion • Our word-matching results were similar to those of other researchers • Cimino matched 43% of words with Meta-1 (we had 56% MRCON matches) • Computers & Biomedical Research. Aug 1992;25(4):366-373. • Hersh matched 60% of words to a medical terminology and names dictionary (we had 79% combined lexicon matches) • Proceedings / AMIA Annual Fall Symposium. 1997.

  15. Discussion • Stop words – prepositions, conjunctions, pronouns – are commonly removed by most normalization tools • Yet they provide valuable contextual information • Blood FOR an HIV-positive patient • Blood FROM an HIV-positive patient • Aspirin AND warfarin • Aspirin OR warfarin
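A quick demonstration of the point: once stop words are removed, the first two questions above become indistinguishable. The stop-word list is a toy example, not the one used by any particular normalization tool.

```python
# Toy stop-word list; real normalization tools use much larger lists.
STOP_WORDS = {"for", "from", "an", "a", "and", "or"}

def strip_stop_words(question: str) -> str:
    return " ".join(w for w in question.lower().split() if w not in STOP_WORDS)

a = strip_stop_words("Blood FOR an HIV-positive patient")
b = strip_stop_words("Blood FROM an HIV-positive patient")
print(a == b)   # True: two different requests normalize to the same string
```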

  16. Discussion • Integers • 186 distinct integers or integer-word combinations • Occurred 647 times • Provide additional modification of concepts • Hyperkalemia at 5.3 mEq/L vs. 8.7 mEq/L • Both are hyperkalemia, but the evaluation and management are markedly different
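One way to capture such numeric modifiers is to extract numbers and their units alongside the matched concept. The pattern and unit list below are illustrative assumptions, not part of the original parser.

```python
import re

# Illustrative pattern for a number followed by an optional lab unit; the unit
# list is an assumption, apart from mEq/L which appears in the example above.
NUMBER_WITH_UNIT = re.compile(r"(\d+(?:\.\d+)?)\s*(mEq/L|mg/dL|mmol/L)?", re.IGNORECASE)

question = "How should I manage hyperkalemia of 8.7 mEq/L?"
for value, unit in NUMBER_WITH_UNIT.findall(question):
    print(value, unit or "(no unit)")   # 8.7 mEq/L
```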

  17. Discussion • Verbs – the largest category of unmatched words • Include action and relation concepts • Our non-medical lexicon contained some of these: treats, attends, increases, lessens, reduce, follows, starts, can, should, is, equal, improve • Verb tense changes the meaning of a question • In a patient TAKING antibiotics • In a patient who TOOK antibiotics

  18. Discussion • Verbs may be conceptually related to medical concepts • Diagnose => Diagnosis • Treat => Treatment • Evaluate => Evaluation • Prescribe => Prescription • In these cases the verb (relationship) is not equivalent to the noun (concept)
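A small derivational map is one way to relate such verbs to existing noun concepts, as sketched below; the map itself is hypothetical and is not part of MRCON, and in the slide's terms the verb names a relation rather than the concept.

```python
# Hypothetical derivational map relating unmatched verbs (relations) to the
# noun concepts they are associated with.
VERB_TO_CONCEPT_NOUN = {
    "diagnose": "diagnosis",
    "treat": "treatment",
    "evaluate": "evaluation",
    "prescribe": "prescription",
}

def related_concept(verb):
    """Return the noun concept a verb is derivationally related to, or None."""
    return VERB_TO_CONCEPT_NOUN.get(verb.lower())

print(related_concept("Treat"))    # 'treatment'
print(related_concept("treats"))   # None: inflected forms would need stemming first
```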

  19. Summary • We developed an application to • Parse individual words from collections of medical questions • Match the words (and phrases) with lexical sources codified by the UMLS • Our results were better than those of previous investigators (in percentage of matched words) • We still have some work to do….

  20. Related Experiments • We attempted to cluster questions by sequences of semantic types • Initial attempts mostly clustered common phrases such as “How should I” and “What is the” • We may repeat this method after discarding ‘stop phrases’
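A minimal sketch of the proposed stop-phrase filtering step, assuming a hand-made phrase list drawn from the examples in the slide above:

```python
# Toy stop-phrase list; the real list would be derived from the clustering results.
STOP_PHRASES = ("how should i", "what is the", "what should i do with")

def strip_stop_phrases(question: str) -> str:
    q = question.lower()
    for phrase in STOP_PHRASES:
        q = q.replace(phrase, " ")
    return " ".join(q.split())

print(strip_stop_phrases("How should I approach a patient with a sore throat?"))
# -> 'approach a patient with a sore throat?'
```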

  21. Future Work • Family Practice Inquiries Network (FPIN) has 200 questions that have associated MeSH terms manually assigned by librarians. • We will look at these question-term groups for clustering purposes (with the hypothesis that they will not make distinct clusters).

  22. Future Work • I will work with researchers at NLM to apply MetaMap to medical questions • Extract triplets (Medical Concept – Allowable Relation – Medical Concept) from the questions, e.g. Drug-treats-Disease • Insert the triplets into a vector-space model and look for clusters
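The sketch below illustrates only the last step of that plan: placing triplets into a bag-of-triplets vector space and computing similarities that a clustering algorithm could consume. It does not call MetaMap, and the triplets are invented examples.

```python
import math
from collections import Counter

# Invented (concept, relation, concept) triplets for three questions.
questions = [
    [("aspirin", "treats", "headache"), ("patient", "has", "headache")],
    [("ibuprofen", "treats", "headache"), ("patient", "has", "headache")],
    [("warfarin", "interacts_with", "aspirin")],
]

# Build a bag-of-triplets vector for each question.
vocabulary = sorted({t for q in questions for t in q})
vectors = [[Counter(q)[t] for t in vocabulary] for q in questions]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Pairwise similarities that a clustering step could consume;
# the first two questions come out as each other's nearest neighbours.
for i in range(len(vectors)):
    for j in range(i + 1, len(vectors)):
        print(i, j, round(cosine(vectors[i], vectors[j]), 2))
```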

  23. Thank you! Questions?
