Semantic Interpretation of Medical Text Barbara Rosario, SIMS Steve Tu, UC Berkeley Advisor: Marti Hearst, SIMS
Semantic Interpretation of Medical Text • More accurate representation of the content of the input text • Enhance text with information (concepts, relationships) drawn from a medical knowledge source • Determine the semantic meaning of the words (and bigger constructs) and the relationships between them.
Combine Statistical and Symbolic Methods • Use of knowledge bases, semantic hierarchies, medical knowledge, rules • Use of statistical methods and machine learning techniques
Statistical methods • Disambiguation • Detection of semantic patterns • Classification of semantically related constructs • Degrees (weights, probabilities)
First Experiment: Noun Compounds and MeSH • Interpretation of noun compounds is crucially semantic • Noun compounds extracted from a collection of titles and abstracts of medical journals found in Medline • MeSH (Medical Subject Headings) concepts for the labels
Input: Medline Text File → Preprocessing → Tagger → Noun Compound Extraction → MeSH Semantic Labeling → Output: Semantically Labelled Noun Compounds
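The noun compound extraction step of this pipeline can be sketched as follows. This is a minimal illustration, not the authors' extractor: it assumes the tagger emits Penn Treebank tags and treats every maximal run of two or more consecutive nouns as a compound. The function name `extract_noun_compounds` and the sample sentence are hypothetical.

```python
from typing import List, Tuple

# Assumption: the tagger uses Penn Treebank noun tags.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def extract_noun_compounds(
    tagged: List[Tuple[str, str]], min_len: int = 2
) -> List[List[str]]:
    """Return maximal runs of consecutive nouns containing at least min_len words."""
    compounds: List[List[str]] = []
    run: List[str] = []
    for word, tag in tagged:
        if tag in NOUN_TAGS:
            run.append(word)
        else:
            # A non-noun ends the current run; keep it only if it is long enough.
            if len(run) >= min_len:
                compounds.append(run)
            run = []
    if len(run) >= min_len:
        compounds.append(run)
    return compounds


# Hypothetical tagger output for one sentence.
tagged_sentence = [
    ("the", "DT"), ("migraine", "NN"), ("headache", "NN"), ("recurrence", "NN"),
    ("was", "VBD"), ("frequent", "JJ"), ("in", "IN"), ("patients", "NNS"),
]
print(extract_noun_compounds(tagged_sentence))
```

Single nouns such as "patients" are dropped because they do not form a compound.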
MeSH Tree Structures (main)
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]
MeSH Tree Structures (node A expanded)
1. Anatomy [A]
    Body Regions [A01] +
    Musculoskeletal System [A02] +
    Digestive System [A03] +
    Respiratory System [A04] +
    Urogenital System [A05] +
    Endocrine System [A06] +
    Cardiovascular System [A07] +
    Nervous System [A08] +
    Sense Organs [A09] +
    Tissues [A10] +
    Cells [A11] +
    Fluids and Secretions [A12] +
    Animal Structures [A13] +
    Stomatognathic System [A14] +
    Hemic and Immune Systems [A15] +
    Embryonic Structures [A16] +
Body Regions [A01]
    Abdomen [A01.047]
        Groin [A01.047.365]
        Inguinal Canal [A01.047.412]
        Peritoneum [A01.047.596] +
        Retroperitoneal Space [A01.047.681]
        Umbilicus [A01.047.849]
    Axilla [A01.133]
    Back [A01.176] +
    Breast [A01.236] +
    Buttocks [A01.258]
    Extremities [A01.378] +
    Head [A01.456] +
    Neck [A01.598]
    Pelvis [A01.673] +
    Perineum [A01.719]
    Skin [A01.835] +
    Thorax [A01.911] +
    Viscera [A01.960]
Mapping Nouns to MeSH Concepts • Ex: migraine headache recurrence
More Noun Compounds
• migraine headache recurrence: C10.228.140.546.800.525, C23.888.592.612.441, C23.550.291.937
• blood plasma perfusion: A12.207.152, A15.145.693, E05.680
• migraine headache pain: C10.228.140.546.800.525, C23.888.592.612.441, G11.561.796.444
• brain stem neurons: A08.186.211, E05.595.402.541.250, A08.663
• rat liver mitochondria: B02.649.865.635.560, A03.620, A11.368.702.564
• plasma arginine vasopressin: A15.145.693, D12.125.095.104, D06.472.734.692.781
• rat thyroid cells: B02.649.865.635.560, A06.407.900, A11
• growth hormone secretion: G07.553.481, D27.505.440.472, A12.200
• blood urea nitrogen: A12.207.152, D02.948, D01.362.625
• breast cancer cells: A01.236, C04, A11
• cancer cell lines: C04, A11, G05.331.599.110.708.330.800.400
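The noun-to-MeSH mapping illustrated by these examples can be sketched as a dictionary lookup. The tree numbers below are copied from the slide; the assumption that each noun aligns with the code in the same position, the tiny hand-built dictionary, and the function name `label_compound` are illustrative only. A real system would load the full MeSH vocabulary and handle nouns that carry several tree numbers.

```python
from typing import Dict, List, Optional

# Tree numbers taken from the slide's examples, aligned noun by noun (assumed).
MESH_TREE_NUMBER: Dict[str, str] = {
    "migraine": "C10.228.140.546.800.525",
    "headache": "C23.888.592.612.441",
    "recurrence": "C23.550.291.937",
    "blood": "A12.207.152",
    "plasma": "A15.145.693",
    "perfusion": "E05.680",
}


def label_compound(compound: str) -> List[Optional[str]]:
    """Map each noun of a compound to its MeSH tree number, or None if unknown."""
    return [MESH_TREE_NUMBER.get(noun.lower()) for noun in compound.split()]


print(label_compound("migraine headache recurrence"))
print(label_compound("blood plasma perfusion"))
```

Nouns missing from the dictionary come back as `None`, which makes unmapped (for example non-medical) words easy to spot.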
Attachment and Semantic Interpretation • Attachment classification • “acute migraine treatment” [[N N] N] (LA) • “intra-nasal migraine treatment” [N [N N]] (RA) • To bootstrap semantic interpretation • Decision tree (Quinlan)
Levels of Descriptions • migraine headache recurrence (LA) • C10.228.140.546.800.525 C23.888.592.612.441 C23.550.291.937
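One way to read "levels of description" is that each MeSH tree number can be cut off at a chosen depth, giving a coarser or finer label for the same noun. The sketch below assumes that reading; the function names `truncate` and `describe` are hypothetical and not taken from the slides.

```python
from typing import List


def truncate(tree_number: str, depth: int) -> str:
    """Cut a MeSH tree number at a given depth.

    depth 0 keeps only the category letter ("C"),
    depth 1 keeps the first component ("C10"),
    depth 2 keeps two components ("C10.228"), and so on.
    """
    if depth == 0:
        return tree_number[0]
    return ".".join(tree_number.split(".")[:depth])


def describe(tree_numbers: List[str], depth: int) -> List[str]:
    """Describe a whole noun compound at one level of the hierarchy."""
    return [truncate(t, depth) for t in tree_numbers]


codes = ["C10.228.140.546.800.525", "C23.888.592.612.441", "C23.550.291.937"]
for depth in range(3):
    print(depth, describe(codes, depth))
```

Shallow levels generalize across many compounds, while deeper levels keep more of the distinction between individual concepts.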
Expressiveness of Decision Trees
first noun tree = B: ra (33.0/3.7)
first noun tree = E: ra (2.0/1.6)
first noun tree = F: la (0.0)
first noun tree = G: la (4.0/0.3)
first noun tree = A:
|   second noun tree = B: la (0.0)
|   second noun tree = D: la (4.0/0.3)
|   second noun tree = E: la (10.0/0.4)
|   second noun tree = F: la (0.0)
|   second noun tree = G: la (6.0/1.6)
|   second noun tree = A:
|   |   first tree position <= 4 : ra (7.0/1.6)
|   |   first tree position > 4 : la (36.0/5.8)
|   second noun tree = C:
|   |   third noun tree = A: ra (9.0/0.3)
|   |   third noun tree = B: la (0.0)
|   |   third noun tree = D: la (1.0/0.3)
|   |   third noun tree = E: la (5.0/0.3)
|   |   third noun tree = F: la (0.0)
|   |   third noun tree = G: ra (2.0/1.6)
|   |   third noun tree = C:
|   |   |   third tree position <= 21 : ra (5.0/2.6)
|   |   |   third tree position > 21 : la (5.0/0.3)
first noun tree = C:
…..
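The visible part of this tree can be transcribed into a small function to show how a path is followed. This is only a hand transcription of the branches printed on the slide, not the trained model: branches that the slide elides (for example first noun tree = C) return `None`. The slide does not define "tree position", so it is passed in as a plain integer; the function name `classify_attachment` is hypothetical.

```python
from typing import Optional


def classify_attachment(
    first: str, second: str, third: str, first_pos: int, third_pos: int
) -> Optional[str]:
    """Follow the branches shown on the slide; return 'la', 'ra', or None if elided."""
    if first in ("B", "E"):
        return "ra"
    if first in ("F", "G"):
        return "la"
    if first == "A":
        if second in ("B", "D", "E", "F", "G"):
            return "la"
        if second == "A":
            return "ra" if first_pos <= 4 else "la"
        if second == "C":
            if third in ("A", "G"):
                return "ra"
            if third in ("B", "D", "E", "F"):
                return "la"
            if third == "C":
                return "ra" if third_pos <= 21 else "la"
    return None  # branch not shown on the slide


# Category letters A, C, A with arbitrary positions (illustrative input only).
print(classify_attachment("A", "C", "A", first_pos=1, third_pos=11))
# Category letters A, A, E with a first tree position above 4.
print(classify_attachment("A", "A", "E", first_pos=12, third_pos=5))
```

Each `if` corresponds to one test on the path from the root to a leaf, which is what makes the learned rules readable.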
Semantic Interpretation • Use decision tree paths for the detection of clusters of noun compounds with the same semantic interpretation
Ex: ACE:<anatomy> <disease> <Analytical, Diagnostic and Therapeutic Techniques and Equipment>
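A simplified version of this clustering idea groups noun compounds by the sequence of top-level MeSH categories of their nouns, so that a label such as ACE collects every compound of the form anatomy, disease, technique. This is a sketch under that assumption: the slide clusters along decision tree paths, whereas the code below uses only the first letter of each tree number. The function names and the choice of sample compounds (taken from the examples earlier in the deck) are illustrative.

```python
from collections import defaultdict
from typing import Dict, List

# Sample compounds with tree numbers as listed earlier in the slides.
COMPOUNDS: Dict[str, List[str]] = {
    "blood plasma perfusion": ["A12.207.152", "A15.145.693", "E05.680"],
    "rat liver mitochondria": ["B02.649.865.635.560", "A03.620", "A11.368.702.564"],
    "rat thyroid cells": ["B02.649.865.635.560", "A06.407.900", "A11"],
    "breast cancer cells": ["A01.236", "C04", "A11"],
    "migraine headache pain": [
        "C10.228.140.546.800.525", "C23.888.592.612.441", "G11.561.796.444"
    ],
}


def signature(tree_numbers: List[str]) -> str:
    """Concatenate the top-level category letter of each noun, e.g. 'ACA'."""
    return "".join(t[0] for t in tree_numbers)


def cluster(compounds: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Group compounds that share the same category signature."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, tree_numbers in compounds.items():
        groups[signature(tree_numbers)].append(name)
    return dict(groups)


for sig, names in cluster(COMPOUNDS).items():
    print(sig, names)
```

Compounds that land in the same group are candidates for sharing one semantic interpretation, which would then need to be checked by hand or by a classifier.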
From MeSH to UMLS • Unified Medical Language System, a project at the U.S. National Library of Medicine • 3 UMLS Knowledge Sources • Metathesaurus • Semantic Network • SPECIALIST lexicon and programs
Metathesaurus • Most extensive of the UMLS sources • 730,000 concepts representing more than 1,500,000 strings in over 60 vocabularies and classifications • Organized by concept or meaning. • In essence, its purpose is to link alternative names and views of the same concept together and to identify useful relationships between different concepts. • Relationships in the Metathesaurus come from the sources themselves or are created by the Metathesaurus editors.
Semantic Network • Consistent categorization of all concepts represented in the UMLS Metathesaurus and the important relationships between them. • Every concept has been assigned a semantic type. • The semantic types (134) are the nodes in the Network, and the relationships between them are the links (54) • High level semantic structure
Noun Compounds, again • Very preliminary studies… • Can we use the information in the Semantic Network for the semantic interpretation of noun compounds? • Are semantic types and relationships good descriptors? Are they useful for disambiguation and classification?
Mapping Words - Semantic Types, Semantic Relationships • Semantic types correctly assigned (on 246 noun compounds, 738 nouns): 59% • Semantic types disambiguated by the relationships • Doesn’t disambiguate: 42.7% • Disambiguates wrong: 17.3% • Disambiguates correctly: 40%
(Some of) Future Work • Explore the UMLS sources in more depth • What forms the best basis for automatic semantic interpretation of noun phrases? • Semantic types? • Metathesaurus concepts? (and what parts of them) • Just MeSH concepts? • Machine learning algorithms to help choose a good representation of medical terms
Future Work • Machine learning algorithms for classification • Can we generalize patterns found for noun compounds to other syntactic structures, and if so, how? • How can we best formally represent semantics? • How can we combine symbolic rules with statistical methods? • How can we deal with non-medical words? • Can the system help us disambiguate them? • Should we use other ontologies (e.g., WordNet)?