Noun compounds (NCs) • Any sequence of nouns that itself functions as a noun • asthma hospitalizations • asthma hospitalization rates • health care personnel hand wash • Technical text is rich with NCs, e.g.: "Open-labeled long-term study of the subcutaneous sumatriptan efficacy and tolerability in acute migraine treatment."
NCs: 3 computational tasks • Identification • Syntactic analysis (attachments) • [Baseline [headache frequency]] • [[Tension headache] patient] • Semantic analysis • headache treatment → treatment for headache • corticosteroid treatment → treatment that uses corticosteroid
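Syntactic analysis has to pick one bracketing out of all the binary bracketings a noun sequence admits. A three-noun compound has two (right-branching, as in [Baseline [headache frequency]], and left-branching, as in [[Tension headache] patient]), and the count grows as the Catalan numbers for longer compounds. A minimal Python sketch that only enumerates the candidates and does not choose between them; the function name `bracketings` is illustrative:

```python
from functools import lru_cache


def bracketings(nouns):
    """Return every binary bracketing of a noun sequence as a bracketed string."""
    nouns = tuple(nouns)

    @lru_cache(maxsize=None)
    def build(i, j):
        # A single noun is its own (trivial) bracketing.
        if j - i == 1:
            return [nouns[i]]
        results = []
        # Split the span [i, j) into a left and a right constituent at every point.
        for k in range(i + 1, j):
            for left in build(i, k):
                for right in build(k, j):
                    results.append(f"[{left} {right}]")
        return results

    return build(0, len(nouns))
```

Calling `bracketings(["baseline", "headache", "frequency"])` yields the right-branching reading first, then the left-branching one; the attachment task is deciding which of these is correct.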
Two approaches • Treat it as a classification problem (and use a machine learning algorithm) • Linguistically motivated: consider the "semantics" of the nouns, which determine the relation between them
First approach • Extraction of NCs from titles and abstracts of MEDLINE • Part-of-speech tagger • Extraction of sequences of units tagged as nouns • Collection of 2,245 NCs with 2 nouns • Manual annotation of the NCs found 38 semantic relations • Result: a collection of labeled NCs and a set of semantic relations
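The extraction steps can be sketched as follows, assuming the text has already been run through a part-of-speech tagger that emits Penn Treebank tags. The tagger, the tagset, and the helper names `noun_sequences` and `two_noun_compounds` are assumptions for illustration, not the original pipeline. Maximal runs of noun tags are taken as candidate NCs, and only runs of exactly two nouns are kept:

```python
from collections import Counter

# Penn Treebank noun tags (assumed tagset).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def noun_sequences(tagged):
    """Yield maximal runs (length >= 2) of consecutive noun tokens from (word, tag) pairs."""
    run = []
    for word, tag in tagged:
        if tag in NOUN_TAGS:
            run.append(word.lower())
        else:
            if len(run) > 1:
                yield tuple(run)
            run = []
    # Flush a run that ends the sentence.
    if len(run) > 1:
        yield tuple(run)


def two_noun_compounds(tagged_sentences):
    """Count noun sequences of exactly two nouns across tagged sentences."""
    counts = Counter()
    for sentence in tagged_sentences:
        for seq in noun_sequences(sentence):
            if len(seq) == 2:
                counts[seq] += 1
    return counts
```

Longer runs such as "asthma hospitalization rates" are still produced by `noun_sequences`; they are simply filtered out when only two-noun compounds are collected.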
Semantic relations • Frequency/time of • influenza season, headache interval • Measure of • relief rate, asthma mortality, hospital survival • Instrument • aciclovir therapy, laser irradiation, aerosol treatment • "Purpose" • headache drugs, HIV medications, influenza treatment • Defect • hormone deficiency, CSF fistulas, gene mutation • Inhibitor • adrenoreceptor blockers, influenza prevention
Semantic relations • Cause • asthma hospitalization, AIDS death • Change • papilloma growth, disease development • Activity/Physical Process • bile delivery, virus reproduction • Person Afflicted • AIDS patients, headache group • …
Features • Lexical (words) • MeSH descriptors
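A sketch of how the two feature types might be combined for a two-noun compound: the words themselves plus each noun's MeSH descriptor, optionally truncated to a chosen depth of the hierarchy. The `MESH` table is a hypothetical toy lookup using codes that appear on these slides, not real MeSH data, and the feature names are illustrative:

```python
# Hypothetical toy lookup: word -> MeSH tree code (codes taken from the slides, not a MeSH dump).
MESH = {
    "shoulder": "A01", "eye": "A01", "chest": "A01",
    "skull": "A02", "bone": "A02",
    "tumors": "C04", "cysts": "C04",
    "patient": "M01.643", "physician": "M01.526", "donor": "M01.898",
}


def truncate(code, level):
    """Keep the first `level` components of a dotted tree code (M01.643 -> M01 at level 1)."""
    return ".".join(code.split(".")[:level])


def features(nc, level=1):
    """Binary features for a two-noun compound: both words plus each noun's MeSH descriptor."""
    n1, n2 = nc
    feats = {f"w1={n1}", f"w2={n2}"}  # lexical features
    for pos, noun in (("1", n1), ("2", n2)):
        code = MESH.get(noun)
        if code is not None:  # nouns missing from the lookup get lexical features only
            feats.add(f"mesh{pos}={truncate(code, level)}")
    return feats
```

Raising `level` trades generality for specificity: at level 1 "eye physician" and "eye donor" share the descriptor pair A01/M01, while at level 2 they separate into M01.526 and M01.898.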
Classification method and results • Multi-class classification problem (18 relations) • Multi-layer neural networks classify across all relations simultaneously • Results
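To illustrate classifying across all relations simultaneously, here is a minimal one-hidden-layer network with a softmax output (one unit per relation), trained by plain stochastic gradient descent on cross-entropy. It is written in pure Python to stay dependency-free. The architecture, learning rate, and five-example training set (MeSH code pairs and relation labels borrowed from the later slides) are assumptions for illustration, not the configuration or data behind the reported results:

```python
import math
import random

# Hypothetical toy setup: each NC is represented only by the MeSH codes of its two nouns.
N1_CODES = ["A01", "A02", "B06", "C04"]
N2_CODES = ["B06", "C04", "M01.526", "M01.643"]
TRAIN = [
    ("A02", "C04", "Location of Disease"),
    ("B06", "B06", "Kind of Plants"),
    ("C04", "M01.643", "Person afflicted by Disease"),
    ("C04", "M01.526", "Person who treats Disease"),
    ("A01", "M01.526", "Specialist of"),
]
RELATIONS = sorted({r for _, _, r in TRAIN})


def encode(code1, code2):
    """Concatenate one-hot vectors for the first and the second noun's MeSH code."""
    x = [0.0] * (len(N1_CODES) + len(N2_CODES))
    x[N1_CODES.index(code1)] = 1.0
    x[len(N1_CODES) + N2_CODES.index(code2)] = 1.0
    return x


class MLP:
    """One hidden tanh layer; one softmax output unit per relation (all classes at once)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)  # fixed seed for reproducible initialization
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        """Return hidden activations and softmax probabilities over relations."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        m = max(z)  # subtract the max logit for numerical stability
        e = [math.exp(zk - m) for zk in z]
        total = sum(e)
        return h, [ek / total for ek in e]

    def predict(self, x):
        _, p = self.forward(x)
        return max(range(len(p)), key=p.__getitem__)

    def train_step(self, x, y, lr):
        h, p = self.forward(x)
        # Cross-entropy gradient w.r.t. the logits is p - one_hot(y).
        dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        # Backpropagate through w2 (before updating it) and the tanh derivative 1 - h^2.
        dh = [(1.0 - h[j] ** 2) * sum(dz[k] * self.w2[k][j] for k in range(len(dz)))
              for j in range(len(h))]
        for k, g in enumerate(dz):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * g * hj
            self.b2[k] -= lr * g
        for j, g in enumerate(dh):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * g * xi
            self.b1[j] -= lr * g

    def fit(self, xs, ys, epochs=500, lr=0.2):
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                self.train_step(x, y, lr)
        return self


model = MLP(len(N1_CODES) + len(N2_CODES), 8, len(RELATIONS)).fit(
    [encode(a, b) for a, b, _ in TRAIN],
    [RELATIONS.index(r) for _, _, r in TRAIN],
)
```

`RELATIONS[model.predict(encode("C04", "M01.526"))]` then gives the predicted relation label for that code pair. The single softmax layer is what lets one network score every relation at once, rather than training one binary classifier per relation.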
Second approach • Linguistic Motivation • Head noun has argument structure • Meaning of the head noun determines what kinds of things can be done to it, what it is made of, what it is a part of…
Linguistic Motivation • Material + Cutlery → Made of • steel knife, plastic fork, wooden spoon • Food + Cutlery → Used on • meat knife, dessert spoon, salad fork • Profession + Cutlery → Used by • chef's knife, butcher's knife
Linguistic Motivation • Hypothesis: • A particular semantic relation holds between all 2-word NCs that can be categorized by a MeSH pair. • Use the classes of MeSH to identify semantic relations
Grouping the NCs • A02 C04 (Musculoskeletal System, Neoplasms) • skull tumors, bone cysts, bone metastases, skull osteosarcoma… • B06 B06 (Plants, Plants) • eucalyptus trees, apple fruits, rice grains, potato plants • A01 M01 (Body Regions, Persons) • shoulder patient, eye physician, eye donor • Too different: need to be more specific, so go down the hierarchy • A01 M01.643 (Body Regions, Patients) • shoulder patient • A01 M01.526 (Body Regions, Occupational Groups) • eye physician, chest physicians
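Grouping by MeSH pair, and descending the hierarchy when a group is too mixed, amounts to truncating each noun's tree code to a given depth and bucketing on the resulting pair. A sketch with a hypothetical word-to-code table built from the examples on this slide (real MeSH tree numbers are usually longer); `group_by_pair` is an illustrative name:

```python
from collections import defaultdict

# Hypothetical word -> MeSH tree code table (codes follow the slide, not a MeSH dump).
MESH = {
    "skull": "A02", "bone": "A02", "tumors": "C04", "cysts": "C04",
    "eucalyptus": "B06", "trees": "B06", "apple": "B06", "fruits": "B06",
    "shoulder": "A01", "eye": "A01", "chest": "A01",
    "patient": "M01.643", "physician": "M01.526", "physicians": "M01.526",
    "donor": "M01.898",
}

NCS = [
    ("skull", "tumors"), ("bone", "cysts"), ("eucalyptus", "trees"), ("apple", "fruits"),
    ("shoulder", "patient"), ("eye", "physician"), ("chest", "physicians"), ("eye", "donor"),
]


def truncate(code, level):
    """Keep the first `level` dot-separated components of a tree code."""
    return ".".join(code.split(".")[:level])


def group_by_pair(ncs, level=1):
    """Bucket two-noun compounds by the (truncated) MeSH codes of their nouns."""
    groups = defaultdict(list)
    for n1, n2 in ncs:
        c1, c2 = MESH.get(n1), MESH.get(n2)
        if c1 is None or c2 is None:
            continue  # skip NCs whose nouns are not in the lookup
        groups[(truncate(c1, level), truncate(c2, level))].append(f"{n1} {n2}")
    return dict(groups)
```

At level 1, all four A01/M01 compounds land in one bucket; at level 2 they split into Patients (M01.643), Occupational Groups (M01.526), and Tissue donors (M01.898), while the already coherent A02/C04 and B06/B06 groups are unchanged.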
Classification Decisions + Relations • A02 C04 → Location of Disease • B06 B06 → Kind of Plants • C04 M01 • C04 M01.643 → Person afflicted by Disease • C04 M01.526 → Person who treats Disease • A01 H01 • A01 H01.770 • A01 H01.671 • A01 H01.671.538 • A01 H01.671.868 • A01 M01 • A01 M01.643 → Person afflicted by Disease • A01 M01.526 → Specialist of • A01 M01.898 → Donor of
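Once each category pair has a decision, labeling a new NC can be done by finding the most specific labeled pair whose codes are ancestors of (or equal to) the NC's codes. A hedged sketch: the `RULES` table transcribes the labeled pairs on this slide, the most-specific-match policy is an assumption about how the decisions would be applied, and the deeper codes used in the tests (such as M01.526.485) are hypothetical:

```python
# Labeled (category, category) pairs transcribed from the slide.
RULES = {
    ("A02", "C04"): "Location of Disease",
    ("B06", "B06"): "Kind of Plants",
    ("C04", "M01.643"): "Person afflicted by Disease",
    ("C04", "M01.526"): "Person who treats Disease",
    ("A01", "M01.643"): "Person afflicted by Disease",
    ("A01", "M01.526"): "Specialist of",
    ("A01", "M01.898"): "Donor of",
}


def is_prefix(prefix, code):
    """True if `prefix` is `code` or one of its ancestors, comparing whole dotted components."""
    p, c = prefix.split("."), code.split(".")
    return c[:len(p)] == p


def classify(code1, code2):
    """Return the relation of the most specific matching rule, or None if no rule applies."""
    best, best_depth = None, -1
    for (p1, p2), relation in RULES.items():
        if is_prefix(p1, code1) and is_prefix(p2, code2):
            depth = len(p1.split(".")) + len(p2.split("."))
            if depth > best_depth:
                best, best_depth = relation, depth
    return best
```

For example, `classify("A01", "M01")` returns `None`, because only the more specific A01 M01.x pairs carry a label; this mirrors the need to descend the hierarchy before a relation can be assigned.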
Evaluation • Accuracy by category: • Anatomy: 91% • Natural Science: 79% • Neoplasm: 100% • Total accuracy: 90.8%
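The slide does not give per-category counts, but if the total is pooled over all evaluated NCs (an assumption), it is a size-weighted average of the category accuracies, with $n_c$ the number of evaluated NCs in category $c$:

```latex
\mathrm{Acc}_{\text{total}} = \frac{\sum_{c} n_c \,\mathrm{Acc}_c}{\sum_{c} n_c}
```

The unweighted mean of the three listed accuracies is (91 + 79 + 100) / 3 = 90%, so the reported 90.8% is consistent with a weighted rather than an equal average, or with additional categories not listed on the slide.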
Conclusions on NCs • Problem of assigning semantic relations to two-word technical NCs • An important problem: technical text contains many NCs • Especially difficult because NCs lack syntactic clues • State-of-the-art results • One of the very few working systems to tackle this task for NCs