This study explores the systematic assignment of domain labels to WordNet glosses for semantic enrichment. The proposed method corrects and verifies labeling using WN.Domains, providing semantically enriched data. Evaluation shows effectiveness in different syntactic categories, enhancing semantic processing.
Departament de Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya Automatic Assignment of Domain Labels to WordNet Mauro Castillo V. Francis Real V. German Rigau C. GWC 2004
Outline • Introduction • WordNet • WN Domains • Experimentation • Evaluation and results • Discussion • Conclusions
Introduction • Goal: to semantically enrich any WN version with the semantic domain labels of MultiWordNet Domains • WN is a standard resource for semantic processing • Effectiveness of Word Domain Disambiguation • This work explores the automatic and systematic assignment of domain labels to glosses • The proposed method can be used to correct and verify the suggested labeling
WordNet • The version WN1.6 was used because of the availability of WN Domains
WN Domains • WordNet Domain hierarchy developed at IRST (Magnini and Cavaglià, 2000) • [Figure: excerpt of the hierarchy below TOP, showing pure_science, mathematics, geometry, statistics, biology, botany, zoology, entomology, anatomy, ...]
WN Domains • The synsets have been annotated semiautomatically with one or more labels • Most synsets have a single label • [Figure: distribution of domain labels per synset] • Average number of labels per synset: noun = 1.170, verb = 1.078, adj = 1.076, adv = 1.033
WN Domains • A domain may include synsets of different syntactic categories, e.g. MEDICINE: • doctor#1 (n) • operate#7 (v) • medical#1 (a) • clinically#1 (r) • A domain label may also contain senses from different WN subhierarchies, e.g. SPORT: • athlete#1 (life-form#1) • game-equipment#1 (physical-object#1) • sport#1 (act#2) • playing-field#1 (location#1)
WN Domains • Synsets that have more than one label do not seem to follow any pattern • sultana#n#1 (pale yellow seedless grape used for raisins and wine). Labels: Botany, Gastronomy • morocco#n#2 (a soft pebble-grained leather made from goatskin; used for shoes and book bindings etc.). Labels: Anatomy, Zoology • canicola_fever#n#1 (an acute feverish disease in people and in dogs marked by gastroenteritis and mild jaundice). Labels: Medicine, Physiology, Zoology • blue#n#1, blueness#n#1 (the color of the clear sky in the daytime; "he had eyes of bright blue"). Labels: Color, Quality
Applications of WN Domains • Word Sense Disambiguation • Word Domain Disambiguation • Text Categorization, etc. WN Domains • FACTOTUM : Used to mark the senses of WN that do not have a specific domain • STOP Senses: The synsets that appear frequently in different contexts, for instance: numbers, colours, etc.
Experimentation • Process to automatically assign domain labels to WN1.6 glosses • Procedures to validate the consistency of the domain assignment in WN1.6 and, especially, the automatic assignment of the factotum label • [Table: distribution of synsets with and without the domain label factotum in WN1.6]
Experimentation • The test set was randomly selected (around 1% of the synsets) and the remaining synsets were used as the training set • [Table: test corpus for nouns and verbs]
Experimentation • Calculation of frequency
• Synset: castle#n#4, castling#n#1. Labels: CHESS, SPORT
• Variants and gloss: castle castling | interchanging the positions of the king and a rook
• (word, domain) pairs generated from the synset: castle chess, castle sport, castling chess, castling sport, interchanging chess, interchanging sport, king chess, king sport, rook chess, rook sport
• Resulting frequencies for "castle": chess 68, sport 27, history 18, architecture 57, law 12, tourism 24, …
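The counting step on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `count_pairs`, the input layout, and the choice to derive c(w), c(D) and N from the pair counts are all assumptions, and stop words are removed by hand in the toy input.

```python
from collections import Counter

def count_pairs(synsets):
    """Count (word, domain) co-occurrences over labeled synsets.

    Each synset is a (words, domains) tuple: the variant and gloss words,
    and the domain labels attached to the synset.
    """
    pair_counts = Counter()    # c(w, D)
    word_counts = Counter()    # c(w), assumed here to be the marginal of c(w, D)
    domain_counts = Counter()  # c(D), assumed here to be the marginal of c(w, D)
    for words, domains in synsets:
        for word in words:
            for domain in domains:
                pair_counts[(word, domain)] += 1
                word_counts[word] += 1
                domain_counts[domain] += 1
    return pair_counts, word_counts, domain_counts

# Toy input modeled on castle#n#4; content words only.
training = [
    (["castle", "castling", "interchanging", "positions", "king", "rook"],
     ["chess", "sport"]),
]
pairs, words, domains = count_pairs(training)
total = sum(pairs.values())  # N, under the same marginal assumption
```

Over the full training set, the entry `pairs[("castle", "chess")]` would play the role of the "castle chess 68" figure above.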
Experimentation • Measures
• M1: Square root formula: $\dfrac{c(w,D) - \frac{1}{N}\,c(w)\,c(D)}{\sqrt{c(w,D)}}$
• M2: Association Ratio: $Ar(w,D) = \Pr(w|D)\,\log_2\dfrac{\Pr(w|D)}{\Pr(w)}$
• M3: Logarithm formula: $\log_2\dfrac{N\,c(w,D)}{c(w)\,c(D)}$
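A small sketch of the three measures, computed from raw counts. Several points are assumptions rather than statements from the slides: that M1 divides by the square root of c(w,D), as its name suggests; that Pr(w|D) = c(w,D)/c(D) and Pr(w) = c(w)/N; and that an unseen pair scores 0. The function names are illustrative.

```python
import math

def m1_square_root(c_wd, c_w, c_d, n):
    """M1: (c(w,D) - c(w)c(D)/N) / sqrt(c(w,D)); 0 for an unseen pair."""
    if c_wd == 0:
        return 0.0
    return (c_wd - c_w * c_d / n) / math.sqrt(c_wd)

def m2_association_ratio(c_wd, c_w, c_d, n):
    """M2: Pr(w|D) * log2(Pr(w|D) / Pr(w)), with probabilities from counts."""
    if c_wd == 0:
        return 0.0
    p_w_given_d = c_wd / c_d  # assumed estimate of Pr(w|D)
    p_w = c_w / n             # assumed estimate of Pr(w)
    return p_w_given_d * math.log2(p_w_given_d / p_w)

def m3_logarithm(c_wd, c_w, c_d, n):
    """M3: log2(N * c(w,D) / (c(w) * c(D))); 0 for an unseen pair."""
    if c_wd == 0:
        return 0.0
    return math.log2(n * c_wd / (c_w * c_d))

# Invented counts, chosen so the arithmetic is easy to follow by hand.
example = dict(c_wd=8, c_w=16, c_d=32, n=1024)
m1 = m1_square_root(**example)        # (8 - 0.5) / sqrt(8)
m2 = m2_association_ratio(**example)  # 0.25 * log2(0.25 / 0.015625)
m3 = m3_logarithm(**example)          # log2(16)
```

With these counts the word occurs with the domain 16 times more often than chance would predict, so M3 is log2(16) = 4 and M2 scales that by Pr(w|D) = 0.25.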
Experimentation • Stages shown: training, matrix of weights, calculation, validation
• Example row of the matrix of weights for the word "orange":
botany 10.1739451057135
gastronomy 4.98225066954225
color 3.28232334801756
jewellery 1.49369255002054
entomology 1.23243498322359
quality 1.17822271128967
hunting 0.412524764820793
geology 0.293707167933641
chemistry 0.166183492890361
biology 0.110492358490017
Experimentation • Example: 06950891 leader#n#1. Label: PERSON
• Variant and gloss: leader | a person who rules or guides or inspires others
• Domain weight lists shown on the slide for the words of the synset:
  law 2.70, factotum 2.09, computer_science 2.05, mathematics 1.83, grammar 1.68, play 1.57, linguistics 1.54, politics 1.35
  person 19.94, law 8.01, economy 4.74, religion 4.24, anthropology 3.74, sexuality 3.53, politics 3.49, tourism 1.64
  industry 1.54, person 1.46, mechanics 1.26, factotum 1.24, occultism 0.98, pedagogy 0.93
  politics 4.30, history 3.33, religion 2.19, person 1.78, mythology 1.17, commerce 1.11, psychology 0.96, factotum 0.82
• Vote per domain: VD = weight(wi,dj) * percentage
• Resulting ranking: POSITION 1: person = 30.23 • POSITION 2: politics = 13.40 • POSITION 3: law = 11.08 • ...
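The ranking step can be sketched as below. The slide only gives VD = weight(wi,dj) * percentage; summing the votes over the words of the synset and using a single constant `percentage` are assumptions made for illustration, as are the function name `rank_domains` and the toy weights, which are invented and are not the values from the slide.

```python
from collections import defaultdict

def rank_domains(words, weights, percentage=1.0):
    """Rank domains by accumulated votes over the words of a synset.

    weights maps word -> {domain: weight}. Each vote is
    weight * percentage; adding the votes per domain is an assumption.
    """
    votes = defaultdict(float)
    for word in words:
        for domain, weight in weights.get(word, {}).items():
            votes[domain] += weight * percentage
    # Highest vote first: position 1, position 2, ...
    return sorted(votes.items(), key=lambda item: item[1], reverse=True)

# Invented weights for three words of the gloss of leader#n#1.
toy_weights = {
    "leader": {"politics": 4.0, "person": 2.0},
    "person": {"person": 20.0, "law": 8.0},
    "rules": {"law": 3.0, "politics": 1.0},
}
ranking = rank_domains(["leader", "person", "rules"], toy_weights)
```

The first element of `ranking` corresponds to the label proposed in POSITION 1.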
Evaluation and Results: nouns • AP: accuracy of the first label • AT: accuracy of all labels • P: precision • R: recall • F1: 2PR/(P+R) • MiA: measures the success of each formula (M1, M2 or M3) when the first proposed label is correct • MiD: measures the success of each formula (M1, M2 or M3) when the first proposed label is correct or is subsumed by a correct one in the domain hierarchy • [Tables: results for nouns without factotum (SF) and results for nouns with factotum (CF)]
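As a reminder of how P, R and F1 behave when each synset may carry several labels, here is a minimal sketch. Pooling the counts over all synsets (micro-averaging) is an assumption; the slides do not say how the scores were averaged, and the function name and toy labels are illustrative.

```python
def precision_recall_f1(proposed, gold):
    """Micro-averaged P, R and F1 over parallel lists of label sets."""
    correct = sum(len(p & g) for p, g in zip(proposed, gold))
    n_proposed = sum(len(p) for p in proposed)
    n_gold = sum(len(g) for g in gold)
    precision = correct / n_proposed if n_proposed else 0.0
    recall = correct / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Toy data: one spurious label ("color") proposed for the first synset.
proposed = [{"botany", "gastronomy", "color"}, {"person"}]
gold = [{"botany", "gastronomy"}, {"person"}]
p, r, f1 = precision_recall_f1(proposed, gold)
```

Here three of the four proposed labels are correct and all three gold labels are recovered, so precision is 3/4 and recall is 1.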
Evaluation and Results: verbs • AP: accuracy of the first label • AT: accuracy of all labels • P: precision • R: recall • F1: 2PR/(P+R) • MiA: measures the success of each formula (M1, M2 or M3) when the first proposed label is correct • MiD: measures the success of each formula (M1, M2 or M3) when the first proposed label is correct or is subsumed by a correct one in the domain hierarchy • [Tables: results for verbs without factotum (SF) and results for verbs with factotum (CF)]
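The difference between MiA and MiD can be illustrated with a short sketch. Treating a proposed label as acceptable when it is an ancestor or a descendant of a gold label is one possible reading of "subsumed", stated here as an assumption; the child-to-parent map and the toy labels are also hypothetical.

```python
def ancestors(label, parent):
    """Yield the ancestors of a label using a child -> parent map."""
    while label in parent:
        label = parent[label]
        yield label

def related(a, b, parent):
    """True if a equals b or one is an ancestor of the other (assumed rule)."""
    return a == b or a in ancestors(b, parent) or b in ancestors(a, parent)

def first_label_accuracy(first_labels, gold, parent=None):
    """Exact-match accuracy (MiA-style) or, given a hierarchy, relaxed (MiD-style)."""
    hits = 0
    for label, gold_labels in zip(first_labels, gold):
        if parent is None:
            hits += label in gold_labels
        else:
            hits += any(related(label, g, parent) for g in gold_labels)
    return hits / len(first_labels)

# Toy hierarchy fragment and toy predictions.
toy_parent = {"school": "pedagogy", "university": "pedagogy"}
first = ["school", "medicine", "person"]
gold_labels = [{"pedagogy"}, {"quality"}, {"person"}]
mia = first_label_accuracy(first, gold_labels)
mid = first_label_accuracy(first, gold_labels, toy_parent)
```

In this toy case "school" misses under exact matching but counts under the relaxed rule because it sits below "pedagogy", so the relaxed score is higher.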
Evaluation and Results • On average, the method assigns (WN Domains average in parentheses): • Nouns: 1.23 domain labels (1.170) • Verbs: 1.20 domain labels (1.078) • Better results are obtained with nouns • The best average results were obtained with the M1 measure • The first proposed label (nouns): 70% accuracy • The results for verbs are worse than for nouns; one reason may be the high number of verbal synsets labeled with the factotum domain
Discussion • Monosemic words: credit application#n#1 (an application for a line of credit) • Domains: SCHOOL • Proposal 1: Banking • Proposal 2: Economy • [Hierarchy fragment: economy > banking]
Discussion • Relation between labels: academic_program#n#1 (a program of education in liberal arts and sciences (usually in preparation for higher education)) • Domains: PEDAGOGY • Proposal 1: School • Proposal 2: University • [Hierarchy fragment: pedagogy > school, university]
Discussion • Relation between labels: shopping#n#1 (searching for or buying goods or services: "went shopping for a reliable plumber"; "does her shopping at the mall rather than down town") • Domains: ECONOMY • Proposal 1: Commerce • [Hierarchy fragment: social_science > commerce, economy]
Discussion • Relation between labels: fire_control_radar#n#1 (radar that controls the delivery of fire on a military target) • Domains: MERCHANT_NAVY • Proposal 1: Military • [Hierarchy fragment with the nodes social_science, transport, military, merchant_navy]
Discussion • Uncertain cases: • birthmark#n#1 (a blemish on the skin formed before birth) • Domains: QUALITY • Proposal 1: Medicine • bardolatry#n#1 (idolization of William Shakespeare) • Domains: RELIGION • Proposal 1: History • Proposal 2: Literature
Conclusions • Automatically assigning domain labels to WN glosses appears to be a difficult task • The proposed process is very reliable for the first proposed labels • The proposed labels are ordered by priority • It is possible to add new correct labels or to validate the existing ones
Discussion • WN relations: bowling#n#2 (a game in which balls are rolled at an object or group of objects with the aim of knocking them over) • Domains: BOWLING • Proposal 1: Play • [Figure: bowling#n#2 linked to play#n#16 and game#n#2 through hol and hyp relations, shown next to the domain labels free_time, play, sport and bowling]
WN Domains • Example (B. Magnini et al., 2001)
WN Domains
N | SF | DOMAINS | SUMO | TOP ONTOLOGY
#1 | Group | Economy | Corporation | Function, Group, Human
#2 | Object | Geography, Geology | Land-area | Natural, Place, Substance
#3 | Possession | Economy | Keeping | Function, MoneyRepresentation, Part
#4 | Artifact | Architecture, Economy | Building | Artifact, Function, Object
#5 | Group | Factotum | Collection | Group
#6 | Artifact | Economy | Artifact | Artifact, Container, Instrument, Object
#7 | Object | Geography, Geology | Land-area | Natural, Place, Solid, Substance
#8 | Possession | Economy, Play | Currency-measure | Function
#9 | Object | Architecture | Land-area | Natural, Place, Substance
#10 | Act | Transport | Motion | Agentive, BoundedEvent, Cause, Condition, Dynamic, Purpose