Word Sense Disambiguation

Word Sense Disambiguation 2000. 3. 24. Special Lecture on Natural Language Processing

Contents • Introduction and preliminaries • Supervised Learning • Bayesian Classification • Information Theoretic Approach • Dictionary Based Disambiguation • Disambiguation based on sense definitions • Thesaurus-based Disambiguation • Disambiguation based on translations in a second-language corpus • One Sense/Discourse, One Sense/Collocation • Unsupervised Learning

Introduction • Word sense disambiguation • Word sense ambiguity • ‘Bank’: embankment, financial institution • ‘Title’: the meaning depends on the domain • heading, job title, legal right, purity of gold, championship … • In a gallery: ‘This work doesn’t have a title’ • ‘butter’: the meaning differs by part of speech • Semantic tagging

Preliminaries • Supervised vs. unsupervised learning • Supervised: classification • Unsupervised: clustering • Pseudowords • A cheap way to obtain large training/test collections • ‘banana-door’: replace every occurrence of banana and door in the corpus with the artificial word, creating an ambiguity whose correct answer is known • Upper and lower bounds • Upper bound: human performance • Gale et al.’s work: subjects judged whether the items in each pair had the same sense (97% to 99% accuracy) • Lower bound: always choosing the most frequent sense
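A minimal sketch of how a pseudoword test set and the most-frequent-sense lower bound could be built. The function names and the toy sentence are hypothetical; the "senses" of the pseudoword are simply the two original words.

```python
from collections import Counter

def make_pseudoword(tokens, w1, w2):
    """Replace every w1 and w2 with the artificial word 'w1-w2'.

    Returns the rewritten tokens and gold labels (position, original word),
    so a disambiguator can be scored on recovering the original word.
    """
    pseudo = f"{w1}-{w2}"
    rewritten, gold = [], []
    for i, tok in enumerate(tokens):
        if tok in (w1, w2):
            rewritten.append(pseudo)
            gold.append((i, tok))
        else:
            rewritten.append(tok)
    return rewritten, gold

def most_frequent_sense_accuracy(gold):
    """Lower bound: accuracy of always choosing the most frequent 'sense'."""
    counts = Counter(word for _, word in gold)
    return counts.most_common(1)[0][1] / len(gold)

tokens = "she opened the door and ate a banana by the door".split()
rewritten, gold = make_pseudoword(tokens, "banana", "door")
print(rewritten[3], gold)                  # banana-door [(3, 'door'), (7, 'banana'), (10, 'door')]
print(most_frequent_sense_accuracy(gold))  # 0.6666666666666666
```

Because the original words are kept as gold labels, no manual sense tagging is needed to evaluate a system on the pseudoword.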

Supervised Learning • Two approaches • Bayesian classification • Treats the words in a context window as the information source • Does not take sentence structure into account • Information-theoretic approach • Determines the sense from a single informative feature (indicator) in the context

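Bayesian Classification • Bayes’ decision rule: decide $s'$ if $P(s'|c) > P(s_k|c)$ for all $s_k \neq s'$ • Bayes’ rule: $P(s_k|c) = \frac{P(c|s_k)\,P(s_k)}{P(c)}$, hence $s' = \arg\max_{s_k} P(s_k|c) = \arg\max_{s_k}\left[\log P(c|s_k) + \log P(s_k)\right]$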

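Bag of words • Naive Bayes assumption: for context window $c$, $P(c|s_k) = \prod_{v_j \in c} P(v_j|s_k)$ • Use MLE • $P(v_j|s_k) = C(v_j,s_k)/C(s_k)$ • $P(s_k) = C(s_k)/C(w)$ • Decision for sense $s'$ (p.238, Fig 7.1): $s' = \arg\max_{s_k}\left[\log P(s_k) + \sum_{v_j \in c}\log P(v_j|s_k)\right]$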
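A small sketch of the Naive Bayes decision rule above, trained on a tiny hypothetical sense-tagged corpus for "bank". One deliberate departure from the slide: instead of pure MLE it uses add-one smoothing over the vocabulary, because the unsmoothed estimate gives $\log 0$ for any word never seen with a sense. Context words outside the training vocabulary are ignored.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (context_words, sense) pairs from a sense-tagged corpus."""
    sense_count = Counter()            # C(s_k)
    word_count = defaultdict(Counter)  # word_count[s_k][v_j] = C(v_j, s_k)
    vocab = set()
    for context, sense in examples:
        sense_count[sense] += 1
        for v in context:
            word_count[sense][v] += 1
            vocab.add(v)
    return sense_count, word_count, vocab

def disambiguate(context, model):
    """Return argmax_s [log P(s) + sum_v log P(v|s)] (Naive Bayes decision)."""
    sense_count, word_count, vocab = model
    total = sum(sense_count.values())

    def score(s):
        # Add-one smoothing (assumption, not in the slide): avoids log(0).
        denom = sum(word_count[s].values()) + len(vocab)
        return math.log(sense_count[s] / total) + sum(
            math.log((word_count[s][v] + 1) / denom) for v in context if v in vocab)

    return max(sense_count, key=score)

model = train([  # hypothetical toy data
    ("river water shore".split(), "ground"),
    ("money loan deposit".split(), "finance"),
    ("water fishing river".split(), "ground"),
    ("loan interest money".split(), "finance"),
])
print(disambiguate("fishing by the river".split(), model))  # ground
```

Summing logs rather than multiplying probabilities keeps long contexts from underflowing to zero.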

Gale, Church and Yarowsky (1992) • Hansard corpus • duty, drug, land, language, position, sentence • 90% accuracy

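Information-theoretic approach • Brown et al.’s (1991) work • Used in a French-to-English translation system • Uses the indicator that maximizes I(P;Q) • P: set of translations, Q: set of indicator values • Mutual information: $I(P;Q) = \sum_{x \in P}\sum_{y \in Q} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$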

Algorithm • Maximize I(P;Q) • Compute it for every possible indicator • Find the indicator, and the partition Q of its values, for which I(P;Q) is largest • Flip-Flop algorithm (p. 240, Fig 7.2) • Find a random partition P={P1,P2} of {T1…Tm} • while (improving) do • Find the partition Q={Q1,Q2} of {X1…Xn} that maximizes I(P;Q) • Find the partition P={P1,P2} of {T1…Tm} that maximizes I(P;Q) • end • (T1…Tm: translations, X1…Xn: possible values of the indicator)
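A sketch of the Flip-Flop loop, assuming the input is a table of co-occurrence counts between translations and indicator values (here hypothetical counts for French "prendre" with its object noun as the indicator). Each step finds the best partition by brute-force enumeration, which is exponential and only suitable for toy sizes; the original work uses a more efficient partitioning method.

```python
import itertools
import math
import random

def mutual_information(counts, p1, q1):
    """I(P;Q) in bits for the 2x2 table induced by P={p1, rest}, Q={q1, rest}."""
    table = [[0, 0], [0, 0]]
    for (t, x), n in counts.items():
        table[0 if t in p1 else 1][0 if x in q1 else 1] += n
    total = sum(map(sum, table))
    mi = 0.0
    for a in range(2):
        pa = sum(table[a]) / total
        for b in range(2):
            pb = (table[0][b] + table[1][b]) / total
            pab = table[a][b] / total
            if pab > 0:
                mi += pab * math.log2(pab / (pa * pb))
    return mi

def bipartitions(items):
    """Yield the first block of every split of items into two non-empty blocks."""
    first, rest = items[0], items[1:]
    for r in range(len(rest)):  # r < len(rest) keeps the second block non-empty
        for combo in itertools.combinations(rest, r):
            yield {first, *combo}

def flip_flop(counts, seed=0):
    translations = sorted({t for t, _ in counts})
    values = sorted({x for _, x in counts})
    rng = random.Random(seed)
    p1 = rng.choice(list(bipartitions(translations)))  # random initial P
    best = -1.0
    while True:
        # Best Q given P, then best P given that Q; I(P;Q) never decreases.
        q1 = max(bipartitions(values), key=lambda q: mutual_information(counts, p1, q))
        p1 = max(bipartitions(translations), key=lambda p: mutual_information(counts, p, q1))
        mi = mutual_information(counts, p1, q1)
        if mi <= best + 1e-12:  # no further improvement
            return p1, q1, mi
        best = mi

counts = {  # hypothetical (translation, object noun) counts for "prendre"
    ("take", "mesure"): 10, ("take", "note"): 8,
    ("make", "decision"): 9, ("make", "choix"): 7,
}
p1, q1, mi = flip_flop(counts)
# expected: p1 == {'make'} and q1 == {'choix', 'decision'}
```

At disambiguation time, the indicator value observed in a new context selects Q1 or Q2, which in turn selects the translation block P1 or P2.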

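Dictionary-Based Disambiguation • Used when no sense-tagged information about the word is available • Three approaches • Using only the sense definitions in a dictionary (Lesk, 1986) • Using thesaurus categories (Yarowsky, 1992) • Using a bilingual dictionary and a second-language corpus (Dagan and Itai, 1994)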

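Disambiguation based on sense definitions • Uses the dictionary definitions • D1…Dk are the dictionary definitions of senses s1…sk • Algorithm (p.243, Fig 7.3) • Accuracy: 50% to 70% • comment: Given context c • for all senses sk of w do • $\mathrm{score}(s_k) = \mathrm{overlap}(D_k, \bigcup_{v_j \in c} E_{v_j})$ • end • $s' = \arg\max_{s_k} \mathrm{score}(s_k)$ • *. $E_{v_j}$: the words in the dictionary definition(s) of context word $v_j$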
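A minimal sketch of the sense-definition algorithm, where overlap is taken to mean the number of shared word types after removing a small stopword list (an assumption; the slide does not define the overlap measure). The two "ash" definitions and the dictionary entries for the context words are hypothetical toy data.

```python
def lesk(sense_defs, context, dictionary, stopwords=frozenset()):
    """score(s_k) = |D_k ∩ (union of E_vj over context words v_j)|."""
    # Union of the definition words E_vj of every context word v_j;
    # context words without a dictionary entry contribute nothing.
    context_defs = set()
    for v in context:
        context_defs.update(dictionary.get(v, ()))
    context_defs -= stopwords
    scores = {sense: len((set(d) - stopwords) & context_defs)
              for sense, d in sense_defs.items()}
    return max(scores, key=scores.get), scores

ash_senses = {  # D_k for two senses of "ash" (toy definitions)
    "tree": "a tree of the olive family".split(),
    "residue": "the solid residue left when combustible material is burned".split(),
}
toy_dictionary = {  # hypothetical E_vj entries for the context words
    "cigar": "a roll of tobacco that is burned and smoked".split(),
    "burns": "is burned leaving a solid residue".split(),
}
stop = frozenset({"a", "of", "the", "is", "and", "that", "when"})
print(lesk(ash_senses, "this cigar burns slowly".split(), toy_dictionary, stop))
# ('residue', {'tree': 0, 'residue': 3})
```

Short dictionary definitions often share few or no words with the context, which is one reason this method's accuracy stays in the 50% to 70% range.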

Example • word ‘ash’ • dictionary definitions • scoring

Thesaurus-based Disambiguation • Uses the semantic categories (subject codes) of a thesaurus • Walker’s algorithm (1987) (p.245, Fig. 7.4) • comment: Given context c • for all senses sk of w do • $\mathrm{score}(s_k) = \sum_{v_j \in c} \delta(t(s_k), v_j)$ • end • $s' = \arg\max_{s_k} \mathrm{score}(s_k)$ • *. $\delta(t(s_k), v_j) = 1$ iff $t(s_k)$ is one of the subject codes of $v_j$, and 0 otherwise • Yarowsky’s algorithm • Uses a Bayes classifier • Determines the categories of the context, then uses them to find the word’s category, which determines its sense
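A sketch of Walker’s scoring rule, assuming each sense is mapped to a single thesaurus category t(s_k) and each context word to a set of subject codes. The categories and codes for "bass" are hypothetical; words with no entry (including the target word itself) score zero.

```python
def walker(sense_category, context, subject_codes):
    """score(s_k) = sum over v_j in c of delta(t(s_k), v_j)."""
    scores = {
        # delta is 1 when t(s_k) is among v_j's subject codes, else 0
        sense: sum(1 for v in context if t in subject_codes.get(v, ()))
        for sense, t in sense_category.items()
    }
    return max(scores, key=scores.get), scores

bass_senses = {"music": "MUSIC", "fish": "ANIMALS"}  # t(s_k), hypothetical
codes = {  # hypothetical subject codes of context words
    "play": {"MUSIC", "SPORTS"},
    "guitar": {"MUSIC"},
    "band": {"MUSIC"},
    "river": {"GEOGRAPHY"},
}
print(walker(bass_senses, "they play bass guitar in a band".split(), codes))
# ('music', {'music': 3, 'fish': 0})
```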

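Yarowsky’s algorithm • Compute a score for each context (p.246, Fig 7.5) • Naive Bayes assumption • $\mathrm{score}(c_i,t_l) = P(t_l|c_i) = \frac{P(c_i|t_l)\,P(t_l)}{P(c_i)}$, with $P(c_i|t_l) = \prod_{v \in c_i} P(v|t_l)$ • For sense $s'$: $s' = \arg\max_{s_k}\left[\log P(t(s_k)) + \sum_{v_j \in c}\log P(v_j|t(s_k))\right]$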
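A simplified sketch of the final disambiguation step: a Naive Bayes model over thesaurus categories, trained on contexts of words listed under each category, then used to score each sense by its category t(s_k). This collapses the category-estimation stage into a single counting pass (an assumption made for brevity), uses add-one smoothing, and relies on hypothetical training contexts and category names.

```python
import math
from collections import Counter

def train_category_model(category_contexts):
    """category_contexts: category t -> list of contexts (word lists) gathered
    around occurrences of words listed under t (hypothetical data)."""
    n_ctx = sum(len(cs) for cs in category_contexts.values())
    prior, counts, vocab = {}, {}, set()
    for t, contexts in category_contexts.items():
        prior[t] = len(contexts) / n_ctx                   # P(t)
        counts[t] = Counter(v for c in contexts for v in c)
        vocab.update(counts[t])
    return prior, counts, vocab

def log_score(context, t, model):
    """log P(t) + sum_v log P(v|t); P(c) is dropped since it does not affect argmax."""
    prior, counts, vocab = model
    denom = sum(counts[t].values()) + len(vocab)          # add-one smoothing (assumption)
    return math.log(prior[t]) + sum(
        math.log((counts[t][v] + 1) / denom) for v in context if v in vocab)

def yarowsky_sense(context, sense_category, model):
    """Choose the sense whose thesaurus category t(s_k) scores highest."""
    return max(sense_category, key=lambda s: log_score(context, sense_category[s], model))

model = train_category_model({
    "MUSIC": [["guitar", "concert", "play"], ["band", "play", "song"]],
    "ANIMALS": [["river", "fish", "caught"], ["lake", "fish", "swim"]],
})
senses = {"music": "MUSIC", "fish": "ANIMALS"}
print(yarowsky_sense("play bass guitar".split(), senses, model))  # music
```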

Some Results • Roget categories

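Disambiguation based on translations in a second-language corpus • Dagan and Itai (1994) • The sense is chosen according to the distribution of translations • Algorithm (p.249, Fig 7.6) • The sense is chosen by how often its translation co-occurs, in the second-language corpus, with a translation of the related word • comment: Given: a context c in which w occurs in relation R(w,v) • for all senses sk of w do • $\mathrm{score}(s_k) = |\{c \in S \mid \exists w' \in T(s_k), v' \in T(v): R(w',v') \in c\}|$ • end • $s' = \arg\max_{s_k} \mathrm{score}(s_k)$ • *. S: second-language corpus • *. T(x): set of possible translations of x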

Example • ‘interest’ • ‘show interest’: show → zeigen • In the German corpus, zeigen occurs with Interesse (the translation of sense 2) • Sense 2 is chosen
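A sketch of the Dagan and Itai scoring rule on the ‘show interest’ example. It assumes the second-language corpus is already parsed into sets of (relation, noun, verb) triples, one set per context; the three German contexts and the relation name "obj" are hypothetical.

```python
def dagan_itai(sense_translations, related_translations, relation, corpus):
    """score(s_k) = number of contexts c in S containing R(w', v')
    for some w' in T(s_k) and some v' in T(v)."""
    scores = {
        sense: sum(
            1 for c in corpus
            if any((relation, w2, v2) in c
                   for w2 in translations for v2 in related_translations)
        )
        for sense, translations in sense_translations.items()
    }
    return max(scores, key=scores.get), scores

interest_senses = {"legal share": ["Beteiligung"], "attention": ["Interesse"]}
german_corpus = [  # hypothetical parsed German contexts
    {("obj", "Interesse", "zeigen")},
    {("obj", "Interesse", "zeigen"), ("subj", "Firma", "zeigen")},
    {("obj", "Beteiligung", "erwerben")},
]
print(dagan_itai(interest_senses, ["zeigen"], "obj", german_corpus))
# ('attention', {'legal share': 0, 'attention': 2})
```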

One Sense per Discourse, One Sense per Collocation • One sense per discourse • Within a single document, a word is very likely to be used in only one sense • One sense per collocation • Nearby words give strong clues to the sense of the target word • Use collocation information to determine the word’s sense (collocation word f: )
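One simple way to apply the one-sense-per-discourse heuristic is as a post-processing step: relabel every occurrence in a document with that document’s majority sense. This majority-vote form is an illustrative assumption, not a specific published procedure; document ids and sense labels are hypothetical.

```python
from collections import Counter, defaultdict

def one_sense_per_discourse(occurrences):
    """occurrences: list of (doc_id, predicted_sense) for the target word.
    Returns the list with each sense replaced by its document's majority sense."""
    votes = defaultdict(Counter)
    for doc, sense in occurrences:
        votes[doc][sense] += 1
    # On a tie, most_common keeps the sense seen first in that document.
    majority = {doc: c.most_common(1)[0][0] for doc, c in votes.items()}
    return [(doc, majority[doc]) for doc, _ in occurrences]

occ = [("d1", "s1"), ("d1", "s1"), ("d1", "s2"), ("d2", "s2")]
print(one_sense_per_discourse(occ))
# [('d1', 's1'), ('d1', 's1'), ('d1', 's1'), ('d2', 's2')]
```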

Unsupervised Disambiguation • Completely unsupervised disambiguation • Sense tagging is impossible • Context-group discrimination • Contexts are grouped by clustering • A probabilistic model similar to Gale et al.’s Bayes classifier • For a fixed K, assume groups (senses) s1…sK • Compute P(sk|c) • Estimate the probabilities with the EM algorithm (p.254, Fig 7.8)
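A sketch of EM for context-group discrimination, treating each context as a bag of words generated by one of K groups (a Naive Bayes mixture). The random initialization, the smoothing pseudo-count alpha, and the toy contexts are assumptions. EM only finds a local optimum, so the grouping can depend on the seed, and the group numbers are arbitrary ids, not dictionary senses.

```python
import math
import random

def em_sense_discrimination(contexts, K, iters=30, alpha=0.01, seed=0):
    """Returns (h, prior, p_word): h[i][k] = P(s_k | c_i), prior[k] = P(s_k),
    p_word[k][v] = P(v | s_k). alpha is a smoothing pseudo-count (assumption)."""
    vocab = sorted({v for c in contexts for v in c})
    rng = random.Random(seed)
    p_word = []
    for _ in range(K):  # random initial P(v | s_k)
        w = {v: rng.random() + 0.1 for v in vocab}
        z = sum(w.values())
        p_word.append({v: x / z for v, x in w.items()})
    prior = [1.0 / K] * K  # uniform initial P(s_k)

    def e_step():
        # P(s_k | c_i) via Bayes' rule, computed in log space for stability.
        h = []
        for c in contexts:
            logs = [math.log(prior[k]) + sum(math.log(p_word[k][v]) for v in c)
                    for k in range(K)]
            m = max(logs)
            e = [math.exp(l - m) for l in logs]
            z = sum(e)
            h.append([x / z for x in e])
        return h

    for _ in range(iters):
        h = e_step()
        # M-step: re-estimate parameters from expected (soft) counts.
        for k in range(K):
            mass = sum(row[k] for row in h)
            prior[k] = (mass + alpha) / (len(contexts) + K * alpha)
            counts = {v: alpha for v in vocab}
            for c, row in zip(contexts, h):
                for v in c:
                    counts[v] += row[k]
            z = sum(counts.values())
            p_word[k] = {v: x / z for v, x in counts.items()}
    return e_step(), prior, p_word

contexts = [  # hypothetical contexts of "bank"
    ["river", "water", "fish"], ["water", "river", "boat"],
    ["money", "loan", "account"], ["loan", "money", "deposit"],
]
h, prior, p_word = em_sense_discrimination(contexts, K=2)
groups = [max(range(2), key=row.__getitem__) for row in h]
print(groups)  # group id per context
```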

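Unsupervised Disambiguation (cont.) • Choosing K • A larger K gives finer sense distinctions → more training data is needed • K is chosen according to the size of the corpus • Sense differences can be distinguished without consulting a dictionary or a sense-tagged corpus • Useful for information retrieval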

Word Sense • What is a word sense? • A mental representation of a difference in meaning • Criterion for defining senses: does it correctly reflect that mental representation? • Systematic Polysemy • Co-activation (p.258, 7.9, 7.10) • ‘the act of X’ and ‘the people doing X’ • Organization, administration, formation … • Proper nouns: Brown, Bush, Army … • Application
