Work Progress Report 2011-3-11 진두현 SWRC, KAIST
Contents • Goal • Previous discussions • Selecting target words • Extracting arguments • Problems
Goal • Expanding Korean verb case frames • 17,753 case frames constructed (23 case frames per word sense) • Chart: 765 | 3,834 (label: Predicate nominal) • Total: 4,599 verb senses in CoreNet
Previous discussions • Collect more sentences (from the KAIST corpus and the Sejong corpus) • Need a more effective way to work • Exclude subjective judgment from the data
Statistical approach • Attaching a frequency to each case frame makes it possible to discount odd (rarely observed) case frames
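As a rough illustration of the frequency idea, the sketch below counts how often each case frame was observed and drops frames whose share falls under a threshold. The role labels (`AGT`, `THM`, `LOC`), the function names, and the `min_share` threshold are all hypothetical; the slides do not say how frequencies are stored or where the cut-off lies.

```python
from collections import Counter

def frame_frequencies(observed_frames):
    """Return the relative frequency of each observed case frame."""
    counts = Counter(observed_frames)
    total = sum(counts.values())
    return {frame: n / total for frame, n in counts.items()}

def discount_rare(freqs, min_share=0.05):
    """Keep only frames whose share is at least min_share (an assumed cut-off)."""
    return {frame: p for frame, p in freqs.items() if p >= min_share}

# Made-up observations: one dominant frame and two frames seen only once.
observed = [("AGT", "THM")] * 18 + [("AGT", "LOC")] + [("AGT",)]
freqs = frame_frequencies(observed)
kept = discount_rare(freqs, min_share=0.1)
```

With these toy counts only the dominant frame survives; a softer variant could keep every frame and simply report its frequency alongside it.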
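Selecting target words 1. The word is in CoreNet 2. The word is in the predicate nominal list from “현대 국어 사용 빈도 조사(2002)” (Survey of Modern Korean Usage Frequency, 2002), with a frequency of more than 10 3. Start from the most frequent word • Sources: the “predicate nominal (서술성 명사) + 하다” entries in the National Institute of the Korean Language (국립국어원) verb list, and the CoreNet word entries • 657 predicate nominals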
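The three selection criteria can be sketched as a filter followed by a sort. The data below is made up, and the names `select_targets`, `corenet_words`, and `pn_frequency` are hypothetical; the real inputs would be the CoreNet word entries and the frequency survey list.

```python
def select_targets(corenet_words, pn_frequency, min_freq=10):
    """Return predicate nominals that are in CoreNet and occur more than
    min_freq times, ordered from most to least frequent."""
    candidates = [
        (word, freq)
        for word, freq in pn_frequency.items()
        if word in corenet_words and freq > min_freq
    ]
    # Sort by descending frequency; break ties alphabetically for stability.
    candidates.sort(key=lambda wf: (-wf[1], wf[0]))
    return [word for word, _ in candidates]

# Made-up example data (not the real survey counts).
corenet_words = {"생각", "발견", "사랑"}
pn_frequency = {"생각": 500, "발견": 120, "사랑": 8, "마비": 40}
targets = select_targets(corenet_words, pn_frequency)
```

Here 사랑 is dropped because its made-up frequency is not above 10, and 마비 is dropped because it is missing from the toy CoreNet set.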
Work flow 1. Selecting target words 2. Extracting sentences 3. Work: (a) extracting arguments (b) assigning concepts (c) verifying case frames
Extracting arguments • Sentences are chosen randomly • The work is done in a spreadsheet
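A minimal sketch of the random selection step, assuming the extracted sentences are held in a Python list. The fixed seed and the CSV column names are assumptions added here so that a sample can be reproduced and opened in a spreadsheet; the slides do not specify either.

```python
import csv
import io
import random

def sample_sentences(sentences, k, seed=0):
    """Draw up to k distinct sentences at random, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(sentences, min(k, len(sentences)))

def to_spreadsheet_csv(sentences):
    """Render sentences as CSV text with an empty column for the worker to fill in."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["sentence", "arguments"])  # hypothetical column names
    for sentence in sentences:
        writer.writerow([sentence, ""])
    return buf.getvalue()

pool = [f"sentence {i}" for i in range(100)]  # placeholder corpus
sample = sample_sentences(pool, 5, seed=42)
sheet = to_spreadsheet_csv(sample)
```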
Assigning concepts • Automatic methods are necessary • The worker chooses among word-concept suggestions • A dictionary covers unknown words and proper nouns • Diagram components: data file, user’s concept-word dictionary, CoreNet library
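One way the suggestion step might look, sketched with plain dictionaries standing in for the user's concept-word dictionary and the CoreNet library. The lookup order (user dictionary first), the function names, and the concept labels are assumptions; this does not use or imitate the real CoreNet API.

```python
def suggest_concepts(word, user_dict, corenet):
    """Collect candidate concepts for a word, user dictionary first,
    then the CoreNet stand-in. Returns an empty list for unknown words."""
    suggestions = []
    for source in (user_dict, corenet):
        for concept in source.get(word, []):
            if concept not in suggestions:
                suggestions.append(concept)
    return suggestions

def record_choice(word, concept, user_dict):
    """Store the worker's choice so unknown words and proper nouns
    are suggested automatically the next time they appear."""
    concepts = user_dict.setdefault(word, [])
    if concept not in concepts:
        concepts.append(concept)

corenet = {"결과": ["result", "outcome"]}   # made-up concept labels
user_dict = {}
before = suggest_concepts("양피지", user_dict, corenet)  # unknown word
record_choice("양피지", "material", user_dict)            # worker decides
after = suggest_concepts("양피지", user_dict, corenet)
```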
Problem 1: Making a verb entry for case frames • 1. Make two or more entries for one predicate nominal (PN)? Ex: 생각하다 (to think), 생각되다 (to be thought) • 2. Make just one entry for one PN? Ex: 생각하다
Three classes of predicate nominal (O = combines, X = does not combine) 1. 하다 (O), 되다 (O): 생각, 발견, 관련, 제시, 포함 2. 하다 (O), 되다 (X): 사랑, 존재, 대답, 지적, 주장 3. 하다 (X), 되다 (O): 침체, 소외, 마비, 직결, 고조, 위축
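The three classes follow directly from two yes/no properties, which the sketch below encodes. It also shows what option 1 of Problem 1 (one entry per verb form) would generate for each class. The function names are hypothetical.

```python
def classify_pn(takes_hada, takes_doeda):
    """Map the two combination properties to the class numbers used on the slide."""
    if takes_hada and takes_doeda:
        return 1
    if takes_hada:
        return 2
    if takes_doeda:
        return 3
    return None  # combines with neither; not covered by the slide

def verb_entries(pn, takes_hada, takes_doeda):
    """List the verb forms that would each need an entry under option 1."""
    entries = []
    if takes_hada:
        entries.append(pn + "하다")
    if takes_doeda:
        entries.append(pn + "되다")
    return entries

class_of_saenggak = classify_pn(True, True)
entries_saenggak = verb_entries("생각", True, True)
entries_chimche = verb_entries("침체", False, True)
```

Only class 1 raises the one-entry-or-two question, since classes 2 and 3 yield a single verb form either way.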
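Substitution of arguments • The object argument of PN + ‘하’ (active) moves to the subject position with PN + ‘되’ (passive). • 분석한 결과를 제시하고자 하는 것은 아니다. (It is not that we intend to present the results of the analysis.) • 그리고 이를 변량 분석한 결과가 표3 에 제시되어 있다. (And the results of the analysis of variance on this are presented in Table 3.) (examples from the KAIST corpus) • This may cause confusion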
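If a single entry per PN were chosen, one possible way to avoid the confusion is to map passive observations back to the active frame before counting. The sketch below assumes a simple dictionary representation of a frame; the keys (`pn`, `voice`, `args`) and the position labels are hypothetical, not the project's actual data format.

```python
def normalize_to_active(frame):
    """Return an active-voice view of a frame: for PN + 되 (passive),
    the subject filler is moved to the object position."""
    if frame["voice"] != "passive":
        return frame
    args = dict(frame["args"])  # copy so the input is not modified
    if "subject" in args:
        args["object"] = args.pop("subject")
    return {"pn": frame["pn"], "voice": "active", "args": args}

# The two corpus examples, reduced to the argument in question.
active = {"pn": "제시", "voice": "active", "args": {"object": "결과"}}
passive = {"pn": "제시", "voice": "passive",
           "args": {"subject": "결과", "location": "표3"}}
normalized = normalize_to_active(passive)
```

After normalization both examples place 결과 in the object position, so they can be counted toward the same case frame.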
Problem 2 • Verbs that take a sentence as a complement, e.g. 생각 + 하다 (think). Ex: 부모님의 간섭은 당연하다고 생각한다. (I think that parents’ interference is natural.) • Structure: S + 고 (complementizer) + V • This is beyond the case frame, but it accounts for the dominant share of occurrences in the corpus (158 times in 300 case frames)
Problem 2 (continued) • We are guided by the Sejong dictionary, but should we ignore some argument-structure types?
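To see how large the sentential-complement share is, and how such sentences might be flagged before deciding whether to ignore them, here is a small sketch. The regular expression is a surface heuristic written for this example (quotative ending + 고 followed by a form of 생각하다); it is not a morphological analysis and would miss or over-match some cases.

```python
import re

# Share reported on the slide: 158 of 300 case frames.
share = 158 / 300
share_percent = round(share * 100, 1)

# Assumed surface pattern: -다고 / -라고 / -냐고 / -자고 + 생각하다 forms.
S_GO_V = re.compile(r"[다라냐자]고\s+생각[하한할해했]")

def has_sentential_complement(sentence):
    """Heuristically detect the S + 고 + V structure for the verb 생각하다."""
    return S_GO_V.search(sentence) is not None

flagged = has_sentential_complement("부모님의 간섭은 당연하다고 생각한다.")
not_flagged = has_sentential_complement("분석한 결과를 제시하고자 하는 것은 아니다.")
```

A share of roughly 53% suggests that dropping this structure would discard about half of the observations for such verbs.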
Problem 3 • Nominalized verbs • Verb (sentence) + -ㅁ/-기 • S-기 + 시작하다 (to begin), e.g. 이는 11세기경 양피지가 나오기 시작하며 쇠퇴하기 시작했다. (This began to decline as parchment began to appear around the 11th century.) • This structure was found 36 times in 44 randomly chosen sentences • Can a verb be an argument?
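The S-기 + 시작하다 structure can be spotted with a simple surface pattern, sketched below. The regular expression is an assumption made for illustration: it can over-match nouns that happen to end in 기, so a real count would need morphological analysis.

```python
import re

# Share reported on the slide: 36 of 44 randomly chosen sentences.
share_percent = round(36 / 44 * 100, 1)

# Assumed surface pattern: a syllable 기, whitespace, then a form of 시작하다.
GI_SIJAK = re.compile(r"기\s+시작[하한할해했]")

def count_gi_sijak(sentence):
    """Count occurrences of the -기 시작하다 structure in one sentence."""
    return len(GI_SIJAK.findall(sentence))

example = "이는 11세기경 양피지가 나오기 시작하며 쇠퇴하기 시작했다."
hits = count_gi_sijak(example)
```

The example sentence contains the structure twice (나오기 시작하며, 쇠퇴하기 시작했다), while the 기 inside 11세기경 is not matched because no form of 시작하다 follows it.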
Progress • Extracting arguments • Assigning concepts • Verifying case frames