
Work Progress Report

Work Progress Report. 2011-3-11, 진두현, SWRC, KAIST. Contents: goal, previous discussions, selecting target words, extracting arguments, problems. Goal: expanding Korean verb case frames. 17,753 case frames constructed (about 23 case frames per word sense). Figures shown: 765 and 3,834.



  1. Work Progress Report 2011-3-11 진두현 SWRC, KAIST

  2. Contents • Goal • Previous discussions • Selecting target words • Extracting arguments • Problems

  3. Goal: expanding Korean verb case frames. 17,753 case frames constructed (about 23 case frames per word sense). Figures shown: 765 and 3,834. Total: 4,599 verb senses in CoreNet.

  4. Goal: expanding Korean verb case frames. 17,753 case frames constructed (about 23 case frames per word sense). Figures shown: 765 and 3,834, with the label "Predicate nominal". Total: 4,599 verb senses in CoreNet.
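
The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes that 765 is the number of verb senses that already have case frames and 3,834 the number still to be covered; the slide does not say this outright, but 17,753 / 765 matches the quoted "23 case frames per word sense" and 765 + 3,834 matches the quoted total.

```python
# Figures quoted on the slide.
total_case_frames = 17_753
senses_with_frames = 765    # assumption: senses that already have case frames
senses_remaining = 3_834    # assumption: senses still without case frames

total_senses = senses_with_frames + senses_remaining
frames_per_sense = total_case_frames / senses_with_frames
coverage = senses_with_frames / total_senses

print(total_senses)                # 4599
print(round(frames_per_sense, 1))  # 23.2
print(f"{coverage:.1%}")           # 16.6%
```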

  5. Previous discussions • Collect more sentences (from the KAIST corpus and the Sejong corpus) • Need a more effective way to work • Exclude subjective judgment from the data

  6. Statistical approach • Recording how often each case frame occurs makes it possible to discount odd (rare) case frames
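
One way to read the statistical approach is a simple count-and-threshold filter. The sketch below is only an illustration: the string encoding of a case frame, the sample observations, and the `min_count` threshold are all invented here, and the slides do not say how frequency is actually applied.

```python
from collections import Counter

def frequent_frames(frames, min_count=2):
    """Keep only case frames observed at least min_count times."""
    counts = Counter(frames)
    return {frame: n for frame, n in counts.items() if n >= min_count}

# Invented observations for one verb; each string is one extracted case frame.
observed = [
    "human-가 idea-를",
    "human-가 idea-를",
    "human-가 idea-를",
    "human-가 human-를",
    "human-가 human-를",
    "place-가 tool-를",  # seen once: possibly an extraction error or odd usage
]

kept = frequent_frames(observed, min_count=2)
print(kept)
```

Keeping the counts rather than a yes/no flag lets a later verification step rank frames instead of discarding them outright.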

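  7. Selecting target words
     1. The word is in CoreNet
     2. The word is in the predicate nominal list from "현대 국어 사용 빈도 조사 (2002)" (Survey of Modern Korean Usage Frequency, 2002), with a frequency of more than 10
     3. Start from the most frequent word
     Diagram labels: list of "predicate nominal" + 하다 entries in the National Institute of the Korean Language verb list; CoreNet word entries; 657 predicate nominals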
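
The three selection criteria amount to a filter followed by a sort. The sketch below assumes the CoreNet entries are available as a set of words and the survey as a word-to-frequency mapping; both containers and all frequency values are made up for illustration.

```python
def select_targets(corenet_entries, noun_freq, min_freq=10):
    """Return predicate nominals that are in CoreNet and occur more than
    min_freq times, ordered from most to least frequent."""
    candidates = [
        word for word, freq in noun_freq.items()
        if word in corenet_entries and freq > min_freq
    ]
    return sorted(candidates, key=lambda word: noun_freq[word], reverse=True)

# Invented data: the frequencies are placeholders, not survey values.
corenet_entries = {"생각", "사랑", "발견"}
noun_freq = {"사랑": 300, "생각": 500, "발견": 8, "침체": 40}

targets = select_targets(corenet_entries, noun_freq)
print(targets)  # ['생각', '사랑']
```

Here 발견 is dropped for its low frequency and 침체 because it is missing from the stand-in CoreNet set.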

  8. Frequency in corpus (KAIST)

  9. Work flow
     1. Selecting target words
     2. Extracting sentences
     3. Work
        1. Extracting arguments
        2. Assigning concepts
        3. Verifying case frames

  10. Extracting arguments • Choose sentences randomly • Use a spreadsheet
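
Random selection is easier to audit when it is reproducible. The sketch below is one possible way to draw the sample, assuming the candidate sentences are held in a list; the fixed seed and the function name are illustrative choices, not something the slides specify.

```python
import random

def sample_sentences(sentences, k, seed=0):
    """Draw k distinct sentences at random; the same seed gives the same sample."""
    if k >= len(sentences):
        return list(sentences)
    rng = random.Random(seed)  # private generator, leaves global state untouched
    return rng.sample(sentences, k)

# Placeholder corpus: sentence IDs stand in for real sentences.
corpus = [f"sentence-{i}" for i in range(300)]
sample = sample_sentences(corpus, 44, seed=2011)
print(len(sample))  # 44
```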

  11. Assigning concepts • Automatic methods are necessary • The worker chooses from suggested word-concept pairs • A dictionary handles unknown words and proper nouns. Diagram labels: data file; user's concept-word dictionary; CoreNet library
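
The suggestion step with a fallback dictionary could look roughly like the following. This is a sketch under stated assumptions: plain dictionaries stand in for the CoreNet library and the user's concept-word dictionary, whose real interfaces are not shown in the slides, and the concept labels are invented.

```python
def suggest_concepts(word, corenet, user_dict):
    """Return candidate concepts for a word: CoreNet first, then the user dictionary."""
    if word in corenet:
        return list(corenet[word])
    return list(user_dict.get(word, []))

def record_choice(word, concept, user_dict):
    """Remember a worker's choice so the word gets a suggestion next time."""
    concepts = user_dict.setdefault(word, [])
    if concept not in concepts:
        concepts.append(concept)

corenet = {"결과": ["result", "outcome"]}  # invented stand-in entries
user_dict = {}

first = suggest_concepts("양피지", corenet, user_dict)  # unknown word: nothing to suggest
record_choice("양피지", "material", user_dict)
second = suggest_concepts("양피지", corenet, user_dict)
print(first, second)  # [] ['material']
```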

  12. Problem 1: making a verb entry for case frames • 1. Make two or more entries for one predicate nominal (PN)? Ex: 생각하다 (to think), 생각되다 (to be thought) • 2. Make just one entry for one PN? Ex: 생각하다

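  13. Three classes of predicate nominal (O = possible, X = not possible)
     1. 생각 (thought), 발견 (discovery), 관련 (relation), 제시 (presentation), 포함 (inclusion): 하다 (O), 되다 (O)
     2. 사랑 (love), 존재 (existence), 대답 (answer), 지적 (pointing out), 주장 (claim): 하다 (O), 되다 (X)
     3. 침체 (stagnation), 소외 (alienation), 마비 (paralysis), 직결 (direct connection), 고조 (heightening), 위축 (shrinking): 하다 (X), 되다 (O)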
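
The three classes follow directly from which light verbs a predicate nominal accepts. A minimal sketch of that mapping, using one example word per class from the slide (the function name and boolean encoding are illustrative):

```python
def pn_class(takes_hada, takes_doeda):
    """Map light-verb compatibility to the class number used on the slide."""
    if takes_hada and takes_doeda:
        return 1
    if takes_hada:
        return 2
    if takes_doeda:
        return 3
    raise ValueError("a predicate nominal must take 하다 or 되다")

# (takes 하다, takes 되다) for one example word per class.
examples = {"생각": (True, True), "사랑": (True, False), "침체": (False, True)}
classes = {pn: pn_class(*flags) for pn, flags in examples.items()}
print(classes)  # {'생각': 1, '사랑': 2, '침체': 3}
```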

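  14. Substitution of arguments
     • The object argument of PN + '하' (active) moves to the subject position in the PN + '되' (passive) case.
     • 분석한 결과를 제시하고자 하는 것은 아니다. ("It is not that we intend to present the analyzed results.")
     • 그리고 이를 변량 분석한 결과가 표3에 제시되어 있다. ("And the results of the analysis of variance on this are presented in Table 3.")
     (Examples from the KAIST corpus.)
     This may cause confusion.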
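
If a single entry per predicate nominal were chosen, the passive arguments would have to be mapped back onto the active frame. The sketch below shows that mapping for the two corpus examples; the position names, the `voice` flag, and the dictionary representation are assumptions made for illustration only.

```python
def to_active_positions(args, voice):
    """Re-express extracted arguments in the positions of the active (하다) frame.

    In the passive (되다) form, the filler found in subject position
    corresponds to the object of the active form.
    """
    mapped = dict(args)
    if voice == "passive" and "subject" in mapped:
        mapped["object"] = mapped.pop("subject")
    return mapped

active = to_active_positions({"object": "결과"}, voice="active")      # 결과를 제시하다
passive = to_active_positions({"subject": "결과", "location": "표3"},  # 결과가 표3에 제시되다
                              voice="passive")
print(active)
print(passive)
```

After the mapping, 결과 fills the same position in both sentences, so the two observations can be counted toward one case frame.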

  15. General position of cases

  16. Problem 2 • Verbs that take a sentence as a complement. Ex: 생각 + 하다 (think): 부모님의 간섭은 당연하다고 생각한다. ("I think that parental interference is natural.") • S + 고 (complementizer) + V • This is beyond the scope of a case frame, but it dominates in the corpus (158 of 300 case frames).

  17. Problem 2 • We are guided by the Sejong dictionary, but should we ignore some argument-structure types?

  18. Problem 3 • Nominalized verbs • Verb (sentence) + -ㅁ/-기 • S-기 + 시작하다 (to begin): 이는 11세기경 양피지가 나오기 시작하며 쇠퇴하기 시작했다. ("This began to decline as parchment began to appear around the 11th century.") • This structure was found 36 times in 44 randomly chosen sentences • Can a verb be an argument?

  19. Progress • Extracting arguments • Assigning concepts • Verifying case frames
