Weekly Report Semantic Web Research Center Duhyeon Jin 2011-4-15
Contents • Goal • Last discussion • Problems & resolutions • Achievements • Plan
Goal • Construct case frames for 675 “PN + 하” verbs • Current approach: automatic construction from the Sejong Treebank • Sejong Treebank: 43,828 trees → Total: 9,598 trees for the 675 verbs → ?,??? case frames
Last discussion • Problem • Only 2,701 case frame instances were extracted from 9,598 parse trees; many arguments are missing. • Need a better understanding of the tree structure of the Sejong Treebank. • Need to account for the grammatical characteristics of Korean as applied in the Sejong Treebank.
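Resolutions • Problem 1 • Dative and locative arguments (N3) in the Sejong Treebank were missed; only the subject (N1) and object (N2) were considered. • Resolution • In the Sejong Treebank, phrases marked with particles such as “으로, 에, 에게, 에서..” (roughly “by/toward”, “at/to”, “to (someone)”, “at/from”) are tagged as AJT (adjunct). • Let the algorithm extract constituents with the “X_AJT” tag.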
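The resolution for Problem 1 can be sketched as follows. This is a minimal illustration, not the actual extraction script: the function name `argument_slots`, the input format (a list of labelled sibling constituents), and the example sentence are assumptions made for the sketch. Only the N1/N2/N3 slot names and the SBJ, OBJ and AJT tags come from the slides.

```python
# Map the function part of a treebank label (the text after "_") to a slot.
# N1 = subject, N2 = object, N3 = dative/locative (tagged AJT in the treebank).
SLOT_BY_FUNCTION = {"SBJ": "N1", "OBJ": "N2", "AJT": "N3"}


def argument_slots(constituents):
    """Return (slot, surface) pairs for constituents tagged X_SBJ, X_OBJ or X_AJT.

    `constituents` is a hypothetical flat list of (label, surface) pairs,
    e.g. the sisters of the target verb collected from one parse tree.
    """
    slots = []
    for label, surface in constituents:
        _, _, function = label.partition("_")  # "NP_AJT" -> "AJT", "VP" -> ""
        slot = SLOT_BY_FUNCTION.get(function)
        if slot is not None:
            slots.append((slot, surface))
    return slots


# Made-up example: "Cheolsu solved the problem at school."
siblings = [
    ("NP_SBJ", "철수가"),
    ("NP_AJT", "학교에서"),
    ("NP_OBJ", "문제를"),
    ("VP", "해결하였다"),
]
frame = argument_slots(siblings)
```

Before the fix, the AJT entry would have been dropped, leaving only N1 and N2 in the frame.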
Resolutions • Problem 2 • An auxiliary verb blocks the algorithm from reaching arguments that are originally complements of the target verb. • Ex> ~수 있 (“be able to ~”) • Resolution • Allow the verb next to “수/NNB” to be a head word in the tree • (Tree diagram with VP_MOD nodes)
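A rough sketch of the Problem 2 resolution: when “수/NNB” appears, the word immediately before it is treated as the head instead of the auxiliary that follows. The token format, the function name `head_before_su`, and the morpheme analyses in the example are illustrative assumptions; they are not copied from the treebank or from the actual algorithm.

```python
def head_before_su(tokens):
    """Return the index of the word right before 수/NNB, or None if absent.

    `tokens` is a hypothetical list of morpheme-analysis strings, one per word.
    """
    for i, token in enumerate(tokens):
        if token.startswith("수/NNB") and i > 0:
            return i - 1
    return None


# Illustrative analysis of "문제를 해결할 수 있다" ("can solve the problem").
tokens = ["문제/NNG+를/JKO", "해결/NNG+하/XSV+ㄹ/ETM", "수/NNB", "있/VV+다/EF"]
head_index = head_before_su(tokens)
head_word = tokens[head_index]
```

With the head moved to the verb before “수/NNB”, the object 문제를 becomes reachable as an argument of the target verb rather than being hidden behind the auxiliary.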
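Resolutions • Problem 3 • A large number of relative clauses were missed. • Ex> 만두를 먹은 철수가 (“Cheolsu, who ate the dumplings”, as subject), 철수가 먹은 만두가 (“the dumplings that Cheolsu ate”, as subject), 철수가 살던 곳이 (“the place where Cheolsu used to live”, as subject).. 만두를 먹은 철수를, 철수가 먹은 만두를 (the same phrases as objects)… • Resolution • Modify the algorithm to extract an NP that is modified by VP_MOD or S_MOD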
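The Problem 3 resolution can be sketched as a tree walk that collects every NP whose left sister is a VP_MOD or S_MOD clause. The tuple-based tree encoding and the function names are assumptions made for this sketch, and the bracketing of the example is a simplified guess at how such a phrase might be annotated.

```python
MODIFIER_LABELS = ("VP_MOD", "S_MOD")


def leaves(node):
    """Collect surface words under a node; a leaf is (label, "surface")."""
    label, body = node
    if isinstance(body, str):
        return [body]
    words = []
    for child in body:
        words.extend(leaves(child))
    return words


def modified_nps(node):
    """Return (modifying clause, modified NP) pairs found anywhere in the tree."""
    label, body = node
    if isinstance(body, str):
        return []
    found = []
    # Look at each pair of adjacent sisters: modifier clause followed by NP.
    for left, right in zip(body, body[1:]):
        if left[0] in MODIFIER_LABELS and right[0].startswith("NP"):
            found.append((" ".join(leaves(left)), " ".join(leaves(right))))
    for child in body:
        found.extend(modified_nps(child))
    return found


# Simplified tree for 만두를 먹은 철수가 ("Cheolsu, who ate the dumplings").
tree = ("NP_SBJ", [
    ("VP_MOD", [("NP_OBJ", "만두를"), ("VP_MOD", "먹은")]),
    ("NP_SBJ", "철수가"),
])
pairs = modified_nps(tree)
```

Each pair gives the relative clause and the noun phrase it modifies; whether that noun phrase really is an argument of the clause's verb still has to be checked, as the refinement slide notes.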
Achievement • Extracted 5,354 additional case frames • The entire workflow is described on the web: • http://sysx2.kaist.ac.kr/wiki/index.php/세종_구문_분석_코퍼스_논항추출_작업 • The remaining 2,137 trees are either ill-formed trees or yield case frames with no arguments. • (Chart labels: Extracted case frames; 7,461; 2,107; 7,491; 2,137)
Refining extracted arguments • Currently refining the extracted data • The extracted arguments still have problems • Problem 1. Adjuncts extracted unnecessarily as arguments (Ex> 동작을 본능적으로 표현하.., “express the action instinctively”) • Problem 2. Modified NPs that cannot be an argument (Ex> 문제를 해결할 힘, “the power to solve the problem”; the power is not what solves the problem) • Resolutions • 1. Compare the arguments against the Sejong Dictionary • 2. Manually check the 1,134 case frames that were extracted from relative clauses.
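Resolution 1 could look roughly like the sketch below: each extracted argument is kept only if the dictionary frame for the verb licenses its slot. The dictionary excerpt, its format, and the function name are hypothetical; the real Sejong Dictionary entries and their structure are not shown in the slides.

```python
# Hypothetical excerpt: verb stem -> slots licensed by its dictionary frame.
# The entry for 표현하 is invented for illustration only.
DICTIONARY_FRAMES = {"표현하": {"N1", "N2"}}


def filter_by_dictionary(verb, extracted):
    """Drop extracted arguments whose slot is not licensed for `verb`.

    Verbs without a dictionary entry are passed through unchanged so that
    they can be checked manually later.
    """
    allowed = DICTIONARY_FRAMES.get(verb)
    if allowed is None:
        return list(extracted)
    return [(slot, text) for slot, text in extracted if slot in allowed]


# Arguments extracted from 동작을 본능적으로 표현하..
extracted = [("N2", "동작을"), ("N3", "본능적으로")]
kept = filter_by_dictionary("표현하", extracted)
```

Under this assumed frame, the adjunct 본능적으로 (“instinctively”) is removed while the object 동작을 is kept.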
Plan • Finish refining the extracted data (~ 4.15) • Assign concepts to each argument using the dictionary and CoreNet (4.18 ~ 22) • Refine and evaluate all extracted case frames (4.25 ~ 29)