Czech Verbs of Communication and the Extraction of their Frames Václava Benešová and Ondřej Bojar Institute of Formal and Applied Linguistics, {benesova,bojar}@ufal.mff.cuni.cz
Introduction
• 1. VALLEX, Valency Lexicon of Czech Verbs
• 2. Automatic Identification of Verbs of Communication
• 3. Frame Suggestion
• 4. Conclusion
1. Valency Lexicon of Czech Verbs, VALLEX 1.x, and its Verb Classes
• Verb Classes in VALLEX
• Verbs of Communication
VALLEX
• Theoretical background: Functional Generative Description (FGD)
• Valency: “ability of lexical units to bind other lexical units”
• Versions: 1.0, internal 1.5, 2.0 (autumn 2006) (almost 4300 entries)
• Corpus coverage (Czech National Corpus): about 10% of verb occurrences are not covered; these belong to verbs with low corpus frequency (approx. 28,000 lemmas)
Verb Entry in VALLEX
• Verb entry: set of valency frame(s)
• Valency frame: sequence of slots (functor, morphemic realization, type of complement)
• Attributes of valency frames: gloss, example, …, class
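The entry structure described on this slide can be pictured as nested records. The following is only a minimal sketch: the class names (Slot, ValencyFrame, VerbEntry), the field names, and the complement-type label "obl" are assumptions made for illustration, not the actual VALLEX data format, and the sample frame is the generic communication frame rather than an entry copied from the lexicon.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Slot:
    """One slot of a valency frame (hypothetical field names)."""
    functor: str           # e.g. "ACT", "ADDR", "PAT"
    forms: tuple           # morphemic realizations, e.g. ("nom",)
    complement_type: str   # placeholder label such as "obl" (assumed)


@dataclass
class ValencyFrame:
    """A sequence of slots plus frame-level attributes."""
    slots: list
    gloss: str = ""
    example: str = ""
    verb_class: str = ""


@dataclass
class VerbEntry:
    """A verb entry is a set of valency frames for one lemma."""
    lemma: str
    frames: list = field(default_factory=list)


# Illustrative values only: the generic communication frame, not a real entry.
frame = ValencyFrame(
    slots=[
        Slot("ACT", ("nom",), "obl"),
        Slot("ADDR", ("gen", "dat", "acc"), "obl"),
        Slot("PAT", ("dc",), "obl"),
    ],
    gloss="say",
    verb_class="communication",
)
entry = VerbEntry("říci", [frame])
functors = [slot.functor for slot in entry.frames[0].slots]
```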
Verb Classes in VALLEX
• Classification:
  • in progress
  • built bottom-up
  • emphasis on syntactic criteria
• Classes: communication, mental action, perception, psych verb, exchange, change, phase verbs, phase of action, modal verbs, motion, transport, location, …
Communication verbs in VALLEX
‘a speaker conveys information to a recipient’
ACT      ADDR             PAT/EFF
{nom}    {gen/dat/acc}    {dc,...}
• simple information: {říci: say, informovat: inform, …} + THAT: že → verbs of announcement
• question: {ptát se: ask, …} + WHETHER, IF: zda, jestli → interrogative verbs
• commands, bans, warnings, …: {nakázat: order, zakázat: prohibit, …} + IN ORDER TO, LET: aby, ať → imperative verbs
2. Automatic Identification of Verbs of Communication
• Evaluation against VALLEX and FrameNet
Automatic Identification of Verbs of Communication
Search the corpus for the pattern V+N234+subord{aby,zda,že} and mark a verb as a communication verb if enough occurrences are found.
Weak points:
1. eliminates nominal structures: ‘He said the truth about the killer.’ ‘He gave her many presents.’ (verb of exchange)
2. ignores examples where a complement was not expressed on the surface layer: ‘He said that …’
3. homonymy of conjunctions: že (that) and aby (in order to): ‘He has done it in order to make money…’
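The pattern search can be sketched roughly as follows. This is a toy illustration, not the authors' implementation. It assumes that N234 stands for a noun in case 2, 3 or 4 (genitive, dative, accusative, in line with the ADDR forms of the communication frame), that tokens are dictionaries with hypothetical keys "lemma", "pos" and "case", that punctuation carries the tag "Z", and that the three elements are adjacent once punctuation is skipped.

```python
from collections import Counter

SUBORD = {"aby", "zda", "že"}   # conjunctions named in the pattern
NOUN_CASES = {2, 3, 4}          # assumed reading of "N234": gen, dat, acc


def count_pattern_hits(sentences):
    """Count V + N234 + subord hits per verb lemma (punctuation skipped)."""
    hits = Counter()
    for sent in sentences:
        words = [tok for tok in sent if tok["pos"] != "Z"]
        for verb, noun, conj in zip(words, words[1:], words[2:]):
            if (verb["pos"] == "V"
                    and noun["pos"] == "N"
                    and noun.get("case") in NOUN_CASES
                    and conj["lemma"] in SUBORD):
                hits[verb["lemma"]] += 1
    return hits


def communication_verbs(sentences, threshold):
    """Mark a verb once its hit count reaches the (assumed) threshold."""
    counts = count_pattern_hits(sentences)
    return {lemma for lemma, n in counts.items() if n >= threshold}


# Toy corpus: "Řekl matce, že přijde." and "Dal jí dárky."
corpus = [
    [{"lemma": "říci", "pos": "V"},
     {"lemma": "matka", "pos": "N", "case": 3},
     {"lemma": ",", "pos": "Z"},
     {"lemma": "že", "pos": "J"},
     {"lemma": "přijít", "pos": "V"}],
    [{"lemma": "dát", "pos": "V"},
     {"lemma": "on", "pos": "P", "case": 3},
     {"lemma": "dárek", "pos": "N", "case": 4}],
]
found = communication_verbs(corpus, threshold=1)
```

In the toy corpus only the first sentence matches, so only říci is marked; the second sentence shows why a purely nominal pattern would also catch verbs of exchange such as dát.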
Evaluation against VALLEX and FrameNet
• gold standards: VALLEX 1.0, VALLEX 1.5, FrameNet 1.2
• ROC curves
TP … true positives (communication verbs according to a gold standard that score above the given threshold)
FP … false positives (non-communication verbs that score above the given threshold)
TPR = TP / P … true positive rate (P is the total number of communication verbs)
FPR = FP / N … false positive rate (N is the total number of verbs with no sense of communication)
• 40–50% of communication verbs identified correctly (for both VALLEX and FrameNet), 20% falsely marked
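One point of a ROC curve per threshold can be computed as in the sketch below, which pairs TPR = TP / P with the false positive rate FP / N, the usual second axis of a ROC curve. The function name, the score dictionary and the toy counts are hypothetical and are not the figures reported in the evaluation.

```python
def roc_points(scores, gold, thresholds):
    """Return (threshold, TPR, FPR) triples.

    scores: verb lemma -> number of pattern hits (hypothetical input)
    gold:   set of lemmas that the gold standard lists as communication verbs
    """
    positives = sum(1 for verb in scores if verb in gold)   # P
    negatives = len(scores) - positives                     # N
    points = []
    for t in thresholds:
        marked = {verb for verb, s in scores.items() if s >= t}
        tp = len(marked & gold)   # communication verbs above the threshold
        fp = len(marked - gold)   # other verbs above the threshold
        points.append((t, tp / positives, fp / negatives))
    return points


# Made-up scores for four verbs, two of them communication verbs.
toy_scores = {"říci": 10, "ptát se": 3, "dát": 4, "jít": 0}
toy_gold = {"říci", "ptát se"}
points = roc_points(toy_scores, toy_gold, thresholds=[1, 4, 11])
```

Lowering the threshold moves the point up and to the right on the curve: more communication verbs are found, but more other verbs are falsely marked as well.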
3. Frame Suggestion
• Frame Edit Distance and Verb Entry Similarity
• Experimental Results
Frame Edit Distance and Verb Entry Similarity
• FED (frame edit distance): number of edit operations (insert, delete, replace) necessary to convert a hypothesized frame to a correct frame
• ES (entry similarity or expected saving):
$$ES = 1 - \frac{\min \mathrm{FED}(G, H)}{\mathrm{FED}(G, \emptyset) + \mathrm{FED}(H, \emptyset)}$$
G … golden verb entries of this base lemma
H … hypothesized entries
∅ … blank verb entry
ES 0% (suggesting nothing), ES 100% (golden frames)
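A simplified reading of FED and ES is sketched below. It makes several assumptions that the slide does not state: a frame is reduced to a tuple of slot labels, every insert, delete or replace of a whole slot costs 1, the "min" is taken over all pairings of hypothesized and golden frames (missing frames are padded with blank ones), and two blank entries are treated as identical. All function names are hypothetical.

```python
from itertools import permutations


def frame_edit_distance(hyp, gold):
    """Levenshtein distance between two frames given as tuples of slots."""
    prev = list(range(len(gold) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, g in enumerate(gold, 1):
            cur.append(min(prev[j] + 1,              # delete a slot of hyp
                           cur[j - 1] + 1,           # insert a slot of gold
                           prev[j - 1] + (h != g)))  # replace if different
        prev = cur
    return prev[-1]


def entry_distance(gold_entry, hyp_entry):
    """Cheapest total FED over all pairings of frames (assumed meaning of min)."""
    n = max(len(gold_entry), len(hyp_entry))
    gold = list(gold_entry) + [()] * (n - len(gold_entry))
    hyp = list(hyp_entry) + [()] * (n - len(hyp_entry))
    return min(
        sum(frame_edit_distance(h, g) for h, g in zip(perm, gold))
        for perm in permutations(hyp)
    )


def entry_similarity(gold_entry, hyp_entry):
    """ES = 1 - min FED(G, H) / (FED(G, blank) + FED(H, blank))."""
    denom = entry_distance(gold_entry, []) + entry_distance([], hyp_entry)
    if denom == 0:
        return 1.0   # both entries blank: treated as identical (assumption)
    return 1 - entry_distance(gold_entry, hyp_entry) / denom


golden = [("ACT", "ADDR", "PAT")]
es_partial = entry_similarity(golden, [("ACT", "PAT")])   # one slot missing
es_nothing = entry_similarity(golden, [])                 # suggesting nothing
es_perfect = entry_similarity(golden, golden)             # golden frames
```

With these assumptions the two boundary cases on the slide come out as expected: suggesting nothing yields ES = 0 and suggesting the golden frames yields ES = 1. The exhaustive search over pairings is only practical for the handful of frames a single verb entry contains.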
Experimental Results with ES
Conclusion
• Automatic identification of communication verbs according to the proposed pattern V+N234+subord{aby,zda,že} performs satisfactorily (40–50% true positives against VALLEX and FrameNet, 20% false positives)
• FED reveals that more lexicographic labour could be saved by suggesting more than one frame per verb → need to focus on other classes, too