Issues of Valency in Prague Dependency Treebank: Creating Valency Lexicon of Verbs Markéta Lopatková Center for Computational Linguistics MFF UK, Prague CIL XVII, Prague, July 26, 2003 1
Motivation
• ‘traditional’ linguistics
  • source of data for linguistic research
  • verification of the established theoretical criteria
• natural language processing
  • lemmatization
  • morphological tagging
  • syntactic analysis
  • word sense disambiguation
  • ‘semantic analysis’
  • machine translation
• building other resources
• language acquisition
Syntactic vs. semantic approach I.
• ‘Levin Verb Classes’ (Levin, 1993)
  • hypothesis: syntactic features of verbs are semantically determined
  • method: syntactic behavior → semantic classes
  • ‘alternation’ ~ a change in the realization of the argument structure of a verb
  • ‘conative alternation’: Edith cuts the bread. / Edith cuts at the bread.
  • classes = verbs which undergo certain types of alternations
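The idea that classes are sets of verbs sharing the same alternations can be sketched in a few lines of Python. This is only an illustration: apart from ‘cut’ taking part in the conative alternation, the verb and alternation data below are toy values assumed for the example, and `group_by_alternations` is a hypothetical helper name.

```python
from collections import defaultdict

# Toy data, assumed for illustration only (the slides mention just 'cut'
# and the conative alternation).
VERB_ALTERNATIONS = {
    "cut": {"conative", "middle"},
    "hack": {"conative", "middle"},
    "hit": {"conative"},
    "break": {"middle", "causative/inchoative"},
    "shatter": {"middle", "causative/inchoative"},
    "touch": set(),
}


def group_by_alternations(verb_alternations):
    """Group verbs that undergo exactly the same set of alternations."""
    classes = defaultdict(list)
    for verb, alternations in verb_alternations.items():
        classes[frozenset(alternations)].append(verb)
    return {key: sorted(verbs) for key, verbs in classes.items()}


classes = group_by_alternations(VERB_ALTERNATIONS)
for alternations, verbs in classes.items():
    print(sorted(alternations), verbs)
```

Each resulting group is a candidate class: the verbs in it behave alike syntactically, which under the hypothesis above suggests a shared semantic component.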
Syntactic vs. semantic approach II.
• PropBank (Palmer et al., 2001)
  • ‘layer of semantic annotation’ in the Penn Treebank
  • argument structure for verbs
    arguments: Arg0, ... Arg5
    modifiers: ArgM (LOC, TMP, EXT, PRP, ADV)
  • He was drawing diagrams and sketches for his patron.
    Arg0: he | Rel: drawing | Arg1: diagrams and sketches | Arg2-for: his patron
  • He keeps st in the fridge.
    Arg0: he | Rel: keeps | Arg1: st | Arg2-in: the fridge
(also Hajičová, Kučerová, 2002)
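As a rough sketch, the two annotated sentences can be held in a small Python structure. The class name `Proposition` and its method are assumptions made for this illustration; they are not part of PropBank or of any PropBank tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Proposition:
    """One PropBank-style annotation: a predicate (Rel) plus labelled spans."""
    rel: str
    args: dict = field(default_factory=dict)  # label -> text span

    def numbered_args(self):
        """Return the numbered arguments (Arg0..Arg5), skipping ArgM modifiers."""
        return {label: span for label, span in self.args.items()
                if label.startswith("Arg") and label[3:4].isdigit()}


drawing = Proposition("drawing", {
    "Arg0": "he",
    "Arg1": "diagrams and sketches",
    "Arg2-for": "his patron",
})
# 'st' abbreviates 'something', as on the slide.
keeps = Proposition("keeps", {
    "Arg0": "he",
    "Arg1": "st",
    "Arg2-in": "the fridge",
})

print(drawing.numbered_args())
print(keeps.numbered_args())
```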
Syntactic vs. semantic approach III.
• FrameNet (Fillmore, 2002)
  • it groups lexical items with parallel semantic characterization
  • the structure and particular components correspond to the ‘semantic roles’ of the common semantic frame
  • verbs, nouns, adjectives, prepositions
• ‘Communication’: ‘Speaker’ ‘Message’ ‘Addressee’ ‘Topic’ ‘Medium’
  Tom communicates with Kim about the festival.
  Tom communicates with Kim by letter.
  Tom communicates the message to me.
• ‘Reciprocality’: ‘Protagonists’ ‘Prot-1’ ‘Prot-2’
  Tom fought with Kim.
  Tom and Kim fought.
Syntactic vs. semantic approach IV.
• LCS Database (Lexical Conceptual Structure) (Dorr, 2001)
  • semantic representation
  • semantic structure + semantic content
• verb cut down, lexical item:
  (act_on loc (* thing 1) (* thing 2) ((* [on] 23) loc (*head*) (thing 24)) (cut+ingly 26) (down+/m))
• sentence: United States cut down (the) quota.
  (act_on loc (us+) (quota+) ((* [on] 23) loc (*head*) (thing 24)) (cut+ingly 26) (down+/m))
• logical arguments (ag, exp, th, src, goal, info, perc, loc, poss, time, prop)
• logical modifiers (mod-poss, ben, instr, purp, mod-loc, manner, mod-prop)
• cut down: _ag_th,mod-loc(on)
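The role string for cut down can be split into its parts with a short Python sketch. It relies only on the two role inventories listed on the slide; the function name `parse_grid` is hypothetical, and the `_` and `,` separators are treated purely as delimiters without interpreting what they mean in the LCS Database.

```python
import re

# Role inventories as listed on the slide.
LOGICAL_ARGUMENTS = {"ag", "exp", "th", "src", "goal", "info", "perc",
                     "loc", "poss", "time", "prop"}
LOGICAL_MODIFIERS = {"mod-poss", "ben", "instr", "purp", "mod-loc",
                     "manner", "mod-prop"}

# A separator, a role name, and optionally a preposition in parentheses.
ROLE_RE = re.compile(r"[_,]([a-z-]+)(?:\(([^)]*)\))?")


def parse_grid(grid):
    """Return (role, kind, preposition) triples for a role string."""
    roles = []
    for name, preposition in ROLE_RE.findall(grid):
        if name in LOGICAL_ARGUMENTS:
            kind = "argument"
        elif name in LOGICAL_MODIFIERS:
            kind = "modifier"
        else:
            raise ValueError(f"unknown role: {name}")
        roles.append((name, kind, preposition or None))
    return roles


print(parse_grid("_ag_th,mod-loc(on)"))
```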
Prague Dependency Treebank
• based on
  • Functional Generative Description (FGD) (Sgall et al., 1986)
    • dependency-oriented
    • stratificational
    • level of underlying representation (‘tectogrammatical level’) (described in Hajičová et al., 2000)
  • valency theory (esp. Panevová, 1994)
Valency in FGD I.
• complementations:
  • inner participants vs. free modifications
  • obligatory vs. optional
• valency frame:
  Matka.ACT předělala dětem.ADDR loutku.PAT z Kašpárka.ORIG na čerta.EFF. (Panevová)
  [Mother re-made a puppet for the children from a Punch into an imp.]
  V Praze.LOC se sejdeme na Hlavním nádraží.LOC u pokladen.LOC. (Panevová)
  [In Prague we will meet at the Main Station near the booking offices.]
Valency in FGD II.
• a ‘middle position’:
  • syntactic criteria are used for the identification of Actor and Patient (Actor is the first inner participant, the second is always a Patient)
  • other inner participants (Addressee, Origin and Effect) as well as free modifications are determined in accordance with semantic considerations
• concept of ‘shifting’ (Panevová, 1974-75)
  (diagram: Origin, Actor, Patient, Addressee, Effect)
  Kniha.ACT vyšla. (Panevová) [The book appeared.]
  Chlapec.ACT vyrostl v muže.PAT. (Panevová) [A boy grew up into a man.]
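The positional part of this rule can be sketched in Python. This is a simplification, not the full shifting procedure: the caller is assumed to supply the inner participants already ranked (which participant counts as first and second is decided by the theory, not by this code), and the semantic labels given for third and later participants are illustrative assumptions. `assign_functors` is a hypothetical name.

```python
def assign_functors(participants):
    """Assign functors to ranked inner participants.

    participants: list of (expression, semantic_label) pairs, already in the
    assumed ranked order. The first is labelled ACT and the second PAT
    regardless of semantic label; later ones keep their semantic label.
    """
    functors = []
    for position, (expression, semantic_label) in enumerate(participants):
        if position == 0:
            functor = "ACT"
        elif position == 1:
            functor = "PAT"
        else:
            functor = semantic_label
        functors.append((expression, functor))
    return functors


# One inner participant: it is the Actor.
print(assign_functors([("kniha", None)]))
# Two inner participants: the second is a Patient, even if it is
# semantically closer to an Effect (label assumed for illustration).
print(assign_functors([("chlapec", None), ("v muže", "EFF")]))
```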
Valency in FGD III.
• valency of autosemantic words
• verbs (Panevová, from the seventies)
  • 5 inner participants: Actor, Patient, Addressee, Origin, Effect
  • approx. 45 free modifications
  • ‘shifting of cognitive roles’ for inner participants
• nouns (esp. Panevová, 2000, Řezníčková, manuscript)
  • verbal complementations
  • specific nominal complementations: Identity, Partitive, Appurtenance, Restrictive and Descriptive Attribute
• adjectives (Panevová, 1998)
  • verbal complementations
  • specific adjectival complementations
Valency structure on TR level of PDT
• the core of annotation on the tectogrammatical level
• problem of consistency → valency lexicon
• verbs, two branches:
  • lists of verbs with their complementations, created and used by annotators (PDT-VALLEX)
  • complex valency lexicon (VALLEX)
• nouns
  • the theoretical aspects and methodology are currently being refined (Řezníčková, manuscript)
  • lists of nouns with their complementations
• adjectives
  • lists of adjectives with their complementations
Valency lexicon of verbs – PDT-VALLEX
• lists being created and used by annotators
• valency frames of verbs in their particular meanings, as they appear during annotation; the lexeme as a whole is not analyzed
• the information specifying elements of frames:
  • ‘functor’, i.e. the name of the complementation type
  • obligatory / optional
  • possible morphemic form(s)
• example(s)
• it serves for consistency of annotation
• approx. 4,700 verbs with 7,150 valency frames (i.e. about 1.5 frames per verb)
dát [to give] ... ACT(1;obl) ADDR(3;obl) PAT(4;obl)
dát někomu knihu [to give sb a book]
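The frame notation used for dát is regular enough to parse mechanically. The Python sketch below reads each slot as functor, morphemic form(s), and an obl/opt flag. The names `Slot` and `parse_frame` are hypothetical, and splitting several forms on commas is an assumption, since the slide only shows single forms. The numbers 1, 3 and 4 are the Czech case numbers for nominative, dative and accusative.

```python
import re
from typing import NamedTuple


class Slot(NamedTuple):
    functor: str      # name of the complementation type, e.g. ACT
    forms: tuple      # morphemic form(s), e.g. ("4",) for accusative
    obligatory: bool  # True for 'obl', False for 'opt'


SLOT_RE = re.compile(
    r"(?P<functor>[A-Z]+)\((?P<forms>[^;()]*);(?P<oblig>obl|opt)\)"
)


def parse_frame(text):
    """Parse a frame such as 'ACT(1;obl) ADDR(3;obl) PAT(4;obl)'."""
    slots = []
    for match in SLOT_RE.finditer(text):
        # Assumption: alternative forms, if any, are comma-separated.
        forms = tuple(f.strip() for f in match.group("forms").split(",")
                      if f.strip())
        slots.append(Slot(match.group("functor"), forms,
                          match.group("oblig") == "obl"))
    return slots


dat = parse_frame("ACT(1;obl) ADDR(3;obl) PAT(4;obl)")
for slot in dat:
    print(slot)
```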
Valency lexicon of verbs – VALLEX
• complex information on the whole verb lexeme in all its meanings (Lopatková, Žabokrtský, 2002)
• the information on particular valency frames, corresponding to its meanings (described with gloss(es) and example(s))
• the information specifying elements of frames:
  • ‘functor’, i.e. the name of the complementation type
  • obligatory / optional
  • possible morphemic form(s)
mluvit [to speak] ... ACT(1;obl) ADDR(s+7;obl) PAT(o+6;opt)
mluvila s ním o dětech [she spoke with him about the children]
• additional syntactic information
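One way to picture how such a frame supports consistent annotation is to compare the functors found in an annotated clause with the frame. The sketch below is an assumption about how that comparison might look, not a description of the actual annotation tools; `check_against_frame` is a hypothetical name. Note that an obligatory complementation may be unexpressed on the surface, so a "missing" functor is a prompt for the annotator rather than an error.

```python
# Frame for 'mluvit' as given on the slide:
# functor -> (morphemic form, obligatory?)
MLUVIT = {
    "ACT": ("1", True),
    "ADDR": ("s+7", True),
    "PAT": ("o+6", False),
}


def check_against_frame(frame, observed):
    """Compare the functors observed in one clause with a valency frame.

    Returns (missing, unlisted): obligatory functors that were not observed,
    and observed functors not listed in the frame (e.g. free modifications).
    """
    missing = [functor for functor, (_, obligatory) in frame.items()
               if obligatory and functor not in observed]
    unlisted = [functor for functor in observed if functor not in frame]
    return missing, unlisted


print(check_against_frame(MLUVIT, ["ACT", "ADDR", "PAT"]))
print(check_against_frame(MLUVIT, ["ACT", "PAT"]))
print(check_against_frame(MLUVIT, ["ACT", "ADDR", "LOC"]))
```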
Valency lexicon of verbs – VALLEX II.
• additional syntactic information for particular valency frames:
  • reflexivity (in progress)
  • reciprocity
  • control
  • aspect and aspectual counterparts
  • possible diatheses, passivization (future plans)
  • primary / secondary / idiomatic usage
  • syntactic/semantic class (in progress)
  • pointers to Czech EuroWordNet (in progress)
  • frequency of a particular frame in samples of ČNK (60 occurrences of each verb lexeme)
Valency lexicon of verbs – VALLEX III.
• current state: 1,400 verbs with 3,860 frames (i.e. 2.7 frames per verb)
• verbs chosen according to their frequency in the Czech National Corpus and PDT
• coverage of about 85% of ‘running text’ in PDT
• open questions
  • enriched valency frame
  • syntactic-semantic classes
  • alternative frames
  • frozen collocations
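The frames-per-verb figures follow directly from the counts given for the two lexicons. A minimal Python check (the function name is hypothetical):

```python
def frames_per_verb(frames, verbs):
    """Average number of valency frames per verb lexeme."""
    return frames / verbs


pdt_vallex = frames_per_verb(7150, 4700)  # counts given for PDT-VALLEX
vallex = frames_per_verb(3860, 1400)      # counts given for VALLEX

print(f"PDT-VALLEX: {pdt_vallex:.2f} frames per verb")
print(f"VALLEX:     {vallex:.2f} frames per verb")
```

The VALLEX ratio works out slightly above the one-decimal figure quoted on the slide, which is consistent with that figure having been truncated rather than rounded.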
Valency lexicon of verbs
• Why two branches?
• PDT-VALLEX ~ ‘extensive’
  • necessary for annotation
  • ‘recall’ improves relatively quickly
• VALLEX ~ ‘intensive’
  • the whole lexeme is analyzed en bloc → adequate and consistent description
  • ‘precision’ improves
• the two branches are supposed to be merged
  • PDT-VALLEX ~ valuable source for VALLEX
Enriched valency frames I.
• inner participants
  • each inner participant can occur only once (with a single occurrence of a verb)
  • the combination of inner participants must be listed for a particular verb
  • morphemic form is predicted by the governing verb
  • concept of ‘shifting’ is applied
• free modifications
  • each free modification can be repeated
  • syntactically, they can modify any verb (often only semantic restrictions are present)
  • they have typical semantics
  • they do not undergo the ‘shifting’
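The first criterion, that an inner participant occurs at most once per verb occurrence while free modifications may repeat, is easy to state as a check. The Python sketch below uses the five inner participants named earlier in the talk; the function name is hypothetical and the third test input is a deliberately ill-formed example made up for illustration.

```python
from collections import Counter

# The five inner participants of FGD.
INNER_PARTICIPANTS = {"ACT", "PAT", "ADDR", "ORIG", "EFF"}


def repeated_inner_participants(functors):
    """Return inner participants that occur more than once with one verb."""
    counts = Counter(f for f in functors if f in INNER_PARTICIPANTS)
    return sorted(f for f, n in counts.items() if n > 1)


# Five distinct inner participants: nothing is repeated.
print(repeated_inner_participants(["ACT", "ADDR", "PAT", "ORIG", "EFF"]))
# A free modification (LOC) may be repeated freely.
print(repeated_inner_participants(["ACT", "LOC", "LOC", "LOC"]))
# Made-up ill-formed input: PAT appears twice.
print(repeated_inner_participants(["ACT", "PAT", "PAT"]))
```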
Enriched valency frames II.
• quasi-valency complementations (also Panevová, 2003)
  • each quasi-valency complementation can occur only once (with any occurrence of a verb)
  • each quasi-valency complementation is characteristic of a limited list of verbs
  • morphemic form is predicted by the governing verb
  • they have typical semantics
  • they do not undergo the shifting
• Obstacle
  uhodit hlavou o větev.OBST [to bump one's head against a bough]
  zavadit o stůl.OBST [to brush against a table]
• Difference
  prodloužit o hodinu.DIFF [to prolong by one hour]
• Mediator
  vzít někoho za ruku.MDT [to take sb by his/her hand]
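Two of these properties, single occurrence and restriction to a limited list of verbs, can be sketched as a check in Python. The verb lists contain only the examples shown on the slide, so they are certainly incomplete; the function name and the ill-formed test inputs are assumptions made for illustration.

```python
from collections import Counter

# Verbs per quasi-valency functor: only the slide's examples, so these
# lists are incomplete by construction.
QUASI_VALENCY = {
    "OBST": {"uhodit", "zavadit"},
    "DIFF": {"prodloužit"},
    "MDT": {"vzít"},
}


def quasi_valency_problems(verb, functors):
    """List violations of the quasi-valency criteria for one verb occurrence."""
    problems = []
    counts = Counter(f for f in functors if f in QUASI_VALENCY)
    for functor, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"{functor} occurs {n} times")
        if verb not in QUASI_VALENCY[functor]:
            problems.append(f"{functor} is not listed for '{verb}'")
    return problems


print(quasi_valency_problems("uhodit", ["ACT", "OBST"]))
print(quasi_valency_problems("vzít", ["ACT", "PAT", "MDT", "MDT"]))
print(quasi_valency_problems("jít", ["ACT", "DIFF"]))
```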
Enriched valency frames III.
• typical modifications
  • optional free modifications ‘commonly’ used with a verb
  • usually modify a group of verbs with similar meaning
  • morphemic form
    • prototypical for some modifications, e.g. Dative case or prep. group pro [for]+Acc for Benefactor
    • determined by the typical semantics of the modifying members, e.g. prep. groups na [on]+Loc and v [in]+Loc typically specify Location
• ‘verbs of motion’ – typically modified by Direction modification (provided that Direction is not obligatory)
  jít do kina / přes les / z domova [to go to the cinema / through the wood / from home]
• ‘verbs of exchange’ – typically modified by modification of Recompense
  dát / dostat / získat / kupovat / brát něco.PAT za něco.RCMP [to give / get / obtain / buy / take something for something]
Exploitation of the valency lexicon
• reaching consistency in assigning the valency structure (PDT-VALLEX)
• automatic syntactic analysis (‘shallow parsing’)
• ‘tectogrammatical parser’
  • automatic system for creating an underlying representation of Czech sentences
• source data for building the valency lexicon of nouns
Resources
• theoretical articles on valency (Panevová)
• The Manual for Tectogrammatical Tagging of the Prague Dependency Treebank (Hajičová et al., 2000)
• lists of particular valency frames created by annotators
• electronic valency dictionary of surface realizations of verbal modifiers (FI MU Brno, Pala, Ševeček, 1997)
• printed dictionaries
  Slovesa pro praxi (SPP, 1997), valency specification of the 767 most frequent verbs
  Slovník spisovného jazyka českého (SSJČ, 1964)
  Slovník spisovné češtiny pro školu a veřejnost (SSČ, 1978)
  Slovník českých synonym (SČS, 1994)
  Slovník české frazeologie a idiomatiky (SČFI, 1983)
• Czech National Corpus (ČNK)
• EuroWordNet, Czech WordNet
References I.
• Dorr, B.J. (2001) LCS Verb Database, Online Software Database of Lexical Conceptual Structures and Documentation, UMCP.
• Fillmore, Ch. (2002) FrameNet and the Linking between Semantic and Syntactic Relations. In: COLING 2002, Proceedings, pp. xxviii-xxxvi.
• Hajičová, E. et al. (2000) A Manual for Tectogrammatical Tagging of the Prague Dependency Treebank. UFAL/CKL Technical Report TR-2000-09.
• Hajičová, E., Kučerová, I. (2002) Argument/Valency Structure in PropBank, LCS Database and Prague Dependency Treebank. In: LREC 2002, Proceedings, pp. 846-851.
• Levin, B. (1993) English Verb Classes and Alternations: A Preliminary Investigation. Chicago: University of Chicago Press.
• Lopatková, M. et al. (2002) Tektogramaticky anotovaný valenční slovník českých sloves. UFAL/CKL Technical Report TR-2002-15.
• Lopatková, M., Žabokrtský, Z. (2002) Valency Dictionary of Czech Verbs. In: LREC 2002, Proceedings, pp. 949-956.
• Lopatková, M. (2003) Valency in the Prague Dependency Treebank: Building the Valency Lexicon. PBML 79. (in press)
References II.
• Pala, K., Ševeček, P. (1997) Valence českých sloves. In: Sborník prací FFUB, Brno.
• Palmer, M. et al. (2001) Automatic Predicate Argument Analysis of the Penn TreeBank. In: HLT 2001, Proceedings, San Francisco: Morgan Kaufmann.
• Panevová, J. (1974-75) On Verbal Frames in Functional Generative Description. Part I, PBML 22, pp. 3-40, Part II, PBML 23, pp. 17-52.
• Panevová, J. (1994) Valency Frames and the Meaning of the Sentence. In: Luelsdorff (ed.) The Prague School of Structural and Functional Linguistics, John Benjamins, pp. 223-243.
• Panevová, J. (1998) Ještě k teorii valence. Slovo a slovesnost 59, pp. 1-14.
• Panevová, J. (2000) Poznámky k valenci podstatných jmen. Čeština - univerzália a specifika 2, Masarykova Univerzita, Brno, pp. 173-180.
• Panevová, J. (2003) Some Issues of Syntax and Semantics of Verbal Modifications. In: Proceedings of MTT 2003, Paris. (in press)
• Sgall, P. et al. (1986) The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. Dordrecht: Reidel, Prague: Academia.