Issues of Valency in Prague Dependency Treebank: Creating Valency Lexicon of Verbs

Markéta Lopatková, Center for Computational Linguistics MFF UK, Prague. CIL XVII, Prague, July 26, 2003.

Presentation Transcript


  1. Issues of Valency in Prague Dependency Treebank: Creating Valency Lexicon of Verbs
     Markéta Lopatková
     Center for Computational Linguistics MFF UK, Prague
     CIL XVII, Prague, July 26, 2003

  2. Motivation
     • ‘traditional’ linguistics
       • source of data for linguistic research
       • verification of theoretical criteria set up
     • natural language processing
       • lemmatization
       • morphological tagging
       • syntactic analysis
       • word sense disambiguation
       • ‘semantic analysis’
       • machine translation
     • building other resources
     • language acquisition

  3. Syntactic vs. semantic approach I.
     • ‘Levin Verb Classes’ (Levin, 1993)
       • hypothesis: syntactic features of verbs are semantically determined
       • method: syntactic behavior → semantic classes
       • ‘alternation’ ~ a change in the realization of the argument structure of a verb
       • ‘conative alternation’: Edith cuts the bread → Edith cuts at the bread
       • classes = verbs which undergo certain types of alternations

  4. Syntactic vs. semantic approach II.
     • PropBank (Palmer et al., 2001)
       • ‘layer of semantic annotation’ in Penn Treebank
       • argument structure for verbs
         arguments: Arg0, ... Arg5
         modifiers: ArgM (LOC, TMP, EXT, PRP, ADV)
     • He was drawing diagrams and sketches for his patron.
       Arg0: he   Rel: drawing   Arg1: diagrams and sketches   Arg2-for: his patron
     • He keeps sth in the fridge.
       Arg0: he   Rel: keeps   Arg1: sth   Arg2-in: the fridge
     (also Hajičová, Kučerová, 2002)
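
The PropBank-style labelling on this slide can be illustrated with a minimal Python sketch that splits a flattened annotation line into a mapping from label to text. The function name `parse_annotation` and the label pattern are assumptions made for illustration; they are not part of PropBank's own tooling.

```python
import re

# Hypothetical label pattern: "Rel", numbered arguments Arg0..Arg5 with an
# optional preposition suffix (Arg2-for), and modifiers such as ArgM-LOC.
LABEL = re.compile(r"\b(Rel|Arg[0-5M](?:-\w+)?):\s*")

def parse_annotation(line):
    """Split 'Arg0: he Rel: drawing ...' into {'Arg0': 'he', 'Rel': 'drawing', ...}."""
    parts = LABEL.split(line.strip())
    # re.split with one capturing group yields [prefix, label, text, label, text, ...]
    labels = parts[1::2]
    values = [value.strip() for value in parts[2::2]]
    return dict(zip(labels, values))

example = "Arg0: he Rel: drawing Arg1: diagrams and sketches Arg2-for: his patron"
for label, text in parse_annotation(example).items():
    print(f"{label:10s} {text}")
```

Keeping the preposition inside the label (`Arg2-for`) mirrors the notation on the slide; a real lexicon would more likely store it as a separate field.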

  5. Syntactic vs. semantic approach III.
     • FrameNet (Fillmore, 2002)
       • it groups lexical items with parallel semantic characterization
       • the structure and particular components correspond to ‘semantic roles’ of the common semantic frame
       • verbs, nouns, adjectives, prepositions
     • ‘Communication’: ‘Speaker’ ‘Message’ ‘Addressee’ ‘Topic’ ‘Medium’
       Tom communicates with Kim about the festival.
       Tom communicates with Kim by letter.
       Tom communicates the message to me.
     • ‘Reciprocality’: ‘Protagonists’ ‘Prot-1’ ‘Prot-2’
       Tom fought with Kim.
       Tom and Kim fought.

  6. Syntactic vs. semantic approach IV.
     • LCS Database (Lexical Conceptual Structure) (Dorr, 2001)
       • semantic representation
       • semantic structure + semantic content
     • verb cut down, lexical item:
       (act_on loc (* thing 1) (* thing 2) ((* [on] 23) loc (*head*) (thing 24)) (cut+ingly 26) (down+/m))
     • sentence: United States cut down (the) quota.
       (act_on loc (us+) (quota+) ((* [on] 23) loc (*head*) (thing 24)) (cut+ingly 26) (down+/m))
     • logic arguments (ag, exp, th, src, goal, info, perc, loc, poss, time, prop)
     • logic modifiers (mod-poss, ben, instr, purp, mod-loc, manner, mod-prop)
     • cut down: _ag_th,mod-loc(on)

  7. Prague Dependency Treebank
     • based on
       • Functional Generative Description (FGD) (Sgall et al., 1986)
         • dependency-oriented
         • stratificational
         • level of underlying representation (‘tectogrammatical level’) (described in Hajičová et al., 2000)
       • valency theory (esp. Panevová, 1994)

  8. Valency in FGD I.
     • complementations:
       • inner participants vs. free modifications
       • obligatory vs. optional
     • valency frame:
       Matka.ACT předělala dětem.ADDR loutku.PAT z Kašpárka.ORIG na čerta.EFF. (Panevová)
       [Mother re-made a puppet for children from a Punch to an imp.]
       V Praze.LOC se sejdeme na Hlavním nádraží.LOC u pokladen.LOC. (Panevová)
       [In Prague we will meet at Main Station near a booking-office.]

  9. Valency in FGD II.
     • a ‘middle position’:
       • syntactic criteria are used for the identification of Actor and Patient (Actor is the first inner participant, the second is always a Patient)
       • other inner participants (Addressee, Origin and Effect) as well as free modifications are determined in accordance with semantic considerations
     • concept of ‘shifting’ (Panevová, 1974-75)
       [Diagram: Origin, Actor, Patient, Addressee, Effect]
       Kniha.ACT vyšla. (Panevová)
       [The book appeared.]
       Chlapec.ACT vyrostl v muže.PAT. (Panevová)
       [A boy grew up into a man.]

  10. Valency in FGD III.
      • valency of autosemantic words
        • verbs (Panevová, from the seventies)
          • 5 inner participants: Actor, Patient, Addressee, Origin, Effect
          • approx. 45 free modifications
          • ‘shifting of cognitive roles’ for inner participants
        • nouns (esp. Panevová, 2000, Řezníčková, manuscript)
          • verbal complementations
          • specific nominal complementations: Identity, Partitive, Appurtenance, Restrictive and Descriptive Attribute
        • adjectives (Panevová, 1998)
          • verbal complementations
          • specific adjectival complementations

  11. Valency structure on TR level of PDT
      • the core of annotation on the tectogrammatical level
      • problem of consistency → valency lexicon
      • verbs, two branches:
        • lists of verbs with their complementations, created and used by annotators (PDT-VALLEX)
        • complex valency lexicon (VALLEX)
      • nouns
        • the theoretical aspects and methodology are currently being refined (Řezníčková, manuscript)
        • lists of nouns with their complementations
      • adjectives
        • lists of adjectives with their complementations

  12. Valency lexicon of verbs – PDT-VALLEX
      • lists being created and used by annotators
      • valency frames of verbs in their particular meanings, as they appear during annotation; the lexeme as a whole is not analyzed
      • the information specifying elements of frames:
        • ‘functor’, i.e. name of complementation type
        • obligatory / optional
        • possible morphemic form(s)
      • example(s)
      • it serves for consistency of annotation
      • approx. 4 700 verbs with 7 150 valency frames (i.e. 1.5 frames per verb)
      dát [to give] ... ACT(1;obl) ADDR(3;obl) PAT(4;obl)
      dát někomu knihu [to give sb a book]
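
To make the frame notation concrete, here is a minimal Python sketch that reads an entry such as the one given for dát into structured slots. The `Slot` type, the `parse_frame` function, and the regular expression are illustrative assumptions; the actual PDT-VALLEX storage format is not described on the slide.

```python
import re
from typing import NamedTuple

class Slot(NamedTuple):
    functor: str      # name of the complementation type, e.g. ACT
    form: str         # morphemic form as written on the slide, e.g. "1" or "s+7"
    obligatory: bool  # True for "obl", False for "opt"

# One slot looks like FUNCTOR(form;obl) or FUNCTOR(form;opt).
SLOT = re.compile(r"([A-Z]+)\(([^;()]+);(obl|opt)\)")

def parse_frame(text):
    """Return the list of slots found in a frame string."""
    return [Slot(functor, form, kind == "obl")
            for functor, form, kind in SLOT.findall(text)]

for slot in parse_frame("ACT(1;obl) ADDR(3;obl) PAT(4;obl)"):
    status = "obligatory" if slot.obligatory else "optional"
    print(slot.functor, slot.form, status)
```

Each slot carries exactly the three pieces of information the slide lists for frame elements: functor, obligatoriness, and morphemic form.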

  13. Valency lexicon of verbs – VALLEX
      • complex information on the whole verb lexeme in all its meanings (Lopatková, Žabokrtský, 2002)
      • the information on particular valency frames, corresponding to its meanings (described with gloss(es) and example(s))
      • the information specifying elements of frames:
        • ‘functor’, i.e. name of complementation type
        • obligatory / optional
        • possible morphemic form(s)
      mluvit [to speak] ... ACT(1;obl) ADDR(s+7;obl) PAT(o+6;opt)
      mluvila s ním o dětech [she spoke with him about their children]
      • additional syntactic information
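
The morphemic forms in the mluvit frame combine an optional preposition with a case number. The sketch below spells such a form out in words, assuming the conventional Czech case numbering (1 nominative through 7 instrumental); the function name `describe_form` is hypothetical.

```python
# Conventional numbering of Czech cases.
CASES = {
    1: "nominative",
    2: "genitive",
    3: "dative",
    4: "accusative",
    5: "vocative",
    6: "locative",
    7: "instrumental",
}

def describe_form(form):
    """Turn '1' into 'nominative' and 's+7' into 's + instrumental'."""
    # rpartition leaves the preposition empty when there is no '+'.
    preposition, _, case_number = form.rpartition("+")
    case_name = CASES[int(case_number)]
    return f"{preposition} + {case_name}" if preposition else case_name

for functor, form in [("ACT", "1"), ("ADDR", "s+7"), ("PAT", "o+6")]:
    print(functor, "->", describe_form(form))
```

Read this way, the frame says that the Addressee of mluvit is expressed by s with the instrumental (s ním) and the optional Patient by o with the locative (o dětech).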

  14. Valency lexicon of verbs – VALLEX II.
      • additional syntactic information for particular valency frames:
        • reflexivity (in progress)
        • reciprocity
        • control
        • aspect and aspectual counterparts
        • possible diatheses, passivization (future plans)
        • primary / secondary / idiomatic usage
        • syntactic/semantic class (in progress)
        • pointers to Czech EuroWordNet (in progress)
        • frequency of a particular frame in samples of ČNK (60 occurrences of each verb lexeme)

  15. Valency lexicon of verbs – VALLEX III.
      • current state: 1 400 verbs with 3 860 frames (i.e. 2.7 frames per verb)
        • verbs chosen according to their frequency in Czech National Corpus and PDT
        • about 85% coverage of ‘running text’ in PDT
      • open questions
        • enriched valency frame
        • syntactic-semantic classes
        • alternative frames
        • frozen collocations
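
The frames-per-verb averages quoted for the two lexicons can be recomputed from the counts on the slides. This short Python check uses only those counts; the helper name `frames_per_verb` is an assumption for illustration.

```python
def frames_per_verb(frames, verbs):
    """Average number of valency frames per verb lexeme."""
    return frames / verbs

# Counts as reported on the slides (approximate for PDT-VALLEX).
pdt_vallex = frames_per_verb(7150, 4700)
vallex = frames_per_verb(3860, 1400)

print(f"PDT-VALLEX: {pdt_vallex:.2f} frames per verb")  # 1.52
print(f"VALLEX:     {vallex:.2f} frames per verb")      # 2.76
```

The more precise values, about 1.52 and 2.76, agree with the slides' rounded figures and show that the ‘intensive’ VALLEX description records noticeably more frames per verb than the annotation-driven lists.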

  16. Valency lexicon of verbs
      • Why two branches?
      • PDT-VALLEX ~ ‘extensive’
        • necessary for annotation
        • ‘recall’ improves relatively quickly
      • VALLEX ~ ‘intensive’
        • the whole lexeme is analyzed en bloc → adequate and consistent description
        • ‘precision’ improves
      • the two branches are supposed to be merged
        • PDT-VALLEX ~ valuable source for VALLEX

  17. Enriched valency frames I.
      • inner participants
        • each inner participant can occur only once (with a single occurrence of a verb)
        • combination of inner participants must be listed for a particular verb
        • morphemic form is predicted by the governing verb
        • concept of ‘shifting’ is applied
      • free modifications
        • each free modification can be repeated
        • syntactically, they can modify any verb (only semantic restrictions are often present)
        • they have typical semantics
        • they do not undergo the ‘shifting’

  18. Enriched valency frames II.
      • quasi-valency complementations (also Panevová, 2003)
        • each quasi-valency complementation can occur only once (with any occurrence of a verb)
        • each quasi-valency complementation is characteristic for a limited list of verbs
        • morphemic form is predicted by the governing verb
        • they have typical semantics
        • they do not undergo the shifting
      • Obstacle
        uhodit hlavou o větev.OBST [to bump one's head against a bough]
        zavadit o stůl.OBST [to brush against a table]
      • Difference
        prodloužit o hodinu.DIFF [to prolong by one hour]
      • Mediator
        vzít někoho za ruku.MDT [to take sb by his/her hand]
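
The repetition rules for the three kinds of complementations (inner participants and quasi-valency complementations at most once per verb occurrence, free modifications repeatable) can be sketched as a simple check over the functors attached to one verb. The set names and the function `repeated_unique_functors` are hypothetical, and the quasi-valency set contains only the three functors named on the slide.

```python
from collections import Counter

INNER_PARTICIPANTS = {"ACT", "PAT", "ADDR", "ORIG", "EFF"}
QUASI_VALENCY = {"OBST", "DIFF", "MDT"}  # only the examples named on the slide

def repeated_unique_functors(functors):
    """Return functors that occur more than once although they may occur only once.

    Any functor outside the two sets is treated as a free modification,
    which may be repeated freely.
    """
    once_only = INNER_PARTICIPANTS | QUASI_VALENCY
    counts = Counter(functors)
    return sorted(f for f, n in counts.items() if n > 1 and f in once_only)

# "V Praze.LOC se sejdeme na Hlavním nádraží.LOC u pokladen.LOC": repeated LOC is fine.
print(repeated_unique_functors(["LOC", "LOC", "LOC"]))   # []
# A hypothetical clause with two Patients violates the constraint.
print(repeated_unique_functors(["ACT", "PAT", "PAT"]))   # ['PAT']
```

A check of this kind only covers the repetition criterion; whether a given combination of inner participants is licensed still has to be looked up in the frame of the particular verb.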

  19. Enriched valency frames III.
      • typical modifications
        • optional free modifications ‘commonly’ used with a verb
        • usually modify a group of verbs with similar meaning
        • morphemic form
          • prototypical for some modifications, e.g. Dative case or prep. group pro [for]+Acc for Benefactor
          • determined by the typical semantics of the modifying members, e.g. prep. groups na [on]+Loc and v [in]+Loc typically specify Location
      • ‘verbs of motion’ – typically modified by Direction modification (provided that Direction is not obligatory)
        jít do kina / přes les / jít z domova [to go to the cinema / through the wood / from home]
      • ‘verbs of exchange’ – typically modified by modification of Recompense
        dát / dostat / získat / kupovat / brát něco.PAT za něco.RCMP [to give / get / obtain / buy / take something for something]

  20. Exploitation of the valency lexicon
      • achieving consistency in assigning the valency structure (PDT-VALLEX)
      • automatic syntactic analysis (‘shallow parsing’)
      • ‘tectogrammatical parser’
        • automatic system for creating an underlying representation of Czech sentences
      • source data for building the valency lexicon of nouns

  21. Resources
      • theoretical articles on valency (Panevová)
      • The Manual for Tectogrammatical Tagging of the Prague Dependency Treebank (Hajičová et al., 2000)
      • lists of particular valency frames created by annotators
      • electronic valency dictionary of surface realizations of verbal modifiers (FI MU Brno, Pala, Ševeček, 1997)
      • printed dictionaries
        • Slovesa pro praxi (SPP, 1997), valency specification of 767 most frequent verbs
        • Slovník spisovného jazyka českého (SSJČ, 1964)
        • Slovník spisovné češtiny pro školu a veřejnost (SSČ, 1978)
        • Slovník českých synonym (SČS, 1994)
        • Slovník české frazeologie a idiomatiky (SČFI, 1983)
      • Czech National Corpus (ČNK)
      • EuroWordNet, Czech WordNet

  22. References I.
      • Dorr, B.J. (2001) LCS Verb Database, Online Software Database of Conceptual Structures and Documentations, UMCP.
      • Fillmore, Ch. (2002) FrameNet and the Linking between Semantic and Syntactic Relations. In: COLING 2002, Proceedings, pp. xxviii-xxxvi.
      • Hajičová, E. et al. (2000) A Manual for Tectogrammatical Tagging of the Prague Dependency Treebank. UFAL/CKL Technical Report TR-2000-09.
      • Hajičová, E., Kučerová, I. (2002) Argument/Valency Structure in PropBank, LCS Database and Prague Dependency Treebank. In: LREC 2002, Proceedings, pp. 846-851.
      • Levin, B. (1993) English Verb Classes and Alternations: A Preliminary Investigation. Chicago: University of Chicago.
      • Lopatková, M. et al. (2002) Tektogramaticky anotovaný valenční slovník českých sloves. UFAL/CKL Technical Report TR-2002-15.
      • Lopatková, M., Žabokrtský, Z. (2002) Valency Dictionary of Czech Verbs. In: LREC 2002, Proceedings, pp. 949-956.
      • Lopatková, M. (2003) Valency in the Prague Dependency Treebank: Building the Valency Lexicon. PBML 79. (in press)

  23. References II.
      • Pala, K., Ševeček, P. (1997) Valence českých sloves. In: Sborník prací FFUB, Brno.
      • Palmer, M. et al. (2001) Automatic Predicate Argument Analysis of the Penn TreeBank. In: HLT 2001, Proceedings, San Francisco: Morgan Kaufmann.
      • Panevová, J. (1974-75) On Verbal Frames in Functional Generative Description. Part I, PBML 22, pp. 3-40, Part II, PBML 23, pp. 17-52.
      • Panevová, J. (1994) Valency Frames and the Meaning of the Sentence. In: Luelsdorff (ed.) The Prague School of Structural and Functional Linguistics, John Benjamins, pp. 223-243.
      • Panevová, J. (1998) Ještě k teorii valence. Slovo a slovesnost 59, pp. 1-14.
      • Panevová, J. (2000) Poznámky k valenci podstatných jmen. Čeština - univerzália a specifika 2, Masarykova Univerzita, Brno, pp. 173-180.
      • Panevová, J. (2003) Some Issues of Syntax and Semantics of Verbal Modifications. In: Proceedings of MTT 2003, Paris. (in press)
      • Sgall, P. et al. (1986) The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. Dordrecht: Reidel, Prague: Academia.
