A renewed Portuguese module for INTEX 4.3x
Cristina Mota
LabEL (CAUTL/IST) and Linguateca
Av. Rovisco Pais I, 1049-001 Lisboa, Portugal
cristina@label.ist.utl.pt
6th Intex Workshop, Sofia, Bulgaria, May 28-30
Overview
• The aim of this presentation is to show how the new features of INTEX 4.3x helped in the representation and treatment of problems specific to Portuguese.
• Two major issues will be discussed:
  • The analysis of diminutive, augmentative and superlative forms, in particular those with accented base forms. For instance:
    • pá / pazinha (shovel / small shovel);
    • rápido / rapidíssimo (fast / very fast).
  • The analysis of modified verbal and clitic forms. Example:
    • Nós comprámos um livro (We bought a book); Nós comprámo-lo (We bought it); *Nós comprámos-o (ungrammatical).
Analysis of diminutive, augmentative and superlative forms
Nouns and adjectives in Portuguese vary in gender and number. Besides the gender and number morphemes, they also accept diminutive and augmentative suffixes; adjectives additionally accept superlative suffixes.
Dim.
coelho [rabbit] → coelhinho, coelhinha, coelhinhos, coelhinhas [little rabbit]: the diminutive suffix -inho (or -ito) attaches to the base form, replacing its final vowel, and inflects in gender and number.
leão [lion] → leãozinho, leoazinha, leõezinhos, leoazinhas [little lion]: the diminutive suffix -zinho (or -zito) is added to the inflected base form and inflects in gender and number.
Aug.
carro [car] → carrão, carrões [big car]
Sup.
denso [dense] → densíssimo, densíssima, densíssimos, densíssimas [very dense]
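The -zinho rule operates on inflected forms rather than on the lemma. As a minimal Python sketch (the function name and the tag-to-form dictionary are illustrative assumptions, not part of the INTEX module), the four leão diminutives can be derived from the four inflected base forms, which are supplied by hand because Portuguese plurals cannot always be predicted from the singular:

```python
def zinho_diminutives(inflected: dict[str, str]) -> dict[str, str]:
    """Build -zinho/-zinha diminutives from the inflected forms of a base.

    `inflected` maps a gender+number tag ("ms", "fs", "mp", "fp") to the
    corresponding inflected base form, e.g. {"mp": "leões"}.
    """
    result = {}
    for tag, form in inflected.items():
        gender, number = tag[0], tag[1]
        suffix = "zinho" if gender == "m" else "zinha"
        if number == "s":
            # Singular: the suffix is appended to the singular form.
            result[tag] = form + suffix
        else:
            # Plural: the plural -s of the base moves to the end of the
            # suffix (leões -> leõe + zinho + s).
            if not form.endswith("s"):
                raise ValueError(f"expected a plural form ending in -s: {form!r}")
            result[tag] = form[:-1] + suffix + "s"
    return result


print(zinho_diminutives({"ms": "leão", "fs": "leoa", "mp": "leões", "fp": "leoas"}))
# {'ms': 'leãozinho', 'fs': 'leoazinha', 'mp': 'leõezinhos', 'fp': 'leoazinhas'}
```

Working from the inflected forms is what makes leõezinhos come out right: the irregular plural leões is taken as given rather than recomputed from leão.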
Analysis of diminutive, augmentative and superlative forms
• Representation by Inflectional Graphs
• Before the new morphological parser, the only way to recognize nouns and adjectives with grade variation was to add a code to their DELAS entries that generated the corresponding diminutive, augmentative and superlative forms in the DELAF dictionary.
coelho [rabbit] → coelhinho, coelhinha, coelhinhos, coelhinhas
DELAS: coelho,N001_dh001_dt001
DELAF:
…
coelhinho,coelho.N:ms
coelhinha,coelho.N:fs
coelhinhos,coelho.N:mp
coelhinhas,coelho.N:fp
…
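The expansion from one DELAS entry to many DELAF lines can be pictured as a table lookup per code. The sketch below is hypothetical: the paradigm table, the reading of dh001 as the -inho diminutive paradigm and the function name are assumptions for illustration only, and dt001 is simply skipped because its content is not given here.

```python
import re

# Hypothetical paradigm table: code -> (characters to cut from the lemma,
# list of (ending, gender+number tag)). Reading dh001 as the -inho
# diminutive paradigm is an assumption made for this sketch.
PARADIGMS = {
    "N001": (1, [("o", "ms"), ("a", "fs"), ("os", "mp"), ("as", "fp")]),
    "dh001": (1, [("inho", "ms"), ("inha", "fs"), ("inhos", "mp"), ("inhas", "fp")]),
}


def expand_delas(line: str) -> list[str]:
    """Expand one DELAS entry (lemma,CODE_CODE_...) into DELAF lines."""
    lemma, codes = line.split(",", 1)
    category = re.match(r"[A-Z]+", codes).group()  # "N" from "N001_..."
    delaf = []
    for code in codes.split("_"):
        if code not in PARADIGMS:
            continue  # codes such as dt001 are not modelled in this sketch
        cut, endings = PARADIGMS[code]
        stem = lemma[: len(lemma) - cut]
        for ending, tag in endings:
            form = stem + ending
            # DELAF convention: the lemma field may be left empty when it
            # is identical to the inflected form.
            lemma_field = "" if form == lemma else lemma
            delaf.append(f"{form},{lemma_field}.{category}:{tag}")
    return delaf


for entry in expand_delas("coelho,N001_dh001_dt001"):
    print(entry)
```

Every extra grade paradigm multiplies the number of codes, which is the cost this approach pays for recognizing the diminutives.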
Analysis of diminutive, augmentative and superlative forms
• The Problematic cases
• In Portuguese, words may carry one of four accents: acute (á, é, í, ó, ú), grave (à), circumflex (â, ê, ô) and tilde (ã, õ). A few words carry two accents: an acute accent and a tilde.
• Whenever a noun or an adjective with an acute or circumflex accent has grade variation, the base loses its accent in the corresponding forms. For instance:
pá [shovel] → pazinha
recaída [relapse] → recaidazinha
côdea [crust] → codeazinha
dúvida [doubt] → duvidazinha
célula [cell] → celulazinha
lágrima [tear] → lagrimazinha
• Even though all these diminutives are formed by adding the suffix -zinha, with this first approach the base forms would need different inflectional codes, increasing the number of inflectional graphs.
• To keep a single code, these forms are generated with their accents, and an AWK script then removes the accents to obtain the final DELAF.
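The AWK script itself is not reproduced in the slides. A hypothetical Python equivalent of the same post-processing step, assuming each generated line has the form form,lemma.codes and that the grade suffix is known, could look like this (the function name and the A category code in the example are illustrative):

```python
# Acute and circumflex vowels mapped to their plain counterparts; the tilde
# (ã, õ) is deliberately kept, as in leãozinho.
DEACCENT = str.maketrans("áéíóúâêôÁÉÍÓÚÂÊÔ", "aeiouaeoAEIOUAEO")


def strip_base_accent(delaf_line: str, suffix: str) -> str:
    """Remove acute/circumflex accents from the base part of a generated form.

    `delaf_line` is "form,lemma.codes"; only the part of `form` that precedes
    `suffix` is de-accented, so an accent inside the suffix (-íssimo)
    survives and the lemma keeps its accent.
    """
    form, rest = delaf_line.split(",", 1)
    if not form.endswith(suffix):
        return delaf_line  # not a grade form built with this suffix
    base = form[: len(form) - len(suffix)]
    return f"{base.translate(DEACCENT)}{suffix},{rest}"


print(strip_base_accent("pázinha,pá.N:fs", "zinha"))          # pazinha,pá.N:fs
print(strip_base_accent("rápidíssimo,rápido.A:ms", "íssimo"))  # rapidíssimo,rápido.A:ms
```

Restricting the deletion to the part before the suffix matters for superlatives, whose suffix is itself accented.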
Analysis of diminutive, augmentative and superlative forms
• Representation by Derivational Graphs
• The new morphological parser of INTEX 4.3x makes it possible to represent the accent deletion process.
[Figure: graph for the analysis of -zinha diminutives of accented words: celulazinha, pazinha, etc.]
[Figure: graph for the analysis of -zinha diminutives of non-accented words: aldeiazinha, aventurazinha, …]
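The graphs are drawn in INTEX's graph editor and are not reproduced here. As a rough, hypothetical Python analogue of what the accented-word graph achieves (this is not how the INTEX morphological parser is implemented), the analysis can be seen as stripping -zinha and then trying the bare stem and every variant with one re-accented vowel against the dictionary. The dictionary fragment and function names are assumptions:

```python
# Hypothetical fragment of a DELAF: simple form -> grammatical field.
DELAF = {
    "aldeia": ".N306+dh306+dt306:fs",
    "célula": ".N306+dh306+dt306:fs",
    "côdea": ".N306+dh306+dt306:fs",
    "pá": ".N306+dh306+dt306:fs",
}

# Accented variants that a base vowel may have lost in the diminutive.
ACCENTS = {"a": "áâ", "e": "éê", "i": "í", "o": "óô", "u": "ú"}


def base_candidates(stem: str):
    """Yield the stem itself, then every variant with one vowel re-accented."""
    yield stem
    for i, ch in enumerate(stem):
        for accented in ACCENTS.get(ch, ""):
            yield stem[:i] + accented + stem[i + 1:]


def analyze_zinha(word: str) -> list[str]:
    """Analyze a -zinha diminutive as {base}{suffix}, restoring a lost accent."""
    suffix = "zinha"
    if not word.endswith(suffix):
        return []
    stem = word[: len(word) - len(suffix)]
    return [
        f"{word},{{{base},{DELAF[base]}}}{{zinha,zinha.SUF+Dim:fs}}"
        for base in base_candidates(stem)
        if base in DELAF
    ]


print(analyze_zinha("codeazinha"))
# ['codeazinha,{côdea,.N306+dh306+dt306:fs}{zinha,zinha.SUF+Dim:fs}']
```

Because the unaccented stem is tried first, the same function also covers the non-accented case (aldeiazinha).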
Analysis of diminutive, augmentative and superlative forms
Results of the Derivational Graphs
After applying the derivational graphs together with the DELAF dictionary, the morphological parser recognizes both the diminutives created from non-accented words:
• aldeiazinha,{aldeia,.N306+dh306+dt306:fs}{zinha,zinha.SUF+Dim:fs}
• aventurazinha,{aventura,.N306+dh306+dt306:fs}{zinha,zinha.SUF+Dim:fs}
and the diminutives created from accented words:
• celulazinha,{célula,.N306+dh306+dt306:fs}{zinha,zinha.SUF+Dim:fs}
• codeazinha,{côdea,.N306+dh306+dt306:fs}{zinha,zinha.SUF+Dim:fs}
• duvidazinha,{dúvida,.N306+dh306+dt306:fs}{zinha,zinha.SUF+Dim:fs}
• lagrimazinha,{lágrima,.N306+dh306+dt306:fs}{zinha,zinha.SUF+Dim:fs}
• pazinha,{pá,.N306+dh306+dt306:fs}{zinha,zinha.SUF+Dim:fs}
• recaidazinha,{recaída,.N306+dh306+dt306:fs}{zinha,zinha.SUF+Dim:fs}
Misleading Results
• Laginha,{Laga,laga.N:fs}{inha,.SUF+Dim:fs}: Laginha is a proper name.
• trocinhos,{trocos,troco.N:mp}{inhos,.SUF+Dim:mp}: the diminutive of trocos is troquinhos, not trocinhos.
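Both misleading analyses ignore a spelling alternation: a hard c or g before a/o must be written qu or gu before the i of -inho (troco → troquinho). One hypothetical way to make such an analysis more restrictive is to respell the proposed base and check that it reproduces the word. The sketch below assumes bases ending in -a, -e or -o (optionally plural); the names are illustrative and this is not a feature of the module:

```python
# Consonant spellings that must change before -inh- to keep the base's sound.
RESPELL = {"c": "qu", "g": "gu", "ç": "c"}


def expected_inho_form(base: str, suffix: str) -> str:
    """Spell the -inho/-inha diminutive of `base` (e.g. 'trocos' + 'inhos')."""
    word = base.lower()
    if word.endswith("s"):
        word = word[:-1]  # the plural -s moves into the suffix (inhos)
    if not word or word[-1] not in "aeo":
        raise ValueError("this sketch only handles bases ending in -a, -e, -o")
    vowel, stem = word[-1], word[:-1]
    if stem and stem[-1] in RESPELL and vowel in "ao":
        # Hard c/g before a/o (troco, laga) are written qu/gu before i;
        # ç before a/o is written c before i (pedaço -> pedacinho).
        stem = stem[:-1] + RESPELL[stem[-1]]
    return stem + suffix


def plausible_analysis(word: str, base: str, suffix: str) -> bool:
    """Accept {base}{suffix} only if the base respells back to `word`."""
    return expected_inho_form(base, suffix) == word.lower()


print(plausible_analysis("trocinhos", "trocos", "inhos"))   # False
print(plausible_analysis("troquinhos", "trocos", "inhos"))  # True
```

This check rejects trocinhos and, incidentally, Laginha (which would have to be laguinha), but it cannot tell proper names from common nouns in general.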
Analysis of diminutive, augmentative and superlative forms
Solution A
Remove diminutives, augmentatives and superlatives from the DELAF. Since these forms can be homographs of other words, the derivational graphs will have to be very restrictive; they will be used with normal priority.
Solution B
Keep diminutives, augmentatives and superlatives in the DELAF. The derivational graphs can then be more flexible and designed to make it easy to enlarge the DELAS. They will be used with low priority.
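The practical difference between the two solutions lies largely in when the graphs are consulted. A minimal sketch of two priority levels, assuming "normal" means the graph's analyses are added to the dictionary's and "low" means the graph is tried only for words the dictionary does not recognize; this mirrors the idea but is not INTEX's own priority mechanism, and the function name is hypothetical:

```python
from typing import Callable


def analyze(word: str,
            delaf: dict[str, list[str]],
            graph: Callable[[str], list[str]],
            priority: str = "low") -> list[str]:
    """Combine dictionary lookup with a derivational graph.

    priority="normal": the graph's analyses are added to the dictionary's.
    priority="low":    the graph is consulted only for words the dictionary
                       does not recognize.
    """
    known = delaf.get(word, [])
    if priority == "low":
        return known if known else graph(word)
    if priority == "normal":
        return known + graph(word)
    raise ValueError(f"unknown priority: {priority!r}")
```

Under Solution B a diminutive already listed in the DELAF never reaches the graph, so a permissive graph only affects unknown words; under Solution A every word is exposed to the graph, which is why it must be restrictive.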
Verb-Clitic Analysis
• When the clitic pronouns o (3ms), a (3fs), os (3mp) and as (3fp) follow the verbal form, bound to it by a hyphen, they may undergo formal modifications depending on how the verbal form ends. If the verbal form ends in:
• a vowel or an oral diphthong, the clitic forms are not modified: o, a, os, as;
• a nasal diphthong, the clitic forms change to no, na, nos, nas;
• -r, -s or -z, the clitic forms change to lo, la, los, las. In this context the verbal form is also modified, losing its final consonant. In forms ending in -r, the preceding vowel may receive an accent (acute or circumflex, depending on the thematic vowel of the verb, e.g. comprá-lo, vendê-lo).
No modification
Ele comprou um livro ontem [He bought a book yesterday]
Ele comprou-o ontem [He bought-it yesterday]
Clitic modification
Eles compraram um livro ontem [They bought a book yesterday]
Eles compraram-no ontem [They bought-it yesterday]
Verbal form and clitic modification
Nós comprámos um livro ontem [We bought a book yesterday]
Nós comprámo(s)-lo ontem [We bought-it yesterday]
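The three cases above can be written as a small rewriting function. This is a hypothetical sketch, not the module's inflectional graphs: the list of nasal endings is an approximation of "nasal diphthong", the accent table covers -ar, -er and -or verbs, and -z forms (faz → fá-lo) are deliberately left out because they need accent rules the slide does not spell out.

```python
# Accent placed on the vowel before a dropped -r (comprar -> comprá-lo,
# vender -> vendê-lo, compor -> compô-lo); other vowels are left as they
# are (partir -> parti-lo).
R_ACCENT = {"a": "á", "e": "ê", "o": "ô"}

# Approximation of verb endings pronounced as nasal diphthongs.
NASAL_ENDINGS = ("am", "em", "ão", "õe")


def attach_clitic(verb: str, clitic: str) -> str:
    """Attach o/a/os/as enclitically, applying the allomorphy rules."""
    if clitic not in ("o", "a", "os", "as"):
        raise ValueError(f"not an accusative clitic: {clitic!r}")
    if verb.endswith(NASAL_ENDINGS):
        return f"{verb}-n{clitic}"                        # compraram-no
    if verb.endswith("r"):
        stem = verb[:-1]                                  # compra
        stem = stem[:-1] + R_ACCENT.get(stem[-1], stem[-1])
        return f"{stem}-l{clitic}"                        # comprá-lo
    if verb.endswith("s"):
        return f"{verb[:-1]}-l{clitic}"                   # comprámo-lo
    if verb.endswith("z"):
        raise NotImplementedError("-z forms are not handled in this sketch")
    return f"{verb}-{clitic}"                             # comprou-o


for verb in ("comprou", "compraram", "comprámos", "comprar"):
    print(attach_clitic(verb, "o"))
```

The nasal check runs first so that a form such as põe yields põe-no, while pões (ending in -s) falls through to the -s rule and yields põe-lo.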
Verb-Clitic Analysis
Simple present of the verb comprar (to buy):
compro,comprar.V:P1s
compras,comprar.V:P2s
compra,comprar.V:P2s:P2's:P3s:Y2s
compramos,comprar.V:P1p
compramo,comprar.V:P1p
comprais,comprar.V:P2p
comprai,comprar.V:P2p:Y2p
compram,comprar.V:P2'p:P3p
The readings compra (P2s), compramo (P1p) and comprai (P2p) correspond to verbal forms that are modified by the presence of clitics.
In the presence of the reflexive and dative pronouns nos (1p) and vos (2p), the first and second person plural verbal forms ending in -s are modified, while the clitics themselves are not.
Verbal form modification
Nós vestimo(s)-nos [We dressed ourselves]
The modified verbal and clitic forms are described in the inflectional graphs and are therefore generated together with the non-modified forms when the DELAF is created. However, the INTEX 4.2x DELAF lacked the clitic information needed to (i) distinguish the two kinds of forms and (ii) guarantee the correct combination of a verbal form with a clitic.
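How the extra entries in the listing arise can be sketched as a post-generation step that adds an -s-less variant for the P2s, P1p and P2p forms. This is a hypothetical flat analogue of what the inflectional graphs describe, with an illustrative tag set and function name:

```python
# Tags whose -s-final forms also get an -s-less variant used before clitics
# (assumed from the comprar listing: compra P2s, compramo P1p, comprai P2p).
MODIFIABLE_TAGS = {"P2s", "P1p", "P2p"}


def add_modified_forms(entries: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Given (form, tag) pairs, append the clitic-modified variant of each
    -s-final form whose tag is in MODIFIABLE_TAGS."""
    extra = [
        (form[:-1], tag)
        for form, tag in entries
        if tag in MODIFIABLE_TAGS and form.endswith("s")
    ]
    return entries + extra


present = [("compro", "P1s"), ("compras", "P2s"), ("compra", "P3s"),
           ("compramos", "P1p"), ("comprais", "P2p"), ("compram", "P3p")]
for form, tag in add_modified_forms(present):
    print(f"{form},comprar.V:{tag}")
```

The output shows the problem the slide describes: compra now appears both as P3s and as a modified P2s, with nothing in the entries to tell the two readings apart or to say which clitics each one accepts.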
Verb-Clitic Analysis
In the new module, clitic information was added to the verbal forms and to the accusative, dative and reflexive clitic forms. The possible clitic codes are:
i  the verbal form may occur without clitics
c  the clitic is not modified and does not modify the verbal form
o  clitic forms o, a, os, as
l  clitic forms lo, la, los, las
n  clitic forms no, na, nos, nas
q  the clitic may modify the verbal form (nos and vos)
This information is enclosed in square brackets in the inflectional features field:
compro,comprar.V:P1s[icqo]
compras,comprar.V:P2s[icq]
compra,comprar.V:P2s[l]:P4s[icqo]:P3s[icqo]:Y2s[icqo]
compramos,comprar.V:P1p[ic] (the form occurs with clitic c or without clitics, i)
compramo,comprar.V:P1p[ql] (the form occurs only with clitics q and l)
os,eu.PRO:4mp[o]:3mp[o]
los,eu.PRO:4mp[l]:3mp[l]
nos,eu.PRO:1p[q]:4mp[n]:3mp[n]
te,eu.PRO:2s[c]
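A tool that post-processes such a DELAF needs to read the bracketed codes back out of each reading. A minimal parsing sketch, assuming the field layout shown above (form,lemma.CATEGORY:tag[codes]:tag[codes]…); the class and function names are hypothetical:

```python
import re
from dataclasses import dataclass

# One reading: an inflectional tag optionally followed by [clitic codes].
READING = re.compile(r"([^:\[\]]+)(?:\[([a-z]*)\])?")


@dataclass
class Entry:
    form: str
    lemma: str
    category: str
    readings: list[tuple[str, frozenset[str]]]


def parse_entry(line: str) -> Entry:
    """Parse e.g. 'compramo,comprar.V:P1p[ql]' into its parts."""
    form, rest = line.split(",", 1)
    head, *features = rest.split(":")
    lemma, category = head.rsplit(".", 1)
    readings = []
    for feature in features:
        match = READING.fullmatch(feature)
        if match is None:
            raise ValueError(f"malformed feature: {feature!r}")
        # A reading without brackets gets an empty code set.
        readings.append((match.group(1), frozenset(match.group(2) or "")))
    return Entry(form, lemma, category, readings)


entry = parse_entry("compra,comprar.V:P2s[l]:P4s[icqo]:P3s[icqo]:Y2s[icqo]")
print(entry.form, entry.lemma, entry.category, entry.readings[0])
```

Keeping the codes as a set per reading, rather than per entry, preserves the distinction the module relies on: compra accepts only l as P2s but i, c, q and o in its other readings.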
Verb-Clitic Analysis
[Figure: disambiguation grammar for removing incorrect verb-clitic combinations]
The introduction of the clitic codes makes it possible to disambiguate verb-clitic combinations.
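The grammar itself is a graph. Assuming the rule it encodes is "a verb reading and a clitic reading may combine only if their code sets intersect" (our reading of the code table, not a statement from the slides), the filtering can be sketched as follows; the type alias and function names are hypothetical:

```python
# A reading is (inflectional tag, clitic codes), e.g. ("P1p", "ql").
Reading = tuple[str, str]


def compatible(verb_codes: str, clitic_codes: str) -> bool:
    """A verb reading and a clitic reading combine if they share a code."""
    return bool(set(verb_codes) & set(clitic_codes))


def disambiguate(verb: list[Reading], clitic: list[Reading]):
    """Keep only the readings that take part in at least one valid pairing.

    Two empty lists mean the verb-clitic sequence is rejected outright.
    """
    kept_verb = [v for v in verb if any(compatible(v[1], c[1]) for c in clitic)]
    kept_clitic = [c for c in clitic if any(compatible(v[1], c[1]) for v in verb)]
    return kept_verb, kept_clitic


# compra-nos: compra loses its P2s[l] reading, nos keeps only 1p[q].
compra = [("P2s", "l"), ("P4s", "icqo"), ("P3s", "icqo"), ("Y2s", "icqo")]
nos = [("1p", "q"), ("4mp", "n"), ("3mp", "n")]
print(disambiguate(compra, nos))
```

Under the same rule, compramos[ic] followed by os[o] keeps no readings at all, which rejects *compramos-os, while compramo[ql] followed by los[l] is accepted.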
Verb-Clitic Analysis
[Figure: analysis of a future form with a clitic in a negative context, and its substitution by the declarative context: Eles não o comprarão → Eles comprá-lo-ão]
[Figure: analysis of a future form with an included clitic in a declarative context, and its substitution by the negative context: Eles comprá-lo-ão → Eles não o comprarão]
The clitic codes can also be used in syntactic transformations to obtain the correct forms of the verbs and clitics.
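The rewriting such a transformation has to perform on the verb and clitic can be sketched directly. This hypothetical example does not use INTEX or the clitic codes; it only shows the string changes, is limited to future forms with the clitics o, a, os and as, and leaves the subject untouched:

```python
# Future-tense endings, longest first so that "eis" is tried before "ei".
FUTURE_ENDINGS = ("emos", "eis", "ão", "ás", "ei", "á")
# Accent on the vowel left exposed when the stem's -r drops (comprá-, vendê-).
R_ACCENT = {"a": "á", "e": "ê", "o": "ô"}
UNACCENT = {accented: plain for plain, accented in R_ACCENT.items()}


def to_declarative(future: str, clitic: str) -> str:
    """'comprarão' + 'o' (não o comprarão) -> 'comprá-lo-ão'."""
    for ending in FUTURE_ENDINGS:
        if future.endswith(ending):
            stem = future[: len(future) - len(ending)]  # comprar
            break
    else:
        raise ValueError(f"not a future form: {future!r}")
    if not stem.endswith("r"):
        raise ValueError(f"unexpected future stem: {stem!r}")
    base = stem[:-1]                                   # compra
    base = base[:-1] + R_ACCENT.get(base[-1], base[-1])  # comprá
    return f"{base}-l{clitic}-{ending}"


def to_negative(mesoclitic: str) -> str:
    """'comprá-lo-ão' -> 'não o comprarão'."""
    base, clitic, ending = mesoclitic.split("-")
    stem = base[:-1] + UNACCENT.get(base[-1], base[-1]) + "r"
    return f"não {clitic[1:]} {stem}{ending}"


print(to_declarative("comprarão", "o"))  # comprá-lo-ão
print(to_negative("comprá-lo-ão"))       # não o comprarão
```

The -r rule is the same one that produces comprá-lo from comprar, applied here to the future stem, which is why the clitic always surfaces as lo, la, los or las inside the verb.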
The Portuguese 4.3x module
http://label.ist.utl.pt/public-resources.html
• DELAS / DELAF, enhanced with clitic information
• Inflectional Graphs
  • Nouns, Adjectives
  • Verbs
  • Pronouns
  • Determiners, Conjunctions, Prepositions, Adverbs
• DELAC / DELACF
  • Nouns
  • Adverbs
  • Prepositions
  • Conjunctions
• Derivational Graphs
  • Diminutive
  • Augmentative
  • Superlative
  • Other productive processes
• Lexical graphs
  • Roman numerals
  • Cardinal numerals
  • Ordinal numerals
• Acronym dictionaries (and the corresponding description dictionary)
• Local Grammars
  • Auxiliary Verb Tagging
  • Temporal Expressions
  • Numeric Expressions
• Disambiguation Grammars
  • NP containing Adjectives
  • Verb-Clitic sequences
Productive Derivational Creation
First steps are also being taken towards a description of productive derivational processes. The main goal is to analyze unknown words and to help enhance the DELAF.
Productive Derivational Creation
• Remarks
• Even though the graph seems very productive, it should be stressed that it is not meant to replace the inclusion in the DELAS of, for instance, nouns resulting from nominalizations.
• If that were its purpose, the graph would have to be more restrictive:
• <$verbo#ir.V+Nominalization:W> {ção,.SUF:fs}
• and the verb entries would have to account for the possibility of nominalization:
• construir,V+Nominalization
• In any case, it is important to relate the two entries (the verb and the noun) by adding the corresponding information to them:
• construir,V_N=2ção
• construção,N_V=3ir
• The introduction of this type of information will be one of our major concerns.
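The slide does not spell out the link notation. Reading N=2ção as "remove the last 2 characters of the lemma and append ção" (an assumption that is consistent with both example entries), the link can be followed mechanically in either direction. The function names are hypothetical:

```python
import unicodedata


def apply_link(word: str, rule: str) -> str:
    """Apply a rule such as '2ção': drop 2 final characters, append 'ção'."""
    # NFC keeps ç and ã as single characters so the count is per letter.
    word = unicodedata.normalize("NFC", word)
    digits = len(rule) - len(rule.lstrip("0123456789"))
    cut = int(rule[:digits]) if digits else 0
    suffix = unicodedata.normalize("NFC", rule[digits:])
    if cut > len(word):
        raise ValueError(f"cannot remove {cut} characters from {word!r}")
    return word[: len(word) - cut] + suffix


def related_entry(entry: str) -> tuple[str, str]:
    """'construir,V_N=2ção' -> ('N', 'construção')."""
    lemma, info = entry.split(",", 1)
    _category, link = info.split("_", 1)
    target_category, rule = link.split("=", 1)
    return target_category, apply_link(lemma, rule)


print(related_entry("construir,V_N=2ção"))   # ('N', 'construção')
print(related_entry("construção,N_V=3ir"))   # ('V', 'construir')
```

Storing the link as a short edit rule rather than as the full related lemma keeps each DELAS entry compact while still letting a tool recover the verb from the noun and vice versa.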