Macro- or Microstructure? Improving the lexical coverage of an electronic dictionary while enriching microstructural information
X. Blanco, A. Catena, S. Fuentes
Autonomous University of Barcelona
Xavier.Blanco@uab.es
6th INTEX Workshop, Sofia, Bulgaria, 28-30 May 2003
Outline
• Electronic Dictionaries of Spanish
• Macro- or microstructure?
• Case studies: prefixed forms, suffixed forms, cliticized forms
Introduction
[Diagram: Electronic Dictionaries, with the labels HAMT, MT, Indexation, Spelling, Semantics, IR, Morphology, Syntax, Search Engines]
Macrostructure
Electronic Dictionary of simple forms: 80 000 entries – 1 000 000 inflectional forms
Electronic Dictionary of compound forms: 250 000 compound nouns, 4 500 compound adverbs
Other Databases: 75 000 Proper Nouns, 260 000 FFF, etc.
Microstructure Samples: guión (script)
G: N5;4
M: -
T: Abst
C: <textos>
D: cinéma
P: R1P1
R: estándar
V: -
N0: Hum
N1: de <films>
N2: según <textos: lit>
Caus1Func0: escribir
Labor13: adaptar
S0Labor13: adaptación
A2Labor13: adaptado de, basado en
Bon: interesante, apasionante, inteligente
AntiBon: absurdo, incoherente, aburrido
Real: leer
S: script, argumento, intriga, trama
Fr: scénario
En: script
De: Drehbuch
(...)
Macro- or Microstructure?
Unknown words: dame, dámelo, léelo, léeselo ... donne-le-lui
Unknown words: inabatible, inacostumbrado, imborrable, reacostumbrar, reaceptar, complet(o)ísimo, querid(o)ísimo, riquísimo, pastelito, puebl(o)ecito
Macro- or Microstructure?
[Diagram: Lemma, Microstructural information, Derived lemma]
Unknown simple words & Analyzed tokens (PFX & SFX)
323,072 unknown unigrams in Spanish Webpages
68,818 candidates for new simple-word lexical entries
PFX.grf (99): 6,380 analyzed tokens
SFX.grf (54): 271 analyzed tokens
Three different classes of Analyzed tokens
• First case: the constructed form is lexicalized.
• Then we need to add a new, independent entry, e.g. prelavado (prewashing), macrofiesta...
Three different classes of Analyzed tokens
• Second case: the constructed form is lexically conditioned.
• Then it must be explicitly indicated in the “Lexical Functions” area of the microstructure of the lexical base.
e.g. superamigos, ??archiamigos, ??hiperamigos, *maxiamigos
“somos superamigos” = “somos muy amigos”, “somos amigos íntimos” (close friends)
Lexical Function = Magn (close friends, heavy smoker, confirmed bachelor...)
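A minimal sketch of the second case, assuming the microstructure of each base simply lists the prefixes that realize Magn for it. The table `MAGN_PREFIXES` and the function `magn_reading` are hypothetical names introduced for illustration; only the acceptability judgments on superamigos and maxiamigos come from the slide.

```python
# Hypothetical microstructure fragment: for each lexical base, the prefixes
# recorded as values of the lexical function Magn. Only "super" is attested
# for "amigo" on the slide; archi- and hiper- are doubtful, maxi- is rejected.
MAGN_PREFIXES = {"amigo": {"super"}}


def magn_reading(prefix, base):
    """Return 'Magn(base)' if prefix is a recorded Magn value of base, else None."""
    if prefix in MAGN_PREFIXES.get(base, set()):
        return f"Magn({base})"
    return None


print(magn_reading("super", "amigo"))  # accepted: recorded in the microstructure
print(magn_reading("maxi", "amigo"))   # rejected: not lexically licensed
```

Because the licence is stored on the base, adding a new intensifying prefix for one noun does not overgenerate for every other noun.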
Three different classes of Analyzed tokens
• Third case: the constructed form is actually a constructed form! In other words, a lexical unit of the dictionary plus a prefix that expresses a value of the actualisation of this lexical unit: tense, aspect, diathesis, negation or quantification.
The bad news: in order to generate reasonable hypotheses, we need to represent some linguistic constraints.
Some linguistic constraints
anteparto,{ante,ante.PFX+tps+anterioridad},{parto,parto.N1+Abst:ms}
antepuerto,{ante,ante.PFX+posición},{puerto,puerto.N1+Loc:ms}
Pattern: ex + Nhum<post>:
• exrector,{ex,ex.PFX+tps+anterioridad},{rector,rector.N51:ms}
• expresidente,{ex,ex.PFX+tps+anterioridad},{presidente,presidente.N50:ms}
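The constraints above can be sketched as a lookup that pairs each prefix reading with the semantic classes it accepts. This is an illustration only, not the PFX.grf graph: the dictionary layout, the function name `analyze_prefixed`, and the class restrictions (temporal ante- on Abst, positional ante- on Loc, ex- on Hum) are assumptions read off the four examples.

```python
# Toy lexicon: lemma -> semantic class. Abst and Loc come from the slide;
# Hum for rector/presidente is inferred from the pattern "ex + Nhum".
LEXICON = {"parto": "Abst", "puerto": "Loc", "rector": "Hum", "presidente": "Hum"}

# Prefix -> list of (reading, semantic classes the base must belong to).
# The class restrictions are assumptions for illustration.
PREFIX_READINGS = {
    "ante": [("tps+anterioridad", {"Abst"}), ("posición", {"Loc"})],
    "ex": [("tps+anterioridad", {"Hum"})],
}


def analyze_prefixed(token):
    """Return (prefix, reading, base) triples licensed by the constraints."""
    analyses = []
    for prefix, readings in PREFIX_READINGS.items():
        if not token.startswith(prefix):
            continue
        base = token[len(prefix):]
        if base not in LEXICON:
            continue
        for reading, allowed in readings:
            if LEXICON[base] in allowed:
                analyses.append((prefix, reading, base))
    return analyses


for word in ("anteparto", "antepuerto", "exrector", "exparto"):
    print(word, analyze_prefixed(word))
```

Without the class check, anteparto and antepuerto would each receive both readings of ante-, and a form such as exparto would be accepted.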
Prefixes: Tense
ante,PFX+temporal+anterioridad
ex,PFX+temporal+anterioridad
neo,PFX+temporal+posterioridad
pre,PFX+temporal+anterioridad
pos,PFX+temporal+posterioridad
post,PFX+temporal+posterioridad
Prefixes: Aspect & Aktionsart
re,PFX+modo de acción+iteración
sobre,PFX+modo de acción+nimifactivo
sub,PFX+modo de acción+refactivo
Prefixes: Diathesis
auto,PFX+diatesis+reflexividad
co,PFX+diatesis+comitativo
entre,PFX+diatesis+reciprocidad
Prefixes: Negation
anti,PFX+negación+oposición
contra,PFX+negación+oposición
des,PFX+negación+privación
des,PFX+negación+reversión
in,PFX+negación
sin,PFX+negación+privación
Prefixes: Quantification
multi,PFX+cuantificación
octa,PFX+cuantificación
octo,PFX+cuantificación
penta,PFX+cuantificación
pluri,PFX+cuantificación
poli,PFX+cuantificación
sex,PFX+cuantificación
tetra,PFX+cuantificación
tri,PFX+cuantificación
uni,PFX+cuantificación
bi,PFX+cuantificación
cuatri,PFX+cuantificación
deca,PFX+cuantificación
dodeca,PFX+cuantificación
endeca,PFX+cuantificación
enea,PFX+cuantificación
hecto,PFX+cuantificación
hepta,PFX+cuantificación
hexa,PFX+cuantificación
mono,PFX+cuantificación
Prefixes: Lexical Functions
infra,PFX+FL+AntiMagn
infra,PFX+FL+Loc
inter,PFX+FL+Loc
intra,PFX+FL+Loc
intro,PFX+FL+Loc
iso,PFX+FL+Magn
macro,PFX+FL+Magn
mal,PFX+FL+Magn
maxi,PFX+FL+Magn
medio,PFX+FL+Magn
mega,PFX+FL+Magn
meta,PFX+FL+Magn
micro,PFX+FL+Magn
mini,PFX+FL+Magn
para,PFX+FL+Magn
peri,PFX+FL+Magn
post,PFX+FL+Loc
pre,PFX+FL+Loc
pro,PFX+FL+Loc
pseudo,PFX+FL+AntiVer
re,PFX+FL+Loc
re,PFX+FL+Magn
requete,PFX+FL+Magn
retro,PFX+FL+Loc
seudo,PFX+FL+AntiVer
so,PFX+FL+Magn
sobre,PFX+FL+Loc
sobre,PFX+FL+Magn
soto,PFX+FL+Loc
sub,PFX+FL+AntiMagn
sub,PFX+FL+Loc
super,PFX+FL+Loc
super,PFX+FL+Magn
supra,PFX+FL+Loc
supra,PFX+FL+Loc
trans,PFX+FL+Loc
tras,PFX+FL+Loc
ultra,PFX+FL+Loc
ultra,PFX+FL+Magn
vice,PFX+FL+Loc
ante,PFX+FL+Loc
anti,PFX+FL+Loc
archi,PFX+FL+Magn
casi,PFX+FL+AntiVer
circun,PFX+FL+Loc
cuasi,PFX+FL+AntiVer
endo,PFX+FL+Loc
entre,PFX+FL+Loc
epi,PFX+FL+Loc
equi,PFX+FL
exo,PFX+FL+Loc
extra,PFX+FL+Loc
extra,PFX+FL+Magn
hetero,PFX+FL
hiper,PFX+FL+Magn
hipo,PFX+FL+AntiMagn
homo,PFX+FL
Appreciative Suffixes
ecito,.SFX+diminutivo
eja,.SFX+diminutivo
ejo,.SFX+diminutivo
engue,.SFX+peyorativo
eta,.SFX+diminutivo
ete,.SFX+diminutivo
ica,.SFX+diminutivo
ico,.SFX+diminutivo
illa,.SFX+diminutivo
illo,.SFX+diminutivo
ín,.SFX+diminutivo
ina,.SFX+diminutivo
ingo,.SFX+peyorativo
ingue,.SFX+peyorativo
ita,.SFX+diminutivo
ito,.SFX+diminutivo
ón,.SFX+aumentativo
ona,.SFX+aumentativo
orio,.SFX+peyorativo
orra,.SFX+peyorativo
orrio,.SFX+peyorativo
orro,.SFX+peyorativo
ota,.SFX+aumentativo
ote,.SFX+aumentativo
uca,.SFX+peyorativo
ucha,.SFX+peyorativo
ucho,.SFX+peyorativo
uco,.SFX+peyorativo
uda,.SFX+aumentativo
udo,.SFX+aumentativo
uela,.SFX+diminutivo
uelo,.SFX+diminutivo
uja,.SFX+peyorativo
ujo,.SFX+peyorativo
ute,.SFX+peyorativo
uza,.SFX+peyorativo
acha,.SFX+peyorativo
acho,.SFX+peyorativo
aco,.SFX+peyorativo
aja,.SFX+peyorativo
ajo,.SFX+peyorativo
al,.SFX+aumentativo
ales,.SFX+peyorativo
alla,.SFX+peyorativo
anga,.SFX+peyorativo
ángana,.SFX+peyorativo
ángano,.SFX+peyorativo
ango,.SFX+peyorativo
astra,.SFX+peyorativo
astre,.SFX+peyorativo
astro,.SFX+peyorativo
aza,.SFX+aumentativo
azo,.SFX+aumentativo
ecita,.SFX+diminutivo
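One way such a suffix list could be used is sketched below, with a handful of the entries above and a three-word toy lexicon. It is not a description of SFX.grf. The function name `analyze_suffixed` is hypothetical, and restoring a dropped final vowel is an assumption suggested by the notation puebl(o)ecito.

```python
# A few entries from the suffix list; the full table would be loaded the same way.
SUFFIXES = {
    "ito": "diminutivo",
    "ecito": "diminutivo",
    "ote": "aumentativo",
    "ucho": "peyorativo",
}

# Toy lexicon of known lemmas (assumption for illustration).
LEXICON = {"pastel", "pueblo", "libro"}


def analyze_suffixed(token):
    """Return (lemma, suffix, value) triples for every suffix that leaves a known lemma."""
    results = []
    for suffix, value in SUFFIXES.items():
        if not token.endswith(suffix):
            continue
        stem = token[: -len(suffix)]
        # Try the bare stem first, then restore a dropped final -o or -a.
        for lemma in (stem, stem + "o", stem + "a"):
            if lemma in LEXICON:
                results.append((lemma, suffix, value))
    return results


for word in ("pastelito", "pueblecito", "librote"):
    print(word, analyze_suffixed(word))
```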
Clitic Pronouns
cantarme, cantándome, cántame ...
• cantar,W
• cantando,G
• canta,Y:1s
• cante,Y:2s
• cantemos,Y:1p
• cantad,Y:2p
• canten,Y:3p
Clitic Pronouns
oslas,.CLIT+N2(2p)_N1(fp)
oslo,.CLIT+N2(2p)_N1(ms)
oslos,.CLIT+N2(2p)_N1(mp)
se,.CLIT+Pron/N1/N2-0
sela,.CLIT+N2(3)_N1(fs)
selas,.CLIT+N2(3)_N1(fp)
sele,.CLIT+Pron_N1/N2(3s)
seles,.CLIT+Pron_N1/N2(3p)
selo,.CLIT+N2(3)_N1(ms)
selos,.CLIT+N2(3)_N1(mp)
seme,.CLIT+Pron_N1/N2(1s)
senos,.CLIT+Pron_N1/N2(1p)
seos,.CLIT+Pron_N1/N2(2p)
sete,.CLIT+Pron_N1/N2(2s)
te,.CLIT+N1/N2(2s)
tela,.CLIT+N2(2s)_N1(fs)
telas,.CLIT+N2(2s)_N1(fp)
telo,.CLIT+N2(2s)_N1(ms)
telos,.CLIT+N2(2s)_N1(mp)
la,.CLIT+N1(3fs)
las,.CLIT+N1(3fp)
le,.CLIT+N1/N2(3s)
les,.CLIT+N1/N2(3p)
lo,.CLIT+N1(3ms)
los,.CLIT+N1(3mp)
me,.CLIT+N1/N2(1s)
mela,.CLIT+N2(1s)_N1(fs)
melas,.CLIT+N2(1s)_N1(fp)
melo,.CLIT+N2(1s)_N1(ms)
melos,.CLIT+N2(1s)_N1(mp)
nos,.CLIT+N1/N2(1p)
nosla,.CLIT+N2(1p)_N1(fs)
noslas,.CLIT+N2(1p)_N1(fp)
noslo,.CLIT+N2(1p)_N1(ms)
noslos,.CLIT+N2(1p)_N1(mp)
os,.CLIT+N1/N2(2p)
osla,.CLIT+N2(2p)_N1(fs)
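A minimal sketch of how a cliticized form might be split into a known verb form plus one entry of the clitic table. The function `analyze_clitics` is a hypothetical name. The codes for da and lee are assumed by analogy with canta. Removing the written accent from the stem is a simplification that would fail for verb forms whose accent is not due to the clitic.

```python
# Verb forms with their codes (W infinitive, G gerund, Y imperative, as on
# the slides). The entries for "da" and "lee" are assumptions by analogy.
VERB_FORMS = {"cantar": "W", "cantando": "G", "canta": "Y", "da": "Y", "lee": "Y"}

# A few entries copied from the clitic table.
CLITICS = {
    "me": "N1/N2(1s)",
    "lo": "N1(3ms)",
    "melo": "N2(1s)_N1(ms)",
    "selo": "N2(3)_N1(ms)",
}

# The accent written on the verb when clitics attach is removed before lookup.
DEACCENT = str.maketrans("áéíóú", "aeiou")


def analyze_clitics(token):
    """Return (verb form, verb code, clitic, clitic code) for each valid split."""
    results = []
    for clitic, code in CLITICS.items():
        if not token.endswith(clitic) or len(token) == len(clitic):
            continue
        stem = token[: -len(clitic)].translate(DEACCENT)
        if stem in VERB_FORMS:
            results.append((stem, VERB_FORMS[stem], clitic, code))
    return results


for word in ("cantarme", "cantándome", "cántame", "dámelo", "léeselo"):
    print(word, analyze_clitics(word))
```

Requiring the remaining stem to be a known verb form is what rules out the spurious split dáme + lo.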
Verbal Structures
discurrir#1/N0:Tps/Fr:s’écouler/En:to pass
discurrir#2/N0:Hum/N1:Prép Loc/Fr:parcourir/En:to wander
discurrir#3/N0:Inc<líquidos>/N1:Prép Loc/Fr:couler/En:to flow
discurrir#4/N0:Loc<vías fluv.>/N1:Prép Loc/Fr:couler/En:to flow
discurrir#5/N0:Hum/N1:Abst/Fr:inventer/En:to think up
discurrir#6/N0:Hum/N1:sobre Abst/Fr:réfléchir/En:to think
discurrir#7/N0:Hum/N1:sobre Abst/N2:con Hum/En:to discourse
e.g.: Así que tendréis que discurrirlo vosotros => discurrir#5
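The example can be read as a filter over argument structures. The clitic lo is coded N1(3ms), so it fills N1 directly, and only a sense whose N1 takes no preposition is compatible. The sketch below encodes the seven structures as a dictionary. The names `PREPOSITION_MARKERS`, `is_direct_argument` and `senses_taking_direct_n1` are hypothetical, and treating the first word of an argument description as its preposition marker is an assumption.

```python
# The seven argument structures of discurrir, transcribed from the slide.
STRUCTURES = {
    1: {"N0": "Tps"},
    2: {"N0": "Hum", "N1": "Prép Loc"},
    3: {"N0": "Inc<líquidos>", "N1": "Prép Loc"},
    4: {"N0": "Loc<vías fluv.>", "N1": "Prép Loc"},
    5: {"N0": "Hum", "N1": "Abst"},
    6: {"N0": "Hum", "N1": "sobre Abst"},
    7: {"N0": "Hum", "N1": "sobre Abst", "N2": "con Hum"},
}

# Assumption: an argument is prepositional when its description starts with
# one of these markers.
PREPOSITION_MARKERS = {"Prép", "sobre", "con"}


def is_direct_argument(description):
    """True if the argument is realized without a preposition."""
    return description.split()[0] not in PREPOSITION_MARKERS


def senses_taking_direct_n1(structures):
    """Senses compatible with an accusative clitic such as lo, N1(3ms)."""
    return [
        sense
        for sense, args in structures.items()
        if "N1" in args and is_direct_argument(args["N1"])
    ]


print(senses_taking_direct_n1(STRUCTURES))  # senses compatible with discurrirlo
```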
Conclusions and Perspectives
• A unigram can convey, in a cumulative form, information about actualization, lexical functions and argument structure.
• The mechanism is, in some way, recursive: auto-des-programar, auto-des-program-able... mega-rollitos de primavera...
We need an integrated description of morphology, syntax and semantics!
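The recursive character of the mechanism can be sketched as a function that keeps stripping prefixes until it reaches a known lemma. The prefix tags are taken from the prefix tables. The two-word lexicon and the function name `segment` are assumptions for illustration, and suffixes (as in auto-des-program-able) are left out.

```python
# Prefixes with one of their tags from the prefix tables.
PREFIXES = {
    "auto": "diatesis+reflexividad",
    "des": "negación+reversión",
    "re": "modo de acción+iteración",
}

# Toy lexicon of known lemmas (assumption for illustration).
LEXICON = {"programar", "acostumbrar"}


def segment(token):
    """Return every split of token into a chain of prefixes ending in a known lemma."""
    if token in LEXICON:
        return [[token]]
    analyses = []
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) > len(prefix):
            # Recurse on the remainder; each call works on a shorter string,
            # so the recursion always terminates.
            for rest in segment(token[len(prefix):]):
                analyses.append([prefix] + rest)
    return analyses


for word in ("autodesprogramar", "reacostumbrar", "programar"):
    print(word, segment(word))
```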