X. Blanco, A. Catena, S. Fuentes Autonomous University of Barcelona

Macro- or Microstructure? Improving the lexical coverage of an electronic dictionary while enriching microstructural information.

Presentation Transcript

  1. Macro- or Microstructure? Improving the lexical coverage of an electronic dictionary while enriching microstructural information
     X. Blanco, A. Catena, S. Fuentes, Autonomous University of Barcelona
     Xavier.Blanco@uab.es
     6th INTEX Workshop, Sofia, Bulgaria, 28-30 May 2003

  2. Outline
     • Electronic Dictionaries of Spanish
     • Macro- or microstructure?
     • Case studies: prefixed forms, suffixed forms, cliticized forms

  3. Introduction
     Diagram labels: Electronic Dictionaries, HAMT, MT, Indexation, Spelling, Semantics, IR, Morphology, Syntax, Search Engines

  4. Macrostructure
     Electronic Dictionary of simple forms:
     • 80 000 entries – 1 000 000 inflectional forms
     Electronic Dictionary of compound forms:
     • 250 000 compound nouns
     • 4 500 compound adverbs
     Other Databases:
     • 75 000 Proper Nouns
     • 260 000 FFF
     • etc.

  5. Microstructure Samples: guión (script)
     G: N5;4
     M: -
     T: Abst
     C: <textos>
     D: cinéma
     P: R1P1
     R: estándar
     V: -
     N0: Hum
     N1: de <films>
     N2: según <textos: lit>
     Caus1Func0: escribir
     Labor13: adaptar
     S0Labor13: adaptación
     A2Labor13: adaptado de, basado en
     Bon: interesante, apasionante, inteligente
     AntiBon: absurdo, incoherente, aburrido
     Real: leer
     S: script, argumento, intriga, trama
     Fr: scénario
     En: script
     De: Drehbuch
     (...)

  6. Macro- or Microstructure?
     Unknown words: dame, dámelo, léelo, léeselo ... (French: donne-le-lui)
     Unknown words: inabatible, inacostumbrado, imborrable, reacostumbrar, reaceptar, complet(o)ísimo, querid(o)ísimo, riquísimo, pastelito, puebl(o)ecito

  7. Macro- or Microstructure?
     Diagram labels: Lemma, Microstructural information, Derived lemma

  10. Unknown simple words & Analyzed tokens (PFX & SFX)
      • 323,072 unknown unigrams in Spanish Webpages
      • 68,818 candidates for new simple-word lexical entries
      • PFX.grf (99): 6,380 analyzed tokens
      • SFX.grf (54): 271 analyzed tokens

  11. Three different classes of Analyzed tokens
      • First case: the constructed form is lexicalized.
      • Then, we need to add a new, independent entry, e.g. prelavado (prewashing), macrofiesta...

  12. Three different classes of Analyzed tokens
      • Second case: the constructed form is lexically conditioned.
      • Then, it must be explicitly indicated in the “Lexical Functions” area of the microstructure of the lexical base, e.g. superamigos, ??archiamigos, ??hiperamigos, *maxiamigos
      • “somos superamigos” = “somos muy amigos”, “somos amigos íntimos” (close friends)
      • Lexical Function = Magn (close friends, heavy smoker, confirmed bachelor...)
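
One way to picture this second case is a per-lemma table in the microstructure that records which intensifying prefixes realize Magn for a given base. The sketch below is only an illustration: the dictionary layout and the function names are assumptions, while the judgments for amigo are the ones given on the slide (no mark = accepted, ?? = doubtful, * = rejected).

```python
# Hypothetical microstructure fragment. The layout is an assumption;
# the judgments mirror the slide: "" accepted, "??" doubtful, "*" rejected.
MAGN_PREFIXES = {
    "amigo": {"super": "", "archi": "??", "hiper": "??", "maxi": "*"},
}


def magn_judgment(prefix, base):
    """Return the acceptability mark for prefix+base, or None if unrecorded."""
    return MAGN_PREFIXES.get(base, {}).get(prefix)


def accepted_magn_forms(base):
    """List the prefixed forms recorded as fully acceptable for this base."""
    recorded = MAGN_PREFIXES.get(base, {})
    return [prefix + base for prefix, mark in recorded.items() if mark == ""]
```

With such a table, an analyzer can accept superamigo as Magn(amigo) while refusing to generalize to every intensifying prefix.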

  13. Three different classes of Analyzed tokens
      • Third case: the constructed form is actually a constructed form! In other words, a lexical unit of the dictionary plus a prefix that expresses a value of the actualisation of this lexical unit: tense, aspect, diathesis, negation or quantification.
      • The bad news: in order to generate reasonable hypotheses, we need to represent some linguistic constraints.

  14. Some linguistic constraints
      anteparto,{ante,ante.PFX+tps+anterioridad},{parto,parto.N1+Abst:ms}
      antepuerto,{ante,ante.PFX+posición},{puerto,puerto.N1+Loc:ms}
      Pattern: ex + Nhum<post>:
      • exrector,{ex,ex.PFX+tps+anterioridad},{rector,rector.N51:ms}
      • expresidente,{ex,ex.PFX+tps+anterioridad},{presidente,presidente.N50:ms}
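
The analysis lines above follow a regular shape: a surface form followed by one brace-delimited segment per morpheme, each of the form {form,lemma.CODE+feature...:inflection}. The following is a minimal sketch of how such a line could be read into a data structure. The regular expression and the names `parse_analysis` and `SEGMENT` are assumptions made for illustration, not part of INTEX.

```python
import re

# One {form,lemma.INFO} segment; INFO is CODE+feature+...[:inflection].
SEGMENT = re.compile(r"\{([^,{}]+),([^.{}]+)\.([^{}]+)\}")


def parse_analysis(line):
    """Split an analysis line into its surface form and its segments."""
    surface = line.split(",", 1)[0].strip()
    segments = []
    for form, lemma, info in SEGMENT.findall(line):
        # Inflectional codes, when present, follow the colon (e.g. "ms").
        info, _, inflection = info.partition(":")
        category, *features = info.split("+")
        segments.append({
            "form": form,
            "lemma": lemma,
            "category": category,
            "features": features,
            "inflection": inflection,
        })
    return surface, segments
```

Read this way, anteparto and antepuerto differ only in the features carried by ante (tps+anterioridad versus posición), which is where a constraint on the semantic class of the base (Abst versus Loc) can be checked.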

  15. Prefixes: Tense
      ante,PFX+temporal+anterioridad
      ex,PFX+temporal+anterioridad
      neo,PFX+temporal+posterioridad
      pre,PFX+temporal+anterioridad
      pos,PFX+temporal+posterioridad
      post,PFX+temporal+posterioridad
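
As a rough illustration of how a prefix list like this one can generate hypotheses for an unknown word, the sketch below loads the six entries and proposes every prefix + base split whose base is a known lemma. The toy `LEXICON` and the function names are assumptions; the actual analysis in the talk is performed by INTEX graphs (PFX.grf), not by this code.

```python
PREFIX_ENTRIES = """\
ante,PFX+temporal+anterioridad
ex,PFX+temporal+anterioridad
neo,PFX+temporal+posterioridad
pre,PFX+temporal+anterioridad
pos,PFX+temporal+posterioridad
post,PFX+temporal+posterioridad
"""

# Toy base lexicon (hypothetical): stands in for the dictionary of simple forms.
LEXICON = {"parto", "rector", "lavado"}


def load_prefixes(text):
    """Map each prefix form to its list of features (the PFX code is dropped)."""
    prefixes = {}
    for line in text.splitlines():
        form, info = line.split(",", 1)
        _category, *features = info.split("+")
        prefixes[form] = features
    return prefixes


def hypotheses(word, prefixes, lexicon):
    """Return (prefix, features, base) for each split with a known base."""
    results = []
    for form, features in prefixes.items():
        base = word[len(form):]
        if word.startswith(form) and base in lexicon:
            results.append((form, features, base))
    return results
```

Checking the remainder against the lexicon is what rules out the spurious split pos + tparto for postparto and keeps only post + parto.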

  16. Prefixes: Aspect & Aktionsart
      re,PFX+modo de acción+iteración
      sobre,PFX+modo de acción+nimifactivo
      sub,PFX+modo de acción+refactivo
      Prefixes: Diathesis
      auto,PFX+diatesis+reflexividad
      co,PFX+diatesis+comitativo
      entre,PFX+diatesis+reciprocidad

  17. Prefixes: Negation
      anti,PFX+negación+oposición
      contra,PFX+negación+oposición
      des,PFX+negación+privación
      des,PFX+negación+reversión
      in,PFX+negación
      sin,PFX+negación+privación

  18. Prefixes: Quantification
      multi,PFX+cuantificación
      octa,PFX+cuantificación
      octo,PFX+cuantificación
      penta,PFX+cuantificación
      pluri,PFX+cuantificación
      poli,PFX+cuantificación
      sex,PFX+cuantificación
      tetra,PFX+cuantificación
      tri,PFX+cuantificación
      uni,PFX+cuantificación
      bi,PFX+cuantificación
      cuatri,PFX+cuantificación
      deca,PFX+cuantificación
      dodeca,PFX+cuantificación
      endeca,PFX+cuantificación
      enea,PFX+cuantificación
      hecto,PFX+cuantificación
      hepta,PFX+cuantificación
      hexa,PFX+cuantificación
      mono,PFX+cuantificación

  19. Prefixes: Lexical Functions
      infra,PFX+FL+AnMagn
      infra,PFX+FL+Loc
      inter,PFX+FL+Loc
      intra,PFX+FL+Loc
      intro,PFX+FL+Loc
      iso,PFX+FL+Magn
      macro,PFX+FL+Magn
      mal,PFX+FL+Magn
      maxi,PFX+FL+Magn
      medio,PFX+FL+Magn
      mega,PFX+FL+Magn
      meta,PFX+FL+Magn
      micro,PFX+FL+Magn
      mini,PFX+FL+Magn
      para,PFX+FL+Magn
      peri,PFX+FL+Magn
      post,PFX+FL+Loc
      pre,PFX+FL+Loc
      pro,PFX+FL+Loc
      pseudo,PFX+FL+AntiVer
      re,PFX+FL+Loc
      re,PFX+FL+Magn
      requete,PFX+FL+Magn
      retro,PFX+FL+Loc
      seudo,PFX+FL+AntiVer
      so,PFX+FL+Magn
      sobre,PFX+FL+Loc
      sobre,PFX+FL+Magn
      soto,PFX+FL+Loc
      sub,PFX+FL+AntiMagn
      sub,PFX+FL+Loc
      super,PFX+FL+Loc
      super,PFX+FL+Magn
      supra,PFX+FL+Loc
      supra,PFX+FL+Loc
      trans,PFX+FL+Loc
      tras,PFX+FL+Loc
      ultra,PFX+FL+Loc
      ultra,PFX+FL+Magn
      vice,PFX+FL+Loc
      ante,PFX+FL+Loc
      anti,PFX+FL+Loc
      archi,PFX+FL+Magn
      casi,PFX+FL+AntiVer
      circun,PFX+FL+Loc
      cuasi,PFX+FL+AntiVer
      endo,PFX+FL+Loc
      entre,PFX+FL+Loc
      epi,PFX+FL+Loc
      equi,PFX+FL
      exo,PFX+FL+Loc
      extra,PFX+FL+Loc
      extra,PFX+FL+Magn
      hetero,PFX+FL
      hiper,PFX+FL+Magn
      hipo,PFX+FL+AnMagn
      homo,PFX+FL

  20. Appreciative Suffixes
      ecito,.SFX+diminutivo
      eja,.SFX+diminutivo
      ejo,.SFX+diminutivo
      engue,.SFX+peyorativo
      eta,.SFX+diminutivo
      ete,.SFX+diminutivo
      ica,.SFX+diminutivo
      ico,.SFX+diminutivo
      illa,.SFX+diminutivo
      illo,.SFX+diminutivo
      ín,.SFX+diminutivo
      ina,.SFX+diminutivo
      ingo,.SFX+peyorativo
      ingue,.SFX+peyorativo
      ita,.SFX+diminutivo
      ito,.SFX+diminutivo
      ón,.SFX+aumentativo
      ona,.SFX+aumentativo
      orio,.SFX+peyorativo
      orra,.SFX+peyorativo
      orrio,.SFX+peyorativo
      orro,.SFX+peyorativo
      ota,.SFX+aumentativo
      ote,.SFX+aumentativo
      uca,.SFX+peyorativo
      ucha,.SFX+peyorativo
      ucho,.SFX+peyorativo
      uco,.SFX+peyorativo
      uda,.SFX+aumentativo
      udo,.SFX+aumentativo
      uela,.SFX+diminutivo
      uelo,.SFX+diminutivo
      uja,.SFX+peyorativo
      ujo,.SFX+peyorativo
      ute,.SFX+peyorativo
      uza,.SFX+peyorativo
      acha,.SFX+peyorativo
      acho,.SFX+peyorativo
      aco,.SFX+peyorativo
      aja,.SFX+peyorativo
      ajo,.SFX+peyorativo
      al,.SFX+aumentativo
      ales,.SFX+peyorativo
      alla,.SFX+peyorativo
      anga,.SFX+peyorativo
      ángana,.SFX+peyorativo
      ángano,.SFX+peyorativo
      ango,.SFX+peyorativo
      astra,.SFX+peyorativo
      astre,.SFX+peyorativo
      astro,.SFX+peyorativo
      aza,.SFX+aumentativo
      azo,.SFX+aumentativo
      ecita,.SFX+diminutivo
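
To make the suffix case concrete, the sketch below strips an appreciative suffix and tries to recover a known base, restoring a dropped final vowel where needed, as in puebl(o)ecito from the earlier slide. The vowel-restoration rule (try the stem, then stem + o, then stem + a), the toy lexicon and the function names are simplifying assumptions, not the rules encoded in SFX.grf.

```python
# A few entries copied from the list above.
SUFFIX_ENTRIES = [
    "ecito,.SFX+diminutivo",
    "ito,.SFX+diminutivo",
    "ita,.SFX+diminutivo",
    "ón,.SFX+aumentativo",
]

# Toy base lexicon (hypothetical).
LEXICON = {"pastel", "pueblo", "casa"}


def load_suffixes(entries):
    """Map each suffix form to its appreciative value."""
    table = {}
    for entry in entries:
        form, info = entry.split(",", 1)
        table[form] = info.split("+")[1]
    return table


def analyze_suffixed(word, suffixes, lexicon):
    """Return (base, suffix, value) for each split that yields a known base."""
    results = []
    for suffix, value in suffixes.items():
        if not word.endswith(suffix):
            continue
        stem = word[: -len(suffix)]
        # Assumed rule: the base may have lost a final -o or -a.
        for base in (stem, stem + "o", stem + "a"):
            if base in lexicon:
                results.append((base, suffix, value))
    return results
```

Under these assumptions pastelito is analyzed as pastel + ito and pueblecito as pueblo + ecito; a fuller treatment would also need the spelling alternations seen in riquísimo.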

  21. Clitic Pronouns
      cantarme, cantándome, cántame ...
      • cantar,W
      • cantando,G
      • canta,Y:1s
      • cante,Y:2s
      • cantemos,Y:1p
      • cantad,Y:2p
      • canten,Y:3p

  22. Clitic Pronouns
      oslas,.CLIT+N2(2p)_N1(fp)
      oslo,.CLIT+N2(2p)_N1(ms)
      oslos,.CLIT+N2(2p)_N1(mp)
      se,.CLIT+Pron/N1/N2-0
      sela,.CLIT+N2(3)_N1(fs)
      selas,.CLIT+N2(3)_N1(fp)
      sele,.CLIT+Pron_N1/N2(3s)
      seles,.CLIT+Pron_N1/N2(3p)
      selo,.CLIT+N2(3)_N1(ms)
      selos,.CLIT+N2(3)_N1(mp)
      seme,.CLIT+Pron_N1/N2(1s)
      senos,.CLIT+Pron_N1/N2(1p)
      seos,.CLIT+Pron_N1/N2(2p)
      sete,.CLIT+Pron_N1/N2(2s)
      te,.CLIT+N1/N2(2s)
      tela,.CLIT+N2(2s)_N1(fs)
      telas,.CLIT+N2(2s)_N1(fp)
      telo,.CLIT+N2(2s)_N1(ms)
      telos,.CLIT+N2(2s)_N1(mp)
      la,.CLIT+N1(3fs)
      las,.CLIT+N1(3fp)
      le,.CLIT+N1/N2(3s)
      les,.CLIT+N1/N2(3p)
      lo,.CLIT+N1(3ms)
      los,.CLIT+N1(3mp)
      me,.CLIT+N1/N2(1s)
      mela,.CLIT+N2(1s)_N1(fs)
      melas,.CLIT+N2(1s)_N1(fp)
      melo,.CLIT+N2(1s)_N1(ms)
      melos,.CLIT+N2(1s)_N1(mp)
      nos,.CLIT+N1/N2(1p)
      nosla,.CLIT+N2(1p)_N1(fs)
      noslas,.CLIT+N2(1p)_N1(fp)
      noslo,.CLIT+N2(1p)_N1(ms)
      noslos,.CLIT+N2(1p)_N1(mp)
      os,.CLIT+N1/N2(2p)
      osla,.CLIT+N2(2p)_N1(fs)
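
A cliticized form can be analyzed by peeling a clitic cluster off the end of the word and checking that what remains is an infinitive (W), a gerund (G) or an imperative (Y). The sketch below is a simplified illustration: it only knows a few host forms and clitics taken from the two slides above, and it assumes that the written accent on the host (cántame, cantándome) can be undone by removing the acute accent from the stem. The function names are hypothetical.

```python
import unicodedata

# Host forms and codes as listed for cantar; clitics copied from the list above.
HOST_FORMS = {"cantar": "W", "cantando": "G", "canta": "Y:1s", "cantad": "Y:2p"}
CLITICS = {
    "me": "N1/N2(1s)",
    "lo": "N1(3ms)",
    "melo": "N2(1s)_N1(ms)",
    "selo": "N2(3)_N1(ms)",
}


def strip_acute(text):
    """Remove acute accents only (the tilde of ñ is a different mark and is kept)."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace("\u0301", ""))


def segment_clitics(word):
    """Return (host, host_code, clitic, clitic_value) for each valid split."""
    word = unicodedata.normalize("NFC", word)
    analyses = []
    for clitic, value in CLITICS.items():
        if not word.endswith(clitic):
            continue
        stem = word[: -len(clitic)]
        # Try the stem as written, then without its written accent.
        for host in dict.fromkeys((stem, strip_acute(stem))):
            if host in HOST_FORMS:
                analyses.append((host, HOST_FORMS[host], clitic, value))
    return analyses
```

The clitic value returned with each analysis (for example N2(1s)_N1(ms) for melo) is the information that can later be matched against the argument structure of the verb.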

  23. Verbal Structures
      discurrir#1/N0:Tps/Fr:s’écouler/En:to pass
      discurrir#2/N0:Hum/N1:Prép Loc/Fr:parcourir/En:to wander
      discurrir#3/N0:Inc<líquidos>/N1:Prép Loc/Fr:couler/En:to flow
      discurrir#4/N0:Loc<vías fluv.>/N1:Prép Loc/Fr:couler/En:to flow
      discurrir#5/N0:Hum/N1:Abst/Fr:inventer/En:to think up
      discurrir#6/N0:Hum/N1:sobre Abst/Fr:réfléchir/En:to think
      discurrir#7/N0:Hum/N1:sobre Abst/N2:con Hum/En:to discourse
      e.g.: Así que tendréis que discurrirlo vosotros => discurrir#5
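
The example discurrirlo => discurrir#5 can be read as a filter over the verb's argument structures: the clitic lo stands for a direct (non-prepositional) N1, so only senses whose N1 is not introduced by a preposition remain. The sketch below encodes the seven frames listed above. The test for "prepositional" (an N1 description starting with Prép, sobre or con) and the function names are assumptions made for illustration.

```python
# Argument frames of discurrir as listed above (translations omitted).
SENSES = {
    1: {"N0": "Tps"},
    2: {"N0": "Hum", "N1": "Prép Loc"},
    3: {"N0": "Inc<líquidos>", "N1": "Prép Loc"},
    4: {"N0": "Loc<vías fluv.>", "N1": "Prép Loc"},
    5: {"N0": "Hum", "N1": "Abst"},
    6: {"N0": "Hum", "N1": "sobre Abst"},
    7: {"N0": "Hum", "N1": "sobre Abst", "N2": "con Hum"},
}

# Assumption: an N1 description starting with one of these is prepositional.
PREPOSITION_MARKERS = ("Prép", "sobre", "con")


def has_direct_n1(frame):
    """True if the frame has an N1 that is not introduced by a preposition."""
    n1 = frame.get("N1")
    return n1 is not None and not n1.startswith(PREPOSITION_MARKERS)


def senses_for_accusative_clitic(senses):
    """Sense numbers compatible with a direct-object clitic such as lo."""
    return [number for number, frame in senses.items() if has_direct_n1(frame)]
```

Applied to the table above, the filter keeps only sense 5, which agrees with the disambiguation given on the slide.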

  24. Conclusions and Perspectives
      • A unigram can convey, in a cumulative form, information about actualization, lexical functions and argument structure.
      • The mechanism is, in some way, recursive: auto-des-programar, auto-des-program-able... mega-rollitos de primavera...
      • We need an integrated description of morphology, syntax and semantics!
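
The recursive character of prefixation noted above can be sketched as a function that strips one prefix and then calls itself on the remainder. This is only an illustration under simple assumptions: prefixes are plain strings, the base must be in a toy lexicon, and suffixes (as in auto-des-program-able) are not handled. The function name `segmentations` is hypothetical.

```python
def segmentations(word, prefixes, lexicon):
    """Return every way to read word as zero or more prefixes plus a known base."""
    results = []
    if word in lexicon:
        results.append([word])
    for prefix in prefixes:
        # Require a non-empty remainder so the recursion always shrinks the word.
        if word.startswith(prefix) and len(word) > len(prefix):
            for rest in segmentations(word[len(prefix):], prefixes, lexicon):
                results.append([prefix] + rest)
    return results
```

For instance, `segmentations("autodesprogramar", ("auto", "des"), {"programar"})` yields the single reading auto + des + programar, and adding desprogramar to the lexicon yields a second, shorter reading.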
