Patricia Fernández Carrelo, University of Deusto. CliP 2006, London, 29 June–1 July. On translation units and automatic processing.
Natural Language Processing: main lexical problems
• Disambiguation
• Multiword expressions
• All levels of language
• Point of view:
  • Monolingual
  • Multilingual
  • Interlingual
Interlingual task: translation (I)
• Problem: text segmentation
• Machine translation:
  • Need for objective criteria for segmentation
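The segmentation problem can be made concrete with a minimal sketch. It assumes a small hand-built lexicon of multiword entries (the name `MWE_LEXICON` and the function `segment` are hypothetical, and the entries are examples taken from the taxonomy slides) and applies greedy longest-match lookup, which is only one possible objective criterion among many:

```python
from typing import List, Set, Tuple

# Hypothetical mini-lexicon; the entries are examples from the taxonomy slides.
MWE_LEXICON: Set[Tuple[str, ...]] = {
    ("a", "causa", "de"),
    ("mientras", "tanto"),
    ("viaje", "de", "novios"),
}
MAX_LEN = max(len(entry) for entry in MWE_LEXICON)


def segment(tokens: List[str]) -> List[str]:
    """Greedy longest-match segmentation against the lexicon."""
    units: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two tokens.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in MWE_LEXICON:
                units.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multiword entry starts here: emit a single-word unit.
            units.append(tokens[i])
            i += 1
    return units


print(segment("Mientras tanto planearon el viaje de novios".split()))
# ['Mientras tanto', 'planearon', 'el', 'viaje de novios']
```

A lookup of this kind only handles contiguous, uninflected expressions; anything outside the lexicon falls back to word-by-word units.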
Interlingual task: translation (II)
• Multiword segments
• Multiword expressions
  • “Of the same order of magnitude as the number of single words” (Jackendoff 1997)
  • 41% of the entries in WordNet 1.7 (Fellbaum 1999)
Linguistic levels
• Lexicology (and terminology)
  • Degree of lexicalization
• Morphology and syntax
  • Components: order, co-occurrence, inflection...
• Semantics
  • Decomposability, other relationships
• Pragmatics
  • Context, equivalent words
• Text analysis
Points of view for analysis
• Traditional linguistics
  • Since 1957...
• Computational linguistics
  • “A pain in the neck” (Sag et al. 2002)
• Translation and machine translation
  • Need for better approaches
Names and definitions for MWE (I)
• “Idiosyncratic interpretations that cross word boundaries (or spaces)” (Sag et al. 2002)
• “A sequence of words that acts as a single unit at some level of linguistic analysis” (Calzolari et al. 2002)
• “Any phrase that is not entirely predictable on the basis of standard grammar rules and lexical entries” (LinGO Lab, Stanford University)
Names and definitions for MWE (II)
• English:
  • Multiword expressions (MWE) or units (MWU) (Cowie, 1985)
  • Multi-word lexemes (MWL) (Gates, 1988)
  • Multiword lexical unit (Zgusta, 1967)
  • Complex lexemes and lexical units (Lipka, 1983)
• Basque:
  • lexia konplexuak (Abaitua, 2002)
  • hitz anitzeko unitate lexikalak (HAUL) (IXA Group)
• Spanish:
  • expresiones o unidades multipalabra
  • multiverbales (Alvar Ezquerra, 2000)
  • poliléxicas (Benson, 1985)
  • expresiones pluriverbales (Casares, 1992 [1950])
  • unidades pluriverbales lexicalizadas y habitualizadas (Haensch et al., 1982)
  • unidad léxica pluriverbal (Hernández, 1989)
  • unidades fraseológicas (UFS) o fraseologismo (Zuluaga, 1980)
  • lexías complejas (Abaitua, 1997)
Classification criteria and linguistic description
• Co-occurrence of components and/or obligatory presence of some of them
• Syntactic and semantic transparency
• Formal and semantic compositionality
• Frozen or fixed status
• Selectional restrictions
• Violation of some general syntactic patterns or rules
• Degree of lexicalization
• Degree of conventionality
• Idiomaticity
Taxonomy (I) (Sag et al., 2002)
• Lexicalized phrases
  • Fixed expressions
  • Semi-fixed expressions
    • Non-decomposable idioms
    • Compound nominals
    • Proper names
    • Multiword terminology
  • Syntactically-flexible expressions
    • Verb-particle constructions
    • Decomposable idioms
    • Light verbs
• Institutionalized phrases (collocations)
Taxonomy (II)
• Fixed expressions:
  • Adverbial phrases:
    • Al pie de la letra – to the letter – hitzez hitz
    • De improviso – suddenly – ziplo
  • Prepositional phrases:
    • A causa de – because of – (r)en ondorioz*
    • En torno a – around – inguruan
  • Multiword conjunctions:
    • Mientras tanto – meanwhile – bitartean
    • Con tal de que – so long as – ba...*
  • Latin expressions:
    • Ad hoc, sine dubio, sine die...
Taxonomy (III)
• Semi-fixed expressions
  • Non-decomposable idioms:
    • kick the bucket / estirar la pata
  • Compound nominals:
    • Viaje de novios – honeymoon – eztei-bidaia
  • Proper names:
    • the (Oakland) Raiders (these pose problems of their own)
  • Multiword terminology:
    • Mayoría absoluta – absolute majority – erabateko gehiengo
Taxonomy (IV)
• Syntactically-flexible expressions
  • Verb-particle constructions:
    • Non-compositional: write up, look up / acordarse de, constar de / posposizioak (Basque postpositions)
    • Compositional: break up
  • Decomposable idioms:
    • spill the beans – revelar un secreto
  • Light verbs:
    • make, do, have, give
    • hacer, tener, ser, dar
    • egin, izan, eman
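Syntactic flexibility is what makes this class hard to list as fixed strings: "look up the word" and "look the word up" contain the same verb-particle construction. A minimal sketch of gap-tolerant matching follows. The function `find_verb_particle` and the `max_gap` parameter are hypothetical, inflected forms such as "looked" are ignored, and no syntactic analysis is done, so false positives are possible:

```python
from typing import List, Optional, Tuple


def find_verb_particle(
    tokens: List[str], verb: str, particle: str, max_gap: int = 3
) -> Optional[Tuple[int, int]]:
    """Return (verb index, particle index) if the particle follows the
    verb with at most max_gap intervening tokens, else None."""
    lowered = [t.lower() for t in tokens]
    for i, tok in enumerate(lowered):
        if tok != verb:
            continue
        # The particle may be adjacent (gap 0) or up to max_gap tokens away.
        window_end = min(len(lowered), i + 2 + max_gap)
        for j in range(i + 1, window_end):
            if lowered[j] == particle:
                return (i, j)
    return None


print(find_verb_particle("look up the word".split(), "look", "up"))
print(find_verb_particle("look the word up".split(), "look", "up"))
```

Both calls find the construction, in contiguous and in split form, which a plain string lookup for "look up" would not.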
Taxonomy (V)
• Institutionalized phrases (collocations)
  • Pay attention – poner/prestar atención – arreta eman
  • Heavy smoker – fumador empedernido – erretzaile amorratua
  • Red wine – vino tinto – ardo beltza
(Examples from Testuteka: http://paginaspersonales.deusto.es/abaitua/deli/testuteka/index.html)
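The trilingual examples suggest why such expressions are better stored whole: "vino tinto" corresponds to "red wine" and "ardo beltza", not to a word-by-word rendering. A minimal sketch of one way to record them follows. The class `TranslationUnit`, its field names, and the function `translate` are hypothetical, and the entries are copied from the slides:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TranslationUnit:
    mwe_class: str  # taxonomy class from the slides
    es: str         # Spanish
    en: str         # English
    eu: str         # Basque


# Entries taken from the taxonomy slides.
UNITS: List[TranslationUnit] = [
    TranslationUnit("fixed expression", "mientras tanto", "meanwhile", "bitartean"),
    TranslationUnit("compound nominal", "viaje de novios", "honeymoon", "eztei-bidaia"),
    TranslationUnit("multiword terminology", "mayoría absoluta", "absolute majority", "erabateko gehiengo"),
    TranslationUnit("collocation", "fumador empedernido", "heavy smoker", "erretzaile amorratua"),
    TranslationUnit("collocation", "vino tinto", "red wine", "ardo beltza"),
]


def translate(expr: str, source: str, target: str) -> Optional[str]:
    """Look up a whole expression and return its stored equivalent, if any."""
    for unit in UNITS:
        if getattr(unit, source) == expr.lower():
            return getattr(unit, target)
    return None


print(translate("vino tinto", "es", "en"))    # red wine
print(translate("heavy smoker", "en", "eu"))  # erretzaile amorratua
```

Treating the whole expression as the lookup key is the design point here; a real system would also need inflection handling and context.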
Multiword expression as translation unit
• Translation units (TU): difficulty in definition and classification
• Vázquez-Ayora (1977):
  • “simple”
  • “diluted” – “multiple-to-one-equivalents” (Nida)
  • “fractionary”
• “In fact there are good reasons for keeping the TU (in the sense of translation atom) in MT as small - and hence as manageable - as possible” (Bennett, 1994)
Methods for processing
• Symbolic
  • Words-with-spaces
  • Hierarchical lexicon with default constraint inheritance
  • Circumscribed constructions
  • Lexical selection
  • Information about frequency
  • Example: Villavicencio et al. 2004
• Statistical
  • F. Smadja: Xtract
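As an illustration of the statistical family, the sketch below scores adjacent word pairs by pointwise mutual information (PMI). This is not Smadja's Xtract procedure; PMI is swapped in here as a simpler, widely used association measure. The function `bigram_pmi`, the `min_count` threshold, and the toy corpus are all hypothetical, and the corpus is invented purely for illustration:

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bigram_pmi(
    sentences: List[List[str]], min_count: int = 2
) -> Dict[Tuple[str, str], float]:
    """Score adjacent word pairs by PMI, skipping pairs seen fewer than min_count times."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores: Dict[Tuple[str, str], float] = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        # PMI > 0 means the pair co-occurs more often than chance predicts.
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Invented toy corpus, for illustration only.
TOY_CORPUS = [
    "she ordered red wine with dinner".split(),
    "he prefers red wine to beer".split(),
    "the red car is new".split(),
    "the wine is new".split(),
]

for pair, score in sorted(bigram_pmi(TOY_CORPUS).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

On a corpus this small the scores mean little; the frequency threshold matters because PMI overrates rare pairs.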
Conclusions
• MWEs as translation units
• Approach from translation and, especially, from machine translation
• Linguistic definition and precision for better processing
That’s all folks! ¡Eso es todo amigos! Agur Ben-Hur!