Specialised translation and terminology. Koen Kerremans, Centrum voor Vaktaal en Communicatie, Erasmushogeschool Brussel, http://cvc.ehb.be. Part 1: “Terminography for translators: methodology”.
Purpose • To show some steps terminographers go through in order to develop specialised dictionaries • To raise awareness of the specific problems that may arise during the compilation of such dictionaries • To present a method of terminology description, Termontography, which supports the development of ontologically-underpinned terminological dictionaries
Terminology and the specialised dictionary • Within the present scope: • Terminology / Terminography • Terminology is the study of and the field of activity concerned with the collection, description and presentation of terms (Sager 1990:2). Terms are related to subject-field communication (e.g. technical writing, technical documentation). • Terminography is “the practical task of producing dictionaries of lexical items that are specific to specialised domains of knowledge” (Meyer 2001:279). • Specialised dictionary • results from the process of creating, storing, processing, recording, reusing, etc. specialised information and knowledge
Preliminary remarks (1/3) • Within the present scope: • User of the specialised dictionary? • Translator • Requirements of this specific user • Content of the dictionary? • Format of the dictionary?
Preliminary remarks (2/3) • Data gathering for lexical analysis may be based on: • introspection • elicitation of data • observation of non-elicited language use (= text-oriented approach)
Preliminary remarks (3/3) • Many texts are now available in electronic format • This makes it possible to ‘process’ these texts with specific software tools • ‘Terminotics’
From terminology to the specialised dictionary • Corpus compilation • Term identification • Information extraction • Analysis and synthesis • Encoding • Organisation • Management
1. Corpus compilation • = searching for and categorising texts considered relevant for terminological analysis • Problem: representativeness • Scientific specialised discourse • Scientific official discourse • Scientific pedagogical or didactic discourse • Scientific semi-popularised discourse • Scientific popularised discourse • At least 2 languages! (e.g. Laurian 1983; Meyer and Mackintosh 1996; Pearson 1998)
1. Corpus compilation • Tools (examples): • Search engine: • “an information retrieval system designed to help find information stored on a computer system, such as on the World Wide Web” (http://en.wikipedia.org/wiki/Search_engines). • Web crawler: • “a program or automated script which browses the World Wide Web in a methodical, automated manner” (http://en.wikipedia.org/wiki/Web_crawler). • Text aligner: • a tool that organises “different language versions of a text in order to be able to identify equivalent terms, phrases, or expressions” (http://portal.bibliotekivest.no/terminology.htm).
2. Term identification • = extracting terms from the texts gathered during the corpus compilation phase • What is a term? • “A semantically charged linear structure, which names an abstract or concrete reality studied [in] a special-subject field” (Collet 2004:109). • A lexical unit that has a special meaning depending on the thematic context.
2. Term identification TOPIC: early retirement When the eligibility criteria for early retirement were tightened, early retirees began being granted the status of older unemployed. Standard unemployment benefits are higher for unemployed persons over the age of 50 who have been unemployed for a year but have spent 20 years in work. Until very recently, those in the “older unemployed” category were exempt from the ‘actively seeking work’ rule, which suggested that it was virtually impossible to find work again after the age of 50. Since summer 2002, however, this exemption for the older unemployed is gradually being phased out. It is also the case that early retirement arrangements have become opaque and inequitable. The range of measures is now so wide that there has clearly been some duplication. They include early retirement on a half-time basis and career break measures, now replaced by the time-credit scheme.
2. Term identification • Term identification draws on: knowledge of the language; knowledge of the world (the domain); knowledge of the (dictionary) user profile • Automatic term extraction ≠ automatic keyword extraction!
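The difference between keywords and terms can be illustrated with a minimal sketch, assuming plain frequency counts and a small hand-written stopword list (both hypothetical simplifications, not the extraction method used in the projects discussed here). A keyword ranking returns frequent single words, whereas a term-candidate filter keeps recurring multiword units that neither start nor end with a function word. The sample is an excerpt from the early-retirement text above.

```python
import re
from collections import Counter

# Hypothetical, minimal stopword list used only for this illustration.
STOPWORDS = {
    "a", "an", "the", "of", "for", "on", "in", "to", "and", "or", "by",
    "from", "with", "at", "as", "when", "until", "since", "however",
    "this", "that", "those", "these", "it", "its", "is", "are", "was",
    "were", "be", "been", "being", "have", "has", "had", "also", "very",
}

SAMPLE = (
    "When the eligibility criteria for early retirement were tightened, "
    "early retirees began being granted the status of older unemployed. "
    "Until very recently, those in the older unemployed category were exempt "
    "from the actively seeking work rule. Since summer 2002, however, this "
    "exemption for the older unemployed is gradually being phased out. It is "
    "also the case that early retirement arrangements have become opaque and "
    "inequitable."
)


def sentences(text: str) -> list[list[str]]:
    """Split on sentence-final punctuation, then into lower-case word tokens.
    Hyphenated words (e.g. 'time-credit') stay whole; digits are dropped."""
    return [re.findall(r"[a-z]+(?:-[a-z]+)*", s.lower())
            for s in re.split(r"[.!?]", text)]


def keywords(text: str, top: int = 3) -> list[tuple[str, int]]:
    """Keyword-style ranking: the most frequent single non-stopwords."""
    counts = Counter(t for sent in sentences(text) for t in sent
                     if t not in STOPWORDS)
    return counts.most_common(top)


def term_candidates(text: str, max_len: int = 3,
                    min_freq: int = 2) -> list[tuple[str, int]]:
    """Multiword candidates: n-grams of 2..max_len tokens inside one sentence
    whose first and last tokens are not stopwords, kept if frequent enough."""
    counts = Counter()
    for sent in sentences(text):
        for n in range(2, max_len + 1):
            for i in range(len(sent) - n + 1):
                gram = sent[i:i + n]
                if gram[0] not in STOPWORDS and gram[-1] not in STOPWORDS:
                    counts[" ".join(gram)] += 1
    return [(g, c) for g, c in counts.most_common() if c >= min_freq]


print(keywords(SAMPLE))         # -> [('early', 3), ('older', 3), ('unemployed', 3)]
print(term_candidates(SAMPLE))  # -> [('older unemployed', 3), ('early retirement', 2)]
```

The keyword list splits ‘early retirement’ and ‘older unemployed’ into separate words, while the candidate list keeps them as units. Deciding which candidates really are terms still requires knowledge of the domain and of the user profile.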
2. Term identification • Our solution in application-oriented terminology projects: • Set up a categorisation framework • Map terminology to the framework
2. Term identification • A categorisation framework: • = an ontologically-underpinned framework of (meta)categories and (meta)relations which is used to extract and organise multilingual terminology • Advantages: • Helps us to establish extraction criteria as to what terms in text are or should be (cf. ‘15th day of the month following that in which the chargeable event took place’) • Facilitates the process of aligning multilingual terminology
2. Term identification [Figure: part of a categorisation framework for VAT, showing hyperonym/hyponym relations. Superordinate category: ‘transactions for which no VAT is required’. Subordinate categories: ‘transactions not allowing the supplier to deduct VAT’, ‘transactions allowing the supplier to deduct VAT’, ‘transactions occurring outside the territory of the VAT legislation at stake’. Terms mapped to the framework: Dutch (Belgium): vrijstelling (‘exemption’), niet onderworpen aan BTW (‘not subject to VAT’), …; French (Belgium): exemption, …; English (UK): exemption, zero-rated, outside the scope of VAT; English (Ireland): exemption, outside the scope of VAT, zero-rated, …]
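A categorisation framework of this kind can be sketched as a small data structure: categories linked by hyperonym/hyponym relations, with the terms of each language variant mapped onto categories, so that equivalents are found through a shared category rather than by word-to-word translation. This is a minimal illustration under stated assumptions, not an implementation from the Termontography approach: the class names and language codes are hypothetical, and the assignment of terms to categories is an illustrative assumption rather than a reading of the original figure.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Category:
    label: str
    hyperonym: Optional[str] = None  # label of the superordinate category
    # language variant (e.g. "en-GB") -> terms that lexicalise this category
    terms: dict[str, list[str]] = field(default_factory=dict)


class Framework:
    """Hypothetical categorisation framework: categories plus is-a links."""

    def __init__(self) -> None:
        self.categories: dict[str, Category] = {}

    def add_category(self, label: str, hyperonym: Optional[str] = None) -> None:
        if hyperonym is not None and hyperonym not in self.categories:
            raise KeyError(f"unknown hyperonym: {hyperonym}")
        self.categories[label] = Category(label, hyperonym)

    def map_term(self, label: str, variant: str, term: str) -> None:
        self.categories[label].terms.setdefault(variant, []).append(term)

    def hyponyms(self, label: str) -> list[str]:
        return [c.label for c in self.categories.values() if c.hyperonym == label]

    def equivalents(self, term: str, source: str, target: str) -> list[str]:
        """Target-variant terms mapped to the same category as `term`."""
        for cat in self.categories.values():
            if term in cat.terms.get(source, []):
                return cat.terms.get(target, [])
        return []


ROOT = "transactions for which no VAT is required"
NO_DEDUCTION = "transactions not allowing the supplier to deduct VAT"
DEDUCTION = "transactions allowing the supplier to deduct VAT"
OUTSIDE = ("transactions occurring outside the territory of the "
           "VAT legislation at stake")

fw = Framework()
fw.add_category(ROOT)
for sub in (NO_DEDUCTION, DEDUCTION, OUTSIDE):
    fw.add_category(sub, hyperonym=ROOT)

# Illustrative mapping (an assumption, not taken from the original figure).
fw.map_term(NO_DEDUCTION, "en-GB", "exemption")
fw.map_term(DEDUCTION, "en-GB", "zero-rated")
fw.map_term(OUTSIDE, "en-GB", "outside the scope of VAT")
fw.map_term(DEDUCTION, "en-IE", "zero-rated")
fw.map_term(OUTSIDE, "nl-BE", "niet onderworpen aan BTW")

print(fw.hyponyms(ROOT))
print(fw.equivalents("outside the scope of VAT", "en-GB", "nl-BE"))
# -> ['niet onderworpen aan BTW']
```

An empty result, as for ‘exemption’ looked up in nl-BE with this mapping, signals a category not yet covered for that language variant, which is exactly where the terminographer has to return to the corpus.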
2. Term identification • Idea of mapping terminology to a categorisation framework is adopted in the Termontography approach
2. Term identification [Figure: the Termontography workflow in six phases: (1) knowledge analysis, (2) information gathering, (3) search, (4) refinement, (5) verification, (6) validation. Other labels in the figure: TSR + categorisation framework; (mono- or multilingual) domain-specific corpus; first version of termontological database; (mono- or multilingual) termontological database; domain experts; dictionary.]
2. Term identification • Termontography is a terminological approach in which one structures terminological information, retrieved from a corpus of texts, according to a framework of domain-specific knowledge.
3. Information extraction • = adding ‘supplementary information’ to each term • Dictionaries should be designed for special user groups in response to specific needs (cf. ‘Knowledge analysis phase’ in Termontography) • What supplementary information do translators require? • Synonyms? Translation equivalents? Part-of-speech tags? Examples? Contexts? Collocations? Domain specifications? Definitions? (What type of definition?)
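One way to make these questions concrete is to treat a term record as a structure with optional fields and to check it against a profile of what a given user group needs. The sketch below assumes hypothetical field names and a hypothetical translator profile; the sample values (term, domain, category description, French equivalent) come from the VAT example used later in this presentation.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    """Hypothetical term record; each optional field holds one kind of
    supplementary information that a user group might require."""
    term: str
    language: str
    domain: str
    part_of_speech: str = ""
    definition: str = ""
    synonyms: list[str] = field(default_factory=list)
    equivalents: dict[str, list[str]] = field(default_factory=dict)
    collocations: list[str] = field(default_factory=list)
    contexts: list[str] = field(default_factory=list)

    def missing(self, required: tuple[str, ...]) -> list[str]:
        """Names of required fields that are still empty in this record."""
        return [name for name in required if not getattr(self, name)]


# Hypothetical requirement profile for translators (an assumption made for
# illustration, not the outcome of a user study).
TRANSLATOR_PROFILE = ("definition", "equivalents", "contexts", "collocations")

record = TermRecord(
    term="chargeable event",
    language="en-GB",
    domain="VAT law",
    part_of_speech="noun",
    definition="an event on which VAT has to be paid",
    equivalents={"fr": ["fait générateur"]},
)

print(record.missing(TRANSLATOR_PROFILE))  # -> ['contexts', 'collocations']
```

Keeping the profile separate from the record means the same database can be checked against the needs of different user groups.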
3. Information extraction • Techniques for finding out user requirements include, amongst others: • Surveys • Experimental research & model building
3. Information extraction • Surveys: • To ask people what they use dictionaries for and how • Not very reliable
3. Information extraction • Experimental research: • Look-up behaviour of subjects • Error analysis
3. Information extraction Model building (based on the translation process) (Agirre et al. 2001)
3. Information extraction • Translators need insight into at least three different types of context: • linguistic context of a translation unit • cultural (situational) context • cognitive (ontological) context • A translator with access to terminological knowledge resources that provide information on these different types of context is likely to produce high-quality translations
3. Information extraction • On the whole, translation dictionaries and traditional multilingual terminological resources do not provide sufficient information for the translator • Multilingual terminology management must widen its scope towards knowledge management and representation (Meyer 1992, Dancette 1997, Temmerman 2000, 2003, 2005): • providing a cognitive structure in order to improve the understanding of the specialised domain • providing extralinguistic / encyclopaedic information in order to improve the understanding of terms and categories in the specialised domain (of source and target language)
3. Information extraction Dancette, J. & C. Réthoré (2000). Dictionnaire Analytique de la Distribution. Analytical Dictionary of Retailing. Les Presses de l’Université de Montréal. Users: translators translating texts on ‘retailing’ from English into French
3. Information extraction • Aims: • to maximally stimulate the creativity of the translator by offering ontologically enriched information on the subject, in the French language (target language for the translator) • to optimise understanding by stimulating the semantic network in the brain of the translator
3. Information extraction • The dictionary user is introduced to the meaning of the term in several textual modules formulated in French: • définition (definition) • précisions sémantiques (semantic clarifications) • relations internotionnelles (inter-conceptual relations) • compléments d’information (additional information) • informations linguistiques (linguistic information) • contextes (contexts) • exemples (examples) • Cross-references are marked typographically: entries covered in another article are printed in bold for French and in small capitals for English. Related terms are printed in bold for French and in italics for English.
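3. Information extraction Example: ‘label’ Définition (definition): Document d’identification du produit qui lui est apposé ou y est attaché et qui en décrit les caractéristiques (nature, prix, provenance, marque, etc.). [English: an identification document affixed or attached to a product that describes its characteristics (nature, price, origin, brand, etc.).]
3. Information extraction Example: ‘label’ Précisions sémantiques (semantic clarifications): Depuis les années 1970, l’étiquette comprend généralement un code-barre (BAR CODE). Le code-barre contient des informations telles que la description et le prix du produit, qui seront lues à l’aide d’un lecteur optique (OPTICAL READER). [English: Since the 1970s, the label has generally included a bar code. The bar code contains information such as the description and price of the product, which is read with an optical reader.]
3. Information extraction Example: ‘label’ Relations internotionnelles (inter-conceptual relations): Le terme anglais TAG désigne une étiquette que l’on peut facilement enlever, ce qui n’est pas le cas de label. Ne pas confondre l’anglais LABEL avec son homonyme label, qui a le sens de marque (BRAND), comme dans le terme PRIVATE LABEL (marque de distributeur). [English: The English term TAG designates a label that can easily be removed, which is not the case with label. Do not confuse the English LABEL with its homonym label, meaning ‘brand’ (BRAND), as in the term PRIVATE LABEL (store brand).]
3. Information extraction Example: ‘label’ Compléments d’information (additional information): Les producteurs ont l’obligation, en vertu de la Loi sur la protection du consommateur (Consumer Protection Act), de répertorier sur l’étiquette tous les ingrédients contenus dans le produit alimentaire. [English: Under the Consumer Protection Act, producers are required to list on the label all the ingredients contained in the food product.]
3. Information extraction Example: ‘label’ Informations linguistiques (linguistic information): étiqueter: to ticket; étiqueteuse: labeler, label machine
3. Information extraction Example: ‘label’ Contextes (contexts): But it wasn’t until 1900 that [he] put the first Polar label on a bottle of cool, naturally purified water taken directly from one of these springs on his property. http://www.water.com/polar/index.html (30-3-99) Dans ce but, la réglementation mise au point par les organismes de la CEE et par l’administration française prévoit sur chaque étiquette la présence d’un certain nombre de mentions obligatoires, en fonction de la catégorie du vin. http://www.vin.champagne.com/etiq.htm (30-3-99) [English: To this end, the regulations drawn up by the EEC bodies and the French administration require a number of compulsory particulars on each label, depending on the category of wine.]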
3. Information extraction • Challenge: how to arrive at specialised dictionaries offering ontologically-enriched information? • analysis of Knowledge Rich Contexts (Meyer 2001)
3. Information extraction • ‘Knowledge Rich Contexts’ (Meyer 2001:281): • “a context indicating at least one item of domain knowledge that could be useful for conceptual analysis. In other words, the context should indicate at least one conceptual characteristic, whether it be an attribute or relation.” • can be used to derive synonyms and translation equivalents
3. Information extraction • KWIC (keyword in context) concordancer
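A KWIC concordancer lists every occurrence of a search word with a fixed span of left and right context, aligned on the search word, so that recurring patterns around a term become visible. Dedicated concordancers offer much more (sorting on left or right context, lemma search, large corpora); the function below is only a minimal sketch, assuming a plain-text corpus and whole-word, case-insensitive matching. The example corpus reuses two of the compost contexts quoted in this presentation.

```python
import re


def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return one line per whole-word, case-insensitive occurrence of
    `keyword`, with `width` characters of context on each side. The left
    context is right-aligned so that the keyword lines up in one column."""
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines


corpus = (
    "Compost contains nutrients, nitrogen, potassium and phosphorus. "
    "Compost is perhaps best defined as organic material assembled for "
    "fast decomposition."
)

for line in kwic(corpus, "compost", width=20):
    print(line)
```

Reading the aligned right-hand contexts (‘contains …’, ‘is perhaps best defined as …’) is what makes knowledge-rich contexts easy to spot by eye.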
3. Information extraction • Certain contextual markers in KRCs may indicate specific conceptual relations (here: hyperonymy, attribute, meronymy, purpose): • Compost: a ready-to-use soil enricher that looks and feels like dark, crumbly soil. • Compost contains nutrients, nitrogen, potassium and phosphorus. • Compost is perhaps best defined as organic material assembled for fast decomposition. • Compost, a dark, nutrient-rich soil conditioner, consists of a small amount of soil along with decomposed or partially decomposed plant residues.
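Such markers can be searched for automatically to flag candidate KRCs for the terminographer to inspect. The sketch below assumes a small, hand-written set of English patterns (hypothetical and deliberately naive; real pattern inventories are larger and depend on domain and language) and reports which relations each compost sentence may express. It does not extract the related concepts themselves.

```python
import re

# Hypothetical marker patterns per conceptual relation (illustration only).
MARKERS = {
    # "X: a ...", "X, a ..." (definitional apposition) or "... defined as ..."
    "hyperonymy": [r"^\w+[:,]\s+an?\s", r"\bdefined as\b"],
    "meronymy": [r"\bcontains\b", r"\bconsists of\b"],
    "attribute": [r"\blooks (?:and feels )?like\b"],
    "purpose": [r"\b(?:assembled|used|designed) for\b"],
}


def relations(sentence: str) -> list[str]:
    """Relations whose markers occur in `sentence`, in MARKERS order."""
    return [rel for rel, patterns in MARKERS.items()
            if any(re.search(p, sentence, re.IGNORECASE) for p in patterns)]


KRCS = [
    "Compost: a ready-to-use soil enricher that looks and feels like dark, "
    "crumbly soil.",
    "Compost contains nutrients, nitrogen, potassium and phosphorus.",
    "Compost is perhaps best defined as organic material assembled for fast "
    "decomposition.",
    "Compost, a dark, nutrient-rich soil conditioner, consists of a small "
    "amount of soil along with decomposed or partially decomposed plant "
    "residues.",
]

for s in KRCS:
    print(relations(s), "<-", s[:40])
```

Pattern matching of this kind over-generates and under-generates, so its output is a list of candidates for manual analysis, not a set of validated relations.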
3. Information extraction • Synonyms and translation equivalents are identified by comparing KRCs: • co-occurrence or substitution tests • feature analysis
3. Information extraction FEATURE ANALYSIS • Supermarché • Assortment: food: very wide; non-food: fairly wide (household products, clothing, etc.) • Area: 400 to 2,500 m² • Superstore • Assortment: food: very wide; non-food: very wide (household products, clothing, kitchen utensils, gardening tools, etc.) • Area: 2,300 to 4,600 m² • Hypermarché • Assortment: food: very wide; non-food: very wide (household products, clothing, kitchen utensils, gardening tools + electrical appliances, furniture, etc.) • Area: up to 24,000 m²
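Feature analysis can be made explicit by recording the features of each category and listing where two categories differ. The sketch below assumes a simple dictionary encoding of the feature-analysis slide above; the feature names are hypothetical, the product-group sets are incomplete because the source ends each list with ‘etc.’, and the hypermarché has no stated lower bound for its area.

```python
# Features transcribed from the feature-analysis slide; product groups are
# not exhaustive ("etc." in the source).
CATEGORIES = {
    "supermarché": {
        "food assortment": "very wide",
        "non-food assortment": "fairly wide",
        "non-food groups": frozenset({"household products", "clothing"}),
        "area_m2": (400, 2500),
    },
    "superstore": {
        "food assortment": "very wide",
        "non-food assortment": "very wide",
        "non-food groups": frozenset({"household products", "clothing",
                                      "kitchen utensils", "gardening tools"}),
        "area_m2": (2300, 4600),
    },
    "hypermarché": {
        "food assortment": "very wide",
        "non-food assortment": "very wide",
        "non-food groups": frozenset({"household products", "clothing",
                                      "kitchen utensils", "gardening tools",
                                      "electrical appliances", "furniture"}),
        "area_m2": (None, 24000),  # only an upper bound is given
    },
}


def differences(a: str, b: str) -> dict[str, tuple]:
    """Features whose values differ between categories a and b."""
    fa, fb = CATEGORIES[a], CATEGORIES[b]
    return {f: (fa.get(f), fb.get(f))
            for f in sorted(fa.keys() | fb.keys())
            if fa.get(f) != fb.get(f)}


def areas_overlap(a: str, b: str) -> bool:
    """True if the floor-area ranges of a and b share any value.
    A missing lower bound is treated as 0 (an assumption)."""
    (lo1, hi1), (lo2, hi2) = CATEGORIES[a]["area_m2"], CATEGORIES[b]["area_m2"]
    return max(lo1 or 0, lo2 or 0) <= min(hi1, hi2)


print(sorted(differences("supermarché", "superstore")))
# -> ['area_m2', 'non-food assortment', 'non-food groups']
print(areas_overlap("supermarché", "superstore"))  # -> True (2300 to 2500 m2)
extra = (CATEGORIES["hypermarché"]["non-food groups"]
         - CATEGORIES["superstore"]["non-food groups"])
print(sorted(extra))  # -> ['electrical appliances', 'furniture']
```

Because the area ranges of supermarché and superstore overlap, area alone does not separate the two categories; the breadth of the non-food assortment does. Observations like this one are what a terminographer needs before declaring two terms equivalent.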
3. Information extraction • Category: “an event on which VAT has to be paid” • Domain: VAT law • English (UK): chargeable event (VAT will be due on the date the invoice is issued) • English (Ireland): chargeable event (VAT is due no later than the 15th day of the month following the month in which the supply takes place) • French: fait générateur (VAT is due at the moment the goods are supplied)
Summary • Steps terminographers have to go through in order to develop specialised dictionaries for translators: • Requirements of translators (knowledge about the linguistic, situational and cognitive contexts) • Problems discussed: • Representativeness of the corpus • Term identification (categorisation frameworks?) • Terminology structuring (variation)
Other problems? • Analysis and synthesis (definitions) • Encoding (précisions sémantiques vs. relations internotionnelles vs. compléments d’information) • Organisation (tree structure, hyperlinks, ‘traditional’ term records) • Management (is the dictionary up to date?)
Part 2: “Towards ‘intelligent’ dictionaries for translators”
Purpose • Which information sources do translators use during the translation of a given text sample? • How do we arrive at ‘intelligent’ dictionaries? • Possibilities? • Technology?
Translation sample • Egypt was in the best position to develop a great civilization. Each year the "Gift of the Nile" would be a flood brought on by the monsoon. These floods brought only a thin layer of silt, dropped on the banks, from both a jungle area and also a mountainous area. The White Nile brought highly mineralized silt which would be eroded from Abyssinian Alps 1500 miles inland in Central Africa. The silt from the Blue Nile was heavy with humus from the jungle and swampy sources. Not only did the flood bring silt, the soil would be soft and easy to plow. They would plant and harvest in early spring and then allow the fields to lay until July when the floods would come again. Based on: http://historylink101.com/lessons/farm-city/egypt1.htm Resources?
Some resources • Translation dictionaries • Explanatory dictionaries • Specialised dictionaries • Encyclopedias • Synonym dictionaries • Picture dictionaries • Translation forums • Translation engines • Combinatory dictionaries • …