Syntagmatic Relations in Corpus and Learner Lexicography
Jelena Kallas
Tallinn University, Institute of the Estonian Language
jelena.kallas@eki.ee

syntagmatic relations ‒ relations that linguistic units (e.g. words, clauses) have with other units because they may occur together in a sequence (Richards, Schmidt 2002: 534)
syntagmatic information in a dictionary ‒ behaviour of the lemma in combination with other words, both grammatically and lexically (Svensén 2009: 30)
syntagmatic dictionaries: construction or valency dictionaries, collocation dictionaries and idiom dictionaries (Svensén 2009: 30)
lexis versus grammar
↓
towards a lexico-grammar

Towards a lexico-grammar (1)
Pattern (Hunston, Francis 1999), construction (Atkins, Rundell 2008), collocation (Bartsch 2004, Siepmann 2005)
Pattern ‒ all the words and structures which are regularly associated with the word and which contribute to its meaning. A pattern can be identified if a combination of words occurs relatively frequently, if it is dependent on a particular word choice, and if there is a clear meaning associated with it (Hunston, Francis 1999: 32)
Collocation ‒ holistic lexical, lexico-grammatical or semantic unit normally composed of two or more words which exhibits minimal recurrence within a particular discourse community (Siepmann 2005: 438)

Towards a lexico-grammar (2)
Syntagmatic relations of Estonian substantives, adjectives, adverbs and verbs are
• identified on the basis of the traditional grammatical description of Estonian;
• described as lexico-grammatical patterns defined by means of categorial (mostly part-of-speech) and functional-relational (subject, object, adverbial) labels, e.g. for nouns: AJP + N (ilus naine 'beautiful woman'), AVP + N (raagus puud 'bare trees'), N + PP (hirm vanemate ees 'fear of one's parents'), etc.
PURPOSE → automatic extraction

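To illustrate how a pattern written as a sequence of categorial labels lends itself to automatic extraction, here is a minimal sketch. It assumes a sentence that is already part-of-speech tagged; the function name `extract_pairs`, the one-letter tags and the hand-tagged toy input are all hypothetical and do not reproduce the tagset or the rules used for the Estonian corpus.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, POS tag); tag names are illustrative only


def extract_pairs(tokens: List[Token], first: str, second: str) -> List[Tuple[str, str]]:
    """Return adjacent word pairs whose tags match the pattern first + second."""
    pairs = []
    for (w1, t1), (w2, t2) in zip(tokens, tokens[1:]):
        if t1 == first and t2 == second:
            pairs.append((w1, w2))
    return pairs


# Hand-tagged toy phrase (not corpus output): A = adjective, D = adverb,
# N = noun, J = conjunction. These labels are assumptions for the example.
tokens = [("ilus", "A"), ("naine", "N"), ("ja", "J"), ("raagus", "D"), ("puud", "N")]

adjective_noun = extract_pairs(tokens, "A", "N")  # pattern AJP + N
adverb_noun = extract_pairs(tokens, "D", "N")     # pattern AVP + N
print(adjective_noun, adverb_noun)
```

A real extraction rule would also have to handle phrases longer than one word and non-adjacent constituents, which this sketch deliberately ignores.
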
Tasks of corpus lexicography (Rundell, Kilgarriff 2011)
• analysis of the corpus:
  • to discover word senses and other lexical units (fixed phrases, phrasal verbs, compounds, etc.)
  • to identify the salient features of each of these lexical units: (1) their syntactic behaviour, (2) the collocations they participate in, (3) their colligational preferences, (4) any preferences they have for particular text types or domains
• exemplifying relevant features with material gleaned from the corpus

Corpus tools for this task (Kilgarriff, Kosem 2013)
• Computer-based tools (WordSmith Tools, MonoConc Pro, IMS Corpus Workbench, AntConc) vs. online tools (Sketch Engine, KorpusDK)
• Corpus-related tools (XAIRA, KorpusDK) vs. corpus-independent tools
• Prepared corpus vs. Web as corpus (WebCorp)
• Simple tools (concordancer, collocation, keywords) vs. advanced tools (word sketches, CQL searches, GDEX)
Resources for Estonian: Keeleveeb, Kollokatsioonide leidja, Sketch Engine (Estonian Reference Corpus, ca. 250 million, tagged for sentences, clauses and morphology (POS tags and inflections) by FILOSOFT Ltd.)

Word Sketches for Estonian in SkE (1)
60 rules
• relations which correspond to POS tags and morphological inflections (subject, object, adverbials, modifiers)
• oblique objects of noun prepositional phrases
• oblique objects and adverbials of particle verbs
• oblique objects of prepositional verbs
• constructions with the conjunctions ja/või 'and/or', kui/nagu 'as'
• predicative (complements of the copula-like verb olema 'be')
• various combinations of finite verbs with non-finite verbs
• multi-word verbs

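Word Sketches for Estonian in SkE (2)
Noun phrase co-constituents (NP co-constituents)
a) adjective (phrase), e.g. ilus naine 'beautiful woman', tugev mees 'strong man';
b) noun (phrase), e.g. venna raamat 'brother's book', nokaga müts 'peaked cap';
c) adpositional phrase, e.g. uhkus kodumaa üle 'pride in one's homeland';
d) infinitive (phrase), e.g. soov õppida 'wish to study';
e) quantifier phrase, e.g. sada kilomeetrit 'a hundred kilometres', meeter riiet 'a metre of cloth';
f) adverb (phrase), e.g. raagus puud 'bare trees';
g) subordinate clause, e.g. Muidugi jääb küsimus, kas see isik on sotsiaalselt kindlustatud. 'Of course the question remains whether this person has social security.'
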
Word Sketches for Estonian in SkE (3)
Adverb phrase co-constituents (ADVP co-constituents)
a) adverb, e.g. väga hästi 'very well';
b) case form of a noun, e.g. uksest siinpool 'on this side of the door', teistest paremini 'better than the others';
c) adpositional phrase, e.g. selja pealt katki 'torn at the back';
d) quantifier phrase, e.g. paar päeva hiljem 'a couple of days later', mitu kilomeetrit kaugemal 'several kilometres farther';
e) subordinate clause, e.g. Ta rääkis kauem, kui mina seda .
A noun in an oblique case may occur in the elative (otsast katki, äärest lahti; ootusest elevil, ärevusest hingetu), the comitative (partneriga vaheldumisi, rahadega kimpus) or the terminative (milleni täis, pingul, surmani solvunult, maani täis).

Multi-word lexical verbs in SkE (väljendverbid 'expression verbs', ühendverbid 'particle verbs', ahelverbid 'catenative verbs', tugiverbiühendid 'support verb constructions')

Syntagmatic relations in learner lexicography
Keywords: EXPLICIT, SELF-EXPLANATORY, THEORY-INDEPENDENT, COMPREHENSIVE
How to present? ‒ in coded metalanguage (N+Adj), in uncoded metalanguage (not before noun), live examples, in the definition format, as outside matter
How to choose?

Basic Criteria (Tono 2012)
• frequency
• salience (logDice)
• CEFR word list (Common European Framework of Reference; Cambridge University project "English Vocabulary Profile")
• occurrence in school textbooks

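The logDice score named among the criteria above is commonly defined as 14 + log2(2·f_xy / (f_x + f_y)), where f_xy is the co-occurrence frequency of the two words and f_x, f_y are their individual frequencies. The sketch below shows how it can rank candidates differently from raw frequency. All counts are invented for illustration and the function name `log_dice` is an assumption, not part of any tool's API.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y)); the theoretical maximum is 14."""
    if f_xy <= 0:
        raise ValueError("co-occurrence frequency must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Invented counts: (co-occurrence, headword frequency, collocate frequency).
frequent_but_weak = (200, 1000, 99000)  # very common collocate, loose association
rarer_but_strong = (50, 1000, 600)      # less common collocate, tight association

for name, counts in [("frequent_but_weak", frequent_but_weak),
                     ("rarer_but_strong", rarer_but_strong)]:
    print(name, "raw =", counts[0], "logDice =", round(log_dice(*counts), 2))
```

With these made-up numbers raw frequency prefers the first candidate while logDice prefers the second, which is the kind of trade-off a selection criterion has to settle.
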
What is the output?
User interface of the collocation dictionary (Tono 2012)

Selection Criteria in Basic Estonian Dictionary (requirements of the official language proficiency levels, occurrence in the vocabulary lists of the proficiency levels, co-occurrence frequency)
• compiled for learners of Estonian at the beginner (A2) and lower-intermediate (B1) levels
• 4500 words (core vocabulary; a frequency dictionary and the A2-level vocabulary profiles (Ilves 2008) were used) = definition vocabulary = the same vocabulary is used for presenting syntagmatic relationships (collocations and government patterns)
• government and collocation patterns (in SkE)
• statistics: raw frequency or salience → RAW FREQUENCY

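The selection described above (collocates restricted to the dictionary's own vocabulary, ranked by raw frequency) can be sketched as follows. This is only an illustration under stated assumptions: the function name `select_collocates`, the cut-off parameter `limit`, the word lists and all counts are hypothetical and are not taken from the dictionary's actual workflow.

```python
from typing import Dict, List, Set


def select_collocates(candidates: Dict[str, int],
                      core_vocabulary: Set[str],
                      limit: int) -> List[str]:
    """Keep collocates found in the core vocabulary, ranked by raw frequency."""
    allowed = [(word, freq) for word, freq in candidates.items()
               if word in core_vocabulary]
    # Highest frequency first; ties are broken alphabetically for stable output.
    allowed.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in allowed[:limit]]


# Invented adjective collocates of a noun with invented co-occurrence counts.
candidates = {"noor": 300, "ilus": 120, "vana": 90, "emantsipeerunud": 15}
# Hypothetical excerpt of the core vocabulary.
core_vocabulary = {"ilus", "noor", "vana"}

selected = select_collocates(candidates, core_vocabulary, limit=2)
print(selected)
```

Swapping the sort key for a salience score would give the alternative ranking that the slide sets aside in favour of raw frequency.
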
References (1)
• ATKINS, B. T. S., RUNDELL, M. 2008. The Oxford Guide to Practical Lexicography. Oxford: Oxford University Press.
• BARTSCH, S. 2004. Structural and functional properties of collocations in English. A corpus study of lexical and pragmatic constraints on lexical co-occurrence. Tübingen: Verlag Gunter Narr.
• HUNSTON, S., FRANCIS, G. 1999. Pattern Grammar: A corpus-driven approach to the lexical grammar of English. Amsterdam/Philadelphia: John Benjamins Publishing Company.
• ILVES, M. 2008. Algaja keelekasutaja. A2-taseme eesti keele oskus. Tallinn: Eesti Keele Sihtasutus.
• KILGARRIFF, A., RYCHLÝ, P., SMRŽ, P., TUGWELL, D. 2004. The Sketch Engine. – Proceedings Euralex, Lorient, France.
• KILGARRIFF, A., KOSEM, I. 2012. Corpus Tools for Lexicographers. – Electronic Lexicography. Oxford: Oxford University Press (forthcoming).
• RICHARDS, J. C., SCHMIDT, R. 2002. Longman Dictionary of Language Teaching and Applied Linguistics. UK: Pearson Education Limited.
• RUNDELL, M., KILGARRIFF, A. 2011. Automating the creation of dictionaries: where will it all end? – A Taste for Corpora. In honour of Sylviane Granger. Meunier F., De Cock S., Gilquin G. and Paquot M. (eds). Université catholique de Louvain.
• SIEPMANN, D. 2005. Collocation, Colligation and Encoding Dictionaries. Part I: Lexicological Aspects. – International Journal of Lexicography 18, 409-443.

References (2)
• SVENSÉN, B. 2009. A Handbook of Lexicography. The Theory and Practice of Dictionary-Making. Cambridge: Cambridge University Press.
• TONO, Y. 2011. Bilingual lexicography in Japan. Video presentation at the conference Electronic Lexicography in the 21st Century: New Applications for New Users. Bled, 10-12 November. Available at http://videolectures.net/elex2011_bled/.