
Syntagmatic Relations in Corpus and Learner Lexicography

Jelena Kallas, Tallinn University, Institute of the Estonian Language, jelena.kallas@eki.ee


Presentation Transcript


  1. Syntagmatic Relations in Corpus and Learner Lexicography. Jelena Kallas, Tallinn University, Institute of the Estonian Language, jelena.kallas@eki.ee

  2. Syntagmatic relations: relations that linguistic units (e.g. words, clauses) have with other units because they may occur together in a sequence (Richards, Schmidt 2002: 534).
     Syntagmatic information in a dictionary: the behaviour of the lemma in combination with other words, both grammatically and lexically (Svensén 2009: 30).
     Syntagmatic dictionaries: construction or valency dictionaries, collocation dictionaries and idiom dictionaries (Svensén 2009: 30).
     Lexis versus grammar → towards a lexico-grammar

  3. Towards a lexico-grammar (1)
     Pattern (Hunston, Francis 1999), construction (Atkins, Rundell 2008), collocation (Bartsch 2004, Siepmann 2005)
     Pattern: all the words and structures which are regularly associated with the word and which contribute to its meaning. A pattern can be identified if a combination of words occurs relatively frequently, if it is dependent on a particular word choice, and there is a clear meaning associated with it (Hunston, Francis 1999: 32).
     Collocation: holistic lexical, lexico-grammatical or semantic unit normally composed of two or more words which exhibits minimal recurrence within a particular discourse community (Siepmann 2005: 438).

  4. Towards a lexico-grammar (2)
     Syntagmatic relations of Estonian substantives, adjectives, adverbs and verbs are
     • identified on the basis of the traditional grammatical description of Estonian;
     • described as lexico-grammatical patterns defined by means of categorical (mostly part-of-speech) and functional-relational (subject, object, adverbial) labels, e.g. for nouns AJP+N (ilus naine 'beautiful woman'), AVP+N (raagus puud 'bare trees'), N+PP (hirm vanemate ees 'fear of one's parents'), etc.
     PURPOSE → automatic extraction
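
To make the "automatic extraction" goal concrete, here is a minimal sketch of how patterns such as AJP+N could be matched over POS-tagged text. It is only an illustration under stated assumptions: the coarse tags "A", "D", "N", "V", the `PATTERNS` table and the function `extract_patterns` are hypothetical and do not reproduce the tagset or rules used in the talk. The sketch also matches adjacent tokens only, whereas real extraction rules would allow for intervening material.

```python
from collections import Counter

# Hypothetical pattern inventory: pattern label -> sequence of coarse POS tags.
# The tags ("A" adjective, "D" adverb, "N" noun) are illustrative assumptions.
PATTERNS = {
    "AJP+N": ("A", "N"),
    "AVP+N": ("D", "N"),
}

def extract_patterns(tagged, patterns=PATTERNS):
    """Count pattern matches over adjacent (lemma, tag) pairs of one sentence."""
    counts = Counter()
    for label, tags in patterns.items():
        n = len(tags)
        for i in range(len(tagged) - n + 1):
            window = tagged[i:i + n]
            if all(tok[1] == tag for tok, tag in zip(window, tags)):
                lemmas = tuple(tok[0] for tok in window)
                counts[(label, lemmas)] += 1
    return counts

# Constructed toy sentence, already lemmatised and tagged by hand.
sentence = [("ilus", "A"), ("naine", "N"), ("nägema", "V"),
            ("raagus", "D"), ("puu", "N")]
hits = extract_patterns(sentence)
for (label, lemmas), freq in hits.items():
    print(label, " ".join(lemmas), freq)
```

Counting matches per lemma pair over a whole corpus would give the co-occurrence frequencies that later selection steps rely on.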

  5. Tasks of corpus lexicography (Rundell, Kilgarriff 2011)
     • analysis of the corpus:
       • to discover word senses and other lexical units (fixed phrases, phrasal verbs, compounds, etc.)
       • to identify the salient features of each of these lexical units: (1) their syntactic behaviour, (2) the collocations they participate in, (3) their colligational preferences, (4) any preferences they have for particular text types or domains
     • exemplifying relevant features with material gleaned from the corpus

  6. Corpus tools for this task (Kilgarriff, Kosem 2012)
     • Computer-based tools (WordSmith Tools, MonoConc Pro, IMS Corpus Workbench, AntConc) vs. online tools (Sketch Engine, KorpusDK)
     • Corpus-related tools (XAIRA, KorpusDK) vs. corpus-independent tools
     • Prepared corpus vs. Web as corpus (WebCorp)
     • Simple tools (concordancer, collocation, keywords) vs. advanced tools (word sketches, CQL searches, GDEX)
     Resources for Estonian: Keeleveeb, Kollokatsioonide leidja ('Collocation finder'), Sketch Engine (Estonian Reference Corpus, ca. 250 million words, tagged for sentences, clauses, and morphology (POS tags and inflections) by FILOSOFT Ltd.)
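
As an illustration of the simplest tool type listed above, the concordancer, the following sketch produces keyword-in-context (KWIC) lines from a tokenised text. It is an assumption-laden toy: the function name `kwic`, its `width` parameter and the example sentence are invented for illustration and do not describe how any of the named tools is implemented.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit.

    Matching is case-insensitive and on whole tokens only; `width` is the
    number of tokens shown on each side of the keyword.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Constructed example text, split on whitespace.
tokens = "the long discussion ended and a new discussion began".split()
for left, node, right in kwic(tokens, "discussion", width=2):
    print(f"{left:>12} [{node}] {right}")
```

The advanced tools on the slide go further than this: a word sketch groups such hits by grammatical relation rather than listing them line by line.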

  7. Word Sketches for Estonian in SkE (1)
     60 rules
     • relations that correspond to POS tags and morphological inflections (subject, object, adverbials, modifiers)
     • oblique objects of noun prepositional phrases
     • oblique objects and adverbials of particle verbs
     • oblique objects of prepositional verbs
     • constructions with the conjunctions ja/või 'and/or', kui/nagu 'as'
     • predicative (complements of the copula-like verb olema 'be')
     • various combinations of finite verbs with non-finite verbs
     • multi-word verbs

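  8. Word Sketches for Estonian in SkE (2)
     Co-constituents of the noun phrase (NP co-constituents)
     a) adjective (phrase), e.g. ilus naine 'beautiful woman', tugev mees 'strong man';
     b) noun (phrase), e.g. venna raamat 'brother's book', nokaga müts 'peaked cap';
     c) adpositional phrase, e.g. uhkus kodumaa üle 'pride in one's homeland';
     d) infinitive (phrase), e.g. soov õppida 'wish to study';
     e) quantifier phrase, e.g. sada kilomeetrit 'a hundred kilometres', meeter riiet 'a metre of cloth';
     f) adverb (phrase), e.g. raagus puud 'bare trees';
     g) subordinate clause, e.g. Muidugi jääb küsimus, kas see isik on sotsiaalselt kindlustatud 'Of course the question remains whether this person has social insurance'.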

  9. Word sketch for the lemma diskussioon 'discussion'

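  10. Word Sketches for Estonian in SkE (3)
      Co-constituents of the adverb phrase (ADVP co-constituents)
      a) adverb, e.g. väga hästi 'very well';
      b) case form of a noun, e.g. uksest siinpool 'on this side of the door', teistest paremini 'better than the others';
      c) adpositional phrase, e.g. selja pealt katki 'torn at the back';
      d) quantifier phrase, e.g. paar päeva hiljem 'a couple of days later', mitu kilomeetrit kaugemal 'several kilometres farther';
      e) subordinate clause, e.g. Ta rääkis kauem, kui mina seda .
      A noun in an oblique case may appear in the elative (otsast katki 'broken at the end', äärest lahti 'loose at the edge'; ootusest elevil 'excited with anticipation', ärevusest hingetu 'breathless with anxiety'), the comitative (partneriga vaheldumisi 'alternating with a partner', rahadega kimpus 'in trouble with money') or the terminative (milleni täis, pingul, surmani solvunult 'mortally offended', maani täis 'dead drunk').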

  11. Word sketch for the lemma omaette 'separately, on one's own'

  12. Multi-word lexical verbs in SkE: expression verbs (väljendverbid), particle verbs (ühendverbid), catenative verbs (ahelverbid), support verb constructions (tugiverbiühendid)

  13. Syntagmatic relations in learner lexicography
      Keywords: EXPLICIT, SELF-EXPLANATORY, THEORY-INDEPENDENT, COMPREHENSIVE
      How to present? In coded metalanguage (N+Adj), in uncoded metalanguage (not before noun), in live examples, in the definition format, or as outside matter.
      How to choose?

  14. Basic Criteria (Tono 2012)
      • frequency
      • salience (logDice)
      • CEFR word list (Common European Framework of Reference; Cambridge University project "English Vocabulary Profile")
      • occurrence in school textbooks

  15. What is the output? Tono 2012: user interface of the collocation dictionary

  16. Selection criteria in the Basic Estonian Dictionary (requirements of the official language proficiency levels, occurrence in the vocabulary lists for those levels, co-occurrence frequency)
      • compiled for learners of Estonian at the beginner (A2) and lower-intermediate (B1) levels
      • 4500 words (core vocabulary; a frequency dictionary and the A2-level vocabulary profiles (Ilves 2008) were used) = definition vocabulary = the same vocabulary is used for presenting syntagmatic relationships (collocations and government patterns)
      • government and collocation patterns (in SkE)
      • statistics: raw frequency or salience → RAW FREQUENCY

  17. Noun discussion patterns according to raw frequency

  18. Noun discussion patterns according to logDice
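
The difference between the two rankings on slides 17 and 18 can be sketched in a few lines. The block below assumes the standard logDice definition, 14 + log2(2·f_xy / (f_x + f_y)), where f_xy is the co-occurrence frequency and f_x, f_y are the frequencies of the node and the collocate. All counts and collocate labels are invented placeholders for illustration; they are not taken from the Estonian Reference Corpus or from the slides.

```python
import math

def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2 of the Dice coefficient."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

# Invented counts, for illustration only.
NODE_FREQ = 10_000  # hypothetical frequency of the node word
candidates = [
    # (placeholder collocate label, co-occurrence frequency, collocate frequency)
    ("very_frequent_verb", 900, 2_000_000),
    ("mid_frequency_adjective", 300, 40_000),
    ("rare_adjective", 120, 5_000),
]

# Ranking 1: raw co-occurrence frequency favours very common collocates.
by_freq = sorted(candidates, key=lambda c: c[1], reverse=True)

# Ranking 2: logDice favours collocates that are typical of the node.
by_logdice = sorted(candidates,
                    key=lambda c: log_dice(c[1], NODE_FREQ, c[2]),
                    reverse=True)

print([c[0] for c in by_freq])
print([c[0] for c in by_logdice])
```

With these toy numbers the two orderings are exact opposites, which shows why the choice between raw frequency and salience matters when selecting patterns for a learner's dictionary.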

  19. References (1)
      • ATKINS, B. T. S., RUNDELL, M. 2008. The Oxford Guide to Practical Lexicography. Oxford: Oxford University Press.
      • BARTSCH, S. 2004. Structural and functional properties of collocations in English. A corpus study of lexical and pragmatic constraints on lexical co-occurrence. Tübingen: Verlag Gunter Narr.
      • HUNSTON, S., FRANCIS, G. 1999. Pattern Grammar: A corpus-driven approach to the lexical grammar of English. Amsterdam/Philadelphia: John Benjamins Publishing Company.
      • ILVES, M. 2008. Algaja keelekasutaja. A2-taseme eesti keele oskus. Tallinn: Eesti Keele Sihtasutus.
      • KILGARRIFF et al. 2004 = Kilgarriff, Adam, Pavel Rychly, Pavel Smrz and David Tugwell 2004. The Sketch Engine. – Proceedings Euralex, Lorient, France.
      • KILGARRIFF, A., KOSEM, I. 2012. Corpus Tools for Lexicographers. – Electronic Lexicography. Oxford: Oxford University Press (forthcoming).
      • RICHARDS, J. C., SCHMIDT, R. 2002. Longman Dictionary of Language Teaching and Applied Linguistics. UK: Pearson Education Limited.
      • RUNDELL, M., KILGARRIFF, A. 2011. Automating the creation of dictionaries: where will it all end? – A Taste for Corpora. In honour of Sylviane Granger. Meunier F., De Cock S., Gilquin G. and Paquot M. (eds). Université catholique de Louvain.
      • SIEPMANN, D. 2005. Collocation, Colligation and Encoding Dictionaries. Part I: Lexicological Aspects. – International Journal of Lexicography 18, 409-443.

  20. References (2)
      • SVENSÉN, B. 2009. A Handbook of Lexicography. The Theory and Practice of Dictionary-Making. Cambridge: Cambridge University Press.
      • TONO, Y. 2011. Bilingual lexicography in Japan. Video presentation at the conference Electronic Lexicography in the 21st Century: New Applications for New Users. Bled, 10-12 November. Available online at http://videolectures.net/elex2011_bled/.
