
The Cambridge Learner Corpus, English Profile, the Sketch Engine and the Kelly Project

Adam Kilgarriff, Lexical Computing Ltd, http://www.sketchengine.co.uk



  1. The Cambridge Learner Corpus, English Profile, the Sketch Engine and the Kelly Project Adam Kilgarriff Lexical Computing Ltd http://www.sketchengine.co.uk

  2. The Cambridge Learner Corpus, English Profile, the Sketch Engine, “freely available”, HOO, DANTE and the Kelly Project Adam Kilgarriff Lexical Computing Ltd http://www.sketchengine.co.uk

  3. Cambridge Learner Corpus (CLC) • Since 1993 • Nearly as old as CECL • Leading resource (like ICLE) • CUP and Cambridge ESOL • For better dictionaries, ELT courses, tests • Material: all from exams (levels A1-C2) • 45m words; 22m error-tagged • 200,000 scripts, 138 L1s, 203 nationalities

  4. English Profile • From 2006 • Cambridge Univ, Univ Press, ESOL (+ others) • Goal • for each CEFR level, find characteristic lexis and grammar • Main resource: CLC • Talk on Thursday • Theodora Alexopoulou, Helen Yannakoudakis

  5. Flyers

  6. Sketch Engine • Leading corpus tool • Word sketches • One-page summaries of a word’s grammatical and collocational behaviour • In use at OUP, CUP, Collins, Macmillan, INL … • 42 languages • Over 150 corpora • Since May including CHILDES: demo • Since last year including CLC

  7. Error-coded corpus • Challenge • Intuitive to search for x • anywhere • only where it is part of an error • only where it is part of a correction • where x can be a word, phrase, grammar pattern … • Requirement for CLC in Sketch Engine

  8. Sample text • We will only use those informations to take part of our guest survey
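The three search scopes from the error-coded corpus slide (anywhere, only inside an error, only inside a correction) can be sketched in a few lines of Python. This is a minimal illustration, not how the CLC or the Sketch Engine stores or queries error codes: the `Segment` class, the `search` function and the particular correction attached to the sample sentence are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """A stretch of learner text, optionally paired with a correction."""
    text: str                         # what the learner wrote
    correction: Optional[str] = None  # None means no error was marked


# Hypothetical annotation of the sample sentence (illustrative only).
sentence = [
    Segment("We will only use"),
    Segment("those informations", "that information"),
    Segment("to take part of our guest survey"),
]


def search(segments: List[Segment], x: str, scope: str = "anywhere") -> List[Segment]:
    """Return the segments in which the word x occurs, for a given scope.

    scope "anywhere":   x occurs in the learner's text, error or not
    scope "error":      x occurs in learner text that was marked as an error
    scope "correction": x occurs in the correction of a marked error
    """
    hits = []
    for seg in segments:
        is_error = seg.correction is not None
        if scope == "anywhere":
            found = x in seg.text.split()
        elif scope == "error":
            found = is_error and x in seg.text.split()
        elif scope == "correction":
            found = is_error and x in seg.correction.split()
        else:
            raise ValueError(f"unknown scope: {scope}")
        if found:
            hits.append(seg)
    return hits


print(search(sentence, "informations", "error"))
print(search(sentence, "information", "correction"))
print(search(sentence, "use", "error"))
```

Here x is a single word; a phrase or grammar pattern would need a richer matcher, which is left out of this sketch.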

  9. Error-coded corpora in SkE • demo

  10. freely available

  11. freely available • Free (MED online) • Sense 1: not costing anything • Sense 4: not limited by rules • … anyone can get hold of it??

  12. freely available • Free (MED online) • Sense 1: not costing anything • Sense 4: not limited by rules • … anyone can get hold of it?? • Available • To download onto your com • To use

  13. Case studies

  14. Non-geeks • Access is important, not download • Web is beautiful

  15. HOO / HOO+ • Helping Our Own • HOO: English-NNS NLP researchers • Developer = user: motivation • Shared task/competitive evaluation • Organisers define task and prepare ‘gold standard’ • Teams participate by running their software over test data • Six teams (incl Tübingen), workshop end Sept

  16. HOO+ (2012) • Probably • English: learner data from CLC • Other languages? • Tasks • Essay scoring • Determiner, preposition errors • ? • http://www.clt.mq.edu.au/research/projects/hoo/

  17. DANTE Highlights of English lexicography

  18. DANTE

  19. DANTE

  20. DANTE

  21. DANTE http://webdante.com Flyers

  22. The KELLY Project • EU Lifelong Learning Project • Word cards • 9 languages • Arabic Chinese English Greek Italian Norwegian Polish Russian Swedish • All 36 pairs • Words the learner should know (at A1 … C2) • Partners • Stockholm Univ, Gothenburg Univ, Adam Mickiewicz Univ, ILSP Athens, CNR Pisa, Oslo Univ, Leeds Univ, Keewords A/S, Lexical Computing Ltd
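The pair counts follow directly from the nine languages: 36 unordered pairs, and 72 once each pair is taken in both translation directions. A short check, using only the language names listed on the slide:

```python
from itertools import combinations, permutations

languages = ["Arabic", "Chinese", "English", "Greek", "Italian",
             "Norwegian", "Polish", "Russian", "Swedish"]

# Unordered pairs {x, y}: one word-card set per pair of languages.
unordered_pairs = list(combinations(languages, 2))

# Directed pairs (source, target): one translation job per direction.
directed_pairs = list(permutations(languages, 2))

print(len(unordered_pairs))  # 36
print(len(directed_pairs))   # 72
```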

  23. Interesting question • How close to purely corpus-based can a pedagogic list be?

  24. Method • Take a general corpus • Count • Review, add, delete using other lists and corpora • Translate (72 directed language pairs) • Words not in source list which occur in translations: • Review source list • http://kelly.sketchengine.co.uk
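The counting step and the translation-based review step can be sketched as follows. This is a toy illustration under stated assumptions, not the project's actual pipeline: the function names, the tokenisation by regular expression, the tiny "corpus" and the Swedish-to-English translation table are all made up for the example, and the manual review step is not modelled.

```python
import re
from collections import Counter


def frequency_list(text, top_n):
    """Count word forms in a corpus text and return the top_n most frequent."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    return [word for word, _ in Counter(words).most_common(top_n)]


def review_candidates(source_list, translations_into_language):
    """Words that occur as translations into a language but are missing
    from that language's own source list, so the list should be reviewed."""
    known = set(source_list)
    seen = {t for targets in translations_into_language.values() for t in targets}
    return sorted(seen - known)


# Toy corpus standing in for a general corpus.
corpus = "The sun is hot. The sun and the music."
print(frequency_list(corpus, 2))  # ['the', 'sun']

# Toy English list and toy Swedish-to-English translations.
english_list = ["hospital", "music", "sun"]
swedish_to_english = {"sjukhus": ["hospital"], "bibliotek": ["library"]}
print(review_candidates(english_list, swedish_to_english))  # ['library']
```

In this toy run "library" arrives via translation from Swedish but is absent from the English list, which is the signal to go back and review the English list.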

  25. Symmetrical pairs: <x,y> and <y,x> • Cliques: • For x, y, z, … all pairs are symmetrical • 9-language cliques (English members) • hospital library music sun theory
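The definitions of symmetrical pairs and cliques can be made concrete with a small sketch. The data structure (`links`), the function names and the three-language toy data are assumptions for illustration; the project's real translation tables cover nine languages and are not reproduced here.

```python
from itertools import combinations, product

# Toy directed translation links: (language, word) -> set of (language, word).
links = {
    ("en", "sun"): {("sv", "sol"), ("it", "sole")},
    ("sv", "sol"): {("en", "sun"), ("it", "sole")},
    ("it", "sole"): {("en", "sun"), ("sv", "sol")},
    ("en", "music"): {("sv", "musik")},
    ("sv", "musik"): {("en", "music")},
    ("it", "musica"): {("en", "music")},  # one direction only
}


def symmetrical(x, y):
    """True if x translates to y and y translates back to x."""
    return y in links.get(x, set()) and x in links.get(y, set())


def is_clique(words):
    """True if every pair of words in the group is symmetrical."""
    return all(symmetrical(x, y) for x, y in combinations(words, 2))


def cliques(langs):
    """Yield one-word-per-language groups in which all pairs are symmetrical."""
    by_lang = [sorted(node for node in links if node[0] == lang) for lang in langs]
    for group in product(*by_lang):
        if is_clique(group):
            yield group


print(symmetrical(("en", "music"), ("sv", "musik")))   # True
print(symmetrical(("it", "musica"), ("en", "music")))  # False
print(list(cliques(["en", "sv", "it"])))
# [(('en', 'sun'), ('sv', 'sol'), ('it', 'sole'))]
```

Trying every combination is fine for a toy example; it would be far too slow for nine full word lists, where the search would need to start from symmetrical pairs and extend them.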

  26. Homage
