Workshop: AntConc as a corpus-linguistic program for ad-hoc LSP purposes. Presentation for the NDSU Conference by Birthe Mousten MA, Ph.D., 19 July, 2016.

Presentation Transcript


  1. Workshop: AntConc as a corpus-linguistic program for ad-hoc LSP purposes Presentation for the NDSU Conference by Birthe Mousten MA, Ph.D. 19 July, 2016

  2. Abstract: Corpus linguistics is a field associated with tracking language in mega-corpora such as those behind the Oxford, Longman and Webster dictionaries. Even though such dictionaries are based on huge corpora, they often fail to answer even fairly simple questions of technical and scientific terminology and lexicography. This knowledge representation gap does not mean that corpus linguistics is useless for LSP purposes; it stems from the lack of LSP input in the corpora used for dictionary tracking. This is where an ad-hoc corpus comes into the picture. An ad-hoc corpus is collected on the fly, typically for a certain LSP task at a certain time. It can therefore be used as a tool for technical writers and translators who need to swiftly map the lexicographic, terminological and genre characteristics of a new, delimited field. The ad-hoc corpus tool is the freeware program AntConc. Join me for an ad-hoc LSP task.

  3. Corpus linguistics - Overview: Articles about ad-hoc corpus linguistics; Mega-corpora; Ad-hoc corpora; Traditional corpus use; Cross-linguistic corpus use; Specialized language corpora; Training with AntConc – freeware program

  4. Corpus linguistics - articles. Our humble start:

  5. Corpus linguistics - Tutorial

  6. Corpus linguistics – Case study

  7. Megacorpora: American National Corpora (http://corpus.byu.edu/), British National Corpus (http://www.natcorp.ox.ac.uk/), Wortschatz (http://wortschatz.uni-leipzig.de/), KorpusDK (http://ordnet.dk/)

  8. Overview American National Corpora

  9. British National Corpus

  10. German National Corpus

  11. Danish National Corpus

  12. Task later: Knowledge about insulin analogs. You will be asked to work with insulin analogs later, so why not test that one right now? We will search the American megacorpora for insulin analog and see where it gets us. Let us try the American megacorpus first.

  13. American Now corpus. Result: 8 hits = 8 texts; on closer inspection maybe only 4, of which one is not American English but probably Indian English, and one is international English. The three hits from the FDA are probably from only one text. So in practice, from an American point of view, there is the FDA text and the Seeking Alpha text. Not very impressive: it shows the need for further searching to work with the area. However, why not take the FDA text for our corpus now that we have it? So we click the text and…

  14. The first reference renders this: … get this text, which we copy for our text corpus.

  15. Our corpus text no. 1: the same text copied to Word.

  16. Starting our collection for the corpus: Copy the text. Open Word. Save the text in Word as a file in a folder: - Save as –> source/title/date (or your chosen parameters) - Save as .txt by choosing plain text => (all HTML codes removed)
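The copy, paste and save-as-plain-text routine above can also be scripted. The following is only a minimal sketch, not part of the workshop: the function names, the `source_title_date.txt` naming pattern and the choice of Latin 1 as output encoding are assumptions made for illustration. It uses Python's standard `html.parser` module to drop the HTML codes.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects visible text and skips the content of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside script/style.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_plain_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def save_corpus_text(html, folder, source, title, date):
    """Save plain text as <folder>/<source>_<title>_<date>.txt.

    The naming pattern is a hypothetical choice mirroring the
    source/title/date parameters on the slide.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{source}_{title}_{date}.txt"
    # Characters that Latin 1 cannot represent are replaced with "?".
    path.write_text(html_to_plain_text(html), encoding="latin-1", errors="replace")
    return path
```

A call such as `save_corpus_text(page_html, "my_corpus", "FDA", "insulin-analogs", "2016-07-19")` would then add one text to the collection, where `page_html` stands for whatever HTML was copied.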

  17. Save as My chosen folder Source text date Plain text

  18. Building up the corpus: I have to search elsewhere for text, and why not in the largest big-data corpus in the world: Google. I use Google advanced search – it is easier and quicker. But a small step before that – I read my wiki. https://www.google.ca/advanced_search

  19. To help me make a very precise search on Google, I sneak a peek at Wikipedia to see whether it knows anything about insulin analog: An insulin analog is an altered form of insulin, different from any occurring in nature, but still available to the human body for performing the same action as human insulin in terms of glycemic control. Through genetic engineering of the underlying DNA, the amino acid sequence of insulin can be changed to alter its ADME (absorption, distribution, metabolism, and excretion) characteristics. Officially, the U.S. Food and Drug Administration (FDA) refers to these as "insulin receptor ligands", although they are more commonly referred to as insulin analogs. These modifications have been used to create two types of insulin analogs: those that are more readily absorbed from the injection site and therefore act faster than natural insulin injected subcutaneously, intended to supply the bolus level of insulin needed at mealtime (prandial insulin); and those that are released slowly over a period of between 8 and 24 hours, intended to supply the basal level of insulin during the day and particularly at nighttime (basal insulin). The first insulin analog approved for human therapy (insulin Lispro rDNA) was manufactured by Eli Lilly and Company. (Source: Wikipedia)

  20. Then Google Advanced. I want these parameters: my key concept; most recent year.

  21. Google Advanced results. OK – let’s get to it then. Give yourself 10 minutes to copy and paste into your folder.

  22. Ten minutes after: I have my corpus. Then I am ready for my corpus work.

  23. Now: our task. We just got a task from a company, say Eli Lilly or Novo Nordisk, about writing or translating something about insulin analogs. How can a corpus help you? Register, Collocations, Definitions, Synonyms, Knowledge! Etc.

  24. Getting the program: Find Laurence Anthony’s website – just google the name. By the way, the address is here: http://www.antlab.sci.waseda.ac.jp/software.html#antpconc Press: AntConc 3.4.4 (or the Mac or other version that is compatible with your computer). NB: The language encoding must be Western Latin 1! (Check under Global Settings – Language encoding – set it to Western Latin 1.) Please join me here!

  25. Loading your corpus into the program • Guide: • Press AntConc 3.4.4 (the language version must be Latin 1) • Load your folder (Windows Explorer method) • Then you are ready to search.

  26. AntConc opened – but empty. 1) AntConc is now open. 2) Press File, then Open Directory. 3) Load your corpus folder in the Windows way.
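For readers who want to see what loading a folder with the Latin 1 setting amounts to, here is a small sketch outside AntConc. The function name `load_corpus` is hypothetical, and the assumption is that every corpus file is a `.txt` file saved in Latin 1, as recommended above.

```python
from pathlib import Path


def load_corpus(folder, encoding="latin-1"):
    """Return {file name: text} for every .txt file in the folder.

    Assumes the files were saved as plain text in Latin 1; other
    file types in the folder are ignored.
    """
    return {
        path.name: path.read_text(encoding=encoding)
        for path in sorted(Path(folder).glob("*.txt"))
    }
```

The resulting dictionary plays the role of the corpus file list: one entry per text, ready to be searched.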

  27. Your Corpus list

  28. Press Word List

  29. Lantus – what is that? Only three sources: -> Product name? Check in texts.

  30. Scroll down -> different sources use the word => ESP word. What is hypoglycemia? If you press some of the words, you get directly into the texts.
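What the word list and the "how many sources use the word" check compute can be sketched in a few lines. This is an illustration under assumptions, not AntConc's own code: the tokenizer (letters, with internal hyphens or apostrophes) and the function name `word_list` are choices made here, and AntConc's token definition may differ.

```python
import re
from collections import Counter

# Assumed token definition: runs of letters, optionally joined by - or '.
TOKEN = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")


def word_list(corpus):
    """Return (word, frequency, number of files) rows, most frequent first.

    `corpus` maps file names to texts. The file count shows whether a
    word is spread across sources or confined to a few of them.
    """
    freq, file_range = Counter(), Counter()
    for text in corpus.values():
        tokens = [t.lower() for t in TOKEN.findall(text)]
        freq.update(tokens)
        file_range.update(set(tokens))  # count each file once per word
    rows = ((w, freq[w], file_range[w]) for w in freq)
    return sorted(rows, key=lambda r: (-r[1], r[0]))
```

A word that occurs in only a few files, such as a product name, stands out against a word that many different sources use.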

  31. The "or" trick. Finding: content knowledge, alternatives, synonyms (try also: aka, referred to, known as, "(").
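The idea behind searching for "or", "known as" and similar cues is a keyword-in-context (KWIC) search. A rough sketch follows; the function name `kwic` and the fixed character width are assumptions, and the whole-word matching used here will not work for a bare "(" search.

```python
import re


def kwic(text, term, width=30):
    """Return (left context, hit, right context) for each whole-word hit."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines
```

Calling `kwic(text, "known as")` on a corpus text lists each occurrence with what stands before and after it, which is where alternatives and synonyms tend to show up.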

  32. The "(" trick – finding parenthetical info. Which kind of info would you find in a parenthesis, and what does that data tell you?
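As an illustration of what the parenthesis search brings out, the sketch below pairs each parenthetical with the few words in front of it. The function name `parentheticals` and the four-word left context are assumptions; nested parentheses are not handled.

```python
import re

# Matches one pair of parentheses with no parentheses nested inside.
PAREN = re.compile(r"\(([^()]*)\)")


def parentheticals(text, left_words=4):
    """Return (words before the parenthesis, content of the parenthesis)."""
    hits = []
    for m in PAREN.finditer(text):
        left = " ".join(text[:m.start()].split()[-left_words:])
        hits.append((left, m.group(1).strip()))
    return hits
```

On the Wikipedia passage quoted earlier, this kind of search pairs "ADME" with its expansion and "Food and Drug Administration" with "FDA", which suggests that parentheses often carry definitions and abbreviations.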

  33. Finding collocations - right. Set the sorting parameters at the bottom: 1R 2R 3R. Findings: metabolic changes, metabolic control, metabolic decompensation, metabolic deterioration. All of them concepts in their own right.

  34. Finding collocations - left. Set the sorting parameters at the bottom: 1L 2L 3L. Findings: good metabolic, poor metabolic, rapid metabolic … but, for instance, not bad metabolic!
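Sorting concordance lines on 1R 2R 3R or 1L 2L 3L can be pictured as follows. This is a simplified sketch, not AntConc's implementation: the tokenizer, the function name `sorted_concordance` and the fact that sentence boundaries are ignored are all assumptions made here.

```python
import re

# Assumed token definition: runs of letters, optionally joined by - or '.
TOKEN = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")


def sorted_concordance(text, node, keys=("1R", "2R", "3R"), span=3):
    """Return (left words, node, right words) rows sorted on the given keys.

    A key such as "1R" means the first word to the right of the node,
    "2L" the second word to the left. Sentence boundaries are ignored.
    """
    tokens = [t.lower() for t in TOKEN.findall(text)]
    node = node.lower()
    rows = []
    for i, tok in enumerate(tokens):
        if tok == node:
            rows.append((tokens[max(0, i - span):i], tok, tokens[i + 1:i + 1 + span]))

    def sort_key(row):
        left, _, right = row
        key = []
        for k in keys:
            n, side = int(k[:-1]), k[-1]
            if side == "R":
                key.append(right[n - 1] if n <= len(right) else "")
            else:
                key.append(left[-n] if n <= len(left) else "")
        return key

    return sorted(rows, key=sort_key)
```

Sorting on the right groups lines such as "metabolic changes" and "metabolic control" together; sorting with `keys=("1L", "2L", "3L")` groups "good metabolic", "poor metabolic" and so on.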

  35. Statistics & collocation – for the nerds. Shows ranking, frequency left, frequency right, statistics and the collocate. Note, for instance, regimen, which can be used with dosing as both an L and an R collocate. The same meaning?
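The left and right frequency columns can be approximated with simple counting. The sketch below is an assumption-laden illustration: it counts collocates within a window around the node word and does not reproduce the statistic column, whose exact formula is not given in the slides.

```python
import re
from collections import Counter

# Assumed token definition: runs of letters, optionally joined by - or '.
TOKEN = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")


def collocates(text, node, span=1):
    """Return (collocate, total, freq left, freq right), most frequent first."""
    tokens = [t.lower() for t in TOKEN.findall(text)]
    node = node.lower()
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left.update(tokens[max(0, i - span):i])
            right.update(tokens[i + 1:i + 1 + span])
    rows = [(w, left[w] + right[w], left[w], right[w]) for w in set(left) | set(right)]
    return sorted(rows, key=lambda r: (-r[1], r[0]))
```

A collocate with non-zero counts on both sides, like dosing around regimen, is the kind of case the slide asks about: the two positions may or may not carry the same meaning.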

  36. File View – for better context

  37. Concordance plot for precautions. Shows in which texts a word exists and how the word use is distributed throughout the text. Precautions has a strong front tendency in the texts. A possible genre tool?
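The idea of a concordance plot, where in each text the hits fall, can be mimicked with relative positions. This is a rough text-only sketch with hypothetical function names; positions are measured in characters, which is an assumption and may differ from how AntConc draws its plot.

```python
import re


def relative_positions(text, term):
    """Return each whole-word hit as a position between 0.0 and 1.0."""
    if not text:
        return []
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [m.start() / len(text) for m in pattern.finditer(text)]


def ascii_plot(text, term, width=40):
    """Draw one text as a bar of '-' with '|' wherever the term occurs."""
    bar = ["-"] * width
    for rel in relative_positions(text, term):
        bar[min(int(rel * width), width - 1)] = "|"
    return "".join(bar)
```

Printing `ascii_plot(text, "precautions")` for every text in the corpus, one bar per line, shows at a glance whether the hits cluster at the front.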

  38. Your turn: Let your imagination loose. What do you find out? What can it be used for? Is it useful in the first place?

  39. My opinion – Good: The program is intuitive in use – I never learned it from anyone. General Microsoft processes work in AntConc. Good as an ad-hoc writing and translation tool. Good for register work. Good for collocations. Good for proof of what you are doing. Forget about your intuitive ideas and check how it works. Define a problem and devise your own solution method. Even 10 texts, as in our case, can provide a wealth of information. Ten times faster and 100 times more reliable than tiresome open-and-read of x number of Google docs. Must nowadays necessarily replace any old-fashioned pencil-and-paper work.

  40. My opinion - limitations: The quality of the findings depends on the user. Quality in => quality out. You have got to get started to like it. You have to learn approx. five shortcuts in order not to tire out.

  41. Thank you for joining me in this. I wish you the best of luck with the rest of the conference. If you want to contact me, please write to me: bmo@dac.au.dk / bmo@expo-com.dk
