Resources for Using Corpus Linguistics in ELT Kenji Kitao Doshisha University Kyoto, Japan S. Kathleen Kitao Doshisha Women’s College Kyoto, Japan
I. Presentation • A. Corpus linguistics and corpus-related resources • B. Online resources for corpus linguistics • 1. Types of resources • 2. Examples of resources • C. Using corpus-related resources for language teaching
II. Application • A. Assigned tasks • B. Free exploration
Presentation • Definitions • Corpus (Latin for “body”) • A text or collection of texts • Now generally used to refer to machine-readable texts
Corpus linguistics • the use of empirical data from a corpus to study language and to find patterns of usage by analyzing actual language use
Requirements • A corpus • Can be a single text or a large collection of texts • Larger corpora provide more reliable results, if the purpose is making generalizations about language use
Balanced corpora • A variety of genres, including academic writing, newspapers, fiction, and spoken language
Specialized corpora • Examples • Academic writing • Texts by learners of English, sometimes with a specific native language • Teachers can develop their own corpora • Newspaper articles • Learners’ texts
Corpus analysis tool(s) • Types • Tools with specific corpora • Tools that can be used with any text or collection of texts • General • Word, Excel, etc. • Specialized • Count words • Find examples of specific words or parts of speech • Analyze word frequencies • Evaluate readability
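As a rough illustration of what the specialized tools do when they count words and analyze word frequencies, here is a minimal Python sketch. The function names and the sample sentence are made up for this example, and the tokenizer is a simplification that treats only letters and word-internal apostrophes as part of a word.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its word tokens (letters, optional internal apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def word_frequencies(text):
    """Return a Counter mapping each word type to its number of occurrences."""
    return Counter(tokenize(text))

# Hypothetical sample text, for illustration only.
sample = "The cat sat on the mat. The mat was flat."
freq = word_frequencies(sample)

print(len(tokenize(sample)), "tokens,", len(freq), "types")
for word, count in freq.most_common(3):
    print(word, count)
```

Tokens are running words and types are distinct words, which is the distinction the frequency-level checkers later in this presentation also report.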
Online Corpora • Free to all users • Available for a fee or for purchase • Available only to restricted users • In this presentation, we will only introduce resources that are free.
Using Corpus Linguistics for Language Teaching • Technology has become widespread and accessible • Larger, more powerful computers that can analyze large amounts of data quickly are available • Many corpus-related resources have become available • Language teachers and learners can use corpora
Corpus-related Internet resources • 1. General resources on corpus linguistics • 2. Vocabulary frequency lists and frequency level checkers • 3. Online corpora, concordancers and other text-analysis software • 4. E-texts • 5. Information about using corpus linguistics for language teaching
Resources for Corpus Linguistics http://www.cis.doshisha.ac.jp/kkitao/library/resource/corpus/corpus.htm
1. General resources on corpus linguistics • Web sites that help orient users to corpora and to what is available online for teachers to use in the classroom or in preparing material
The Compleat Lexical Tutor • http://www.lextutor.ca/ • Resources for data-driven learning, including concordancers for various corpora and concordancers into which users can enter their own texts • Tutorials, resources for teachers, resources for research
Bookmarks for Corpus Linguists • http://devoted.to/corpora/ • extensive annotated list of links related to corpus linguistics, including • software • tools • frequency lists • papers and articles • English and non-English corpora
2. Vocabulary frequency lists, frequency level checkers, and n-gram extractors • Frequency lists • Words used most frequently in English and thus words that are most useful for students to know • Often divided into sublists
Specialized word lists • Academic Word List • http://www.nottingham.ac.uk/~alzsh3/acvocab/index.htm • List includes 570 headwords with their word families • Site includes an explanation of the word lists, the words in each sublist, suggestions for using the list, and a gapmaker that can be used to produce gap-filling exercises
5000 Vocabulary List for Visiting Scholars in the USA • http://www.paulnoll.com/Books/5000-Words/index.html • This is a list of the 5000 words determined by the Chinese Academy of Sciences for scholars who need to go abroad for research or advanced studies in the USA. The words are listed in alphabetical order and have sample sentences and examples. There are an additional three thousand words.
Frequency-level checkers • Produce a list of words at each level of difficulty • Help a teacher understand how difficult the vocabulary in a reading passage is and which words students at different levels of proficiency might need to learn • N-gram finders • Find groups of n words
JACET 8000 Word List • http://www01.tcp-ip.or.jp/~shin/j8web/j8web.cgi • On this web page, you can enter a text and get a list of the words that appear in the text at each of the eight levels of the JACET list. You also get statistics about what percentage of the words (both types and tokens) occur at each of the eight levels.
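The kind of output described above (words at each level, plus percentages of types and tokens) can be sketched in a few lines of Python. This is not the JACET tool's implementation: the two-level word list below is a tiny hypothetical stand-in, and `level_profile` is a name invented for this example.

```python
import re
from collections import Counter

# Hypothetical two-level word list, for illustration only.
# The real JACET list has eight levels and is not reproduced here.
LEVELS = {
    1: {"the", "a", "is", "in", "of", "cat", "lives"},
    2: {"forest", "habitat"},
}

def level_profile(text, levels):
    """Return {level: (percent of tokens, percent of types)} for a text.

    Words found in no level are reported under the key "off-list".
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    word_level = {w: lvl for lvl, words in levels.items() for w in words}
    token_totals, type_totals = Counter(), Counter()
    for word, n in counts.items():
        lvl = word_level.get(word, "off-list")
        token_totals[lvl] += n   # every occurrence counts as a token
        type_totals[lvl] += 1    # each distinct word counts once as a type
    n_tokens, n_types = len(tokens), len(counts)
    return {
        lvl: (100 * token_totals[lvl] / n_tokens, 100 * type_totals[lvl] / n_types)
        for lvl in token_totals
    }

profile = level_profile("The cat lives in the forest habitat of Borneo.", LEVELS)
for lvl in list(LEVELS) + ["off-list"]:
    if lvl in profile:
        tok_pct, typ_pct = profile[lvl]
        print(f"{lvl}: {tok_pct:.1f}% of tokens, {typ_pct:.1f}% of types")
```

A teacher could read the "off-list" share as a first indication of how many words in a passage fall outside the list being used.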
N-gram finders • Online text analysis tool • http://www.online-utility.org/text/analyzer.jsp • Finds the most frequent groups of 2 and 3 words, and produces a list of all the words, their occurrences, and their percentages
Advanced Search – Explore N-grams from the BNC • http://pie.usna.edu/explore.html • Produces lists of n-grams, based on the number of words and occurrences you specify • N-gram phrase extractor • http://www.er.uqam.ca/nobel/r21270/cgi-bin/tuples/u_extract.html • Produces a KWIC list of n-grams
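To make the idea of an n-gram finder concrete, the following Python sketch lists the groups of n consecutive words that occur at least a given number of times. It is only an approximation of what the tools above do: the function names and sample text are invented here, and this simple version lets n-grams run across sentence boundaries.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def frequent_ngrams(text, n, min_count=2):
    """Return (n-gram, count) pairs occurring at least min_count times, most frequent first."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(ngrams(tokens, n))
    return [(" ".join(gram), c) for gram, c in counts.most_common() if c >= min_count]

# Hypothetical sample text, for illustration only.
sample = "as a matter of fact it is a matter of time and a matter of taste"
print(frequent_ngrams(sample, 3))
```

Setting `n` and `min_count` corresponds to specifying the number of words and the number of occurrences, as in the BNC n-gram search described above.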
3. Online corpora, concordancers, and other text-analysis software • Concordancers • A type of software for searching corpora • Produces a list of key words in context (KWIC), that is, search terms with the words that come before and after them • May be able to search for parts of speech, e.g., “take” followed by a preposition • May be able to search for two words that are not next to each other
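A KWIC list is easy to picture with a small example. The Python sketch below is a bare-bones concordancer written for this illustration (the function name `kwic` and the sample sentences are assumptions, not taken from any of the tools listed): it finds each occurrence of a search word and shows a fixed number of characters on either side.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per match: left context, [keyword], right context.

    The left context is right-aligned so that the keywords line up in a column.
    """
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        lines.append(f"{left:>{width}}[{m.group()}]{right}")
    return lines

# Hypothetical sample text, for illustration only.
for line in kwic("I take after my mother. Take care of it.", "take", width=10):
    print(line)
```

Real concordancers add features this sketch lacks, such as part-of-speech searches, which require a corpus that has been tagged in advance.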
Corpora (or parts of corpora) may have spoken language, written language, American English, British English, academic English, and so on. • Specialized corpora include: • parallel corpora, which have the same texts in different languages (to compare the same passages in different languages) • learner corpora, which have students’ writing/speaking (to help identify learners’ problems or to study characteristics of their writing)
Examples of concordancers • Turbo Lingo • http://www.staff.amu.edu.pl/~sipkadan/lingo.htm • Can enter a text or URL and get a list of KWIC, average sentence length, word frequency list, and other analyses
VIEW (Variation in English Words and Phrases) • http://view.byu.edu/ • Concordancing tool for the British National Corpus, the Corpus of Contemporary American English, and a Time magazine corpus, plus non-English corpora
A powerful concordancing tool • Has a useful tutorial • Click on what you want to do to see samples of searches • For example, if you want to learn to use wildcards, click on that word, and you will see several examples. You choose the type of search you want to do, and the search is automatically filled in. You can revise it based on what you want to do.
Types of searches • Search by exact word, exact phrase, wildcard, or part of speech • For example, mysterious • Use ? or * as a wildcard • For example, * point * • Search for an exact word plus a part of speech • For example, white [n*]
Compare usage of semantically related words • {sheer/total} [n*] • Search for surrounding words • Nouns that follow the verb “wrap” • Limit the search to one register • Adjectives in tabloid newspapers
Compare usage between registers, e.g., news and speaking • we [verb] that: ACAD vs SPOKEN • Find words with similar, more general, or more specific meanings • Similar words to “small” • More general than “shriek” • More specific than “woman”
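The wildcard searches mentioned above can be imitated on a plain text with Python's standard `fnmatch` module. This is only a sketch under simplifying assumptions: it does not reproduce VIEW's query language, it cannot handle part-of-speech queries such as white [n*] (those need a tagged corpus), and the function name and sample sentence are invented for the example.

```python
import re
from fnmatch import fnmatchcase

def wildcard_search(text, query):
    """Find word sequences matching a query, one pattern per word.

    Within each word pattern, * stands for any run of characters
    and ? for exactly one character.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    parts = query.lower().split()
    if not parts:
        return []
    hits = []
    for i in range(len(tokens) - len(parts) + 1):
        window = tokens[i:i + len(parts)]
        if all(fnmatchcase(tok, pat) for tok, pat in zip(window, parts)):
            hits.append(" ".join(window))
    return hits

# Hypothetical sample text, for illustration only.
sample = "At this point in time, the point of view matters."
print(wildcard_search(sample, "* point *"))
```

Here the query * point * returns each occurrence of "point" together with the word before and the word after it.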
BNCweb • To log in, go to: • http://bncweb.lancs.ac.uk/bncwebSignup/ • For information, go to: • http://bncweb.info
On BNCweb, you can do simple searches, and you can restrict your search to written or spoken texts or to a particular type of text. • Form your own subcorpora.
Make frequency lists based on criteria you specify • For example, make a frequency list of all adverbs that end in –ly in spoken texts. • Look at your query history and save queries to use again.
See your results in a sentence view or a KWIC view. • Get a list of collocates, with statistics about their frequency. • Get information about what type of texts the search term was found in.
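To show what a list of collocates is, the Python sketch below counts the words that appear within a few words of a search term. It reports raw counts only; it does not compute the association statistics that BNCweb provides, and the function name, window size, and sample text are assumptions made for this example.

```python
import re
from collections import Counter

def collocates(text, node, span=2):
    """Count words occurring within `span` words to the left or right of `node`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Words before and after the node word; the node itself is excluded.
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Hypothetical sample text, for illustration only.
sample = "Heavy rain fell. Light rain fell later. Heavy rain again."
print(collocates(sample, "rain", span=1).most_common())
```

Note that this simple window ignores sentence boundaries, so a word at the start of the next sentence can be counted as a collocate.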
Online concordancer • http://www.lextutor.ca/concordancers/concord_e.html • Can search a variety of corpora, including the Brown Corpus, the British National Corpus (written and spoken), a learner corpus, etc. • Produces a KWIC list for a given word and a list of collocates and their frequency
WebCorp • http://www.webcorp.org.uk/ • Uses the Internet as a corpus and produces KWIC as well as providing other information
Comparing two texts • Text Lex Compare • http://www.lextutor.ca/text_lex_compare/ • Allows users to enter two texts and get lists of: • Words unique to the first text • Words shared by the two texts • Words unique to the second text • Useful for helping a teacher find the new words in a new text
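The three lists described above correspond to simple set operations. The following Python sketch illustrates the idea; it is not how Text Lex Compare is implemented, and the function names and sample sentences are made up for this example.

```python
import re

def word_types(text):
    """Return the set of distinct lowercase words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def compare_texts(first, second):
    """Return the words unique to each text and the words the two texts share."""
    a, b = word_types(first), word_types(second)
    return {
        "only_first": sorted(a - b),
        "shared": sorted(a & b),
        "only_second": sorted(b - a),
    }

# Hypothetical sample texts, for illustration only.
result = compare_texts("The dog barked loudly", "The cat meowed loudly")
for name, words in result.items():
    print(name, words)
```

If the first text is something students have already read and the second is a new passage, "only_second" is a rough list of candidate new words.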
Specialized corpora (a few examples) • Spoken English • Corpus swb (American English telephone conversations) • http://www.ldc.upenn.edu/cgi-bin/lol/swb/speechcorpus?&corpus=swb • Technical English • e-Xplore Technical English • https://learn.sz.htwk-leipzig.de/wc/main.php
Parallel corpora • CRATER Multilingual Aligned Annotated Corpus • http://www.comp.lancs.ac.uk/linguistics/crater/corpus.html • Academic English • Michigan Corpus of American Spoken English • http://quod.lib.umich.edu/m/micase/ • Some large corpora also have sub-corpora of academic English
Online software to assess readability • Tests of document readability and suggestions on how to improve readability • http://www.online-utility.org/english/readability_test_and_improve.jsp • Can analyze texts of any length (some online text analysis programs have limits)
Can enter the text directly or enter a URL • e.g., http://www.cis.doshisha.ac.jp/kkitao/Japan/shimoda/s1.htm • Provides statistics: • Number of characters • Number of words • Number of sentences • Number of syllables/word • Number of words/sentence
Calculates readability indexes, including • Gunning Fog Index • Coleman-Liau Index • Flesch-Kincaid Grade Level • Flesch Reading Ease • Lists sentences that might be rewritten to improve readability.
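Two of these indexes can be computed directly from the statistics listed earlier (words per sentence and syllables per word). The Python sketch below uses the standard published formulas for Flesch Reading Ease and Flesch-Kincaid Grade Level. The syllable counter is a rough heuristic invented for this example, so its results will differ somewhat from those of the online tool, whose counting method is not documented here.

```python
import re

def count_syllables(word):
    """Rough heuristic: count vowel groups, dropping a silent final e (but not -le)."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)

def text_stats(text):
    """Return (number of words, number of sentences, number of syllables)."""
    words = re.findall(r"[A-Za-z]+", text)
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), sentences, syllables

def flesch_reading_ease(words, sentences, syllables):
    """Higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    """Approximate U.S. school grade level needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical sample text, for illustration only.
w, s, syl = text_stats("The cat sat. It was fat!")
print(w, "words,", s, "sentences,", syl, "syllables")
print("Reading ease:", round(flesch_reading_ease(w, s, syl), 1))
print("Grade level:", round(flesch_kincaid_grade(w, s, syl), 1))
```

Both formulas reward short sentences and short words, which is why the tool's list of sentences to rewrite tends to contain the longest ones.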
4. E-texts • In some cases, teachers or students may want to develop their own corpora. There are large numbers of e-texts available. • Project Gutenberg • http://www.gutenberg.org/wiki/Main_Page • Large collection of downloadable fiction and non-fiction
Internet Public Library: Online Texts • http://www.ipl.org/div/subject/browse/hum60.60.00/ • A large number of online texts on a wide variety of subjects • Drew’s Script-o-Rama • http://www.script-o-rama.com/oldindex.shtml • A website with a large number of scripts of movies and TV programs • American Rhetoric Online Speech Bank • http://www.americanrhetoric.com/speechbank.htm • A website with a large collection of speeches
5. Information about using corpus linguistics for language teaching • Corpus-related websites specifically for language teachers • Learner corpora and SLA Research • http://leo.meikai.ac.jp/%7Etono/ • Links to learner corpora made up of language produced by speakers of various languages, links to useful tools, a bibliography, and so on
Corpus linguistics: What it is and how it can be applied to teaching • http://iteslj.org/Articles/Krieger-Corpus.html • An article about corpus linguistics and how it can be used in the language classroom
Classroom Application • Two types of uses of corpus-related resources • “Low contact” uses – teacher uses resources to help in teaching, e.g., to find the difficult words in a reading passage; students do not actually see the corpus • “High contact” uses – students use the corpora themselves to learn about language, e.g., to find out which adjectives collocate with “rain”
“Data-driven learning” is a high contact use of corpus-related resources. • Using corpora to deduce rules of grammar or usage, e.g., to determine if a word’s connotation is positive or negative • Advantages of data-driven learning • Focus on authentic language • Encouragement of students to deduce • Real, exploratory activities rather than drills • A learner-centered activity
Web sites with suggestions for data-driven learning activities • How to use concordances in teaching English: Some suggestions • http://www.nsknet.or.jp/%7Epeterr-s/concordancing/usingconcs.html