
ESP Materials Derived from a Web-based Corpus

Pisamai Supatranont, Ph.D. Rajamangala University of Technology Lanna Tak, Thailand supatranont@yahoo.com. TESL Ontario’s 36th Annual Conference. “Celebrating the International Year of Languages” Friday, November 14, 2008, 8.30 – 9.20 am (FAE) Sheraton Centre Toronto, Canada. ESP Materials

Presentation Transcript


  1. ESP Materials Derived from a Web-based Corpus. Pisamai Supatranont, Ph.D., Rajamangala University of Technology Lanna Tak, Thailand, supatranont@yahoo.com. TESL Ontario’s 36th Annual Conference, “Celebrating the International Year of Languages”, Friday, November 14, 2008, 8.30 – 9.20 am (FAE), Sheraton Centre Toronto, Canada.

  2. Presentation Outline • Background and rationale • Research questions • Research methodology • Data analysis and findings • Discussion

  3. Background of the Study The researcher is from RMUTL Tak, Thailand. The study is: • Funded by [logo not preserved] • Conducted in February – July 2008 at [logo not preserved] • Under the supervision of Assoc. Prof. David Hall • With the consultation of Prof. Pam Peters

  4. Rationale of the Study • Cause: the influence of Information and Communication Technology (ICT) in academic and professional settings. • Effect: to get good jobs, university students in both ICT and non-ICT fields need English to communicate in ICT working environments.

  5. ESP Materials Development • 1. Limitation of relevant ESP textbooks • Although specialized texts in ICT are abundant, they are not suitable for unmodified and unsupported use directly in ESP classes because of their difficulty for EFL students. • Need for teacher-designed materials in ESP teaching.

  6. ESP Materials Development • 2. Difference in students’ background knowledge • ICT students: • possess some specialized knowledge and skills to design hardware and software. • need English to communicate their knowledge in academic and professional contexts. • Non-ICT students: • have little knowledge of ICT. • need ICT knowledge as computer users. • need to learn both basic ICT concepts and English to communicate in business companies or organizations. • Different learning needs: the same level of English but different levels of specialized knowledge. • Need for different specialized contents to facilitate ESP learning.

  7. ESP Materials Development • 3. Insufficiency of EFL students’ lexical knowledge • It was found that undergraduate students in EFL countries, e.g. in Thailand (Supatranont, 2005), Oman (Cobb and Horst, 2001), and Indonesia (Nurweni and Read, 1999), have limited lexical knowledge and are less proficient in English than is expected of students at university level. • In Supatranont’s study (2005), the lexical knowledge of RMUTL students was found to be below the lexical threshold for academic study. With a limited vocabulary of academic words, students cannot cope well with specialized texts because the most frequent words in these texts are academic and sub-technical words (Mudraya, 2006). • Academic and technical words should be integrated as the main vocabulary components of language input.

  8. ESP Materials Development • To read academic texts with comprehension, knowing 95% of the running words in a text is the minimum (Laufer, 1989). • The lexical threshold for academic study is composed of two wordlists (Nation, 2001; Coxhead & Nation, 2001; Cobb & Horst, 2001; Nation & Waring, 1997): the General Service List (GSL) = 2,000 high-frequency words (West, 1953) and the Academic Word List (AWL) = 570 academic words (Coxhead, 1998). • Knowledge of these two wordlists is estimated to provide over 90% coverage of academic texts in all disciplines. • Academic vocabulary in this study is based on the GSL and AWL (downloaded from http://www.uefap.com/vocab/vocfram.htm).
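
The coverage figures on this slide can be made concrete with a small calculation. The following is a minimal sketch, not the procedure used in the study: it assumes a tiny hypothetical word list in place of the real GSL and AWL, matches surface forms only (no lemmatization), and reports the share of running words that appear in the list.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def coverage(text, known_words):
    """Return the share of running words (tokens) found in known_words."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = sum(1 for tok in tokens if tok in known_words)
    return known / len(tokens)

# Hypothetical mini word list standing in for the GSL + AWL headwords.
known_words = {"the", "file", "is", "in", "a", "to", "access", "data"}
sample = "The file is stored in a directory. Access to the data is restricted."

share = coverage(sample, known_words)
print(f"coverage = {share:.1%}; meets the 95% threshold: {share >= 0.95}")
```

With real word lists, the same function could be run over each text in a corpus to see how far the GSL and AWL alone fall short of the 95% point.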

  9. Objectives of the Study 1. To identify high-frequency language items in ICT specialized texts by focusing on three lexical areas: • academic words: based on GSL and AWL • technical words: words with particular meaning in ICT • technical collocations: noun phrases with particular meaning in ICT 2. To obtain a set of language input for designing course materials for teaching English for ICT to non-ICT EFL students by using a corpus-based analysis method.

  10. Research Questions 1. What are high-frequency academic words in ICT specialized texts? 2. What are high-frequency technical words in ICT specialized texts? 3. What are high-frequency technical collocations in ICT specialized texts?

  11. Research Methodology The methodology is divided into three main steps: 1. Text selection 2. Corpus compilation 3. Corpus-based analysis (studying the corpus with text-analysis software)

  12. Research Methodology Text Selection • Texts selected exclusively from web-based tutorials in ICT • Authors: mostly lecturers in universities and tutorial centers. • 5 topics concerning fundamental ICT knowledge: • Computer hardware • Operating systems and graphical user interfaces (OS and GUIs) • Basic application software • Multimedia software • Internet software • 3 text types: articles, manuals and advertisements (of hardware)

  13. Research Methodology Number of Texts Selected • Total files = 230

  14. Research Methodology Number of Words • 1500-2000 words per article • 700-1000 words per manual • 200-500 words per advertisement • Total words = 287,478

  15. Research Methodology Design of the EICT Corpus

  16. Research Methodology Text-analysis Software: WordSmith Tools • WordSmith Tools version 5.0 • Developed by Mike Scott (2007) • University of Liverpool, UK • www.lexically.net/wordsmith/index.html

  17. Research Methodology Reference Corpus According to Bowker and Pearson (2002), Hunston (2002), and Scott (2001): • To establish a word’s ‘keyness’, the frequency wordlist of a corpus should be compared with that of a larger reference corpus. • With the Log Likelihood formula, unusually frequent or infrequent words can be identified by their ‘keyness’ and the significance of the difference (p value), i.e.: • Words with positive keyness occur unusually more often. • Words with negative keyness occur unusually less often.
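
The keyness comparison described on this slide can be sketched as follows. This is an illustration under stated assumptions, not WordSmith's own code: it uses the two-term log-likelihood form common in corpus linguistics, signs the score by which corpus has the higher relative frequency, and converts it to a p value with the chi-square distribution (1 degree of freedom). The corpus sizes come from the slides; the word counts are hypothetical.

```python
import math

def log_likelihood_keyness(freq_study, size_study, freq_ref, size_ref):
    """Signed log-likelihood score for one word in two corpora.

    Positive: relatively more frequent in the study corpus.
    Negative: relatively less frequent in the study corpus.
    """
    total = size_study + size_ref
    combined = freq_study + freq_ref
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    ll *= 2
    if freq_study / size_study < freq_ref / size_ref:
        ll = -ll
    return ll

def p_value(ll):
    """Chi-square (1 degree of freedom) tail probability for |ll|."""
    return math.erfc(math.sqrt(abs(ll) / 2))

EICT_SIZE = 287_478       # total words in the EICT Corpus (from the slides)
BNC_SIZE = 100_000_000    # approximate size of the BNC (from the slides)

# Hypothetical counts for a single word, for illustration only.
keyness = log_likelihood_keyness(400, EICT_SIZE, 3_000, BNC_SIZE)
is_key = keyness > 0 and p_value(keyness) < 0.000001
print(f"keyness = {keyness:.1f}, positive and significant: {is_key}")
```

Signing the score keeps the positive and negative keyness cases in one number, so a later filter only has to test `keyness > 0` and the p value.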

  18. Research Methodology Reference Corpus: BNC • British National Corpus (BNC) • A general corpus of 100 million words • Samples of written and spoken language from a wide range of sources • The BNC website is http://www.natcorp.ox.ac.uk • In the present study, the BNC wordlist is the one supplied with WordSmith Tools

  19. Data Analysis and Findings The method of analysis is adapted from the suggestions of Bowker and Pearson (2002) and Scott (2001). The method and findings are described according to the research questions. 1. What are high-frequency academic words in ICT specialized texts? 2. What are high-frequency technical words in ICT specialized texts? 3. What are high-frequency technical collocations in ICT specialized texts?

  20. Data Analysis and Findings Question 1: What are high-frequency academic words in ICT specialized texts? 1.1 Download GSL and AWL wordlists from the website of the University of Hertfordshire, UK at http://www.uefap.com/vocab/vocfram.htm. Use these words as academic word candidates: 1,937 GSL headwords and 570 AWL headwords.

  21. Data Analysis and Findings 1.2 Build a wordlist of the EICT Corpus, resulting in a total of 6,064 word types. 1.3 Use the academic word candidates to mark all GSL and AWL words in the corpus. Lemmatize them, resulting in 941 headwords of academic word candidates with ≥ 5 occurrences, sorted in alphabetical order.
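
Steps 1.2 and 1.3 can be sketched as follows. This is only an approximation of what the text-analysis software does: the lemma table and the candidate headword set below are small hypothetical stand-ins for a full lemma list and the GSL + AWL lists.

```python
import re
from collections import Counter

# Hypothetical lemma table: inflected form -> headword.
LEMMAS = {"accesses": "access", "accessed": "access", "files": "file"}

# Hypothetical stand-in for the GSL + AWL headword lists.
CANDIDATE_HEADWORDS = {"access", "file", "window"}

def headword_counts(text):
    """Count tokens after mapping each one to its headword."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(LEMMAS.get(tok, tok) for tok in tokens)

def frequent_candidates(text, min_occurrences=5):
    """Candidate headwords with at least min_occurrences, sorted A-Z."""
    counts = headword_counts(text)
    return sorted(
        word
        for word, n in counts.items()
        if word in CANDIDATE_HEADWORDS and n >= min_occurrences
    )

sample = "access files " * 3 + "accessed file " * 2 + "window kernel"
print(frequent_candidates(sample))
```

Here "window" is a candidate but is dropped for occurring only once, and "kernel" is frequent enough in no sense and is not a candidate at all, so only the lemmatized counts of "access" and "file" pass the filter.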

  22. Data Analysis and Findings 1.4 Compare the list of academic word candidates with the BNC wordlist, using the Log Likelihood formula at the p value 0.000001. • The software is set: • To process with full lemma • To display only words with positive keyness

  23. Data Analysis and Findings Finding 1 From 941 words, 343 words with ≥ 5 occurrences, positive keyness, and a significant difference emerged as high-frequency academic words (excluding function words; sorted in alphabetical order and according to keyness).

  24. Data Analysis and Findings Finding 1 It was found that some general words carry a technical sense in specialized texts. Of the 343 words in total, 95 words (e.g. burn, window, word) convey particular meanings in ICT that differ from their meanings in general texts. These words are simple and familiar, but they confuse students who interpret them incorrectly, as found in previous studies in fields related to ICT. For example, Lam’s study (in Chen and Ge, 2007) reported computer science students’ confusion when interpreting the word ‘field’ in the agricultural sense rather than as an option in a database program. These words were classified as semi-technical words.

  25. Data Analysis and Findings Finding 1 • All 343 high-frequency academic words were classified into 2 groups. • 1. 248 academic words: • e.g. access, compute, illustrate, indicate, identify, manipulate, term, category, feature, occurrence, symbol etc. • 2. 95 semi-technical words: • 2.1 Words with technical senses or particular meanings • e.g. burn, drive, refresh, card, domain, engine, memory, field, application, character, Word, document, window etc. • 2.2 Words in mathematics, geometric shapes and diagrams • e.g. add, multiply, divide, axis, table, row, degree etc. • 2.3 Simple words frequently used as commands or methods • e.g. edit, enable, paste, shift, help, enter, drag, drop etc.

  26. Data Analysis and Findings Question 2: What are high-frequency technical words in ICT specialized texts? Similarly to the method in Question 1: 2.1 Build word frequency list of the whole EICT Corpus. 2.2 Exclude all function words and academic words in finding 1. 2.3 Lemmatize the remaining words, resulting in 938 headwords. 2.4 Keep only words with ≥ 5 occurrences and technical meanings. 2.5 Compare the resulting wordlist with BNC wordlist, using Log Likelihood at the p value 0.000001.

  27. Data Analysis and Findings Finding 2 From 938 words, 358 words/acronyms with ≥ 5 occurrences, positive keyness, and a significant difference were selected as high-frequency technical words (sorted according to keyness).

  28. Data Analysis and Findings Finding 2 • All 358 resulting words are classified into 5 groups: • 1. 106 words with particular meanings (different from general meaning) • e.g. cache, cookies, bus, port, bitmap, chip, cursor, pixel etc. • 2. 87 words referring to basic programs, devices, commands, keys • e.g. spreadsheet, database, notepad, wizard, backspace etc. • 3. 55 abbreviations, acronyms, and extensions • e.g. ASCII, WYSIWYG, ALU, ROM, RAM, OS, RGB, ESC, ALT, txt, doc, gif, wav, http, html, www etc. • 4. 17 words in mathematics, geometric shapes and diagrams • e.g. equation, ellipse, polygon, cell, column, intersection etc. • 5. 92 sub-technical terms and frequent words in ICT • e.g. alignment, compression, directory, multimedia, playlist etc.

  29. Data Analysis and Findings Question 3: What are high-frequency technical collocations in ICT specialized texts? 3.1 Set the software: • To produce concordances. • To display 2-5 word clusters with ≥ 5 co-occurrences. • To compute the strength of relation between words, using Mutual Information (MI) ≥ 5.000.
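
The settings in step 3.1 can be illustrated for the simplest case of 2-word clusters. This is a sketch under assumptions: it uses the textbook pointwise MI formula, log2(f(xy) × N / (f(x) × f(y))), over adjacent word pairs; the exact formula and span settings in WordSmith Tools may differ, and the token sequence is invented for the example.

```python
import math
from collections import Counter

def bigram_mi(tokens, min_cooccurrences=5, min_mi=5.0):
    """Return {(w1, w2): MI} for adjacent pairs passing both thresholds."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scored = {}
    for (w1, w2), joint in bigrams.items():
        if joint < min_cooccurrences:
            continue
        mi = math.log2(joint * n / (unigrams[w1] * unigrams[w2]))
        if mi >= min_mi:
            scored[(w1, w2)] = mi
    return scored

# Invented token sequence: a rare but tightly bound pair ("operating system")
# surrounded by a very frequent, loosely bound pair ("the file").
tokens = []
for _ in range(5):
    tokens += ["operating", "system"] + ["the", "file"] * 20

collocations = bigram_mi(tokens)
for pair, score in collocations.items():
    print(" ".join(pair), round(score, 2))
```

The frequent pair "the file" co-occurs far more often than "operating system" but is filtered out, because MI rewards pairs whose words seldom appear apart rather than pairs that are merely common.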

  30. Data Analysis and Findings 3.2 On the cluster tab, select only the 2-5 word clusters with technical meanings and frequent use.

  31. Data Analysis and Findings 3.3 Compute the relation value, on the collocate tab. Sort according to the relation value

  32. Data Analysis and Findings Finding 3 3.4 Select only the collocations with ≥ 5 occurrences, MI scores ≥ 5.000, and distribution in ≥ 3 text files. 335 collocations were selected as technical collocations, i.e. noun phrases with technical meanings, e.g. mail merge, operating system (OS), uniform resource locator (URL), hypertext markup language (html), random access memory (RAM), wide area network (WAN) etc.
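
The distribution criterion in step 3.4 can be checked with a short script. This is a minimal sketch assuming a hypothetical set of four tiny text files; it covers only the occurrence and file-distribution thresholds, with the MI threshold left out for brevity.

```python
import re

def phrase_distribution(phrase, files):
    """Return (total occurrences, number of files containing the phrase)."""
    needle = phrase.lower().split()
    total = 0
    n_files = 0
    for text in files.values():
        tokens = re.findall(r"[a-z]+", text.lower())
        hits = sum(
            1
            for i in range(len(tokens) - len(needle) + 1)
            if tokens[i:i + len(needle)] == needle
        )
        total += hits
        if hits:
            n_files += 1
    return total, n_files

def well_distributed(phrases, files, min_occurrences=5, min_files=3):
    """Keep phrases that are both frequent and spread across files."""
    kept = []
    for phrase in phrases:
        total, n_files = phrase_distribution(phrase, files)
        if total >= min_occurrences and n_files >= min_files:
            kept.append(phrase)
    return kept

# Hypothetical file names and contents, for illustration only.
files = {
    "a.txt": "The operating system loads. The operating system runs.",
    "b.txt": "An operating system manages memory.",
    "c.txt": "Each operating system differs; this operating system is free. "
             "Mail merge again.",
    "d.txt": "Mail merge, mail merge, mail merge, mail merge.",
}

kept = well_distributed(["operating system", "mail merge"], files)
print(kept)
```

In this toy data both phrases occur five times, but "mail merge" is concentrated in two files, so only "operating system" survives; the file threshold guards against phrases that are frequent in a single source.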

  33. Discussion • Significance of the study: • It provides an overall description of the language of English for ICT. • It provides a clear language-learning goal that serves particular learning needs. In materials design, the teacher knows which language items to focus on when designing lessons and which ones the students already know. Apart from typical teaching materials, a corpus itself can also be a great source of learning: giving students direct access to the corpus can promote data-driven learning.

  34. References Bowker, L. and Pearson, J. (2002). Working with Specialized Language: A Practical Guide to Using Corpora. USA and UK: Routledge. Chen, Q. and Ge, G. (2007). A corpus-based lexical study on frequency and distribution of Coxhead’s AWL word families in medical research articles (RAs). English for Specific Purposes, 26, 502-514. Elsevier Science. Cobb, T. and Horst, M. (2001). Reading academic English: Carrying learners across the lexical threshold. In Flowerdew, J. and Peacock, M. (eds.) Research Perspectives on English for Academic Purposes. pp. 315-329. UK: Cambridge University Press. Coxhead, A. (1998). An Academic Word List. ELI occasional publication. No.18. Victoria University of Wellington, New Zealand. Coxhead, A. and Nation, P. (2001). The specialized vocabulary of English for academic purposes. In Flowerdew, J. and Peacock, M. (eds.) Research Perspectives on English for Academic Purposes. pp. 252-267. UK: Cambridge University Press. Hunston, S. (2002). Corpora in Applied Linguistics. Cambridge: Cambridge University Press. Laufer, B. (1989). What percentage of text-lexis is essential for comprehension? Cited in Cobb, T. and Horst, M. (2001). Reading academic English: Carrying learners across the lexical threshold. In Flowerdew, J. and Peacock, M. (eds.) Research Perspectives on English for Academic Purposes. pp. 315-329. UK: Cambridge University Press.

  35. References Mudraya, O. (2006). Engineering English: A lexical frequency instructional model. English for Specific Purposes. Volume 25 (2) pp. 235-256. Elsevier Science. Nation, P. (2001). Learning Vocabulary in Another Language. Cambridge: Cambridge University Press. Nation, P. and Waring, R. (1997). Vocabulary size, text coverage and word lists. In Schmitt, N. and McCarthy, M. (eds.) Vocabulary: Description, Acquisition and Pedagogy. pp. 6-19. Cambridge: Cambridge University Press. Nurweni, A. and Read, J. (1999). The English vocabulary knowledge of Indonesian university students. English for Specific Purposes. Volume 18 (2) pp. 161-175. Elsevier Science. Scott, M. (2001). Comparing corpora and identifying key words, collocations, frequency distributions through the WordSmith Tools suite of computer programs. In Ghadessy, M., Henry, A., and Roseberry, R.L. (eds.) Small Corpus Studies and ELT: Theory and Practice. pp. 47-67. US: John Benjamins Publishing. Scott, M. (2007). WordSmith Tools version 5.0. Oxford University Press. Available at http://www.lexically.net/wordsmith/index.html. Supatranont, P. (2005a). Classroom concordancing: Increasing vocabulary size for academic reading. KOTESOL Proceedings 2005. pp. 35-44. South Korea. Supatranont, P. (2005b). A Comparison of the Effects of the Concordance-based and the Conventional Teaching Methods on Engineering Students’ English Vocabulary Learning. Online Ph.D. Dissertation, Program of English as an International Language, Chulalongkorn University, Thailand. Available at http://www.arts.chula.ac.th/~ling/thesis/Pisamai2548.pdf West, M. (1953). A General Service List of English Words. London: Longman, Green and Company.

  36. Thank you for your attention. Any Questions?
