TESTING FOR LANGUAGE TEACHERS 101 Paul Raymond Doyon (MAT, MA) Dr. Somsak Boonsathorn (PhD) Mae Fah Luang University
Outline • Testing as Problem Solving • Kinds of Testing • Approaches to Testing • Validity and Reliability • Achieving Beneficial Backwash • Stages of Test Construction • Test Techniques for Testing Overall Ability • Testing Writing • Testing Reading • Testing Listening • Testing Grammar and Vocabulary • Test Administration
Testing As Problem Solving! • No Best Test or Technique • A test which proves ideal for one purpose may be useless for another; a technique which works well in one situation can be entirely inappropriate in another • We Want Tests that… • Consistently and accurately measure the abilities we want to measure • Have a beneficial effect on teaching • Are practical – economical in terms of time and money
Practicality • Practicality • All tests cost time and money – to prepare, administer, score, and interpret. Time and money are in limited supply! • Our basic challenge is to • Develop tests which • (1) are Valid and Reliable, • (2) have a Beneficial Backwash Effect on teaching, and • (3) are Practical!
Kinds of Testing • Proficiency Tests: Used to test a student’s general ability with the language. • Achievement Tests: Used to test how well students are achieving the objectives of the course. Most teachers are involved in the preparation and use of these. • Diagnostic Tests: Used to identify students’ strengths and weaknesses, and to ascertain what further teaching is necessary. • Placement Tests: Used to place students at the stage of the teaching program most appropriate to their abilities. Typically, they assign students to classes at different levels.
Achievement Tests: Progress and Final • Final Achievement Test • Administered at the end of a course of study. • Intended to measure course contents and/or objectives. • Progress Achievement Test • Administered during a course of study. • Measures the progress the students are making towards course objectives.
Final Achievement Tests • Syllabus-Content Approach • Based directly on a detailed course syllabus or on books or other material used. • Obvious Appeal: test contains only what it is thought that the students have encountered – and thus can be considered, at least, a fair test. • Disadvantage: if the syllabus is badly designed, or books and other material are badly chosen, then the results of the test can be very misleading.
Final Achievement Tests • Course-Objective Approach • Based directly on the objectives of the course. • Obvious Appeal: • Compels course designers to be explicit about objectives. • Makes it possible for performance on the test to show just how far students have achieved those objectives. • Puts pressure on those responsible for the syllabus and for the selection of books and materials to ensure that these are consistent with the course objectives.
Final Achievement Tests • Ideally Speaking • Course content will meet the objectives and a test would be hence based on both the content and the objectives! • “If a test is based on the content of a poor or inappropriate course, the students taking it will be misled as to the extent of their achievement and the quality of the course.” Arthur Hughes, Testing for Language Teachers, 1989.
Progress Achievement Tests • Progress Achievement Tests are intended to measure the progress students are making. • One approach: repeatedly administer final achievement tests, and the (hopefully) increasing scores will indicate the progress being made. • Another: establish a series of well-defined short-term objectives on which to test or quiz the students.
Approaches to Testing • Direct vs. Indirect Testing • Discrete Point vs. Integrative • Norm-referenced vs. Criterion-referenced • Objective vs. Subjective Testing
Approaches to Testing Direct vs. Indirect Testing • Direct Testing requires the test taker to perform precisely the skill we wish to measure. For example, if we want to know how well a student writes essays, then we get them to write an essay. • Indirect Testing makes an attempt to measure the sub-skills which underlie the skills in which we are interested.
Approaches to Testing Benefits of Direct Testing • Direct Testing • Is easier to carry out with the productive skills of speaking and writing • Relatively straightforward to create the conditions we want to test • Assessment and interpretation of students’ performance is also straightforward • Practice for the test involves practice of the skills we wish to foster – helpful backwash!
Approaches to Testing Benefits and Pitfalls of Indirect Testing • Indirect Testing • Offers the possibility of testing a representative sample of a finite number of abilities (e.g. vocabulary, grammatical structures) which underlie a potentially indefinitely large number of manifestations of them. • The danger is that mastery of the underlying micro-skills does not always lead to mastery of the larger skills from which these emanate.
Approaches to Testing Direct vs. Indirect Testing • Ideally speaking • we should have a combination of both! • which should lead to beneficial backwash, in that teaching would then focus on both the larger skills and the micro-skills that underlie them.
Approaches to Testing Discrete Point vs. Integrative Testing • Discrete Point Testing • entails testing one element at a time, element by element. • Could be vocabulary or grammatical structures. • Integrative Testing • Entails having the test taker combine many language elements in the completion of some task. • Could be writing a composition, taking lecture notes, giving directions, etc.
Approaches to Testing Norm-referenced vs. Criterion-referenced Testing • Norm-referenced Testing • Places a student in a percentage category. • Relates one candidate’s performance to that of other candidates. • Seeks a bell-shaped curve in student assessment. • Criterion-referenced Testing • Tests what students can actually do with the language. • Hence, it is possible for all students to get As if they are all able to meet the criteria. • Motivates students to perform “up-to-standard” rather than trying to be “better” than other students.
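The two philosophies can be contrasted in a few lines of code. The scores and the 80% criterion below are invented purely for illustration:

```python
# Norm-referenced vs. criterion-referenced scoring, sketched minimally.
# All scores and the 80% mastery criterion are hypothetical.

def percentile_rank(score, all_scores):
    """Norm-referenced: where does this score fall among all candidates?"""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

def meets_criterion(score, max_score, criterion=0.80):
    """Criterion-referenced: has the candidate reached the fixed standard?"""
    return score / max_score >= criterion

scores = [55, 62, 70, 74, 81, 85, 90, 93]
print(percentile_rank(81, scores))  # rank depends entirely on the other candidates
print(meets_criterion(81, 100))     # pass/fail depends only on the fixed criterion
```

Note that raising every student's score changes nothing under `percentile_rank`, but can move every student past the criterion, which is exactly why criterion-referencing allows all students to earn As.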
Approaches to Testing Subjective Testing vs. Objective Testing • Subjective Testing • Judgement is required on the part of the scorer. • Different degrees of subjectivity in scoring. • Complexity increases subjectivity – the scoring of a composition being more subjective than that of short-answer responses. • Objective Testing • No judgement is required on the part of the scorer. • Multiple Choice, Fill-in-the-blank
Validity and Reliability • Validity: a test is said to be valid if it measures accurately the abilities it is intended to measure • Reliability: a test is said to be reliable if it provides consistent results no matter how many times the students take it
Validity and Reliability Validity: Four Factors • Content Validity: • Content is representative of all the language skills, structures, vocabulary, etc. which the test is intended to measure. • Criterion-related Validity: • Where the results of a shorter test – given for practical reasons – correspond to the results obtained from a longer, more complete test. • Construct Validity: • The test measures exactly the ability it is intended to measure. Construct refers to an underlying trait or ability hypothesized in language learning theory. Becomes an important consideration in indirect testing of abilities or the testing of sub-abilities like guessing the meaning of unknown words. • Face Validity: • An examination has face validity if it seems as if it is measuring what it is supposed to be measuring.
Validity and Reliability Reliability: Two Components • Test Reliability: • That a score on a test will be approximately the same no matter how many times a student takes it. • Scorer Reliability: • When the test is objective, the scoring requires no judgment, and the scores should always be the same. • When the test is subjective, the scoring requires judgment, and the scores will not be the same.
How to Make Tests More Reliable! • Test for enough independent samples of behavior and allow for as many fresh starts as possible • Do not allow test takers too much freedom. Restrict and specify their range of possible answers. • Write unambiguous items • Provide clear and explicit instructions • Ensure that tests are well laid out and perfectly legible • Make sure candidates are familiar with format and test-taking procedures • Provide uniform and non-distracting conditions of administration • Use items that permit scoring which is objective as possible • Make comparisons between candidates as direct as possible • Provide a detailed scoring key • Train scorers • Agree on acceptable responses and appropriate scores at the outset of scoring • Identify test takers by number, not name • Employ multiple, independent scoring
Achieving Beneficial Backwash / Washback • Test abilities whose development we want fostered • Sample widely and unpredictably • Use both direct and indirect testing • Make testing criterion-referenced • Base achievement tests on objectives • Ensure test is known and understood by both teachers and students • Provide assistance to teachers
Achieving Beneficial Backwash / Washback • Test abilities whose development we want fostered • For example, if we want to develop “Communicative Competence” then we need to test aspects of Communicative Competence. • Don’t just test what is easiest to test. • Certain abilities should be given sufficient “weight” in relation to other abilities.
Achieving Beneficial Backwash / Washback • Sample widely and unpredictably • Tests can normally only measure a sample of the language. Therefore the sample taken should represent as much as possible the full scope of what is specified. • For example, if the TOEFL writing test were to only test (1) compare and contrast, and (2) problem and solution, then much preparation would be limited to only these two types of tasks while others would be ignored.
Achieving Beneficial Backwash / Washback • Use both direct and indirect testing • Test the larger skills directly • Test the micro-skills (making up those larger skills) indirectly
Achieving Beneficial Backwash / Washback • Make testing criterion-referenced • If students know what they have to do and to what degree to succeed, they will have a clear picture of what they need to do in order to achieve. • They will know that if they perform the tasks at the criterion level, then they will be successful on the test, regardless of how the other students perform. • Both of the above are motivating for the students. • Also possible to have a series of criterion-referenced tests, each representing a different level of proficiency. Students must complete the majority of tasks successfully in order to “pass” the test and move on to the next level of proficiency.
Achieving Beneficial Backwash / Washback • Base achievement tests on objectives • Will provide truer picture of what has actually been achieved
Achieving Beneficial Backwash / Washback • Ensure test is known and understood by students and teachers • Teachers and students should understand what the test demands. • The test’s rationale, its specifications, and sample items should be made available to everyone concerned with the preparation for the test. • Increases test reliability.
Achieving Beneficial Backwash / Washback • Provide assistance to teachers • The introduction of a new test can make new demands on teachers • If a long-standing test on grammatical structure and vocabulary is to be replaced with a test of a much more communicative nature, it is possible that many teachers may feel that they do not know how to teach communicative skills. Of course, the reason the communicative test may have been introduced in the first place was to encourage communicative language teaching. Hence, the teachers will also need guidance and training in how (and why) to do this. If these are not given, the test will not achieve its desired effect and will more likely result in chaos and disaffection.
Stages of Test Construction • Statement of the Problem • Providing a Solution to the Problem • Writing Specifications for the Test • Writing the Test • Pretesting
Stages of Test Construction Statement of the Problem • Statement of the Problem • Be clear about what one wants to know and why! • What kind of test is most appropriate? • What is the precise purpose? • What abilities are to be tested? • How detailed must the results be? • How accurate must the results be? • How important is backwash? • What are the constraints (unavailability of expertise, facilities, time [for construction, administration, and scoring])?
Stages of Test Construction Providing a Solution to the Problem • Providing a Solution to the Problem • Once the problem is clear, then steps can be taken to solve it. • Efforts should be made to gather information on similar tests designed for similar situations. If possible, samples should be obtained. Should not be copied, but rather used to suggest possibilities, since there is no need to “reinvent the wheel.”
Stages of Test Construction Writing Specifications for the Test • Writing Specifications for the Test • Content • Operations • Types of Text • Addressees • Topics • Format and Timing • Criterial Levels of Performance • Scoring Procedures
Stages of Test Construction Writing Specifications for the Test • Content • Refers not to the content of a single, particular version of the test, but to the entire potential content of any number of versions. • Samples of this content should appear in individual versions of the test. • The fuller the information available on content, the less arbitrary the decisions will be as to what should appear on any version of the test.
Stages of Test Construction Writing Specifications for the Test • Content • The content will vary depending on the type of test. A grammar test (e.g. structures) will be different than one that tests communicative functions (e.g. ordering in a restaurant or asking for directions). • Some things to consider: • Operations: tasks students will have to be able to carry out (e.g. in reading, skimming and scanning, etc.). • Types of Text: (e.g. in writing, letters, forms, academic essays, etc.). • Addressees: the people the test-taker is expected to be able to speak or write to; or the people for whom reading and listening are primarily intended (for example, native-speaker university students). • Topics: topics should be selected according to their suitability for the test takers and the type of test.
Stages of Test Construction Writing Specifications for the Test • Format and Timing • Should specify test structure and item types/elicitation procedures, with examples. • Should state how much weight in scoring will be allocated to each component.
Stages of Test Construction Writing Specifications for the Test • Criterial Levels of Performance • The required levels of performance for different levels of success should be specified. For example, to demonstrate mastery, 80% of the items must be responded to correctly. • It may entail a complex rubric including the following: accuracy, appropriacy, range of expression, flexibility, size of utterances.
Stages of Test Construction Writing Specifications for the Test • Scoring Procedures • Most relevant when scoring is subjective. • Test constructors should be clear as to how they will achieve high scorer reliability.
Stages of Test Construction Writing the Test • Sampling • Choose widely from whole area of content. • Succeeding versions of test should sample widely and unpredictably.
Stages of Test Construction Writing the Test • Item Writing and Moderation • Writing of successful items is difficult. • Some items will have to be rejected – others reworked. • Best way is through teamwork! • Item writers must be open to, and ready to accept, criticism. • Critical questions: • Is the task perfectly clear? • Is there more than one possible correct answer? • Do test takers have enough time to perform the tasks?
Stages of Test Construction Writing the Test • Writing and Moderation of Scoring Key • When there is only one correct response, this is quite straightforward. • When there are alternative acceptable responses, which may be awarded different scores, or where partial credit may be given for incomplete responses, greater care should be given.
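A scoring key with alternative responses and partial credit can be made explicit before scoring begins. The items, answers, and point values below are invented for illustration:

```python
# Hypothetical key: each item maps every acceptable response to its points,
# so decisions about alternatives and partial credit are fixed in advance.
KEY = {
    "item_1": {"went": 2},                 # one correct answer, full credit only
    "item_2": {"has gone": 2, "gone": 1},  # partial credit for an incomplete form
    "item_3": {"an apple": 2, "apple": 1},
}

def score(answers):
    """Total a candidate's score; responses not in the key earn zero."""
    return sum(KEY[item].get(response.strip().lower(), 0)
               for item, response in answers.items())

print(score({"item_1": "went", "item_2": "gone", "item_3": "a banana"}))  # → 3
```

Fixing the key in advance is what makes the scoring reproducible: two scorers applying the same table must arrive at the same total.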
Stages of Test Construction Pretesting • Pretesting • Even after careful moderation, there may be some problems with the test. • Obviously better if these problems can be identified before the test is administered to the group for which it is intended. • Pretesting is often not feasible. Group may not be available or may put security of test at risk. • Problems that become apparent during administration and scoring should be noted and corrections made for the next time the test is given.
Test Techniques for Testing Overall Ability • Definition: Test Techniques • Means of eliciting behavior from test takers which inform us about their language abilities. • We need test techniques which • elicit valid and reliable behavior regarding ability in which we are interested; • will elicit behavior which will be reliably scored; • are economical; and • have a positive backwash effect.
Test Techniques for Testing Overall Ability Multiple Choice • Multiple Choice • Advantages • Scoring is reliable and can be done rapidly and economically, • Possible to include many more items than would otherwise be possible in a given period of time – making the test more reliable. • Disadvantages • Tests only recognition knowledge • Guessing may have a considerable but unknowable effect on test scores • Technique severely restricts what can be tested • It is very difficult to write successful items • Backwash may be harmful • Cheating may be facilitated.
Test Techniques for Testing Overall Ability Multiple Choice • Multiple Choice • Hence, it is • Best suited for relatively infrequent testing of large numbers of individuals, • Should be limited in institutional testing to particular tasks which lend themselves very well to the multiple choice format (e.g. reading or listening comprehension). • Institutions should avoid excessive, indiscriminate, and potentially harmful use of the technique.
Test Techniques for Testing Overall Ability Cloze (Fill in the Blanks) • Cloze • A cloze test is essentially a fill-in-the-blank test. In its original form, after a short lead-in, every seventh word or so was deleted and the test taker was asked to replace the original words. • A better and more reliable method is to choose carefully which words to delete from a passage. • Can be used with a tape-recorded oral passage to test oral ability indirectly.
Test Techniques for Testing Overall Ability Cloze (Fill in the Blanks) • Advice for Cloze Tests • Passages should be at the appropriate level. • Should be of the appropriate style of text. • Deletions should be made every 8th to 10th word after a few sentences of uninterrupted text. • Passage should be tried out on native speakers and range of acceptable answers determined. • Clear instructions should be provided and students should initially be encouraged to read through the passage first. • The layout should facilitate scoring. • Test takers should have had an opportunity to become familiar with this technique beforehand.
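The fixed-ratio deletion described above is mechanical enough to sketch in code. This is a toy illustration; choosing deletions by hand, as the advice recommends, would replace the automatic index choice:

```python
def make_cloze(passage, gap_every=8, lead_in_words=20):
    """Blank out every `gap_every`-th word after an uninterrupted lead-in,
    returning the gapped text and a numbered answer key."""
    words = passage.split()
    answers = {}
    i = lead_in_words + gap_every - 1  # index of the first deleted word
    while i < len(words):
        answers[len(answers) + 1] = words[i]
        words[i] = f"__({len(answers)})__"
        i += gap_every
    return " ".join(words), answers
```

Note the key stores each deleted token exactly as it appeared, punctuation included; the range of further acceptable answers still has to be determined by trying the passage out, as advised above.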
Test Techniques for Testing Overall Ability The C-Test • The C-Test: a variety of the cloze test • Instead of whole words, it is the second half of every second word that is deleted. • Advantages over the cloze test are • Only exact scoring is necessary • Shorter (and so more) passages are possible • A wider range of topics, styles, and levels of ability is possible. • In comparison to a cloze test, a C-Test of 100 items takes little space and not nearly so much time to complete (since candidates do not have to read so much text).
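The deletion rule itself can be sketched as below. This is a toy version: it leaves only the first word intact, whereas real C-Tests usually leave whole lead-in sentences untouched, and keeping the longer half of odd-length words is an assumption of this illustration:

```python
def make_c_test(text):
    """Leave the first word intact, then delete the second half of every
    second word, marking each deleted letter with an underscore."""
    words = text.split()
    for i in range(1, len(words), 2):
        word = words[i]
        keep = (len(word) + 1) // 2  # keep the longer half of odd-length words
        words[i] = word[:keep] + "_" * (len(word) - keep)
    return " ".join(words)

print(make_c_test("Testing can tell teachers a great deal"))
# → Testing ca_ tell teac____ a gre__ deal
```

Because each gap has exactly one set of missing letters, scoring such items needs no judgement, which is why only exact scoring is necessary.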
Test Techniques for Testing Overall Ability The C-Test • Disadvantage • Its puzzle-like nature • It may end up testing the ability to solve puzzles rather than language ability. • However, • research seems to indicate that it gives a rough estimate of overall language ability.