Principles of Language Assessment
Questions to check whether or not a test is well designed:
• How do you know if a test is effective?
• Can it be given within appropriate administrative constraints?
• Is it dependable?
• Does it accurately measure what we want to measure?
Five principles: PRACTICALITY • RELIABILITY • VALIDITY • AUTHENTICITY • WASHBACK
PRACTICALITY
An effective test is practical:
• It is not excessively expensive.
• It stays within appropriate time constraints.
• It is relatively easy to administer.
• It has a scoring/evaluation procedure that is specific and time-efficient.
RELIABILITY
• A reliable test is consistent and dependable.
• If the test is given to the same student or matched students on two different occasions, it should yield similar results.
• Factors that contribute to the unreliability of a test:
• Student-related reliability – temporary illness, fatigue, a “bad day”, anxiety, and other factors.
• Rater reliability – human error, subjectivity, and bias in the scoring process.
  • Inter-rater reliability – two or more scorers yield inconsistent scores for the same test, owing to lack of attention to scoring criteria, inexperience, inattention, or preconceived biases.
  • Intra-rater reliability – a single scorer yields inconsistent scores across the same tests taken by different test-takers.
• Unreliability may also result from the conditions in which the test is administered, e.g., a poor-quality tape recorder or outside noise during a listening test.
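Rater reliability is often expressed quantitatively. Below is a minimal sketch, with entirely hypothetical scores, of one common way to estimate inter-rater reliability: the Pearson correlation between two raters' scores on the same set of essays (agreement studies may also use statistics such as Cohen's kappa).

```python
# Minimal sketch: inter-rater reliability as the Pearson correlation
# between two raters' scores on the same essays (0-10 scale).
# All scores are hypothetical.
from math import sqrt

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

rater_1 = [7, 5, 8, 6, 9, 4, 7, 8]  # hypothetical essay scores, rater 1
rater_2 = [6, 5, 9, 6, 8, 5, 7, 7]  # the same essays scored by rater 2

# A value near 1.0 suggests the raters score consistently;
# a markedly lower value flags inter-rater unreliability.
print(f"inter-rater reliability: {pearson(rater_1, rater_2):.2f}")
```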
Validity
• Validity is the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment.
• A valid test of reading ability actually measures reading ability – not previous knowledge of a subject or other variables of questionable relevance.
• Types of validity in tests:
• Content validity
• Criterion validity
• Construct validity
• Consequential validity
• Face validity
Content Validity (1)
• If a test actually samples the subject matter about which conclusions are to be drawn, and if it requires the test-taker to perform the behavior that is being measured, it can claim content-related evidence of validity, popularly referred to as content validity.
• A test is considered valid if the tester can clearly define the achievement being measured.
• A test of speaking ability does not achieve content validity if it asks test-takers to answer paper-and-pencil multiple-choice questions requiring grammatical judgments.
Content Validity (2)
• Another way of understanding content validity is to consider the difference between direct and indirect testing.
• Direct testing – the test-taker actually performs the target task.
• Indirect testing – the test-taker does not perform the target task itself but rather a task that is related to it in some way.
• The most feasible rule of thumb for achieving content validity in classroom assessment is to test performance directly.
Criterion Validity
• In the case of teacher-made classroom assessments, criterion-related evidence is best demonstrated through a comparison of the results of an assessment with the results of some other measure of the same criterion.
• Criterion-related evidence usually falls into one of two categories: concurrent and predictive validity.
• Concurrent validity – the results are supported by other concurrent performance beyond the assessment itself.
• Predictive validity – the test predicts a test-taker's likely future performance, as in placement tests, admissions assessment batteries, and the like.
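Criterion-related evidence is typically reported as a correlation between the assessment and the criterion measure. The sketch below, using hypothetical placement-test scores and end-of-term grades, shows how predictive evidence might be estimated (statistics.correlation requires Python 3.10+).

```python
# Minimal sketch: predictive validity as the correlation between
# placement-test scores and a later criterion measure (final grades).
# All data are hypothetical.
from statistics import correlation

placement_scores = [55, 62, 48, 71, 66, 59, 80, 45]  # hypothetical test scores
final_grades     = [68, 74, 52, 85, 70, 64, 90, 50]  # same students' later grades

# A strong positive correlation is evidence that the placement test
# anticipates later performance on the criterion.
print(f"predictive validity estimate: {correlation(placement_scores, final_grades):.2f}")
```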
Construct Validity
• A construct is any theory, hypothesis, or model that attempts to explain observed phenomena in our universe of perception.
• Constructs may or may not be directly or empirically measurable – their verification often requires inferential data.
• Construct validity asks: does this test actually tap into the theoretical construct it was designed to measure?
• E.g., proficiency and communicative competence are linguistic constructs.
• Tests are operational definitions of constructs, in that they operationalize the entity being measured.
• Construct validity is a major issue in validating large-scale standardized tests of proficiency.
Consequential Validity
• Consequential validity encompasses all the consequences of a test, including such considerations as its accuracy in measuring the intended criteria, its impact on the preparation of test-takers, its effect on the learner, and the social consequences of a test's interpretation and use.
• As high-stakes assessment has gained ground in the last two decades, one aspect of consequential validity has drawn special attention: the effect of test-preparation courses and manuals on performance.
Face Validity (1)
• Face validity refers to the degree to which a test looks right – that is, appears to measure the knowledge or abilities it claims to measure.
• It is based on the subjective judgment of the examinees who take it, the administrative personnel who decide on its use, and other psychometrically unsophisticated observers.
• Face validity asks the question, “Does the test, on the ‘face’ of it, appear from the learner's perspective to test what it is designed to test?”
Face Validity (2)
• Face validity will likely be high if learners encounter:
• A well-constructed, expected format with familiar tasks
• A test that is clearly doable within the allotted time limit
• Items that are clear and uncomplicated
• Directions that are crystal clear
• Tasks that relate to their course work (content validity)
• A difficulty level that presents a reasonable challenge
• Face validity is purely a factor of the “eye of the beholder” – how the test-taker, or possibly the test-giver, intuitively perceives the instrument.