Quality Control in Evaluation and Assessment

J Charles Alderson, Department of Linguistics and Modern English Language, Lancaster University

“Assessment is central to language learning, in order to establish where learners are at present, what level they have achieved, to give learners feedback on their learning, to diagnose their needs for further development, and to enable the planning of curricula, materials and activities.”

Outline • Current practice • Assessment for certification • Tradition one: teacher-centred, school-based • Tradition two: central, quality controlled • Basic parameters • What is needed to ensure parameters are met

Current practice • Quality of important examinations not monitored • No obligation to show that exams are relevant, fair, unbiased, reliable, and measure relevant skills • University degree in a foreign language qualifies one to examine language competence, despite lack of training in language testing • In many circumstances merely being a native speaker qualifies one to assess language competence. • Teachers assess students’ ability without having been trained.

First tradition · Teacher-centred · School/university-based assessment · Teacher develops the questions · Teacher's opinion the only one that counts · Teacher-examiners have no explicit marking criteria · Assumption that by virtue of being a teacher, and having taught the student being examined, teacher-examiner makes reliable and valid judgements · Authority, professionalism, reliability and validity of teacher rarely questioned · Rare for students to fail

Second tradition · Tests externally developed and administered · National or regional agencies responsible for development, following accepted standards · Tests centrally constructed, piloted and revised · Difficulty levels empirically determined · Externally trained assessors · Empirical equating to known standards or levels of proficiency

Basic parameters • Validity • Reliability • Practicality • Authenticity • Washback • Impact • Currency

“Validity in general refers to the appropriateness of a given test or any of its component parts as a measure of what it is purported to measure. A test is said to be valid to the extent that it measures what it is supposed to measure. It follows that the term valid when used to describe a test should usually be accompanied by the preposition for. Any test may then be valid for some purposes, but not for others.”(Henning, 1987)

Rational, empirical, construct • Internal and external validity • Face, content, construct • Concurrent, predictive • Construct validity

How can validity be established? • My parents think the test looks good. • The test measures what I have been taught. • My teachers tell me that the test is communicative and authentic. • If I take the Rigo utca test instead of the FCE, I will get the same result. • I got a good English test result, and I had no difficulty studying in English at university.

How can validity be established? • Does the test look valid to the general public? • Does the test match the curriculum, or its specifications? • Is the test based adequately on a relevant and acceptable theory?

How can validity be established? • Does the test yield results similar to those from a test known to be valid for the same audience and purpose? • Does the test predict a learner’s future achievements? Note: a test that is not reliable cannot, by definition, be valid

How can validity be established? • A test’s items should work well: they should be of suitable difficulty, and good students should get them right, whilst weak students are expected to get them wrong. • All tests should be piloted, and the results analysed to see if the test performed as predicted
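The item checks described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the one-third upper/lower grouping and the pilot data are all hypothetical, and the upper-lower discrimination index is just one common way of checking whether good students get an item right and weak students get it wrong.

```python
from statistics import mean


def item_facility(item_responses):
    """Proportion of candidates who answered the item correctly (1 = right, 0 = wrong)."""
    return mean(item_responses)


def discrimination_index(item_responses, total_scores, fraction=1 / 3):
    """Facility in the top-scoring group minus facility in the bottom-scoring group.

    Values near +1 mean strong candidates succeed where weak ones fail;
    values near 0 or below suggest the item needs revising.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))  # size of each comparison group
    order = sorted(range(n), key=lambda i: total_scores[i])  # weakest first
    low, high = order[:k], order[-k:]
    high_facility = mean([item_responses[i] for i in high])
    low_facility = mean([item_responses[i] for i in low])
    return high_facility - low_facility


# Hypothetical pilot data: total test scores for eight candidates,
# plus their right/wrong responses to two individual items.
totals = [35, 32, 30, 27, 22, 18, 15, 10]
item_good = [1, 1, 1, 1, 0, 1, 0, 0]  # mostly answered correctly by strong candidates
item_bad = [0, 1, 0, 1, 1, 1, 1, 1]   # mostly answered correctly by weak candidates

for name, item in (("item_good", item_good), ("item_bad", item_bad)):
    print(name, item_facility(item), discrimination_index(item, totals))
```

An item with a negative index, like the second one here, is the kind of result that piloting is meant to expose before a test goes live.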

Factors affecting validity • Unclear or non-existent theory • Lack of specifications • Lack of training of item/test writers • Lack of / unclear criteria for marking • Lack of piloting/pre-testing • Lack of detailed analysis of items/tasks • Lack of standard setting to CEF • Lack of feedback to candidates and teachers

Reliability • If I take the test again tomorrow, will I get the same result? • If I take a different version of the test, will I get the same result? • If the test had had different items, would I have got the same result? • Do all markers agree on the mark I got? • If a marker marks my test again tomorrow, will I get the same result?

Reliability • Over time: test – re-test • Over different forms: parallel • Over different samples: homogeneity • Over different markers: inter-rater • Within one rater over time: intra-rater
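Several of these reliability types are usually reported as coefficients. The sketch below is a hedged illustration, not a prescribed procedure: it assumes Pearson correlation as the estimate for test – re-test (the same function could be applied to two markers' scores for inter-rater agreement) and Cronbach's alpha as the estimate of homogeneity. The function names and all score data are hypothetical.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(score_matrix):
    """Cronbach's alpha; rows are candidates, columns are items."""
    k = len(score_matrix[0])
    item_variances = [pvariance([row[j] for row in score_matrix]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in score_matrix])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: the same five candidates sitting the test twice.
first_sitting = [12, 15, 9, 20, 17]
second_sitting = [13, 14, 10, 19, 18]
print("test - re-test:", round(pearson(first_sitting, second_sitting), 3))

# Hypothetical right/wrong responses of five candidates to four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print("homogeneity (alpha):", round(cronbach_alpha(responses), 3))
```

Correlation only shows that two sets of scores rank candidates similarly; two markers can correlate highly while one is consistently more severe, so mean marks are worth comparing as well.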

Factors affecting reliability • Poor administration conditions – noise, lighting, cheating • Lack of information beforehand • Lack of specifications • Lack of marker training • Lack of standardisation • Lack of monitoring

Practicality • Number of tests to be produced • Length of test in time • Cost of test • Cost of training • Cost of monitoring • Difficulty in piloting/ pre-testing • Time to report results

Factors affecting practicality • Awareness of complexity and cost • Time to do the job: ‘quick and dirty’ remains dirty • Funding to support development, monitoring and further development • Recognition of need for training – of testers and of teachers

Authenticity • Genuineness of text • Naturalness of task • Naturalness of learners’ response • Suitability of test for purpose • Match of test to learners’ needs (if known) • Face validity • Expectations of stakeholders and culture

Factors affecting ‘authenticity’ • A test is a test is a test • Availability of resources • Training of test developers/ item writers • Relative importance of reliability over validity • Purpose of test: proficiency versus progress or diagnosis

Washback • Test can have positive or negative effects • Test can affect content of teaching • Test can affect method of teaching • Test can affect attitudes and motivation • Test can affect all teachers and students in same way, or individuals differently • Importance of test will affect washback

Factors affecting washback • Extent to which teachers know nature of test • Extent to which teachers understand rationale of test • Extent to which teachers consider how best to prepare learners for test • Nature of teachers’ beliefs about teaching • Effort teachers are willing to make • Difficulty of test

Impact • Effect of test on society • Effect of test on stakeholders: employers, higher education, parents, politicians • Intended and unintended • Beneficial or detrimental

Factors affecting impact • Extent to which purpose of test is understood and accepted • Currency of test • Face validity of test • Stakes of test • Availability of information • Education of stakeholders re complexity of testing

Currency of test • Extent to which test is valued by stakeholders • Different stakeholders may have different perspectives: university vs employer; parents vs teachers; teachers vs principals? politicians vs professionals?

Factors affecting currency • Consequences of passing or failing – stakes • Extent to which stakeholders take results seriously into consideration • Beliefs about value of tests in general • Extent to which test matches expectations about tests in general or language tests in particular • Difficulty of test • Institution offering the test

General Issues · Teacher-based assessment vs central quality control · Internal vs external assessment · Quality control of exams (and the associated cost) · Piloting and pre-testing · Test analysis and the role of the expert · The existence of test specifications · Guidance and training for test developers and markers

General Issues (continued) • Feedback to candidates • Pass / fail rates • The currency of the old and the new traditions • The relationship with other languages and countries • The standards of the local exams in terms of "Europe"

Constraints on testing · Time – much less than for teaching · Sample – inevitably limited · Resources always limited – money, infrastructure, trained personnel · Assessment culture / tradition · Lack of awareness of problems and solutions

BUT WASHBACK · Testing is too important to be left to the teacher · Testing is too important to be left to the tester · Both are needed, to reflect and influence teaching, validly and reliably.

“Assessment is central to language learning, in order to establish where learners are at present, what level they have achieved, to give learners feedback on their learning, to diagnose their needs for further development, and to enable the planning of curricula, materials and activities.”
