Quality Control in Evaluation and Assessment J Charles Alderson, Department of Linguistics and Modern English Language, Lancaster University
“Assessment is central to language learning, in order to establish where learners are at present, what level they have achieved, to give learners feedback on their learning, to diagnose their needs for further development, and to enable the planning of curricula, materials and activities.”
Outline • Current practice • Assessment for certification • Tradition one: teacher-centred, school-based • Tradition two: central, quality controlled • Basic parameters • What is needed to ensure parameters are met
Current practice • Quality of important examinations not monitored • No obligation to show that exams are relevant, fair, unbiased, reliable, and measure relevant skills • University degree in a foreign language qualifies one to examine language competence, despite lack of training in language testing • In many circumstances merely being a native speaker qualifies one to assess language competence. • Teachers assess students’ ability without having been trained.
First tradition · Teacher-centred · School/university-based assessment · Teacher develops the questions · Teacher's opinion the only one that counts · Teacher-examiners have no explicit marking criteria · Assumption that by virtue of being a teacher, and having taught the student being examined, teacher-examiner makes reliable and valid judgements · Authority, professionalism, reliability and validity of teacher rarely questioned · Rare for students to fail
Second tradition · Tests externally developed and administered · National or regional agencies responsible for development, following accepted standards · Tests centrally constructed, piloted and revised · Difficulty levels empirically determined · Externally trained assessors · Empirical equating to known standards or levels of proficiency
Basic parameters • Validity • Reliability • Practicality • Authenticity • Washback • Impact • Currency
“Validity in general refers to the appropriateness of a given test or any of its component parts as a measure of what it is purported to measure. A test is said to be valid to the extent that it measures what it is supposed to measure. It follows that the term valid when used to describe a test should usually be accompanied by the preposition for. Any test may then be valid for some purposes, but not for others.” (Henning, 1987)
Validity • Rational, empirical, construct • Internal and external validity • Face, content, construct • Concurrent, predictive • Construct
How can validity be established? • My parents think the test looks good. • The test measures what I have been taught. • My teachers tell me that the test is communicative and authentic. • If I take the Rigo utca test instead of the FCE, I will get the same result. • I got a good English test result, and I had no difficulty studying in English at university.
How can validity be established? • Does the test look valid to the general public? • Does the test match the curriculum, or its specifications? • Is the test based adequately on a relevant and acceptable theory?
How can validity be established? • Does the test yield results similar to those from a test known to be valid for the same audience and purpose? • Does the test predict a learner’s future achievements? Note: a test that is not reliable cannot, by definition, be valid
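One common way to put a number on the first question above is to correlate candidates' scores on the new test with their scores on a test already accepted as valid for the same audience and purpose. The sketch below is only an illustration of that idea: the function name `pearson_r` and the score lists are hypothetical, and the slides do not prescribe this particular statistic.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of the same six candidates on a new test and on an
# established test assumed to be valid for the same purpose.
new_test = [52, 61, 70, 45, 88, 77]
established = [50, 65, 72, 40, 90, 75]

r = pearson_r(new_test, established)
print(f"concurrent validity coefficient: {r:.2f}")
```

The same calculation serves for predictive validity if the second list holds a later outcome, such as marks in university courses taught in English. A high coefficient is evidence for validity for that purpose only, not proof of it.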
How can validity be established? • A test’s items should work well: they should be of suitable difficulty, and good students should get them right, whilst weak students are expected to get them wrong. • All tests should be piloted, and the results analysed to see if the test performed as predicted
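The two checks named above (suitable difficulty, and good students outperforming weak ones) can be sketched as a simple item analysis of pilot data. This is a minimal illustration under stated assumptions: items are scored 0 or 1, "good" and "weak" students are taken as the top and bottom thirds by total score, and the function name `item_analysis` and the pilot data are hypothetical.

```python
def item_analysis(responses, group_fraction=1 / 3):
    """Return (facility, discrimination) for each item.

    responses: one list of 0/1 item scores per candidate.
    facility: proportion of all candidates answering the item correctly.
    discrimination: proportion correct in the top-scoring group minus
    the proportion correct in the bottom-scoring group.
    """
    n_candidates = len(responses)
    n_items = len(responses[0])
    # Rank candidates by total score, strongest first.
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(n_candidates * group_fraction))
    top = ranked[:group_size]
    bottom = ranked[-group_size:]

    results = []
    for i in range(n_items):
        facility = sum(row[i] for row in responses) / n_candidates
        p_top = sum(row[i] for row in top) / group_size
        p_bottom = sum(row[i] for row in bottom) / group_size
        results.append((facility, p_top - p_bottom))
    return results


# Hypothetical pilot data: six candidates, four items.
pilot = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]

results = item_analysis(pilot)
for number, (facility, discrimination) in enumerate(results, start=1):
    print(f"item {number}: facility={facility:.2f}, discrimination={discrimination:.2f}")
```

In this made-up data the fourth item has middling difficulty but a discrimination of zero: strong and weak candidates are equally likely to get it right, so it would be a candidate for revision after piloting.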
Factors affecting validity • Unclear or non-existent theory • Lack of specifications • Lack of training of item/test writers • Lack of / unclear criteria for marking • Lack of piloting/pre-testing • Lack of detailed analysis of items/tasks • Lack of standard setting to CEF • Lack of feedback to candidates and teachers
Reliability • If I take the test again tomorrow, will I get the same result? • If I take a different version of the test, will I get the same result? • If the test had had different items, would I have got the same result? • Do all markers agree on the mark I got? • If a marker marks my test again tomorrow, will I get the same result?
Reliability • Over time: test – re-test • Over different forms: parallel • Over different samples: homogeneity • Over different markers: inter-rater • Within one rater over time: intra-rater
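Each of these kinds of reliability is usually reported as a coefficient. Test–re-test and parallel-form reliability come down to correlating two sets of scores from the same candidates. The sketch below illustrates two of the others, assuming Cronbach's alpha as the homogeneity estimate and Cohen's kappa as the inter-rater estimate. These are common choices, not ones the slides name, and the function names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Internal consistency (homogeneity) of a test.

    responses: one list of item scores per candidate.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores must vary")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def cohen_kappa(rater_a, rater_b):
    """Agreement between two markers, corrected for chance agreement.

    rater_a, rater_b: category labels (e.g. grades) for the same scripts.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty lists of marks")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if both markers awarded grades independently.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Hypothetical data: item scores for four candidates, and pass/fail
# decisions by two markers on the same six scripts.
item_scores = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
marker_1 = ["pass", "pass", "fail", "pass", "fail", "fail"]
marker_2 = ["pass", "fail", "fail", "pass", "fail", "pass"]

print(f"alpha = {cronbach_alpha(item_scores):.2f}")
print(f"kappa = {cohen_kappa(marker_1, marker_2):.2f}")
```

Intra-rater reliability can be estimated with the same `cohen_kappa` function by passing one marker's first and second marking of the same scripts.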
Factors affecting reliability • Poor administration conditions – noise, lighting, cheating • Lack of information beforehand • Lack of specifications • Lack of marker training • Lack of standardisation • Lack of monitoring
Practicality • Number of tests to be produced • Length of test in time • Cost of test • Cost of training • Cost of monitoring • Difficulty in piloting/ pre-testing • Time to report results
Factors affecting practicality • Awareness of complexity and cost • Time to do the job: ‘quick and dirty’ remains dirty • Funding to support development, monitoring and further development • Recognition of need for training – of testers and of teachers
Authenticity • Genuineness of text • Naturalness of task • Naturalness of learners’ response • Suitability of test for purpose • Match of test to learners’ needs (if known) • Face validity • Expectations of stakeholders and culture
Factors affecting ‘authenticity’ • A test is a test is a test • Availability of resources • Training of test developers/ item writers • Relative importance of reliability over validity • Purpose of test: proficiency versus progress or diagnosis
Washback • Test can have positive or negative effects • Test can affect content of teaching • Test can affect method of teaching • Test can affect attitudes and motivation • Test can affect all teachers and students in same way, or individuals differently • Importance of test will affect washback
Factors affecting washback • Extent to which teachers know nature of test • Extent to which teachers understand rationale of test • Extent to which teachers consider how best to prepare learners for test • Nature of teachers’ beliefs about teaching • Effort teachers are willing to make • Difficulty of test
Impact • Effect of test on society • Effect of test on stakeholders: employers, higher education, parents, politicians • Intended and unintended • Beneficial or detrimental
Factors affecting impact • Extent to which purpose of test is understood and accepted • Currency of test • Face validity of test • Stakes of test • Availability of information • Education of stakeholders re complexity of testing
Currency of test • Extent to which test is valued by stakeholders • Different stakeholders may have different perspectives: university vs employer; parents vs teachers; teachers vs principals? politicians vs professionals?
Factors affecting currency • Consequences of passing or failing – stakes • Extent to which stakeholders take results seriously into consideration • Beliefs about value of tests in general • Extent to which test matches expectations about tests in general or language tests in particular • Difficulty of test • Institution offering the test
General Issues · Teacher-based assessment vs central quality control · Internal vs external assessment · Quality control of exams (and the associated cost) · Piloting and pre-testing · Test analysis and the role of the expert · The existence of test specifications · Guidance and training for test developers and markers
General Issues (continued) • Feedback to candidates • Pass / fail rates • The currency of the old and the new traditions • The relationship with other languages and countries • The standards of the local exams in terms of "Europe"
Constraints on testing · Time – much less than for teaching · Sample – inevitably limited · Resources always limited – money, infrastructure, trained personnel · Assessment culture / tradition · Lack of awareness of problems and solutions
BUT WASHBACK · Testing is too important to be left to the teacher · Testing is too important to be left to the tester · Both are needed, to reflect and influence teaching, validly and reliably.
“Assessment is central to language learning, in order to establish where learners are at present, what level they have achieved, to give learners feedback on their learning, to diagnose their needs for further development, and to enable the planning of curricula, materials and activities.”