Managing the development of robust and reliable assessments for qualifications and learning. 21 March 2012 John Winkley. Introduction. What do we mean by robust and reliable? And what do our stakeholders mean? How do we know if qualifications are robust and reliable?
Managing the development of robust and reliable assessments for qualifications and learning 21 March 2012 John Winkley
Introduction • What do we mean by robust and reliable? And what do our stakeholders mean? How do we know if qualifications are robust and reliable? • Is e-assessment a help or a hindrance to achieving this? • Does this differ for different settings: • Vocational vs academic vs professional qualifications • Diagnostic and formative vs summative assessments • Experience drawn from UK – school (NC tests and GQ), vocational, professional and HE examinations • Also from international projects
Reliable assessments • What do we mean by reliability? • Fairness, in terms of what we control – test specification, test questions, test coverage, test marking and awarding. Repeatability. • Not true/false (although the public perception is commonly that it is, and this has an effect on government and AO tactics). • How do we know? • By being careful about our assessment process in design and implementation • By undertaking analysis of processes and outcomes afterwards, much of it statistical, although other methods exist • Reliability is quite hard to prove conclusively, and results are almost never 100% repeatable
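One common statistical check of the kind referred to above is Cronbach's alpha, an internal-consistency coefficient computed from item-level scores. The sketch below is illustrative only: the talk does not name a particular statistic, and the function name and sample data are assumptions made for the example.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix (hypothetical helper).

    scores: one row per candidate, one column per item.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Variance of each item across candidates.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of candidates' total scores.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up data: 4 candidates, 3 items scored 0/1.
sample = [[1, 1, 1], [1, 0, 1], [0, 0, 0], [1, 1, 0]]
alpha = cronbach_alpha(sample)
```

Alpha is only one view of reliability (it says nothing about marker consistency, for example), which fits the point above that reliability is hard to prove conclusively.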
Robust assessments • What do we mean by robust assessments? Rigorous? • I’m taking it to mean “valid”, i.e. fit for purpose. (There are other aspects of assessment quality too.) • Key elements of validity • Does the assessment measure the curriculum properly? • Is the scoring accurate and reliable? (reliability) • Does the scoring match the performance standards? • Is it a good predictor? (predictive validity) • Do people believe in it? (face validity) • Validity is a slippery term, not least because assessments in the UK often have multiple purposes
Robust assessments (2): Fitness for Purpose. Consider Psychology A Level. Paul Newton (then QCDA, now Cambridge Assessment) http://www.publications.parliament.uk/pa/cm200708/cmselect/cmchilsch/169/16906.htm#n35
Who else cares? • Most stakeholders • Teachers, parents, students, the press (they just use different words for reliability and validity) • Ofqual and Government • Make it clear that both reliability and validity are essentially non-negotiable • Are very interested in requiring AOs to report reliability measurements for qualifications, and have published detailed research on reliability: www.ofqual.gov.uk/standards/reliability (Mike Cresswell and John Winkley) • We’ve set our stall out on validity (in contrast to other approaches)
Where are we with e-assessment? Landscape divided into four domains
Benefits of e-assessment • The main benefits of e-assessment differ between summative and formative applications: • For e-assessment in on-screen computer-marked testing • Speed of feedback • Increased flexibility and efficiency of assessment • More discriminating assessments • Richness and authenticity of the assessment experience • Environmental benefits compared to the costs of a paper-based exams system • (the benefits vary a little for different qualification types and purposes) • For “Wider e-assessment” • Richness and authenticity of the assessment experience • Technology-enabled assessment facilitates and improves the effectiveness and/or efficiency of communication between learners and tutors. • Candidates generally like e-assessment • Becta e-Assessment Landscape Study • http://www.alphaplusconsultancy.co.uk/pdf/Becta%20-%20E-assessment%20Landscape%20review%20-%20Report%20Final.pdf
Barriers to e-assessment • Capital Cost • Custom and practice, apparent lack of interest from stakeholders, other priorities (change) and risk aversion coupled with a lack of commercial pressure • ICT estate (particularly in schools) coupled with examination format • Concerns about validity and reliability, often unarticulated: • What if it tests IT skills rather than what it’s supposed to be testing? • What if it all breaks down mid-test or won’t start? • What about inter-form comparability? • What about face validity (eg the uncanny valley)? • What about transparency and openness? • What about paper to screen comparability? • What about accessibility? • What about screen size, tired eyes, broken mice, use of colour, etc?
E-Assessment, reliability and validity • Most of the legitimate concerns about validity and reliability have been dealt with, thoroughly, both in practice and research • Cost challenges remain in many settings • The UK (and USA) have held a strong lead in e-assessment deployment • However, the rest of the world is catching up • The UK boasts an outstanding technology supply sector • Viable, ‘reliable’ e-assessment for small and large AOs • Powerful assessment types and test creation technologies • Powerful marking technologies • Excellent research and evaluation • (Although the technology is still developing at the edges) • Today, more than presenting problems for validity and reliability, e-assessment provides some strong approaches to meeting those challenges
Improving reliability and validity • Example 1 – Simple multiple choice tests (R) • Many vocational and professional qualifications in the UK and internationally • E-assessment is well suited to this scenario and it is the easiest to implement • Relatively few ‘public’ concerns expressed about paper-based approaches despite known issues from research. • Multiple response items easier to handle on screen than on paper • Example 2 – Media rich, more complex question types (V) • TDA QTS Tests, Functional Skills, DSA Hazard Perception, Medical assessments • Potential for improved content and face validity (authenticity) • Powerful computer marking available
Improving reliability and validity • Example 3 - On-screen marking (R) GCSE in England • Heavily deployed – hundreds of millions of marks given each year on-screen. Most GCSE marking moved to on-screen. • Management information improvement • Efficiency savings by avoiding movement of paper • New possibilities for monitoring and training of markers • Deal with errors and anomalies more effectively • Example 4 - Item banking (R) Many vocational & professional exams (eg DSA) • Workflow around item creation – management information for QA • Test creation using test balancing rules to support on-demand testing • Monitor some aspects of reliability and fairness automatically • Monitoring exposure and drift, and other security issues • Using results to farm item banks • Target resources on weakest areas • Deal more quickly and effectively with anomalies • Adapt to changing requirements
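The item-banking points in Example 4 (test balancing rules, exposure monitoring, drift monitoring) can be sketched roughly as follows. This is a minimal illustration, not a description of any real item-banking product: the item fields (`id`, `topic`, `exposure`), the blueprint format, the exposure cap and the drift tolerance are all assumptions made for the example.

```python
import random


def assemble_test(bank, blueprint, max_exposure, seed=None):
    """Draw a test form that satisfies a per-topic blueprint (hypothetical).

    bank: list of dicts with 'id', 'topic' and 'exposure' (times used so far).
    blueprint: dict mapping topic -> number of items required.
    Items already used max_exposure times are excluded.
    """
    rng = random.Random(seed)
    chosen = []
    for topic, needed in blueprint.items():
        eligible = [it for it in bank
                    if it["topic"] == topic and it["exposure"] < max_exposure]
        if len(eligible) < needed:
            raise ValueError(f"not enough unexposed items for topic {topic!r}")
        chosen.extend(rng.sample(eligible, needed))
    # Record exposure only once the whole form has been assembled.
    for it in chosen:
        it["exposure"] += 1
    return [it["id"] for it in chosen]


def flag_drift(baseline_facility, recent_facility, tolerance=0.1):
    """Flag an item whose facility (proportion correct) has moved noticeably."""
    return abs(recent_facility - baseline_facility) > tolerance


# Made-up bank: three items on each of two topics.
bank = [{"id": f"{t}{n}", "topic": t, "exposure": 0}
        for t in ("A", "B") for n in range(3)]
form = assemble_test(bank, {"A": 2, "B": 1}, max_exposure=1, seed=1)
```

With an exposure cap of 1, a second form built to the same blueprint fails for topic A, which is the kind of signal that would point item-writing resources at the weakest areas of the bank.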
Improving reliability and validity • Example 5 – Innovation in medium-stakes testing (R+V) SQA Unit assessments project • Supports sharing models for content • Innovative approach to dealing with challenges in unit testing • Validity is “built-in”, and wider quality is improved too • And it is not necessarily expensive • Example 6 – Allowing students to use their own tools (V) • Crossover technology with ePortfolios • Maximises opportunity for students to show what they can do
Formative assessment: Helps leverage value in content. Many AOs are considering practical ways of leveraging their assessment resources (particularly the retired content) for other purposes, e.g. formative and diagnostic assessment. Adaptive assessments are very popular with students and teachers in a variety of settings. Banks of questions are more valuable to educators if they leverage quality performance metadata.
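As a rough illustration of how performance metadata can drive an adaptive assessment, the sketch below picks the unused item whose difficulty is closest to a running ability estimate and nudges that estimate after each response. It is a deliberately simple staircase rule, not the method of any system mentioned in the talk; the `difficulty` field, the common scale and the step size are assumptions.

```python
def next_item(bank, ability, used):
    """Return the unused item whose difficulty is closest to the estimate."""
    candidates = [it for it in bank if it["id"] not in used]
    if not candidates:
        return None
    return min(candidates, key=lambda it: abs(it["difficulty"] - ability))


def update_ability(ability, correct, step=0.5):
    """Move the estimate up after a correct answer, down after an incorrect one."""
    return ability + step if correct else ability - step


# Made-up bank with difficulties on an arbitrary scale.
bank = [{"id": "easy", "difficulty": -1.0},
        {"id": "medium", "difficulty": 0.0},
        {"id": "hard", "difficulty": 1.0}]

ability, used = 0.0, set()
first = next_item(bank, ability, used)          # closest to 0.0
used.add(first["id"])
ability = update_ability(ability, correct=True)
second = next_item(bank, ability, used)         # closest to 0.5 among the rest
```

The selection is only as good as the difficulty values behind it, which is why retired items with sound performance data are more useful for this purpose than questions with no metadata.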
Aspects of formative assessment: Where am I now? Where am I trying to get to? What do I need to do next?
Schools are ready for sophisticated assessment http://www.tki.org.nz/r/asttle
Summary • Although some challenges remain, most of the significant reliability and validity concerns in e-assessment have been addressed. • E-Assessment now offers several ways to improve assessment and qualification quality. • Consumer demand is latent but levels of acceptance and satisfaction are high. • Continued lack of innovation, particularly in school qualifications, risks the system being seen as out of touch.