Managing the development of robust and reliable assessments for qualifications and learning

Managing the development of robust and reliable assessments for qualifications and learning 21 March 2012 John Winkley

Introduction • What do we mean by robust and reliable? And what do our stakeholders mean? How do we know if qualifications are robust and reliable? • Is e-assessment a help or a hindrance to achieving this? • Does this differ for different settings: • Vocational vs academic vs professional qualifications • Diagnostic and formative vs summative assessments • Experience drawn from UK – school (NC tests and GQ), vocational, professional and HE examinations • Also from international projects

Reliable assessments • What do we mean by reliability? • Fairness, in terms of what we control – test specification, test questions, test coverage, test marking and awarding. Repeatability. • Not true/false (although the public perception is commonly that it is, and this has an effect on government and AO tactics). • How do we know? • By being careful about our assessment process in design and implementation • By undertaking analysis of processes and outcomes afterwards, much of it statistical, although other methods exist • Reliability is quite hard to prove conclusively, and assessments are almost never 100% repeatable
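The slide does not name a particular statistic, so the following is only a minimal sketch of one commonly used internal-consistency measure, Cronbach's alpha, computed from a table of item marks. The function name and the layout of the data (one row of item marks per candidate) are assumptions made for illustration, not something taken from the presentation.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Estimate internal consistency from item-level marks.

    scores: list of candidates, each a list of marks, one per item
    (hypothetical layout; every row must have the same length).
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item's marks across candidates.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of candidates' total marks.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("all candidates have the same total; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Four candidates, three items that always agree with each other.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(cronbach_alpha(consistent))
```

A value close to 1 suggests the items rank candidates consistently; it says nothing about marking reliability or validity, which need separate evidence.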

Robust assessments • What do we mean by robust assessments? Rigorous? • I’m taking it to mean “valid”, i.e. fit for purpose (there are other aspects of assessment quality too) • Key elements of validity • Does the assessment measure the curriculum properly? • Is the scoring accurate and reliable? (reliability) • Does the scoring match the performance standards? • Is it a good predictor? (predictive validity) • Do people believe in it? (face validity) • Validity is a slippery term, not least because assessments in the UK often have multiple purposes

Robust assessments (2) • Fitness for purpose • Consider Psychology A Level • Paul Newton (then QCDA, now Cambridge Assessment) http://www.publications.parliament.uk/pa/cm200708/cmselect/cmchilsch/169/16906.htm#n35

Who else cares? • Most stakeholders • Teachers, parents, students, the press (they just use different words for reliability and validity) • Ofqual and Government • Make it clear that both reliability and validity are essentially non-negotiable • Are very interested in requiring AOs to report reliability measurements for qualifications, and have published detailed research on reliability: www.ofqual.gov.uk/standards/reliability (Mike Cresswell and John Winkley) • We’ve set our stall out on validity (in contrast to other approaches)

Where are we with e-assessment? Landscape divided into four domains

Benefits of e-assessment • The main benefits of e-assessment differ between summative and formative applications: • For e-assessment in on-screen computer-marked testing • Speed of feedback • Increased flexibility and efficiency of assessment • More discriminating assessments • Richness and authenticity of the assessment experience • Environmental benefits compared with the costs of a paper-based exams system • (the benefits vary a little for different qualification types and purposes) • For “Wider e-assessment” • Richness and authenticity of the assessment experience • Technology-enabled assessment facilitates and improves the effectiveness and/or efficiency of communication between learners and tutors. • Candidates generally like e-assessment • Becta e-Assessment Landscape Study • http://www.alphaplusconsultancy.co.uk/pdf/Becta%20-%20E-assessment%20Landscape%20review%20-%20Report%20Final.pdf

Barriers to e-assessment • Capital Cost • Custom and practice, apparent lack of interest from stakeholders, other priorities (change) and risk aversion coupled with a lack of commercial pressure • ICT estate (particularly in schools) coupled with examination format • Concerns about validity and reliability, often unarticulated: • What if it tests IT skills rather than what it’s supposed to be testing? • What if it all breaks down mid-test or won’t start? • What about inter-form comparability? • What about face validity (eg the uncanny valley)? • What about transparency and openness? • What about paper to screen comparability? • What about accessibility? • What about screen size, tired eyes, broken mice, use of colour, etc?

E-Assessment, reliability and validity • Most of the legitimate concerns about validity and reliability have been dealt with, thoroughly, both in practice and research • Cost challenges remain in many settings • The UK (and USA) have held a strong lead in e-assessment deployment • However, the rest of the world is catching up • The UK boasts an outstanding technology supply sector • Viable, ‘reliable’ e-assessment for small and large AOs • Powerful assessment types and test creation technologies • Powerful marking technologies • Excellent research and evaluation • (Although the technology is still developing at the edges) • Today, more than presenting problems for validity and reliability, e-assessment provides some strong approaches to meeting those challenges

Improving reliability and validity • Example 1 – Simple multiple choice tests (R) • Many vocational and professional qualifications in the UK and internationally • E-assessment is well suited to this scenario and it is the easiest to implement • Relatively few ‘public’ concerns expressed about paper-based approaches despite known issues from research. • Multiple response items are easier to handle on screen than on paper • Example 2 – Media-rich, more complex question types (V) • TDA QTS Tests, Functional Skills, DSA Hazard Perception, Medical assessments • Potential for improved content and face validity (authenticity) • Powerful computer marking available

Improving reliability and validity • Example 3 - On-screen marking (R) GCSE in England • Heavily deployed – hundreds of millions of marks given each year on-screen. Most GCSE marking moved to on-screen. • Management information improvement • Efficiency savings by avoiding movement of paper • New possibilities for monitoring and training of markers • Deal with errors and anomalies more effectively • Example 4 - Item banking (R) Many vocational & professional exams (eg DSA) • Workflow around item creation – management information for QA • Test creation using test balancing rules to support on-demand testing • Monitor some aspects of reliability and fairness automatically • Monitoring exposure and drift, and other security issues • Using results to farm item banks • Target resources on weakest areas • Deal more quickly and effectively with anomalies • Adapt to changing requirements
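As a rough illustration of the item-banking points above (test balancing rules and exposure monitoring), the sketch below assembles an on-demand test form from a small bank. Everything here is hypothetical: the field names `id`, `topic` and `exposure`, the blueprint format and the function name are assumptions for illustration, and a real item-banking system would typically balance further properties such as difficulty.

```python
def assemble_form(bank, blueprint):
    """Pick items per topic quota, preferring the least-exposed items.

    bank: list of dicts with hypothetical keys 'id', 'topic', 'exposure'.
    blueprint: dict mapping topic -> number of items required.
    """
    form = []
    for topic, needed in blueprint.items():
        # Least-used items first; the id breaks ties so results are repeatable.
        candidates = sorted(
            (item for item in bank if item["topic"] == topic),
            key=lambda item: (item["exposure"], item["id"]),
        )
        if len(candidates) < needed:
            raise ValueError(f"bank has too few items for topic {topic!r}")
        chosen = candidates[:needed]
        for item in chosen:
            item["exposure"] += 1  # record the use so later forms rotate items
        form.extend(item["id"] for item in chosen)
    return form


bank = [
    {"id": "A1", "topic": "A", "exposure": 0},
    {"id": "A2", "topic": "A", "exposure": 0},
    {"id": "B1", "topic": "B", "exposure": 0},
    {"id": "B2", "topic": "B", "exposure": 0},
]
print(assemble_form(bank, {"A": 1, "B": 1}))
print(assemble_form(bank, {"A": 1, "B": 1}))
```

Because exposure counts are updated on each call, the second form draws on items the first did not use, which is one simple way to limit over-exposure of individual items.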

Improving reliability and validity • Example 5 – Innovation in medium-stakes testing (R+V) SQA Unit assessments project • Supports sharing models for content • Innovative approach to dealing with challenges in unit testing • Validity is “built-in”, and wider quality is improved too • And it is not necessarily expensive • Example 6 – Allowing students to use their own tools (V) • Crossover technology with ePortfolios • Maximises opportunity for students to show what they can do

Formative assessment • Helps leverage value in content • Many AOs are considering practical ways of leveraging their assessment resources (particularly the retired content) for other purposes, e.g. formative and diagnostic assessment. • Adaptive assessments are very popular with students and teachers in a variety of settings. • Banks of questions are more valuable to educators if they leverage quality performance metadata.

Aspects of formative assessment • Where am I now? • Where am I trying to get to? • What do I need to do next?

Schools are ready for sophisticated assessment http://www.tki.org.nz/r/asttle

Summary • Although some challenges remain, most of the significant reliability and validity concerns in e-assessment have been addressed. • E-Assessment now offers several ways to improve assessment and qualification quality. • Consumer demand is latent but levels of acceptance and satisfaction are high. • Continued lack of innovation, particularly in school qualifications, risks the system being seen as out of touch.

