Measuring School Effectiveness: Introduction and Background

Lorraine Dearden, Institute of Education, University of London

Introduction
• Set the scene for the rest of the day
• Introduce the school testing regime in England
• Give a potted history of how school effectiveness has been measured in England over time
• Briefly point out the strengths of, and difficulties with, the approaches that have been taken
• Give some illustrations using work I am currently doing with Alfonso Miranda and Sophia Rabe-Hesketh
• Point out the similarities and differences with experiences/approaches in some other countries

Testing/Data in England
• Testing takes place at age 5 (Foundation Stage Profile, FSP), age 7 (Key Stage 1), age 11* (Key Stage 2), age 14 (Key Stage 3, now abolished), age 16* (GCSEs, Key Stage 4) and age 18* (A levels, Key Stage 5)
• Individual background information on students is available every year from 2001/02 (basic background characteristics from the Pupil Level Annual School Census, PLASC)
• The data have been linked to vocational courses taken in, e.g., FE colleges (ILR/NISVQ), as well as to HE participation/outcomes (HESA)
* Externally marked

History of School Accountability in England
• 1988: National Curriculum introduced, which all maintained schools (but not private schools) are obliged to follow
• A national system of tests/assessments introduced for KS1, KS2 and KS3 (ages 5 to 14)
• First school league tables published in 1992 for GCSEs and A levels (secondary schools)
• Aim: to inform parents' choice of school and to give schools an incentive to raise standards

Development of School Accountability
• The first secondary league tables, in 1992, showed ‘raw’ school GCSE results and one indicator of A level results
• Similar tables were introduced for primary schools in 1996, based on KS2 test results in English, Maths and Science
• Problems with using these ‘raw’ measures were pointed out immediately (e.g. Goldstein & Spiegelhalter (1996)):
  • They simply reflected differences in school intake, students' backgrounds, etc.
  • A simple change in the composition of the intake could improve/worsen results from one year to the next
• ......

Value Added Models
• To measure school effectiveness, we need a measure of the value actually added by the school (e.g. Ladd and Walsh (2000))
• Raw test results do not come close to doing this for most schools
• VA models could not be estimated until individual pupils' results could be linked across key stages
  • Some pilot studies were done in the 1990s, but nothing national
• This only became possible for secondary schools in 2001/02, when baseline KS2 performance data were available for the children who sat KS4 exams that year

Other details
• The KS4 measure is a capped score: the sum of a pupil's 8 best GCSE scores. The maximum mark per subject is 58 (an A*), so the maximum score is 8 × 58 = 464
• The KS2 score is based on the levels achieved in English, Maths and Science (not the actual test score)
  • Level 4 (27 points) is the expected level; Level 5 (33 points) is 2 years ahead of where a pupil is expected to be; Level 3 (21 points) is 2 years behind; Level 2 (15 points) is 4 years behind where a pupil should be at age 11
• For CVA, raw test scores are now used to give each pupil an average point score (APS) between 15 and 36 in each subject
• VA and CVA average the 3 KS2 APSs (if one is missing, the other 2 are averaged)
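The points arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, assuming the level-to-points scale on this slide (which works out as 6 points per level plus 3) and representing a missing subject as `None`; the function names are hypothetical and not part of any DfE tool.

```python
def ks2_level_points(level: int) -> int:
    """Convert a KS2 level to points: Level 2 = 15, 3 = 21, 4 = 27, 5 = 33."""
    return 6 * level + 3


def ks2_aps(english=None, maths=None, science=None):
    """Average whichever KS2 subject point scores are available (None = missing)."""
    scores = [s for s in (english, maths, science) if s is not None]
    if not scores:
        raise ValueError("no KS2 subject scores available")
    return sum(scores) / len(scores)


def ks4_capped_score(gcse_points, cap=8):
    """Capped KS4 score: the sum of a pupil's best `cap` GCSE point scores."""
    return sum(sorted(gcse_points, reverse=True)[:cap])


# Level 4 in English, Level 5 in Maths, Science missing: average of 27 and 33
print(ks2_aps(ks2_level_points(4), ks2_level_points(5)))  # 30.0
# Ten A* grades still only count eight times: 8 * 58
print(ks4_capped_score([58] * 10))  # 464
```

The cap is why the maximum is 464 however many GCSEs a pupil takes.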

Value Added
• The VA model was introduced in 2002/03 and used the average of the crude KS2 APSs in English, Maths and Science
• This KS2 score was split into 10 groups, and the median outcome was found for each group
• Joining the median points gives the 'national median line'
• Value added was simply the school average of individual pupils' deviations from this line (added to 100)
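A minimal sketch of the VA calculation described above, using NumPy. It assumes the 10 groups are equal-sized groups of pupils ranked by prior attainment, and that the 'national median line' joins each group's (median KS2 score, median KS4 outcome) point with straight segments. The slide does not say how the groups were formed or how the line was extended beyond the lowest and highest groups, so both choices are assumptions, as are the function names.

```python
import numpy as np


def national_median_line(prior, outcome, n_groups=10):
    """Knots (x, y) of a piecewise-linear 'national median line'.

    Assumption: pupils are split into n_groups equal-sized groups by prior
    attainment; each knot is (median prior score, median outcome) of a group.
    """
    order = np.argsort(prior, kind="stable")
    groups = np.array_split(order, n_groups)
    x = np.array([np.median(prior[g]) for g in groups])
    y = np.array([np.median(outcome[g]) for g in groups])
    # np.interp needs increasing x, so keep only the first of any tied knots
    x, first = np.unique(x, return_index=True)
    return x, y[first]


def school_va(prior, outcome, school, n_groups=10):
    """VA per school: mean pupil deviation from the median line, plus 100."""
    x, y = national_median_line(prior, outcome, n_groups)
    expected = np.interp(prior, x, y)  # joins the medians; flat beyond the end knots
    deviation = outcome - expected
    return {s: 100 + deviation[school == s].mean() for s in np.unique(school)}


# Toy data: ten prior-attainment values, five pupils from each school at each value
prior = np.repeat(np.arange(10.0), 10)
school = np.tile(np.array(["A"] * 5 + ["B"] * 5), 10)
outcome = 2 * prior + np.where(school == "A", 1.0, -1.0)
for s, v in school_va(prior, outcome, school).items():
    print(s, v)  # school A: 101.0, school B: 99.0
```

Using medians rather than means makes the line less sensitive to a few extreme outcomes, which is presumably why the median was chosen, but this sketch does not settle that.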

CVA
• Criticised as still problematic (see Ray (2006) for full background): other factors need to be taken into account to truly measure the value added by schools
• Ceiling effects (e.g. KS1 to KS2 VA)
• Stability issues
• Moved to a CVA model controlling for prior attainment (fine APS), free school meal (FSM) status, ethnicity, gender, special educational needs (SEN), deprivation (local area), relative age, English as an additional language (EAL) and mobility
• Regression model with school clustering (multilevel/hierarchical model)
• CVA is the school average of the residuals from this model (+1000)
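The final step, averaging residuals by school, can be sketched as follows. To keep the example short it fits a pooled ordinary least squares regression with NumPy instead of the multilevel model actually used, so it ignores the school-level random effect and any shrinkage; the function name and toy data are hypothetical.

```python
import numpy as np


def cva_scores(X, y, school):
    """CVA-style scores: school mean of regression residuals, plus 1000.

    X: (n_pupils, k) covariates (prior attainment, FSM, EAL, ...).
    y: KS4 outcome per pupil. school: school identifier per pupil.
    Simplification: pooled OLS, not the multilevel model used in practice.
    """
    X1 = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return {s: 1000 + resid[school == s].mean() for s in np.unique(school)}


# Toy data: same prior attainment mix in both schools; school A adds 2 points, B loses 2
prior = np.tile(np.arange(4.0), 2).reshape(-1, 1)
school = np.repeat(np.array(["A", "B"]), 4)
ks4 = 10 + 3 * prior[:, 0] + np.where(school == "A", 2.0, -2.0)
cva = cva_scores(prior, ks4, school)  # A is about 1002, B about 998
```

Because the toy schools have identical intakes, the regression recovers the true school effects exactly; with real data, intake differences and the choice of covariates both feed into the residuals.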

Ready Reckoner
• The model is re-estimated every year on the latest data, so the way CVA is calculated changes every year
• Schools are given a ready reckoner so they can see the CVA for every child in the school
• So if schools have a choice over how a student's background is classified, it is pretty obvious to them which classification is more favourable for CVA
• Schools do not have to add as much value (other things being equal) if a pupil is non-EAL, White British or of unclassifiable ethnicity
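The incentive problem can be made concrete with a toy ready reckoner. The coefficients below are invented purely for illustration and are not the published CVA coefficients; only their direction (a higher predicted outcome for EAL pupils, so less value to add for non-EAL pupils) follows the slide.

```python
# HYPOTHETICAL coefficients, made up for illustration only
HYPOTHETICAL_COEFS = {
    "intercept": 100.0,
    "ks2_aps": 8.0,  # points of predicted KS4 per KS2 APS point
    "eal": 10.0,     # EAL pupils predicted to score higher, as the slide implies
}


def predicted_ks4(pupil, coefs=HYPOTHETICAL_COEFS):
    """Linear prediction: intercept plus coefficient * value for each characteristic."""
    return coefs["intercept"] + sum(coefs[k] * v for k, v in pupil.items())


def pupil_cva(actual, pupil, coefs=HYPOTHETICAL_COEFS):
    """A pupil's contribution to CVA: actual outcome minus predicted outcome."""
    return actual - predicted_ks4(pupil, coefs)


# The same pupil, same KS4 result, classified two ways
print(pupil_cva(300, {"ks2_aps": 27, "eal": 0}))  # -16.0
print(pupil_cva(300, {"ks2_aps": 27, "eal": 1}))  # -26.0
```

Under these made-up numbers, recording the pupil as non-EAL raises the school's measured value added by 10 points without any change in what the school actually did.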

Do schools understand CVA?

1) Has the school CVA measure solved the problem of informing parents about school choice and ensuring school accountability? No, and this will be the subject of some talks today
2) Could we do better? Yes, and this will be the subject of the other talks today
3) Should we even attempt to measure school effectiveness? .............

Problems with CVA in England
• Not very stable (Leckie and Goldstein)
• Differences between schools are rarely statistically significant (Leckie and Goldstein)
• Not very transparent and difficult to understand (Gorard; Dearden, Micklewright and Vignoles)
• Evidence of differential effectiveness within schools (Brown and Tzavidis; Dearden, Micklewright and Vignoles)
• Evidence that parents don’t use it when making decisions: they prefer ‘raw’ scores (Machin and Hansen)
• Newspapers don’t generally highlight it and still go back to ‘easy to understand’ raw scores
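One reason differences are rarely significant is that each school's CVA is an average over a limited number of pupils whose residuals are widely spread. A rough sketch, assuming independent pupils and a normal approximation; it ignores the multilevel structure used by Leckie and Goldstein, and the residuals are toy numbers, not real data.

```python
import math
from statistics import mean, stdev


def school_ci(residuals, z=1.96):
    """Approximate 95% confidence interval for a school's mean residual."""
    m = mean(residuals)
    se = stdev(residuals) / math.sqrt(len(residuals))
    return m - z * se, m + z * se


def differ_significantly(res_a, res_b, z=1.96):
    """Two-sample z-test on the difference between two schools' mean residuals."""
    se = math.sqrt(stdev(res_a) ** 2 / len(res_a) + stdev(res_b) ** 2 / len(res_b))
    return abs(mean(res_a) - mean(res_b)) / se > z


# Toy residuals: 100 pupils per school, mean residuals 5 and -5, spread of +/-40
a = [5 - 40, 5 + 40] * 50
b = [-5 - 40, -5 + 40] * 50
print(differ_significantly(a, b))  # False: the z statistic is about 1.76
```

A 10-point gap between the two schools is not significant here because of the within-school spread; shrink the spread to ±10 and the same gap becomes clearly significant.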

Other problems
• Drawing on work I am doing with Alfonso Miranda and Sophia Rabe-Hesketh (using linked “Next Steps” survey data and National Pupil Database (NPD) data for children born in 1989/90)
• There is substantial left censoring (and some right censoring) in the KS2 scores used, which disadvantages schools with pupils at the bottom of the KS2 ability distribution
• Also, the way the fine-point KS2 APS is constructed (transforming raw test scores into the APS system) is slightly strange

Raw KS2 Maths Scores

Maths Score Used in the CVA Model
• Left censoring makes it much harder to add value: why do it?

How serious is it that key covariates are not measured?
• Key variables, such as parental education, are not measured in the administrative data
• But we know that more highly educated parents are likely, on average, to provide more educational input at home
• We also know that parental education varies markedly by school; it is not random
• This has implications for CVA: schools whose intakes have more highly educated parents can appear to add more value than they actually do
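The mechanism can be seen in a small deterministic simulation. It assumes, purely for illustration, that KS4 outcomes depend only on prior attainment and a binary indicator for mother's education, that the two hypothetical schools have no true effect at all, and that mother's education is unrelated to prior attainment. The only difference between the schools is the share of pupils with highly educated mothers (80% versus 20%).

```python
import numpy as np


def school_mean_residuals(X, y, school):
    """Fit OLS of y on X (plus intercept) and average the residuals by school."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return {s: resid[school == s].mean() for s in np.unique(school)}


# 25 pupils per school; prior attainment 0..4, five pupils at each value
prior = np.tile(np.repeat(np.arange(5.0), 5), 2)
school = np.repeat(np.array(["A", "B"]), 25)
educ_a = np.tile([1, 1, 1, 1, 0], 5)  # school A: 4 in 5 mothers highly educated
educ_b = np.tile([1, 0, 0, 0, 0], 5)  # school B: 1 in 5
mother_educ = np.concatenate([educ_a, educ_b]).astype(float)
ks4 = 2 * prior + 20 * mother_educ    # no school effect at all

without_educ = school_mean_residuals(prior, ks4, school)
with_educ = school_mean_residuals(np.column_stack([prior, mother_educ]), ks4, school)
```

When mother's education is left out, school A's mean residual is +6 and school B's is -6, although neither school adds any value; once it is included, both are zero.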

Is this a serious problem in England?
• Our work suggests it is (Dearden, Miranda and Rabe-Hesketh (2010))
• We use “Next Steps” survey data linked to the NPD for the cohort of children who took KS2 in 2001 and KS4 in 2006
• We observe parental education for just over 12,000 children in our sample (the total cohort in the NPD is just over 550,000)

Regress individual CVA on mother's education
Note: standard errors clustered at school level (698 schools out of a total of just over 3,000); “Next Steps” survey weights used
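A regression of this kind (weighted least squares with standard errors clustered by school) can be sketched in NumPy. This is an illustration under assumptions, not the authors' code: it uses a plain sandwich estimator with a G/(G - 1) finite-sample factor, where G is the number of schools, and it does not reproduce the full survey design (strata, finite-population corrections) that dedicated survey software would handle.

```python
import numpy as np


def wls_cluster(X, y, w, cluster):
    """Weighted least squares with cluster-robust (sandwich) standard errors.

    X: (n, k) regressors, including a column of ones for the intercept.
    y: outcome (e.g. a pupil's CVA). w: survey weights. cluster: school id.
    """
    XtW = X.T * w                      # X'W with W = diag(w), via broadcasting
    bread = np.linalg.inv(XtW @ X)
    beta = bread @ (XtW @ y)
    resid = y - X @ beta
    clusters = np.unique(cluster)
    # One score vector per cluster: sum over its pupils of w_i * x_i * e_i
    scores = np.array([
        X[cluster == g].T @ (w[cluster == g] * resid[cluster == g])
        for g in clusters
    ])
    G = len(clusters)
    u = scores @ bread                 # row g is (bread @ s_g)', since bread is symmetric
    V = (G / (G - 1)) * (u.T @ u)      # sandwich: bread @ (sum_g s_g s_g') @ bread
    return beta, np.sqrt(np.diag(V))
```

Here `X` would hold an intercept and indicators for mother's education, `y` each pupil's CVA, `w` the survey weights and `cluster` the school identifier; how mother's education was coded in the actual analysis is not stated on the slide.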

What can we do about this?
• Use the model of Miranda and Rabe-Hesketh (2010) to re-estimate the CVA model for the whole NPD sample, accounting for missing mother's education
• The paper will be presented on Wednesday at the Festival
• Exploits the fact that for some individuals we have both survey and NPD data
• Explicitly models missingness in the survey and administrative data

Results

School league tables in other countries?
• Increasingly used in a number of US states, but not nationally
• Australia has introduced national testing (NAPLAN), with report cards to parents in Years 3, 5, 7 and 9
• A website lets parents compare a school's performance with that of statistically ‘similar’ schools
• School accountability is becoming an issue worldwide
• PISA world league tables...

Conclusions
• School league tables are here to stay in England
• But it is very difficult to measure the value added by schools
• Given that league tables are not likely to go away, we need to do this better
• I look forward to the rest of today's talks to take us forward on this issue

References
• Goldstein, H. and Spiegelhalter, D. J. (1996) ‘League tables and their limitations: statistical issues in comparisons of institutional performance’, Journal of the Royal Statistical Society: Series A, 159, 385–443.
• Goldstein, H., Rasbash, J., Yang, M., Woodhouse, G., Pan, H., Nuttall, D. and Thomas, S. (1993) ‘A multilevel analysis of school examination results’, Oxford Review of Education, 19, 425–433.
• Gorard, S. (2010) ‘All evidence is equal: the flaw in statistical reasoning’, Oxford Review of Education (forthcoming).
• Ladd and Walsh (2000) ‘Implementing value-added measures of school effectiveness: getting the incentives right’, Economics of Education Review, 2(1), 1–17.
• Leckie, G. and Goldstein, H. (2009) ‘The limitations of using school league tables to inform school choice’, Journal of the Royal Statistical Society: Series A, 172(4), 835–852.
• Ray, A. (2006) School Value Added Measures in England. Paper for the OECD Project on the Development of Value-Added Models in Education Systems. London: Department for Education and Skills. http://www.dcsf.gov.uk/research/data/uploadfiles/RW85.pdf
