Pursuing education imperatives of the NDP and SDGs What are the opportunities and pitfalls we face in building censal and sample-based monitoring of learning outcomes at primary level? Stellenbosch University July 2019
Which of the following statements most accurately describes the situation with respect to the year-on-year comparability of our Grade 12 mathematics examination results?
(a) Results are comparable because examiners ensure that examination papers are equally difficult.
(b) Results are comparable because certain questions are repeated across years and these allow for statistical adjustments which bring about comparability.
(c) Results in an examination system such as ours can never be truly comparable over time for several reasons, including shifts in examination paper difficulty.
(d) Results would be comparable if certain questions were repeated across years, which is currently not the case.
Topics to be covered • Why this preoccupation with fine-tuned monitoring of basic skills of learners? • How does one measure progress in learning outcomes? • What often goes wrong? • What type of monitoring the SDGs and our NDP want • A short history of primary-level assessments in South Africa • Future pathways, with their costs and benefits
> Why this preoccupation? • Understandable questions one hears… • Why just reading and mathematics? Surely one should be looking at the curriculum as a whole? What about non-academic social skills? • Surely what really counts for economic progress and employment is having the right mix of vocational training? • Are you trying to fatten the cow by measuring it? • What do we gain from continually comparing schools and countries against each other using these measures? • This question in particular is often poorly answered.
> Why this preoccupation? (contd.) • Growth accounting shifted, around 2005, from estimating the impact of years of schooling on economic growth to estimating the impact of test scores on economic growth (the charts show both, with South Africa labelled ZAF). Hanushek and Wößmann, 2007.
> Why this preoccupation? (contd.) Data for developed countries are good enough to look at test score trends over time. Compare e.g. the Netherlands against Germany: 1975-2000 growth in scores, with the mean set at zero. N.B. here a value of 1 represents 0.01 standard deviations a year. Hanushek and Woessmann, 2009.
> Why this preoccupation? (contd.) What does e.g. an improvement of 0.01 of a standard deviation a year really mean? Rachel got 49% on her test and Thabo got 67% on his, yet they are similar learners, because both achieved 0.7 of a standard deviation above the mean (average) of their respective tests.
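To make the example concrete (the test means and spreads below are invented purely for illustration): suppose Rachel’s test had a mean of 35% with a standard deviation of 20 percentage points, while Thabo’s test had a mean of 53% with the same spread. Expressing each score as standard deviations above the mean:

    z = (score − mean) / standard deviation
    Rachel: (49 − 35) / 20 = 0.7        Thabo: (67 − 53) / 20 = 0.7

Both learners sit 0.7 of a standard deviation above the mean of their respective tests, which is why raw percentages on different tests cannot be compared directly.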
> Why this preoccupation? (contd.) We can extend this to talk about annual improvements in terms of standard deviations. Vertical dotted lines represent one standard deviation.
> Why this preoccupation? (contd.) ‘Speed limits’ at different levels of development: from around 0.06 s.d./p.a. (South Africa, Namibia) to around 0.02 s.d./p.a. (Germany). (Standard deviations per annum.) Gustafsson, 2014.
> Why this preoccupation? (contd.) • The two inconvenient truths of schooling systems confirmed by the education economists… • Up till now, proven and sustained improvements have not exceeded around 0.06 s.d. per annum (and that’s only possible if your base is quite low). • Yet education plans routinely set goals which are way in excess of this ‘speed limit’. • It is true that targeted interventions can often push scores up by 0.15 s.d. in a year, but ‘general equilibrium effects’ make this unlikely to occur at a system level. • The inconvenient truth of slowness means any monitoring system must be able to measure small change. Missing such small change can easily lead to impatience, policy (or curriculum) change, and a vicious cycle of instability.
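To see what this ‘speed limit’ implies, a quick calculation: at 0.06 of a standard deviation per annum, an improvement of one full standard deviation takes about 1 ÷ 0.06 ≈ 17 years, and even half a standard deviation takes roughly 8 years. Tools that can only detect large shifts will therefore report ‘no change’ for years on end.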
> Why this preoccupation? (contd.) In South Africa, we set targets roughly in line with the speed limit, and have been fortunate in actually getting close to this. Department of Basic Education, 2015.
> Why this preoccupation? (contd.) • Around half of schooling happens not in school, but in the home.
> How does one measure progress?
Approach 1: Exactly the same test in Year 1 and Year 2, kept secure (secret), administered to a large enough, representative national sample. E.g. SACMEQ Grade 6 reading 2000-2007 [200-school sample, 55 MCQs].
Approach 2: Different tests each year, but with some secure common items; IRT analysis becomes essential. Items that are not repeated can be made public after the test. E.g. TIMSS mathematics [180-school sample, 26 items, around half MCQ, half constructed response items, 40% being repeats].
Ross et al., 2008; Martin et al., 2016.
> How does one measure progress? (contd.)
Approach 3: Universal/censal coverage: for all schools, completely different tests each year. For an equating sample (±40 schools in Australia’s National Assessment Program (NAP)), a mix of (1) secure and repeated items and (2) items from the current year’s universal tests.
Australian Curriculum, Assessment and Reporting Authority, 2017.
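To illustrate the common-item logic behind Approaches 2 and 3, here is a minimal sketch. Full IRT linking is considerably more involved; this uses the simpler mean-sigma linear method on the secure repeated items, and all the scores are made up, so it shows the idea rather than any operational procedure.

    import statistics

    # Made-up percentage-correct scores on the SECURE COMMON ITEMS only,
    # from the two years' samples of learners.
    common_year1 = [42, 55, 38, 61, 47, 52, 44, 58]
    common_year2 = [46, 59, 41, 66, 50, 57, 48, 63]

    # Mean-sigma linking: choose A and B so that transformed Year 2 scores
    # have the same mean and spread on the common items as Year 1 had.
    A = statistics.stdev(common_year1) / statistics.stdev(common_year2)
    B = statistics.mean(common_year1) - A * statistics.mean(common_year2)

    def to_year1_scale(year2_score):
        """Place a Year 2 score on the Year 1 scale."""
        return A * year2_score + B

    # A learner's Year 2 total of 60 expressed on the Year 1 scale.
    print(round(to_year1_scale(60), 1))

The key point is that the adjustment comes entirely from items learners never see published; without such secure anchors, no defensible year-on-year adjustment is possible.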
> What often goes wrong? • Doing standardised assessments well, even if sample-based, is demanding in terms of human capacity. From the PIRLS technical report: • Update assessment frameworks. • Determine the right mix of multiple choice and constructed response items. • Develop scoring guides for the constructed response items. • Translate items and materials into the required languages. • Field test new items. • Combine different sets of items in different but equivalent tests in line with a matrix sampling approach (a simple illustration follows below). • Determine optimal sample sizes and designs. • Manage experts with varying opinions during a process lasting three years. • Produce rigorous reports which take into account possible irregularities in the data.
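A minimal sketch of the matrix sampling idea, using an invented pool of 12 items: the pool is split into blocks, and each booklet carries only some blocks, so no learner answers everything yet every item is administered.

    from itertools import combinations

    # Invented pool of 12 items, split into 4 blocks of 3 items each.
    items = [f"item_{i:02d}" for i in range(1, 13)]
    blocks = [items[i:i + 3] for i in range(0, 12, 3)]

    # Each booklet carries 2 of the 4 blocks; every pair of blocks appears
    # together in exactly one booklet, giving 6 different but linked booklets.
    booklets = [block_a + block_b for block_a, block_b in combinations(blocks, 2)]

    for number, booklet in enumerate(booklets, start=1):
        print(f"Booklet {number}: {booklet}")

Because every block appears alongside every other block, responses from the different booklets can be placed on one scale even though no learner wrote the full pool.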
> What often goes wrong? (contd.) Important warning: Existing guides are thin on how sample-based and censal assessment systems differ. This has contributed to serious blunders in the education planning space. Gustafsson, 2019b.
> What often goes wrong? (contd.) An example of a sampling problem: Jerrim’s (2013) analysis of England’s PISA and TIMSS trends. Declines in PISA? Probably not: the ages of students, and the timing of the testing within the year, changed!
> What often goes wrong? (contd.) Floor effects: MCQs tend to hide floor effects due to random guessing. This too is a problem. Gustafsson, 2019b.
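A small simulation of this floor effect, with invented numbers (40 items, four options per item), showing that a learner who knows nothing still scores around 25% by guessing alone:

    import random

    random.seed(1)

    ITEMS = 40     # number of multiple-choice items on the test
    OPTIONS = 4    # options per item, so a blind guess succeeds 1 time in 4

    def score_by_guessing():
        """Percentage score of a learner who guesses on every item."""
        correct = sum(1 for _ in range(ITEMS) if random.random() < 1 / OPTIONS)
        return 100 * correct / ITEMS

    scores = [score_by_guessing() for _ in range(10_000)]
    print(f"Mean score from pure guessing: {sum(scores) / len(scores):.1f}%")

Because the weakest learners all bunch up around this guessing floor rather than at zero, real differences and real improvements at the bottom of the distribution are hidden, which is the comparability problem Gustafsson (2019b) examines.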
> What often goes wrong? (contd.) Security and rigour
> What often goes wrong? (contd.) • Inappropriate accountability strategies and/or a difficult political environment. From the 2011 resolutions of Education International (the world’s largest federation of teachers’ unions): EI believes that a widespread abuse of the notion of quality to justify standardised forms of testing is harmful to the education system as a whole, as it attempts to reduce the teaching and learning process to quantifiable indicators. When one form of evaluation designed for a particular purpose is used to serve a different purpose, the consequences can be unforeseen and damaging. The social values of education require public authorities to protect the education sector from the neo-liberal agenda of privatization and commercialisation.
> What the SDGs and our NDP want The SDGs: A strong focus on equity, meaning in education the problem of unequal learning outcomes. SDG target 4.1: By 2030, ensure that all girls and boys complete free, equitable and quality primary and secondary education leading to relevant and effective learning outcomes. SDG indicator 4.1.1: Proportion of children and young people: (a) in grades 2/3; (b) at the end of primary; and (c) at the end of lower secondary achieving at least a minimum proficiency level in (i) reading and (ii) mathematics, by sex.
> What the SDGs and our NDP want The NDP: Fully embraces the global shift toward learning outcomes. This includes monitoring learning outcomes through rigorous, standardised systems, but also making schools accountable for improvements in what learners can achieve. In a section headed ‘Proposals for results oriented mutual accountability’ (p. 311) [ROMA?], the following appears: Externally administer and mark the ANA for at least one primary school grade to ensure that there is a reliable, system-wide measure of quality for all primary schools. This will serve as a snapshot of the health of the system and help authorities to develop targeted interventions.
> Primary-level assessments in SA In the current context of the absence of ANA, South Africa stands out as the only country in SADC (apart from Angola) with no universal standardised assessment at the primary level. Gustafsson, 2018.
> Primary-level assessments in SA No history of an external examination at the primary level in South Africa (or, until independence, Namibia), unlike other neighbouring countries. In part, this was due to junior secondary access having been high and automatic. Regulation 1718 of 1998 introduced sample-based standardised assessments, the ‘Systemic Evaluation’. This was run in 2001 (Grade 3), 2004 (Grade 6) and 2007 (Grade 3). Unfortunately, capacity was weak, as was reporting, especially in 2007. An opportunity to report on 2001 to 2007 improvements in Grade 3 was missed! The 1998 regulation also outlined parameters for internal assessments (currently referred to mostly as school-based assessment), which is a vital element of teaching, but is not designed to produce standardised measures.
> Primary-level assessments in SA Under Naledi Pandor, a push for external assessments for all schools began. The Foundations for Learning programme (Notice 306 of 2008) ‘primed’ the system, and then between 2011 and 2014 the massive Annual National Assessments programme was conducted across several grades. ANA resulted in annual Matric-style reports and Ministerial announcements. A large national database of learner results was built up. How standardised was ANA? There was no equating sample, no adjustments of universal results. A ‘verification sample’ was aimed at monitoring comparability within one year, not over time. Though official reports explained the non-comparability over time, ANA was treated by many politicians, officials and schools as if progress could be monitored.
> Primary-level assessments in SA Unions, education academics, and even many officials did not like ANA. In places, principals and even teachers were being held accountable for progress, even though comparability over time was not possible. For the bureaucracy, it involved high levels of effort. In 2015, unions succeeded in stopping ANA. Despite all its problems, some have argued that ANA helped put learning outcomes ‘on the map’, and that this contributed to the improvements seen in the international programmes. A 2016 ‘post mortem’ of ANA, publicly available, involving government and unions, provides a useful evaluation [Department of Basic Education, 2016].
> Primary-level assessments in SA Since 2015, provinces and districts have attempted to fill the ANA vacuum by issuing standard tests to schools, but with little standardisation of marking. This has led to a sense of ‘too much assessment’. The DBE has focussed largely on reinstating the sample-based Systemic Evaluation, with a new and robust design. The future of universal (or censal) standardised testing at the primary level is currently not clear.
> Primary-level assessments in SA Is there a PIRLS uncertainty? I believe there is. The 2011 score was recalibrated, but there is no technical documentation assuring us this was done correctly. Gustafsson, 2019a.
> Recapping • The current focus on improving basic skills is largely founded on the economics literature and recent innovations in growth accounting. • Even countries with relatively strong improvements improve slowly (the ‘inconvenient truth’). We need really fine-tuned tools to gauge improvements, or we may miss them. • Secure common (repeated) items are necessary for monitoring progress. If one wants to have censal (universal) testing, substantial technical challenges emerge. But solutions such as equating samples exist. • South Africa has a chequered history of both sample-based and censal assessments at the primary level. Current attempts to re-introduce a sample-based programme are feasible, and important in terms of SDG reporting. Censal testing is necessary in terms of the NDP, but comes with considerable political and technical challenges.
References Australian Curriculum, Assessment and Reporting Authority (2017). National Assessment Program – Literacy and Numeracy 2017: Technical report. Sydney. Department of Basic Education (2015). Action Plan to 2019: Towards the realisation of Schooling 2030. Pretoria. Department of Basic Education (2016). The development of a National Integrated Assessment Framework. Pretoria. Gustafsson, M. (2014). Education and country growth models. Stellenbosch: University of Stellenbosch. Available from: <http://scholar.sun.ac.za/handle/10019.1/86578> [Accessed April 2014]. Gustafsson, M. (2018). Standardised testing and examinations at the primary level: Current thinking and practices. Gustafsson, M. (2019a). TIMSS, SACMEQ and PIRLS: Data issues. Gustafsson, M. (2019b). Floor effects and the comparability of developing country student test scores. Paris: UNESCO. [Forthcoming.] Gustafsson, M. & Nuga Deliwe, C. (2017). Rotten apples or just apples and pears? Understanding patterns consistent with cheating in international test data. Stellenbosch: Stellenbosch University.
References (contd.) Hanushek, E.A. & Wößmann, L. (2007). Education quality and economic growth. Washington: World Bank. Hanushek, E.A. & Woessmann, L. (2009). Do better schools lead to more growth? Cognitive skills, economic outcomes, and causation. Washington: National Bureau of Economic Research. Jerrim, J. (2013). The reliability of trends over time in international education test scores: Is the performance of England's secondary school pupils really in relative decline? Journal of Social Policy, 42(2): 259-279. Martin, M.O., Mullis, I.V.S. & Hooper, M., eds. (2016). Methods and procedures in TIMSS 2015. Chestnut Hill: IEA. Ross, K.N., Saito, M., Dolata, S. & Ikeda, M. (2008). Chapter 2: The conduct of the SACMEQ II project. Paris: IIEP.