Economic Perspectives on Standardized Testing (c) 2002, by Richard P. Phelps

Economic Perspectives on Standardized Testing: Outline • Why can’t economists and psychologists just get along? • Overview of economic theory as it pertains to education & testing • Human capital theory and the economics of information • Supply & demand; benefits & costs; goods & bads • The cost of standardized testing (from society’s point of view) • The benefits of standardized testing (information) • The benefits of standardized testing (motivation) • Optimal testing system structures • Optimal testing industry structures • Discussion

Topic 1: Why can’t economists and psychologists just get along?

1) Why can’t economists and psychologists just get along? [answer: sometimes they do] • Tversky and Kahneman, two cognitive psychologists, asked themselves why rational economic man patronizes casinos, where the odds are against him. • Their experiments revealed that tolerance of (or attraction to) risk varies widely among individuals, and that most people weigh small risks against low-probability, but very large, gains “sub-optimally.” • Tversky’s and Kahneman’s work is now required reading for any economics major. • Experimental economics, which strongly resembles cognitive psychology in its methods, is now the fastest-growing area of research in the field.

1) Why can’t economists and psychologists just get along? [answer: sometimes they do not]
Test Utility research • Thousands of studies conducted by I/O psychologists from the 1960s through the 1980s • Dozens of meta-analyses • Even a few meta-analyses of the meta-analyses • Few economists, then or now, even aware of the field
Decline in interest in Test Utility research • Regulatory ruling against validity generalization in late 1980s by Civil Rights office in Reagan administration • National Research Council forms committee with curious membership to critique a single Test Utility study (critique interpreted by many as a condemnation of all Test Utility research)

Topic 2: Overview of economic theory as it pertains to education & testing

2) Economic theory as it pertains to education in general
Traditionally, education economics has been conducted in 2 fields:
Labor Economics • Labor markets for teachers and graduates • Returns (in wages) to investment (in years) in education
Public Finance • Returns (in achievement, attainment) to investment (in tax revenues) • Funding equity, adequacy, efficiency, & intra-metropolitan migration

2) Economic theory as it pertains to testing in particular
Human Capital Theory • Higher wages over the long term can more than compensate for the earnings foregone while still in school • …assumed a strong correlation between accumulation (years in school, any school) and earning power (applicable knowledge and skills)
Economics of Information • Basic economic assumption of “perfect information” is simplistic • When buyer and seller have “asymmetric” information, classic economic assumptions are not appropriate

Topic 3: Human capital theory and the economics of information

3) Human capital theory: seminal works • Human Capital (1964), Gary Becker • Schooling, Experience, and Earnings (1974), Jacob Mincer • Dozens of World Bank reports

3) Economics of Information: seminal works • “The Market for Lemons” (1970) George Akerlof • When buyers can evaluate a purchase based only on a quality assessment of the entire group, sellers have an incentive to market poor quality merchandise and, over time, the average quality of goods declines. Often-used counters to quality decline are: guarantees, brand names, franchising, and credentials. • “Economics of Imperfect Information” (1976) Rothschild, Stiglitz, Grossman • Perfectly competitive markets have perfect information. In markets without perfect information, there is little incentive for private individuals to fill the breach (Consumer Reports is an exception, and not very profitable). Thus, there can be a role for government to promote market efficiency by providing information.
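The quality-decline mechanism in the “Market for Lemons” bullet can be sketched as a small simulation. This is a stylized illustration, not Akerlof's own model: the quality scores, the rule that buyers pay exactly the average quality on offer, and the rule that sellers withdraw any good worth more than that price are all simplifying assumptions made here.

```python
def lemons_unraveling(qualities):
    """Return the average quality on offer in each round until the market stops changing.

    Assumptions (hypothetical, for illustration only):
    - buyers cannot tell goods apart, so they pay the average quality of what is offered;
    - a seller withdraws any good whose quality exceeds that price.
    """
    offered = list(qualities)
    path = []
    while True:
        price = sum(offered) / len(offered)  # buyers pay for group-average quality
        path.append(price)
        remaining = [q for q in offered if q <= price]  # better goods leave the market
        if remaining == offered:
            return path
        offered = remaining


# Eleven goods with hypothetical quality scores 0, 10, ..., 100
print(lemons_unraveling(range(0, 101, 10)))
```

Under these assumptions the average quality on offer falls each round until only the worst good remains, which is the decline that guarantees, brand names, franchising, and credentials are meant to counter.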

3) Screening, signaling, filtering, credentialing, I • Education and Jobs: The Great Training Robbery (1970), Ivar Berg • Employers pay for credentials, not human capital; they know little to nothing of the quality of education programs, only the perception thereof • Generating Inequality (1972) Lester Thurow • Employers want “trainable” employees, and judge that those who could endure schooling are probably more trainable than those who could not • Work of Piore and Doeringer on “Market Segmentation” • Neither education nor education credentials matter in “secondary” labor markets, only in “primary” market, with career ladders

3) Screening, signaling, filtering, credentialing, II • Market Signaling (1973), Michael Spence • Diplomas are a signaling device to employers, who take a gamble with every new hire; a diploma is evidence, which the graduate hopes employers will accept, that certain human capital has been obtained, but it is not proof • “On the Weak versus the Strong Version of the Screening Hypothesis” (1979) George Psacharopoulos • Weak: employers pay only higher starting wages for “better” credentials • Strong: employers continue to pay higher wages for “better” credentials even after they become familiar with each employee’s actual productivity • “Higher Education as a Filter” (1973) Kenneth Arrow • “The Theory of Screening” (1975) Joseph Stiglitz

3) Empirical and theoretical work on standards • Burton Weisbrod (1964) • Discovered that 90% of adults are hired within the boundaries of a school district other than the one from which they graduated • So, employers are not familiar with and have no influence over the education standards used to train virtually all their employees • John Bishop (1980s) • It is unreasonable to expect a teacher to be both a sympathetic coach and a neutral judge. External exams let them be coaches exclusively, which is in keeping with what most of them probably want anyway. • Robert Costrell (1994) • School district incentives are to inflate grades and socially promote. If they maintain tough standards, they only hurt their own children in later competition against graduates of other districts where standards are lax and grades inflated. • Standards must be enforced externally, or they will not be.

Topic 4: Supply & demand; benefits & costs; goods & bads

4) Benefits & costs; goods & bads • Economists are (small d) democrats • what is a “good” or a benefit is relative to each individual; the researcher does not get to decide what is good or bad for the consumer; consumers decide for themselves • but, we’d all like more money (freely exchangeable) and more free time • Economists assume we all want more of something (even if it is spiritual enlightenment), and that we can’t always get it • Benefits have two phases: creation and capture • Not all potential benefits are realized, or “captured” • (e.g.,) You do very well and learn very much at a college with a terrible reputation, and then cannot get a job because of that reputation

4) The demand for standardized testing • Phelps (1998) - 40 years of public opinion poll data • The adult public is not ignorant about standardized tests, since all have taken many, for better or for worse • Support for high-stakes standardized testing is overwhelming, and has been consistently so for decades • Most stakeholders, including students and parents, are strongly supportive. Teachers are usually supportive, but don’t like being judged for outcomes over which they have little control. Education professors are strongly opposed. Administrators have been on the fence and may now be opposed. • The year 2000 “testing backlash” was a very heavily hyped public relations creation, completely unsupported by the objective evidence.

4) “Natural Experiments” in test demand and valuation: a) countries liberalize education, b) drop test requirements, c) find that standards deteriorate, d) then revert back to testing • Many Western European and North American states (1960s – 1970s) • Many Post-Colonial, Newly-Independent states (1940s – 1970s) • Ex-Communist Eastern European states (1990s – 2000s)

4) Trends in test adding/dropping, OECD countries: 1974--1999

4) Countries adding or dropping large-scale, external testing, by type of testing: 1974-1999

Number of countries or provinces...
Type of testing                                     ...adding testing   ...dropping testing
Assessments                                         17                  0
Upper secondary exit exams                          12*                 0
University entrance exams                           5                   0
Subject-area end-of-course exams                    6                   0
Lower secondary exit or entrance exams              4                   2
Inclusion of voc/prof tracks in exit exam system    3                   0
Primary/secondary-level achievement testing         2                   1
Diagnostic testing                                  2                   0
TOTAL                                               51                  3

4) Countries with nationally standardized high-stakes exit exams, by level of education
Primary school: Belgium (French); Italy; Netherlands; Russia; Singapore; Switzerland (some cantons)
Lower secondary school: Belgium (French); Canada: Quebec; China; Czech Republic; Denmark; France; Hungary; Iceland; Ireland; Italy; Japan; Korea; Netherlands; New Zealand; Norway; Portugal; Russia; Singapore; Sweden; Switzerland; United Kingdom: England & Wales, Scotland
Upper secondary school: Belgium: (Flemish) & (French); Canada: Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland, Quebec; China; Denmark; Finland; France; Germany; Hungary; Iceland; Italy; Japan; Netherlands; Norway; Portugal; Russia; Singapore; Sweden; Switzerland; United Kingdom: England & Wales, Scotland

4) Demand for testing is not unlimited – saturation is possible

Topic 5: The Cost of Standardized Testing (from society’s point of view)

5) Cost jargon • Marginal cost (the cost of the next unit): For a test, it is the cost incurred because the test was added, and only that cost. • (e.g., the school building must be maintained during test administration, but it would have to be maintained without a test, too, so the test is not responsible for this cost.) • Subject-matter instruction occurs whether or not there is external testing, so it also is not a cost of the test. • Opportunity cost (the cost of foregone opportunities; e.g., instead of doing this, you could have been at work making money): For a test, the time a teacher spends preparing for, monitoring, or scoring a test is time he could have spent planning his course, grading homework, etc. • If the teacher makes productive use of the time while students are taking a test, there are no opportunity costs.
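A minimal sketch of the marginal-cost idea above. The cost items and dollar amounts are entirely hypothetical (none come from the source); the point is only that costs which would be incurred without the test are excluded.

```python
# Hypothetical per-student cost items; names and amounts are illustrative only.
# Each item records whether it would be incurred even if no test were given.
cost_items = [
    {"item": "test booklets and scoring", "cost": 9.0, "incurred_without_test": False},
    {"item": "teacher time diverted to proctoring", "cost": 4.0, "incurred_without_test": False},
    {"item": "building maintenance on test day", "cost": 3.0, "incurred_without_test": True},
    {"item": "subject-matter instruction", "cost": 20.0, "incurred_without_test": True},
]


def all_inclusive_cost(items):
    """Sum every cost present while the test is given."""
    return sum(i["cost"] for i in items)


def marginal_cost(items):
    """Sum only the costs that exist because the test was added."""
    return sum(i["cost"] for i in items if not i["incurred_without_test"])


print(all_inclusive_cost(cost_items))
print(marginal_cost(cost_items))
```

The diverted teacher time is counted here as an opportunity cost; if that time were used productively during the test, it would drop out of the marginal figure as well.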

5) Average all-inclusive per-student costs of two test types in states having both: 1990-91 SOURCE: U.S. GAO, 1993, p.43

5) Average per-student costs of two test types in states having both, with adjustments: 1990-91

                                                              All systemwide   Sample of 11 state    Sample of 6 multiple-choice
                                                              tests            performance tests     tests in those same states
All-inclusive marginal cost                                   $15              $33                   $16
…minus adjustment for regular school year administration      -7               -15                   -7
…minus adjustment for replacement of preexisting tests        -6               -12                   -12
Marginal cost after adjustments                               $5               $11                   $2

SOURCE: Phelps, 2000.

5) “Economies” jargon • The unit cost of producing your product declines the more of an “economy” you have (because fixed/overhead costs get spread out) • Scale – you can sell at lower cost because you make so many of them • Scope – you can sell at lower cost because you make other stuff that is similar, or in similar ways • Learning – you figure out ways to be more efficient and productive as you gain experience • There are many “economies” (just like validities)
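The claim that unit cost falls as fixed or overhead costs are spread over more units can be illustrated with a short sketch. The function name and the dollar figures are hypothetical, chosen only to show the shape of the relationship.

```python
def unit_cost(fixed_cost, variable_cost_per_unit, units):
    """Average cost per unit: fixed cost spread over all units, plus the per-unit cost."""
    if units <= 0:
        raise ValueError("units must be positive")
    return fixed_cost / units + variable_cost_per_unit


# Hypothetical test program: $100,000 to develop, $2 per test administered
for n in (1_000, 10_000, 100_000):
    print(n, unit_cost(100_000, 2.0, n))
```

This captures economies of scale only. Economies of scope and learning would show up as a lower fixed or per-unit cost for a producer that already makes similar products or has more experience.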

Economies of scale in state performance testing

Some economies of scope in state performance testing

5) General structure of testing costs
Scorers are... • GROUPS of teachers or professional scorers • INDIVIDUAL teachers or professional scorers • a COMPUTER
Students take tests... • EN MASSE • in GROUPS • ONE at a TIME

5) Slack capacity in U.S. students’ time = opportunity for windfall gain ?

Topic 6: The Benefits of Standardized Testing -- Information

6) Information benefits of testing • For whom? Could be anyone – student, parent, teacher, school, public, postsecondary institution, employer, … • Information can be used beneficially in: • Diagnosis (of student, teacher, school, ….) • Alignment (to standards, schedule, each other, …) • Learning for teachers • Goodwill with public • Decisions (promotion, placement, selection, …)

6) Information benefits of testing – how are they measured? • Predictive validity (fairly measurable) • Allocative efficiency (fairly measurable) • (the greater the range restriction the higher the allocative efficiency?) • Alignment (not so easy to measure) • Goodwill (not at all easy to measure)

Topic 7: The Benefits of Standardized Testing -- Motivation

7) Motivational benefits of testing – how are they measured? • In controlled experiments: • Ex. A) One group is told the test at the end of the course comes with a reward; control group is told it does not count • Ex. B) One group is tested throughout course; control group is not • In large-scale studies, graduates from regions with high-stakes tests are compared to their non-tested counterparts: • By their relative performance on another, common test • Their relative wages after graduation • Their relative rates of dropout, persistence, attainment, … • “Backwash Effect” (e.g., students in states with high-stakes high school graduation tests perform better even on the 8th-grade level IAEP, TIMSS, or NAEP)

7) Large-scale studies finding benefits to the use of external, high-stakes examinations • John Bishop (1980s+) several studies -- IAEP, TIMSS, SAT, NY State, Canada, … • Winfield; Fredericksen; Bishop; Jacobson (minimum comp. states) • Others: Graham, Husted (SAT); Grissmer, Flanagan (NAEP); Phelps (TIMSS+); Carnoy (NAEP); Rosenshine (NAEP); Braun (NAEP); Wenglinsky

7) Smaller-scale studies finding benefits to the use of high-stakes examinations • Controlled experiments – Tuckman, Trimble; Webb; Wolf, Smith; Egeland; Jones; Brown, Walberg; Tuckman; Khalaf, Hanna; others… • Evaluations – Anderson, Muir, Bateson, Blackmore, Rogers; Heyneman; G.A.O.; Achieve; Stake, Theobald; Bond, Cohen; Calder; Glasnapp, Poggio, Miller; others… • Case studies – S.R.E.B.; Schleisman; Neville; Goldberg, Roswell; Schlawin; Delong; Lerner; Jett, Shafer; others…

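7) Bishop's estimates of dollar value of high-stakes exams on student outcomes

Canada: High-stakes testing provinces vs. others
  Difference (in standard deviation units): .233 (in math); .183 (in science)
  Difference (in grade-level-equivalent units): .75 (in math); .67 (in science)
  Difference per student (in net present value) in 1993 dollars*: $13,370 (in math); $11,940 (in science)
USA: New York State vs. rest of U.S.
  Difference (in standard deviation units): .164 (in SAT Verbal + Math)
  Difference (in grade-level-equivalent units): .75 (verbal + math)
  Difference per student (in net present value) in 1993 dollars*: $13,370
IAEP: High-stakes testing countries vs. others
  Difference (in standard deviation units): .586 (in math)
  Difference (in grade-level-equivalent units): 2.0 (in math); .7 (in science)
  Difference per student (in net present value) in 1993 dollars*: $35,650 (in math); $12,480 (in science)
TIMSS: High-stakes testing countries vs. others
  Difference (in standard deviation units): n/a
  Difference (in grade-level-equivalent units): .9 (in math); 1.3 (in science)
  Difference per student (in net present value) in 1993 dollars*: $16,040 (in math); $23,170 (in science)

* Based on male-female average, averaged across six longitudinal studies, cited in Bishop, 1995a, Table 2, counting only general academic achievement, not accounting for technical abilities.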
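The dollar figures in the table appear to scale linearly with the grade-level-equivalent differences. That is an inference from the numbers themselves, not something the slide states. A short sketch that divides each dollar value by its grade-level-equivalent difference makes the implied conversion rate visible; the dictionary keys are labels made up here for readability.

```python
# (grade-level-equivalent difference, net present value in 1993 dollars),
# copied from the table of Bishop's estimates.
estimates = {
    "Canada, math": (0.75, 13370),
    "Canada, science": (0.67, 11940),
    "New York State, SAT verbal + math": (0.75, 13370),
    "IAEP, math": (2.0, 35650),
    "IAEP, science": (0.7, 12480),
    "TIMSS, math": (0.9, 16040),
    "TIMSS, science": (1.3, 23170),
}

# Implied dollars per one grade-level equivalent, row by row
implied = {name: dollars / gle for name, (gle, dollars) in estimates.items()}

for name, value in implied.items():
    print(f"{name}: about ${value:,.0f} per grade-level equivalent")
```

Every row comes out at roughly $17,800 per grade-level equivalent, which suggests a single conversion factor was applied throughout.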

Topic 8: Optimal testing system structures

8) Single or multiple target systems • Becker and Rosen (1990) • A “single target” examination (e.g., minimum competency) is problematic • Set too high, slower kids will be discouraged and drop out • Set too low, advanced kids will be bored and may work less • Examination systems should have multiple targets • Empirical Studies of 1970s-1980s Minimum Competency Exams (e.g., Ligon, Mangino, Babcock, Johnstone, Brightman, Davis) • Performance of lowest students did improve, but that of advanced students either stayed flat or decreased • Jonathan Jacobson (1992) • Longitudinal analysis of students from minimum competency states showed that slowest students gained and middle students lost • Probably, the test induced resource flows to the slow students and away from the middle students

8) Examples of multiple target systems • Hierarchical, or “tiered,” systems – British system, New York State • All students must pass exams with broad, common requirements, but at choice of levels (Advanced or Ordinary; Competency or Honors) • British just recently changed, creating a hybrid that looks more like continental exam systems • Branched or parallel track systems – Most of Continental Europe • Students choose (or the choice is made for them) where to concentrate their efforts, and they are tested mostly on that concentration • First branching (junior high level) into academic, general, vocational • Second branching (high school level) into subject area or vocational concentration

8) Some current research on testing system structure • John Bishop • Suspects that standardized end-of-course or end-of-year examinations may be the optimal form of standardized testing. • Why? Perhaps because they combine the best of both worlds • standardized and external • concise, targeted, with very strong alignment between curriculum and test • Value-added systems • Concerns for volatility and fairness mandate that the testing be frequent, at least annual • Tests are not the only quality control measure; how to optimize the whole set? (Phelps, Just for the Kids, others…)

8) The more high-stakes decision points, the better the student performance ? SOURCE: Phelps, 2001

8) Quality control has proportionally greater effect in poorer countries SOURCE: Phelps, 2001

Topic 9: Optimal testing industry structures

9) The industry structure game, in theory • Selfish consumers want a perfectly competitive industry • Lots of producers, cutthroat competition • Easy producer entry to, exit from industry • Low prices, lots of choice and information • Selfish producers want to be monopolists • Raise prices, lower quality • Block new entrants, withhold information

9) The industry structure game, in practice • Consumers want stable suppliers, salespeople they know, brand names they can trust • So, sure, they want competition, choice, and low prices… • But, they do not want to have to try out a new brand of detergent after every visit to the grocery store • Producers try to avoid outright monopoly, or else they get regulated or split up • e.g., Microsoft pushes Apple and Corel to the brink of bankruptcy, then tosses each of them a lifeline to keep them in business (barely) • So, the goal is to approach having a monopoly without quite having one

9) Competitive strategy theory • In industries with steep economies (of scale, scope, learning, …) there is only room for so many producers • If you do not have the relevant “economies” in your firm, you had better focus on a specialty niche that makes you unique, or else get out • (e.g.) General Electric/RCA Consumer Electronics (1987) • Crowded field: Sony, Zenith, Philips, Toshiba, Mitsubishi, others • Sony - technological edge, reputation for quality, could charge high prices • Niche players – Mitsubishi (big screen TVs); Sharp (flat panels) • Low cost players – Koreans had entered market, Chinese were purchasing the facilities of bankrupt American firms (e.g., Admiral, Philco, Sylvania) • Japanese manufacturers were building assembly plants in US and Mexico in order to lower their shipping costs for large sets • GE was “stuck in the middle” – could not compete on cost or quality and had no unique niche – they sold out

9) Possible sources of competitive advantage in the testing industry • Advantages related to scale economies • Huge item banks take time to accumulate and test, and they are copyrighted (‘sunk costs’ => barrier to entry) • Established client base, relationships • Advantages related to scope economies • Much psychometric expertise is equally useful across a variety of tests • Customers’ needs largely similar across states, countries • Good brand name provides instant cachet in new markets • Advantages related to learning economies • Experience working with, knowledge of clients • Experience gained with a new type of product will lower cost for subsequent, similar projects
