CAT’s Journey in Georgia
Introduction of GCAT • Decision (Fall 2010) • Visit to CITO (November 2010) • Tryouts (December 2010) • First database of items (3P) (December 2010) • Calibration, fine-tuning (January 2011) • Algorithm, software (February 2011) • Simulations, infrastructure (March 2011) • Large-scale pre-test (April 2011) • First GCAT (May 2011)
History of CAT in Georgia • Used in School Leaving Exams • Administered yearly to 12th- and 11th-graders* • Usually administered at the end of the school year (May-June) • 8 subjects: Georgian Language, Mathematics, Foreign Language, Physics, History, Chemistry, Geography, Biology
Scale and stakes • About 40,000 students take the test each year • A passing grade in all 8 subjects is required to obtain a school leaving certificate • The school leaving certificate is needed to enter university, to work in the civil sector, etc. • If failed, the exam can be retaken the following year
What is CAT? • Computerized Adaptive Testing • Administered on a computer • The test is formed “on the fly”, adapting to the student’s performance • Correct equating of the results is achieved using Item Response Theory (IRT) • Result: tailor-made tests for each student, with standardized scores
Analogy: 20 Questions Game • I am thinking of something. • You have 20 “yes-or-no” questions to figure it out. • What is the best strategy? • Is it writing up a set of 20 questions ahead of time? • Is it a living thing? • Is it a vegetable? • Is it red? • Is it bigger than a human being? • …
20 Questions Game • Isn’t it a better strategy to base each next question on the replies to the previous ones? • In the absence of information, start with something that has a 50/50 chance of being true. • As information builds up along the way, ask increasingly precise questions.
Game Test Run • Is it a living thing? YES • Is it a wild animal? NO • Is it bigger than a human? NO • Is it furry? YES
Same principle used in CAT • The computer keeps track of the student’s pattern of responses so far. • As the test progresses, we learn more about the student’s ability. • The computer chooses the next item so as to get maximal information about the student’s level of ability. • Purpose of assessment: get the best possible information about the student’s ability.
Why CAT? • Measurement precision • More information with fewer items • Security • Large item bank, individual test forms • Equating • Done automatically, using Item Response Theory (IRT) • Good predictability • Using simulations and IRT
Item Response Theory • Also called Latent Trait Theory • Assumes the “thing to be measured” is a single entity expressible as a number, called True Ability and usually denoted by θ • Assumes that the student’s ability is related in a specific probabilistic way to the response the student gives to a particular item • Why probability?
Why probability? • Approach: measure a student’s ability in terms of how difficult an item she can solve. But how? • [Figure: a vertical scale running from easy items to hard items. The student does everything near the easy end, does little near the hard end, and has a 75% chance of solving a random item from the region in between.]
Item Response Function • The student’s ability and the item parameters determine the probability of a correct response. In the two-parameter logistic model (2PL), this probability is given by P(θ) = 1 / (1 + e^(−a(θ − b))), where a is the item’s discrimination and b is its difficulty.
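As an illustration, here is a minimal sketch of this function in Python (the function name and the example parameter values are made up for demonstration):

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: the probability that a student with
    ability theta answers an item with discrimination a and difficulty b
    correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A student whose ability matches the item's difficulty has a 50% chance:
print(p_correct(0.0, a=1.0, b=0.0))   # 0.5
# A much stronger student is very likely to answer correctly:
print(p_correct(2.0, a=1.0, b=0.0))   # ~0.88
```

Note that b plays the role of the item’s position on the difficulty scale from the previous slide: the further θ is above b, the closer the probability gets to 1.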
CAT Algorithm • 1. Administer 3 random items. • 2. Estimate ability. • 3. Choose the next item (maximum information). • 4. Administer the item. • 5. Check the stopping conditions: if not met, return to step 2; if met, estimate the final ability, scale and display the score, and terminate.
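A minimal sketch of this loop in Python (the grid-based ability estimator, the fixed 20-item stopping rule, and the item-bank layout are illustrative assumptions, not the actual GCAT implementation):

```python
import math
import random

def p_correct(theta, a, b):
    # 2PL item response function (see the formula above)
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    # Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def estimate_ability(responses):
    # Crude maximum-likelihood estimate over a grid of ability values
    def log_lik(theta):
        return sum(math.log(p_correct(theta, a, b) if correct
                            else 1.0 - p_correct(theta, a, b))
                   for (a, b), correct in responses)
    grid = [g / 10.0 for g in range(-40, 41)]     # theta in [-4, 4]
    return max(grid, key=log_lik)

def run_cat(bank, answer, max_items=20):
    # bank: list of (a, b) item parameters; answer(item) -> True/False
    responses = [(item, answer(item)) for item in random.sample(bank, 3)]
    while len(responses) < max_items:             # stopping condition
        theta = estimate_ability(responses)       # current ability estimate
        used = {item for item, _ in responses}
        item = max((i for i in bank if i not in used),
                   key=lambda i: information(theta, *i))  # max information
        responses.append((item, answer(item)))
    return estimate_ability(responses)
```

For a quick simulation one can define a synthetic student, e.g. answer = lambda item: random.random() < p_correct(1.0, *item) for a true ability of 1.0, and pass any sufficiently large list of (a, b) pairs as the bank.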
Typical run of the CAT (2011, Geo) • [Figure: a student’s running ability estimate (vertical axis) plotted against item number (horizontal axis).]
Issues • Content validity across subdomains • A set proportion of items across subject subdomains must be maintained • Item exposure control • The most informative items tend to get overused • Difficulty control • A low-ability student might get an overly difficult item, and a high-ability student might get an overly easy item • Calibration of new items • To replenish the item bank, new items need to be tested under realistic conditions
Exposure Control • Every 5th item is chosen at random from a difficulty interval around the current ability estimate. • Items that become overexposed (roughly 3,000 administrations) are suspended.
Solutions • Difficulty control • The item is chosen from a restricted difficulty interval surrounding the current ability estimate of the student. • New items • Pilot (unscored) items are administered to each student at regular intervals during the actual test.
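A sketch of how these controls could be combined when selecting the next item (the every-5th-item rule and the roughly 3,000-administration cap come from the slides; the 0.5 window width, the data layout, and the function itself are illustrative assumptions):

```python
import math
import random

def information(theta, a, b):
    # Fisher information of a 2PL item (as in the earlier sketch)
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def choose_next(bank, theta, items_given, exposure, window=0.5, cap=3000):
    # Suspend overexposed items (about 3,000 administrations)
    available = [i for i in bank if exposure[i["id"]] < cap]
    # Difficulty control: restrict to items whose difficulty b lies
    # within an interval around the current ability estimate
    nearby = [i for i in available if abs(i["b"] - theta) <= window] or available
    if items_given % 5 == 4:
        # Exposure control: every 5th item is drawn at random from the interval
        return random.choice(nearby)
    # Otherwise pick the most informative item within the interval
    return max(nearby, key=lambda i: information(theta, i["a"], i["b"]))
```

Pilot items would be spliced into the same sequence at fixed positions; being unscored, their responses are stored for later calibration but do not feed the ability estimate.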