46-320-01 Tests and Measurements • Intersession 2006
Writing Items • DeVellis (1991) guidelines • Define clearly what you want to measure • Generate an item pool • Avoid exceptionally long items • Keep the reading level appropriate for respondents • Avoid double-barreled items (two questions in one) • Mix positively and negatively worded items • Be sensitive to cultural/ethnic differences
Item Format • Dichotomous format • Two alternatives (e.g., true/false) • Pros: ease of construction and scoring; calls for an absolute judgment • Cons: encourages memorization; 50% chance of answering correctly by guessing
Item Format • Polytomous format • More than two alternatives (e.g., multiple choice) • Pros: lower chance of guessing correctly, quick to answer and score, distractors give diagnostic information • Corrected scores: should guessing be penalized? (see the sketch below)
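The correction-for-guessing question above is usually answered with the standard formula: number right minus number wrong divided by (alternatives − 1). A minimal sketch follows; the function name and example counts are illustrative assumptions, not anything from the slides.

```python
def corrected_score(num_right, num_wrong, num_choices):
    """Standard correction-for-guessing: subtract a penalty for wrong answers,
    assuming each wrong answer reflects a blind guess among num_choices options.
    Omitted items count as neither right nor wrong."""
    return num_right - num_wrong / (num_choices - 1)

# Example: 40 right and 10 wrong on four-option multiple-choice items
print(corrected_score(40, 10, 4))   # ~36.67 (40 - 10/3)
```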
Item Format • Likert format • Respondents rate degree of agreement • Five alternatives vs. six (an even number removes the neutral midpoint) • Reverse scoring for negatively worded items (see the sketch below) • Category format • 10-point scale – why 10? • Remember that context affects ratings • Visual analogue scale • Mark a point on a 100-mm line
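Reverse scoring a negatively worded Likert item simply mirrors the response around the scale midpoint. A small sketch; the scale endpoints here are assumed, not taken from the slides.

```python
def reverse_score(raw, low=1, high=5):
    """Reverse-score a Likert response so negatively worded items run in
    the same direction as positively worded ones."""
    return (high + low) - raw

print(reverse_score(2))         # 4 on a 1-5 scale
print(reverse_score(5, 1, 7))   # 3 on a 1-7 scale
```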
Item Format • Checklist • Usually adjectives the respondent endorses or not • Q-sort • Increases the number of options (typically 9 piles) • Sorting is forced to approximate a normal distribution
Item Analysis • Purpose: shorten a test while increasing reliability and validity • Item difficulty • Proportion of test-takers who get the item correct • Compare against the probability of answering correctly by chance • Optimum level: halfway between chance and 1.0 • Items usually vary in difficulty (about 0.3 to 0.7) • Internal criterion = total test score (see the sketch below)
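A sketch of the difficulty calculations above: difficulty is the proportion correct, and a common rule of thumb puts the optimum halfway between the chance level and 1.0. The response vector below is hypothetical.

```python
def item_difficulty(responses):
    """Proportion of test-takers answering the item correctly (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)

def optimal_difficulty(num_choices):
    """Halfway between the chance level and 1.0 - a common target for
    selected-response items."""
    chance = 1.0 / num_choices
    return (1.0 + chance) / 2.0

item = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]   # hypothetical responses to one item
print(item_difficulty(item))             # 0.7
print(optimal_difficulty(4))             # 0.625 for a four-option item
```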
Discriminability • Extreme group method • Discrimination index = difference between high and low scorers on the item • Negative discriminator: low scorers do better than high scorers • Point-biserial method: correlate the item (0/1) with the total test score (see the sketch below) • With a small number of items, the item's own contribution to the total can inflate the correlation • The higher the correlation, the better the item
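A sketch of the point-biserial method: correlate each dichotomous item with the total score (Pearson r computed on a 0/1 variable is the point-biserial). The response matrix is hypothetical.

```python
import numpy as np

# Hypothetical data: rows = test-takers, columns = items (1 correct, 0 incorrect)
scores = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
])
total = scores.sum(axis=1)

# Point-biserial correlation of each item with the total test score;
# higher values indicate better-discriminating items
for j in range(scores.shape[1]):
    r_pb = np.corrcoef(scores[:, j], total)[0, 1]
    print(f"item {j + 1}: r_pb = {r_pb:.2f}")
```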
Table Explained • Class n = 60, split into upper (U), middle (M), and lower (L) thirds of 20 • Discrimination: rough index = U – L (correct answers in the upper third minus the lower third; computation sketched below) • Item difficulty: U + M + L (total number correct; divide by 60 for the proportion) • Example items: • Item 2 = too easy • Item 7 = too difficult • Items 4 & 5 = negative discriminative value
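The table itself is not reproduced here, so the counts below are hypothetical; the sketch only shows how the rough index and difficulty described above are computed from correct-answer counts in the upper (U), middle (M), and lower (L) thirds.

```python
def rough_item_stats(upper, middle, lower, group_size=20):
    """Extreme-group summary for one item in a class of 3 * group_size students:
    rough discrimination index (U - L) and proportion-correct difficulty."""
    discrimination = upper - lower
    difficulty = (upper + middle + lower) / (3 * group_size)
    return discrimination, difficulty

print(rough_item_stats(18, 15, 9))    # (9, 0.70)   - a good positive discriminator
print(rough_item_stats(20, 20, 19))   # (1, ~0.98)  - too easy
print(rough_item_stats(10, 12, 14))   # (-4, 0.60)  - negative discriminative value
```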
Item Characteristic Curve • X axis: total test score (an estimate of the underlying trait) • Y axis: proportion of test-takers answering the item correctly • Scores are often grouped into class intervals (see the sketch below)
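A sketch of building an empirical item characteristic curve by class intervals: group test-takers by total score and report the proportion answering the item correctly in each interval. The simulated data are purely illustrative.

```python
import numpy as np

def empirical_icc(item, total, bins=4):
    """Proportion answering the item correctly within each class interval
    of the total test score."""
    edges = np.linspace(total.min(), total.max(), bins + 1)
    idx = np.digitize(total, edges[1:-1])          # interval index 0..bins-1
    return [(edges[b], edges[b + 1], item[idx == b].mean())
            for b in range(bins) if np.any(idx == b)]

rng = np.random.default_rng(0)
total = rng.integers(10, 41, size=200).astype(float)        # total test scores
item = (rng.random(200) < (total - 10) / 30).astype(int)    # item favors high scorers
for lo, hi, p in empirical_icc(item, total):
    print(f"total {lo:.0f}-{hi:.0f}: proportion correct = {p:.2f}")
```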
Discriminability • Best scenario
Item Response Theory • Each item has its own item characteristic curve • A specific range of difficulty can be identified with a test characteristic curve • Items are described by difficulty and discriminability parameters (see the sketch below) • Sample items across that range • Peaked conventional vs. rectangular conventional vs. adaptive testing
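One widely used IRT model consistent with the idea that every item has its own curve is the two-parameter logistic; the parameter values below are made up for illustration.

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic model: probability of a correct response at
    trait level theta, with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An easy, weakly discriminating item vs. a hard, sharply discriminating one
for theta in (-2, -1, 0, 1, 2):
    p_easy = icc_2pl(theta, a=0.8, b=-1.0)
    p_hard = icc_2pl(theta, a=2.0, b=1.0)
    print(f"theta = {theta:+d}: easy item P = {p_easy:.2f}, hard item P = {p_hard:.2f}")
```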
Criterion-Referenced Tests • Specify learning objectives – this itself aids learning • Give the test to two groups: one exposed to the material, one not • The antimode (the low point between the two groups' score distributions) becomes the cutting score (see the sketch below) • Any problems with this approach?
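A minimal sketch of the antimode idea: combine the two groups' score distributions and take the least frequent score between their modes as the cutting score. The score lists are hypothetical.

```python
from collections import Counter

def antimode_cut(exposed, unexposed):
    """Least frequent score between the two groups' modes in the combined
    distribution - a simple antimode cutting score."""
    counts = Counter(exposed) + Counter(unexposed)
    lo_mode = max(set(unexposed), key=unexposed.count)
    hi_mode = max(set(exposed), key=exposed.count)
    between = {s: c for s, c in counts.items() if lo_mode < s < hi_mode}
    return min(between, key=between.get) if between else None

unexposed = [3, 4, 4, 5, 5, 5, 6, 6, 7]         # not exposed to the material
exposed = [7, 8, 9, 9, 10, 10, 10, 11, 12]      # exposed to the material
print(antimode_cut(exposed, unexposed))         # 8 for these hypothetical scores
```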
Test Manuals • Proprietary tests – purchasers must meet user qualifications • Nonproprietary tests – publicly available • Standards for Educational and Psychological Testing* • Reflects changes in federal law and measurement trends affecting validity • Testing individuals with disabilities or from different linguistic backgrounds • New types of tests as well as new uses of existing tests • *Taken from apa.org
Test Manuals • Should include: • How to administer (standard conditions) • How to score • How to interpret • Information on reliability, validity, norms • Be critical!
Base Rates and Hit Rates • What does the test contribute beyond what is already known? • Meeting the cutting score does not guarantee a correct decision • Compare the hit rate against the base rate (see the sketch below) • Weigh false negatives against false positives
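A sketch of the hit-rate vs. base-rate comparison from a 2x2 decision table (test decision vs. actual outcome); the counts are hypothetical. The test earns its keep only if the hit rate clearly exceeds what the base rate alone would give.

```python
def classification_rates(tp, fp, fn, tn):
    """Summarize a 2x2 table of test decisions against actual outcomes.
    tp/fp/fn/tn = true positives, false positives, false negatives, true negatives."""
    n = tp + fp + fn + tn
    hit_rate = (tp + tn) / n        # proportion of correct decisions using the test
    base_rate = (tp + fn) / n       # proportion who succeed regardless of the test
    return hit_rate, base_rate, fp / n, fn / n

hit, base, fp_rate, fn_rate = classification_rates(tp=45, fp=15, fn=10, tn=30)
print(f"hit rate = {hit:.2f}, base rate = {base:.2f}")            # 0.75 vs. 0.55
print(f"false positives = {fp_rate:.2f}, false negatives = {fn_rate:.2f}")
```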
Taylor-Russell Tables • What does the test contribute beyond the base rate? • You need: • A definition of success • The base rate • The selection ratio • The test validity coefficient • The table gives the likelihood that someone selected on the basis of the test will succeed
Taylor-Russell Tables Source: Fisher, Schoenfeldt, & Shaw (2003), Table 7.2
Taylor-Russell Tables • Best case: validity high, selection ratio low • Worst case: validity low, selection ratio high • Useless: no validity (the selected group succeeds at the base rate) • What about selecting low scorers? (see the sketch below)
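The published Taylor-Russell tables are simply looked up; as a hedged illustration of what they encode, here is a Monte Carlo sketch under a bivariate-normal assumption. The parameter values are examples, not entries from the Fisher, Schoenfeldt, & Shaw table.

```python
import numpy as np

def expected_success_rate(validity, base_rate, selection_ratio, n=200_000, seed=0):
    """Simulate predictor and criterion scores correlated at the validity
    coefficient, select the top fraction (selection ratio) on the predictor,
    and return the proportion of those selected who succeed (criterion above
    the base-rate cutoff)."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, validity], [validity, 1.0]]
    predictor, criterion = rng.multivariate_normal([0, 0], cov, size=n).T
    success_cut = np.quantile(criterion, 1 - base_rate)
    select_cut = np.quantile(predictor, 1 - selection_ratio)
    selected = predictor >= select_cut
    return (criterion[selected] >= success_cut).mean()

# With a 0.60 base rate: higher validity and a lower selection ratio
# both raise the expected success rate among those selected
print(expected_success_rate(validity=0.50, base_rate=0.60, selection_ratio=0.10))
print(expected_success_rate(validity=0.50, base_rate=0.60, selection_ratio=0.70))
print(expected_success_rate(validity=0.10, base_rate=0.60, selection_ratio=0.10))
```

Running this reproduces the pattern in the slide: high validity with a low selection ratio gives the largest gain over the base rate, while low validity barely moves it.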
Incremental Validity • The unique information gained from using the test • Predicting future behavior and self-ratings • Before adopting a test, ask: • Would a simpler method do? • A less expensive method? • A method that puts less strain on the test-taker?
Mental Measurements Yearbook • Test reviews