Hard-to-Measure Assessments

Item Writing Retreat, Orlando, Florida, January 2012

Effective Item Writing

Steps for Item Development
1. Test Design
2. Item Specifications
3. Item Writing
4. Item Review

Steps for Item Development
1. Test Design
• Creates a design for what an effective end-of-course exam would look like for the selected course
• Determines which benchmarks are evaluated at which cognitive complexity level

Steps for Item Development
2. Item Specifications
• Creates an outline for constructing items that effectively measure the intended content

Steps for Item Development
3. Item Writing
• Uses the course description and the item specifications document to create items
• Items are written at the cognitive complexity level that the benchmark warrants

Steps for Item Development
4. Item Review
• Items are reviewed for bias, grammar, and punctuation
• Items are pilot tested to determine validity and reliability

Advantages and Disadvantages of Various Types of Items

Item Types: Multiple-Choice
What is the capital of Florida?
A. Miami
B. Tallahassee
C. Orlando
D. Jacksonville
The question itself is called the ‘STEM’; the incorrect answers are called ‘DISTRACTORS’.

The Anatomy of the Multiple-Choice Item
Why are we writing items? (the stem)
• To populate the state item bank (correct answer)
• To work in Orlando, Florida (distractor)
• To work with other teachers (distractor)
• To stay in a hotel (distractor)

Item Types: Multiple-Choice
The advantage is that, with careful construction, this type can measure knowledge at most levels. The disadvantage is that it is hard to write good distractors for levels beyond factual recall.

Multiple Choice Items …

Item Types: True/False
The border between the U.S. and Canada is longer than the border between the U.S. and Mexico.
A. True
B. False

Item Types: True/False
The advantage is that it is the most efficient way to measure a lot of content in a short period of testing time. The disadvantages are that it is hard to measure higher-level knowledge and that guessing gives a 50% chance of being right.
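To make the guessing disadvantage concrete, the binomial distribution gives the chance that a student who knows none of the content reaches a given score by guessing alone. This is a minimal sketch assuming every item is guessed blindly and independently; the function name `prob_at_least`, the 10-item test, and the 7-correct cutoff are illustrative choices, not part of the retreat materials.

```python
from math import comb


def prob_at_least(n_items: int, k_correct: int, p_guess: float = 0.5) -> float:
    """Chance of getting at least k_correct of n_items right by blind guessing.

    Assumes each item is guessed independently with success probability
    p_guess (0.5 for true/false), so the score follows a binomial distribution.
    """
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess) ** (n_items - k)
        for k in range(k_correct, n_items + 1)
    )


n = 10
print(n * 0.5)                                # expected correct on 10 true/false items: 5.0
print(round(prob_at_least(n, 7), 4))          # 7+ of 10 on true/false: 0.1719
print(round(prob_at_least(n, 7, 0.25), 4))    # 7+ of 10 on four-option multiple choice: 0.0035
```

Under these assumptions, a pure guesser scores 70% or better on a 10-item true/false test about 17% of the time, compared with well under 1% on four-option multiple-choice items.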

True/False Items …

Item Types: Multiple Select
What colors are in the American flag? Mark all that are correct.
__ Red
__ Green
__ White
__ Blue
__ Black

Item Types: Multiple Select
The advantage is that it is an efficient way to measure a set of facts or concepts that cluster together. The disadvantage is that it is suitable only for certain knowledge areas.

Item Types: Matching
For each concept on the left, select the word from the list on the right that best matches it.
Concepts:
__ Test predicts future performance
__ Test appears to be a reasonable measure
__ Retest scores are very similar
__ Low standard error
Words:
• Face validity
• Reliability
• Accuracy
• Validity
• Consistency

Item Types: Matching
The advantage is that it allows related ideas or concepts to be compared. The disadvantages are that it is not suitable for measuring isolated facts and information, and that scoring can be complex.

Item Types: Ranking
Put the following steps in the order a test author should take in writing a new test.
__ Prepare test blueprint
__ Determine test objectives
__ Draft test items
__ Evaluate items against criteria
__ Perform item analysis
__ Check with subject matter experts
__ Select item types to be used
__ Pilot the test and modify as needed

Item Types: Ranking
The advantage is that this type is ideal when knowing the correct order is important. The disadvantages are that it is not suitable for anything else and that scoring can be complex.
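One reason ranking items are hard to score is deciding how much credit a nearly correct order deserves. The sketch below shows one possible partial-credit rule, an assumption rather than a rule from these materials: award the fraction of item pairs that the student places in the same relative order as the key. The function name and the four-step key labeled A to D are hypothetical.

```python
from itertools import combinations


def pairwise_order_score(key: list[str], response: list[str]) -> float:
    """Partial credit for a ranking item: the fraction of item pairs that the
    response puts in the same relative order as the answer key."""
    if sorted(key) != sorted(response) or len(set(key)) != len(key) or len(key) < 2:
        raise ValueError("response must rank the same two or more distinct items as the key")
    position = {item: i for i, item in enumerate(response)}
    pairs = list(combinations(key, 2))  # in each pair (a, b), a comes before b in the key
    agreeing = sum(1 for a, b in pairs if position[a] < position[b])
    return agreeing / len(pairs)


key = ["A", "B", "C", "D"]           # hypothetical four-step answer key
response = ["A", "C", "B", "D"]      # one adjacent pair swapped
print(float(response == key))                           # all-or-nothing score: 0.0
print(round(pairwise_order_score(key, response), 3))    # partial credit: 0.833
```

All-or-nothing scoring gives this response zero, while the pairwise rule gives it 5 of 6 possible points; which choice is fairer is exactly the kind of decision that makes ranking items complex to score.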

Item Types: Fill in the Blank
The first President of the United States was ___________________.

Item Types: Fill in the Blank
This type has little advantage over well-written multiple-choice items. The disadvantage is that scoring can be difficult (and sometimes subjective).

The Test of Franzipanics
Read the directions and take a few minutes to complete this test. Work individually; no collaboration.

The Test of Franzipanics
The answers, and some rules for multiple-choice items:
• (a) ‘Cluss’ is repeated
• (b) the longest option
• (c) ‘usually’
• (d) ‘an’
• (a) ‘are’ implies plural
• (b) ‘vost’ is in all the others
• (c) see item #4
• (d) finishes the pattern

The Moral of This Test
• With poorly written items, you can get them right without knowing the intended content.
• Reliability issue?
• Validity issue?
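A repeated word (like ‘Cluss’) is one of the clues a reviewer can check for mechanically. This is a minimal sketch, assuming a simple word-overlap heuristic and a deliberately small stop-word list; the function names are hypothetical, and the sample item reuses the random-sample stem and options from the examples later in this deck. A shared word is only a prompt for human review, not proof of a flaw.

```python
import re

# A small, illustrative stop-word list; a real review would need a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "for", "and", "or", "which", "that"}


def content_words(text: str) -> set[str]:
    """Lower-cased words in the text, minus common function words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def repeated_word_clues(stem: str, options: dict[str, str]) -> dict[str, set[str]]:
    """For each option that repeats a content word from the stem, list the shared words."""
    stem_words = content_words(stem)
    clues = {}
    for label, text in options.items():
        shared = stem_words & content_words(text)
        if shared:
            clues[label] = shared
    return clues


stem = "A random sample is one in which"
options = {
    "a": "subjects are selected by levels.",
    "b": "each subject has an equal probability of being chosen for the sample.",
    "c": "every nth subject is chosen.",
    "d": "groups are the unit of analysis.",
}
print(repeated_word_clues(stem, options))  # {'b': {'sample'}}
```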

Effective Item Checklist
A mistake commonly made by new test authors is trying to measure more than one thing with a single item.
Importance is sometimes confused with item difficulty. Something could be extremely important, but if 100% of the test takers always get the item right, it is probably trivial and should be eliminated.
The easiest way to tell whether the stem is a complete thought is to cover up the response options and see whether you know what you are supposed to do.
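In classical item analysis, the difficulty index (often called the p-value) is the proportion of test takers who answer an item correctly, so an item that 100% of test takers get right has an index of 1.0. This is a minimal sketch with made-up pilot data; the function names are hypothetical, and flagging only items at exactly 1.0 is an illustrative threshold.

```python
def difficulty_index(scores: list[int]) -> float:
    """Proportion of test takers who answered the item correctly (1 = right, 0 = wrong).

    Despite the name, a higher value means an easier item.
    """
    if not scores:
        raise ValueError("no responses for this item")
    return sum(scores) / len(scores)


def items_everyone_got_right(item_scores: dict[str, list[int]]) -> list[str]:
    """Items with a difficulty index of 1.0: candidates for review or removal."""
    return [item for item, scores in item_scores.items() if difficulty_index(scores) == 1.0]


# Made-up pilot data: one list of 0/1 scores per item.
pilot = {
    "item1": [1, 1, 1, 1, 1],
    "item2": [1, 0, 1, 1, 0],
    "item3": [0, 0, 1, 0, 0],
}
print({item: difficulty_index(s) for item, s in pilot.items()})  # {'item1': 1.0, 'item2': 0.6, 'item3': 0.2}
print(items_everyone_got_right(pilot))                         # ['item1']
```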

Is the Item Stem one Clear Thought?

Effective Item Checklist
Write each item as concisely as you can. When you make sure that all response options agree grammatically with the stem, avoid the "a(n)" and "is/are" workaround; instead, rewrite the item so that it measures the knowledge or skill without getting hung up on the grammar. Writing plausible distractors is both an art and a science, and it is very hard work.

Let’s examine some items and ‘evaluate’ them against the Effective Item Checklist

Effective Item Checklist
The average combined score in the NFL playoff games in 2010 was:
• <10
• <20
• >40
• >50
Flaw: the response options aren't independent (anything <10 is also <20, and anything >50 is also >40).

Do’s and Don’ts
Things to do:
• Ensure that there is only one true and defensible answer
• Ask peers for help
• Get clarification if you have questions

One Defensible Answer?

Do’s and Don’ts
Things to avoid:
• Jargon and textbook language
• Clichés
• Common misinformation
• Logical misinterpretations
• Copying questions from a textbook
• Partial answers
• “None of these”
• “None of the above”
• “All of these”
• “All of the above”

Examples of Items
Rule: Locate and delete irrelevant clues.
Poor: A test which may be scored merely by counting the correct responses is an _______________ test.
Better (the item rewritten): A test which may be scored by counting the correct responses is said to be ____________.
• consistent
• objective
• stable
• standardized
• valid

Examples of Items
Rule: Include one correct or most defensible answer.
Poor: The most serious aspect of the energy crisis is the
Better: According to the National Energy Council, the most serious aspect of the energy crisis is the
• possible lack of fuel for industry.
• possibility of widespread unemployment.
• threat to our environment from pollution.
• possible increase in inflation.
• cost of developing alternate sources of energy.

Examples of Items
Rule: The item should be revised so that the stem is one clear thought.
Poor: Multiple-choice items
• may have several correct answers.
• consists of a stem and some options.
• always measure factual details.
Better: The components of a multiple-choice item are:
• stem and several distractors.
• correct answer and several distractors.
• stem, a correct answer, and some distractors.
• stem and a correct answer.

Examples of Items
Rule: Options should be presented in a logical and systematic order.
What type of validity is determined by correlating scores on a test with scores on a criterion measured at a later date?
• Concurrent
• Construct
• Content
• Predictive
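Two common systematic orders are ascending numeric order and alphabetical order. This is a minimal sketch of that choice, assuming an option counts as numeric when it contains a number; the function name `order_options` is hypothetical, and the sample options come from this deck.

```python
import re

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def order_options(options: list[str]) -> list[str]:
    """Put numeric options in ascending order of their first number;
    otherwise sort alphabetically, ignoring case."""
    if all(NUMBER.search(opt) for opt in options):
        return sorted(options, key=lambda opt: float(NUMBER.search(opt).group()))
    return sorted(options, key=str.casefold)


print(order_options(["Predictive", "Content", "Concurrent", "Construct"]))
# ['Concurrent', 'Construct', 'Content', 'Predictive']
print(order_options(["Approximately 80", "Approximately 10", "Approximately 90", "Approximately 20"]))
# ['Approximately 10', 'Approximately 20', 'Approximately 80', 'Approximately 90']
```

A logical order is sometimes neither numeric nor alphabetical (for example, chronological), so a sorted list like this is a starting point for the item writer, not a replacement for judgment.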

Examples of Items
Rule: Options should be grammatically parallel and consistent with the stem.
Poor: A test which can be scored by a clerk untrained in the content area of the test is an
• diagnostic test.
• criterion-referenced tests.
• objective test.
• reliable test.
• subjective test.
(Only "objective test" agrees with "an", and "tests" does not agree with the singular stem.)
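The "an" clue in this item can be caught with a simple check. This is a minimal sketch of a crude heuristic, assuming that the option's first letter decides whether it takes "a" or "an"; words such as "hour" or "unit" would fool it, and the function name is hypothetical.

```python
VOWEL_LETTERS = set("aeiou")


def article_mismatches(stem: str, options: list[str]) -> list[str]:
    """Options that do not agree with a trailing 'a' or 'an' in the stem.

    Crude heuristic: it looks only at the option's first letter.
    """
    words = stem.lower().split()
    if not words or words[-1] not in ("a", "an"):
        return []  # stem does not end in an article; nothing to check
    wants_vowel = words[-1] == "an"
    return [opt for opt in options if (opt.strip()[:1].lower() in VOWEL_LETTERS) != wants_vowel]


stem = "A test which can be scored by a clerk untrained in the content area of the test is an"
options = ["diagnostic test.", "criterion-referenced tests.", "objective test.", "reliable test.", "subjective test."]
print(article_mismatches(stem, options))
# ['diagnostic test.', 'criterion-referenced tests.', 'reliable test.', 'subjective test.']
```

Every option except the keyed answer is flagged, which is exactly the clue a test-wise student would use.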

Examples of Items
Rule: Options should be mutually exclusive.
What should be the index of difficulty for an effective mastery-model test item?
Poor:
• Less than 10
• Less than 20
• More than 80
• More than 90
Better:
• Approximately 10
• Approximately 20
• Approximately 80
• Approximately 90
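Overlap between numeric options can be checked by treating each option as a range. This is a minimal sketch, assuming each option is written as an open range (low, high) so that ranges which merely touch at an endpoint do not count as overlapping; the function name and the "fixed" option set are hypothetical, while the "poor" set is the one above.

```python
from itertools import combinations
from math import inf


def overlapping_options(options: dict[str, tuple[float, float]]) -> list[tuple[str, str]]:
    """Return pairs of options whose open ranges (low, high) overlap."""
    return [
        (a, b)
        for (a, (lo_a, hi_a)), (b, (lo_b, hi_b)) in combinations(options.items(), 2)
        if lo_a < hi_b and lo_b < hi_a
    ]


poor = {
    "Less than 10": (-inf, 10),
    "Less than 20": (-inf, 20),
    "More than 80": (80, inf),
    "More than 90": (90, inf),
}
print(overlapping_options(poor))
# [('Less than 10', 'Less than 20'), ('More than 80', 'More than 90')]

fixed = {"Less than 20": (-inf, 20), "20 to 50": (20, 50), "More than 50": (50, inf)}  # hypothetical
print(overlapping_options(fixed))  # []
```

The same check exposes the NFL scoring item on the Effective Item Checklist slide, where <10 overlaps <20 and >40 overlaps >50.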

Examples of Items
Rule: Ensure that correct responses are not consistently shorter or longer than the distractors.
A random sample is one in which
Poor:
• subjects are selected by levels.
• each subject has an equal probability of being chosen for the sample.
• every nth subject is chosen.
• groups are the unit of analysis.
Better:
• subjects are selected by levels in proportion to the number at each level in the population.
• each subject has an equal probability of being chosen.
• every nth subject is chosen from a list.
• groups, rather than individuals, are the unit of analysis.
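The length cue in the poor version can be detected by comparing word counts. This is a minimal sketch, assuming word count is a fair proxy for length; the function name `length_cue` is hypothetical, and the options are the two versions above with option b as the key.

```python
from typing import Optional


def length_cue(options: dict[str, str], key: str) -> Optional[str]:
    """Report whether the keyed answer is the longest or shortest option, by word count."""
    lengths = {label: len(text.split()) for label, text in options.items()}
    others = [n for label, n in lengths.items() if label != key]
    if lengths[key] > max(others):
        return "longest"
    if lengths[key] < min(others):
        return "shortest"
    return None


poor = {
    "a": "subjects are selected by levels.",
    "b": "each subject has an equal probability of being chosen for the sample.",
    "c": "every nth subject is chosen.",
    "d": "groups are the unit of analysis.",
}
better = {
    "a": "subjects are selected by levels in proportion to the number at each level in the population.",
    "b": "each subject has an equal probability of being chosen.",
    "c": "every nth subject is chosen from a list.",
    "d": "groups, rather than individuals, are the unit of analysis.",
}
print(length_cue(poor, "b"))    # longest
print(length_cue(better, "b"))  # None
```

Because the rule warns against a *consistent* pattern, the more useful signal is how often `length_cue` returns "longest" (or "shortest") across all items on a test, not its result for any single item.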

Examples of Items
Rule: Use negatively stated items infrequently.
Poor: Which of the following is NOT a method of determining test reliability?
• Coefficient of equivalence
• Coefficient of stability
• K-R #20
• Split-halves procedure
• Test-criterion intercorrelation
Better: Which of the following is a method of determining the validity of a test?
• Coefficient of equivalence
• Coefficient of stability
• K-R #20
• Split-halves procedure
• Test-criterion correlation

Webb’s 3 Tier Cognitive Complexity

Webb’s 3 Tier Cognitive Complexity
Low Cognitive Complexity:
• One-step problems or basic facts
• Recall and basic comprehension: identify, label, define

Webb’s 3 Tier Cognitive Complexity
Moderate Cognitive Complexity:
• Integrate and analyze
• Classify, analyze, explain, synthesize, implement

Webb’s 3 Tier Cognitive Complexity
High Cognitive Complexity:
• Analyze and represent knowledge in new and innovative ways
• Create, represent, rearticulate, argue, extend content knowledge

Webb’s 3 Tier Cognitive Complexity
• The Depth of Knowledge (DOK) level should reflect the level of work most commonly required to perform the task
• The DOK level should reflect the complexity of the cognitive processes
• The DOK level describes the kind of thinking a task requires, not whether the task is “difficult”
• If there is doubt between two levels, select the higher level
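The tie-breaking rule can be encoded directly, and the resolved levels can then be tallied against the test design, which determines which benchmarks are evaluated at which complexity level. This is a minimal sketch; the `Complexity` enum, the function name, and the reviewer judgments are hypothetical, and the code only applies the "select the higher level" rule. It does not decide a level from verbs, since the DOK level describes the kind of thinking a task requires.

```python
from collections import Counter
from enum import IntEnum


class Complexity(IntEnum):
    """The three cognitive complexity levels from the slides, ordered low to high."""
    LOW = 1
    MODERATE = 2
    HIGH = 3


def resolve_level(candidates: list[Complexity]) -> Complexity:
    """Apply the rule 'if there is doubt between two levels, select the higher level'."""
    if not candidates:
        raise ValueError("at least one candidate level is required")
    return max(candidates)


# Hypothetical reviewer judgments: the level(s) each item was judged to fit.
judgments = {
    "item1": [Complexity.LOW],
    "item2": [Complexity.LOW, Complexity.MODERATE],
    "item3": [Complexity.MODERATE, Complexity.HIGH],
}
resolved = {item: resolve_level(levels).name for item, levels in judgments.items()}
print(resolved)  # {'item1': 'LOW', 'item2': 'MODERATE', 'item3': 'HIGH'}
print(Counter(resolved.values()))  # tally to compare against the test design
```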
