Expert Guidance on Psychological and Educational Test Construction

Advising on the Construction of Psychological and Educational Tests Gideon J. Mellenbergh University of Amsterdam, The Netherlands

Paper presented at the Colloquium Advising on Research Methods, Royal Netherlands Academy of Arts and Sciences (KNAW), Amsterdam, The Netherlands, March 28-29, 2007

Contents
• 1. Definitions
• 2. General discussion topics
• 3. Item writing
• 3.1 Maximum performance items
• 3.2 Typical performance items
• 3.3 Pilot studies on item quality
• 4. First draft
• 4.1 Assembly of maximum performance tests
• 4.2 Assembly of typical performance tests
• 5. Administration modes
• 6. Try-out

Definitions A psychological or educational test is an instrument for the measurement of a test taker’s (maximum or typical) performance, which is assumed to reflect a latent variable, under standardized conditions.

Reflective: the test reflects a latent variable (construct). This is an assumption that must be tested.
Maximum performance: abilities or aptitudes, skills (e.g., intelligence, arithmetic)
Typical performance (questionnaire): attitudes, interests, personality characteristics

Remarks slide 5 The distinction between maximum and typical performance was made by Cronbach (1960). The latent variable can be a continuous or a discrete (e.g., ordinal-polytomous, dichotomous) variable. Standardized conditions are needed to make fair comparisons between and within test takers.

Definitions 2. An item is the smallest possible building block of a test.

2. General discussion topics 1. The construct (latent variable) of interest. Which ability, attitude, interest or personality characteristic must be measured?

2. General discussion topics 2. The target population (of interest) and the frame population (that can be tested).

2. General discussion topics 3. Existing tests for the construct of interest (literature, documentations, psychometric qualities etc.).

Remark slide 10 Tests exist for many different latent variables, but often these tests do not satisfy the client’s research needs.

2. General discussion topics 4. Objectives: research, diagnosis, decision making etc.

Construction strategies Preferences for theory based strategies: - Construct Method - Facet Design Method

Remark slide 13 Empirical research indicates that these two methods yield better tests than other test construction methods (Oosterveld, 1996).

Construct Method (Jackson, 1971) Item writing is based on a theory of the content of the latent variable that will be measured. For example, Ettema’s (2005) questionnaire for the assessment of dementia patients’ quality of life.

Construct Method (Jackson, 1971) Definition based on literature review: Dementia specific quality of life is the multidimensional evaluation of the person-environment system of the individual, in terms of adaptation to the perceived consequences of the dementia.

Construct Method (Jackson, 1971) Theory Dröes’ adaptation-coping model. The model distinguishes seven adaptation dimensions for coping, for example, ‘Developing an adequate relationship with the staff’ Item example ‘Has conflicts with caretakers’

Facet Design Method (Guttman, 1965) Item writing is based on a conceptual analysis of the construct. A number of facets are distinguished, and each of the facets has a number of elements. For example, Stouthard’s (1993) questionnaire for the measurement of patients’ dental anxiety.

Facet Design Method (Guttman, 1965) Construct: dental anxiety
Facets:
(1) time before treatment (elements: chair, waiting room, on the way to, at home)
(2) aspects of dental treatment (elements: introductory, patient-dentist interaction, treatment)
(3) patient’s reactions (elements: emotional, physical, cognitive)

Facet Design Method (Guttman, 1965) Factorial combination of the facets 4 x 3 x 3 = 36 cells of the facet design Item example Waiting room/Treatment/Emotional ‘When I know the dentist is going to extract a tooth, I am already afraid in the waiting room’
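The factorial combination can be sketched in a few lines of Python. This is only an illustration of how the 36 cells arise from the facets and elements listed on the slide. The names `FACETS` and `facet_cells` are hypothetical and do not come from Stouthard's (1993) work.

```python
from itertools import product

# Facets and elements as listed on the slide; the dictionary keys are
# hypothetical short labels chosen for this sketch.
FACETS = {
    "time": ["chair", "waiting room", "on the way to", "at home"],
    "aspect": ["introductory", "patient-dentist interaction", "treatment"],
    "reaction": ["emotional", "physical", "cognitive"],
}


def facet_cells(facets):
    """Return every combination of one element per facet (the design cells)."""
    names = list(facets)
    combos = product(*(facets[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


cells = facet_cells(FACETS)
print(len(cells))  # 4 x 3 x 3 = 36

# The cell behind the item example on the slide.
example = {"time": "waiting room", "aspect": "treatment", "reaction": "emotional"}
print(example in cells)
```

An item writer could walk through such a list cell by cell to check that each cell of the design is covered by at least one item.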

3. Item writing • An item consists of • a task • a response mode

3. Item writing • Tasks • Maximum performance tests: problem • Typical performance tests: statement

3. Item writing • Response modes • Free response • Choice

3. Item writing Example Free response 8 x 14 = ...

3. Item writing Example Choice 8 x 14 = (1) 32 (2) 112 (3) 132

3.1 Maximum performance items Free-response - short-answer items 8 x 14 = ...

3.1 Maximum performance items Free-response - essay item ‘Give reasons for the outbreak of the French revolution’

3.1 Maximum performance items Free-response Responses to free-response items must be graded by judges (correct, partly correct, incorrect)

3.1 Maximum performance items Choice Conventional: multiple-choice items Preferred number of options: 3

Remark slide 29 More options reduce the probability of guessing the correct answer. However, item writers often have difficulty writing a fourth or fifth plausible option. Therefore it is recommended to write somewhat more three-choice items rather than fewer four- or five-choice items.
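The trade-off between the number of options and the number of items can be made concrete with a small calculation. The sketch below assumes blind guessing with equally attractive options, so each item is guessed correctly with probability 1/k. The test lengths (40 and 30 items) and the 60% pass mark are hypothetical values chosen only for illustration.

```python
from math import comb


def guess_probability(n_options):
    """Chance of picking the correct option by blind guessing."""
    return 1.0 / n_options


def prob_at_least(n_items, n_options, cutoff):
    """Binomial probability of at least `cutoff` correct answers by guessing."""
    p = guess_probability(n_options)
    return sum(
        comb(n_items, x) * p**x * (1.0 - p) ** (n_items - x)
        for x in range(cutoff, n_items + 1)
    )


# Hypothetical comparison: a longer three-choice test versus a shorter
# four-choice test, both with an assumed pass mark of 60% correct.
for n_items, n_options in [(40, 3), (30, 4)]:
    cutoff = (6 * n_items + 9) // 10  # smallest integer >= 0.6 * n_items
    chance_score = n_items * guess_probability(n_options)
    pass_prob = prob_at_least(n_items, n_options, cutoff)
    print(f"{n_items} items, {n_options} options: "
          f"expected chance score {chance_score:.1f}, "
          f"P(pass by guessing) = {pass_prob:.2e}")
```

Under these assumptions, a test taker who guesses blindly is very unlikely to reach the pass mark on either test, which is in line with the recommendation to prefer more items with three plausible options.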

3.1 Maximum performance items Choice Structure
Stem: 8 x 14 =
Distractor 1: 32
Correct option: 112
Distractor 2: 132

3.1 Maximum performance items Choice Three options are in alphabetical, logical, or numerical order Options are in vertical position

Item writing rules A large number of useful rules
- alphabetical, logical, or numerical order
- vertical option positions
- avoid tricks
- avoid window dressing
- three options of equal length
- avoid negatives
- distractors which are plausible for test takers who don’t know the correct answer
Etc.

Remark slide 33 An overview of item writing rules is given by Haladyna, Downing and Rodriguez (2002).

Clients are recommended to check their concept items against these rules.

Test takers’ responses are assessed on a response scale: dichotomous (correct/incorrect), ordinal-polytomous (e.g., correct/partly correct/incorrect), or bounded-continuous (e.g., number of seconds an examinee needs to give the correct answer to the multiplication ‘8 x 14’)

3.2 Typical performance items Structure Statement & Response scale

3.2 Typical performance items • Response scales • dichotomous • ordinal-polytomous • bounded-continuous

3.2 Typical performance items Classification
Statement: Response scale
Frequency: Number of frequency categories
All-or-None Endorsement: Two categories
Continuous Endorsement Intensity: Uninterrupted scale
Discrete Endorsement Intensity: More than two ordered categories

Example Frequency How frequently are you happy? (a) never (b) seldom (c) sometimes (d) often (e) usually (f) always

Example All-or-None Endorsement
Thurstone and Chave’s (1929) Attitude Toward the Church Questionnaire
I feel that church attendance is a fair index of the nation’s morality
(a) agree
(b) don’t agree

Example Continuous Endorsement Intensity Thurstone and Chave (1929)
Write an x somewhere on the line below to indicate where you think you belong
Strongly favorable to the church | Neutral | Strongly against the church

Example Discrete Endorsement Intensity
Likert’s (1932) Internationalism Attitude Questionnaire
Our country should never declare war again under any circumstances
(a) Strongly approve
(b) Approve
(c) Undecided
(d) Disapprove
(e) Strongly disapprove
Preferred number of options of Likert items: 4 to 7

Clients are recommended to make an informed choice between these different item types.

Item writing rules A large number of rules • Use positive statements and avoid direct negatives. A positive statement can be indicative (‘I am feeling great’) or contra-indicative (‘I am feeling blue’)

Item writing rules A large number of rules • If a statement consists of a condition (‘at noisy parties’) and a behavior part (‘I am feeling uneasy’) put the condition at the beginning, for example: • ‘At noisy parties, I am feeling uneasy’ Etc.

Remark slide 46 This rule may go against correct use of style. Test translators tend to reverse the condition and behavior parts in their item translations.

Clients are recommended to use these rules.
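The distinction between indicative and contra-indicative statements matters when responses are scored. A common convention, which the slides do not spell out and is therefore an assumption here, is to reverse the scores of contra-indicative items so that a higher score always points in the same direction. The sketch below assumes a five-category agreement scale coded 1 (strongly disagree) to 5 (strongly agree). The function names are hypothetical.

```python
def score_item(response, n_categories=5, contra_indicative=False):
    """Score one response coded 1..n_categories.

    Contra-indicative statements are reverse-scored (assumed convention),
    so that a high score always indicates more of the construct.
    """
    if not 1 <= response <= n_categories:
        raise ValueError(f"response must be between 1 and {n_categories}")
    if contra_indicative:
        return n_categories + 1 - response
    return response


def total_score(responses, contra_flags, n_categories=5):
    """Sum the item scores after reverse-scoring contra-indicative items."""
    if len(responses) != len(contra_flags):
        raise ValueError("responses and contra_flags must have equal length")
    return sum(
        score_item(r, n_categories, flag)
        for r, flag in zip(responses, contra_flags)
    )


# Two statements from the slide: 'I am feeling great' (indicative) and
# 'I am feeling blue' (contra-indicative). The responses are hypothetical.
responses = [5, 1]            # strongly agrees with 'great', strongly disagrees with 'blue'
contra_flags = [False, True]
print(total_score(responses, contra_flags))  # 5 + (5 + 1 - 1) = 10
```

Keeping the direction of each statement in a separate key, as `contra_flags` does here, makes it easy to check the key against the item texts before scoring.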

3.3 Pilot studies on item quality Clients are recommended to do the following pilot studies on concept items: 1. Experts: A small group of experts (in both content and item writing) discuss the concept items

Remark slide 49 The group needs to consist of (1) content, and (2) item writing experts.
