This presentation discusses the think-aloud method as a tool for developing assessments of modified achievement standards. It covers background on cognitive labs, how to conduct them, how to adapt them for the target student population, and general guiding principles.
Thinking About the Think-Aloud Method to Guide Development of Assessments of Modified Achievement Standards
Presentation at the OSEP GSEG Project Directors Conference
Steve Ferrara
January 16, 2008
Today • No one right way to do cognitive labs (think-alouds) • Background on cognitive labs • Overview and selected issues in conducting cognitive labs • Will not cover data analysis, synthesis, interpretation, and use • Adapting cognitive labs for our target students and for assessments of modified achievement standards • I don’t know how you plan to use cognitive labs in the process of developing and validating items, so I have planned some general comments
Two general principles • As with most research methodologies, there is no one right way to do cognitive labs • Think-alouds used in reading comprehension research, survey item development, achievement item development and validation, human factors research and evaluation (e.g., usability studies),… • There are principles and practices that enable us to produce verbal data that can be interpreted reliably and validly and that is practically useful
So what are cognitive labs? • (a) Prompting of a specified population of respondents to (b) think out loud to (c) illuminate how they think while they (d) perform specified tasks • Simply put, we ask respondents to “think out loud” while they perform a task • Think-alouds can be used for physical or cognitive tasks
Example: think-aloud prompt • I want you to say out loud anything that you are thinking while you are reading and trying to answer these science questions. • Some things you might be feeling include... (hungry, tired, bored; this is interesting, hard). • You might think things like... (this is a biology question; I don’t understand the question; I’m going to reread this part; we did this in class last year). • Say anything you are thinking to yourself… What I’m most interested in is the stuff you are doing in your head – while you are answering the question – that helps you to understand the question and figure out the answer. (From the ICV project; see Ferrara, Duncan, et al., 2004)
Thinking aloud • In verbal reporting, respondents: • Bring information into attention • (When necessary) convert the information into verbalizable code • Vocalize their thinking (Ericsson & Simon, 1993, p. 16) • Crucial considerations: • Are respondents aware of the information they use during task completion? • Can they verbalize it?
Why do cognitive labs? • In general, to illuminate respondents’ cognitive processing while they perform a task • Exploratory goals • e.g., What reading comprehension strategies do students use when…? • Refinement goals • e.g., Improve clarity and fidelity of interpretation of survey items (Desimone & Le Floch, 2004) • Validation goals • e.g., Ensure that achievement items elicit intended knowledge, skills, and processes (Ferrara, Duncan, et al., 2004; Leighton, 2004)
Rough history • Early empiricists in psychology: introspection • Behaviorism in 1930s: introspection fell into ill repute • Surveys and polls: error (e.g., the 1948 presidential election) and new interest in wording of questions and tasks • Great Society evaluations in 1960s: studies of behavior rather than opinion • Sudman and Bradburn in 1970s: largest effect on responses was tasks (e.g., wording), not interviewers or respondents • Emergence of cognitive psychological research in 1970s and beyond: implications for survey development • 1990s: NCES surveys and Voluntary National Test reading and mathematics items in 1997-2000, reading research, SEPT, etc. • 2000s: Expanding application to educational achievement test items
Overview and selected issues in conducting cognitive labs • Retrospective and concurrent • Open-ended and moderately or highly focused (specificity of tasks, uses of probes) • Respondent sampling and generalizability • Task sampling and generalizability • Task difficulty • Respondent willingness and effectiveness in thinking aloud and verbalizing
Concurrent and retrospective think-alouds • Describing thinking during task completion or after task completion • Trade-offs • Thinking aloud may alter task performance • Recall and reconstruction may differ from the thinking that actually occurred
Open-ended and moderately or highly focused • Open-ended and exploratory • “Please think out loud while you respond to the following items.” • Moderately focused • “…Remember to tell me what you think about when you respond. Tell me about how you understand the item, how you select a response, and how you know which response is correct.” • Probes • “How did you decide to select that response? What information in the item and from school did you use?”
Grade 6 science item [the item itself appears only as an image on the original slide; the verbal reports on the next slide refer to it]
Illustrative verbal reports • Thinking aloud • “A. The candy will reach Bill. No that can’t be right…B. The candy will go behind Bill…The possibility is the candy wouldn’t reach Bill and then it probably will drop.” • Response to a retrospective probe (“How did you get that answer?”) • “Because I looked and if it’s, well, how it’s going around and he’s going forward so…”
Respondent sampling and generalizability • Number of respondents • Rule of thumb: 9 respondents • Internal studies in usability testing suggest that little new information is gained after ~9 respondents (see the sketch after this slide) • Representativeness of the population of inference • 9 for each key subgroup in the population or 9 total? • Typical subpopulations (e.g., racial-ethnic, gender) or those more likely to be relevant (e.g., instructional program, gender)? • Often, we don’t know enough about which subpopulations may process differently • Cost is also a consideration
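To make that rule of thumb concrete, the sketch below uses the diminishing-returns model common in usability research, where the share of distinct issues surfaced by n respondents is 1 − (1 − L)^n. The model and the commonly quoted average discovery rate L ≈ 0.31 come from Nielsen and Landauer's usability work, not from this presentation, so treat them as one assumed way to rationalize the ~9-respondent rule.

```python
# A minimal sketch, assuming the Nielsen & Landauer discovery model from
# usability research: share of issues found = 1 - (1 - L)**n, with
# L = 0.31 as the (assumed) per-respondent discovery rate.

def share_of_issues_found(n_respondents: int, discovery_rate: float = 0.31) -> float:
    """Expected proportion of distinct issues surfaced by n respondents."""
    return 1.0 - (1.0 - discovery_rate) ** n_respondents

for n in range(1, 13):
    print(f"{n:2d} respondents -> {share_of_issues_found(n):5.1%} of issues")
# By n = 9 the expected share is about 96%, consistent with the
# "little new information after ~9 respondents" observation.
```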
Task sampling and generalizability • All items, a random sample of items, or exemplars from item subsets (e.g., item families)? • We may not know enough about the tasks (e.g., item families) to sample effectively (and that’s why we’re conducting cog labs) • Numbers of respondents, time per respondent, and cost are considerations
Task difficulty • Rule of thumb: • Select tasks that are moderately difficult for respondents • Alternatively, select respondents who are well matched to the tasks • Consideration: Select tasks in a range of difficulties • Some respondents can verbalize about easy and routinized tasks • Some respondents can verbalize about their thinking, even for tasks that are too difficult or that they don’t know about
Respondent willingness and effectiveness in thinking aloud and verbalizing • Some respondents are reticent • Think of middle-schoolers • Think of lower achievers (e.g., Ferrara, Albert, et al., 1996) • Some respondents are unaware of their thinking (i.e., what information they heed) • Some respondents may be willing and aware, but do not verbalize their processing in illuminating or useful ways…
Target students for assessments of modified achievement standards • Pursuing grade-level content standards but may not progress at the same rate as their peers • May not be comfortable verbalizing what they know and don’t know, may not be particularly metacognitive (i.e., aware), may not verbalize effectively
Example from a GSEG project (implications to follow) • Ohio, Minnesota, Oregon, AIR • Persistently low-performing SWDs (PLP/SWD) • Borrowed the PLP idea from earlier Georgia work • Students in the lowest achievement level for two or three years (depending on the student cohort)
Example (cont.) • Reading items that function adequately for PLP/SWDs—psychometric definition • p values between .40 and .60 • Point-biserial and polyserial correlations ≥ .20 • And items that did not function well • Try to determine what distinguishes adequately functioning items and how to make other items more psychometrically sound for PLP/SWDs (a sketch of this screen follows this slide) • (One of several research activities to identify item and test modifications to provide valid and accessible items for PLP/SWDs)
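A minimal sketch of how that psychometric screen might be applied to a matrix of scored responses; the simulated data and helper names are illustrative, the thresholds are the ones stated above, and the project's actual analysis (including polyserial correlations for polytomous items) is not shown.

```python
import numpy as np
from scipy.stats import pointbiserialr

# A minimal sketch, not the project's code: flag dichotomous items as
# adequately functioning when the p value (proportion correct) falls in
# [.40, .60] and the corrected item-total point-biserial is >= .20.
# Polytomous items would need a polyserial estimator instead.
rng = np.random.default_rng(0)
responses = (rng.random((200, 10)) > 0.5).astype(int)  # 200 students x 10 items, scored 0/1

for i in range(responses.shape[1]):
    item = responses[:, i]
    rest = responses.sum(axis=1) - item      # total score with this item removed
    p = item.mean()                          # classical difficulty (p value)
    r_pb, _ = pointbiserialr(item, rest)     # item-total discrimination
    verdict = "adequate" if 0.40 <= p <= 0.60 and r_pb >= 0.20 else "review"
    print(f"item {i + 1}: p = {p:.2f}, r_pb = {r_pb:.2f} -> {verdict}")
```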
Example (cont.) • Initial findings for one cohort • Adequately functioning items • Little or no inference required by comprehension items • Vocabulary items: Definitions in the text • You can put your finger on the answer in the passage • Other items • Inference and synthesis required
Further… • The project will define eligibility, in part by identifying students “at the bottom of” grade-level assessments and “at the top of” alternate assessments • Think about trying to get verbal data from students who currently participate in alternate assessments
Implications for cog labs with target students • They probably are fairly concrete thinkers • They are not likely to be highly verbal (in the colloquial sense) • They may be reticent • They may not be highly metacognitive (i.e., aware) • How much useful verbal data might we expect to get from students who are likely to be eligible for assessments of modified achievement standards? • Some encouraging results regarding think-alouds with students with learning disabilities (Johnstone, Liu, Altman, & Thurlow, 2007)
This is not an argument against using cog labs in this situation • But choose the items (and other assessment tasks), students, and think-aloud prompts and probes with the target students clearly in mind • Also, maybe consider a new idea: group cognitive labs • As far as I know, this idea has not been proposed elsewhere • Think of it as focus groups where the focus is cognitive processing while responding to test items
Group cognitive labs idea • 4-6 respondents per group • Probably homogeneous in terms of achievement, verbalization, etc. • OTL (opportunity to learn) is an important consideration • Get diversity and generalizability across groups • Could matrix sample items so that respondents do individual think-alouds for 2-3 items (see the sketch after this slide)
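One way that matrix sampling could work in practice, sketched below: deal the item pool round-robin across the group so each respondent thinks aloud on 2-3 items while the group covers every item. The pool size, group size, and helper function are hypothetical, not part of the proposal.

```python
from itertools import cycle

def assign_items(item_ids, respondent_ids):
    """Deal items round-robin so coverage is balanced across the group."""
    assignments = {r: [] for r in respondent_ids}
    for item, respondent in zip(item_ids, cycle(respondent_ids)):
        assignments[respondent].append(item)
    return assignments

# Hypothetical example: a 12-item pool and a 4-person group yield
# 3 think-aloud items per respondent with full coverage of the pool.
groups = assign_items([f"item{i:02d}" for i in range(1, 13)], ["A", "B", "C", "D"])
for respondent, items in groups.items():
    print(respondent, items)
# A ['item01', 'item05', 'item09']
# B ['item02', 'item06', 'item10'] ...
```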
Group cognitive labs idea (cont.) • Retrospective reports • Training and practice: • Thinking aloud • Reporting similarities and differences in thinking of other respondents • Avoiding being unduly influenced by others’ thinking • Round-robin think-alouds • Respondent A thinks aloud for item 1 • Other respondents report similarities and differences • Etc.
Group cognitive labs idea (cont.) • Possible advantages • Cost-efficiency • Broader sampling • Possible improvement in quantity and quality of verbal reports • Possible drawbacks • Increase in reticence • Respondents influence each other, obscure individual processing, and pollute verbal reports
Good luck! Steve Ferrara, CTB McGraw-Hill, sferrara1951@gmail.com
References
Desimone, L. M., & Le Floch, K. C. (2004). Are we asking the right questions? Using cognitive interviews to improve surveys in education research. Educational Evaluation and Policy Analysis, 26(1), 1-22.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (rev. ed.). Cambridge, MA: The MIT Press.
Ferrara, S., Albert, F., Gilmartin, D., Knott, T., Michaels, H., Pollack, J., Schuder, T., Vaeth, R., & Wise, S. (1996, April). A qualitative study of the information examinees consider during item review on a computer-adaptive test. In L. Wolf (Moderator), Item review in computerized adaptive testing. Symposium conducted at the annual meeting of the National Council on Measurement in Education, New York.
Ferrara, S., Duncan, T. G., Freed, R., Velez-Paschke, A., McGivern, J., Mushlin, S., Mattessich, A., Rogers, A., & Westphalen, K. (2004). Examining test score validity by examining item construct validity: Preliminary analysis of evidence of the alignment of targeted and observed content, skills, and cognitive processes in a middle school science assessment. Paper presented at the annual meeting of the American Educational Research Association, San Diego.
Johnstone, C., Liu, K., Altman, J., & Thurlow, M. (2007). Student think aloud reflections on comprehensible and readable assessment items: Perspectives on what does and does not make an item readable (Technical Report 48). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.
Leighton, J. P. (2004). Avoiding misconception, misuse, and missed opportunities: The collection of verbal reports in educational achievement testing. Educational Measurement: Issues and Practice, 23(4), 6-15.