
Presentation Transcript


  1. Opportunities and Challenges for Developing and Evaluating Diagnostic Assessments in STEM Education: A Modern Psychometric Perspective – André A. Rupp, EDMS Department, University of Maryland

  2. Toward a Definition of “Diagnostic Assessment Systems”

  3. Proposed Panel Definition: The term “diagnostic” comes from a combination of dia, to split apart, and gnosis, knowledge or learning. We use “diagnostic assessment (system)” to refer to assessment processes based on an explicit cognitive model, itself supported by empirical study, of proficient reasoning in a particular domain. The cognitive model must support delineation of students’ and/or teachers’ strengths and weaknesses that can be traced as they move from less to more proficient reasoning in the domain. The principled assessment design process should specify how observed behaviors are used to make inferences about what students or teachers know as they progress. We believe that diagnostic assessment has the potential to inform and assess the outcomes of instruction.

  4. Conceptualization of Problem Space from Stevens, Beal, & Sprang (2009)

  5. Toward an Understanding of Frameworks & Models

  6. The Evidence-centered Design Framework adapted from Mislevy, Steinberg, Almond, & Lukas (2006)

  7. Frameworks vs. Models: A “principled assessment design framework” for diagnostic assessment such as evidence-centered design is NOT a “model”. It does NOT prescribe a particular statistical modeling approach. A “statistical / psychometric model” is a mathematical tool that plays a supporting role in generating evidence-based narratives about students’ and/or teachers’ strengths and weaknesses. Its parameters do NOT have inherent meanings. A “cognitive model” for diagnostic assessment is a theory- and data-driven description of how emergent understandings and misconceptions in a domain develop and how these can be traced back to unobservable cognitive underpinnings. It does NOT prescribe a singular assessment approach.

  8. Evidence-based Reasoning for “Traditional” Assessments

  9. Traditional Construct Operationalization. [Figure: in the theoretical realm, each construct is operationalized in the empirical realm through items I1, I2, …, Ik whose responses are aggregated into a test score; the same structure is repeated for each construct.]
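As a minimal formalization of this traditional operationalization (a sketch assuming, for illustration, a linear factor-analytic measurement model and a simple sum score; neither equation appears in the original slides), each item in the empirical realm reflects a single construct in the theoretical realm, and the item scores are aggregated into the test score:

```latex
% Item j of person p reflects a single latent construct \theta_p (theoretical realm)
X_{pj} = \lambda_j \theta_p + \varepsilon_{pj}, \qquad j = 1, \dots, k
% The observed test score aggregates the k item scores (empirical realm)
T_p = \sum_{j=1}^{k} X_{pj}
```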

  10. Feedback Utility (Part I – Scoring Card)

  11. Feedback Utility (Part II – Simple Progress Mapping) [Figure: progress map showing proficiency levels, e.g., Level 3 and Level 4]

  12. Evidence-based Reasoning for “Modern” Assessments

  13. Complex Assessment Tasks for Diagnosis (Part I) from Seeratan & Mislevy (2008)

  14. Complex Assessment Tasks for Diagnosis (Example II) from Behrens et al. (2009)

  15. Evidence Identification, Aggregation, & Synthesis from Stevens, Beal, & Sprang (2009)

  16. Proficiency Pathways from Stevens, Beal, & Sprang (2009)

  17. Interventional Pathways from Stevens, Beal, & Sprang (2009)

  18. Selected Statistical Tools for Evidence-based Reasoning

  19. Selected Modeling Approaches for Diagnostic Assessments
Approaches Resulting in Continuous Proficiency Scales:
1. Unidimensional explanatory IRT or FA models (e.g., de Boeck & Wilson, 2004)
2. Multidimensional CTT sumscores (e.g., Henson, Templin, & Douglas, 2007)
3. Multidimensional explanatory IRT or FA models (e.g., Reckase, 2009)
4. Structural equation models (e.g., Kline, 2010)
Approaches Resulting in Classifications of Respondents Based on Discrete Scales:
1. Bayesian inference networks (e.g., Almond, Williamson, Mislevy, & Yan, in press)
2. Parametric diagnostic classification models (e.g., Rupp, Templin, & Henson, 2010)
3. Non-/semi-parametric classification approaches (e.g., Tatsuoka, 2009)
4. Adapted clustering algorithms (e.g., Nugent, Dean, & Ayers, 2010)
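To make the classification-based approaches more concrete, the following is a minimal sketch (not taken from the presentation) of the item response function of the DINA model, one of the parametric diagnostic classification models treated in Rupp, Templin, and Henson (2010); the Q-matrix, slip, and guessing values below are invented purely for illustration.

```python
import itertools
import numpy as np

def dina_irf(alpha, q_row, slip, guess):
    """DINA item response function: P(X = 1 | attribute profile alpha).

    A respondent 'masters' the item only if she possesses every attribute
    the Q-matrix row requires; masters answer correctly with probability
    1 - slip, non-masters with probability guess.
    """
    eta = int(np.all(alpha >= q_row))   # 1 if all required attributes are mastered
    return (1.0 - slip) ** eta * guess ** (1 - eta)

# Hypothetical 3-attribute, 4-item example (all values are illustrative only)
Q = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0],
              [0, 1, 1]])
slip = np.array([0.10, 0.15, 0.20, 0.10])
guess = np.array([0.20, 0.25, 0.15, 0.20])

# Enumerate all 2^3 attribute profiles and print P(correct) for item 3
for alpha in itertools.product([0, 1], repeat=3):
    p = dina_irf(np.array(alpha), Q[2], slip[2], guess[2])
    print(alpha, round(p, 3))
```

Note how the Q-matrix, not the slip and guessing parameters themselves, carries the cognitive-model interpretation: an item discriminates only between respondents who possess or lack all of its required attributes, which echoes the earlier point that statistical parameters have no inherent meaning apart from the cognitive model.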

  20. Psychometric Tools for Diagnostic Assessments
New frontiers of educational measurement:
1. Educational data mining for simulation-/games-based assessment (e.g., Rupp et al., 2010; Soller & Stevens, 2007; West et al., 2009)
2. Diagnostic multiple-choice / selected-response items (e.g., Briggs et al., 2006; de la Torre, 2009)
3. Computerized diagnostic adaptive assessment (e.g., Cheng, 2009; McGlohen & Chang, 2008)
Useful ideas from large-scale assessment:
1. Modeling dependencies in nested response data (e.g., Jiao, von Davier, & Wang, 2010; Wainer, Bradlow, & Wang, 2007)
2. Item families / task variants & automatic test / form assembly (e.g., Embretson & Daniel, 2008; Geerlings, Glas, & van der Linden, in press)
3. Survey designs using multiple test forms / booklets (e.g., Frey, Hartig, & Rupp, 2009; Rutkowski, Gonzalez, Joncas, & von Davier, 2010)
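As a hedged illustration of the kind of machinery behind Bayesian inference networks and computerized diagnostic adaptive assessment (a generic sketch, not code from any of the cited systems; the item bank and response pattern are invented), the example below updates a posterior distribution over attribute profiles after each observed response and selects the next item that minimizes the expected entropy of that posterior.

```python
import itertools
import numpy as np

# Hypothetical item bank: P(correct | profile) for 4 items and all 2^2
# attribute profiles (rows: items, columns: profiles); values are invented.
profiles = list(itertools.product([0, 1], repeat=2))   # (0,0), (0,1), (1,0), (1,1)
p_correct = np.array([[0.20, 0.25, 0.85, 0.90],        # item requiring attribute 1
                      [0.20, 0.85, 0.25, 0.90],        # item requiring attribute 2
                      [0.15, 0.20, 0.20, 0.90],        # item requiring both
                      [0.25, 0.80, 0.80, 0.95]])       # item sensitive to either

def update(prior, item, response):
    """Bayes rule: posterior over attribute profiles after one response."""
    like = p_correct[item] if response == 1 else 1.0 - p_correct[item]
    post = prior * like
    return post / post.sum()

def entropy(dist):
    return -np.sum(dist * np.log(dist + 1e-12))

def next_item(prior, remaining):
    """Pick the remaining item with the smallest expected posterior entropy."""
    def expected_entropy(item):
        p1 = float(prior @ p_correct[item])             # P(correct) under current posterior
        return p1 * entropy(update(prior, item, 1)) + (1 - p1) * entropy(update(prior, item, 0))
    return min(remaining, key=expected_entropy)

# Simulated mini-administration with a fixed response pattern (1 = correct)
posterior = np.full(len(profiles), 0.25)                # uniform prior over profiles
remaining = {0, 1, 2, 3}
for response in (1, 0):
    item = next_item(posterior, remaining)
    remaining.remove(item)
    posterior = update(posterior, item, response)
    print(f"administered item {item}, response {response}, posterior {np.round(posterior, 3)}")
print("MAP profile:", profiles[int(np.argmax(posterior))])
```

Expected-entropy selection is only one of several possible item-selection heuristics; indices developed specifically for cognitive diagnosis are discussed in the sources cited above (e.g., Cheng, 2009).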

  21. Opportunities and Challenges for Developing and Evaluating Diagnostic Assessments in STEM Education: A Modern Psychometric Perspective – André A. Rupp, EDMS Department, University of Maryland, 1230-A Benjamin Building, College Park, MD 20742. Phone: (301) 405-3623. E-mail: ruppandr@umd.edu

  22. References (Part I)
Almond, R. G., Williamson, D. M., Mislevy, R. J., & Yan, D. (in press). Bayes nets in educational assessment. New York: Springer.
Beaton, A. E., & Allen, N. L. (1992). Interpreting scales through scale anchoring. Journal of Educational Statistics, 17, 191-204.
Borsboom, D., & Mellenbergh, G. J. (2007). Test validity in cognitive assessment. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 85–118). Cambridge, UK: Cambridge University Press.
Briggs, D. C., Alonzo, A. C., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11, 33-63.
Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74, 619-632.
de Boeck, P., & Wilson, M. (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer.
de la Torre, J. (2009). A cognitive diagnosis model for cognitively based multiple-choice options. Applied Psychological Measurement, 33, 163-183.
Embretson, S. E., & Daniel, R. C. (2008). Understanding and quantifying cognitive complexity level in mathematical problem-solving items. Psychology Science Quarterly, 50, 328-344.
Frey, A., Hartig, J., & Rupp, A. A. (2009). An NCME instructional module on booklet designs in large-scale assessments of student achievement. Educational Measurement: Issues and Practice, 28(3), 39-53.
Geerlings, H., Glas, C. A. W., & van der Linden, W. (in press). Modeling rule-based item generation. Psychometrika.

  23. References (Part II)
Gomez, P. G., Noah, A., Schedl, M., Wright, C., & Yolkut, A. (2007). Proficiency descriptors based on a scale-anchoring study of the new TOEFL iBT reading test. Language Testing, 24, 417-444.
Haberman, S., & Sinharay, S. (2010). Reporting of subscores using multidimensional item response theory. Psychometrika, 75, 209-227.
Haberman, S., Sinharay, S., & Puhan, G. (2009). Reporting subscores for institutions. British Journal of Mathematical and Statistical Psychology, 62, 79-95.
Jiao, H., von Davier, M., & Wang, S. (2010, April). Polytomous mixture Rasch testlet model. Paper presented at the annual meeting of the National Council on Measurement in Education, Denver, CO.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Portsmouth, NH: Greenwood.
Kline, R. (2010). Principles and practice of structural equation modeling (2nd ed.). New York: Guilford Press.
Leighton, J., & Gierl, M. (Eds.). (2007). Cognitive diagnostic assessment for education: Theory and applications. Cambridge, UK: Cambridge University Press.
McGlohen, M., & Chang, H.-H. (2008). Combining computer adaptive testing technology with cognitively diagnostic assessment. Behavior Research Methods, 40, 808-821.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
Mislevy, R. J., Steinberg, L. S., Almond, R. G., & Lukas, J. F. (2006). Concepts, terminology, and basic models of evidence-centered design. In D. M. Williamson, I. I. Bejar, & R. J. Mislevy (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 15–48). Mahwah, NJ: Erlbaum.

  24. References (Part III)
Nugent, R., Dean, N., & Ayers, B. (2010, July). Skill set profile clustering: The empty K-means algorithm with automatic specification of starting cluster centers. Paper presented at the International Educational Data Mining Conference, Pittsburgh, PA.
Reckase, M. (2009). Multidimensional item response theory. New York: Springer.
Rupp, A. A., Gushta, M., Mislevy, R. J., & Shaffer, D. W. (2010). Evidence-centered design of epistemic games: Measurement principles for complex learning environments. Journal of Technology, Learning, & Assessment, 8(4). Available online at http://escholarship.bc.edu/jtla/vol8/4/
Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York: Guilford Press.
Rutkowski, L., Gonzalez, E., Joncas, M., & von Davier, M. (2010). International large-scale assessment data: Issues in secondary analysis and reporting. Educational Researcher, 39, 142-151.
Stevens, R., Beal, C., & Sprang, M. (2009, August). Developing versatile automated assessments of scientific problem-solving. Paper presented at the NSF conference on games- and simulation-based assessment, Washington, DC.
Tatsuoka, K. K. (2009). Cognitive assessment: An introduction to the rule-space method. Florence, KY: Routledge.
Templin, J., & Henson, R. (2009, April). Practical issues in using diagnostic estimates: Measuring the reliability and validity of diagnostic estimates. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. New York: Cambridge University Press.
West, P., Rutstein, D. W., Mislevy, R. J., Liu, J., Levy, R., DiCerbo, K. E., et al. (2009, June). A Bayes net approach to modeling learning progressions and task performances. Paper presented at the Learning Progressions in Science conference, Iowa City, IA.
