Investigating the Statistical and Cognitive Dimensions in Large-Scale Science Assessments CESC-SSHRC Symposium 2005 Jacqueline P. Leighton
Acknowledgments • Canadian Education Statistics Council (CESC) • Social Sciences and Humanities Research Council (SSHRC) • Ms. Rebecca J. Gokiert, Ms. Ying Cui • CRAME colleagues
Overview • Rationale • Materials—SAIP Science 99 • Methods & Results—Phase 1 • Methods & Results—Phase 2 • Implications for Policy
Rationale • To identify the dimensional structure of the School Achievement Indicators Program (SAIP) Science Assessment • To find support (or not) for the view that science performance is associated with multiple and distinct thinking skills
Materials—SAIP Science 99 • A dichotomously scored two-stage test • Administered to students in both Grade 8 and Grade 11 (13- and 16-year-olds) • 6 content domains • 5 ability levels
Materials—SAIP Science 99 • [Diagram: two-stage design with a routing test, component Tests A, B, and C, and combined forms AB and AC]
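The booklet-assignment rule is not detailed on the slide; purely as an illustration of two-stage routing, the Python sketch below assigns a hypothetical booklet (AB or AC) from a routing-test score. Both the cut score and the direction of the rule are assumptions.

```python
def assign_booklet(routing_score: int, cut_score: int = 12) -> str:
    """Assign a second-stage booklet from a routing-test score.

    Hypothetical rule: students below the cut score receive booklet AB,
    students at or above it receive booklet AC. The cut score of 12 and
    the direction of the rule are illustrative assumptions.
    """
    return "AB" if routing_score < cut_score else "AC"

# A student scoring 15 on the routing test would be routed to booklet AC.
print(assign_booklet(15))
```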
Methods—Phase 1: Exploratory • The dimensionality test DIMTEST (Stout et al., 2001) is a nonparametric procedure for testing the null hypothesis that a set of test data is unidimensional
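DIMTEST itself is a specialized procedure and is not reproduced here; purely to illustrate the unidimensionality question it tests, the sketch below applies a common eigenvalue-ratio heuristic (not the DIMTEST statistic) to simulated, hypothetical item responses.

```python
import numpy as np

# Hypothetical dichotomous (0/1) responses: 500 students by 20 items.
rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(500, 20))

# Eigenvalues of the inter-item (Pearson/phi) correlation matrix.
corr = np.corrcoef(responses, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

# Heuristic: for an essentially unidimensional test the first eigenvalue
# dominates the second; for the independent random data simulated here
# the ratio will sit close to 1.
print("first/second eigenvalue ratio:", eigenvalues[0] / eigenvalues[1])
```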
Methods—Phase 1: Exploratory • An exploratory factor analysis (EFA) of the tetrachoric correlations was conducted, using five recommended decision rules to determine the number of factors • The factors retained were rotated using orthogonal rotation procedures (quartimax, varimax) and an oblique transformation procedure (direct oblimin)
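A minimal sketch of this step, assuming simulated two-factor item responses and the factor_analyzer Python package. Note that factor_analyzer works from the raw responses (Pearson/phi correlations) rather than the tetrachoric correlations used in the study, so this illustrates the workflow rather than reproducing the analysis.

```python
import numpy as np
from factor_analyzer import FactorAnalyzer

# Simulate dichotomous responses for 500 students on 20 items, driven by
# two latent abilities (hypothetical data, purely for illustration).
rng = np.random.default_rng(1)
theta = rng.normal(size=(500, 2))
loadings = np.zeros((20, 2))
loadings[:10, 0] = 1.0   # items 1-10 load on the first ability
loadings[10:, 1] = 1.0   # items 11-20 load on the second ability
propensity = theta @ loadings.T + rng.normal(size=(500, 20))
responses = (propensity > 0).astype(float)

# Two-factor EFA with an oblique (direct oblimin) rotation, mirroring the
# rotation procedures named on the slide.
efa = FactorAnalyzer(n_factors=2, rotation="oblimin")
efa.fit(responses)

print(efa.loadings_)              # rotated factor loadings
print(efa.get_factor_variance())  # variance explained by each factor
```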
Results—Phase 1: EFA • The decision rules indicated two factors • The oblique results were interpreted because the factors shared low to moderate correlations (.014 to .384)
Methods—Phase 2: Confirmatory • A common shortcoming of EFA is the sparse description of the factors found to underlie the data (Haig, 2005) • For each item with a loading of 0.3 or greater, the following information was recorded: the first five to ten words of the test question, the specific factor on which the item loaded, the content standard or objective, and the ability level of the item
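A small sketch of this recording step, using a hypothetical loadings table (item names, domains, and values are illustrative): items loading at or above 0.3 on either factor are flagged together with their content and ability codes.

```python
import pandas as pd

# Hypothetical post-EFA item information (all values illustrative).
items = pd.DataFrame({
    "item": ["Q01", "Q02", "Q03"],
    "factor1_loading": [0.45, 0.12, 0.08],
    "factor2_loading": [0.05, 0.36, 0.10],
    "content_standard": ["Earth science", "Life science", "Physics"],
    "ability_level": [2, 3, 1],
})

# Keep items with a loading of 0.3 or greater on at least one factor,
# as described on the slide.
flagged = items[(items["factor1_loading"] >= 0.3) |
                (items["factor2_loading"] >= 0.3)]
print(flagged)
```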
Methods—Phase 2: Confirmatory • Preliminary analyses of the AB and AC tests suggested that the two factors tapped (a) student reasoning about causes and effects and (b) student reasoning about category membership
Methods—Phase 2: Confirmatory • A recent review article by Deanna Kuhn and David Dean Jr. (2004) argues that, in the ongoing process of managing and reducing the complexity of information from the external environment, individuals typically make use of two forms of inference: causal and non-causal
Methods—Phase 2: Confirmatory • SAIP items were reviewed and coded according to whether they contained primarily causal or categorical-type key words • We used key introductory words such as “why,” “how,” “cause/effect,” “what,” “which,” or “identify” to code items as either primarily causal or primarily categorical
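A minimal sketch of the keyword coding; the slide does not spell out which cue words map to which category, so the mapping below (why/how/cause/effect as causal, what/which/identify as categorical) is an assumption.

```python
# Assumed cue-word mapping: "why", "how", "cause", and "effect" are treated
# as causal cues; "what", "which", and "identify" as categorical cues.
CAUSAL_CUES = ("why", "how", "cause", "effect")
CATEGORICAL_CUES = ("what", "which", "identify")

def code_item(stem: str) -> str:
    """Code an item stem as primarily causal or primarily categorical."""
    opening = stem.lower().split()[:10]  # first few words of the question
    if any(cue in opening for cue in CAUSAL_CUES):
        return "causal"
    if any(cue in opening for cue in CATEGORICAL_CUES):
        return "categorical"
    return "uncoded"

print(code_item("Why does the balloon expand when heated?"))  # causal
print(code_item("Which of the following is a mammal?"))       # categorical
```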
Methods—Phase 2: Confirmatory • We also examined the influence of item format on students' interpretation of an item as requiring causal versus categorical reasoning • SAIP items were also coded according to item format • Format might function as a proxy for invoking either causal or categorical reasoning
Methods—Phase 2: Confirmatory • Linear factor analysis with LISREL was used to estimate the parameters of two-dimensional models corresponding to • the Causal-Categorical Model (CCM) • the Item Format Model (IFM) • Linear factor analysis was also used to estimate the parameters of a six-dimensional model, using item coding associated with the Test Specifications Model (TSM)
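The confirmatory models were estimated with LISREL, which is not scripted here; as one way to sketch a comparable two-factor linear factor model in Python, the example below uses the semopy package with hypothetical item names, factor assignments, and simulated continuous data.

```python
import numpy as np
import pandas as pd
import semopy

# Hypothetical two-factor specification in lavaan-style syntax: three
# "causal" items and three "categorical" items (assignments illustrative).
model_desc = """
Causal =~ item1 + item2 + item3
Categorical =~ item4 + item5 + item6
"""

# Simulate data with a genuine two-factor structure so the model converges.
rng = np.random.default_rng(2)
factors = rng.normal(size=(500, 2))
loadings = np.zeros((6, 2))
loadings[:3, 0] = 0.7
loadings[3:, 1] = 0.7
data = pd.DataFrame(
    factors @ loadings.T + rng.normal(scale=0.5, size=(500, 6)),
    columns=[f"item{i}" for i in range(1, 7)],
)

model = semopy.Model(model_desc)
model.fit(data)
print(semopy.calc_stats(model))  # chi-square, RMSEA, CFI, and related indices
```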
Results—Phase 2: Confirmatory • Using recommended fit indices (Gierl & Rogers, 1996), none of the models fit the AB test data adequately • For the AC data, the IFM provided a consistently better fit than the CCM and TSM
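The slide does not list which fit indices Gierl and Rogers (1996) recommend; purely to illustrate how two widely used indices are computed from a fitted model's chi-square, the sketch below evaluates RMSEA and CFI for hypothetical values.

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2: float, df: int, chi2_null: float, df_null: int) -> float:
    """Comparative fit index relative to the baseline (null) model."""
    d_model = max(chi2 - df, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    return 1.0 - d_model / d_null

# Hypothetical values: model chi-square 310 on 100 df, null model 2500 on
# 120 df, sample size 1000.
print(rmsea(310.0, 100, 1000))        # about 0.046, adequate by common cutoffs
print(cfi(310.0, 100, 2500.0, 120))   # about 0.91
```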
Policy Implications • Multidimensional latent structure of the SAIP Science Assessment • Distinct forms of thinking in science • Sub-scores might be a better form of score reporting for SAIP and similar large-scale assessments
Policy Implications • Superiority of the Item Format Model in confirmatory factor analysis • Item format may function to elicit distinct forms of reasoning in science—causal and categorical
Policy Implications • Use of SAIP sub-scores to gauge improvements in specific forms of reasoning in students • Test design and feedback focused on cognitive skills as well as content