Investigating the Statistical and Cognitive Dimensions in Large-Scale Science Assessments CESC-SSHRC Symposium 2005 Jacqueline P. Leighton
Acknowledgments • Canadian Education Statistics Council (CESC) • Social Sciences and Humanities Research Council (SSHRC) • Ms. Rebecca J. Gokiert, Ms. Ying Cui • CRAME colleagues
Overview • Rationale • Materials—SAIP Science 99 • Methods & Results: Phase 1 • Methods & Results: Phase 2 • Implications for Policy
Rationale • To identify the dimensional structure of the School Achievement Indicators Program (SAIP) Science Assessment • To find support (or not) for the view that science performance is associated with multiple and distinct thinking skills
Materials—SAIP Science 99 • A dichotomously scored two-stage test • Administered to students in both Grade 8 and Grade 11 (13- and 16-year-olds) • 6 content domains • 5 ability levels
Materials—SAIP Science 99 • Two-stage design (diagram): Routing Test → Test A, Test B, Test C, Test AB, Test AC
Methods—Phase 1: Exploratory • The dimensionality test, DIMTEST (Stout et al., 2001), is a nonparametric procedure used to test the null hypothesis that a set of test data is unidimensional
Methods—Phase 1: Exploratory • An exploratory factor analysis (EFA) of the tetrachoric correlations was conducted, using five recommended decision rules to determine the number of factors to retain • The retained factors were rotated using orthogonal rotation procedures (i.e., quartimax, varimax) and an oblique transformation procedure (i.e., direct oblimin)
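A minimal sketch of this exploratory step, assuming the dichotomously scored responses sit in a pandas DataFrame; the factor_analyzer package stands in for the original analysis software, and it works from the raw 0/1 data (Pearson correlations) rather than the tetrachoric correlations used in the study, so the numbers would differ:

```python
# Illustrative EFA sketch: two factors, oblique (direct oblimin) rotation.
# NOTE: the original analysis factored tetrachoric correlations; this
# sketch fits the raw 0/1 responses, so it is only an approximation.
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Hypothetical file of dichotomously scored items
# (rows = students, columns = items); the filename is a placeholder.
responses = pd.read_csv("saip_science_items.csv")

efa = FactorAnalyzer(n_factors=2, rotation="oblimin")
efa.fit(responses)

# Pattern loadings and the factor correlation matrix guide interpretation.
loadings = pd.DataFrame(efa.loadings_, index=responses.columns,
                        columns=["Factor 1", "Factor 2"])
print(loadings.round(2))
print("Factor correlations:")
print(efa.phi_)
```

Items loading at or above 0.3 on a factor would then be carried forward into the interpretive coding described in Phase 2.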
Results—Phase 1: EFA • EFA Results • Decision rules indicated two factors • Oblique results interpreted because factors shared low to moderate correlations (range of .014 to .384)
Methods—Phase 2: Confirmatory • A common shortcoming of EFA is the sparse description of the factors found to underlie the data (Haig, 2005) • For each item with a loading equal to or greater than 0.3, the following information was recorded: • the first five to ten words of the test question • the specific factor on which the item loaded • the content standard or objective • the ability level of the item
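One way to keep this recorded information organized is a small structured record per retained item; the field values below are placeholders rather than actual SAIP items or codes:

```python
# Illustrative structure for the information recorded per retained item
# (loading >= 0.3). All values shown are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class ItemRecord:
    stem_excerpt: str      # first five to ten words of the test question
    factor: int            # specific factor on which the item loaded
    content_standard: str  # content standard or objective
    ability_level: int     # ability level of the item (1-5)

example = ItemRecord(
    stem_excerpt="Why does the ice cube melt faster when ...",
    factor=1,
    content_standard="placeholder content domain",
    ability_level=3,
)
print(example)
```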
Methods—Phase 2: Confirmatory • Preliminary analyses of the AB and AC tests suggested that the two factors tapped (a) student reasoning about causes and effects and (b) student reasoning about category membership
Methods—Phase 2: Confirmatory • A recent review article by Deanna Kuhn and David Dean Jr. (2004) • In the ongoing process of managing and reducing the complexity of information from the external environment, individuals typically make use of two forms of inference: causal and non-causal
Methods—Phase 2: Confirmatory • SAIP items were reviewed and coded according to whether they contained primarily causal or categorical-type key words • We used key introductory words such as “why,” “how,” “cause/effect,” “what,” “which,” or “identify” to code items as either primarily causal or primarily categorical
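A small sketch of how such keyword-based coding could be applied to item stems; the assignment of particular key words to the causal versus categorical category below is an illustrative assumption rather than the published coding scheme, and the example stems are invented:

```python
# Illustrative keyword coding of item stems as primarily causal or
# primarily categorical. The keyword-to-category mapping is assumed
# for illustration; causal keywords take precedence when both occur.
import re

CAUSAL_KEYWORDS = {"why", "how", "cause", "effect"}
CATEGORICAL_KEYWORDS = {"what", "which", "identify"}

def code_item(stem: str) -> str:
    """Return 'causal', 'categorical', or 'uncoded' for an item stem."""
    words = set(re.findall(r"[a-z]+", stem.lower()))
    if words & CAUSAL_KEYWORDS:
        return "causal"
    if words & CATEGORICAL_KEYWORDS:
        return "categorical"
    return "uncoded"

# Invented example stems (not actual SAIP items).
print(code_item("Why does the balloon expand when heated?"))      # causal
print(code_item("Which of the following is a chemical change?"))  # categorical
```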
Methods—Phase 2: Confirmatory • We also examined the influence of item format on students’ interpretation of an item as requiring causal versus categorical reasoning • SAIP items were therefore also coded according to item format • Format might function as a proxy for invoking either causal or categorical reasoning
Methods—Phase 2: Confirmatory • Linear factor analysis with LISREL was used to estimate the parameters of a 2-dimensional model associated with • the Causal-Categorical Model (CCM) • the Item Format Model (IFM) • Linear factor analysis was also used to estimate the parameters of a 6-dimensional model, using the item coding associated with the Test Specifications Model (TSM)
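A minimal sketch of a comparable two-factor confirmatory model, using the open-source semopy package in place of LISREL; the item names and data file are placeholders, and the allocation of items to factors would come from the causal/categorical coding described above:

```python
# Illustrative two-factor confirmatory model in the spirit of the
# Causal-Categorical Model (CCM). semopy stands in for LISREL, and the
# item names item1..item6 and the data file are hypothetical.
import pandas as pd
import semopy

MODEL_DESC = """
Causal      =~ item1 + item2 + item3
Categorical =~ item4 + item5 + item6
Causal ~~ Categorical
"""

data = pd.read_csv("saip_science_scored.csv")  # placeholder filename

model = semopy.Model(MODEL_DESC)
model.fit(data)

# Global fit statistics (chi-square, CFI, RMSEA, etc.) used to compare
# competing models such as the CCM, IFM, and TSM.
print(semopy.calc_stats(model))
```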
Results—Phase 2: Confirmatory • Using recommended fit indices (Gierl & Rogers, 1996), none of the models fit the AB test data adequately • For the AC data, the IFM provided a consistently better fit than the CCM and TSM
Policy Implications • Multidimensional latent structure of the SAIP Science Assessment • Distinct forms of thinking in science • Sub-scores might be a better form of score reporting for SAIP and similar large-scale assessments
Policy Implications • Superiority of the Item Format Model in confirmatory factor analysis • Item format may function to elicit distinct forms of reasoning in science—causal and categorical
Policy Implications • Use of SAIP sub-scores to gauge improvements in specific forms of reasoning in students • Test design and feedback focused on cognitive skills as well as content