On-demand learning-embedded benchmark assessment using classroom-accessible technology Discussant Remarks: Mark Wilson UC, Berkeley
Outline • What does “Validity” look like for these papers? • What is it that these papers are distinguishing themselves from? • Where might one go from here?
Need for strong concern about validity • Effect of NCLB requirements: • Schools are instituting frequent “benchmark” tests • Intended to guide teachers as to students’ strengths and weaknesses • Often just little copies of the “State test” • Teachers are complaining that it puts a vice-like grip on the curriculum
Validity • 1999 AERA/APA/NCME Standards for Educational and Psychological Testing • Five types of validity evidence: • Evidence based on test content • Evidence based on response processes • Evidence based on internal structure • Evidence based on external structure • Evidence based on consequences
Paper 1: Falmagne et al. - ALEKS • Reliability => Validity • “the collection of all the problems potentially used in any assessment represents a fully comprehensive coverage of a particular curriculum, ..[hence]...[a]rguing that such an assessment, if it is reliable, is also automatically endowed with a corresponding amount of validity is plausible.”
Paper 1: Falmagne et al. - ALEKS • Test content • Theory of the Learning Space • “inner fringe” and “outer fringe” • “the summary is meaningful for an instructor” • Database of Problems • “a consensus among educators that the database of problems is a comprehensive compendium for testing the mastery of a scholarly subject. This phase is relatively straightforward.” • Evidence: Who were the experts? / What did they do? / How much did they agree?
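The “fringe” notions come from Knowledge Space Theory: the inner fringe of a state K holds the items a student has just mastered (removing any one still leaves a feasible state), and the outer fringe holds the items the student is ready to learn next (adding any one still gives a feasible state). A minimal sketch, not the ALEKS implementation, with an invented toy structure:

```python
# A minimal sketch of the Knowledge Space Theory notions above, not the
# ALEKS implementation.  A knowledge structure is modeled as a set of
# frozensets (the feasible knowledge states); item names are invented.

def inner_fringe(K, structure):
    """Items just mastered: removing any one of them still leaves a valid state."""
    return {q for q in K if K - {q} in structure}

def outer_fringe(K, structure, domain):
    """Items the student is ready to learn: adding any one still gives a valid state."""
    return {q for q in domain - K if K | {q} in structure}

# Toy example with four skills.
domain = {"a", "b", "c", "d"}
structure = {frozenset(), frozenset("a"), frozenset("ab"),
             frozenset("ac"), frozenset("abc"), frozenset("abcd")}

K = frozenset("ab")
print(inner_fringe(K, structure))          # {'b'}
print(outer_fringe(K, structure, domain))  # {'c'}
```

The claim that “the summary is meaningful for an instructor” amounts to reporting these two fringes to the teacher rather than the full state K.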
Paper 1: Falmagne et al. - ALEKS • Evidence based on response processes • E.g., for a selected K, do students in K say things that are consistent/inconsistent with that state? • Evidence based on internal structure • E.g., for a selected K, do students in K have high/low success rates on “instances” in K? • Evidence based on external structure • E.g., comparison with teacher judgments of student ability • Evidence based on consequences • E.g., use of “fringes”… does this help/hinder teacher interpretations?
Paper 2: Shute et al. - ACED • Two “validity studies” • Study 1: Evidence based on external structure: • Prediction of residuals from the external post-test after controlling for pre-test • Informative design across conditions: elaborated feedback better • Study 2: Evidence based on response processes • “Usability” study for students with disabilities
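Study 1’s external-structure check can be read as a residual-gain analysis: regress the post-test on the pre-test, then ask whether the embedded ACED scores explain what is left over. A hedged sketch with illustrative variable names, not the authors’ code:

```python
# A hedged sketch of the residual-gain reading of Study 1: regress the
# external post-test on the pre-test, then ask whether the embedded ACED
# score explains the leftover variation.  Variable names are illustrative.
import numpy as np

def residual_gain_correlation(pretest, posttest, aced_score):
    pretest = np.asarray(pretest, dtype=float)
    posttest = np.asarray(posttest, dtype=float)
    aced_score = np.asarray(aced_score, dtype=float)

    # Step 1: post-test ~ pre-test by ordinary least squares.
    X = np.column_stack([np.ones_like(pretest), pretest])
    beta, *_ = np.linalg.lstsq(X, posttest, rcond=None)
    residuals = posttest - X @ beta

    # Step 2: does the embedded assessment predict what the pre-test could not?
    return np.corrcoef(aced_score, residuals)[0, 1]
```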
Paper 2: Shute et al. - ACED • Evidence based on test content • Reference to an earlier paper • Evidence based on internal structure • Could easily be investigated, as there is interesting internal structure (Fig. 1) • Evidence based on consequences • Probably not any real consequences yet
Paper 3: Heffernan et al. - ASSISTment System • Evidence based on test content • Items coded by: 2 experts, 7 hrs. • “skill of Venn Diagram” • Evidence based on internal structure • Which skill model fits best: 1, 5, 39, or 106 skills? • Which number is different? • 4.10, 4.11, 4.12, 4.10, 4.10 • 1, 5, 39, 106 (twice)
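One way to read that comparison is as a horse race among the candidate skill models on how well each predicts MCAS performance, scored by something like mean absolute deviation; the near-identical fit values above are the punchline. The criterion and model names below are assumptions for illustration, not necessarily the paper’s actual fit statistic:

```python
# Hedged sketch of one way to run the skill-model comparison: score each
# candidate model (1, 5, 39, or 106 skills) by how far its predicted MCAS
# scores fall from the observed scores.  Mean absolute deviation is assumed
# here for illustration; the paper's actual fit statistic may differ.
import numpy as np

def mean_abs_dev(predicted, observed):
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean(np.abs(predicted - observed)))

def rank_skill_models(predictions_by_model, observed_mcas):
    """predictions_by_model: {'WPI-1': [...], 'WPI-5': [...], ...} (names illustrative)."""
    fits = {name: mean_abs_dev(pred, observed_mcas)
            for name, pred in predictions_by_model.items()}
    return sorted(fits.items(), key=lambda kv: kv[1])  # smallest deviation first
```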
Paper 3: Heffernan et al. - ASSISTment System • Evidence based on external structure • Prediction of MCAS: 23/38 = 61% don’t fit well, even for the “best” model (WPI-39(B)).
Paper 3: Heffernan et al. - ASSISTment System • Evidence based on response processes • ? • Evidence based on consequences • Probably are real consequences
Paper 4: Junker - ASSISTment System • Two “validity studies” • Study 1: Evidence based on external structure • Prediction of MCAS scores • Study 2: Evidence based on internal structure • 4 internal structure patterns • 2 questions • Q1: Regarding how scaffolds get easier: what happens when you get a scaffold wrong? • Q2: What about the gap?
Paper 4: Junker - ASSISTment System • For the remaining types of validity evidence, see Paper 3
Looking Beyond • What does this group of papers have to offer? • What should it be looking out for?
Paper 1: Falmagne et al. - ALEKS • Inner and Outer Fringe • What do teachers think of them, and what do they do with them? • “Standardized tests” and “psychometrics” as straw men • Alternative: compare one’s work to the latest developments in item response modeling (e.g., EIRM)
Paper 2: Shute et al. - ACED • “Weight of Evidence” • Good alternative to Fisher information • Transparent, easily interpretable • Models for people with disabilities • Most likely going to have different internal structure • Need to develop a broader view of internal structure criteria
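For reference, a minimal statement of Good’s weight of evidence, which I take to be the quantity behind ACED’s task selection (the exact form used in the paper may differ):

```latex
% Good's weight of evidence for hypothesis H given evidence E:
W(H : E) \;=\; \log \frac{P(E \mid H)}{P(E \mid \neg H)}
        \;=\; \log \frac{P(H \mid E)}{P(\neg H \mid E)} - \log \frac{P(H)}{P(\neg H)}
% i.e., the change in the log-odds of H produced by observing E.
% Expected weight of evidence over candidate tasks can drive task
% selection, as an alternative to maximizing Fisher information.
```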
Paper 3: Heffernan et al. - ASSISTment System • MCAS as starting point for diagnostic testing? • Using released items?!? • What is “unidimensionality”?
Paper 3: Heffernan et al. - ASSISTment System • In a latent class model, the latent class looks like this: [figure not shown] • In an item response model (e.g., the Rasch model), unidimensionality looks like this: [figure not shown] • See: Karelitz, T.M., Wilson, M.R., & Draney, K.L. (2005). Diagnostic assessment using continuous vs. discrete ability models. Paper presented at the NCME Annual Meeting, San Francisco, CA.
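The contrast can also be written out in standard form (textbook formulations, not specific to the Karelitz et al. paper):

```latex
% Continuous latent trait (Rasch model): one ability dimension \theta,
% item difficulty \delta_i
P(X_i = 1 \mid \theta) \;=\; \frac{\exp(\theta - \delta_i)}{1 + \exp(\theta - \delta_i)}

% Discrete latent classes: class membership c \in \{1, \dots, C\},
% class-specific success probability for each item
P(X_i = 1 \mid c) \;=\; \pi_{ic}
```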
Paper 4: Junker - ASSISTment System • What is the effect of making MCAR/MAR assumptions when neither holds? • Relevant to all CAT • Or of assuming you know the response under NMAR • Is there a discrimination paradox in DINA models? • Why do scaffold questions get easier?
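For readers not steeped in cognitive diagnosis models, the standard DINA formulation behind the “discrimination paradox” question (standard notation, not taken from the paper itself):

```latex
% DINA model in standard notation: student i has skill vector \alpha_i,
% item j has Q-matrix row q_j, slip parameter s_j and guess parameter g_j.
\eta_{ij} \;=\; \prod_{k} \alpha_{ik}^{\,q_{jk}}, \qquad
P(X_{ij} = 1 \mid \alpha_i) \;=\; (1 - s_j)^{\eta_{ij}} \, g_j^{\,1 - \eta_{ij}}
% The item "discriminates" only to the extent that (1 - s_j) exceeds g_j.
```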
Future Directions • What is a “Knowledge State” (KS)? • How do we test whether it’s a unitary thing? • What if it isn’t? • Mixture models: structured KSs • Do teachers (and other practitioners) find the KSs useful? • How to adjust if they don’t? • Finer/coarser grained • Structured