Validity - Consequentialism Assoc. Prof. Dr. Sehnaz Sahinkarakas
“Effect-driven testing” (Fulcher & Davidson, 2007) • “the effect that the test is intended to have and to structure the test development to achieve that effect” (p.144) • What does this mean?
Definition of VALIDITY • “Overall judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions on the basis of test scores or other modes of assessment” (Messick, 1995, p. 741). • What is a score? • In general, it is “any coding or summarization of observed consistencies or performance regularities on a test, questionnaire, observation procedure, or other assessment devices such as work samples, portfolios, and realistic problem simulations” (p. 741).
Validity, then, is about making inferences from scores; scores are reflections of a test taker’s knowledge and/or skills as elicited by test tasks. • This differs from early definitions of validity: the degree of correlation between the test and a criterion (the validity coefficient) • In the early definition: • there is an upper limit on the possible correlation • it is directly tied to the reliability of the test (without high reliability a test cannot be valid) • In the newer definition (especially after Messick), validity became the meaning of the test scores, not a property of the test itself
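The early, correlational definition can be made concrete with a short sketch. In classical test theory the validity coefficient is the Pearson correlation between test scores and a criterion measure, and its upper limit is the square root of the test’s reliability. The scores and the reliability estimate below are invented purely for illustration, not taken from the coursebook.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical scores for five test takers.
test_scores = [52, 61, 70, 78, 88]   # scores on the test being validated
criterion   = [60, 55, 75, 70, 85]   # e.g. later course grades (the criterion measure)

validity_coefficient = pearson(test_scores, criterion)

# Classical test theory's "upper limit": the validity coefficient
# cannot exceed the square root of the test's reliability.
reliability = 0.81                   # assumed reliability estimate
ceiling = math.sqrt(reliability)     # 0.90

print(f"validity coefficient: {validity_coefficient:.3f} (ceiling: {ceiling:.2f})")
```

With these invented numbers the coefficient comes out just under the reliability ceiling, which is the point of the early definition: an unreliable test caps how valid it can appear against any criterion.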
Final remarks for validity (and reliability, fairness…): • they are not based on measurement principles alone; • they are social values • correlation coefficients and/or content-validity analyses are not enough to establish validity (Messick). • So, “score validation is an empirical evaluation of the meaning and consequences of measurement” (Messick)
Construct Validity • What is a construct? • To define a concept in such a way that • it becomes measurable (an operational definition) • it can be related to other, different constructs (e.g., the more anxious, the less self-confident) • Construct validity • is the degree to which inferences can be made from the operational definitions to the theoretical constructs on which those definitions are based • What does this mean?
Two things to consider in construct validation: • Theory (what goes on in our minds: ideas, theories, beliefs…) • Observation (what we see happening around us; our actual program/treatment) • i.e., we develop something (an observation) to reflect what is in our minds (a theory) • Construct validity assesses how well we have translated our ideas/theories into our actual programs/measures • What does this mean in testing? How do we do it in testing?
Sources of Invalidity • Two major threats: • Construct underrepresentation: the assessment is too narrow: it does not include important dimensions of the construct • Construct-irrelevant variance: the assessment is too broad: it contains variance associated with other, distinct constructs
Construct-Irrelevant Variance • Two kinds: • Construct-irrelevant difficulty (e.g., undue reading demands in a test of subject-matter knowledge): leads to invalidly low scores • Construct-irrelevant easiness (e.g., texts highly familiar to some test takers): leads to invalidly high scores • What do you think about KPDS/YDS in terms of threats to validity?
Sources of Evidence in Construct Validity (Messick, 1995) • Construct Validity = the evidential basis for score interpretation • How do we interpret scores? • Evidence is needed for any score interpretation, not just for ‘theoretical constructs’ • How do we do this?
Evidence-Related Validity • Two types: • Convergent validity consists of providing evidence that two tests believed to measure closely related skills or types of knowledge correlate strongly (i.e., the test MEASURES what it claims to measure) • Discriminant validity consists of providing evidence that two tests that do not measure closely related skills or types of knowledge do not correlate strongly (i.e., the test does NOT MEASURE irrelevant attributes)
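Both kinds of evidence can be read off correlation coefficients. As a minimal sketch with invented scores (all test names and numbers below are hypothetical): a reading test should correlate strongly with another reading measure (convergent evidence) and only weakly with a measure of a distinct construct such as test anxiety (discriminant evidence).

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Invented scores for five test takers.
reading_test   = [40, 55, 62, 70, 83]   # the test being validated
other_reading  = [45, 52, 65, 72, 80]   # another measure of the same construct
anxiety_survey = [20, 25, 15, 28, 18]   # a measure of a distinct construct

convergent   = pearson(reading_test, other_reading)    # should be strong
discriminant = pearson(reading_test, anxiety_survey)   # should be near zero

print(f"convergent: {convergent:.2f}, discriminant: {discriminant:.2f}")
```

A strong convergent coefficient alongside a weak discriminant one supports the claim that the test measures its intended construct and not an irrelevant attribute.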
Aspects of Construct Validity • Validity is a unified concept, but it can be differentiated into distinct aspects: • Content • Substantive • Structural • Generalizability • External • Consequential
Content Aspect • Content relevance; representativeness; technical quality (to what extent does it represent the domain?) • It requires identifying the construct DOMAIN to be assessed • To what extent does the domain/task cover the construct? • All important parts of the construct domain should be covered
Substantive Aspect • The processes underlying the construct, and the degree to which these processes are reflected in performance • It includes the content aspect, but empirical evidence is also needed • This can be gathered from a variety of sources, e.g., think-aloud protocols
The concept bridging the content and substantive aspects is representativeness. • Representativeness has two distinct meanings: • Mental representation (cognitive psychology) • The Brunswikian sense of ecological sampling: the correlation between a cue and a property (e.g., the colour of a banana is a cue indicating the ripeness of the fruit)
Structural Aspect • Related to scoring • The scoring criteria and rubrics should be rationally developed (based on the constructs)
Generalizability • Interpretations should not be limited to the task assessed • They should be generalizable to the construct domain • (the degree of correlation between the task and other tasks)
External Variables • The scores’ relationship with other measures and with non-assessment behaviours • Convergent evidence (correspondence between measures of the same construct) and discriminant evidence (distinctness from measures of other constructs) are important
Consequences • Evaluating the intended and unintended consequences of score interpretation, both positive and negative impact • But negative impact should NOT stem from construct underrepresentation or construct-irrelevant variance • Two facets: (a) the justification of the testing, based on score meaning or on consequences contributing to score valuation; (b) the function or outcome of the testing, as interpretation or as applied use
Facets of Validity as a Progressive Matrix (Messick, 1995, p. 748) • Two facets: (a) the justification of the testing, based on score meaning or on consequences contributing to score valuation; (b) the function or outcome of the testing, as interpretation or as applied use • When these facets are crossed with each other, a four-fold classification is obtained
Construct validity appears in every cell of the figure. • This means: • Validity issues are unified into a unitary concept • But the distinct features of construct validity should also be emphasized • What is the implication here? • Both meaning and values are intertwined in the validation process. • Thus, • ‘Validity and values are one imperative, not two, and test validation implicates both the science and the ethics of assessment, which is why validity has force as a social value’ (Messick, 1995, p. 749).
Consequential Validity & Washback • The Messickian (unified) view of Construct Validity = considering the consequences of test use (i.e., washback) • What does this mean in validity studies?
Washback is a particular instance of the consequential aspect of construct validity • Investigating washback and other consequences is a crucial step in the process of test validation • i.e., washback is one (not the only) indicator of the consequential aspect of validity • It is important to investigate washback to establish the validity of a test
To put it differently: • The modern paradigm of validity comes with its consequential nature • Test impact is part of a validation argument • Thus, effect-driven testing should be considered: testers should build tests with the intended effects in mind
To put it all together: Value implications + Social consequences = CONSEQUENTIAL VALIDITY (the two fairness-related elements of Messick’s consequential validity)
But who brings about washback (positive or negative)? • The people in classrooms (teachers/students)? • Test developers? • For Fulcher and Davidson, it is the people in classrooms • Thus more attention should be given to teachers’ beliefs about teaching and learning, and to the degree of their PROFESSIONALISM
Task A9.2 • Coursebook (p. 143) • Select one large-scale test you are familiar with. • What is its influence, and upon whom? • Does it seem reasonable to define these tests by their influence as well?