1. Chapter 6 Validity
3. The Concept of Validity Face Validity
What a test ‘seems’ to measure
Transparency…
Honesty?
4. Tripartite View Content
Criterion-related
Construct
5. Content Validity The adequacy of stimulus sampling
The degree to which the contents of a test reflect the domain of interest
The extent to which one can generalize from a particular collection of items to all possible items in a broader domain of items
6. Content Validity
How is it established?
Stimulus sampling procedures
E.g., randomly sample items from domain
Rational analysis of test content during test development
By the test developer
“…it is surely impossible to write items in the first place without a conceptualization of the attribute of which they are to be indicators” (McDonald, 1999)
7. Content Validity
How is it established (contd.)?
By test users
E.g., expert sorts
Lawshe’s Content Validity Ratio
SMEs judge whether a test item is essential
CVR = (ne – N/2) / (N/2), where ne = the number of SMEs rating the item essential and N = the total number of SMEs on the panel (see the sketch after this list)
CVI = the average of the CVR indexes across all items
Evidence of internal consistency of items
Convergent validity correlations with other measures
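A minimal sketch of the CVR/CVI arithmetic, using a hypothetical panel of 10 SMEs and three items (all ratings are illustrative):

```python
# Lawshe's content validity ratio, computed per item from SME judgments.
def content_validity_ratio(ratings):
    """CVR = (ne - N/2) / (N/2): ne = SMEs rating the item 'essential',
    N = total number of SMEs on the panel."""
    n = len(ratings)
    n_essential = sum(ratings)              # True counts as 1
    return (n_essential - n / 2) / (n / 2)

# Hypothetical judgments: True = "essential", one inner list per item.
items = [
    [True] * 9 + [False],                   # 9 of 10 -> CVR = 0.8
    [True] * 7 + [False] * 3,               # 7 of 10 -> CVR = 0.4
    [True] * 5 + [False] * 5,               # 5 of 10 -> CVR = 0.0
]
cvrs = [content_validity_ratio(item) for item in items]
cvi = sum(cvrs) / len(cvrs)                 # CVI = mean CVR across items
print(cvrs, round(cvi, 2))                  # [0.8, 0.4, 0.0] 0.4
```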
8. Assertiveness
9. Content Validity
10. Content Validity Culture and the Relativity of Test Validity
History class example
11. Criterion-Related Validity What is a Criterion?
The standard against which a test or a test score is evaluated.
Characteristics of a criterion
Relevant
Valid
Uncontaminated
12. Contamination
13. Criterion-related Validity (often loosely equated with predictive validity) Establishing that test scores relate to an external standard
Important but limited in scope:
“No amount of apparently sound theory can substitute for lack of a correlation between predictor and criterion” (p. 95, Nunnally & Bernstein, 1994)
“…predictive validity represents a very direct, simple, but limited issue in scientific generalization that concerns the extent to which one can generalize from scores on one variable to scores on another variable” (p. 99, Nunnally & Bernstein, 1994)
14. How is it established? Establish an empirical relation between predictor and criterion; the dreaded “validity coefficient”
Concurrent
Collect current criterion data
Predictive
Administer test (do not use the results to make decisions) and after a suitable period of time, collect criterion data
Incremental validity
Expectancy data
Postdictive
Collect past criterion data
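Whichever design is used, the validity coefficient itself is typically just the Pearson correlation between predictor and criterion scores. A minimal sketch with hypothetical data:

```python
# Validity coefficient: correlate test scores with criterion data.
from scipy.stats import pearsonr

test_scores = [12, 15, 9, 20, 17, 11, 14, 18]            # predictor, at testing
job_ratings = [3.1, 3.8, 2.5, 4.6, 4.0, 2.9, 3.4, 4.2]   # criterion measure

r, p = pearsonr(test_scores, job_ratings)
print(f"validity coefficient r = {r:.2f} (p = {p:.3f})")
```

In a concurrent design the criterion data already exist at testing; in a predictive design they are collected after a suitable interval; in a postdictive design they come from the past. The statistic is the same.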
15. Validity Coefficient
16. The Criterion Problem “Predictive validation accepts the criterion as a given, unlike construct validation” (p. 96, Nunnally & Bernstein)
Criterion reliability
Criterion deficiency
Criterion contamination
Range restriction
Study attrition
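Range restriction, at least, has a standard statistical remedy: Thorndike's Case II correction estimates the predictor-criterion correlation in the full applicant pool from the correlation observed in the (range-restricted) selected group. A minimal sketch with hypothetical values:

```python
import math

def correct_range_restriction(r, sd_restricted, sd_unrestricted):
    """Thorndike Case II: r_c = r*u / sqrt(1 - r^2 + r^2 * u^2),
    where u = SD(unrestricted) / SD(restricted)."""
    u = sd_unrestricted / sd_restricted
    return (r * u) / math.sqrt(1 - r**2 + (r**2) * (u**2))

# Hypothetical: r = .25 among hires whose predictor SD is half the applicant SD.
print(round(correct_range_restriction(0.25, sd_restricted=5, sd_unrestricted=10), 2))  # 0.46
```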
17. The Criterion Problem
18. Criterion-Related Validity Incremental validity
The degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use
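A minimal sketch of incremental validity as the change in R-squared when a new predictor joins one already in use (data are simulated, names hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
old_pred = rng.normal(size=n)                       # predictor already in use
new_pred = rng.normal(size=n)                       # candidate additional predictor
criterion = 0.5 * old_pred + 0.3 * new_pred + rng.normal(size=n)

X_old = old_pred.reshape(-1, 1)
X_both = np.column_stack([old_pred, new_pred])
r2_old = LinearRegression().fit(X_old, criterion).score(X_old, criterion)
r2_both = LinearRegression().fit(X_both, criterion).score(X_both, criterion)
print(f"incremental validity (delta R^2) = {r2_both - r2_old:.3f}")
```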
21. Criterion-Related Validity Expectancy data
Expectancy table
22. Criterion-Related Validity Expectancy table
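A minimal sketch of how an expectancy table is assembled: band the test scores, then report the percentage of successful people within each band. The records are hypothetical (score, success) pairs:

```python
from collections import defaultdict

records = [(52, 1), (55, 0), (61, 1), (64, 1), (70, 1), (71, 1),
           (45, 0), (48, 0), (58, 0), (67, 1), (74, 1), (43, 0)]

bands = {"low (<50)":   lambda s: s < 50,
         "mid (50-64)": lambda s: 50 <= s < 65,
         "high (>=65)": lambda s: s >= 65}

counts = defaultdict(lambda: [0, 0])               # band -> [successes, total]
for score, success in records:
    for name, in_band in bands.items():
        if in_band(score):
            counts[name][0] += success
            counts[name][1] += 1

for name in bands:                                 # one row per score band
    hits, total = counts[name]
    print(f"{name}: {100 * hits / total:.0f}% successful ({hits}/{total})")
```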
23. Criterion-Related Validity Taylor and Russell
Validity coefficient
The correlation between the test score and job performance
Base rate of success on the job
Given current selection measures
Selection ratio
Number of people to be hired vs. applicants available
(these three inputs are combined in the sketch after the next slide)
24. Taylor Russell Tables See page 170, Table 6-3
Limitations
Assume a linear relationship between test and criterion
Require a defensible cutoff separating successful from unsuccessful criterion scores
Naylor-Shine Tables
Test Utility Theory
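The quantity a Taylor-Russell table reports — the expected proportion of selected applicants who succeed, given the validity coefficient, the base rate, and the selection ratio — can be approximated directly under the tables' bivariate-normal assumption. A minimal sketch (outputs are approximate):

```python
from scipy.stats import norm, multivariate_normal

def taylor_russell(validity, base_rate, selection_ratio):
    x_cut = norm.ppf(1 - selection_ratio)      # test cutoff: top SR proportion hired
    y_cut = norm.ppf(1 - base_rate)            # criterion cutoff: top BR succeed
    bvn = multivariate_normal(mean=[0, 0], cov=[[1, validity], [validity, 1]])
    # P(test > x_cut and criterion > y_cut), by inclusion-exclusion on the CDF
    p_both = 1 - norm.cdf(x_cut) - norm.cdf(y_cut) + bvn.cdf([x_cut, y_cut])
    return p_both / selection_ratio            # P(success | selected)

print(round(taylor_russell(0.00, 0.50, 0.10), 2))   # ~0.50: no validity, no gain
print(round(taylor_russell(0.50, 0.50, 0.10), 2))   # ~0.84: validity + strict selection
```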
25. Decision Theory and Test Utility Cronbach & Gleser (1965) presented:
A classification of decision problems
Various selection strategies ranging from single-stage processes to sequential analyses
A quantitative analysis of the relationship between test utility, the selection ratio, the cost of the testing program, and the expected value of the outcome
A recommendation that in some instances job requirements be tailored to the applicant's ability instead of the other way around (adaptive treatment)
26. Decision Theory and Test Utility Base rate
The extent to which a particular trait, behavior, characteristic, or attribute exists in the population
Hit rate
The proportion of people a test accurately identifies as possessing the construct of interest
Miss rate
The proportion of people a test inaccurately classifies with respect to possessing the construct of interest (comprising false positives and false negatives)
False positive
False negative
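These rates all fall out of a 2x2 classification table. A minimal sketch with hypothetical counts:

```python
# Rows: actually possess the attribute or not; columns: test decision.
true_pos, false_neg = 40, 15   # possess: correctly flagged / missed (false negatives)
false_pos, true_neg = 10, 35   # do not possess: wrongly flagged (false positives) / passed
n = true_pos + false_pos + false_neg + true_neg

base_rate = (true_pos + false_neg) / n   # proportion who actually possess the attribute
hit_rate = (true_pos + true_neg) / n     # proportion the test classifies correctly
miss_rate = (false_pos + false_neg) / n  # proportion the test misclassifies
print(base_rate, hit_rate, miss_rate)    # 0.55 0.75 0.25
```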
27. Construct Validity Evidence that a variety of behaviors will correlate with one another in studies of individual differences and/or will be similarly affected by experimental manipulations
Construct validation is an obvious issue in scientific generalization
The measure must show expected patterns of relations with other variables (nomological net)
Would interpretations of test scores be similar if other measures of the construct were used? How trustworthy are score interpretations?
28. 3 Major Aspects of Construct Validity Specify the domain of observables related to the construct
Empirically test the relations among those observables
Perform individual differences studies and/or experiments to determine the extent to which measures are consistent with a priori hypotheses
29. How is it established? Internal analysis of item or subtest relationships
Item analysis
Homogeneity (e.g., coefficient alpha; see the sketch after this list)
Use of factor analysis
EFA/CFA
Factor Loading/Identifying Factors
Predictive validation designs
Group differences (known groups)
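One common piece of internal statistical evidence is coefficient alpha. A minimal sketch, computed from first principles on a hypothetical 5-person by 4-item score matrix:

```python
import numpy as np

scores = np.array([[4, 5, 4, 3],
                   [2, 3, 2, 2],
                   [5, 5, 4, 5],
                   [3, 2, 3, 3],
                   [4, 4, 5, 4]])                    # persons x items

k = scores.shape[1]
sum_item_vars = scores.var(axis=0, ddof=1).sum()     # sum of per-item variances
total_var = scores.sum(axis=1).var(ddof=1)           # variance of total scores
alpha = (k / (k - 1)) * (1 - sum_item_vars / total_var)
print(round(alpha, 2))                               # 0.91 for these data
```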
30. MTMMs
31. How is it established?
Correlations between measures
Convergent & discriminant validity
MTMMs
Changes
Over time
After experimental intervention
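A minimal sketch of the core MTMM comparison, using a hypothetical correlation matrix for two traits (T1, T2) each measured by two methods (M1, M2): convergent correlations (same trait, different method) should exceed discriminant ones (different traits):

```python
import numpy as np

labels = ["T1_M1", "T2_M1", "T1_M2", "T2_M2"]        # trait_method
r = np.array([[1.00, 0.30, 0.65, 0.20],
              [0.30, 1.00, 0.25, 0.60],
              [0.65, 0.25, 1.00, 0.35],
              [0.20, 0.60, 0.35, 1.00]])

# Convergent: monotrait-heteromethod (same trait across methods).
convergent = [float(r[0, 2]), float(r[1, 3])]
# Discriminant: heterotrait cells, same or different method.
discriminant = [float(r[0, 1]), float(r[2, 3]), float(r[0, 3]), float(r[1, 2])]
print("convergent (should be high):", convergent)
print("discriminant (should be lower):", discriminant)
```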
32. Validity and Test Bias The Definition of “Test Bias”
Bias…
A factor inherent in a test that systematically prevents accurate impartial measurement
Random vs systematic variation
33. Test Bias Eye color example
Three characteristics of regression lines
The slope
The intercept
The error of the estimate
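A minimal sketch of how those three characteristics are compared across groups, with hypothetical data generated from the same true regression line in both groups (so an unbiased result is expected):

```python
import numpy as np

rng = np.random.default_rng(1)
x_a = rng.normal(50, 10, 100)                        # group A test scores
y_a = 0.6 * x_a + 5 + rng.normal(0, 5, 100)          # group A criterion
x_b = rng.normal(50, 10, 100)                        # group B test scores
y_b = 0.6 * x_b + 5 + rng.normal(0, 5, 100)          # same true line for group B

for name, x, y in [("A", x_a, y_a), ("B", x_b, y_b)]:
    slope, intercept = np.polyfit(x, y, 1)           # per-group regression line
    see = (y - (slope * x + intercept)).std(ddof=2)  # standard error of estimate
    print(f"group {name}: slope={slope:.2f}, intercept={intercept:.2f}, SEE={see:.2f}")

# Diverging slopes suggest slope bias; diverging intercepts with similar
# slopes suggest intercept bias; unequal SEEs mean unequal prediction error.
```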
34. Slope Bias
35. Intercept Bias
36. Error of the Estimate
37. What does this tell us?
38. Bias Design of the research study
Fewer minority participants
39. Validity and Test Bias Rating Error
A judgment resulting from the intentional or unintentional misuse of a rating scale
Leniency
Severity
Central tendency
Rankings
A procedure that requires the rater to measure individuals against one another
40. Validity and Test Bias Rating Error
Halo effect
A tendency to give a particular ratee a ‘different’ rating than he/she objectively deserves, because of the rater’s failure to discriminate among conceptually distinct and potentially independent aspects of the ratee’s behavior
41. Halo Effect
42. Validity and Test Bias Test bias – statistical considerations
Test fairness
The extent to which a test is used in an impartial, just, and equitable way
43. Validity and Test Bias Common misunderstandings regarding fairness
Is a test unfair simply because it ‘discriminates’ among individuals?
Is a test unfair because a particular sample group was not included in the validation process?
Does a finding of statistical bias by itself make a test unfair?
44. Guion’s View According to Guion, construct validity is really the only kind of validity
The logic of construct validity is evidenced throughout; must be:
Reliable
Characterized by good stimuli
Be of interest; must show expected patterns of relations with other constructs
9 questions to ask in evaluating tests (Guion, 1998)
45. Questions in Test Development 1. “Did the developer of the procedure have a clear idea of the attribute to be measured?”
Boundaries of the attribute?
Behaviors that exhibit those attributes and those that do not?
Variables that it would be correlated with and those that it would not?
2. “Are the mechanics of measurement consistent with the concept?”
Appropriateness of:
Presentation medium
Rules of standardization (e.g., time limits)
Response requirements
46. 3. “Is the stimulus content appropriate?”
Requirements for content sampling alone to justify test use:
Content must be behavior that has a generally accepted meaning
Domain must be defined unambiguously
Domain must be directly relevant to the measurement purpose (sample vs. sign, à la Wernimont & Campbell, 1968)
Qualified judges must agree that the domain was properly sampled
Responses must be scored & evaluated reliably
4. “Was the test carefully and skillfully developed?”
“…I look for evidence that the plan was carried out well. The evidence depends on the plan.”
Use of pilot tests?
Based on appropriate item analysis?
47. Evidence Based on Reliability 5. “Is the internal statistical evidence satisfactory?”
Internal consistency, internal completeness (or relevance)
In CTT, discrimination & difficulty indices
6. “Are scores stable over time and consistent with alternative measures?”
Test-retest, alternate forms, interrater agreement, depending on the purpose/use of the test
Bear in mind that, “…consistency may be due to consistent error.”
48. Evidence from Patterns of Correlates 7. “Does empirical evidence confirm logically expected relationships with other variables?”
Failure to support hypotheses casts doubt on:
The validity of the inference or
The conceptual & operational definitions of the attribute
8. “Does empirical evidence disconfirm alternative meanings of test scores?”
Cronbach’s “strong program” of construct validation
Provide an explicit theory of the attribute
Identify & evaluate plausible rival inferences
49. Evidence Based on Outcomes 9. “Are the consequences of test use consistent with the meaning of the construct being measured?”
50. Messick’s View Validity as a unified concept:
Validity cannot rely on any one form of evidence
Validity does not require any one form of evidence
Validity applies to all assessments
Validity judgments are value judgments
Expose values underlying tests by pitting pros and cons of test use against alternatives
“…unintended consequences, when they occur, are also strands in the construct’s nomological network that need to be taken into account in construct theory, score interpretation, and test use” (p.744)
51. Messick’s Views (continued) Two major threats to construct validity:
Construct underrepresentation
Construct irrelevant variance
Construct-irrelevant difficulty
Construct-irrelevant easiness
“…any negative impact on individuals or groups should not derive from any source of test invalidity, such as construct underrepresentation or construct-irrelevant variance” (p. 746)
52. Messick’s Views (continued) 6 aspects of construct validity:
Content
representativeness
Substantive
theoretical rationale & findings
Structural
link between the scoring structure and the structure of the domain
Generalizability
of interpretations across groups, settings, and tasks
External
convergent & discriminant evidence
Consequential
Value implications of score interpretations as a basis for action & the actual & potential consequences of test use