Lee Cronbach and the Evolving Concept of Validity
Khusro Kidwai, Instructional Systems
Agenda • Cronbach’s Contributions • Pop Quiz • Validity over the years • Alternative models of validity • Cronbach – “Soft science” view of validity • Mellenbergh and Heerden – “Back to basics” • Kane – Argument based approach to validity • Debate
Lee J. Cronbach (1916 - 2001), pictured among other major figures in psychology and measurement: Bandura, Pearson, Fisher, Montessori, Thorndike, William James, Spearman, Vygotsky, and Binet.
What’s the connection?
• 1916: Lewis Terman released the Stanford-Binet (the year Cronbach was born).
• 1921: Cronbach took the test and scored 200. He was enrolled in the Terman gifted program.
Cronbach’s Contributions: Psychological Testing, Educational Psychology and Program Evaluation
Cronbach’s Alpha: Cronbach solved the problem of estimating the reliability of a test from a single administration; repeated occasions or parallel forms of a test were no longer needed.
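The single-administration reliability estimate described above can be sketched in Python. This is a minimal illustration of the standard alpha formula, not Cronbach's own derivation, and the score matrix in the example is invented:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha: alpha = k/(k-1) * (1 - sum(item variances) / variance(totals)).

    `scores` is an (n_respondents x k_items) array from a single test administration.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item across respondents
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of respondents' total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)
```

Items that rise and fall together inflate the total-score variance relative to the summed item variances, pushing alpha toward 1; inconsistent items pull it down.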
Generalizability Theory: What started out as a handbook on measurement with Goldine Gleser became a major re-conceptualization of reliability theory in the form of Generalizability Theory. Note: the Sampling Model of Validity uses G-Theory.
Construct Validity: The Cronbach and Meehl paper, “Construct Validity in Psychological Tests” (1955), laid the groundwork for fifty years of work on validity.
Aptitude Treatment Interaction: Cronbach worked with Dick Snow on aptitude-treatment interactions (ATIs) for over ten years. This work led him to question and eventually re-conceptualize his approach to psychological measurement.
Program Evaluation: “He was one of the early, early researchers to recognize that evaluation is an important part of education. At that time, we had a lot of federal programs and we didn’t know if they were effective.” (Olson) “Designing Educational Evaluations” was selected as one of the top one hundred education-related “Books of the 20th Century” by the Museum of Education, University of South Carolina (2000).
Pop Quiz
Instructions: Match the definitions and examples to the type of RELIABILITY or VALIDITY. Some items will have more than one matching response. Some responses can be used for more than one item.
Validity Over the Years: In the beginning…
Validity as a simple statistical correlation:
• The correlation coefficient determines the validity (Hull, 1928).
• A test is valid for anything with which it correlates (Guilford, 1946).
• Validity is the correlation of test scores with some other objective measure of that which the test is used to measure (Bingham, 1937).
• The validity of a test is the correlation of the test with some criterion (Gulliksen, 1950).
• Validity of a test is the extent to which the test measures what it purports to measure (Garrett, 1937).
Question: Height and weight scores are highly correlated. Does this mean that we can [validly] use a height measurement as a measure of weight?
Validity Over the Years: Multiple forms of validity
Criterion Validity: The validity of a test is the correlation of the test with some criterion (Gulliksen, 1950).
1. Concurrent Validity: E.g., to validate a self-concept scale that I have constructed, I administer it along with an already established self-concept scale, then check the correlation of the scores obtained on the two tests.
2. Predictive Validity: E.g., a postal clerks admissions test. If the scores on the exam really represent postal knowledge and skills, then these scores should correlate with future on-the-job evaluations.
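Both criterion checks above reduce to a correlation between two sets of scores. A minimal sketch, where the respondents and their scores are invented for illustration:

```python
import numpy as np

# Hypothetical data: 8 respondents scored on a newly constructed
# self-concept scale and on an already established one.
new_scale   = np.array([12, 15, 11, 18, 14, 16, 10, 17], dtype=float)
established = np.array([30, 36, 28, 43, 33, 39, 26, 41], dtype=float)

# Concurrent validity coefficient: Pearson correlation of the two score sets.
r = np.corrcoef(new_scale, established)[0, 1]
print(f"concurrent validity coefficient: r = {r:.2f}")
```

Predictive validity is computed the same way, except that the second score set is a later criterion (e.g., on-the-job evaluations) rather than a concurrently administered measure.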
Validity Over the Years: Multiple forms of validity
Content Validity: When a test is reviewed by subject-matter experts who verify that its content represents a satisfactory sampling of the domain, the test is obviously valid (Rulon, 1946).
E.g., researchers aim to study mathematical learning and create a survey to test for mathematical skill. If these researchers only tested for multiplication and then drew conclusions from that survey, their study would not show content validity, because it excludes other mathematical functions.
Validity Over the Years: Cronbach’s contribution
Construct Validity: Construct validity is based on accumulation of research results: formulate hypotheses, test hypotheses (Cronbach & Meehl, 1955).
E.g., if scores on my test are related to scores on a math ability test, then the scores on my test should also be related to scores on tests for other concepts related to math ability (e.g., math instruction, gender, SES).
[Slide diagram: “My Test” linked to Math Ability, Math Instruction, SES, Gender, and Age]
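The formulate-and-test logic of construct validation can be simulated. Everything here is synthetic and purely illustrative: a latent math ability drives both tests, and an unrelated variable serves as the negative check:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

ability      = rng.normal(0.0, 1.0, n)            # latent math ability (unobservable)
my_test      = ability + rng.normal(0.0, 0.5, n)  # my test: ability plus measurement noise
ability_test = ability + rng.normal(0.0, 0.5, n)  # established math ability test
unrelated    = rng.normal(0.0, 1.0, n)            # construct unrelated to math ability

# Hypotheses: my test should correlate with the related construct
# and show little or no correlation with the unrelated one.
r_related   = np.corrcoef(my_test, ability_test)[0, 1]
r_unrelated = np.corrcoef(my_test, unrelated)[0, 1]
print(f"related: r = {r_related:.2f}, unrelated: r = {r_unrelated:.2f}")
```

Each confirmed hypothesis adds one piece of evidence; construct validation accumulates such results rather than resting on any single correlation.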
Validity Over the Years: Cronbach’s contribution
Preponderance of Evidence Approach (Positivistic): Cronbach saw validation as a process of theory building and testing; validation is a never-ending process. Moreover, what was validated, according to Cronbach, was not the test itself, for a test could be used for many purposes (e.g., prediction, diagnosis, placement). Rather, what was validated was a proposed interpretation.
Validity Over the Years: The Standards
• 1954: Four types of validity: predictive, concurrent, construct, content
• 1966: Three types of validity: criterion-related, construct, content
• 1974: 1966 + we should consider issues of adverse impact and test bias
• 1985: 1974 + the test user should know the purposes of the testing and the probable consequences
Validity Over the Years: The Standards
Unitary Concept of Validity (1985 Standards): Validity is the appropriateness, meaningfulness, and usefulness of the specific inferences from test scores. There are numerous validation processes to accumulate evidence. Some of these are provided by developers, others gathered by users. ... Traditionally, the various means of accumulating validity evidence have been grouped into categories called content-related, criterion-related, and construct-related evidence of validity. These categories are convenient, ... but the use of the category labels does not imply that there are distinct types of validity or that a specific validation strategy is best for each specific inference or test use. (Standards, 1985)
Validity Over the Years: Messick’s model (1989)
Most important to least important:
1. Construct validity
2. Value implications
3. Utility and predictive validity
4. Social consequences
Validity Over the Years: Criticism of Messick’s model
• Messick has gone too far. Consequences overburden the concept of validity. We should return to construct validity. (Wiley, McGuire, etc.)
• Messick didn’t go far enough. We should reverse the hierarchy: consequences are most important and should come first; construct validity is important but only secondary. (Shepard)
Validity Over the Years: The Standards
Most recent, the 1999 Standards: Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests. ... The process of validation involves accumulating evidence to provide a sound scientific basis for the proposed score interpretations. It is the interpretations of test scores required by proposed uses that are evaluated, not the test itself. When test scores are used or interpreted in more than one way, each intended interpretation must be validated. (Sources of validity evidence include, but are not limited to: evidence based on test content, response processes, internal structure, relations to other variables, and consequences of testing.)
Alternative Models of Validity 1. Cronbach – “Soft science” view of validity 2. Mellenbergh and Heerden – “Back to basics” 3. Kane – Argument based approach to validity
Cronbach’s evolving view
1955 (“Construct Validity in Psychological Tests”, Cronbach & Meehl): Positivist. Construct validity was conceptualized in a strictly scientific way: “Construct validity is based on accumulation of research results: formulate hypotheses, test hypotheses…”
1975 (“Beyond the Two Disciplines of Scientific Psychology”, Cronbach): Tentative. The complexity of research in Aptitude Treatment Interaction made Cronbach realize that science took him just so far. He came to recognize the contribution that “other ways of knowing” had to make in understanding teaching and learning, and human action more generally. “The experimental strategy dominant in psychology since 1950 has only limited ability ...” “…too narrow an identification with science, however, has fixed our eyes upon an inappropriate goal.”
Cronbach’s evolving view (contd.)
1975 (“Beyond the Two Disciplines of Scientific Psychology”, Cronbach)
• “The 25-year old research supporting a test’s construct validity gives us little warrant for interpreting scores today because with new times the items carry new implications.”
• “There is no such thing as enduring theoretical structures in social science. Generalizations in social science decay… This puts construct validation into a new light. Because Meehl and I were importing into psychology a rationale developed out of physical science, we spoke as if a fixed reality is to be accounted for.”
• “When we give proper weight to local conditions, any generalization is a working hypothesis, not a conclusion.”
Mellenbergh and Heerden: Back to Basics!
Existing view: content, criterion, and construct validity + social consequences. Mellenbergh and Heerden note that validity theory has come to treat every important test-related issue as related to the validity concept, aiming to integrate all these issues under a single heading.
Their proposal: a test is valid for measuring a construct if (a) the construct exists, and (b) variations in the construct causally produce variations in the outcomes of the measurement procedure. E.g., a spatial ability test.
This is equivalent to asking the question that Garrett asked in 1937: whether a test measures what it should measure. Garrett’s 1937 definition: Validity of a test is the extent to which the test measures what it purports to measure.