This presentation provides an introduction to the concepts of validity and validation in measurement. It covers conceptions of validity, reliability, bias, and the steps involved in the validation process. The presentation also discusses item analysis, external associations, group discrimination, and responsiveness to change.
Validity and Validation: An introduction
Note: I have included explanatory notes for each slide. To access these, you will probably have to save the file to your computer, then look at it in the "normal view".
Conceptions of Validity
• Does this test measure what it is supposed to measure?
• Does the measure correlate with other (perhaps more expensive or invasive) measures of the concept?
• Validity is present when the attribute to be measured causes variations in the measurement outcomes
• What interpretations can fairly be placed on scores produced by this measurement?
• How well does this measure fit in with a hypothesized network of related, and contrasting, constructs?
Reliability and Validity
[Figure: four "target" diagrams contrasting low and high reliability with low and high validity. High reliability with low validity gives a biased result. Low reliability with high validity means the average of the inaccurate results is not bad; this is probably how screening questionnaires (e.g., for depression) work.]
Validity viewed as error in measurement
[Figure: two types of error in repeated measurements. With random error, the individual measures scatter around the true value (which we are trying to record); with bias, the measures cluster away from the true value.]
The idea of Bias: Measured versus Self-Reported Height and Weight. 2005 Canadian Community Health Survey (N = 4,080 adults aged 18+)
• A. Bias: self-reported height 1 cm taller; weight 2 kg lighter
• B. Bias: BMI underestimated by 1 unit
• C. Bias: 10% underestimate in obesity; 6% underestimate
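To see how the small reporting biases in panel A translate into the one-unit BMI shift in panel B, here is a minimal worked example in Python; the individual height and weight values are hypothetical, not drawn from the survey.

```python
# Hypothetical individual, for illustration only (not CCHS data).
# BMI = weight (kg) / height (m)^2
measured_height, measured_weight = 1.75, 80.0
reported_height, reported_weight = 1.76, 78.0   # reports 1 cm taller, 2 kg lighter

bmi = lambda weight, height: weight / height ** 2
print(f"BMI from measured values:      {bmi(measured_weight, measured_height):.1f}")   # ~26.1
print(f"BMI from self-reported values: {bmi(reported_weight, reported_height):.1f}")   # ~25.2
```

A drop of roughly one BMI unit is enough to move people who sit just above the obesity cut-off below it, which is why self-report also underestimates obesity prevalence (panel C).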
First steps in Validation: the design stage
"Does it measure what it's supposed to measure?"
• First, decide and define what you want to measure (if it's an abstract concept, explain the conceptual basis)
• Select indicators or "items" that represent the topic (this involves content validity: a sampling of potential questions)
• Check that items are clear, comprehensible and relevant (face validity, "sensibility")
• This produces a pool of items ready for the item analysis stage, which involves administering the test and analyzing responses (later slides).
Clarity of question wording
• Most questionnaires are written by educated people, and they tend to use long and complicated words that may not be understood by respondents with less language ability.
• 'Cognitive testing' should be a component of developing any questionnaire. E.g., discuss with the respondent in a pilot testing session how he/she understood the question.
• Checking the language using a 'readability formula' may be helpful: Dale-Chall formula; Flesch Reading Ease; Fog Index; Fry Graph (see the sketch below).
• These often show that questionnaires (and consent forms!) are too difficult for many people to understand.
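As one illustration, here is a minimal sketch of the Flesch Reading Ease formula in Python. The syllable counter is a crude heuristic and the example question is invented; for real work, an established readability library or the published word lists would be preferable.

```python
import re

def count_syllables(word: str) -> int:
    """Very rough syllable count: runs of vowels, minus a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores (0-100) mean easier text."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

# Invented questionnaire item: long words and clauses pull the score down
question = ("During the past 12 months, how frequently did you experience "
            "difficulty initiating or maintaining sleep?")
print(round(flesch_reading_ease(question), 1))
```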
Second steps in validation: field testing
• Administer your measurement to a sample of the types of person for whom it will be used:
  • Typically a pre-test on selected respondents, to ensure the questions are clear
  • Often using interviews, followed by qualitative debriefing interviews with respondents
  • May lead to re-wording questions, changing procedures, etc.
• Field testing on a larger sample:
  • Data are collected and analyzed, both for the scale as a whole to test overall performance, and item by item (item analysis) to identify items that are not performing.
  • Analyses of items and overall performance may be done in any sequence, often with re-analyses discarding items that do not perform.
Item analyses: Checking the internal structure
Item analysis refers to a series of checks on the performance of each "item" (e.g., question). Some analyses fall under the heading of reliability, some validity. Faulty items are discarded or replaced. Item analyses include:
• Item distributions & missing values: an item that does not vary, or that people don't answer, cannot measure anything
• Internal structure: correlations among items, perhaps leading to factor analysis (factorial validity)
• Item response theory (IRT) analyses, which check the contribution of each item to the overall scale
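A minimal sketch of these item-level checks in Python, using invented Likert-type responses; the factor-analytic and IRT steps would follow the same pattern with specialised packages and are not shown here.

```python
import numpy as np
import pandas as pd

# Invented data: rows = respondents, columns = questionnaire items (0-4 scores)
items = pd.DataFrame({
    "q1": [0, 1, 2, 3, 4, 2, 1, 3],
    "q2": [1, 1, 2, 4, 3, 2, 2, 3],
    "q3": [0, 0, 0, 0, 0, 0, 0, 1],           # barely varies -> suspect item
    "q4": [0, 2, np.nan, 3, 4, np.nan, 1, 3],  # frequent missing values -> suspect item
})

# 1. Item distributions and missing values
print(items.describe())
print("proportion missing per item:\n", items.isna().mean())

# 2. Corrected item-total correlations (each item vs. the sum of the *other* items)
complete = items.dropna()
for col in complete.columns:
    rest = complete.drop(columns=col).sum(axis=1)
    print(col, "item-total r =", round(complete[col].corr(rest), 2))

# 3. Cronbach's alpha as a summary of internal consistency
k = complete.shape[1]
alpha = k / (k - 1) * (1 - complete.var(ddof=1).sum() / complete.sum(axis=1).var(ddof=1))
print("Cronbach's alpha =", round(alpha, 2))
```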
Performance of the overall test: External associations
• Compare the scores against a criterion, if a "gold standard" exists. Criterion validity: sensitivity & specificity are the usual statistics.
• Compare against other measures: concurrent validation. Correlations are often divided into convergent and discriminant coefficients, according to hypothesized associations.
• Comparisons against a set of other indicators lead to construct validation. You begin from hypotheses covering the expected correlations with as wide a range of indicators as possible.
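A minimal sketch of these external-association checks on simulated data; all variable names, cut-offs and values below are invented for illustration.

```python
import numpy as np

# Simulated data: new test score, gold-standard diagnosis (1 = case), and two
# comparison measures for convergent/discriminant correlations.
rng = np.random.default_rng(0)
n = 200
diagnosis = rng.integers(0, 2, n)                 # gold standard
score = diagnosis * 2.0 + rng.normal(0, 1.5, n)   # new test tracks the diagnosis
related = score * 0.8 + rng.normal(0, 1.0, n)     # similar construct
unrelated = rng.normal(0, 1.0, n)                 # contrasting construct

# Criterion validity: sensitivity and specificity at a chosen cut-off
positive = score >= 1.0
sensitivity = (positive & (diagnosis == 1)).sum() / (diagnosis == 1).sum()
specificity = (~positive & (diagnosis == 0)).sum() / (diagnosis == 0).sum()
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")

# Convergent vs. discriminant correlations: expect high r with 'related',
# low r with 'unrelated'
print("convergent r   =", round(np.corrcoef(score, related)[0, 1], 2))
print("discriminant r =", round(np.corrcoef(score, unrelated)[0, 1], 2))
```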
Performance of the overall test: Group Discrimination
• Once you have shown that scores correlate with other measures as intended, the measure's performance is evaluated by contrasting different groups of respondents.
• "Known groups validation": can it distinguish well people from sick people? (Similar to criterion validity.)
• Responsiveness to change: sensitivity to change over time (this is important in an evaluative measure).
• Do scores show ceiling or floor effects?
• If the measure is not performing as expected, return to the item analyses to diagnose where the weakness lies. This may lead to redesign and further field testing.
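A brief sketch of the group-discrimination, floor/ceiling and responsiveness checks, again on simulated scores. The standardized response mean (SRM) is used here as one common responsiveness index; the groups, scale range and effect sizes are invented.

```python
import numpy as np
from scipy import stats

# Simulated scale scores (range 0-30) for two clinically defined groups
rng = np.random.default_rng(1)
well = np.clip(rng.normal(8, 4, 120), 0, 30)
sick = np.clip(rng.normal(16, 5, 80), 0, 30)

# Known-groups validation: does the scale separate the groups?
t, p = stats.ttest_ind(well, sick, equal_var=False)
print(f"t = {t:.1f}, p = {p:.3g}")

# Ceiling and floor effects: proportion of respondents at the scale extremes
scores = np.concatenate([well, sick])
print("floor (% at minimum):  ", round(100 * np.mean(scores <= 0), 1))
print("ceiling (% at maximum):", round(100 * np.mean(scores >= 30), 1))

# Responsiveness to change: standardized response mean = mean change / SD of change
before, after = sick, sick - rng.normal(4, 3, sick.size)   # scores after treatment
change = before - after
print("SRM =", round(change.mean() / change.std(ddof=1), 2))
```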
Comments
• Validation is rarely complete. Many instruments continue to be checked for validity years after their development. Times change, phrasing makes old items obsolete, and you can also test validity for different purposes.
• Validation is long and expensive. Basic test development and validation may take 3 to 5 years: it's not a thesis project.
• Remember: validity is about the interpretation of scores. It is a relative concept: a test is not valid or invalid in itself, but only valid or not for a particular application.
• But recall the Viagra principle: a test intended for one purpose may prove good for an unanticipated application.