Measuring Research Variables
KNES 510 Research Methods in Kinesiology
Evidence of Validity
The American Educational Research Association and the American Psychological Association agree on the definition of four types of validity:
• Logical Validity
• Content Validity
• Criterion Validity
  • Concurrent Validity
  • Predictive Validity
• Construct Validity
Logical or Face Validity
• What is logical or face validity?
• Logical validity is determined subjectively
• What are some examples of logical validity?
Content Validity
• What is content validity?
• A test with content validity adequately measures the skills and/or material that have been presented in class
Concurrent Validity
• What is concurrent validity?
• Concurrent validity is determined by correlating scores on a test with scores on a criterion measure
• The resulting correlation coefficient is called a validity coefficient
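As a minimal sketch of this calculation (the scores below are invented for illustration, not taken from the course), a validity coefficient is simply the Pearson correlation between the test and the criterion:

```python
# Hypothetical concurrent validity check: a 1.5-mile run estimate of
# VO2 max correlated against laboratory-measured VO2 max (the criterion).
from scipy.stats import pearsonr

run_estimate = [38.2, 41.5, 44.1, 36.8, 49.3, 42.7, 40.0, 45.6]   # from the field test
lab_vo2max   = [37.5, 42.9, 45.0, 35.1, 50.2, 41.8, 39.4, 46.3]   # criterion measure

r, p = pearsonr(run_estimate, lab_vo2max)
print(f"Validity coefficient r = {r:.3f} (p = {p:.4f})")
```

A coefficient near +1.0 would indicate that the field test ranks people much as the criterion measure does.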
Examples of Concurrent Validity
• VO2 max (criterion: oxygen consumption)
  • Distance runs (e.g., 1.0-mile, 1.5-mile, 9-minute, 12-minute, 20-minute shuttle)
  • Submaximal tests (e.g., cycle, treadmill, swimming)
  • Nonexercise models (e.g., self-reported physical activity)
• Body fat (criterion: hydrostatically determined body fat)
  • Skinfolds
  • Anthropometric measures
• Sport skills (criterion: game performance, expert ratings)
  • Sport skills tests
Predictive Validity
• What is predictive validity?
• When, and why, are we interested in determining predictive validity?
Examples of Predictive Validity
• Heart disease (criterion: heart disease developed in later life)
  • Present diet, exercise behaviors, blood pressure, family history
• Success in graduate school (criterion: grade-point average or graduation status)
  • Graduate Record Examination scores
  • Undergraduate grade-point average
• Job capabilities (criterion: successful job performance)
  • Physical abilities
  • Cognitive abilities
• Predictive validity is established by correlating scores on a test given now with scores on a criterion measure obtained in the future
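As a hedged sketch (all numbers, including the example GRE score of 312, are hypothetical), predictive validity can be examined by correlating current test scores with the future criterion and then using the fitted line for prediction:

```python
# Hypothetical predictive validity example: GRE scores taken now,
# graduate GPA observed years later (the criterion). Data are invented.
from scipy.stats import linregress

gre_scores = [300, 310, 305, 320, 315, 325, 308, 318]   # present test
grad_gpa   = [3.1, 3.4, 3.2, 3.8, 3.5, 3.9, 3.3, 3.6]   # future criterion

fit = linregress(gre_scores, grad_gpa)
print(f"Predictive validity coefficient r = {fit.rvalue:.3f}")
print(f"Predicted GPA for a GRE score of 312: {fit.intercept + fit.slope * 312:.2f}")
```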
Construct Validity
• What is construct validity?
• Construct validity is used with abstract rather than concrete tests
• An abstract test measures something that is not directly observable
Examples of abstract measures:
• Attitudes
• Personality characteristics
• Other unobservable yet theoretically existing traits
• Construct validity is established by finding two groups known to differ on the variable, or construct, being tested
• The test is then administered to both groups to determine if there is a significant difference between the scores for the two groups
• This is the known group difference method
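A minimal sketch of the known group difference method, assuming an invented anxiety questionnaire and two groups expected in advance to differ on the construct:

```python
# Known group difference method (hypothetical data): administer the
# test to two groups known to differ on the construct and compare means.
from scipy.stats import ttest_ind

clinical_group   = [42, 38, 45, 40, 44, 39, 41]   # expected to score high
comparison_group = [25, 30, 28, 27, 24, 29, 26]   # expected to score low

t, p = ttest_ind(clinical_group, comparison_group)
print(f"t = {t:.2f}, p = {p:.4f}")
# A significant difference in the expected direction supports construct validity.
```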
Reliability
• Reliability – the degree of consistency with which a test measures what it measures
• In order to be valid, a test must also be reliable
• Observed score = True score + Error score
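The equation Observed score = True score + Error score can be illustrated with a small simulation (the distributions below are assumptions chosen for the example): reliability then appears as the share of observed-score variance that comes from true scores.

```python
# Simulating the true-score model: Observed = True + Error.
import numpy as np

rng = np.random.default_rng(0)
true_scores = rng.normal(50, 10, size=1000)   # each person's stable true score
errors      = rng.normal(0, 4, size=1000)     # random measurement error
observed    = true_scores + errors

# Reliability = true-score variance / observed-score variance
print(f"Reliability ≈ {true_scores.var() / observed.var():.2f}")
```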
Types of Reliability
• Stability Reliability – the degree to which an individual's scores are unchanged from day to day
• We use the test-retest method to obtain the stability reliability coefficient
• Each person is measured with the same test or instrument on several (usually 2) different days (Day 1, Day 2, and so on)
• The correlation between the two sets of scores is the stability reliability coefficient
• The closer this coefficient is to positive one (+1.0), the more stable and reliable the scores
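As a brief sketch of the test-retest method (the scores are invented), the stability reliability coefficient is the correlation between the Day 1 and Day 2 scores:

```python
# Hypothetical test-retest data: the same six people on two days.
import numpy as np

day1 = [18, 22, 25, 19, 30, 27]   # e.g., push-up counts, Day 1
day2 = [19, 21, 26, 18, 31, 26]   # same test repeated on Day 2

stability_r = np.corrcoef(day1, day2)[0, 1]
print(f"Stability reliability coefficient = {stability_r:.3f}")
```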
Three factors can contribute to poor score stability (a low stability reliability coefficient):
• the people tested may perform differently
• the measuring instrument may operate or be applied differently
• the person administering the measurement may change
• Internal-Consistency Reliability – the degree to which an individual's scores are unchanged within a day
• We use the multiple-trials-within-a-day method to obtain the internal-consistency reliability coefficient
• To obtain an internal-consistency reliability coefficient, the evaluator must give at least 2 trials of the test within a single day
• Change in the scores of the people being tested from trial to trial indicates a lack of test reliability
• The correlation among the trial scores is the internal-consistency reliability coefficient
• What types of tests should not be evaluated for reliability using this method?
Stability versus Internal Consistency
• The internal-consistency reliability coefficient is usually higher than the stability reliability coefficient
• With the test-retest method, some learning or increase in performance will usually occur, even though it is presumed that ability will not change
• After completing a test for the first time, subjects will often perform better on the second administration
• This improvement is referred to as a learning effect
• The learning effect is a threat to reliability
• How do we avoid this problem?
Methods of Calculating a Reliability Coefficient
• Pearson's r
• Intraclass R from One-Way ANOVA
• Cronbach's Alpha
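As a sketch of one of these methods, Cronbach's alpha can be computed directly from a people-by-trials score matrix (the data and the helper function below are illustrative, not from the course materials):

```python
# Cronbach's alpha for a score matrix: rows = people, columns = trials.
import numpy as np

def cronbach_alpha(scores):
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of trials (items)
    trial_vars = scores.var(axis=0, ddof=1)      # variance of each trial
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the summed scores
    return (k / (k - 1)) * (1 - trial_vars.sum() / total_var)

data = [[10, 11, 10],
        [14, 15, 15],
        [12, 12, 13],
        [18, 17, 18],
        [16, 16, 15]]
print(f"Cronbach's alpha = {cronbach_alpha(data):.3f}")
```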
SPSS Output for Intraclass Reliability Analysis (Cronbach's Alpha)
Here is the SPSS output with the value for Cronbach's alpha:
SPSS Output for Intraclass R from One-Way ANOVA Table
This ANOVA table may be used to calculate the intraclass R:
R = (MS between subjects − MS within subjects) / MS between subjects = (31.500 − 0.333) / 31.500 = 0.989
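To show where a value like 0.989 comes from, here is a sketch that computes the two mean squares from an invented people-by-trials matrix and applies the formula on the slide:

```python
# Intraclass R from a one-way (subjects) ANOVA:
# R = (MS_between - MS_within) / MS_between. Data are invented.
import numpy as np

scores = np.array([[ 9, 10,  9],
                   [14, 15, 14],
                   [12, 11, 12],
                   [18, 18, 17],
                   [16, 15, 16]], dtype=float)

n, k = scores.shape
subject_means = scores.mean(axis=1)
grand_mean = scores.mean()

ms_between = k * ((subject_means - grand_mean) ** 2).sum() / (n - 1)
ms_within = ((scores - subject_means[:, None]) ** 2).sum() / (n * (k - 1))

R = (ms_between - ms_within) / ms_between
print(f"Intraclass R = {R:.3f}")
```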
Acceptable Reliability
• R = 0.70-0.79 is below average
• R = 0.80-0.89 is average
• R = 0.90-1.00 is above average
Intertester Reliability (Objectivity)
• Objectivity (rater reliability) – the degree to which multiple scorers agree on the magnitude of scores
• How can the objectivity of a test be improved?
Next Class
• Mock proposals