Reliability • Consistency • Test Scores & Error • X = T + E • As the true-score component (T) increases and error (E) decreases, reliability increases • Variance & Error Variance
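A compact way to state this, in standard classical-test-theory notation (not reproduced from the slide): reliability is the share of observed-score variance that is true-score variance.

```latex
X = T + E, \qquad
r_{XX'} \;=\; \frac{\sigma^2_T}{\sigma^2_X} \;=\; \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
```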
Sources of Error • Test Construction/Content • Sampling; finite number of questions • Poorly written questions • Test Administration • Error related to the test taker • Error related to the test environment • Error related to the examiner
Sources of Error (cont.) • Test scoring & interpretation • Objective vs. subjective scoring • Scoring rubrics
Parallel Tests • Theoretical underpinning of reliability • Similar content • Same true score & same error variance • A theoretical construct, not actually produced in practice • Not to be confused with “alternate forms” • Reliability can be defined as the correlation between 2 parallel tests
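A minimal simulation sketch (my own illustration, not from the slides) of the parallel-tests idea: two forms built from the same true scores with equal, independent error correlate at about σ²_T / (σ²_T + σ²_E).

```python
# Hypothetical simulation: two "parallel" forms share the same true scores and
# have equal error variance, so their correlation approximates the reliability.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_scores = rng.normal(50, 10, n)          # sigma_T = 10
form_a = true_scores + rng.normal(0, 5, n)   # sigma_E = 5
form_b = true_scores + rng.normal(0, 5, n)   # same error variance, independent error

observed_r = np.corrcoef(form_a, form_b)[0, 1]
theoretical = 10**2 / (10**2 + 5**2)         # 0.80
print(round(observed_r, 2), theoretical)
```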
Types of Reliability • Reliability over time • Internal consistency/reliability • Inter-rater reliability
Reliability over time • Test-retest reliability • Obtained by correlating pairs of scores from the same sample on two different administrations of the same test • Error related to the passage of time & intervening factors • Alternate-form reliability (immediate & delayed) • Error related to both time & content
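A minimal sketch of how a test-retest coefficient is obtained; the paired scores and variable names below are hypothetical.

```python
import numpy as np

# Same sample of test takers, same test, two administrations
time_1 = np.array([12, 15, 9, 20, 17, 14, 11, 18])
time_2 = np.array([13, 14, 10, 19, 18, 15, 10, 17])

# Test-retest reliability: Pearson r between the two administrations
r_test_retest = np.corrcoef(time_1, time_2)[0, 1]
print(round(r_test_retest, 2))
```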
Internal Consistency • Split-half • Divide the test into two equivalent halves • Odd-even split • Random assignment of items • Halves matched for item equivalence • Calculate r between the 2 halves • Correct with the Spearman-Brown formula (see the sketch below) • Allows estimation of the reliability of a test that has been shortened or lengthened
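A sketch of the Spearman-Brown correction; the formula is the standard one, while the function name and example values are mine.

```python
# Spearman-Brown: n is the factor by which the test is lengthened.
# n = 2 corrects a split-half r up to the full-length test; n < 1 estimates
# the reliability of a shortened test.
def spearman_brown(r: float, n: float) -> float:
    return (n * r) / (1 + (n - 1) * r)

print(round(spearman_brown(0.70, 2), 2))    # split-half r of .70 -> ~.82 full test
print(round(spearman_brown(0.82, 0.5), 2))  # estimated reliability if the test were halved
```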
Internal Consistency (cont.) • Inter-item consistency • An index of the homogeneity of the test; the degree to which all items measure the same construct • Desirable: item homogeneity aids interpretation of the test (as opposed to homogeneity of groups)
Internal Consistency (cont.) • Kuder-Richardson formulas • KR-20: statistic of choice for determining the reliability of tests with dichotomous (right-wrong) items • KR-21: can be used if it is assumed that all items are of similar difficulty
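For reference, the standard KR-20 and KR-21 formulas in textbook notation (not shown on the slide): k is the number of items, p_i the proportion passing item i, q_i = 1 − p_i, M the mean total score, and σ²_X the total-score variance.

```latex
KR\text{-}20 = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma^2_X}\right)
\qquad
KR\text{-}21 = \frac{k}{k-1}\left(1 - \frac{M\,(k - M)}{k\,\sigma^2_X}\right)
```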
Internal Consistency (cont.) • Cronbach’s coefficient alpha • A function of all items on the test & the total test score • Each item is conceptualized as a test: a 36-item test is treated as 36 parallel tests • Can be used not only with dichotomous items but also with nondichotomous items, e.g., opinion scales or tests that allow partial credit
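A minimal sketch of coefficient alpha computed from a persons-by-items score matrix; the ratings are hypothetical, and the formula is the standard one.

```python
import numpy as np

# rows = test takers, columns = items (here, rating-scale items scored 0-4)
scores = np.array([
    [4, 3, 4, 4],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 3, 4],
])

k = scores.shape[1]
item_variances = scores.var(axis=0, ddof=1)        # variance of each item
total_variance = scores.sum(axis=1).var(ddof=1)    # variance of total scores
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(round(alpha, 2))
```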
Inter-rater reliability • How well do 2 raters/judges agree? • Correlation between the scores from the 2 raters • Percentage of agreement: the percentage of intervals in which both raters agreed that the behavior occurred • Kappa (agreement corrected for chance)
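A sketch of percent agreement and Cohen’s kappa for two raters coding the same intervals; the ratings are hypothetical, and the kappa formula is the standard chance-corrected one.

```python
import numpy as np

rater_1 = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])  # 1 = behavior occurred
rater_2 = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

percent_agreement = np.mean(rater_1 == rater_2)

# Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)
p_o = percent_agreement
p_e = (rater_1.mean() * rater_2.mean()
       + (1 - rater_1.mean()) * (1 - rater_2.mean()))
kappa = (p_o - p_e) / (1 - p_e)
print(percent_agreement, round(kappa, 2))
```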
Factors influencing reliability • Length of test • Longer tests increase percentage of domain that can be sampled • Point of diminishing returns • Homogeneity of items • Measure same construct; easier to interpret • Dynamic or static characteristics
Factors influencing reliability (cont.) • Homogeneity of sample • Restriction of range • If the sample is homogeneous, observed variance is largely error (see the simulation below) • Power vs. speed tests • For speed tests, use test-retest, alternate forms, or split-half reliability from 2 separately timed half-tests • Internal consistency is not applicable • Because speed-test items are easy, internal-consistency estimates inflate reliability
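A small simulation sketch (my own illustration) of restriction of range: the same instrument looks far less reliable in a homogeneous sample, because little true-score variance remains relative to error.

```python
import numpy as np

rng = np.random.default_rng(1)
true_scores = rng.normal(100, 15, 50_000)
form_a = true_scores + rng.normal(0, 7, 50_000)
form_b = true_scores + rng.normal(0, 7, 50_000)

full_r = np.corrcoef(form_a, form_b)[0, 1]

# Keep only a narrow (homogeneous) slice of the sample
mask = (true_scores > 95) & (true_scores < 105)
restricted_r = np.corrcoef(form_a[mask], form_b[mask])[0, 1]

print(round(full_r, 2), round(restricted_r, 2))   # restricted r is markedly lower
```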
Reliability of Individual Scores • How much error is in an individual score? • How much confidence do we have in a particular score? • Standard Error of Measurement • Extent to which one individual’s scores vary over tests that are presumed to be parallel • Assume error is distributed “normally” • Where is the individual’s “true” score?
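The usual formula, in standard notation (an addition of mine, since the slide does not show it): s is the test’s standard deviation and r_{XX'} its reliability.

```latex
SEM = s\,\sqrt{1 - r_{XX'}}
```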
SEM (cont.) • Odds are 68% that “true” score falls within plus or minus 1 SEM. • Odds are __% that “true” score falls within plus or minus 2 (1.96) SEM. • Odds are __% that “true” score falls within plus or minus 3 SEM. • WHAT IS THE RELATIONSHIP BETWEEN RELIABILITY & SEM?
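A worked sketch, assuming a test SD of 15 and an observed score of 110 (hypothetical numbers): because SEM = SD·√(1 − r), the SEM, and the confidence band around a score, shrinks as reliability rises. That is the relationship the slide asks about.

```python
import math

sd = 15          # test standard deviation (hypothetical)
observed = 110   # one test taker's observed score (hypothetical)

for reliability in (0.70, 0.85, 0.95):
    sem = sd * math.sqrt(1 - reliability)
    low, high = observed - 1.96 * sem, observed + 1.96 * sem
    print(f"r={reliability:.2f}  SEM={sem:.1f}  95% band: {low:.1f} to {high:.1f}")
```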
Standard Error of the Difference of Two Scores • Compare a test taker’s performance on two different tests • Compare two test takers on the same test • Compare two test takers on two different tests
Standard Error of the Difference • Set confidence intervals for difference scores • Difference scores contain error from both of the comparison measures. • Difference scores are less reliable than scores from individual tests.
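A common textbook form of the standard error of the difference, assuming both scores are expressed on the same scale with standard deviation σ, where r_1 and r_2 are the two reliabilities (notation mine, not the slide’s):

```latex
\sigma_{\text{diff}} \;=\; \sqrt{SEM_1^{\,2} + SEM_2^{\,2}} \;=\; \sigma\,\sqrt{2 - r_1 - r_2}
```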
Test-retest reliability: Social Interaction Self-Statement Test • r(+1,+2) = .99 • r(-1,-2) = .99 • r(+1,-1) = -.45 • r(+1,-2) = -.55 • r(+2,-1) = -.47 • r(+2,-2) = -.56 • (+ / - = positive/negative self-statement subscale; 1 / 2 = first/second administration)