Measures of Reliability in Sports Medicine and Science. Will G. Hopkins. Sports Medicine 30(4): 1-25, 2000.
Measurement Error & Reliability
• Measurement error makes the observed value differ from the true value.
• Reliability refers to the reproducibility of values in repeated trials on the same subjects.
• The purpose is to quantify random error or ‘noise’.
• The smaller the random error, the better the measure.
Measures of Reliability
• Within-Subject Variation
  • Affects the precision of estimates of change in the variable of an experiment.
  • The smaller the within-subject variation, the easier it is to measure change in performance.
• Change in the Mean has two components:
  • Random change due to sampling error
  • Systematic change (learning or training effect)
• Retest Correlation: represents how closely one trial matches another trial.
Within-Subject Variation
• Within-subject variation is the random variation in an individual's values over trials.
• Given 6 trials of subject 1: 71, 76, 74, 79, 79, 76
• The SD of the within-subject variation is called the standard error of measurement (SEM).
• The SEM represents the ‘typical error’.
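The six trials listed for subject 1 can be used to illustrate the idea. The snippet below is a minimal sketch that assumes the sample SD (n - 1 denominator); the slides do not say which denominator is intended.

```python
import statistics

# The six trials listed for subject 1.
trials = [71, 76, 74, 79, 79, 76]

# Sample SD (n - 1 denominator) of one subject's repeated trials.
within_subject_sd = statistics.stdev(trials)
print(f"mean = {statistics.mean(trials):.1f}, SD = {within_subject_sd:.2f}")
```

A single subject gives only a rough estimate; in practice the typical error is pooled over many subjects.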
‘Typical Error’
• To estimate ‘typical error’ use many subjects and a few trials.
Computing Typical Error
• Compute difference scores
• Compute SD of difference scores
• Divide SD of difference scores by $\sqrt{2}$
Typical Error = 4.1 / $\sqrt{2}$
Typical Error = 2.9
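The three steps (difference scores, SD of the differences, division by the square root of 2) can be sketched as follows. The function name and the five-subject data set are hypothetical and chosen only for illustration; the last line reproduces the slide's own numbers.

```python
import math
import statistics

def typical_error(trial1, trial2):
    """Typical error from two trials: SD of the difference scores / sqrt(2)."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    return statistics.stdev(diffs) / math.sqrt(2)

# Hypothetical scores for five subjects (not from the slides).
trial1 = [70, 65, 80, 75, 72]
trial2 = [73, 64, 84, 74, 77]
print(round(typical_error(trial1, trial2), 2))

# The slide's numbers: an SD of differences of 4.1 gives about 2.9.
print(round(4.1 / math.sqrt(2), 1))
```

The division by the square root of 2 is needed because a difference score carries the random error of both trials.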
Typical Error as a Percentage
• For many measures the typical error gets bigger as the value gets bigger.
• Athlete 1 has a mean ± typical error of 378.6 ± 4.4.
• Athlete 2 has a mean ± typical error of 453.1 ± 6.1.
• When each typical error is expressed as a percentage of the athlete's mean, the values are similar: 1.2% and 1.3%.
• This form of typical error is a coefficient of variation. Because it is dimensionless, it allows direct comparison of reliability.
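The percentages quoted for the two athletes can be checked with a short calculation. The helper name `cv_percent` is an assumption made for this sketch.

```python
def cv_percent(mean, typical_error):
    """Typical error expressed as a percentage of the mean."""
    return 100.0 * typical_error / mean

# Mean and typical error for the two athletes on the slide.
athletes = {"Athlete 1": (378.6, 4.4), "Athlete 2": (453.1, 6.1)}
for name, (mean, te) in athletes.items():
    print(f"{name}: {cv_percent(mean, te):.1f}%")
```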
Change in the Mean
• The change in the mean as a measure of reliability has two components:
  • Random change due to sampling error.
  • Systematic change due to learning effects, fatigue, lack of motivation or training effects.
• Be sure to give the subjects sufficient training to acclimate to the experiment before it begins, to avoid learning effects.
Retest Correlation
• The retest correlation (r) is not as good a measure of reliability as ‘typical error’.
• The retest r is sensitive to the heterogeneity (spread) of values between participants.
• The ‘typical error’ can be estimated from a sample that isn’t even particularly representative.
• You cannot compare the reliability of two measures on the basis of their retest r alone: the retest r can change with a different sample if the heterogeneity is different.
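The dependence of r on heterogeneity can be illustrated numerically. This sketch assumes the standard relation s = S·sqrt(1 - r) between typical error s, between-subject SD S and retest correlation r, rearranged to r = 1 - (s/S)^2; the values of s and S are hypothetical.

```python
def retest_r(typical_error, between_subject_sd):
    """Retest correlation implied by r = 1 - (s / S)**2 (assumed relation)."""
    return 1.0 - (typical_error / between_subject_sd) ** 2

s = 2.0  # hypothetical typical error, identical in both samples
for label, S in [("heterogeneous sample", 10.0), ("homogeneous sample", 4.0)]:
    print(f"{label}: S = {S}, r = {retest_r(s, S):.2f}")
```

The measurement noise is the same in both cases, yet the correlation drops sharply in the homogeneous sample.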
Threshold for a ‘Real Change’
• A change of 1.5 to 2.0 times the ‘typical error’ represents a real change.
• Ex: if the ‘typical error’ for the sum of 7 skinfolds is 1.6 mm, an observed change of at least 2 to 3 mm would indicate a real change.
• The value of ‘typical error’ must come from a short time period (1-2 days for skinfolds), in which there is no change in the subjects between trials.
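The skinfold example is a simple multiplication, sketched here with a hypothetical helper. The 1.5 and 2.0 multipliers are the ones quoted on the slide.

```python
def real_change_threshold(typical_error, low=1.5, high=2.0):
    """Range of observed change that would count as a real change."""
    return low * typical_error, high * typical_error

# Typical error of 1.6 mm for the sum of 7 skinfolds.
lo, hi = real_change_threshold(1.6)
print(f"{lo:.1f} to {hi:.1f} mm")
```

The result is consistent with the rounded "2 to 3 mm" given in the example.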
Estimation of Sample Size
• To use ‘typical error’, the reliability study must have the same duration as the intended study.
• The ‘typical error’ of the dependent variable represents the noise that obscures the change in the mean from pre to post.
• Using ‘typical error’, the sample sizes will tend to be unrealistically large.
• Sample size should be chosen to give adequate precision for an outcome.
• Precision is defined by 95% confidence intervals:
  • The range in which the true value is 95% likely to occur.
Estimation of Sample Size
• In a (pre - post) design, statistical theory predicts confidence limits of $\pm t_{0.975,\,n-1}\, s \sqrt{2/n}$
  • n is the sample size
  • s is the ‘typical error’
  • t is the t statistic
• Equating this to the confidence limits representing adequate precision ($\pm d$): $n = 2(t s / d)^2 \approx 8 s^2 / d^2$
Sample Size and Reliability
• Sample size is proportional to the ‘typical error’ squared.
• Reduce the ‘typical error’ and you need fewer subjects.
• When the ‘typical error’ equals the smallest worthwhile effect (s = d) you only need 10 subjects.
• A test with twice the typical error would require 4 times the subjects.
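These scaling claims follow from n = 2(t·s/d)^2, which reduces to 8·s^2/d^2 when t is taken as 2. The sketch below uses a hypothetical function name and takes t as an input rather than looking it up. With s = d the approximation gives 8; using the larger t value that applies at small n (about 2.262 for 9 degrees of freedom) gives roughly 10, which appears to be where the slide's figure comes from.

```python
def approx_sample_size(typical_error, smallest_effect, t=2.0):
    """n = 2 * (t * s / d)**2; with t = 2 this is 8 * s**2 / d**2."""
    return 2.0 * (t * typical_error / smallest_effect) ** 2

print(approx_sample_size(1.0, 1.0))           # s = d, t = 2
print(approx_sample_size(2.0, 1.0))           # typical error doubled
print(approx_sample_size(1.0, 1.0, t=2.262))  # s = d, t for 9 degrees of freedom
```

Doubling the typical error multiplies the result by four, as the last bullet states.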
Estimation of Individual Differences
• Individual differences occur when the response to a treatment differs between subjects.
• To estimate individual differences ($S_{diff}$): $S_{diff} = \sqrt{2 s_{expt}^2 - 2 s^2}$, where $s_{expt}$ is the inflated typical error of the experimental group and $s$ is the typical error in the control group (or from a reliability study).
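A minimal sketch of the individual-differences formula, Sdiff = sqrt(2·s_expt^2 - 2·s^2), follows. The function name and the input values are hypothetical. Returning NaN when the experimental group looks less noisy than the control group is a choice made for this sketch, not something the slides specify.

```python
import math

def individual_differences_sd(s_expt, s_control):
    """SD of individual responses from the two typical errors."""
    variance = 2.0 * s_expt ** 2 - 2.0 * s_control ** 2
    if variance < 0:
        # Sampling error can make s_expt smaller than s_control;
        # the square root is then undefined.
        return float("nan")
    return math.sqrt(variance)

# Hypothetical typical errors: 2.5 (experimental) and 1.5 (control).
print(round(individual_differences_sd(2.5, 1.5), 2))
```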
Acceptable Likely Range for Typical Error
• 15 subjects, 4 trials, typical error 1%
• True typical error = 1% × 1.24 to 1% ÷ 1.24 = 1.24% to 0.81%
• 50 subjects, 3 trials reduces the factors to 1.32 - 0.76
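The likely range is obtained by multiplying and dividing the observed typical error by the same factor. The sketch below takes the factor as given (1.24 for the first example) and does not attempt to derive it from the number of subjects and trials; the function name is an assumption.

```python
def likely_range(typical_error, factor):
    """Lower and upper limits: typical error divided and multiplied by the factor."""
    return typical_error / factor, typical_error * factor

# Observed typical error of 1% with a factor of 1.24.
low, high = likely_range(1.0, 1.24)
print(f"{low:.2f}% to {high:.2f}%")
```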
Analysis of Simple Studies
• Analysis of reliability with 2 trials is straightforward: compute the typical error from the difference scores; the change in the mean is simply the mean difference.
• For 3 or more trials, check for learning effects by comparing consecutive pairs (trials 1&2, trials 2&3…).
• Download the spreadsheet from SportSci.Org
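The consecutive-pairs check can be sketched as a loop over adjacent trials. The data below are hypothetical (four subjects, three trials), and this is not the SportSci.Org spreadsheet's implementation, only an illustration of the same two quantities.

```python
import math
import statistics

def pair_summary(earlier, later):
    """Change in the mean and typical error for one pair of trials."""
    diffs = [b - a for a, b in zip(earlier, later)]
    change_in_mean = statistics.mean(diffs)
    typical_error = statistics.stdev(diffs) / math.sqrt(2)
    return change_in_mean, typical_error

# Hypothetical data: rows are subjects, columns are trials 1 to 3.
scores = [
    [70, 74, 75],
    [65, 68, 67],
    [80, 85, 85],
    [75, 77, 79],
]
trials = list(zip(*scores))  # regroup the scores by trial
for i in range(len(trials) - 1):
    change, te = pair_summary(trials[i], trials[i + 1])
    print(f"trials {i + 1}&{i + 2}: change in mean = {change:.2f}, typical error = {te:.2f}")
```

In this made-up data set the mean rises far more between trials 1 and 2 than between trials 2 and 3, which is the pattern a learning effect would produce.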
Excel Reliability (sportsci.org)
Typical error = 1.2 / $\sqrt{2}$
Typical error = 0.83
Intraclass Correlation ICC(3,1)
• For a retest correlation measure of reliability, the ICC(3,1) [Shrout & Fleiss] is unbiased for any sample size.
• Use of the ICC is appropriate with more than 2 trials.
• To calculate ‘typical error’ from the ICC: $s = S\sqrt{1 - r}$, where s is the typical error, S is the average of the between-subject SDs for each trial, and r is the ICC.
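The conversion from ICC to typical error can be sketched in one line, assuming the relation s = S·sqrt(1 - r). The function name and the input values (S = 10, r = 0.91) are hypothetical.

```python
import math

def typical_error_from_icc(mean_between_sd, icc):
    """Typical error s = S * sqrt(1 - r)."""
    return mean_between_sd * math.sqrt(1.0 - icc)

# Hypothetical values: average between-subject SD of 10 and ICC of 0.91.
print(round(typical_error_from_icc(10.0, 0.91), 2))
```

Even a high ICC can correspond to a sizeable typical error when the between-subject SD is large.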
Reliability Between Different Equipment, Methods, Installations
• Use the ICC(2,1) when retesting subjects on different equipment, methods or installations.
• The ICC(2,1) is derived from the fully random model, where subjects and trials are considered random effects.
• Researchers have often misapplied the ICC(2,1) to data from a single item of equipment.