Learn the importance of reliable and valid measurement in chiropractic practice and research. Explore measurement error, true score theory, systematic errors, and estimation methods. Find out how inter-examiner and intra-examiner reliability impact study results.
Accurate and consistent measures are needed • It is very important in research and clinical practice to be able to measure patient characteristics accurately and consistently • Needed in clinical trials to effectively assess differences between groups • Needed in practice to help make clinical decisions and to track patients’ progress Evidence-based Chiropractic
Reliability • The ability of a test to provide consistent results when repeated • By the same examiner • Or by more than one examiner testing the same attribute on the same group of subjects • Specific research designs are utilized to determine the degree to which tests are reliable
Validity • The degree to which a test truly measures what it was intended to measure • In valid tests, when the characteristic being measured changes, corresponding changes occur in the test measurement • In contrast, tests with reduced validity do not reflect patient changes very well
Measurement error • All measurements have some degree of error • Thus, any given test score will consist of a true score plus an error component Observed score = True score + Error • True score is a theoretical concept involving a measurement derived from a perfect instrument in an ideal environment
True score theory • In a group of subjects, variation of observed scores occurs because of • Individual differences of the subjects (true score differences) • Plus an error component • Consequently, group scores will always be variable and the variability will result in a distribution of true scores plus error that conforms to a normal curve when the sample size is large enough
Random errors • Errors that are attributable to the examiner, the subject, or the measuring instrument • Have little effect on the group’s mean score because the errors are just as likely to be high as they are low • For example, blood pressure, which varies depending on a number of factors
Systematic errors • Errors that cause scores to move in only one direction in response to a factor that has a constant effect on the measurement system • Considered to be a form of bias • For example, a sphygmomanometer that is out of calibration and always generates high BP readings
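The contrast between random and systematic error can be illustrated with a small sketch. All numbers below are hypothetical (they do not come from the slides): the random errors are chosen to cancel out exactly, and the systematic error is an assumed constant offset of 8 mmHg from a miscalibrated cuff.

```python
from statistics import mean

# Hypothetical true systolic BP values (mmHg) for five subjects.
true_scores = [110, 120, 130, 140, 150]

# Random errors: as likely to be high as low. These illustrative values
# were picked so that they sum to zero.
random_errors = [4, -3, 2, -4, 1]

# Systematic error: an assumed out-of-calibration cuff that always
# reads 8 mmHg too high.
systematic_error = 8

observed_random = [t + e for t, e in zip(true_scores, random_errors)]
observed_biased = [t + systematic_error for t in true_scores]

print("Mean of true scores:       ", mean(true_scores))
print("Mean with random error:    ", mean(observed_random))
print("Mean with systematic error:", mean(observed_biased))
```

With random error the group mean is unchanged, whereas the constant offset shifts the mean by the full amount of the bias. In real data random errors only cancel approximately.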
Error components • [Figure not preserved]
Estimating reliability • Reliability is estimated as the true score variance divided by the observed score variance • True score variance • Real differences between subjects’ scores due to biologically different people • Error variance • The portion of variability that is due to faults in measurement
Observed score variance • Observed score variance = True score variance + Error variance
The reliability coefficient • Reliability coefficient = True score variance / (True score variance + Error variance) • Becomes larger (increased reliability) as error variance gets smaller • Equals 1.0 when error variance is 0.0 • Becomes smaller (decreased reliability) as error variance gets larger
Interpretation of the reliability coefficient • A reliability coefficient of 0.75 means that 75% of the variance in the scores is due to the true variance of the trait being measured and 25% is due to the error variance
Interpretation of the reliability coefficient (cont.) • Ranges from 0.0 to 1.0 • 0.0 represents no reliability and 1.0 perfect reliability • Implications • 0.75 or greater indicates good reliability • 0.5 to 0.75 indicates moderate reliability • <0.5 indicates poor reliability
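As a minimal sketch of the arithmetic, the reliability coefficient can be computed as true score variance divided by the sum of true score variance and error variance, and then labeled with the cut-offs listed above. The function names and the input values are hypothetical, chosen only to reproduce the 0.75 example.

```python
def reliability_coefficient(true_variance, error_variance):
    """True score variance as a proportion of observed score variance."""
    observed_variance = true_variance + error_variance
    return true_variance / observed_variance


def interpret_reliability(coefficient):
    """Label a coefficient using the cut-offs given in the slides."""
    if coefficient >= 0.75:
        return "good"
    if coefficient >= 0.5:
        return "moderate"
    return "poor"


# Hypothetical variances: 75 units of true variance, 25 units of error.
r = reliability_coefficient(true_variance=75, error_variance=25)
print(r, interpret_reliability(r))
```

In practice the true score variance cannot be observed directly, so published coefficients are estimates derived from repeated measurements.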
Inter-examiner reliability • When 2 or more examiners test the same subjects for the same characteristic using the same measure, scores should match • Inter-examiner reliability is the degree to which their findings agree
Intra-examiner reliability • Scores should also match when the same examiner tests the same subjects on two or more occasions • Intra-examiner reliability is the degree to which the examiner agrees with himself or herself
Quantifying inter-examiner and intra-examiner reliability • Correlation • There should be a high degree of correlation between scores of 2 examiners testing the same group of subjects or 1 examiner testing the same group on 2 occasions • However, it is possible to have good correlation and concurrent poor agreement • Occurs when 1 examiner consistently scores subjects higher or lower than the other examiner
Graphing reliability • [Scatter plot: Examiner 2 scores (vertical axis, 10 to 50) plotted against Examiner 1 scores (horizontal axis, 10 to 50); the points show very good correlation]
Good correlation and concurrent poor agreement • [Scatter plot: Examiner 2 scores plotted against Examiner 1 scores, both axes 10 to 50] • Good correlation, but no agreement • Examiner 1 = 10, Examiner 2 = 20 • Examiner 1 = 20, Examiner 2 = 30 • Examiner 1 = 30, Examiner 2 = 40 • Examiner 1 = 40, Examiner 2 = 50
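The four score pairs from this slide can be checked numerically. The sketch below computes Pearson's r with a small hand-written helper (`pearson_r` is an illustrative name, not a library function) alongside the proportion of exact agreements and the mean difference between examiners.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)


# Score pairs taken from the slide.
examiner_1 = [10, 20, 30, 40]
examiner_2 = [20, 30, 40, 50]

r = pearson_r(examiner_1, examiner_2)
exact_agreement = sum(a == b for a, b in zip(examiner_1, examiner_2)) / len(examiner_1)
mean_difference = mean([b - a for a, b in zip(examiner_1, examiner_2)])

print("Pearson r:", round(r, 3))
print("Proportion of exact agreement:", exact_agreement)
print("Mean difference (Examiner 2 - Examiner 1):", mean_difference)
```

The correlation is perfect even though the examiners never record the same score, because Examiner 2 is consistently 10 points higher. This is why correlation alone is not a sufficient measure of agreement.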
Test-retest reliability • A test is administered to the same group of subjects on more than one occasion • Test scores should be consistent when repeated • Test scores should correlate well • Test-retest reliability is used to assess self-administered questionnaires which are not directly controlled by the examiner
Test-retest reliability (cont.) • It is assumed that the condition being considered has not changed between tests • Conditions that noticeably change over time are not good candidates for test-retest reliability studies • e.g., pain and disability status
Test-retest reliability (cont.) • [Figure: a 10-item questionnaire completed at Time 1 is compared with the same questionnaire completed at Time 2; are the results equal?]
Parallel forms reliability, a.k.a. alternate forms reliability • Two versions of a questionnaire or test that measure the same construct are compared • Both versions are administered to the same subjects • Scores are compared to determine the level of correlation
Parallel forms reliability (cont.) • [Figure: Version 1 of a 10-item questionnaire is compared with Version 2; are the results equal?]
Internal consistency reliability • The degree to which each of the items in a questionnaire measures the targeted construct • All questions should measure various characteristics of the construct and nothing else
Internal consistency reliability (cont.) • A questionnaire is administered to 1 group of subjects on 1 occasion • The results are examined to see how well questions correlate • If reliable, each question contributes in a similar way to the questionnaire’s overall score
Internal consistency reliability (cont.) • [Figure: a 10-item questionnaire with a total score] • Does Q1 correlate well with Q8, Q1 with Q9, Q2 with Q7? • Also, do Q1, Q7, Q9, etc. correlate well with the total score?
Cronbach’s coefficient alpha • A measure of internal consistency that evaluates items in a questionnaire to determine the degree to which they measure the same construct • Is essentially a function of the mean correlation among a set of items and the number of items
Cronbach’s alpha (cont.) • Values range from 1, representing perfect internal consistency, to less than zero when a questionnaire includes many negatively correlating items • Alpha values ≥0.70 are generally considered to be acceptable
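The slides do not give a formula for alpha. The sketch below assumes the commonly used variance-based form, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; one row per subject, one column per item."""
    k = len(responses[0])  # number of items
    # Sample variance of each item (column) across subjects.
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each subject's total score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers from 5 subjects to a 4-item questionnaire (1-5 scale).
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

alpha = cronbach_alpha(responses)
print("Cronbach's alpha:", round(alpha, 2))
print("Acceptable (>= 0.70)?", alpha >= 0.70)
```

When every item ranks the subjects identically, the total score variance is dominated by shared variance and alpha approaches 1. Items that correlate negatively with the rest pull it down.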
2 X 2 contingency table to compare results of examiners • Useful to visualize the results of two examiners who are evaluating the same group of patients • Inter-examiner reliability articles often present their findings in the form of a 2 X 2 contingency table • If not, they are fairly easy to create from the data presented in the article
2 X 2 contingency table (cont.) • [Table: Rater 1 findings cross-tabulated against Rater 2 findings, with cells a, b, c, and d] • Agreements: cells a and d • Disagreements: cells b and c
The kappa statistic (κ) • Agreement between examiners evaluating the same patients can be represented by the percentage of agreement of paired ratings • However, percentage of agreement does not account for agreement that would be expected to occur by chance
The kappa statistic (cont.) • Even using unreliable measures, a few agreements are expected to occur just by chance • Only agreement that occurs beyond chance levels represents true agreement • This is what is represented by the kappa statistic • It is appropriate for use with dichotomous or nominal data
The kappa statistic (cont.) • PO = (a + d) / N • Where observed agreement (PO) is the total proportion of observations where there is agreement, a and d are the agreement cells of the 2 X 2 table, and N is the grand total
The kappa statistic (cont.) • PC = (a_expected + d_expected) / N • Chance agreement (PC) is the proportion of agreements that would be expected by chance • a_expected and d_expected can be found using the same procedure used to calculate expected cell values in the chi-square test • (Multiply the row total by the column total for cells a and d, then divide by the grand total)
The kappa statistic (cont.) • The values of PO and PC are then utilized in the following formula to calculate the kappa statistic • κ = (PO − PC) / (1 − PC) • When the amount of observed agreement exceeds chance agreement, kappa will be positive • The strength of agreement is determined by the magnitude of kappa • If negative, agreements are less than chance
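The calculation can be sketched for a 2 X 2 table using the standard definition of Cohen's kappa, κ = (PO − PC) / (1 − PC), with expected cell counts obtained as row total × column total / grand total. The function name and the cell counts are hypothetical.

```python
def cohen_kappa(a, b, c, d):
    """Kappa for a 2 X 2 table; a and d are agreements, b and c disagreements."""
    n = a + b + c + d

    # Observed agreement: proportion of paired ratings that agree.
    p_observed = (a + d) / n

    # Expected counts for the agreement cells, as in the chi-square test:
    # (row total * column total) / grand total.
    a_expected = (a + b) * (a + c) / n
    d_expected = (c + d) * (b + d) / n

    # Chance agreement: proportion of agreements expected by chance.
    p_chance = (a_expected + d_expected) / n

    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical counts: the two raters agree on 35 of 50 patients.
kappa = cohen_kappa(a=20, b=5, c=10, d=15)
print("kappa:", round(kappa, 2))
```

In this example the raters agree on 70% of patients, but half of the ratings would be expected to agree by chance, so kappa is considerably lower than the raw percentage of agreement.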
Interpretation of kappa values • [Table not preserved]
Kappa example • Reliability of McKenzie classification of patients with cervical or lumbar pain • 50 spinal pain patients (25 lumbar and 25 cervical) were simultaneously assessed by 2 physical therapists (14 therapists in total) to classify patients into syndromes and subsyndromes • κ = 0.84 for syndrome classification • κ = 0.87 for subsyndrome classification
Intraclass Correlation Coefficient (ICC) • Another measure of inter-examiner reliability that is for use with continuous variables • Can be used to evaluate 2 or more raters • Pearson’s r can be used • But ICC is preferred when sample size is small (<15) or more than two tests are involved
ICC (cont.) • There are three models of ICC that may utilize one of two different forms • Thus, 6 possible types of ICC depending on how raters are chosen and how subjects are assigned • The type of ICC used should always be presented in research papers • The first number represents the ICC model • The second represents the form used
ICC (cont.) • For example • Clare et al reported on the reliability of detection of lumbar lateral shift and found it to be moderate • ICC [2,1] values ranging from 0.48 to 0.64 • In ICC [2,1], 2 is the model and 1 is the form
ICC is an index of reliability • Can range from below 0.0 to +1.0 • With ≈0.0 indicating weak reliability and ≈1.0 strong reliability • Suggested interpretation • [Table not preserved] • Some clinical measures require ≥0.90
ICC is based on variance • ICC is the ratio of between-groups variance to total variance, where • Between-groups variance is due to different subjects having test scores that truly differ • Total variance is the between-groups variance plus the error variance that results from inter-rater unreliability of two or more examiners rating the same person • Two-way ANOVA is used to calculate ICC
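As a minimal sketch of how the two-way ANOVA mean squares feed into the coefficient, the function below computes ICC [2,1] using the widely cited Shrout and Fleiss form: (BMS − EMS) / (BMS + (k − 1)·EMS + k·(JMS − EMS)/n), where BMS, JMS, and EMS are the between-subjects, between-raters, and residual mean squares, n is the number of subjects, and k the number of raters. The slides do not state this formula, so treat it as an assumption. The ratings are illustrative only.

```python
def icc_2_1(ratings):
    """ICC [2,1]; one row per subject, one column per rater."""
    n = len(ratings)      # subjects
    k = len(ratings[0])   # raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    # Mean squares.
    bms = ss_subjects / (n - 1)
    jms = ss_raters / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))

    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Illustrative ratings: 6 subjects (rows) scored by 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

icc = icc_2_1(ratings)
print("ICC [2,1]:", round(icc, 2))
```

Because the between-raters mean square appears in the denominator, a rater who scores consistently higher or lower than the others lowers ICC [2,1], which Pearson's r would not detect.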
Validity • The ability of tests and measurements to in fact evaluate the traits that they were intended to evaluate • Vital in research, as well as in clinical practice • The extent of a test’s validity depends on the degree to which systematic error has been controlled for
Validity (cont.) • The greater the validity, the more likely test results will reflect true differences between scores and not systematic error • It’s a matter of degree, not black-and-white • Technically incorrect to say a test is “valid” or “invalid” • Better to use categories like highly valid, moderately valid, etc.
Validity (cont.) • Test validity depends on its intended purpose • For example, a hand-grip dynamometer is valid to measure grip strength, but it is not valid to measure the qualities of hand tremor
Validity (cont.) • An invalid test can still be reliable • For example, a test that used skull circumference to predict intelligence • Reliability would probably be excellent, but it would not be a valid predictor of intelligence • But an unreliable test can never be considered valid
Methods to estimate the extent of test validity • Can be divided into 3 major categories • Self-evident • Does the test appear to measure what it is supposed to measure • Pragmatic • Does the test actually work as hypothesized • Construct validity • Does the test adequately measure the theoretical construct involved
Self-evident methods • Face validity • Simply deciding whether a test appears to have merit based on “face value” • e.g., if a headache questionnaire asked about the location of head pain it would have face validity • If it asked about hair color, it probably would not • The lowest level of test validation • Often assessed when researchers are first exploring a topic
Self-evident methods (cont.) • Content validity • The ability of a test to include or represent all of the content of a construct • Another definition for content validity • The content of a test is compared to the literature that is already available on the topic • The test is said to have good content validity if it accurately reflects what is in the literature