Reliability and Validity Designs

Accurate and consistent measures are needed • It is very important in research and clinical practice to be able to measure patient characteristics accurately and consistently • Needed in clinical trials to effectively assess differences between groups • Needed in practice to help make clinical decisions and to track patients’ progress Evidence-based Chiropractic

Reliability • The ability of a test to provide consistent results when repeated • By the same examiner • Or by more than one examiner testing the same attribute on the same group of subjects • Specific research designs are used to determine the degree to which tests are reliable

Validity • The degree to which a test truly measures what it was intended to measure • In valid tests, when the characteristic being measured changes, corresponding changes occur in the test measurement • In contrast, tests with reduced validity do not reflect patient changes very well

Measurement error • All measurements have some degree of error • Thus, any given test score will consist of a true score plus an error component • Observed score = True score + Error • True score is a theoretical concept: the measurement that would be obtained from a perfect instrument in an ideal environment

True score theory • In a group of subjects, variation of observed scores occurs because of • Individual differences of the subjects • Plus an error component • Consequently, group scores will always be variable, and the resulting distribution of true scores plus error conforms to a normal curve when the sample size is large enough

Random errors • Errors that are attributable to the examiner, the subject, or the measuring instrument • Have little effect on the group’s mean score because the errors are just as likely to be high as they are low • For example, blood pressure readings, which vary depending on a number of factors

Systematic errors • Errors that cause scores to move in only one direction in response to a factor that has a constant effect on the measurement system • Considered to be a form of bias • For example, a sphygmomanometer that is out of calibration and always generates high BP readings
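The contrast between random and systematic error can be sketched with a small simulation of Observed score = True score + Error. This is only an illustration: the function name `simulate_readings`, the true pressure of 120 mmHg, the 5 mmHg random spread, and the 8 mmHg calibration bias are all hypothetical values chosen for the example, not figures from the slides.

```python
import random
import statistics


def simulate_readings(true_scores, random_sd, bias, seed=0):
    """Return observed scores: true score + random error + constant bias."""
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, random_sd) + bias for t in true_scores]


# Hypothetical true systolic pressures (mmHg) for a large group.
true_bp = [120.0] * 10_000

# Random error only: readings scatter above and below the true value.
random_only = simulate_readings(true_bp, random_sd=5.0, bias=0.0)

# Random error plus a constant offset, as from a miscalibrated cuff.
miscalibrated = simulate_readings(true_bp, random_sd=5.0, bias=8.0)

mean_random = statistics.fmean(random_only)
mean_biased = statistics.fmean(miscalibrated)

print(f"mean with random error only: {mean_random:.2f}")
print(f"mean with systematic bias:   {mean_biased:.2f}")
```

Under these assumptions the random errors largely cancel in the group mean, while the constant bias shifts the whole mean in one direction.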

Error components

Estimating reliability • Reliability is estimated as true score variance divided by observed score variance • True score variance • Real differences between subjects’ scores due to biologically different people • Observed score variance • True score variance plus error variance, the portion of variability that is due to faults in measurement

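Observed score variance • Observed score variance = True score variance + Error variance • Reliability coefficient = True score variance / (True score variance + Error variance)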

The reliability coefficient • Becomes larger (increased reliability) as error variance gets smaller • Equals 1.0 when error variance is 0.0 • Becomes smaller (decreased reliability) as error variance gets larger

Interpretation of the reliability coefficient • A reliability coefficient of 0.75 means that 75% of the variance in the scores is due to the true variance of the trait being measured and 25% is due to the error variance

Interpretation of the reliability coefficient (cont.) • Ranges from 0.0 to 1.0 • 0.0 represents no reliability and 1.0 perfect reliability • Implications • 0.75 or greater indicates good reliability • 0.5 to 0.75 indicates moderate reliability • <0.5 indicates poor reliability
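The relationship between error variance and the reliability coefficient, together with the cut-offs listed above, can be sketched as follows. The function names `reliability_coefficient` and `interpret_reliability` are hypothetical, and the variance values are made up for illustration; the sketch assumes observed score variance is the sum of true score variance and error variance.

```python
def reliability_coefficient(true_variance, error_variance):
    """True score variance as a proportion of observed score variance."""
    observed_variance = true_variance + error_variance
    if observed_variance <= 0:
        raise ValueError("observed score variance must be positive")
    return true_variance / observed_variance


def interpret_reliability(coefficient):
    """Apply the cut-offs from the slide: >=0.75 good, 0.5-0.75 moderate, <0.5 poor."""
    if coefficient >= 0.75:
        return "good"
    if coefficient >= 0.5:
        return "moderate"
    return "poor"


# Hypothetical variances: shrinking error variance raises the coefficient.
for error_variance in (75.0, 25.0, 0.0):
    r = reliability_coefficient(true_variance=75.0, error_variance=error_variance)
    print(f"error variance {error_variance:5.1f} -> {r:.2f} ({interpret_reliability(r)})")
```

With a true score variance of 75 and an error variance of 25, the coefficient is 0.75, which matches the 75%/25% split described on the preceding slide.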

Inter-examiner reliability • When 2 or more examiners test the same subjects for the same characteristic using the same measure, scores should match • Inter-examiner reliability is the degree to which their findings agree

Intra-examiner reliability • Scores should also match when the same examiner tests the same subjects on two or more occasions • Intra-examiner reliability is the degree to which the examiner agrees with himself or herself

Quantifying inter-examiner and intra-examiner reliability • Correlation • There should be a high degree of correlation between scores of 2 examiners testing the same group of subjects or 1 examiner testing the same group on 2 occasions • However, it is possible to have good correlation and concurrent poor agreement • This occurs when 1 examiner consistently scores subjects higher or lower than the other examiner

Graphing reliability [Figure: scatter plot of Examiner 2 scores (vertical axis, 10 to 50) against Examiner 1 scores (horizontal axis, 10 to 50), labeled "Very good correlation"]

Good correlation and concurrent poor agreement [Figure: scatter plot of Examiner 2 scores against Examiner 1 scores, both axes 10 to 50, labeled "Good correlation, but no agreement"] • Examiner 1 = 10, Examiner 2 = 20 • Examiner 1 = 20, Examiner 2 = 30 • Examiner 1 = 30, Examiner 2 = 40 • Examiner 1 = 40, Examiner 2 = 50
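The four score pairs from this slide make the point numerically. The sketch below computes Pearson's r by hand along with two simple agreement checks; the helper name `pearson_r` and the choice of agreement measures (proportion of identical scores, mean difference) are assumptions made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Score pairs shown on the slide.
examiner_1 = [10, 20, 30, 40]
examiner_2 = [20, 30, 40, 50]

r = pearson_r(examiner_1, examiner_2)

# Proportion of subjects given exactly the same score by both examiners.
exact_agreement = sum(a == b for a, b in zip(examiner_1, examiner_2)) / len(examiner_1)

# Average amount by which Examiner 2 scores higher than Examiner 1.
mean_difference = sum(b - a for a, b in zip(examiner_1, examiner_2)) / len(examiner_1)

print(f"Pearson r = {r:.2f}")
print(f"exact agreement = {exact_agreement:.0%}")
print(f"mean difference = {mean_difference:.1f}")
```

The correlation is perfect because the scores rise together, yet the examiners never give the same score: Examiner 2 is always 10 points higher.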

Test-retest reliability • A test is administered to the same group of subjects on more than one occasion • Test scores should be consistent when repeated • Test scores should correlate well • Test-retest reliability is used to assess self-administered questionnaires, which are not directly controlled by the examiner

Test-retest reliability (cont.) • It is assumed that the condition being considered has not changed between tests • Conditions that noticeably change over time are not good candidates for test-retest reliability studies • e.g., pain and disability status

Test-retest reliability (cont.) [Figure: a 10-item questionnaire completed at Time 1 compared with the same questionnaire completed at Time 2; are the scores equal?]

Parallel forms reliability (a.k.a. alternate forms reliability) • Two versions of a questionnaire or test that measure the same construct are compared • Both versions are administered to the same subjects • Scores are compared to determine the level of correlation

Parallel forms reliability (cont.) [Figure: Version 1 of a 10-item questionnaire compared with Version 2 of the same questionnaire; are the scores equal?]

Internal consistency reliability • The degree to which each of the items in a questionnaire measures the targeted construct • All questions should measure various characteristics of the construct and nothing else

Internal consistency reliability (cont.) • A questionnaire is administered to 1 group of subjects on 1 occasion • The results are examined to see how well questions correlate • If reliable, each question contributes in a similar way to the questionnaire’s overall score

Internal consistency reliability (cont.) [Figure: a 10-item questionnaire with a total score] • Do individual items correlate well with each other (e.g., Q1 with Q8, Q1 with Q9, Q2 with Q7)? • Also, do Q1, Q7, Q9, etc. correlate well with the total score?

Cronbach’s coefficient alpha • A measure of internal consistency that evaluates items in a questionnaire to determine the degree to which they measure the same construct • Is essentially based on the mean correlation among a set of items and the number of items

Cronbach’s alpha (cont.) • Values range from 1, representing perfect internal consistency, to less than zero when a questionnaire includes many negatively correlating items • Alpha values ≥0.70 are generally considered to be acceptable
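The slides do not give a formula for alpha, so the sketch below assumes the commonly used variance form: alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the response data are hypothetical.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_variances = [
        statistics.variance([row[j] for row in responses]) for j in range(k)
    ]
    # Sample variance of each respondent's total score.
    total_variance = statistics.variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 respondents answering a 3-item questionnaire.
responses = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 3],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.3f}")
print("acceptable" if alpha >= 0.70 else "below the 0.70 threshold")
```

When the items rise and fall together across respondents, the variance of the total score is large relative to the summed item variances, and alpha approaches 1.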

2 X 2 contingency table to compare results of examiners • Useful to visualize the results of two examiners who are evaluating the same group of patients • Inter-examiner reliability articles often present their findings in the form of a 2 X 2 contingency table • If not, a table is fairly easy to create from the data presented in the article

2 X 2 contingency table (cont.) [Table: Rater 1 findings cross-tabulated against Rater 2 findings, cells a through d] • Agreements: cells a and d • Disagreements: cells b and c

The kappa statistic (κ) • Agreement between examiners evaluating the same patients can be represented by the percentage of agreement of paired ratings • However, percentage of agreement does not account for agreement that would be expected to occur by chance

The kappa statistic (cont.) • Even using unreliable measures, a few agreements are expected to occur just by chance • Only agreement that occurs beyond chance levels represents true agreement • This is what is represented by the kappa statistic • It is appropriate for use with dichotomous or nominal data

The kappa statistic (cont.) • Observed agreement (PO) is the total proportion of observations where there is agreement • PO = (a + d) / N, where N is the grand total of the table

The kappa statistic (cont.) • Chance agreement (PC) is the proportion of agreements that would be expected by chance • PC = (a_expected + d_expected) / N • a_expected and d_expected can be found using the same procedure used to calculate expected cell values in the chi-square test • (For each of cells a and d, multiply the row total by the column total and then divide by the grand total)

The kappa statistic (cont.) • The values of PO and PC are then used in the following formula to calculate the kappa statistic • κ = (PO − PC) / (1 − PC) • When the amount of observed agreement exceeds chance agreement, kappa will be positive • The strength of agreement is determined by the magnitude of kappa • If kappa is negative, agreement is less than would be expected by chance
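The kappa calculation can be sketched step by step from a 2 X 2 table, using the standard definition κ = (PO − PC) / (1 − PC). The function name `cohen_kappa`, the cell layout (a and b in the first row, c and d in the second), and the counts are assumptions made for this example.

```python
def cohen_kappa(a, b, c, d):
    """Kappa for a 2 x 2 table laid out as rows (a, b) and (c, d).

    Cells a and d are agreements; cells b and c are disagreements.
    """
    n = a + b + c + d
    # Observed agreement: proportion of paired ratings that agree.
    p_o = (a + d) / n
    # Expected counts for the agreement cells, as in the chi-square test:
    # row total times column total, divided by the grand total.
    a_expected = (a + b) * (a + c) / n
    d_expected = (c + d) * (b + d) / n
    # Chance agreement: expected agreements as a proportion of all ratings.
    p_c = (a_expected + d_expected) / n
    if p_c == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_c) / (1 - p_c)


# Hypothetical counts for 50 patients rated by two examiners.
kappa = cohen_kappa(a=20, b=5, c=10, d=15)
print(f"kappa = {kappa:.2f}")
```

In this made-up table the examiners agree on 70% of patients, but 50% agreement would be expected by chance, so kappa is only 0.40.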

Interpretation of kappa values

Kappa example • Reliability of McKenzie classification of patients with cervical or lumbar pain • 50 spinal pain patients (25 lumbar and 25 cervical) were each assessed simultaneously by 2 physical therapists (14 therapists in total) to classify patients into syndromes and subsyndromes • κ = 0.84 for syndrome classification • κ = 0.87 for subsyndrome classification

Intraclass Correlation Coefficient (ICC) • Another measure of inter-examiner reliability, intended for use with continuous variables • Can be used to evaluate 2 or more raters • Pearson’s r can be used • But ICC is preferred when sample size is small (<15) or more than two tests are involved

ICC (Cont.) • There are three models of ICC that may utilize one of two different forms • Thus, 6 possible types of ICC depending on how raters are chosen and how subjects are assigned • The type of ICC used should always be presented in research papers • The first number represents the ICC model • The second represents the form used

ICC (Cont.) • For example • Clare et al. reported on the reliability of detection of lumbar lateral shift and found it to be moderate • ICC [2,1] values ranging from 0.48 to 0.64 (model 2, form 1)

ICC is an index of reliability • Can range from below 0.0 to +1.0 • With ≈0.0 indicating weak reliability and ≈1.0 strong reliability • Suggested interpretation • Some clinical measures require ≥0.90

ICC is based on variance • ICC is the ratio of between-groups variance to total variance, where • Between-groups variance is due to different subjects having test scores that truly differ • Total variance also includes score differences resulting from inter-rater unreliability of two or more examiners rating the same person • Two-way ANOVA is used to calculate ICC
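The slides do not give a formula, so the sketch below assumes the Shrout and Fleiss two-way form usually written ICC(2,1): (BMS − EMS) / (BMS + (k − 1)·EMS + k·(JMS − EMS)/n), where BMS, JMS, and EMS are the between-subjects, between-raters, and residual mean squares from a two-way ANOVA, n is the number of subjects, and k is the number of raters. The function name `icc_2_1` is hypothetical; the ratings reuse the four examiner score pairs from the earlier "good correlation, poor agreement" slide.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per rater."""
    n = len(ratings)      # subjects
    k = len(ratings[0])   # raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one rating per subject-rater cell).
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    bms = ss_subjects / (n - 1)            # between-subjects mean square
    jms = ss_raters / (k - 1)              # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Each row is one subject: [Examiner 1 score, Examiner 2 score].
ratings = [[10, 20], [20, 30], [30, 40], [40, 50]]
icc = icc_2_1(ratings)
print(f"ICC(2,1) = {icc:.2f}")
```

Under this assumed form, the constant 10-point gap between examiners counts as rater variance, so the ICC for these scores falls well below their Pearson correlation.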

Validity • The ability of tests and measurements to in fact evaluate the traits that they were intended to evaluate • Vital in research, as well as in clinical practice • The extent of a test’s validity depends on the degree to which systematic error has been controlled for

Validity (cont.) • The greater the validity, the more likely test results will reflect true differences between scores and not systematic error • It’s a matter of degrees, not black-and-white • Technically incorrect to say a test is “valid” or “invalid” • Better to use categories like highly valid, moderately valid, etc.

Validity (cont.) • Test validity depends on its intended purpose • For example, a hand-grip dynamometer is valid to measure grip strength, but it is not valid to measure the qualities of hand tremor

Validity (cont.) • An invalid test can still be reliable • For example, a test that used skull circumference to predict intelligence • Reliability would probably be excellent, but it would not be a valid predictor of intelligence • But an unreliable test can never be considered valid

Methods to estimate the extent of test validity • Can be divided into 3 major categories • Self-evident • Does the test appear to measure what it is supposed to measure • Pragmatic • Does the test actually work as hypothesized • Construct validity • Does the test adequately measure the theoretical construct involved

Self-evident methods • Face validity • Simply deciding whether a test appears to have merit based on “face value” • e.g., if a headache questionnaire asked about the location of head pain it would have face validity • If it asked about hair color, it probably would not • The lowest level of test validation • Often assessed when researchers are first exploring a topic

Self-evident methods (cont.) • Content validity • The ability of a test to include or represent all of the content of a construct • Another definition for content validity • The content of a test is compared to the literature that is already available on the topic • The test is said to have good content validity if it accurately reflects what is in the literature
