Data Collection: Methods, tools, and issues
EDWARD JAMES R GORGON MPhysio BCHPEd PTRP
Department of Physical Therapy, College of Allied Medical Professions, University of the Philippines Manila
Email: edward.gorgon@hotmail.com
Learning objectives • Define reliability • Discuss potential sources of measurement error • Explain the types of reliability • Explain concepts in measurement reliability • Define validity • Explain the types of validity • Explain the concepts of sensitivity and specificity
Measurement reliability • Degree of consistency or agreement between repeated measurements taken when the underlying phenomenon has not changed • Reproducibility and repeatability of an instrument or procedure in measurement. Error = variation without true change; repeatability = reproducibility
Measurement reliability Potential sources of measurement error • Rater • Patient / subject • Equipment • Procedure
Measurement reliability Error related to the RATER • Competence / skill • Preparation • Motivation / interest • Fatigue
Measurement reliability Error related to the PATIENT / SUBJECT • Comprehension • Familiarization • Environment • Pain • Fatigue
Measurement reliability Error related to the PATIENT / SUBJECT • Recovery / deterioration • Hawthorne effect
Measurement reliability Error related to the EQUIPMENT • Operation • Maintenance • Calibration • Sensitivity
Measurement reliability Error related to the PROCEDURE • Positioning • Handling • Stabilization • Instructions
Measurement reliability Types of reliability • Internal consistency • Test-retest • Intra-rater • Inter-rater
Internal consistency • Degree to which the items within an instrument are homogeneous, i.e., measure the same underlying attribute • Measured at one point in time • Usually assessed using Cronbach’s alpha (α)
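To make the computation concrete, here is a minimal sketch of Cronbach’s alpha in Python with NumPy; the function name and the subjects-by-items data layout are illustrative assumptions, not from the original slides.

    import numpy as np

    def cronbach_alpha(scores):
        """Cronbach's alpha for an (n_subjects x n_items) score matrix."""
        X = np.asarray(scores, dtype=float)
        k = X.shape[1]                          # number of items
        item_vars = X.var(axis=0, ddof=1)       # variance of each item
        total_var = X.sum(axis=1).var(ddof=1)   # variance of subjects' total scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)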
Test-retest reliability • Degree to which an instrument is stable, based on repeated (at least 2) measurements taken on different occasions • Constant test conditions, including subjects and rater(s), on both occasions • Not possible to assess if the variable is labile
Test-retest reliability • Barthel Index, BADL (Sackley et al., 2006)
Subject  Week 1  Week 2
SUBJ1    10      11
SUBJ2    10      10
SUBJ3    11      12
SUBJ4    13      13
SUBJ5    9       11
SUBJ6    11      12
SUBJ7    12      11
SUBJ8    10      9
Intra-rater reliability • Stability of data recorded by 1 rater across 2 or more trials within a single occasion of measurement • Constant test conditions, including subjects, across trials
Intra-rater reliability • Goniometry, knee flexion (Lin, 2003)
Subject  Trial 1 (deg)  Trial 2 (deg)
SUBJ1    76             75
SUBJ2    90             87
SUBJ3    84             82
SUBJ4    83             85
SUBJ5    79             78
SUBJ6    87             86
SUBJ7    80             82
SUBJ8    77             79
Inter-rater reliability • Variation between 2 or more raters who measure the same group of subjects at least once each • Constant test conditions, including subjects • Potential bias from differences in raters’ training and experience levels
Inter-rater reliability • Peabody, language skills (van Kleeck et al., 2006)
Subject  Rater 1  Rater 2
SUBJ1    45       69
SUBJ2    99       81
SUBJ3    84       75
SUBJ4    80       74
SUBJ5    79       72
SUBJ6    81       85
SUBJ7    60       82
SUBJ8    76       87
Reliability coefficient • Formula: reliability = true score variance / (true score variance + error variance)
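As a quick illustration of this ratio (not from the original slides): with a true score variance of 9 and an error variance of 1, the expected reliability is 9 / (9 + 1) = 0.90. A small Python simulation sketch, with all values assumed for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    true_scores = rng.normal(50, 3, size=10_000)             # true score variance = 3^2 = 9
    observed = true_scores + rng.normal(0, 1, size=10_000)   # error variance = 1^2 = 1
    # Observed variance ~ true variance + error variance, so this ratio ~ 0.90
    print(true_scores.var(ddof=1) / observed.var(ddof=1))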
Kappa (κ) • Represents agreement for an entire set of categorical (e.g., yes/no) ratings, corrected for the agreement expected by chance • Appropriate when data are nominal-level or ordinal-level • Varies from 0 (chance-level agreement) to 1 (perfect agreement); values below 0 indicate worse-than-chance agreement (no units associated)
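A minimal sketch of Cohen’s kappa for two raters in Python; the helper name and the example ratings are illustrative assumptions.

    import numpy as np

    def cohens_kappa(rater1, rater2):
        """Cohen's kappa for two raters' nominal ratings of the same subjects."""
        r1, r2 = np.asarray(rater1), np.asarray(rater2)
        p_obs = np.mean(r1 == r2)  # observed proportion of agreement
        # Chance agreement: product of each rater's marginal proportions, summed over categories
        p_exp = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in np.union1d(r1, r2))
        return (p_obs - p_exp) / (1 - p_exp)

    # Example: two raters classifying 8 subjects as "yes"/"no"
    print(cohens_kappa(["yes", "yes", "no", "no", "yes", "no", "yes", "yes"],
                       ["yes", "no", "no", "no", "yes", "no", "yes", "yes"]))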
Coefficient of variation (CoV) • Formula: CoV = (standard deviation / mean) × 100%
Coefficient of variation (CoV) • The standard deviation expressed as a percentage of the mean • Useful when comparing variability in different groups • Appropriate when data are interval-level or ratio-level
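The formula translates directly into a one-function sketch in Python (illustrative naming; the example reuses the week 1 Barthel Index scores from the test-retest table above):

    import numpy as np

    def coefficient_of_variation(x):
        """CoV: sample standard deviation as a percentage of the mean."""
        x = np.asarray(x, dtype=float)
        return x.std(ddof=1) / x.mean() * 100

    print(coefficient_of_variation([10, 10, 11, 13, 9, 11, 12, 10]))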
Intraclass correlation coefficient (ICC) • Ratio of person variance divided by total variance (between persons + within persons) • Reflects both the degree of correspondence and agreement among ratings • Varies from 0 – 1 (no units associated)
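A minimal sketch of one common form, ICC(2,1) (two-way random effects, absolute agreement, single rating), in Python; the function name and the choice of ICC form are assumptions, since the slides do not specify one.

    import numpy as np

    def icc_2_1(ratings):
        """ICC(2,1) for an (n_subjects x k_raters) matrix of ratings."""
        Y = np.asarray(ratings, dtype=float)
        n, k = Y.shape
        grand = Y.mean()
        ms_rows = k * ((Y.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # between-subjects MS
        ms_cols = n * ((Y.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # between-raters MS
        resid = Y - Y.mean(axis=1, keepdims=True) - Y.mean(axis=0) + grand
        ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))              # residual MS
        return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

    # Example: the two goniometry trials from the intra-rater table above, as columns
    knee = [[76, 75], [90, 87], [84, 82], [83, 85], [79, 78], [87, 86], [80, 82], [77, 79]]
    print(icc_2_1(knee))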
Interpreting reliability estimates • “Rule of thumb”: > 0.80 = Excellent; 0.60 – 0.79 = Adequate; < 0.60 = Poor • HOWEVER, estimates are population-specific, and appropriate cut-offs may depend on the context of use
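The rule of thumb maps directly onto a small helper (illustrative only; as noted above, cut-offs should be adapted to the population and context):

    def interpret_reliability(estimate):
        """Classify a reliability coefficient using the rule of thumb above."""
        if estimate > 0.80:
            return "excellent"
        if estimate >= 0.60:
            return "adequate"
        return "poor"

    print(interpret_reliability(0.87))  # "excellent"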
Choosing reliable outcome measures • Rigor of standardization studies for reliability
Excellent: More than 2 well-designed reliability studies completed, with adequate to excellent reliability values
Adequate: 1 – 2 well-designed reliability studies, with adequate to excellent reliability values
Poor: Reliability studies poorly completed, reliability studies showing poor levels of reliability, or no evidence available
Measurement validity • Extent to which an instrument measures what it is supposed to measure = TRUENESS OF A MEASURE • Validity implies that a measurement is relatively free from error, i.e., a valid test is also reliable • Validity allows generalizations beyond a specific score
Measurement validity • Emphasis is placed on the objectives of a test and the ability to make inferences from test scores or measurements • Validity is specific: it is evaluated within the context of the test’s intended use and in a specific population
Measurement validity When can inferences from a test be considered valid? • Instrument output is related and proportional to the actual variable of interest • Values assigned to the variable are representative of the response
Types of validity • Face validity • Content validity • Criterion-related validity • Construct validity
Face validity • The extent to which an instrument appears to test what it is supposed to test • Determined by a non-rigorous process – ALL OR NONE • Insufficient for the overall validity of a test
Content validity • The extent to which the items in an instrument address and sample the relevant aspects of the concept / variable being measured / assessed
Content validity • Important characteristic of questionnaires, examinations, and interviews • Demands that a test not be influenced by factors irrelevant to the purpose of measurement
Criterion validity • The extent to which an instrument agrees with an external criterion measurement (a “gold standard”) of that concept • Ergo, outcomes of the instrument can be used as a substitute measure for the gold standard • If the correlation between the target test and criterion is high, the test is a valid predictor of the criterion score
Criterion validity • Criterion must be reliable and relevant to the parameter measured by the target test • Criterion and target ratings should be independent and free from bias • If a gold standard does not exist, other similar measures are used
Criterion validity • CONCURRENT validity Target measurement and criterion measurement taken at the same time • PREDICTIVE validity Test will be a valid predictor of a future criterion score
Construct validity • Ability of an instrument to measure an abstract (typically multidimensional) construct and the degree to which the instrument reflects the theoretical components of that construct
Construct validity • CONVERGENT validity The extent to which an instrument agrees with conceptually similar instruments • DIVERGENT validity The extent to which an instrument shows little correlation with conceptually distinct instruments
Validity estimates: Pearson’s r • Demonstrates the strength of the linear relationship between 2 variables • Often used, if erroneously, as a reliability indicator • Varies from –1 to +1 (the – / + sign indicates the direction of the relationship)
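For example, r for the two Barthel Index measurement occasions shown earlier can be computed with NumPy. This is a sketch; note that a high r shows association, not agreement, which is why r alone can overstate reliability.

    import numpy as np

    week1 = [10, 10, 11, 13, 9, 11, 12, 10]
    week2 = [11, 10, 12, 13, 11, 12, 11, 9]
    # corrcoef returns the 2 x 2 correlation matrix; the off-diagonal entry is r
    r = np.corrcoef(week1, week2)[0, 1]
    print(r)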
Sensitivity and specificity Sensitivity • The ability of a test to obtain a positive result when the condition is actually present Specificity • The ability of a test to obtain a negative result when the condition is actually absent
Sensitivity • In the standard 2 × 2 table (a = true positives, b = false positives, c = false negatives, d = true negatives): Sensitivity = [a / (a + c)] x 100%
Specificity • Specificity = [d / (b + d)] x 100%
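A direct translation of the two formulas into Python (illustrative function name; the counts follow the 2 × 2 table labels above, and the example numbers are assumed):

    def sensitivity_specificity(a, b, c, d):
        """a = true positives, b = false positives, c = false negatives, d = true negatives."""
        sensitivity = a / (a + c) * 100  # % of condition-present cases with a positive test
        specificity = d / (b + d) * 100  # % of condition-absent cases with a negative test
        return sensitivity, specificity

    # Example: 40 TP, 5 FP, 10 FN, 45 TN
    print(sensitivity_specificity(40, 5, 10, 45))  # (80.0, 90.0)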
Measure development • Planning • Test construction • Reliability testing • Validation
Measure development • Appropriateness of the test for the target group • Interpretation of results in a meaningful way • Sufficient sensitivity to detect small but CLINICALLY RELEVANT change • Application of the test in varied settings and populations to determine useful properties
Selection criteria for measures • Appropriateness to the target group • Psychometric properties
Validity
Reliability
Sensitivity to clinically relevant change
Sensitivity and specificity, if used for a diagnostic purpose
Selection criteria for measures • Clinical utility / practicality of administration
Clarity of instructions
Format (interview, questionnaire, task performance, naturalistic observation, other)
Ease of administration (time required to complete, scoring, interpretation)
Expertise / training required for administering and/or interpreting
Cost-effectiveness