MEASUREMENT CHARACTERISTICS Error & Confidence Reliability, Validity, & Usability
ERROR & CONFIDENCE • Reducing error • All assessment scores contain error • Want to minimize error so scores are accurate • Use protocols & periodic staff training/retraining • Increasing confidence • Confidence that results lead to correct placement • Comes from assessments that produce valid, reliable, and usable results
ASSESSMENT RESULTS • Norm-referenced • Individual’s score compared to others in their peer/norm group • e.g., school tests reported as percentiles (scored better than 95% of the norm group) • Norm group needs to be representative of the test takers the test was designed for
ASSESSMENT RESULTS • Criterion-referenced • Individual’s score compared to a preset standard or criterion • Standard doesn’t change based on the individual or group • e.g., A = 250-295 points
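The two score interpretations can be made concrete in code. A minimal sketch, assuming hypothetical scores, an invented norm group, and invented grade cutoffs (only the A = 250-295 range comes from the slide):

```python
# Contrast of the two score interpretations (all data invented for illustration).
from scipy.stats import percentileofscore

# Norm-referenced: the score has meaning only relative to the peer/norm group.
norm_group = [61, 68, 70, 72, 74, 75, 77, 79, 80, 82, 84, 85, 87, 89, 93]
print(percentileofscore(norm_group, 89))  # percentile rank within the group

# Criterion-referenced: the score is compared to a preset standard that does
# not change with the group. Only the A cutoff is from the slide; the rest of
# the scale is hypothetical.
def letter_grade(points: int) -> str:
    if 250 <= points <= 295:
        return "A"
    if 200 <= points < 250:
        return "B"  # hypothetical cutoff
    return "below B"

print(letter_grade(260))  # "A", regardless of how peers scored
```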
VALIDITY • Describes how well the assessment results match their intended purpose • Are you measuring what you think you are measuring? • Relationship between program & assessment content • An assessment is not valid for all purposes, populations, or times
VALIDITY • Depends on different types of evidence • Is a matter of degree (no tool is perfect) • Is a unitary concept • A change from the past: former “types” of validity are now treated as sources of evidence • e.g., content validity is now content-related evidence
FACE VALIDITY • Not listed in the text • Do the items appear, on their face, to fit what is being measured?
CONTENT VALIDITY (Content-related evidence) • How well does the assessment measure its subject or content? • Representativeness • Completeness: all major areas covered • Nonstatistical • Review of literature or expert opinion • Blueprint of major components • Per Austin (1991), content validity is the minimum requirement for any assessment
CRITERION-RELATED VALIDITY (Criterion-related evidence) • Comparison of results • Statistical • Reported as a validity (correlation) coefficient • Ranges from -1 to +1 (±1 is a perfect relationship; 0 = no relationship) • r = .73 is stronger than r = .52 • r = ±.40 to ±.70 is the acceptable range
CRITERION-RELATED VALIDITY (Criterion-related evidence) • Coefficients of .30 to .40 may be used if statistically significant • If validity is reported, it is generally criterion-related validity • 2 types • Predictive • Concurrent
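Because a validity coefficient is simply a correlation between assessment scores and criterion scores, it can be computed directly. A minimal sketch with invented score pairs (the data and variable names are hypothetical):

```python
# A criterion-related validity coefficient is the correlation between
# scores on the assessment and scores on a criterion measure.
from scipy.stats import pearsonr

assessment = [12, 15, 17, 20, 22, 25, 27, 30]  # invented assessment scores
criterion = [52, 45, 60, 48, 70, 58, 62, 75]   # invented criterion scores

r, p = pearsonr(assessment, criterion)
print(f"validity coefficient r = {r:.2f} (p = {p:.3f})")
# Prints r ≈ .72 for these invented data: at the top of the ±.40 to ±.70
# acceptable range noted above, and statistically significant.
```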
PREDICTIVE VALIDITY • The ability of an assessment to predict future behaviors or outcomes • Measures are taken at different times • e.g., ACT or SAT scores & later success in college • e.g., Leisure Satisfaction scores predicting discharge outcomes
CONCURRENT VALIDITY • More than one instrument measures the same content • Goal is to predict one set of scores from another set taken at the same (or nearly the same) time, both measuring the same variable
CONSTRUCT VALIDITY (Construct-related evidence) • Theoretical/conceptual • Content & criterion-related validity contribute to construct validity • Research on the conceptual framework on which the assessment is based also contributes • Not demonstrated in a single project or statistical measure • Few TR assessments have it: their focus is behavior, not constructs
CONSTRUCT VALIDITY (Construct-related evidence) • Factor analysis • Convergent validity (what it measures) • Divergent validity (what it doesn’t measure) • Expert panels are used here too
THREATS TO VALIDITY • Assessment should be valid for its intended use (e.g., research instruments) • Unclear directions • Unclear or ambiguous terms • Items at an inappropriate level for subjects • Items not related to the construct being measured
THREATS TO VALIDITY • Too few items • Too many items • Items with an identifiable pattern of response • Method of administration • Testing conditions • Subjects’ health, reluctance, attitudes • See Stumbo, 2002, pp. 41-42
VALIDITY • Can’t get valid results without reliable results, but can get reliable results without valid results • Reliability is a necessary but not sufficient condition for validity • See Stumbo, 2002, p. 54
RELIABILITY • Accuracy or consistency of a measurement • Reproducible results • Statistical in nature • r = between 0 & 1 (with 1 being perfect) • Should not be lower than .80 • Tells what portion of variance is non-error variance • Increases with length of test & spread of scores
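The link between test length and reliability can be estimated with the Spearman-Brown prophecy formula (the same formula used to correct split-half estimates later in this section). A sketch, assuming an invented starting reliability of .70:

```python
# Spearman-Brown prophecy formula: predicted reliability when a test is
# lengthened by a factor k (k = 2 doubles the number of items).
def spearman_brown(r: float, k: float) -> float:
    return (k * r) / (1 + (k - 1) * r)

r = 0.70                       # current reliability (invented)
print(spearman_brown(r, 2))    # doubled length -> ~0.82
print(spearman_brown(r, 0.5))  # halved length  -> ~0.54
```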
STABILITY (Test-retest) • How stable is the assessment? • Assessment not overly influenced by the passage of time • Same group assessed 2 times with the same instrument & the results of the 2 testings are correlated • Are the 2 sets of scores alike? • Time between testings matters (longer intervals allow real change; shorter intervals invite memory effects)
EQUIVALENCY (Equivalent forms) • Also known as parallel-form or alternative-form reliability • How closely correlated are 2 or more forms of the same assessment? • 2 forms have been developed and demonstrated to measure the same construct • Forms have similar but not same items • e.g. NCTRC exam • Short & long forms are not equivalent
INTERNAL CONSISTENCY • How closely are items on the assessment related? • Split half (corrected with the Spearman-Brown formula) • 1st half vs. 2nd half • Odd/even • Matched random subsets • If the test can’t be divided • Cronbach’s alpha • Kuder-Richardson
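Cronbach’s alpha is the most commonly reported of these and is straightforward to compute from an item-score matrix. A minimal sketch with invented ratings (rows = respondents, columns = items); a split-half estimate would instead correlate the two halves and apply the Spearman-Brown correction sketched above:

```python
import numpy as np

# Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of totals).
scores = np.array([  # invented ratings: 5 respondents x 4 items
    [4, 5, 3, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 4, 3, 3],
    [1, 2, 2, 1],
])

k = scores.shape[1]                          # number of items
item_vars = scores.var(axis=0, ddof=1)       # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of respondents' totals

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")     # ~.97 for these very consistent items
```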
INTERRATER RELIABILITY • Percentage of agreement across the total number of observations • Note the difference between agreement & accuracy (raters may agree yet both be wrong) • Raters are compared to each other • Aim for at least 80% agreement
INTERRATER RELIABILITY • Simple agreement: counts the number of agreements & disagreements • Point-to-point agreement: takes each data point into consideration • Percentage of agreement for the occurrence of the target behavior • Kappa index: corrects agreement for chance
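A sketch of simple agreement and the kappa index for two raters coding the same observations (the ratings are invented); kappa adjusts the raw agreement for the agreement expected by chance alone:

```python
from collections import Counter

# Two raters coding the same 10 observations (invented data).
rater_a = ["yes", "yes", "no", "yes", "no", "yes", "no", "no", "yes", "yes"]
rater_b = ["yes", "no",  "no", "yes", "no", "yes", "no", "yes", "yes", "yes"]

n = len(rater_a)

# Simple agreement: agreements / total observations (aim for >= 80%).
p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
print(f"simple agreement = {p_o:.0%}")       # 80% here

# Kappa index: corrects observed agreement (p_o) for chance agreement (p_e),
# estimated from each rater's category frequencies.
counts_a, counts_b = Counter(rater_a), Counter(rater_b)
p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
kappa = (p_o - p_e) / (1 - p_e)
print(f"kappa = {kappa:.2f}")                # ~.58 for these invented ratings
```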
INTRARATER RELIABILITY • Not in the text • A rater’s scores are compared with his or her own earlier scores
RELIABILITY • Manuals often give this information • High reliability doesn’t indicate validity • Generally, a longer test has higher reliability • Length lessens the influence of chance or guessing
FAIRNESS • Reduction or elimination of undue bias • Language • Ethnic or racial backgrounds • Gender • Free of stereotypes & biases • Beginning to be a concern for TR
USABILITY & PRACTICALITY • Nonstatistical • Is this tool better than any other tool on the market, or one I could design myself? • Consider time, cost, staff qualifications, ease of administration, scoring, etc.