
MEASUREMENT CHARACTERISTICS


Presentation Transcript


  1. MEASUREMENT CHARACTERISTICS • Error & Confidence • Reliability, Validity, & Usability

  2. ERROR & CONFIDENCE • Reducing error • All assessment scores contain error • Want to minimize error so scores are accurate • Protocols & periodic staff training/retraining • Increasing confidence • Results lead to correct placement • Use assessments that produce valid, reliable, and usable results

  3. ASSESSMENT RESULTS • Norm-referenced • Individual's score compared to others in their peer/norm group • School tests, e.g., scoring at the 95th percentile • Norm group needs to be representative of the test takers the test was designed for
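A minimal sketch, in Python, of how a norm-referenced interpretation is computed; the norm-group scores and the individual's score are invented for illustration:

```python
# Norm-referenced scoring: locate one person's score within the distribution
# of a representative norm group (all scores below are invented).
from scipy import stats

norm_group = [61, 64, 70, 72, 75, 78, 80, 83, 85, 88, 90, 92]
individual = 88

pct = stats.percentileofscore(norm_group, individual)
print(f"Score {individual} falls at about the {pct:.0f}th percentile of the norm group")
```

The interpretation shifts if the norm group shifts, which is why the norm group must represent the test takers the instrument was designed for.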

  4. ASSESSMENT RESULTS • Criterion-referenced • Individual's score compared to a preset standard or criterion • Standard doesn't change based on the individual or group • e.g., A = 250-295 points
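A minimal sketch of a criterion-referenced rule; only the A band (250-295 points) comes from the slide, and the lower cut scores are invented for illustration:

```python
# Criterion-referenced scoring: the standard is fixed in advance and does
# not shift with the group being tested.
def letter_grade(points: int) -> str:
    if 250 <= points <= 295:
        return "A"            # the A band from the slide
    if 220 <= points < 250:
        return "B"            # invented cut scores, illustration only
    if 190 <= points < 220:
        return "C"
    return "Below criterion"

print(letter_grade(262))      # -> "A", regardless of how peers scored
```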

  5. VALIDITY • Describes how well the assessment results match their intended purpose • Are you measuring what you think you are measuring? • Relationship between program & assessment content • An assessment does not have validity for all purposes, populations, or times

  6. VALIDITY • Depends on different types of evidence • Is a matter of degree (no tool is perfect) • Is a unitary concept • A change from past practice: the former "types" of validity are now treated as sources of evidence • e.g., content validity is now content-related evidence

  7. FACE VALIDITY • Not listed in the text • Do the items seem, on their face, to fit what is being measured?

  8. CONTENT VALIDITY (Content-related evidence) • How well does the assessment measure the subject or content? • Representative • Complete: covers all major areas • Nonstatistical • Review of literature or expert opinion • Blueprint of major components (see the sketch below) • Per Austin (1991), the minimum requirement for any assessment
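One way to make the blueprint idea concrete is a simple coverage check; the content areas, weights, and item counts below are invented for illustration:

```python
# Test blueprint: a nonstatistical check that drafted items cover all major
# content areas in roughly the intended proportions (all values invented).
blueprint = {                       # intended coverage
    "Physical functioning": 0.25,
    "Cognitive functioning": 0.25,
    "Social functioning": 0.25,
    "Emotional functioning": 0.25,
}
drafted_items = {"Physical functioning": 10, "Cognitive functioning": 9,
                 "Social functioning": 2, "Emotional functioning": 9}

total = sum(drafted_items.values())
for area, planned in blueprint.items():
    actual = drafted_items.get(area, 0) / total
    gap = "  <-- coverage gap" if actual < planned - 0.10 else ""
    print(f"{area}: planned {planned:.0%}, drafted {actual:.0%}{gap}")
```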

  9. CRITERION-RELATED VALIDITY (Criterion-related evidence) • Comparison of results • Statistical • Reported as a validity or correlation coefficient • Ranges from +1 to -1 (±1 is a perfect relationship) • 0 = no relationship • r = .73 is better than r = .52 • r = ±.40 to ±.70 is the acceptable range

  10. CRITERION-RELATED VALIDITY (Criterion-related evidence) • May use .30 to .40 if statistically significant • If validity is reported, it is generally criterion-related validity (see the sketch below) • 2 types • Predictive • Concurrent
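A minimal sketch of computing a criterion-related (here, predictive) validity coefficient; the scores are invented, and the benchmarks in the comments come from the slides:

```python
# Criterion-related validity: correlate assessment scores with a criterion
# measure (for predictive validity, one collected later in time).
from scipy import stats

test_scores = [12, 18, 22, 25, 30, 31, 35, 40]           # e.g., admissions test
criterion   = [2.1, 2.4, 2.9, 2.7, 3.2, 3.0, 3.5, 3.8]   # e.g., later GPA

r, p = stats.pearsonr(test_scores, criterion)
print(f"validity coefficient r = {r:.2f} (p = {p:.3f})")
# Slide benchmarks: |r| of .40 to .70 is the acceptable range;
# .30 to .40 may be used if statistically significant.
```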

  11. PREDICTIVE VALIDITY • The ability of an assessment to predict future behaviors or outcomes • Measures are taken at different times • e.g., ACT or SAT scores & later success in college • e.g., Leisure Satisfaction scores predicting discharge outcomes

  12. CONCURRENT VALIDITY • More than one instrument measures the same content • Goal is to predict one set of scores from another set taken at the same (or nearly the same) time, both measuring the same variable

  13. CONSTRUCT VALIDITY (Construct-related evidence) • Theoretical/conceptual • Content & criterion-related validity contribute to construct validity • Research concerning the conceptual framework on which the assessment is based also contributes • Not demonstrated in a single project or statistical measure • Few TR assessments have it: their focus is behavior, not constructs

  14. CONSTRUCT VALIDITY (Construct-related evidence) • Factor analysis (see the sketch below) • Convergent validity (what it measures) • Divergent validity (what it doesn't measure) • Expert panels are used here too
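A minimal sketch of the factor-analysis step, using simulated data and scikit-learn's FactorAnalysis; items written to tap one construct should load on a single common factor:

```python
# Factor analysis for construct-related evidence (simulated data).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
trait = rng.normal(size=200)                        # latent construct
items = np.column_stack([trait + rng.normal(scale=0.5, size=200)
                         for _ in range(5)])        # 5 items tapping the trait

fa = FactorAnalysis(n_components=1).fit(items)
print("factor loadings:", fa.components_.round(2))  # items load together
```

In a convergent/divergent analysis, scores on these items would additionally be expected to correlate with other measures of the same construct and not with measures of unrelated ones.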

  15. THREATS TO VALIDITY • Assessment should be valid for its intended use (e.g., research instruments) • Unclear directions • Unclear or ambiguous terms • Items at an inappropriate level for subjects • Items not related to the construct being measured

  16. THREATS TO VALIDITY • Too few items • Too many items • Items with an identifiable pattern of response • Method of administration • Testing conditions • Subjects' health, reluctance, attitudes • See Stumbo, 2002, pp. 41-42

  17. VALIDITY • Can’t get valid results without reliable results, but can get reliable results without valid results • Reliability is a necessary but not sufficient condition for validity • See Stumbo, 2002, p. 54

  18. RELIABILITY • Accuracy or consistency of a measurement • Reproducible results • Statistical in nature • r falls between 0 & 1 (with 1 being perfect) • Should not be lower than .80 • Tells what portion of the variance is non-error variance (see below) • Increases with length of test & spread of scores
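In classical test theory terms (standard psychometric background, not spelled out on the slide), the non-error-variance point can be written as:

```latex
\sigma^2_X = \sigma^2_T + \sigma^2_E, \qquad
r_{xx} = \frac{\sigma^2_T}{\sigma^2_X} = 1 - \frac{\sigma^2_E}{\sigma^2_X}
```

so a reliability of r = .80 means 80% of the observed-score variance is true-score (non-error) variance.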

  19. STABILITY (Test-retest) • How stable is the assessment over time? • Assessment should not be overly influenced by the passage of time • Same group assessed 2 times with the same instrument & the results of the 2 testings are correlated • Are the 2 sets of scores alike? • Time effects (intervals that are too long or too short can distort results)

  20. EQUIVALENCY (Equivalent forms) • Also known as parallel-form or alternative-form reliability • How closely correlated are 2 or more forms of the same assessment? • 2 forms have been developed and demonstrated to measure the same construct • Forms have similar but not the same items • e.g., the NCTRC exam • Note: short & long forms of an assessment are not equivalent forms

  21. INTERNAL CONSISTENCY • How closely are items on the assessment related? • Split half • 1st half vs. 2nd half • Odd/even • Matched random subsets • If items can't be divided: • Cronbach's alpha • Kuder-Richardson • Spearman-Brown formula • (See the sketches below)
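Minimal sketches of two of the estimates named above, computed on invented item data (rows = respondents, columns = items):

```python
import numpy as np

scores = np.array([[3, 4, 3, 5], [2, 2, 3, 2], [4, 5, 4, 4],
                   [1, 2, 2, 1], [5, 4, 5, 5], [3, 3, 2, 3]])

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)
k = scores.shape[1]
alpha = k / (k - 1) * (1 - scores.var(axis=0, ddof=1).sum()
                       / scores.sum(axis=1).var(ddof=1))

# Odd/even split half, stepped up with the Spearman-Brown formula 2r/(1+r)
odd, even = scores[:, 0::2].sum(axis=1), scores[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]
split_half = 2 * r_half / (1 + r_half)

print(f"Cronbach's alpha = {alpha:.2f}, split-half = {split_half:.2f}")
```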

  22. INTERRATER RELIABILITY • Percentage of agreements relative to the number of observations • Difference between agreement & accuracy • Raters are compared to each other • 80% agreement is the common benchmark

  23. INTERRATER RELIABILITY • Simple agreement • Number of agreements & disagreements • Point-to-point agreement • Takes each data point into consideration • Percentage of agreement for the occurrence of the target behavior • Kappa index (see the sketch below)
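A minimal sketch of point-to-point agreement and the kappa index for two raters coding the same (invented) observations:

```python
# Interrater reliability: compare two raters' codes observation by observation.
from sklearn.metrics import cohen_kappa_score

rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]   # 1 = target behavior observed
rater_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

# Point-to-point agreement: agreements / total observations
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
kappa = cohen_kappa_score(rater_a, rater_b)  # agreement corrected for chance

print(f"agreement = {agreement:.0%}, kappa = {kappa:.2f}")
```

Kappa comes out lower than raw agreement because it discounts the agreements two raters would reach by chance alone.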

  24. INTRARATER RELIABILITY • Not in the text • A rater's scores are compared with his or her own scores (consistency with self)

  25. RELIABILITY • Manuals often give this information • High reliability doesn't indicate validity • Generally a longer test has higher reliability (see the sketch below) • Length lessens the influence of chance or guessing
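The length effect can be quantified with the Spearman-Brown prophecy formula (standard psychometric background; the formula itself is named on slide 21):

```python
# Spearman-Brown prophecy: reliability expected if a test is lengthened.
def spearman_brown(r_current: float, length_factor: float) -> float:
    return length_factor * r_current / (1 + (length_factor - 1) * r_current)

print(round(spearman_brown(0.70, 2), 2))   # doubling a .70 test -> ~0.82
```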

  26. FAIRNESS • Reduction or elimination of undue bias • Language • Ethnic or racial backgrounds • Gender • Free of stereotypes & biases • Beginning to be a concern for TR

  27. USABILITY & PRACTICALITY • Nonstatistical • Is this tool better than any other tool on the market, or better than one I could design? • Consider time, cost, staff qualifications, ease of administration, scoring, etc.
