MEASUREMENT CHARACTERISTICS Error & Confidence Reliability, Validity, & Usability
ERROR & CONFIDENCE • Reducing error • All assessment scores contain error • Want to minimize error so scores are accurate • Use protocols & periodic staff training/retraining • Increasing confidence • Confidence that results lead to correct placement • Comes from assessments that produce valid, reliable, and usable results
ASSESSMENT RESULTS • Norm-referenced • Individual’s score compared to others in their peer/norm group • e.g., school tests reported as percentiles (scored better than 95% of the norm group) • Norm group needs to be representative of the test takers the test was designed for
ASSESSMENT RESULTS • Criterion-referenced • Individual’s score compared to a preset standard or criterion • Standard doesn’t change based on the individual or group • e.g., A = 250-295 points
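The two score interpretations can be made concrete in code. A minimal sketch, assuming hypothetical scores, an invented norm group, and invented grade cutoffs (only the A = 250-295 range comes from the slide):

```python
# Contrast of the two score interpretations (all data invented for illustration).
from scipy.stats import percentileofscore

# Norm-referenced: the score has meaning only relative to the peer/norm group.
norm_group = [61, 68, 70, 72, 74, 75, 77, 79, 80, 82, 84, 85, 87, 89, 93]
print(percentileofscore(norm_group, 89))  # percentile rank within the group

# Criterion-referenced: the score is compared to a preset standard that does
# not change with the group. Only the A cutoff is from the slide; the rest of
# the scale is hypothetical.
def letter_grade(points: int) -> str:
    if 250 <= points <= 295:
        return "A"
    if 200 <= points < 250:
        return "B"  # hypothetical cutoff
    return "below B"

print(letter_grade(260))  # "A", regardless of how peers scored
```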
VALIDITY • Describes how well the assessment results match their intended purpose • Are you measuring what you think you are measuring? • Relationship between program & assessment content • An assessment is not valid for all purposes, populations, or times
VALIDITY • Depends on different types of evidence • Is a matter of degree (no tool is perfect) • Is a unitary concept • A change from the past: former “types” of validity are now treated as sources of evidence • e.g., content validity is now content-related evidence
FACE VALIDITY • Not listed in the text • Do the items appear, on their face, to fit what is being measured?
CONTENT VALIDITY (Content-related evidence) • How well does the assessment measure its subject or content? • Representativeness • Completeness: all major areas covered • Nonstatistical • Review of literature or expert opinion • Blueprint of major components • Per Austin (1991), content validity is the minimum requirement for any assessment
CRITERION-RELATED VALIDITY (Criterion-related evidence) • Comparison of results • Statistical • Reported as a validity (correlation) coefficient • Ranges from -1 to +1 (±1 is a perfect relationship; 0 = no relationship) • r = .73 is stronger than r = .52 • r = ±.40 to ±.70 is the acceptable range
CRITERION-RELATED VALIDITY (Criterion-related evidence) • Coefficients of .30 to .40 may be used if statistically significant • If validity is reported, it is generally criterion-related validity • 2 types • Predictive • Concurrent
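Because a validity coefficient is simply a correlation between assessment scores and criterion scores, it can be computed directly. A minimal sketch with invented score pairs (the data and variable names are hypothetical):

```python
# A criterion-related validity coefficient is the correlation between
# scores on the assessment and scores on a criterion measure.
from scipy.stats import pearsonr

assessment = [12, 15, 17, 20, 22, 25, 27, 30]  # invented assessment scores
criterion = [52, 45, 60, 48, 70, 58, 62, 75]   # invented criterion scores

r, p = pearsonr(assessment, criterion)
print(f"validity coefficient r = {r:.2f} (p = {p:.3f})")
# Prints r ≈ .72 for these invented data: at the top of the ±.40 to ±.70
# acceptable range noted above, and statistically significant.
```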
PREDICTIVE VALIDITY • The ability of an assessment to predict future behaviors or outcomes • Measures are taken at different times • e.g., ACT or SAT scores & later success in college • e.g., Leisure Satisfaction scores predicting discharge outcomes
CONCURRENT VALIDITY • More than one instrument measures the same content • Goal is to predict one set of scores from another set taken at the same (or nearly the same) time, both measuring the same variable
CONSTRUCT VALIDITY (Construct-related evidence) • Theoretical/conceptual • Content & criterion-related validity contribute to construct validity • Research on the conceptual framework on which the assessment is based also contributes • Not demonstrated in a single project or statistical measure • Few TR assessments have it: their focus is behavior, not constructs
CONSTRUCT VALIDITY (Construct-related evidence) • Factor analysis • Convergent validity (what it measures) • Divergent validity (what it doesn’t measure) • Expert panels are used here too
THREATS TO VALIDITY • Assessment should be valid for its intended use (e.g., research instruments) • Unclear directions • Unclear or ambiguous terms • Items at an inappropriate level for subjects • Items not related to the construct being measured
THREATS TO VALIDITY • Too few items • Too many items • Items with an identifiable pattern of response • Method of administration • Testing conditions • Subjects’ health, reluctance, attitudes • See Stumbo, 2002, pp. 41-42
VALIDITY • Can’t get valid results without reliable results, but can get reliable results without valid results • Reliability is a necessary but not sufficient condition for validity • See Stumbo, 2002, p. 54
RELIABILITY • Accuracy or consistency of a measurement • Reproducible results • Statistical in nature • r = between 0 & 1 (with 1 being perfect) • Should not be lower than .80 • Tells what portion of variance is non-error variance • Increases with length of test & spread of scores
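The link between test length and reliability can be estimated with the Spearman-Brown prophecy formula (the same formula used to correct split-half estimates later in this section). A sketch, assuming an invented starting reliability of .70:

```python
# Spearman-Brown prophecy formula: predicted reliability when a test is
# lengthened by a factor k (k = 2 doubles the number of items).
def spearman_brown(r: float, k: float) -> float:
    return (k * r) / (1 + (k - 1) * r)

r = 0.70                       # current reliability (invented)
print(spearman_brown(r, 2))    # doubled length -> ~0.82
print(spearman_brown(r, 0.5))  # halved length  -> ~0.54
```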
STABILITY (Test-retest) • How stable is the assessment? • Assessment not overly influenced by the passage of time • Same group assessed 2 times with the same instrument & the results of the 2 testings are correlated • Are the 2 sets of scores alike? • Time between testings matters (longer intervals allow real change; shorter intervals invite memory effects)
EQUIVALENCY (Equivalent forms) • Also known as parallel-form or alternative-form reliability • How closely correlated are 2 or more forms of the same assessment? • 2 forms have been developed and demonstrated to measure the same construct • Forms have similar but not same items • e.g. NCTRC exam • Short & long forms are not equivalent
INTERNAL CONSISTENCY • How closely are items on the assessment related? • Split half (corrected with the Spearman-Brown formula) • 1st half vs. 2nd half • Odd/even • Matched random subsets • If the test can’t be divided • Cronbach’s alpha • Kuder-Richardson
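Cronbach’s alpha is the most commonly reported of these and is straightforward to compute from an item-score matrix. A minimal sketch with invented ratings (rows = respondents, columns = items); a split-half estimate would instead correlate the two halves and apply the Spearman-Brown correction sketched above:

```python
import numpy as np

# Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of totals).
scores = np.array([  # invented ratings: 5 respondents x 4 items
    [4, 5, 3, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 4, 3, 3],
    [1, 2, 2, 1],
])

k = scores.shape[1]                          # number of items
item_vars = scores.var(axis=0, ddof=1)       # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of respondents' totals

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")     # ~.97 for these very consistent items
```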
INTERRATER RELIABILITY • Percentage of agreement across the total number of observations • Note the difference between agreement & accuracy (raters may agree yet both be wrong) • Raters are compared to each other • Aim for at least 80% agreement
INTERRATER RELIABILITY • Simple agreement: counts the number of agreements & disagreements • Point-to-point agreement: takes each data point into consideration • Percentage of agreement for the occurrence of the target behavior • Kappa index: corrects agreement for chance
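A sketch of simple agreement and the kappa index for two raters coding the same observations (the ratings are invented); kappa adjusts the raw agreement for the agreement expected by chance alone:

```python
from collections import Counter

# Two raters coding the same 10 observations (invented data).
rater_a = ["yes", "yes", "no", "yes", "no", "yes", "no", "no", "yes", "yes"]
rater_b = ["yes", "no",  "no", "yes", "no", "yes", "no", "yes", "yes", "yes"]

n = len(rater_a)

# Simple agreement: agreements / total observations (aim for >= 80%).
p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
print(f"simple agreement = {p_o:.0%}")       # 80% here

# Kappa index: corrects observed agreement (p_o) for chance agreement (p_e),
# estimated from each rater's category frequencies.
counts_a, counts_b = Counter(rater_a), Counter(rater_b)
p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
kappa = (p_o - p_e) / (1 - p_e)
print(f"kappa = {kappa:.2f}")                # ~.58 for these invented ratings
```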
INTRARATER RELIABILITY • Not in the text • A rater’s scores are compared with his or her own earlier scores
RELIABILITY • Manuals often give this information • High reliability doesn’t indicate validity • Generally, a longer test has higher reliability • Length lessens the influence of chance or guessing
FAIRNESS • Reduction or elimination of undue bias • Language • Ethnic or racial backgrounds • Gender • Free of stereotypes & biases • Beginning to be a concern for TR
USABILITY & PRACTICALITY • Nonstatistical • Is this tool better than any other tool on the market, or one I could design myself? • Consider time, cost, staff qualifications, ease of administration, scoring, etc.