Characteristics of Successful Assessment Measures • Reliable • Valid • Efficient • - Time • - Money • - Resources • Don’t result in complaints
What Do We Mean by Reliability? • The extent to which a score from a test is consistent and free from errors of measurement
Methods of Determining Reliability • Test-retest (temporal stability) • Alternate forms (form stability) • Internal reliability (item stability) • Interrater Agreement
Reliability Test-Retest
Test-Retest Reliability • Measures Temporal Stability • Stable measures • Measures expected to vary • Administration • Same participants • Same test • Two testing periods
Test-Retest Reliability: Scoring • To obtain the reliability of an instrument, the scores at time one are correlated with the scores at time two • The higher the correlation, the more reliable the test
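A minimal sketch of this scoring step, assuming hypothetical scores and the scipy library: correlate each examinee's time-1 score with their time-2 score.

```python
# Test-retest reliability: correlate scores from the two administrations.
# Data are hypothetical; assumes scipy is installed.
from scipy.stats import pearsonr

time1 = [82, 75, 91, 68, 88, 79, 95, 73]  # first administration
time2 = [85, 72, 89, 70, 90, 77, 93, 75]  # second administration, same examinees

r, _ = pearsonr(time1, time2)
print(f"Test-retest reliability r = {r:.2f}")
```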
Test-Retest Reliability: Problems • Sources of measurement error: • - Characteristic or attribute being measured may change over time • - Reactivity • - Carry-over effects • Practical problems: • - Time consuming • - Expensive • - Inappropriate for some types of tests
Standard Error of Measurement • Provides a range of estimated accuracy • 1 SE = 68% confident • 1.96 SE = 95% confident • The higher the reliability of a test, the lower the standard error of measurement • Formula: SEM = SD × √(1 − reliability)
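A short sketch of how the standard error of measurement translates into a confidence band around an observed score, assuming SEM = SD × √(1 − reliability); the values below are illustrative.

```python
# Standard error of measurement and a 95% confidence band around a score.
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    return sd * math.sqrt(1 - reliability)

sd, reliability, observed = 15, 0.90, 100   # illustrative IQ-type values

sem = standard_error_of_measurement(sd, reliability)
low, high = observed - 1.96 * sem, observed + 1.96 * sem
print(f"SEM = {sem:.2f}; 95% band = {low:.1f} to {high:.1f}")
```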
Serial Killer IQ Exercise • Mean = 100, SD = 15, Reliability = .90 • IQ of 70 for death penalty
Serial Killer IQ: Answers • Mean = 100, SD = 15, Reliability = .90 • IQ of 70 for death penalty
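One way to work the exercise under the stated parameters (a sketch, not necessarily the slide's official answer): SEM = 15 × √(1 − .90) ≈ 4.74, so the 95% band around an observed IQ of 70 is 70 ± 1.96 × 4.74, or roughly 60.7 to 79.3. An observed score of 70 is therefore consistent with a true IQ anywhere from about 61 to 79.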
Reliability Alternate Forms
Alternate Forms Reliability • Establishes form stability • Used when there are two or more forms of the same test • Different questions • Same questions, but different order • Different administration or response method (e.g., computer, oral) • Why have alternate forms? • - Prevent cheating • Prevent carry over from people who take a test more than once • GRE or SAT • Promotion exams • Employment tests
Alternate Forms ReliabilityAdministration • Two forms of the same test are developed, and to the highest degree possible, are equivalent in terms of content, response process, and statistical characteristics • One form is administered to examinees, and at some later date, the same examinees take the second form
Alternate Forms ReliabilityScoring • Scores from the first form of test are correlated with scores from the second form • If the scores are highly correlated, the test has form stability
Alternate Forms ReliabilityDisadvantages • Difficult to develop • Content sampling errors • Time sampling errors
What the Research Shows • Computer vs. Paper-Pencil • - Few test score differences • - Cognitive ability scores are lower on the computer for speed tests but not power tests • Item order • - Few differences • Video vs. Paper-Pencil • - Little difference in scores • - Video reduces adverse impact
Reliability Internal
Internal Reliability • Defines measurement error strictly in terms of consistency or inconsistency in the content of the test • With this form of reliability the test is administered only once and measures item stability
Determining Internal Reliability: Split-Half Method • Test items are divided into two equal parts • Scores for the two parts are correlated to get a measure of internal reliability • Need to adjust for the smaller number of items • Spearman-Brown prophecy formula: • (2 × split-half reliability) ÷ (1 + split-half reliability)
Spearman-Brown Formula • Corrected reliability = (2 × split-half correlation) ÷ (1 + split-half correlation) • If we have a split-half correlation of .60, the corrected reliability would be: (2 × .60) ÷ (1 + .60) = 1.2 ÷ 1.6 = .75
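A sketch of the split-half procedure with the Spearman-Brown correction, using a made-up item-score matrix (rows = examinees, columns = items) and numpy:

```python
# Split-half reliability with the Spearman-Brown correction (hypothetical data).
import numpy as np

scores = np.array([
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
])

odd_half = scores[:, ::2].sum(axis=1)    # items 1, 3, 5, 7
even_half = scores[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = (2 * r_half) / (1 + r_half)     # Spearman-Brown prophecy formula
print(f"Split-half r = {r_half:.2f}, corrected reliability = {r_full:.2f}")
```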
Spearman-Brown Formula: Estimating the Reliability of a Longer Test • Estimated new reliability = (L × current reliability) ÷ (1 + (L − 1) × current reliability) • L = the number of times longer the new test will be
Example • Suppose you have a test with 20 items and it has a reliability of .50. You wonder if using a 60-item test would result in acceptable reliability. • L = 60 ÷ 20 = 3 • (3 × .50) ÷ (1 + (3 − 1) × .50) = 1.5 ÷ 2 = .75 • Estimated new reliability = .75
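The same calculation as a small function, for any lengthening factor L (a sketch of the formula above):

```python
def spearman_brown(reliability: float, L: float) -> float:
    # Estimated reliability when the test is made L times longer.
    return (L * reliability) / (1 + (L - 1) * reliability)

print(spearman_brown(0.50, 3))   # 20-item test tripled to 60 items -> 0.75
```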
Common Methods to Determine Internal Reliability • Cronbach’s Coefficient Alpha • - Used with ratio or interval data • Kuder-Richardson Formula • - Used for tests with dichotomous items • yes-no • true-false • right-wrong
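A sketch of Cronbach's coefficient alpha computed directly from an item-score matrix (the ratings below are hypothetical; with 0/1 items the same formula reduces to KR-20):

```python
# Cronbach's alpha from an item-score matrix (rows = examinees, columns = items).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

ratings = np.array([
    [4, 5, 4, 3],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 4, 3, 3],
    [1, 2, 2, 1],
])
print(f"alpha = {cronbach_alpha(ratings):.2f}")
```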
Interrater Reliability • Used when human judgment of performance is involved in the selection process • Refers to the degree of agreement between 2 or more raters • 3 common methods used to determine interrater reliability • Percent agreement • Correlation • Cohen’s Kappa
Interrater Reliability Methods: Percent Agreement • Determined by dividing the total number of agreements by the total number of observations • Problems • - Exact match? • - Very high or very low frequency behaviors can inflate agreement
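A minimal sketch of percent agreement with hypothetical ratings from two raters:

```python
# Percent agreement: exact matches divided by total observations (hypothetical data).
rater_a = ["pass", "fail", "pass", "pass", "fail", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass"]

matches = sum(a == b for a, b in zip(rater_a, rater_b))
print(f"Percent agreement = {matches / len(rater_a):.0%}")   # 83%
```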
Interrater Reliability Methods: Correlation • Ratings of two judges are correlated • Pearson for interval or ratio data and Spearman for ordinal data (ranks) • Problems • - Shows pattern similarity but not similarity of the actual ratings
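A quick illustration of the problem noted above: two raters whose ratings move in lockstep correlate perfectly even though they never assign the same score (hypothetical data).

```python
# Perfect correlation with zero exact agreement between two raters.
import numpy as np

rater_a = [1, 2, 3, 4, 5]
rater_b = [3, 4, 5, 6, 7]   # always 2 points higher than rater A

print(np.corrcoef(rater_a, rater_b)[0, 1])   # 1.0 despite no identical ratings
```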
Interrater Reliability Methods: Cohen’s Kappa • Allows one to determine not only the level of agreement, but also the level of agreement that would be expected by chance • A kappa of .70 or higher is considered acceptable agreement
[Example agreement table: Forensic Examiner A vs. Forensic Examiner B]
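A sketch of Cohen's kappa with hypothetical calls from two examiners (the data below are illustrative, not the slide's):

```python
# Cohen's kappa: observed agreement corrected for chance agreement.
from collections import Counter

examiner_a = ["match", "match", "no match", "match", "no match", "match", "match", "no match"]
examiner_b = ["match", "no match", "no match", "match", "no match", "match", "match", "no match"]

n = len(examiner_a)
observed = sum(a == b for a, b in zip(examiner_a, examiner_b)) / n

# Chance agreement: product of each rater's marginal proportions, summed over categories.
count_a, count_b = Counter(examiner_a), Counter(examiner_b)
expected = sum((count_a[c] / n) * (count_b[c] / n) for c in set(examiner_a) | set(examiner_b))

kappa = (observed - expected) / (1 - expected)
print(f"kappa = {kappa:.2f}")   # 0.75 for these hypothetical ratings
```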
Increasing Rater Reliability • Have clear guidelines regarding various levels of performance • Train raters • Practice rating and provide feedback
Scorer Reliability • Allard, Butler, Faust, & Shea (1995) • - 53% of hand-scored personality tests contained at least one error • - 19% contained enough errors to alter a clinical diagnosis
Validity The degree to which inferences from scores on tests or assessments are justified by the evidence
Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests. ... The process of validation involves accumulating evidence to provide a sound scientific basis for the proposed score interpretations. It is the interpretations of test scores required by proposed uses that are evaluated, not the test itself. When test scores are used or interpreted in more than one way, each intended interpretation must be validated. Sources of validity evidence include, but are not limited to: evidence based on test content, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and evidence based on consequences of testing. Standards for Educational and Psychological Testing (1999)
Common Methods of Determining Validity • Content Validity • Criterion Validity • Construct Validity • Known Group Validity • Face Validity
Validity Content Validity
Content Validity • The extent to which test items sample the content that they are supposed to measure • In industry the appropriate content of a test or test battery is determined by a job analysis • Considerations • The content that is actually in the test • The content that is not in the test • The knowledge and skill needed to answer the questions
Test of Logic • Stag is to deer as ___ is to human • Butch is to Sundance as ___ is to Sinatra • Porsche is to cars as Gucci is to ____ • Puck is to hockey as ___ is to soccer What is the content of this exam?
Messick (1995)Sources of Invalidity • Construct underrepresentation • Construct-irrelevant variance • Construct-irrelevant difficulty • Construct-irrelevant easiness
[Diagram: overlap between Domain Content and Test Content]
Validity Criterion Validity
Criterion Validity • Criterion validity refers to the extent to which a test score is related to some measure of job performance called a criterion • Established using one of the following research designs: • - Concurrent Validity • - Predictive Validity • - Validity Generalization