EDPC 5335 Basic Principles of Assessment Understanding Assessment Scores
Types of instruments Norm-referenced instruments • Comparing an individual's score to the distribution of scores from a norm group Criterion-referenced instruments • Also known as subject-referenced or domain-referenced • Comparing an individual's score to a pre-established standard or criterion
Developing a Norm-Referenced Instrument • What is the purpose of a norm-referenced instrument? • To position an individual's score within a group • Developing a norm-referenced instrument: • Sampling a group that represents the population • Administering the instrument to the sample • Generating the distribution of scores from the norm group
Challenges of Developing a Norm-Referenced Instrument • Generating a representative sample • Systematic sampling • Stratified random sampling
Stratified Random Sampling • Example: 4th graders in El Paso school districts, stratified by ethnicity, gender, and S.E.S. (a sampling sketch follows below)
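A minimal sketch of proportional stratified sampling. The population, the strata labels, and the sample sizes below are all made-up illustrative values, not an actual district roster:

```python
import random

# Made-up population: each record is (student_id, stratum), where the stratum
# combines the characteristics used for stratification (e.g., ethnicity x gender).
population = [(i, random.choice(["A/F", "A/M", "B/F", "B/M"])) for i in range(2000)]

def stratified_sample(pop, n):
    """Draw n records, allocating to each stratum in proportion to its size."""
    strata = {}
    for record in pop:
        strata.setdefault(record[1], []).append(record)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(pop))    # proportional allocation
        sample.extend(random.sample(members, share))  # random draw within stratum
    return sample

norm_group = stratified_sample(population, 200)
```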
Developing a Criterion-Referenced Instrument • Identifying the domain of interest • Developing items that adequately assess the domain • Deciding on the cutoff score that indicates mastery
Challenges of Developing a Criterion-Referenced Instrument • Adequately assessing the domain/subject • Setting the cutoff score
Looking into an Individual's Score • Raw score • Can you compare raw scores? • How can scores from individuals across the nation be compared?
Compare Scores Across Regions? • Percentile scores/percentile ranks • Percentage of scores at or below a given score in the norming group • Range from 1 to 99 • Standard scores • Z score • M = 0, SD = 1 • T score • M = 50, SD = 10 • Stanine score • M = 5, SD = 2 • Grade and age equivalent scores
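A minimal sketch of how these derived scores are computed from a norm group. The norm-group scores below are made-up illustrative values:

```python
from statistics import mean, pstdev
from bisect import bisect_right

# Hypothetical norm-group raw scores (illustrative values, not real data).
norm_scores = [12, 15, 18, 20, 21, 22, 22, 24, 25, 27, 28, 30, 31, 33, 36]

def z_score(raw, norms):
    """Z score: how many SDs the raw score lies above/below the norm mean."""
    return (raw - mean(norms)) / pstdev(norms)

def t_score(raw, norms):
    """T score: linear rescaling of z to M = 50, SD = 10."""
    return 50 + 10 * z_score(raw, norms)

def stanine(raw, norms):
    """Stanine: z rescaled to M = 5, SD = 2, rounded and clipped to 1..9."""
    return max(1, min(9, round(5 + 2 * z_score(raw, norms))))

def percentile_rank(raw, norms):
    """Percentage of norm-group scores at or below the raw score (1-99)."""
    at_or_below = bisect_right(sorted(norms), raw)
    pr = round(100 * at_or_below / len(norms))
    return max(1, min(99, pr))

raw = 28
print(z_score(raw, norm_scores), t_score(raw, norm_scores),
      stanine(raw, norm_scores), percentile_rank(raw, norm_scores))
```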
P. 37 • Figure: the normal distribution with standard score scales (commons.wikimedia.org/wiki/File:Normal_distri...)
A: Z score B: T score C: Stanine score D: Grade and age equivalent score • 1. ________ has a mean of 0 and SD of 1. • A • 2. ________ is controversial in application. • D
A: Norm-referenced Assessment B: Criterion-referenced Assessment • 1. ________ needs to have a representative sample in its development. • A • 2. ________ is criticized for the setting of its cut-off score. • B
EDPC 5335 Basic Principles of Assessment Reliability
Reliability • The degree to which test scores are dependable, consistent, and stable across the items of a test, across different forms of the test, and across repeated administrations.
Reliability • It is not a property of the instrument itself. • It is a property of the results/scores obtained from an instrument. • The instrument developer provides one or more types of reliability estimates. • Reliability varies with the degree of measurement error.
Error of Measurement • Observed score = true score + error
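A minimal simulation of this classical true-score model. The means and SDs below are made-up illustrative values; in classical test theory, reliability equals the ratio of true-score variance to observed-score variance:

```python
import random

# Simulate X = T + E: each observed score is a true score plus random error.
random.seed(1)
true_scores = [random.gauss(100, 15) for _ in range(10_000)]
observed = [t + random.gauss(0, 5) for t in true_scores]  # error SD = 5

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Reliability as var(true) / var(observed): ~ 15^2 / (15^2 + 5^2) = 0.90.
print(variance(true_scores) / variance(observed))
```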
Sources of Measurement Error • Time-sampling error • Differences between the same individual's scores obtained at different times • Content-sampling error • Results from a selection of items that inadequately covers the content area • Inter-rater differences • Other sources of error • Quality of test items • Test length • Test-taker variables • Test administration
Methods of Estimating Reliability • Test-Retest • Administer the same test to the same group on two occasions and correlate the two sets of scores
Methods of Estimating Reliability • Alternative Form/Parallel Form • Same format • Same number of items • Same content domains • Same difficulty levels
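Both test-retest and alternate-form reliability reduce to correlating two sets of scores from the same examinees. A minimal sketch, using made-up scores for eight examinees:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two sets of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for 8 examinees on two occasions (or two forms).
time1 = [85, 78, 92, 66, 74, 88, 95, 70]
time2 = [82, 80, 90, 70, 71, 85, 96, 73]
print(pearson_r(time1, time2))  # the test-retest (or alternate-form) estimate
```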
Methods of Estimating Reliability • Internal Consistency Reliability • Split-Half Reliability • Correlate scores on two halves of the test (e.g., odd vs. even items) and adjust with the Spearman-Brown correction
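A minimal split-half sketch, using made-up 0/1 item responses for a 10-item test (statistics.correlation requires Python 3.10+):

```python
from statistics import correlation  # Python 3.10+

# Made-up 0/1 responses: each row is one examinee's answers to 10 items.
item_matrix = [
    [1, 1, 0, 1, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 0, 1, 0, 0, 1],
]

odd_totals  = [sum(row[0::2]) for row in item_matrix]   # items 1, 3, 5, ...
even_totals = [sum(row[1::2]) for row in item_matrix]   # items 2, 4, 6, ...

r_half = correlation(odd_totals, even_totals)
# Spearman-Brown correction: estimate the reliability of the full-length test
# from the correlation between the two half-tests.
split_half_reliability = (2 * r_half) / (1 + r_half)
print(split_half_reliability)
```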
Methods of Estimating Reliability • Inter-rater Reliability • The agreement between raters
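A minimal agreement sketch with made-up pass/fail ratings from two raters; percent agreement is the simplest index, and Cohen's kappa corrects it for chance agreement:

```python
from collections import Counter

# Hypothetical pass/fail ratings from two raters on 10 portfolios.
rater_a = ["pass", "pass", "fail", "pass", "fail",
           "pass", "pass", "fail", "pass", "fail"]
rater_b = ["pass", "fail", "fail", "pass", "fail",
           "pass", "pass", "pass", "pass", "fail"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # percent agreement

# Cohen's kappa: (observed - expected-by-chance) / (1 - expected-by-chance).
pa, pb = Counter(rater_a), Counter(rater_b)
expected = sum(pa[c] * pb[c] for c in pa) / n ** 2
kappa = (observed - expected) / (1 - expected)
print(observed, kappa)  # 0.8 agreement, kappa ~ 0.57
```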
Evaluating Reliability Coefficients • How large do reliability coefficients need to be? • Possible range: 0 to +1 • 1 indicates the true score equals the observed score (no measurement error) • A common guideline: .65 or higher
Standard Error of Measurement (SEM) • An individual's observed score vs. true score • The average distance of an individual's observed scores from that individual's true score • Analogous to a standard deviation
Confidence Intervals • Observed score = true score + error • An upper and a lower bound around an individual's observed score • Ex: 68% confidence, 95% confidence
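A minimal sketch of the usual formulas, SEM = SD x sqrt(1 - reliability) and CI = observed score plus/minus z x SEM; the SD and reliability values below are made-up illustrative numbers:

```python
from math import sqrt

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Band around an observed score; z = 1.0 gives ~68%, z = 1.96 gives ~95%."""
    margin = z * sem(sd, reliability)
    return observed - margin, observed + margin

# Illustrative values: T-score metric (SD = 10) with reliability .91.
print(sem(10, 0.91))                      # SEM = 3.0
print(confidence_interval(50, 10, 0.91))  # ~95% CI: (44.12, 55.88)
```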
Increasing Reliability • What do you think? • Increase the number of items. Why? • More items are believed to be more likely to assess a domain adequately.
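The expected gain from lengthening a test can be estimated with the Spearman-Brown prophecy formula. A minimal sketch with made-up numbers:

```python
def spearman_brown(reliability, k):
    """Predicted reliability when the test is lengthened by a factor of k
    with comparable items (Spearman-Brown prophecy formula)."""
    return (k * reliability) / (1 + (k - 1) * reliability)

# Illustrative: a test with reliability .70, doubled in length (k = 2).
print(spearman_brown(0.70, 2))  # ~0.82
```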
A. Alternative form B. Inter-rater Reliability C. Test-Retest • Same format, same number of items, same content domains, same difficulty levels • A • The agreement between raters • B