Standardization: The Properties of Objective Tests
Properties of Objective Tests • There are three standards by which you can judge an objective test: • Standardization • Reliability • Validity
Properties of Objective Tests • Standardization – administration, scoring, and use of scores do not vary across situations • Reliability – scores are consistent and remain stable over time • Validity – the test measures what it intends to measure
Standardization Principles • Objective Scoring • Directions • Consistency • Accuracy and timeliness
Standardization Principles • Administration • Appropriate conditions specified • Materials • Probing / Coaching
Standardization Principles • Guidelines for interpretation and use • With whom? • For what purpose? • What do high and low scores mean?
Standardization Principles • Norm tables • Based on large, representative samples • Drawn from a defined population
Standardization Principles • Specialized norm tables • Subgroup differences • For example: age, gender, race, primary language, etc.
Standardization Principles • Raw scores and standard scores provided where appropriate • Standard scores • Percentile ranks • Age-standardized scores
Standardization Principles • Technical manual • Test development process • Guidelines for administration, scoring, and interpretation • Norm tables • Meets the Standards for Educational and Psychological Testing
Norm Tables • Meaningful for interpretation when: • Norm-referenced interpretation meets the goal of the test • Not a criterion-referenced test
Norm Tables • Meaningful for interpretation when: • Relative position in a group has interpretative meaning • Examinee is a member of the population
Norm Tables • Meaningful for interpretation when: • The norm sample is large and representative of the population • The right norm table is used
Norm Tables • For admissions or personnel selection, everyone who takes the test in a given administration may serve as the norm sample
Norm Tables • However, the correct reference group varies by the purpose • Career counseling • Placement in the appropriate courses • Selection for a remedial program
Interpreting Standard Scores • Raw score is transformed into a standard score • z = (score – mean)/SD • z score = number of SD units a score lies from the mean • Uses both a measure of center (the mean) and a measure of spread (the SD)
Interpreting Standard Scores • z = 0, average score • z ≤ -1, low score • z ≥ 1, high score • z is often converted to another scale: • Mean 50, SD 10 (T scores) • Mean 100, SD 15 • Mean 500, SD 100
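A minimal Python sketch of the steps on these two slides, assuming the norm-group mean and SD are already known: compute z, place it on one of the listed scales, and apply the ±1 cut points. The function names and the example numbers (raw 34, mean 28, SD 4) are hypothetical.

```python
def z_score(raw: float, mean: float, sd: float) -> float:
    """Number of SD units the raw score lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (raw - mean) / sd


def rescale(z: float, new_mean: float, new_sd: float) -> float:
    """Linear conversion of z to a scale with the given mean and SD."""
    return new_mean + new_sd * z


def broad_label(z: float) -> str:
    """Coarse label using the z <= -1 and z >= 1 cut points from the slide."""
    if z <= -1:
        return "Low"
    if z >= 1:
        return "High"
    return "Average range"


if __name__ == "__main__":
    z = z_score(34, mean=28, sd=4)  # hypothetical norm-group values
    print(f"z = {z}")
    for mean, sd in [(50, 10), (100, 15), (500, 100)]:
        print(f"mean {mean}, SD {sd}: {rescale(z, mean, sd)}")
    print(broad_label(z))
```

Because the conversion is linear, a score keeps the same relative position on every scale; only the units change.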
Interpreting Standard Scores • pp. 42, 43, and 48 in the textbook give guidelines • Easiest to use when converted to percentiles • % of the population that scores at or below a given score • Can be thought of as a rank out of 100 members of the population
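If the scores are assumed to be normally distributed, a z score can be turned into a percentile with the standard normal CDF. The sketch below uses Python's `statistics.NormalDist`; the helper name is hypothetical. Published norm tables are built from the actual norm sample, so their percentiles may differ from these normal-curve values.

```python
from statistics import NormalDist


def percentile_from_z(z: float) -> float:
    """Percent of a normal population scoring at or below z.

    Assumes normality; a publisher's norm table should be preferred when available.
    """
    return 100 * NormalDist().cdf(z)


if __name__ == "__main__":
    for z in (-1.0, 0.0, 1.0, 2.0):
        print(f"z = {z:+.1f} -> percentile {percentile_from_z(z):.1f}")
```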
Interpreting Standard Scores • Common interpretation strategies: • Normal range is middle 68% of the population (T=40-60, z=-1 to 1, etc.) • Low and high scores fall outside this range (lower and upper 16%)
Interpreting Standard Scores • Common interpretation strategies: • Normal range is middle 50% of the population (Quartiles 2 & 3) • Low and high scores fall outside this range (Quartiles 1 and 4)
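Under the same normality assumption, the two strategies can be checked numerically: the share of a normal population between z = -1 and z = +1 is about 68%, and the quartile cut points that bound the middle 50% sit near z = ±0.67 (about T = 43 to 57). A minimal sketch; the helper names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def share_between(z_low: float, z_high: float) -> float:
    """Proportion of a normal population with z_low <= z <= z_high."""
    return STD_NORMAL.cdf(z_high) - STD_NORMAL.cdf(z_low)


def middle_band(coverage: float) -> tuple[float, float]:
    """z cut points enclosing the middle `coverage` share of the population."""
    if not 0 < coverage < 1:
        raise ValueError("coverage must be between 0 and 1")
    tail = (1 - coverage) / 2
    return STD_NORMAL.inv_cdf(tail), STD_NORMAL.inv_cdf(1 - tail)


def to_t(z: float) -> float:
    """Convert z to a T score (mean 50, SD 10)."""
    return 50 + 10 * z


if __name__ == "__main__":
    print(f"Share within z = -1..+1: {share_between(-1, 1):.3f}")
    q1, q3 = middle_band(0.50)
    print(f"Middle 50%: z = {q1:.3f}..{q3:.3f}, T = {to_t(q1):.1f}..{to_t(q3):.1f}")
```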
Interpreting Standard Scores • It is safer to make broad classifications such as “Low”, “Within the normal, or expected, range”, or “High” than fine distinctions. • All scores contain some measurement error. • Look for patterns across the battery and across multiple sources.
An Example from WCCS • Christina, a 1st grade student at our school, took the Stanford Achievement Test last year. Here are her Word Study Skills subtest scores.
Percent Correct • The number of correct responses, or the raw score, is divided by the total number of questions, then multiplied by 100 and expressed as a percentage.
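The percent-correct calculation as a small Python function. The slides do not give Christina's item counts, so the 25-of-30 figures below are hypothetical.

```python
def percent_correct(num_correct: int, num_questions: int) -> float:
    """Raw score divided by the number of questions, times 100."""
    if num_questions <= 0:
        raise ValueError("num_questions must be positive")
    if not 0 <= num_correct <= num_questions:
        raise ValueError("num_correct must be between 0 and num_questions")
    return 100 * num_correct / num_questions


if __name__ == "__main__":
    print(f"{percent_correct(25, 30):.2f}%")  # hypothetical counts -> 83.33%
```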
Percent Correct • Christina gave the correct answer to 83.33% of the questions on the Word Study Skills section of the test.
Scaled Score • The raw score is standardized and normalized, then rescaled to the desired scaling. • z = (Raw Score – Mean) / SD • Scaled Score ≈ 500 + (100*z)
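The slide's formula is the linear z transformation; “normalized” usually refers to an extra step that forces the scores onto a normal curve. One common way to do that (an assumption here, not necessarily the Stanford publisher's procedure) is an area transformation: find the raw score's percentile position in the norm sample, convert it to z with the inverse normal CDF, then rescale to mean 500 and SD 100. A minimal sketch with a made-up norm sample and a hypothetical helper name:

```python
from statistics import NormalDist


def normalized_scaled_score(raw, norm_sample, mean=500.0, sd=100.0):
    """Area (percentile-based) normalization, then rescaling.

    Hypothetical helper: uses the mid-point proportion, counting half of
    the norm-sample scores tied with `raw` as falling below it.
    """
    n = len(norm_sample)
    if n == 0:
        raise ValueError("norm_sample must not be empty")
    below = sum(1 for x in norm_sample if x < raw)
    tied = sum(1 for x in norm_sample if x == raw)
    p = (below + 0.5 * tied) / n
    # Keep p strictly inside (0, 1); inv_cdf is undefined at 0 and 1.
    p = min(max(p, 0.5 / n), 1 - 0.5 / n)
    z = NormalDist().inv_cdf(p)
    return mean + sd * z


if __name__ == "__main__":
    sample = [10, 20, 30, 40, 50]  # made-up norm sample
    for raw in (30, 50):
        print(raw, round(normalized_scaled_score(raw, sample), 1))
```

Unlike the linear formula, this version yields a normal-shaped distribution of scaled scores even when the raw scores are skewed.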
Scaled Score • Scaled Scores have many convenient properties from a statistical standpoint. • However, for most people, percentile ranks are easier to understand.
Scaled Score • Christina scored more than one Standard Deviation above average. Her scores are in the above average range.
Percentile Rank • A percentile rank is a statement of the percentage of persons in a given group who fall at or below a given score. • The most common way of reporting test scores and the easiest to use.
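Following the slide's definition (the percent of the group at or below the score), a percentile rank can be computed directly from a norm sample. Some publishers instead count only half of the tied scores; this sketch uses the at-or-below version. Both norm groups below are made up, and they show how the same score earns a different rank in a different reference group.

```python
def percentile_rank(score, norm_sample):
    """Percent of the norm group scoring at or below `score`."""
    if not norm_sample:
        raise ValueError("norm_sample must not be empty")
    at_or_below = sum(1 for x in norm_sample if x <= score)
    return 100 * at_or_below / len(norm_sample)


if __name__ == "__main__":
    group_a = [12, 15, 15, 18, 20, 22, 25, 25, 28, 30]  # made-up norm group
    group_b = [18, 20, 22, 25, 26, 27, 28, 29, 30, 31]  # made-up norm group
    print(percentile_rank(25, group_a))  # 80.0
    print(percentile_rank(25, group_b))  # 40.0
```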
Percentile Rank • Christina scored as well as or better than 81% of all students in the nation who took this section of the test.
Percentile Rank • Christina scored as well as or better than 57% of all students in ACSI schools who took this section of the test.
Percentile Rank • This pattern is typical for our students on average. • ≈ 80th percentile nationally • ≈ 60th percentile for ACSI students • What does this mean?
Stanine • Short for “standard nine”: a standard score with nine units • Developed by the military so that a test score would fit in a single column of an IBM punch card • Nine groups (1-9), each ½ SD wide except the open-ended 1 and 9, each covering a range of PRs
Stanine • Christina’s scores fall in the 7th stanine, or above average compared to all students nationally. • Christina’s scores fall in the 5th stanine, or average for ACSI students.
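A sketch of the percentile-rank-to-stanine lookup, assuming the conventional 4-7-12-17-20-17-12-7-4 percent split, whose cumulative cut points are 4, 11, 23, 40, 60, 77, 89, and 96. How a rank that falls exactly on a cut point is assigned is a convention; here it goes to the lower stanine. With Christina's percentile ranks from the slides (81 nationally, 57 for ACSI) it reproduces stanines 7 and 5.

```python
from bisect import bisect_left

# Upper percentile-rank boundaries of stanines 1 through 8, from the
# conventional 4-7-12-17-20-17-12-7-4 percent split.
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine_from_pr(pr: float) -> int:
    """Map a percentile rank (0-100) to a stanine (1-9)."""
    if not 0 <= pr <= 100:
        raise ValueError("percentile rank must be between 0 and 100")
    # bisect_left counts the cut points strictly below pr;
    # a pr equal to a cut point stays in the lower stanine.
    return bisect_left(STANINE_CUTS, pr) + 1


if __name__ == "__main__":
    print(stanine_from_pr(81))  # 7
    print(stanine_from_pr(57))  # 5
```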
Grade Equivalent Scores • Attempt to translate a test score into the grade (grade and month) at which that score is typical. • Have an intrinsic appeal. • Are statistically problematic. • Are based on extrapolations.
Grade Equivalent Scores • Christina, a 1st grade student at our school, scored on the 1st grade Word Study Skills test about as well as a typical 3rd grade student in the seventh month of the school year would be expected to score on that same test.
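A toy illustration of why grade equivalents rest on extrapolation. Assume, hypothetically, that a 1st grade test was normed only in the fall and spring of grade 1, giving median raw scores of 20 at grade 1.1 and 30 at grade 1.8 (grade.month notation, so 3.7 means third grade, seventh month). A straight line through those two points assigns grade equivalents to higher raw scores even though no 2nd or 3rd graders took the test. Publishers' actual procedures are more elaborate; this only sketches the idea, and the helper name is hypothetical.

```python
def grade_equivalent(raw, norms):
    """Linear interpolation/extrapolation of a grade equivalent.

    `norms` is a hypothetical list of (grade, median_raw_score) pairs whose
    medians must rise with grade.
    """
    pts = sorted(norms)
    if len(pts) < 2:
        raise ValueError("need at least two (grade, median_raw_score) pairs")
    if any(b[1] <= a[1] for a, b in zip(pts, pts[1:])):
        raise ValueError("median raw scores must increase with grade")
    seg = len(pts) - 2  # default: last segment (extrapolate upward)
    for i in range(len(pts) - 1):
        if raw <= pts[i + 1][1]:
            seg = i  # first segment reaching raw (or extrapolate downward)
            break
    (g0, r0), (g1, r1) = pts[seg], pts[seg + 1]
    return g0 + (g1 - g0) * (raw - r0) / (r1 - r0)


if __name__ == "__main__":
    norms = [(1.1, 20), (1.8, 30)]  # hypothetical fall/spring grade-1 medians
    for raw in (25, 30, 40):
        print(raw, round(grade_equivalent(raw, norms), 1))
```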
An SAT Example • Mark, a 12th grade student at our school, took the SAT test last year. Here are his scores.
An SAT Example • Section mean ≈ 500, SD ≈ 100 • Range = 200-800 (z = -3 to +3) • Total mean ≈ 1000, SD ≈ 200 • Range = 400-1600
An SAT Example • Mark scored a 620 on the verbal section of the test. His score was more than one Standard Deviation above the mean and is considered above average.
An SAT Example • Mark’s score on the verbal section of the test was as good as or better than the scores of 83% of the students who took the test.
An SAT Example • Mark scored a 570 on the quantitative section of the test. His score was within the normal range and is considered average.
An SAT Example • Mark’s score on the quantitative section of the test was as good as or better than the scores of 66% of the students who took the test.
An SAT Example • Mark’s total score was 1190, which is within the normal range and is considered average.
An SAT Example • Mark’s total score was as good as or better than the scores of 61% of the students who took the test.
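Mark's three scores can be checked against the approximate scale values on the earlier SAT slide (section mean 500, SD 100; total mean 1000, SD 200) with a short sketch. The verbal z of 1.2 is above the +1 cut point, while the quantitative (0.7) and total (0.95) z scores fall inside the normal range, matching the descriptions above. The reported percentiles come from the actual test-taker distribution, so they need not equal normal-curve percentiles for these z values. The names `SCALES` and `describe` are hypothetical.

```python
# Approximate scale values from the slide; actual SAT norms may vary by year.
SCALES = {
    "verbal": (500, 100),
    "quantitative": (500, 100),
    "total": (1000, 200),
}


def z_score(score: float, mean: float, sd: float) -> float:
    return (score - mean) / sd


def describe(z: float) -> str:
    """Broad label using the z = -1 and z = +1 cut points."""
    if z >= 1:
        return "above average"
    if z <= -1:
        return "below average"
    return "average (normal range)"


if __name__ == "__main__":
    mark = {"verbal": 620, "quantitative": 570, "total": 1190}
    for part, score in mark.items():
        mean, sd = SCALES[part]
        z = z_score(score, mean, sd)
        print(f"{part}: z = {z:.2f}, {describe(z)}")
```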
General Principles • Tests do not measure innate ability • Test scores result from a combination of: • Innate ability • Environmental influences • Test taker motivation • Properties of the test itself
Cautions about Interpretation • A low score in one norm group may be high in another, and vice versa. • A score on one test does not necessarily predict a similar score on another test.
Cautions about Interpretation • Interpretation is partly an art, drawing on clinical intuition and experience. • Become familiar with the case studies in test manuals.