Assessment Training. Nebo School District. Assessment Literacy.
Assessment Training Nebo School District
Test Acronyms • CRT - Criterion-Referenced Test, grades 1-11 • IOWA - Iowa Test of Basic Skills and Iowa Test of Educational Development, grades 3, 5, 8, and 11 • UBSCT - Utah Basic Skills Competency Test, grades 10-12 • DWA - Direct Writing Assessment, grades 6 and 9 • UAA - Utah Alternate Assessment, grades 1-12, for students with severe cognitive disabilities • UALPA - Utah Academic Language Proficiency Assessment, grades 1-12, for English language learners (ELL)
Norm-Referenced Tests • Standardized tests • Scores are interpreted by comparison to a specific group (the norm group) • Percentile scores are the most common measure of achievement • Percentile scores range from the 1st to the 99th, with the 50th percentile representing the national average • The ITBS and ITED (IOWA) tests are the state-adopted norm-referenced assessments
Criterion-Referenced Tests • Standardized Tests • Every question/item is aligned to an explicitly stated educational objective • Used to identify which standards and objectives have been mastered by the examinee • CRT or End-of-Level tests in Language Arts, Math, and Science
Summative Assessment • Used to determine the students’ final understanding of material • State CRT tests are an example
Formative Assessment • Used to identify the students’ understanding of material, to provide feedback for teachers and learning experiences for students • Benchmarks, UTIPS, Running Records, and Student Interviews are all included in this category
Raw score • The number of correct responses on a test • A student answered 48 questions correctly
Percent Correct Score • The number of correct responses divided by the total number of items • 49 out of 70 = 70%
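The percent correct calculation on this slide can be written out as a short sketch. The function name `percent_correct` is made up for illustration; the 49-out-of-70 figures are the ones from the slide.

```python
def percent_correct(raw_score: int, total_items: int) -> float:
    """Return the percent correct: raw score divided by total items, times 100."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    if not 0 <= raw_score <= total_items:
        raise ValueError("raw_score must be between 0 and total_items")
    return 100.0 * raw_score / total_items


# The slide's example: 49 correct responses on a 70-item test.
print(percent_correct(49, 70))  # 70.0
```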
Percentile Score • The percent of students who performed worse on a test • 75th percentile – 75% of examinees scored lower on the test than this examinee
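As a rough sketch of the definition above, a percentile score can be computed as the share of a comparison group that scored lower. The function name and the eight sample scores are invented for illustration; published percentiles are computed against a large norm group, not a small list like this one.

```python
def percentile_rank(score: float, group_scores: list[float]) -> float:
    """Percent of the comparison group that scored strictly lower than `score`."""
    if not group_scores:
        raise ValueError("group_scores must not be empty")
    lower = sum(1 for s in group_scores if s < score)
    return 100.0 * lower / len(group_scores)


# Hypothetical comparison group of eight raw scores.
group = [40, 45, 50, 55, 60, 65, 70, 75]

# Six of the eight scores are below 70, so this examinee is at the 75th percentile.
print(percentile_rank(70, group))  # 75.0
```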
Scaled Score • The student's performance is reported on an arbitrary numerical scale (the scale can also be alphabetical) • A properly constructed scaled score provides comparable information on student performance across different years and different tests
ACT • What is 36? • What is 28? • What is 12? • These numbers represent the value we place on numbers in a scale • Often we have the help of others, such as colleges, in setting that value • Utah State University and the University of Utah say you must have a score of at least 18
Scaled Scores • ACT scores range from 1-36; 18-28 is considered proficient, depending on the school • Advanced Placement scores range from 1-5; 3 is proficient • UBSCT and CRT scores range from 100-200; 160 is proficient
Scaled Scores • Scaled scores offer the advantage of simplifying the reporting of results • There can be common score reporting for each level and for each test • No more specific percentages for cut scores for each subject • Far greater comparability between tests and years
Scaled Scores • CRTs and UBSCT use a cut score of 160 • Each proficiency level has its own cut score • Proficiency levels range from 1-4 in NCLB and 1a-4 in UPASS (We will discuss this in the next session)
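The cut score idea can be sketched in a few lines. This assumes the 100-200 scale and the cut score of 160 given in the slides, and it assumes that a score exactly at the cut counts as proficient. The cut scores for the other proficiency levels are not given here, so the sketch only answers "proficient or not". The function and constant names are illustrative.

```python
SCALE_MIN = 100      # lowest scaled score on the CRT/UBSCT scale (from the slides)
SCALE_MAX = 200      # highest scaled score on the CRT/UBSCT scale (from the slides)
PROFICIENT_CUT = 160  # cut score for proficiency (from the slides)


def is_proficient(scaled_score: int) -> bool:
    """Return True when a scaled score meets or exceeds the proficiency cut score.

    Assumption: a score exactly equal to the cut score counts as proficient.
    """
    if not SCALE_MIN <= scaled_score <= SCALE_MAX:
        raise ValueError("scaled score must be between 100 and 200")
    return scaled_score >= PROFICIENT_CUT


# Two sample scaled scores, one on each side of the cut.
print(is_proficient(165), is_proficient(155))  # True False
```

Because the same cut score applies to every subject and year, one check like this serves all CRT and UBSCT results, which is the simplification scaled scores are meant to provide.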
Example • If John has a raw score of 65 in 2004 and a raw score of 58 in 2005, does this show a decrease in performance? • If John has a scaled score of 165 in 2004 and a scaled score of 155 in 2005, does this show a decrease in performance?
Why Not Raw Scores • Most states do not release raw scores • Looking at raw scores can lead to an incorrect assumption • It is incorrect to compare raw scores from one year to those of the next • It is incorrect to compare the raw scores of one test to those of another
EQUATING Career Home Runs
Who Is The Greatest? • Individual ability: strength, skill, technique, knowledge • Difficulty of the game: tightly wound baseballs, improved bats, higher pitcher's mound, changes in season length, steroids
Comparisons • Impossible to compare Barry Bonds with Babe Ruth • Impossible to compare a game in 1914 to a game in 2006
Comparisons • Possible to compare John's ability on the 2005 language arts CRT with John's ability on the 2006 language arts CRT (Scaling) • Possible to compare the difficulty of the 2005 language arts CRT to the 2006 language arts CRT (Equating)
Equating • Statistical process that takes different tests and makes them equal in difficulty • Disentangles differences between test difficulty and student ability
Equating • Common (anchor) items between test forms • Statistical comparison of common items for equivalent difficulty level • This statistical process ensures that results from test to test are accurately comparable and not subject to fluctuations due to unintentional changes in item difficulty
Equating • [Diagram: Form X and Form Y, each containing the same set of anchor items]
Anchor Items • It is the performance on the two sets of anchor items across years that allows us to make interpretations about the relative difficulty of the non-anchor items • If student performance on the anchor items is the same, we conclude that student achievement is the same • If student performance on the anchor items increases, we can interpret that student achievement increased • If student performance on the anchor items decreases, we interpret that student achievement decreased • We use this information to judge the difficulty of the non-anchor items
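The anchor-item reasoning above can be illustrated with a deliberately simplified sketch. This is not the statistical procedure used in operational equating; it only mirrors the logic of the bullets: the change on the anchor items is read as a change in achievement, and whatever change on the non-anchor items is left over is attributed to a difference in form difficulty. The class, the function, and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FormResult:
    """Group results for one test form (hypothetical structure)."""
    anchor_mean: float      # mean proportion correct on the anchor items
    non_anchor_mean: float  # mean proportion correct on the non-anchor items


def compare_forms(form_x: FormResult, form_y: FormResult) -> tuple[float, float]:
    """Return (achievement_change, difficulty_shift) going from Form X to Form Y.

    achievement_change: change in performance on the shared anchor items.
    difficulty_shift:   change on non-anchor items not explained by achievement;
                        a negative value suggests Form Y's non-anchor items
                        were harder than Form X's.
    """
    achievement_change = form_y.anchor_mean - form_x.anchor_mean
    non_anchor_change = form_y.non_anchor_mean - form_x.non_anchor_mean
    difficulty_shift = non_anchor_change - achievement_change
    return achievement_change, difficulty_shift


# Invented data: anchor performance is unchanged, but non-anchor performance drops.
form_x = FormResult(anchor_mean=0.70, non_anchor_mean=0.65)
form_y = FormResult(anchor_mean=0.70, non_anchor_mean=0.60)

achievement, difficulty = compare_forms(form_x, form_y)
print(round(achievement, 2), round(difficulty, 2))
```

In this invented case the anchor items show no change in achievement, so the lower scores on Form Y's non-anchor items point to a harder form, not weaker students.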
Why Equate • One test is more difficult than another • One group of examinees may be more intelligent than another • Both