TerraNova: Evaluation of a Standardized Test, Mini-Project 1. Teresa Frields and Mitzi Hoback. A. General Information. Title: TerraNova. Publisher: CTB/McGraw-Hill. Date of Publication: 1997. General Information: Cost. Varies as to what is purchased
TerraNova Evaluation of a Standardized Test Mini-Project 1 Teresa Frields and Mitzi Hoback
A. General Information • Title: TerraNova • Publisher: CTB/McGraw-Hill • Date of Publication: 1997
A. General Information Cost • Varies by what is purchased • $122 per 30 Complete Battery Plus consumable test booklets • $92.50 per 30 Complete Battery Plus reusable test booklets
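The per-pack prices above can be turned into a rough budget figure. The following is a minimal sketch in Python, assuming booklets are sold only in whole packs of 30 and ignoring answer sheets, scoring, and any other charges (none of which are stated on the slide); the function names are hypothetical.

```python
import math

# Prices per pack of 30 booklets, taken from the slide above.
PACK_SIZE = 30
PRICE_CONSUMABLE = 122.00
PRICE_REUSABLE = 92.50


def booklet_cost(num_students: int, price_per_pack: float) -> float:
    """Cost of enough whole packs to cover num_students.

    Assumption: packs cannot be split, so the count is rounded up.
    """
    packs = math.ceil(num_students / PACK_SIZE)
    return packs * price_per_pack


def per_student_cost(price_per_pack: float) -> float:
    """Booklet cost per student when a pack is fully used, rounded to cents."""
    return round(price_per_pack / PACK_SIZE, 2)
```

For example, `booklet_cost(95, PRICE_CONSUMABLE)` prices four packs, because 95 students do not fit in three packs of 30.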
A. General Information Administration Time • Varies by test and level • Typically given over a period of several test sessions or days • Fall, Winter, and Spring testing periods available
B. Brief Description of Purpose and Nature of Test General Purpose of Test • Constructed as a “comprehensive modular assessment series” of student achievement • Promoted as a device to help diverse audiences understand student academic achievement and progress • Reports provide useful and informative data that allow for national comparison of group and individual achievement
B. Brief Description of Purpose and Nature of Test Population for which test is applicable • K-12 • Reading/language arts and mathematics available for K-12 • Science and social studies tests available for grades 1-12
B. Brief Description of Purpose and Nature of Test Description of Content • Multiple choice format • Generates precise norm-referenced achievement scores and a full complement of objective mastery scores • Designed to measure concepts, processes, and skills taught throughout the nation • Content areas measured are Reading/Language Arts, Mathematics, Science, and Social Studies
B. Brief Description of Purpose and Nature of Test Appropriateness of Assessment Method • Selected-response items can provide information on basic knowledge and some patterns of reasoning • Does not provide evidence for performance standards/targets • Other TerraNova formats provide a combination of selected-response and constructed-response
C. Technical Evaluation Norms/Standards • Type – The battery generates precise norm-referenced achievement scores and a full complement of objective mastery scores. Types of scores provided: • Scaled Scores • Grade Equivalents • National Percentiles • National Stanines • Normal Curve Equivalents Reports are provided both for individual students and for groups of students.
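Several of the score types listed above are related through the normal curve. The following is a minimal sketch in Python of the standard textbook conversions from a national percentile rank to a Normal Curve Equivalent and to a stanine. It is not the publisher's conversion table, which is not given in the slides; the function names are hypothetical, and the stanine cut points are the conventional 4-7-12-17-20-17-12-7-4 percent bands.

```python
from bisect import bisect_right
from statistics import NormalDist

# Lowest percentile rank that falls in stanines 2 through 9
# (conventional bands: 4, 7, 12, 17, 20, 17, 12, 7, 4 percent).
STANINE_LOWER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]


def percentile_to_nce(percentile: float) -> float:
    """Normal Curve Equivalent: 50 + 21.06 * z, where z is the normal
    deviate for the percentile rank. Defined for 0 < percentile < 100."""
    z = NormalDist().inv_cdf(percentile / 100)
    return 50 + 21.06 * z


def percentile_to_stanine(percentile: float) -> int:
    """Stanine (1-9) for a percentile rank, using the conventional bands."""
    return bisect_right(STANINE_LOWER_BOUNDS, percentile) + 1
```

The constant 21.06 is chosen so that NCEs of 1, 50, and 99 line up with percentile ranks of 1, 50, and 99; between those points the two scales differ, which is why NCEs can be averaged and percentile ranks should not be.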
C. Technical Evaluation Norms/Standards • Standardization Sample – Size: The norming sample was based on a stratified national sample. • 295 schools • Fall & Spring norming studies involved between 860,000 and 1,720,000
C. Technical Evaluation Norms/Standards 2. Standardization Sample – Representativeness: • Separate sampling designs were used for institutions of different types • Public schools stratified by region, community, type, size, & Orshansky Percentile (an indicator of socioeconomic status)
C. Technical Evaluation Norms/Standards Standardization Sample – procedure followed in obtaining sample: • Spring Standardization – April 1996 • Fall Standardization – October 1996 • Recommended test administration period is a five-week window centered on the norming periods
C. Technical Evaluation Norms/Standards 3. Standardization Sample – Availability of subgroup norms • Questionnaire sent to participating schools • 95% responded in the fall • 100% responded in the spring
C. Technical Evaluation Norms/Standards 3. Standard setting procedures employed – qualifications and selection of judges: • Nominations were made of experienced teachers and curriculum specialists with national reputations • Judges had to possess “deep understanding” of one of the five content areas
C. Technical Evaluation Norms/Standards 3. Standard setting procedures employed – number of judges: • 2 committees for each of 5 content areas • Primary/Elementary and Middle/High School • 4-5 teachers per committee, one curriculum expert (external) and one CTB content expert (approximately 70 people total)
C. Technical Evaluation Reliability • Types – Measure of internal consistency: • Kuder-Richardson Formula 20 (KR20) • Item pattern KR20 (a unique measure that takes into account the additional accuracy associated with IRT item-pattern scoring) • Coefficient alpha On individual student score reports, a student’s score is reported along with a confidence band.
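To make the reliability measures above concrete, the following is a minimal sketch in Python of the classical KR-20 formula for right/wrong items, plus a confidence band built from the classical standard error of measurement. It does not reproduce the publisher's item-pattern KR20 or the IRT-based bands on actual score reports; the function names and any numbers passed in are illustrative assumptions.

```python
import math
from statistics import pvariance


def kr20(responses: list[list[int]]) -> float:
    """Classical KR-20 for a students-by-items matrix of 0/1 scores.

    KR20 = k/(k-1) * (1 - sum(p_i * q_i) / variance of total scores),
    using the population variance of the total scores.
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n  # proportion correct on item i
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def confidence_band(score: float, sd: float, reliability: float,
                    z: float = 1.0) -> tuple[float, float]:
    """Band of +/- z standard errors of measurement around a score.

    Assumption: classical SEM = sd * sqrt(1 - reliability).
    """
    sem = sd * math.sqrt(1 - reliability)
    return (score - z * sem, score + z * sem)
```

With a hypothetical scale score of 650, a standard deviation of 40, and a reliability of .91, `confidence_band(650, 40, 0.91)` gives a band of roughly 638 to 662, which shows why coefficients in the .80s and .90s still leave visible uncertainty around an individual score.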
C. Technical Evaluation Reliability 2. Results: • Reliability coefficients were consistently in the .80s and .90s • Spelling coefficients were consistently lower • Grades 1 and 2 also had slightly lower coefficients
C. Technical Evaluation Validity 1. Types – Content-related: • Numerous studies (e.g. classroom pilots, usability, sensitivity) conducted • Advisory panel of teachers, administrators, and content specialists from all parts of the country • Based on recommendations of the SCANS (Secretary’s Commission on Achieving Necessary Skills) report
C. Technical Evaluation Validity • Types – Content-related: • Developers and scorers worked together while constructed-response items were scored, to check the consistency and accuracy of the scoring guides and process • Reviewed various informational sources for children to determine topics of interest
C. Technical Evaluation Validity • Types – Criterion-related: • Conducted a variety of research studies, such as correlations with SAT, ACT, NAEP, and TIMSS
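Criterion-related evidence of this kind is usually summarized as a correlation between scores on the test and scores on the external measure. The following is a minimal sketch in Python of the Pearson correlation coefficient; the slides do not report the actual coefficients or say which statistic was used, so both the choice of Pearson r and the function name are assumptions.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between paired score lists x and y.

    Each position holds the two scores for one student, for example
    an achievement-test score and a score on an external criterion.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)
```

A coefficient near 1 would mean students rank in nearly the same order on both measures; a coefficient near 0 would mean the test tells little about standing on the criterion.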
C. Technical Evaluation Validity 1. Types – Construct-related: • Careful test development process to support content validity and comprehensiveness of test • Construct validity for skills, concepts and processes measured in each subject
C. Technical Evaluation Validity 2. Results: • Provides achievement scores that are valid for several types of educational decision making • A thorough validity evaluation encompassed content-, criterion-, and construct-related evidence
Bias Used the following procedures to reduce the amount of bias: • Ensured valid test plan • Followed stringent editorial guidelines • Conducted expert reviews • Analyzed student data for differential item functioning • Selected best items
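The slide above does not say which statistic was used to analyze differential item functioning (DIF). As one widely used illustration, the following is a minimal sketch in Python of the Mantel-Haenszel approach, which compares a reference group and a focal group after matching students on total score. The method choice, function names, and input layout are assumptions, not the publisher's documented procedure.

```python
import math

# One stratum per matched total-score level:
# (reference correct, reference incorrect, focal correct, focal incorrect)
Stratum = tuple[int, int, int, int]


def mh_odds_ratio(strata: list[Stratum]) -> float:
    """Mantel-Haenszel common odds ratio for one item across score strata.

    A value of 1.0 means matched students in both groups have the same
    odds of answering correctly (no DIF detected by this statistic).
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        n = ref_right + ref_wrong + focal_right + focal_wrong
        numerator += ref_right * focal_wrong / n
        denominator += ref_wrong * focal_right / n
    return numerator / denominator


def mh_d_dif(strata: list[Stratum]) -> float:
    """MH D-DIF on the delta scale: -2.35 * ln(odds ratio).

    Negative values indicate the item is harder for the focal group.
    """
    return -2.35 * math.log(mh_odds_ratio(strata))
```

Items flagged by a statistic like this would then go to the expert reviews and item selection steps listed on the slide, since a statistical flag alone does not show that an item is unfair.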
D. Summary of MMY Reviews • Reviewed by Judith A. Monsaas, Assoc. Prof. of Education, North Georgia College and State University, Dahlonega, GA • Tests are “very engaging and user friendly”. Materials are well-constructed and attractive • Addition of performance standards is helpful for schools moving toward a standards-based curriculum framework
D. Review, continued • Claims to assist in decision making in many areas, including evaluation of student progress, instructional program planning, curriculum analysis, class grouping, etc. This reviewer believes they can support this claim • Has a particularly useful section for parents on “Using Test Results”
D. Review, continued • “Although these tests are attractive and more engaging than most achievement tests I have inspected, I doubt that students will forget that they are taking a test.” • Section on “Avoiding Misinterpretations” when using grade equivalents is helpful
D. Review, continued • Process used to develop the test and ensure content validity was very thorough and clearly explained • Norming and score reporting methods are well-developed • Reviewer’s only problem is with the mastery classifications for the criterion-referenced interpretations. She feels they are arbitrarily defined.
D. Review, continued • Reviewed by Anthony J. Nitko, Professor, Department of Educational Psychology, University of Arizona, Tucson, AZ • One change in the new edition is that items within each subtest are organized according to contextual themes, countering the criticism that standardized tests assess strictly decontextualized knowledge and skills
D. Review, continued • Developers carefully analyzed curriculum guides from around the country, as well as national and state standards and textbook series • Several usability studies were run. The results of these were used to improve test items, teachers’ directions, and page designs
D. Review, continued • Earlier editions criticized for problems related to speed. This version corrects those. Typically fewer than 4% of students fail to respond to the last item on each subtest • “One of the better batteries of its type.” • Teachers’ materials exceptionally well-done and informative
E. Critique of the Instrument Our research on the TerraNova helps us to draw the following conclusions: • A complete and comprehensive test • Numerous measures and studies were done to ensure technical requirements • TerraNova takes pride in its overall test design, construction, norming, national standardization process, reliability, validity, and the reduction of bias issues
E. Critique of the Instrument • Does a good job supporting its purpose as a measure to aid in student achievement • Provides three main types of information: norm-referenced information, some criterion-referenced information, and standards-based performance information • Serves as a good measure for comparing student achievement with national performance
E. Critique of the Instrument • This is not a test that should be used by itself. It is simply one type of measure and cannot be the only measure used in making critical decisions • When used in conjunction with other test methods and teacher judgment, it is an effective measure for what it purports to do • Caution should be used when using this assessment to track state standards. Although it purports to be accurately correlated with them, there is no substantial proof.
E. Critique of the Instrument Interesting Tidbits: • Del Harnish has done research on bias issues and has published work on the TerraNova • Testnote Clarity is a computer program, available with the disaggregated data, that allows the user to customize the data and apply them to the district curriculum