Constructing Rubrics for Open-ended Activities

Share the Future IV, 18 March 2003

Workshop Materials • Printed copies of materials will be passed out today • Also available at the Foundation Coalition web page: www.foundationcoalition.org

Form Groups • This workshop includes two group activities • Form groups of 4 or 5 • Line up in order of the distance you traveled to attend this conference

Workshop Presenters • Susan Haag, Director of Assessment, College of Engineering & Applied Sciences, Arizona State University • Ann Kenimer, Associate Professor, Department of Agricultural Engineering, Texas A&M University

Susan Haag, ASU • Director of Assessment & Evaluation, CEAS • Specialist in Instructional Technology and Educational Psychology, ASU • Ph.D., Policy Studies/Psychology, ASU • Develops and teaches courses for online delivery, both fully virtual and web-enhanced • Teaches graduate research methods • FC Assessment and Evaluation involvement since 1998

Ann Kenimer, Texas A&M • B.S., M.S., Agricultural Engineering, Virginia Tech • Ph.D., Agricultural Engineering, University of Illinois • Teaches engineering design processes, fundamental problem solving, environmental engineering • FC Assessment and Evaluation involvement since 2000

Workshop Agenda • Rubrics: Common Terms; What is a Rubric?; How Are Rubrics Used?; Examples of Rubrics; Characteristics of a Rubric • Team Activities: Use & Evaluate a Rubric; Develop a Rubric • Common Problems and Solutions • Resources • Wrap Up

Terms Used in Workshop • Qualitative Assessment (open-ended data): content analysis, rubric, checklist • “Objective” Assessment (closed-ended data): forced-choice response • Pre-Determined Criteria • Reliability: inter-rater, intra-rater • Validity: theoretical, face, criterion

What is a Rubric? (Pre-Determined Criteria) • Definition of Rubric [3, 9]: • A systematic scoring methodology that makes qualitative assessment and evaluation more reliable and objective by applying pre-determined criteria • e.g., descriptive criteria are developed to serve as guidelines for scorers to assess, rate, and judge student performance

What is a Rubric? (“Open-ended” Data) • It is a tool used in the qualitative assessment of “open-ended” data, such as… • Written or oral narratives • Diagrams or models • Written or oral enumerations • Behavioral demonstrations of a student’s knowledge, applied skill, or ability to perform

How Are Rubrics Used? (“Open-ended” Data) • Advantages and drawbacks of assessing “open-ended” data [7] • Advantages: • Can yield “rich” information (i.e., individual, creative, complex, fine-tuned) • Drawbacks: • Involves “subjectivity” in interpreting and scoring data (i.e., the judgments of the individuals scoring), as contrasted with “objective” tests • Problems with reliability (both inter-rater and intra-rater, across time)

How Are Rubrics Used? (“Open-ended” Data) • Other methods of qualitative assessment used with open-ended data • Content analysis and coding [10] • Inventory checklists [11] • Rubrics

How Are Rubrics Used? (Diagnostic Feedback) • Descriptions of performance standards may serve to communicate to students what is expected of quality performance [5] • e.g., the ideal, expected performance described in a rubric can be explicitly compared with an individual’s performance to convey which aspects of performance need improvement

Rubric Types • Rubrics may be used “holistically” or “analytically” • “Holistic” Rubric [5]: • The entire response is evaluated and scored as a single performance category • “Analytical” Rubric [5]: • The response is evaluated with multiple descriptive criteria for multiple performance categories
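The difference between the two rubric types can be sketched as a small data structure. This is a minimal illustration, not part of the workshop materials: the class names, the criterion names ("content", "delivery"), and the level descriptions are all hypothetical. A holistic rubric yields one score for the whole response; an analytical rubric yields one score per criterion.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HolisticRubric:
    """Hypothetical model: one set of level descriptions for the whole response."""
    levels: dict[int, str]  # score -> description of a response at that level

    def score(self, level: int) -> int:
        # The scorer picks the single level whose description best fits the response.
        if level not in self.levels:
            raise ValueError(f"unknown performance level: {level}")
        return level


@dataclass
class AnalyticalRubric:
    """Hypothetical model: separate level descriptions for each criterion."""
    criteria: dict[str, dict[int, str]]  # criterion -> (score -> description)

    def score(self, ratings: dict[str, int]) -> dict[str, int]:
        # The scorer rates every criterion separately; the result is a profile.
        if set(ratings) != set(self.criteria):
            raise ValueError("ratings must cover every criterion exactly once")
        for name, level in ratings.items():
            if level not in self.criteria[name]:
                raise ValueError(f"unknown level {level} for criterion {name!r}")
        return dict(ratings)


# Illustrative level descriptions only (assumed, not taken from any real rubric).
holistic = HolisticRubric(levels={6: "exemplary response", 2: "inadequate response"})
analytical = AnalyticalRubric(
    criteria={
        "content": {1: "weak", 2: "adequate", 3: "strong"},
        "delivery": {1: "weak", 2: "adequate", 3: "strong"},
    }
)

overall = holistic.score(6)
profile = analytical.score({"content": 3, "delivery": 2})
```

The per-criterion profile is what makes an analytical rubric useful for diagnostic feedback: it shows which aspects of a performance need improvement rather than only an overall level.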

Rubric Types: Example • “Holistic” Rubric for Open-Ended Math Problems [11] • Criteria for Demonstrated Competence (6 points) • Description of Exemplary Response: • Gives a complete response with a clear, coherent, unambiguous, and elegant explanation; includes a clear and simplified diagram; communicates effectively to the identified audience; shows understanding of the problem’s mathematical ideas and processes; identifies all the important elements of the problem; may include examples and counter-examples; presents strong supporting arguments.

Rubric Types: Example • “Holistic” Rubric for Open-Ended Math Problems (cont.) • Criteria for Inadequate Response (2 points) • Description of a Response that Begins, but Fails to Complete, the Problem: • Explanation is not understandable; diagram may be unclear; shows no understanding of the problem situation; may make major computational errors.

Rubric Types: Example • Analytical Rubric • Scoring rubrics for program objectives • Life-long learning • Impact in a global/societal context • Diana Briedis, Michigan State University

Characteristics of a Rubric (Reliability) • A good rubric must possess “reliability” • Definition of Reliability [4]: • The extent to which the measuring instrument yields responses that are consistent and stable across time (intra-rater) and between different scorers (inter-rater)
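One way to put a number on this definition is to compare two sets of scores for the same responses. The sketch below is an illustration under stated assumptions: the workshop does not prescribe a statistic, and Cohen's kappa is swapped in here as one commonly used chance-corrected agreement measure. The function names and the sample scores are hypothetical. The same functions apply to inter-rater reliability (two scorers) and intra-rater reliability (one scorer at two points in time).

```python
from collections import Counter


def percent_agreement(scores_a, scores_b):
    """Fraction of responses that received the same score in both lists."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


def cohens_kappa(scores_a, scores_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(scores_a)
    observed = percent_agreement(scores_a, scores_b)
    counts_a, counts_b = Counter(scores_a), Counter(scores_b)
    # Chance agreement: probability both lists assign the same category at random.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both lists use one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores for six responses on a 2/4/6-point holistic rubric.
scorer_1 = [6, 4, 4, 2, 6, 4]
scorer_2 = [6, 4, 2, 2, 6, 4]

agreement = percent_agreement(scorer_1, scorer_2)
kappa = cohens_kappa(scorer_1, scorer_2)
```

Raw agreement can look high simply because most responses fall in one category, which is why a chance-corrected figure is often reported alongside it.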

Characteristics of a Rubric (Validity) • A good rubric must possess “validity” • Definition of Validity [1]: • The extent to which what is being measured by an instrument is actually what is intended. Are the test and rubric actually measuring the desired performance outcomes? (Construct, Criterion, and Face Validity)

Team Activity I: Evaluate a Rubric

The Sample Rubric • Developed for use in a freshman-level introduction to design class • Used to evaluate oral presentations made by freshman design teams • Used by a panel of 3 to 4 faculty: • the course instructor • faculty invited for one day to serve on the review panel • Panel membership changed over the 5 days of presentations

Your Task • With your group discuss: • The merits of the sample rubric and how it was used • Potential problems with the sample rubric and how it was used • What you might do to improve the rubric and its use • We’ll share ideas in about 15 minutes

Your Ideas • What are your thoughts on: • The merits of the sample rubric and how it was used • Potential problems with the sample rubric and how it was used • What you might do to improve the rubric and its use

Sample Rubric: Results, Student A

Sample Rubric: Results, Student B

Sample Rubric—Changes • Better definitions of what presentation qualities deserve what score • Better training of reviewers • More consistency of panel members from one day to the next

Constructing a Rubric Note: there are two components involved in this assessment and evaluation methodology: • The test instrument given to the students • The scoring rubric used by the evaluators

Constructing a Rubric [3, 6, 9] 1. Develop appropriate performance goals and objectives 2. Select the assessment tasks that reflect and demonstrate the performance goals 3. Differentiate between performance levels and assign relative values to each of the levels [establish “expert” level; establish target students’ developmental level]

Constructing a Rubric (cont.) 4. Develop descriptive criteria for each level of performance that correspond with local norms [holistic or analytical] 5. Train scorers in application of the rubric 6. Pilot both the test and the scoring rubric [for inter-rater and intra-rater consistency, apply cross-checking methods] 7. Modify test items and scoring rubric based upon scoring results and content analysis of responses

Develop Appropriate Performance Objectives and Tasks: Example [5]

Team Activity II • Develop a rubric for: • Laboratory report • Engineering design project • We’ll discuss your rubrics in about 20 minutes

Team Activity II • Discussion • What does your rubric contain? • How might you apply this activity to your courses?

Common Problems (Transferability & Repeatability) • Transferability and Repeatability of Test Questions and Rubric Criteria • Across similar or different courses • Over time, or across locales • Across populations • Across scorers • Validity • Transferability of assessment question interpretation • Transferability of specifications for expected performance • Changes in curriculum or instruction • Changes in performance standards • Changes in students’ prior knowledge

Common Problems (Transferability & Repeatability, cont.) • Transferability and Repeatability of Test Questions and Rubric Criteria • Across similar or different courses • Over time, or across locales • Across populations • Across scorers • Reliability (interacts with validity) • Inter-rater • Intra-rater (tends to be more validity sensitive) • Different scorers • Changes in scorers’ knowledge

Solutions to Common Problems (Transferability & Repeatability, cont.) Validity • Address: • “Theoretical” validity [2]: review literature and other resources for precedents • “Criterion” validity [2]: ask a sample of experts, novices (if appropriate), and the target population to respond • “Face” validity [12]: ask a relevant sample of “local” users to respond and critique • Content: analyze responses and compare the target population to “local” users, to experts, to novices (if appropriate), and to the rubric criteria

Solutions to Common Problems (Transferability & Repeatability, cont.) Validity (cont.) • Modify test questions, if necessary, where content analysis shows discrepancies between the rubric and the responses of the target population and/or local users • Modify rubric criteria or scoring standards to align with expert content and performance levels, or with local user content and performance levels if these differ from expert results

Solutions to Common Problems (Transferability & Repeatability, cont.) Reliability: Train and manage scorers for intra-rater consistency • By having them take the test, then score their own and another scorer’s test, then justify their scoring to a third party • By having them review and re-score the 1st test they scored after they have completed scoring their 5th test • By having them review and re-score the first 5 tests they scored after they have completed scoring 10 tests, and continuing this pattern

Solutions to Common Problems (Transferability & Repeatability, cont.) Reliability: Train and manage scorers for inter-rater consistency • By duplicating a sample of all tests and having all scorers evaluate and score each test in the sample • By having all scorers review each other’s scoring of this common set of tests, discuss discrepancies, arrive at consensus on the interpretation and application of rubric criteria, and jointly re-score discrepant tests • By having all scorers periodically and repeatedly review each other’s scored tests, individually re-score them, then discuss and jointly re-score two tests
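Finding the discrepant tests in the common set can be automated before the scorers meet. The following is a minimal sketch, assuming numeric scores and a hypothetical `tolerance` parameter (how far apart scores may be before a test counts as discrepant); the function name, test identifiers, and scorer labels are invented for illustration.

```python
def discrepant_tests(common_set, tolerance=0):
    """Return ids of tests whose scores differ across scorers by more than tolerance.

    common_set maps a test id to a dict of {scorer: score} for that test.
    """
    flagged = []
    for test_id, by_scorer in common_set.items():
        values = list(by_scorer.values())
        # Spread between the highest and lowest score given to the same test.
        if max(values) - min(values) > tolerance:
            flagged.append(test_id)
    return flagged


# Hypothetical common set: three tests, each scored by scorers A, B, and C.
common_set = {
    "T1": {"A": 4, "B": 4, "C": 4},
    "T2": {"A": 6, "B": 4, "C": 5},
    "T3": {"A": 2, "B": 3, "C": 2},
}

needs_discussion = discrepant_tests(common_set)             # any disagreement
needs_rescoring = discrepant_tests(common_set, tolerance=1)  # gaps larger than 1
```

The flagged list is only a starting point for the discussion the slide describes; the consensus on how to interpret and apply the criteria still has to come from the scorers themselves.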

Solutions to Common Problems (Transferability & Repeatability, cont.) Reliability: Controls • Halfway through the scoring job, have an outsider sample each scorer’s scored tests, and have each scorer justify their scoring of the same items across several tests • Report to scorers, for their correction, both the intra-rater and the inter-rater inconsistencies noted • Repeat the process near the end of the scoring job • Also calculate and examine inter-rater and intra-rater consistency rates by test subject and by test item, as well as inter-item correlations [8]
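The last control can be sketched in a few lines. This is an illustration only: it assumes a "consistency rate" means the fraction of exact score matches, and it uses the Pearson coefficient for inter-item correlation, neither of which is specified in the workshop materials. The function names, item names ("q1", "q2"), and scores are hypothetical.

```python
from math import sqrt


def agreement_by_item(pass_1, pass_2):
    """Exact-match rate per test item between two scoring passes.

    Each pass is a list of {item: score} dicts, one per test, in the same order.
    Two scorers give an inter-rater rate; one scorer twice gives an intra-rater rate.
    """
    if len(pass_1) != len(pass_2) or not pass_1:
        raise ValueError("passes must be non-empty and cover the same tests")
    return {
        item: sum(a[item] == b[item] for a, b in zip(pass_1, pass_2)) / len(pass_1)
        for item in pass_1[0]
    }


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / (spread_x * spread_y)


# Hypothetical data: four tests, two items each, scored in two passes.
first_pass = [{"q1": 3, "q2": 2}, {"q1": 2, "q2": 2}, {"q1": 1, "q2": 1}, {"q1": 3, "q2": 3}]
second_pass = [{"q1": 3, "q2": 2}, {"q1": 2, "q2": 1}, {"q1": 1, "q2": 1}, {"q1": 3, "q2": 3}]

rates = agreement_by_item(first_pass, second_pass)
inter_item_r = pearson([t["q1"] for t in first_pass], [t["q2"] for t in first_pass])
```

An item with a noticeably lower rate than the others is a candidate for clearer rubric criteria or further scorer training on that item.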

Resources: Citation References 1. Bergeson, Terry. Office of Superintendent of Public Instruction web page. “Scoring the WASL Open-Ended Items.” 1998. 1 May 2002. <http://www.k12.wa.us/assessment/assessproginfo/subdocuments/TechReports/g4part4.pdf> 2. Cronbach, Lee J., and Paul E. Meehl. “Construct Validity in Psychological Tests.” Psychological Bulletin (1955). 11 June 2002. <http://psychclassics.yorku.ca/Cronbach/fl> 3. Ebert-May, Diane. “Classroom Assessment Techniques: Scoring Rubrics.” Field-tested Learning Assessment Guide (FLAG) web site, 1999. 11 June 2002. <http://www.flaguide.org/cat/rubrics/rubrics1.htm> 4. Graduate School of Education & Information Studies. CRESST. UCLA. <http://www/Rubrics/CRESSTUCLAassementglossary.html>

Resources: Citation References (cont.) 5. Davis, D.C., K.L. Gentili, D.E. Calkins, and M.S. Trevisan. “Transferable Integrated Design Engineering Education (TIDEE) Project.” October 1998. 29 May 2002. <http://www.cea.wsu.edu/TIDEE/monograph.html> 6. Moskal, Barbara M. “Scoring rubrics: what, when and how?” Practical Assessment, Research & Evaluation (2000). 1 May 2002. <http://ericae.net/pare/getvn.asp?v=7&n=3> 7. Rowntree, Derek. Home Page. “Designing an assessment.” June 2000. 11 June 2002. <http://iet.open.ac.uk/pp/D.G.F.Rowntree/derek.html> 8. Rudner, Lawrence M. “Reducing Errors due to the Use of Judges.” ED355254 ERIC/TM Digest (1992). 11 June 2002. <http://ericae.net/db/edo/ED355254.htm>

Resources: Citation References (cont.) 9. Seattle School District. “What is a rubric” (2000). 1 May 2002. <http://ttt.ssd.k12.wa.us/dwighth/rubricclass.htm> 10. Stemler, Steve. “An overview of content analysis.” Practical Assessment, Research & Evaluation (2001). 11 June 2002. <http://ericae.net/pare/getvn.asp> 11. Summer Technology Institute at Western Washington University. “Rubric for Open-Ended Math Problems.” California CAP Math Report (1989). 11 June 2002. <http://ttt.ssd.k12.wa.us/dwighth/rubricclass.htm> 12. Trochim, William M.K. “Measurement Validity Types.” William M.K. Trochim Cornell University Home Page (2002). 11 June 2002. <http://trochim.human.cornell.edu/kb>

Wrap Up • Please complete the workshop evaluation forms • Thank you!
