FORMULATING ASSESSMENT: DEVELOPING SCORING / MARKING GUIDE
25 JULY 2018 / 15 AUGUST 2018
ROHAYA TALIB, FACULTY OF EDUCATION, UNIVERSITI TEKNOLOGI MALAYSIA, JOHOR BAHRU
BBA (ACCOUNTING), M.ED (MEASUREMENT & EVALUATION), PhD (MEASUREMENT & EVALUATION)
FORMULATING ASSESSMENT: DEVELOPING SCORING GUIDE
• DEVELOPING A SCORING / MARKING GUIDE: ANSWER SCHEME, RUBRIC, CHECKLIST, RATING SCALE
• ANALYTIC VS HOLISTIC SCORING
• INTERPRETATION OF SCORES (NORM VS CRITERION REFERENCED)
• WEIGHTING THE COMPONENTS OF A MARK
• VALIDITY & RELIABILITY OF SCORES: SOURCES OF ERROR (TEST TAKERS, WITHIN THE TEST, TEST ADMINISTRATION, SCORING)
THE FUNCTIONS OF ASSESSMENT
Assessments performed at crucial times in the learning process can spell the difference between merely gathering data to evaluate students and using assessment to enhance learning. Based on timing and purpose, assessment data serve four functions:
• FORMATIVE ASSESSMENT provides diagnostic feedback to students and instructors at short-term intervals (e.g., during a class or on a weekly basis).
• SUMMATIVE ASSESSMENT provides a description of students' level of attainment upon completion of an activity, module, or course.
• EVALUATIVE ASSESSMENT provides instructors with curricular feedback (e.g., the value of a field trip or of an oral presentation technique).
• EDUCATIVE ASSESSMENT is integrated within the learning activities themselves and builds student (and faculty) insight into their own learning and teaching.
[Concept map: the assessment process runs from TESTING through MEASUREMENT and SCORING to EVALUATION. ACHIEVEMENT (cognitive) and PERFORMANCE (cognitive, psychomotor) assessments yield SCORES / MARKS via scoring tools (CHECKLIST, RATING SCALE, RUBRIC) and a scoring process that is either HOLISTIC or ANALYTIC.]
Assessment doesn't have to be a written exam. You can determine if you have successfully learned something in a number of different ways, depending on what you are trying to learn. Recognizing that there are many different ways to assess learning and becoming skillful at self-assessment are important lifelong learning skills.
Assessment Tools: CHECKLISTS, RATING SCALES AND RUBRICS Checklists, rating scales and rubrics are tools that state specific criteria and allow teachers and students to gather information and to make judgements about what students know and can do in relation to the outcomes. They offer systematic ways of collecting data about specific behaviours, knowledge and skills.
The quality of information acquired through checklists, rating scales and rubrics is highly dependent on the quality of the descriptors chosen for assessment. Their benefit also depends on students' direct involvement in the assessment and their understanding of the feedback provided.
The purpose of checklists, rating scales and rubrics is to:
• provide tools for systematic recording of observations
• provide tools for self-assessment
• provide samples of criteria for students prior to collecting and evaluating data on their work
• record the development of specific skills, strategies, attitudes and behaviours necessary for demonstrating learning
• clarify students' instructional needs by presenting a record of current accomplishments.
Tips for Developing Checklists, Rating Scales and Rubrics
• Use checklists, rating scales and rubrics in relation to outcomes and standards.
• Use simple formats that can be understood by students and that will communicate information about student learning to stakeholders.
• Ensure that the characteristics and descriptors listed are clear, specific and observable.
• Encourage students to assist with constructing appropriate criteria. For example, what are the descriptors that demonstrate levels of performance in problem solving?
• Ensure that checklists, rating scales and rubrics are dated to track progress over time.
• Leave space to record anecdotal notes or comments.
• Use generic templates that become familiar to students and to which various descriptors can be added quickly, depending on the outcome(s) being assessed.
• Provide guidance to students to use and create their own checklists, rating scales and rubrics for self-assessment purposes and as guidelines for goal setting.
CHECKLISTS usually offer a yes/no format in relation to student demonstration of specific criteria. They may be used to record observations of an individual, a group or a whole class.
RATING SCALES allow teachers to indicate the degree or frequency of the behaviours, skills and strategies displayed by the learner. Rating scales state the criteria and provide three or four response selections to describe the quality or frequency of student work.
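To make the contrast concrete, here is a minimal Python sketch; all criteria, labels and tallies are hypothetical illustrations, not part of the original material. It models a checklist as yes/no records and a rating scale as a level selection per criterion:

```python
# Checklist: yes/no per observed criterion (hypothetical criteria).
checklist = {
    "states the problem": True,
    "shows working": False,
    "checks the answer": True,
}

# Rating scale: one of four frequency levels per criterion.
RATING_LEVELS = ["never", "sometimes", "usually", "always"]

rating_scale = {
    "participates in discussion": 2,   # "usually"
    "listens to group members": 3,     # "always"
}

def checklist_summary(items):
    """Tally how many yes/no criteria were observed."""
    met = sum(items.values())
    return f"{met}/{len(items)} criteria observed"

def rating_report(ratings):
    """Translate level indices back into the scale's labels."""
    return [f"{criterion}: {RATING_LEVELS[level]}"
            for criterion, level in ratings.items()]

print(checklist_summary(checklist))            # -> 2/3 criteria observed
print("\n".join(rating_report(rating_scale)))
```

The only structural difference is the response set: two values for the checklist versus three or four ordered levels for the rating scale.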
WHAT IS A RUBRIC? Rubrics (or "scoring tools") are a way of describing evaluation criteria (or "grading standards") based on the expected outcomes and performances of students. Typically, rubrics are used in scoring or grading written assignments or oral presentations; however, they may be used to score any form of student performance. Each rubric consists of a set of scoring criteria and point values associated with those criteria. In most rubrics the criteria are grouped into categories so the instructor and the student can discriminate among the categories by level of performance. In classroom use, the rubric provides an "objective" external standard against which student performance may be compared.
• Rubrics use a set of criteria to evaluate a student's performance.
• They consist of a fixed measurement scale and detailed descriptions of the characteristics for each level of performance. These descriptions focus on the quality of the product or performance, NOT the quantity.
• Rubrics are commonly used to evaluate student performance with the intention of including the result in a grade for reporting purposes. Rubrics can increase the consistency and reliability of scoring.
• Rubrics may be used to assess individuals or groups and, as with rating scales, results may be compared over time.
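As an illustration of this structure, the sketch below (criteria and descriptor wording are invented, not taken from any published rubric) models a four-level analytic rubric and combines per-criterion levels into a score; holistic scoring, by contrast, assigns one overall level:

```python
# A four-level analytic rubric: a descriptor for each level of each
# criterion (hypothetical wording, echoing the Level 1-4 scheme later
# in this deck).
RUBRIC = {
    "organisation": {
        4: "all aspects exceed expectations",
        3: "some aspects exceed expectations",
        2: "meets minimal expectations, some errors",
        1: "does not yet meet expectations",
    },
    "evidence": {
        4: "claims fully supported with relevant evidence",
        3: "most claims supported",
        2: "some claims supported, gaps remain",
        1: "claims largely unsupported",
    },
}

def analytic_score(levels_awarded):
    """Analytic scoring: sum the level awarded on each criterion."""
    return sum(levels_awarded.values())

# One marker's judgement of one piece of work:
awarded = {"organisation": 3, "evidence": 4}
max_score = 4 * len(RUBRIC)
print(f"analytic score: {analytic_score(awarded)}/{max_score}")  # -> 7/8

# Holistic scoring would instead assign ONE overall level (1-4) to the
# whole performance rather than summing per-criterion levels.
```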
"Learning increases when learners have a sense of what they are setting out to learn, a statement of explicit standards they must meet and a way of seeing what they have learned." Loaker, Cromwell and O'Brien (1986) pg.47 Rubrics are a way to make explicitexpectations of what students will need to know and be able to do in order to receive a given grade. Rubrics help instructors to develop clear and attainable learning objectives for their students and if provided to students prior to the activity, serve to guide their efforts.
Scoring rubrics are descriptive scoring schemes that are developed by teachers or other evaluators to guide the analysis of the products or processes of students' efforts (Brookhart, 1999). Scoring rubrics are typically employed when a judgement of quality is required and may be used to evaluate a broad range of subjects and activities. Scoring rubrics have also been used to evaluate group activities, extended projects and oral presentations (e.g., Chicago Public Schools, 1999; Danielson, 1997a, 1997b; Schrock, 2000; Moskal, 2000). They are equally appropriate to the English, Mathematics and Science classrooms (e.g., Chicago Public Schools, 1999; State of Colorado, 1999; Danielson, 1997a, 1997b; Danielson & Marquez, 1998; Schrock, 2000). Both pre-college and college instructors use scoring rubrics for classroom evaluation purposes (e.g., State of Colorado, 1999; Schrock, 2000; Moskal, 2000; Knecht, Moskal & Pavelich, 2000). Where and when a scoring rubric is used does not depend on the grade level or subject, but rather on the purpose of the assessment.
Scoring rubrics usually contain the following elements:
• Clear statements of the level of knowledge you expect the student to achieve in order to receive a given grade.
• The dimensions of the quality of work you expect the student to achieve.
• Commentaries describing the expectations of knowledge and quality that distinguish each grade band.
(Huba and Freed, 2000. Learner-Centered Assessment on College Campuses.)
"Rubrics provide a readily accessible way of communicating and developing our goals with students and the criteria we use to discern how well students have reached them." — Diane Ebert-May, Michigan State University
ADVANTAGES OF RUBRICS
• Clarity in assessing students' performance; rubrics clarify vague, fuzzy statements such as "demonstrate effective writing skills".
• Help students understand expectations.
• Help students self-improve (metacognition).
• Make scoring easier and faster.
• Make scoring accurate (valid), unbiased and consistent (reliable).
• Reduce arguments with students.
• Help improve the teaching and learning process.
DEVELOPING RUBRICS AND SCORING CRITERIA The inclusion of rubrics in a teaching resource provides opportunities to describe stages in the development and growth of knowledge, understandings and skills. Rubrics allow students to see the progression of mastery in the development of understandings and skills.
DEVELOPING RUBRICS AND SCORING CRITERIA (CONT…) A good start is to define what quality work looks like based on the learning outcomes. Exemplars of achievement need to be used to demonstrate to students what an excellent or acceptable performance is. This provides a collection of quality work for students to use as reference points. Once the standard is established, it is easy to define what exemplary levels and less-than-satisfactory levels of performance look like. The best rubrics have three to five descriptive levels to allow for discrimination in the evaluation of the product or task.
WHEN DEVELOPING A RUBRIC, CONSIDER THE FOLLOWING:
• What are the specific outcomes in the task?
• Do the students have some experience with this or a similar task?
• What does an excellent performance look like? What are the qualities that distinguish an excellent response from other levels?
• Is each description qualitatively different from the others? Are there an equal number of descriptors at each level of quality? Are the differences clear and understandable to students and others?
WHEN DEVELOPING THE SCORING CRITERIA AND QUALITY LEVELS OF A RUBRIC, CONSIDER THE FOLLOWING GUIDELINES.
• LEVEL 4 is the Standard of excellence level. Descriptions should indicate that all aspects of work exceed grade level expectations and show exemplary performance or understanding. This is a "Wow!"
• LEVEL 3 is the Approaching standard of excellence level. Descriptions should indicate some aspects of work that exceed grade level expectations and demonstrate solid performance or understanding. This is a "Yes!"
• LEVEL 2 is the Meets acceptable standard level. This level should indicate the minimal competencies acceptable to meet grade level expectations. Performance and understanding are emerging or developing, but there are some errors and mastery is not thorough. This is an "On the right track, but …".
• LEVEL 1 is the Does not yet meet acceptable standard level. This level indicates what is not adequate for grade level expectations and that the student has serious errors, omissions or misconceptions. The teacher needs to make decisions about appropriate intervention to help the student improve. This is a "No, but …".
Source: http://www.learnalberta.ca/content/mewa/html/assessment/checklists.html
INTERPRETATION OF SCORES: NORM OR CRITERION REFERENCED?
NORM-REFERENCED ASSESSMENT (NRA) is a type of test that assesses the test taker's ability and performance against other test takers; it can also compare one group of test takers against another. This is done to differentiate high and low achievers. The test's content covers a broad range of topics that the test takers are expected to know, and the difficulty of the content varies. The test must also be administered in a standardized format. A norm-referenced test helps determine the position of the test taker in a predefined population.
INTERPRETATION OF SCORES: NORM OR CRITERION REFERENCED? (CONT…)
CRITERION-REFERENCED ASSESSMENT (CRA) is a type of test that assesses the test taker's mastery of a set curriculum. The curriculum is set at the beginning of the class and is then taught by the instructor. At the end of the lesson, the test is used to determine how much the test taker understood. This type of test is commonly used to measure a test taker's level of understanding before and after instruction is given.
INTERPRETATION OF SCORES: NORM OR CRITERION REFERENCED? (CONT…)
It can also be used to gauge how effectively the instructor teaches the material. The teacher or instructor sets the test according to the curriculum that was presented. Examples of criterion-referenced tests include the tests that teachers give in school and college classes; these help the teacher determine whether a student should pass the class.
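The two interpretations can be applied to the same raw score. A minimal sketch (the class scores and the cut score of 65 are invented for illustration):

```python
scores = [45, 52, 58, 60, 63, 67, 70, 74, 81, 88]   # invented class results
student = 67
CUT_SCORE = 65                                       # assumed mastery standard

# Norm-referenced interpretation: position within the group.
below = sum(s < student for s in scores)
percentile = 100 * below / len(scores)
print(f"norm-referenced: scores above {percentile:.0f}% of the group")

# Criterion-referenced interpretation: comparison against the fixed
# criterion only; the rest of the group is irrelevant.
verdict = "mastered" if student >= CUT_SCORE else "not yet mastered"
print(f"criterion-referenced: {verdict}")
```

Note that the norm-referenced verdict would change if the group changed, while the criterion-referenced verdict depends only on the preset standard.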
WEIGHTING THE COMPONENTS OF A SCORE
The purpose of scoring is to provide FEEDBACK ABOUT STUDENTS' ACHIEVEMENT. However, SUCH FEEDBACK IS BENEFICIAL ONLY IF IT IS ACCURATE! To attain this goal, each component of a final score (tests/quizzes, project, article review, lab report, etc.) should affect the final score ONLY to the appropriate extent. Weighting the components is therefore an important step in arriving at an accurate, fair and just final score; otherwise, the feedback provided about student achievement through the final score will be DISTORTED.
[Diagram: GOOD MEASUREMENT PRACTICE — a valid instrument and scoring scheme ensure that the SCORE reflects the construct (achievement, ability, mastery, competency), not nonconstruct factors (attendance, conduct, appearance).]
Example weighting of score components (a minimal computation sketch follows):
• Final Exam — 40%
• Project — 20%
• Team Working — 10%
• Conference Paper — 20%
• Ethics — 10%
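Using the weights above, here is a minimal sketch of the weighted-sum computation; the student's component marks are invented, and all marks are assumed to be on a 0–100 scale:

```python
WEIGHTS = {                # the weights from the example above; sum to 1.0
    "final_exam": 0.40,
    "project": 0.20,
    "team_working": 0.10,
    "conference_paper": 0.20,
    "ethics": 0.10,
}

marks = {                  # one student's component marks (invented, 0-100)
    "final_exam": 72,
    "project": 85,
    "team_working": 90,
    "conference_paper": 68,
    "ethics": 95,
}

# Guard against a distorted final score from mis-specified weights.
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9, "weights must total 100%"

final = sum(WEIGHTS[c] * marks[c] for c in WEIGHTS)
print(f"weighted final score: {final:.1f}")
# 0.4*72 + 0.2*85 + 0.1*90 + 0.2*68 + 0.1*95 = 77.9
```

The assertion makes the weighting policy explicit: if the weights drift from 100%, the feedback the final score gives about achievement is distorted, which is exactly the failure mode described above.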
[Diagram: for a PG course, the score components (Final Exam 40%, Project 20%, Team Working 10%, Conference Paper 20%, Ethics 10%) are mapped to programme outcomes — TECHNICAL (80%): PO1 (AKW), PO2 (RS), PO3 (CTPS); GENERIC SKILLS (20%): PO4 (EM), PO5 (CS), PO6 (LLL), PO7 (SS), PO8 (TS), PO9 (LS), PO10 (IM), PO11 (ES).]
VALIDITY & RELIABILITY OF SCORE : ERROR • TEST TAKERS • WITHIN THE TEST • TEST ADMINISTRATION • ERROR IN SCORING
[Diagram: SYSTEMATIC error (BIAS) lowers VALIDITY; RANDOM error (CHANCE) lowers RELIABILITY.]
SOURCES OF ERROR
1. ERROR WITHIN TEST TAKERS
• Also called intra-individual error.
• Factors that change unpredictably over time and, as a result, impair the consistency and accuracy of measurement.
• e.g., fatigue, illness, seeing another student's answer.
2. ERROR WITHIN THE TEST
• Also called intra-test / within-test error, or technical error.
• A poorly designed test or poorly written items (clues, leading questions).
• e.g., trick questions, reading level too high, ambiguous questions, items too difficult.
SOURCES OF ERROR (CONT…)
3. ERROR IN TEST ADMINISTRATION
• Caused by physical, verbal and attitudinal variables.
• e.g., misreading the allotted time.
• Physical comfort: room temperature, humidity, lighting, noise, seating arrangement.
• Instructions and explanations: different test administrators provide differing amounts of information — some give clues, others remain fairly distant.
4. ERROR IN SCORING
• Computer scoring vs manual hand scoring (clerical errors).
• Essay questions: different interpretations, bias.
All four sources of error affect the reliability coefficient, i.e. the consistency of measurement.
SCORE RELIABILITY: STANDARD ERROR OF MEASUREMENT (SEM)
The SEM is a function of both the standard deviation of observed scores (SD) and the reliability of the test (r):
SEM = SD × √(1 − r)
When the test is perfectly reliable (r = 1), the SEM equals 0.
[Diagram: reducing SYSTEMATIC error (BIAS) increases VALIDITY; reducing RANDOM error (CHANCE) increases RELIABILITY.]
STANDARD ERROR OF MEASUREMENT: the error variance indicates the amount of variability in a test administered to a group that is caused by measurement error. The SEM is used to determine the effect of measurement error on individual results.
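A minimal numeric sketch of the SEM formula given above (the SD, reliability coefficient and observed score are invented values): it computes the SEM and the ±1 SEM band commonly placed around an individual's observed score.

```python
import math

SD = 12.0      # standard deviation of observed scores (invented)
r = 0.91       # reliability coefficient of the test (invented)

# SEM = SD * sqrt(1 - r), as stated above.
sem = SD * math.sqrt(1 - r)
print(f"SEM = {sem:.2f}")                   # 12 * sqrt(0.09) = 3.60

# Effect of measurement error on an individual result: a ±1 SEM band
# around one student's observed score of 75.
observed = 75
print(f"true score likely within [{observed - sem:.1f}, {observed + sem:.1f}]")

# A perfectly reliable test (r = 1) has SEM = 0, as stated above.
assert SD * math.sqrt(1 - 1.0) == 0.0
```

The band narrows as reliability rises, which is why reducing random error (chance) directly sharpens the interpretation of individual scores.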
THE ISSUE OF TEST FAIRNESS IS CRITICAL FOR ANY ASSESSMENT PROGRAM
In testing and assessment, professionals have often equated FAIRNESS with an ABSENCE OF BIAS.
CONCEPTIONS OF FAIRNESS
• ABSENCE OF BIAS: a test is seen as fair if predictions of performance are comparable for different racial / ethnic groups.
• PROCEDURAL FAIRNESS: focuses on equity in the treatment of persons from different groups during the assessment process.
• OPPORTUNITY TO LEARN: students should be provided with an adequate and equal opportunity to learn the material that is assessed.
• EQUALITY OF RESULT: the test / assignment is considered fair only if the average performance of different groups (e.g., male/female; Chinese/Indian/Malay) is almost the same.
All of these conceptions bear on the VALIDITY of the assessment.
MAIN REFERENCES:
• Kubiszyn, T. and Borich, G. (2007). Educational Testing and Measurement: Classroom Application and Practice. Hoboken, NJ: John Wiley & Sons.
• Linn, R. and Miller, M. D. (2005). Measurement and Assessment in Teaching (9th ed.). Upper Saddle River, NJ: Pearson Merrill Prentice Hall.