Computer Assisted Assessment

Computer Assisted Assessment Hugh Davis Learning Societies Lab ECS The University of Southampton, UK www.ecs.soton.ac.uk/~hcd

The Research Questions • What are the advantages and disadvantages of assessment by computer? • Can higher order skills be assessed by computers? • Can essays be marked by computers? • Which objective question types suit which types of learning outcome?

What’s the purpose of Assessment? • Summative Assessment is concerned with getting grades/marks • Criterion referenced: To know whether a person has achieved learning outcomes • Might be in affective, cognitive or psychomotor domains • Norm referenced: To know how good someone is at something (compared to some norm) • Statements like “75% of all candidates should pass X” (which implies 25% will fail regardless of whether they achieved learning outcomes) imply a normative approach • Formative Assessment is concerned with giving feedback • E.g. comments on how to improve an essay • Tips on how to solve some problem • Model answer • Group discussion • Diagnostic Assessment is concerned with identifying areas of weakness and gaps • Might be used to identify suitable remedial work and/or as a bar to progression
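The difference between the two summative approaches can be sketched in a few lines. This is only an illustration: the cohort, the pass mark of 60 and the function names are hypothetical, and ties between candidates are ignored.

```python
def criterion_referenced(marks, pass_mark):
    """Pass every candidate whose mark meets a fixed threshold."""
    return {name: mark >= pass_mark for name, mark in marks.items()}


def norm_referenced(marks, pass_fraction):
    """Pass a fixed fraction of the cohort, ranked by mark (ties ignored)."""
    ranked = sorted(marks, key=marks.get, reverse=True)
    n_pass = round(len(ranked) * pass_fraction)
    passed = set(ranked[:n_pass])
    return {name: name in passed for name in marks}


# Hypothetical cohort in which everyone clears the pass mark of 60.
marks = {"Ann": 82, "Bob": 74, "Cat": 71, "Dan": 65}

print(criterion_referenced(marks, pass_mark=60))   # all four pass
print(norm_referenced(marks, pass_fraction=0.75))  # Dan fails despite scoring 65
```

Under the normative rule Dan fails although he met the threshold, which is the situation the “75% should pass” bullet describes.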

Principles of Assessment • Formative Assessment (feedback/self assessment) should be • Open: allow students to control their own learning • Incremental: continuous – not just at end of course • Demanding: should stretch the student. Neither too trivial nor too hard • Timely: feedback needs to be while student still remembers the problem • Summative Assessment (marked exam/CW) must be: • Valid: assess what is meant to be assessed • Reliable: all marking done to the same criteria and standards. • Transparent: alignment with Learning Objectives. No surprises. • Fair: all have equal opportunities to succeed. Gives marks to what the student does know. • Equitable: should not discriminate against any student. • Redeemable: Opportunity to put mistakes right • Efficient: Effective use of staff and student time

Objective Testing • An objective test is one in which scoring procedures do not depend on human judgment so can be marked by a machine. • But the question is only as objective as designed by the author • In CAA we are perhaps most aware of “Multiple Choice” questions • There are various other question types embodied by test engines such as Question Mark Perception, Hot Potatoes, Respondus, and the inbuilt quiz engines in VLEs. • Also in the XML specification “Question and Test Interoperability” – QTI 2.0 • The examples that follow are taken from the CAA centre’s “Guide to Objective Test Design” at • http://www.caacentre.ac.uk/resources/objective_tests/index.shtml • You are strongly advised to read this

The Anatomy of an MCQ

Multiple True/False Q. A 28 year old woman with one child has taken anti-thyroid drugs for 6 months for thyrotoxicosis. She has a friend who has been successfully treated with radio-iodine. She finds she frequently forgets to take her drugs and wants to stop them to have radio-iodine treatment. • A. She should be told that because of her age radio-iodine is best avoided. • B. The problems associated with radio-iodine should be discussed with her. • C. Surgery as a possible alternative should be discussed with her. • D. She should be advised that some form of further treatment is required. • E. You should find out more about her friend's treatment. Correct answer: True: B, C and D. False: A and E. How should we mark this?
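One way to explore the marking question is to compare a few candidate schemes on the same response. The three schemes below are common options rather than a recommendation from the slides, and the function names are illustrative; only the answer key is taken from the question above.

```python
# Answer key from the slide: B, C and D are true; A and E are false.
KEY = {"A": False, "B": True, "C": True, "D": True, "E": False}


def all_or_nothing(response, key=KEY):
    """Full marks only if every statement is judged correctly."""
    return 1.0 if response == key else 0.0


def per_statement(response, key=KEY):
    """One equal share of the mark for each correctly judged statement."""
    right = sum(response.get(k) == v for k, v in key.items())
    return right / len(key)


def negative_marking(response, key=KEY):
    """+1 per correct, -1 per wrong, 0 if omitted (None); floored at zero."""
    score = 0
    for k, v in key.items():
        answer = response.get(k)
        if answer is None:
            continue
        score += 1 if answer == v else -1
    return max(score, 0) / len(key)


# A candidate who gets only statement A wrong.
response = {"A": True, "B": True, "C": True, "D": True, "E": False}
print(all_or_nothing(response))    # 0.0
print(per_statement(response))     # 0.8
print(negative_marking(response))  # 0.6
```

The same response earns anything from 0% to 80%, so the choice of scheme needs to be stated to students in advance.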

Assertion/Reason The question consists of an assertion and a reason. Indicate your answer from the alternatives below by circling the appropriate letter.
     Assertion   Reason
  A  True        True     Reason is correct explanation
  B  True        True     Reason is NOT a correct explanation
  C  True        False
  D  False       True
  E  False       False
Assertion: It is difficult to assess higher order skills with a computer
Reason: It is easier to select the correct answer from a list in an MCQ than it is to recall the correct answer in an open question.
Correct answer B

Multiple Response • Actually little different from Multiple True/False Q. Which of the following are reasons why it is difficult to assess higher order skills using objective questions? • It is easier to select the correct answer from a list in an MCQ than it is to recall the correct answer in an open question • To assess synthesis requires that students demonstrate the creative process, rather than simply identifying a suitable design from a list • Objective tests cannot assess mathematical analytic skills • Learning outcomes demonstrating higher order skills are always a matter of opinion – subjective • There is always a good chance the student can guess the correct answer • In order to demonstrate evaluation skills the student needs to show understanding of how to go about the evaluation rather than selecting a suitable evaluation from a list Again – how do we mark this?
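For multiple response items, one possible scheme rewards correct selections and penalises incorrect ones, so that ticking every box earns nothing. The sketch below assumes that scheme. The key used in the example is arbitrary and is NOT the answer to the question above, and `mark_multiple_response` is an illustrative name.

```python
def mark_multiple_response(selected, correct, options):
    """Score = (correct selections - incorrect selections) / number correct.

    The score is floored at zero so a candidate cannot go negative.
    """
    selected, correct = set(selected), set(correct)
    unknown = selected - set(options)
    if unknown:
        raise ValueError(f"not valid options: {sorted(unknown)}")
    hits = len(selected & correct)
    false_alarms = len(selected - correct)
    return max(hits - false_alarms, 0) / len(correct)


options = "ABCDEF"
key = {"A", "C", "E"}  # arbitrary key, for illustration only

print(mark_multiple_response({"A", "C", "E"}, key, options))  # 1.0
print(mark_multiple_response(set(options), key, options))     # 0.0, ticking everything
print(mark_multiple_response({"A", "C", "B"}, key, options))  # one net correct out of three
```

Without the penalty term, selecting every option would always score full marks.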

Matching Questions Directions: Column 1 contains descriptions of geographic characteristics of wind belts. For each statement find the appropriate wind belt in Column 2. Answers may be used more than once. Column 1: • Region of high pressure, calm, and light winds • The belt of calm air nearest the equator. • A wind belt in the northern hemisphere typified by a continual drying wind. • Most of the United States is found in this belt. Column 2: • Doldrums • Horse latitudes • Polar easterlies • Prevailing easterlies • Prevailing westerlies Note: An example which asks students to read a passage of text (column 1) and match it to the appropriate literary style in column 2 is definitely assessing evaluation.

Text/Numerical Response/Match Questions • Tony Blair is the leader of the ____________ party. • 10 + 2 / 3 = __________ • Have the advantage that the student must supply the answer, rather than select it from a list. • But what about spelling/ synonyms (labor, socialist)? • What about level of accuracy (10.6666666667)?
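The spelling and accuracy questions can be made concrete with a small marking sketch. It assumes the question author supplies a list of accepted spellings and a numeric tolerance. The accepted list and the tolerance of 0.1% below are hypothetical choices, and whether a synonym such as "socialist" should be accepted remains a judgement for the author.

```python
import math


def normalise(text):
    """Ignore case and surplus whitespace when comparing text answers."""
    return " ".join(text.casefold().split())


def mark_text(response, accepted):
    """True if the response matches any accepted answer after normalising."""
    return normalise(response) in {normalise(a) for a in accepted}


def mark_number(response, expected, rel_tol=1e-3):
    """True if the response parses as a number within a relative tolerance."""
    try:
        value = float(response)
    except ValueError:
        return False
    return math.isclose(value, expected, rel_tol=rel_tol)


accepted = {"Labour", "Labor"}  # hypothetical accepted spellings

print(mark_text("  labour ", accepted))            # True
print(mark_text("socialist", accepted))            # False unless the author adds it
print(mark_number("10.67", 10 + 2 / 3))            # True at 0.1% tolerance
print(mark_number("10.6666666667", 10 + 2 / 3))    # True
print(mark_number("4", 10 + 2 / 3))                # False: (10 + 2) / 3 misreads precedence
```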

Hotspots • Given this map of Italian wine regions, click on Campania. Can be more difficult if the regions are not delineated – but also more difficult to mark?
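Marking a hotspot usually comes down to testing whether the click falls inside a region the author has outlined. A minimal sketch, assuming the region is stored as a polygon in pixel coordinates and using the standard ray casting test, follows. The polygon shown is made up and does not trace any real map.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical outline of the target region, in image pixel coordinates.
region = [(120, 200), (180, 190), (200, 250), (140, 270)]

print(point_in_polygon(160, 230, region))  # True: click lands inside the outline
print(point_in_polygon(50, 50, region))    # False: click is well outside
```

When the regions are not drawn on the map, the author still has to decide where this outline lies, which is one reason such items are harder to mark fairly.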

Important Features of an Assessment Engine • WYSIWYG question authoring tool • Question banking (QTI?) • Test authoring tool (Adaptive branching/ randomization?) • Test Scheduling Tool • Test Delivery Engine (with feedback? Security?) • Reporting • Results Analysis/ item analysis
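As an illustration of what item analysis can report, the sketch below computes two statistics from classical test analysis: facility (the proportion of candidates answering an item correctly) and a simple discrimination index (the difference in facility between the highest and lowest scoring groups). The slides do not say which statistics a given engine provides, so treat these as assumed examples. The 27% group size is a conventional choice, and the data are invented.

```python
def facility(item_scores):
    """Proportion of candidates who answered the item correctly (1) vs not (0)."""
    return sum(item_scores) / len(item_scores)


def discrimination(item_scores, total_scores, fraction=0.27):
    """Facility in the top group minus facility in the bottom group.

    Groups are the top and bottom `fraction` of candidates by total test score.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    bottom, top = order[:k], order[-k:]
    p_top = sum(item_scores[i] for i in top) / k
    p_bottom = sum(item_scores[i] for i in bottom) / k
    return p_top - p_bottom


# Invented results for six candidates: 1 = item correct, 0 = item wrong.
item_scores = [1, 1, 1, 0, 0, 1]
total_scores = [9, 8, 7, 4, 3, 5]

print(facility(item_scores))                      # proportion correct, 4 out of 6
print(discrimination(item_scores, total_scores))  # 1.0: strong candidates got it, weak ones did not
```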

Marking Free Text – The Vector Space Model (simplified; the figure shows example columns for Grade A and Grade B) • A co-occurrence matrix is built • Columns are example documents given this grade by skilled markers • Rows are words or short phrases used in the example documents (stemmed, and after stop list removal) • The essay to be marked is also stemmed and has stop list words removed – it forms one column of the same shape as the other columns in the table • The column representing the essay to be marked is compared to the example essays. • There are various algorithms for this comparison – but basically the grade given is the “best match”. So the essay is being graded on the similarity of its use of words and phrases to pre-graded essays
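The idea can be sketched with word-count vectors and cosine similarity, which is one of the possible comparison algorithms and not necessarily the one any particular product uses. This is a simplification: stemming is omitted, the stop list is a tiny made-up one, and the graded examples are invented.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop list; real systems use much longer ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "by", "so"}


def term_vector(text):
    """One column of the matrix: word counts after stop list removal (no stemming)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(u, v):
    """Cosine of the angle between two term vectors (0 = nothing shared)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values()) * sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def best_match_grade(essay, graded_examples):
    """Return the grade whose example essays contain the closest match."""
    essay_vec = term_vector(essay)
    return max(
        graded_examples,
        key=lambda g: max(cosine(essay_vec, term_vector(e)) for e in graded_examples[g]),
    )


# Invented pre-graded examples.
graded_examples = {
    "A": ["formative assessment gives timely feedback so students can improve"],
    "B": ["assessment means exams"],
}
essay = "timely feedback from formative assessment helps students improve"
print(best_match_grade(essay, graded_examples))  # A
```

Because only word counts are compared, `cosine(term_vector("The Germans bombed the British"), term_vector("The British bombed the Germans"))` comes out at 1: the two sentences are indistinguishable to this model.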

Comments on Vector Space Model • It works pretty well. Systems such as e-rater have been substantially evaluated. • In this form it does not cope with synonyms and polysemy • An essay about “German Shepherds” will not compare well with examples that had used “Alsatians” • An essay on fruit (apples) might compare well with one about computers. • It is difficult to give feedback – except by giving the same feedback as was given by human raters to similar essays • It requires careful marking of a bank of exemplar questions – so only useful for large scale marking • It does not understand the semantics • Cannot tell the difference between “The Germans bombed the British” and “The British bombed the Germans” • Teachers hate it! (Why?)

Marking Free Text: Latent Semantic Analysis (LSA) • LSA uses an improved version of the Vector Space Model • In effect it also keeps information about the order/proximity of words • This information can be used to spot synonyms, polysemy and to ensure better semantic understanding • It can be shown to require far fewer training essays (only 1?) • It is computationally intensive • Fine tuning the algorithm can be time consuming

Marking Free Text: Natural Language Processing (NLP) • NLP is a branch of AI which uses (principally) rule based approaches to extract “meaning” from bits of documents • This approach can be used to “match” statements that would be expected in the text • Marking and Feedback rules can be associated with each match found (or not found) • Best used for short answers (see UCLES work) • Hybrid systems combine LSA and NLP and get the best of both worlds
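The “match, then apply marking and feedback rules” idea can be sketched with plain regular expressions. This is a much cruder stand-in for real NLP, which would parse the sentence rather than pattern-match it. The rule, marks and feedback text are all hypothetical. Unlike a bag-of-words comparison, an ordered pattern can tell who bombed whom.

```python
import re

# Each rule: (pattern, marks, feedback if matched, feedback if not matched).
# Hypothetical rule for an answer expected to say that the Germans bombed the British.
RULES = [
    (
        r"\bgermans\b.*\bbombed\b.*\bbritish\b"
        r"|\bbritish\b.*\bbombed by\b.*\bgermans\b",
        1,
        "Correctly states who bombed whom.",
        "State clearly who bombed whom.",
    ),
]


def mark_short_answer(answer, rules=RULES):
    """Apply every rule; collect marks for matches and feedback either way."""
    text = answer.lower()
    total, feedback = 0, []
    for pattern, marks, found_msg, missing_msg in rules:
        if re.search(pattern, text):
            total += marks
            feedback.append(found_msg)
        else:
            feedback.append(missing_msg)
    return total, feedback


print(mark_short_answer("The Germans bombed the British"))
print(mark_short_answer("The British were bombed by the Germans"))
print(mark_short_answer("The British bombed the Germans"))
```

The first two answers earn the mark and the third does not, but every acceptable phrasing has to be anticipated by a rule, which is one reason the approach suits short answers.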

Questions to Consider Before Next Week Communication, Collaboration • What are the pros and cons of enforcing the use of on-line communication tools in an otherwise face-to-face class? Mobile Learning • On the grid headed Mobile Learning (?), add another example to each cell. Assessment • How would you retrieve suitable items from a question bank? • Why do teachers hate LSA style marking tools? • Invent one question, in the IT/CS area, which assesses higher order skills. Justify your claim. • What other ways can computers be used in assessment?
