
Computer Assisted Assessment

Computer Assisted Assessment. Hugh Davis Learning Societies Lab ECS The University of Southampton, UK www.ecs.soton.ac.uk/~hcd. The Research Questions. What are the advantages and disadvantages of assessment by computer? Can higher order skills be assessed by computers?



  1. Computer Assisted Assessment
     Hugh Davis, Learning Societies Lab, ECS, The University of Southampton, UK
     www.ecs.soton.ac.uk/~hcd

  2. The Research Questions
     • What are the advantages and disadvantages of assessment by computer?
     • Can higher order skills be assessed by computers?
     • Can essays be marked by computers?
     • Which objective question types suit which types of learning outcome?

  3. What’s the purpose of Assessment?
     • Summative Assessment is concerned with getting grades/marks
       • Criterion referenced: to know whether a person has achieved learning outcomes
         • Might be in affective, cognitive or psychomotor domains
       • Norm referenced: to know how good someone is at something (compared to some norm)
         • Statements like “75% of all candidates should pass X” (which implies 25% will fail regardless of whether they achieved learning outcomes) imply a normative approach
     • Formative Assessment is concerned with giving feedback
       • E.g. comments on how to improve an essay
       • Tips on how to solve some problem
       • Model answer
       • Group discussion
     • Diagnostic Assessment is concerned with identifying areas of weakness and gaps
       • Might be used to identify suitable remedial work and/or as a bar to progression

  4. Principles of Assessment
     • Formative Assessment (feedback/self assessment) should be:
       • Open: allow students to control their own learning
       • Incremental: continuous, not just at the end of the course
       • Demanding: should stretch the student; neither too trivial nor too hard
       • Timely: feedback needs to arrive while the student still remembers the problem
     • Summative Assessment (marked exam/CW) must be:
       • Valid: assess what is meant to be assessed
       • Reliable: all marking done to the same criteria and standards
       • Transparent: aligned with Learning Objectives; no surprises
       • Fair: all have equal opportunities to succeed; gives marks for what the student does know
       • Equitable: should not discriminate against any student
       • Redeemable: opportunity to put mistakes right
       • Efficient: effective use of staff and student time

  5. Objective Testing
     • An objective test is one in which scoring procedures do not depend on human judgment, so it can be marked by a machine.
       • But the question is only as objective as designed by the author
     • In CAA we are perhaps most aware of “Multiple Choice” questions
     • There are various other question types embodied by test engines such as Question Mark Perception, Hot Potatoes, Respondus, and the inbuilt quiz engines in VLEs.
       • Also in the XML specification “Question and Test Interoperability” (QTI 2.0)
     • The examples that follow are taken from the CAA centre’s “Guide to Objective Test Design” at
       http://www.caacentre.ac.uk/resources/objective_tests/index.shtml
       • You are strongly advised to read this

  6. The Anatomy of an MCQ

  7. Multiple True/False
     Q. A 28 year old woman with one child has taken anti-thyroid drugs for 6 months for thyrotoxicosis. She has a friend who has been successfully treated with radio-iodine. She finds she frequently forgets to take her drugs and wants to stop them to have radio-iodine treatment.
     A. She should be told that because of her age radio-iodine is best avoided.
     B. The problems associated with radio-iodine should be discussed with her.
     C. Surgery as a possible alternative should be discussed with her.
     D. She should be advised that some form of further treatment is required.
     E. You should find out more about her friend's treatment.
     Correct answer: True: B, C and D. False: A and E.
     How should we mark this?
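One possible answer to "How should we mark this?" is to score each statement independently, optionally deducting marks for wrong answers to discourage guessing. The sketch below is only an illustration of that idea: the function name `mark_multiple_true_false`, the per-statement mark of 1 and the `penalty` parameter are assumptions, not a scheme given in the slides.

```python
def mark_multiple_true_false(key, response, penalty=0.0):
    """Score each statement independently.

    key, response: dicts mapping a statement label to True/False.
    A label missing from response (or set to None) is treated as unanswered.
    penalty: marks deducted per wrong answer (0.0 means no negative marking).
    The total is floored at 0 so that guessing cannot yield a negative mark.
    """
    score = 0.0
    for label, correct in key.items():
        answer = response.get(label)
        if answer is None:
            continue  # unanswered: no mark, no penalty
        score += 1.0 if answer == correct else -penalty
    return max(score, 0.0)


# Answer key from the slide: B, C and D are true; A and E are false.
KEY = {"A": False, "B": True, "C": True, "D": True, "E": False}

# Hypothetical student: A and D wrong, B and C right, E left blank.
student = {"A": True, "B": True, "C": True, "D": False}

print(mark_multiple_true_false(KEY, student))               # no penalty
print(mark_multiple_true_false(KEY, student, penalty=1.0))  # negative marking
```

Whether to use negative marking, or to award marks only for a fully correct pattern, is a design decision that changes how much guessing is rewarded.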

  8. Assertion/Reason
     The question consists of an assertion and a reason. Indicate your answer from the alternatives below by circling the appropriate letter.

        Assertion  Reason
     A  True       True    Reason is a correct explanation
     B  True       True    Reason is NOT a correct explanation
     C  True       False
     D  False      True
     E  False      False

     Assertion: It is difficult to assess higher order skills with a computer
     Reason: It is easier to select the correct answer from a list in an MCQ than it is to recall the correct answer in an open question.
     Correct answer: B

  9. Multiple Response
     • Actually little different from Multiple True/False
     Q. Which of the following are reasons why it is difficult to assess higher order skills using objective questions?
     • It is easier to select the correct answer from a list in an MCQ than it is to recall the correct answer in an open question
     • To assess synthesis requires that students demonstrate the creative process, rather than simply identifying a suitable design from a list
     • Objective tests cannot assess mathematical analytic skills
     • Learning outcomes demonstrating higher order skills are always a matter of opinion (subjective)
     • There is always a good chance the student can guess the correct answer
     • In order to demonstrate evaluation skills the student needs to show understanding of how to go about the evaluation rather than selecting a suitable evaluation from a list
     Again: how do we mark this?

  10. Matching Questions
     Directions: Column I contains descriptions of geographic characteristics of wind belts. For each statement find the appropriate wind belt in Column II. Answers may be used more than once.
     Column I
     • Region of high pressure, calm, and light winds.
     • The belt of calm air nearest the equator.
     • A wind belt in the northern hemisphere typified by a continual drying wind.
     • Most of the United States is found in this belt.
     Column II
     • Doldrums
     • Horse latitudes
     • Polar easterlies
     • Prevailing easterlies
     • Prevailing westerlies
     Note: An example which asks students to read a passage of text (Column I) and match it to the appropriate literary style in Column II is definitely assessing evaluation.

  11. Text/Numerical Response/Match Questions
     • Tony Blair is the leader of the ____________ party.
     • 10 + 2 / 3 = __________
     • Have the advantage that the student must supply the answer, rather than select it from a list.
     • But what about spelling/synonyms (labor, socialist)?
     • What about level of accuracy (10.6666666667)?
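The spelling and accuracy questions above come down to how strictly a response is compared with the expected answer. The following is a minimal sketch, assuming the marker supplies an explicit set of accepted spellings and an absolute numeric tolerance; the names `mark_text`, `mark_number`, the `ACCEPTED` set and the 0.01 tolerance are all illustrative assumptions.

```python
import math
import re


def mark_text(response, accepted):
    """Case- and whitespace-insensitive match against accepted answers."""
    normalised = re.sub(r"\s+", " ", response).strip().lower()
    return normalised in {a.lower() for a in accepted}


def mark_number(response, expected, abs_tol=0.01):
    """Accept a numeric answer within abs_tol of the expected value."""
    try:
        value = float(response)
    except ValueError:
        return False  # not a number at all
    return math.isclose(value, expected, abs_tol=abs_tol, rel_tol=0.0)


# Assumption: the marker decides that the US spelling is acceptable
# but the loose synonym "socialist" is not.
ACCEPTED = {"Labour", "Labor"}

# Division binds tighter than addition, so this is 10.666...
EXPECTED = 10 + 2 / 3

print(mark_text("  labour ", ACCEPTED), mark_text("socialist", ACCEPTED))
print(mark_number("10.67", EXPECTED), mark_number("4", EXPECTED))
```

A response of 4 here would come from evaluating (10 + 2) / 3, which is exactly the misconception the numerical question is probing.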

  12. Hotspots
     • Given this map of Italian wine regions, click on Campania
     Can be more difficult if the regions are not delineated, but is it then also more difficult to mark?
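Marking a hotspot question amounts to testing whether the clicked point lies inside the target region. A minimal sketch using the standard ray-casting point-in-polygon test is shown below; the polygon coordinates are invented pixel positions for illustration and do not describe any real map.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test.

    Cast a horizontal ray to the right of (x, y) and count how many
    polygon edges it crosses; an odd count means the point is inside.
    polygon: list of (x, y) vertices in order.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's height can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical outline of the target region, in image pixel coordinates.
REGION = [(100, 200), (180, 190), (200, 260), (120, 280)]

print(point_in_polygon(150, 230, REGION))  # a click inside the outline
print(point_in_polygon(50, 230, REGION))   # a click to the left of it
```

If the regions are not delineated on the map, the marker still has to commit to an outline like `REGION`, which is where the marking difficulty arises.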

  13. Important Features of an Assessment Engine
     • WYSIWYG question authoring tool
     • Question banking (QTI?)
     • Test authoring tool (adaptive branching/randomization?)
     • Test scheduling tool
     • Test delivery engine (with feedback? security?)
     • Reporting
     • Results analysis/item analysis

  14. Marking Free Text: The Vector Space Model
     [Figure (simplified): co-occurrence matrix with example-essay columns for Grade A and Grade B]
     • Co-occurrence matrix built
       • Columns are example documents given this grade by skilled markers
       • Rows are words or short phrases used in example documents (stemmed and after stop list removal)
     • The essay to be marked is also stemmed and has stop list words removed; it forms one column of the same shape as the other columns in the table
     • The column representing the essay to be marked is compared to the example essays
     • There are various algorithms for this comparison, but basically the grade given is the “best match”
     So the essay is being graded on the similarity of its use of words and phrases to pre-graded essays
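The comparison step can be sketched with cosine similarity, one common choice among the "various algorithms" mentioned. This is a toy illustration under stated assumptions: the stop list, the crude plural-stripping used in place of a real stemmer, the exemplar texts and the function names are all invented here, and real systems are far more elaborate.

```python
import math
import re
from collections import Counter

# Assumed, deliberately tiny stop list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "it", "by"}


def term_vector(text):
    """Lower-case, drop stop words, and strip a final 's' as a crude
    stand-in for stemming. Returns term counts (one matrix column)."""
    words = re.findall(r"[a-z]+", text.lower())
    stems = [w[:-1] if w.endswith("s") and len(w) > 3 else w
             for w in words if w not in STOP_WORDS]
    return Counter(stems)


def cosine(u, v):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def grade(essay, exemplars):
    """exemplars: list of (grade, text). Return the grade of the best match."""
    vec = term_vector(essay)
    best_grade, _ = max(
        ((g, cosine(vec, term_vector(text))) for g, text in exemplars),
        key=lambda pair: pair[1],
    )
    return best_grade


# Hypothetical pre-graded exemplars.
EXEMPLARS = [
    ("A", "Formative assessment gives timely feedback so students can improve their learning"),
    ("C", "Assessment is exams and marks at the end of the course"),
]

essay = "Timely feedback from formative assessment helps students improve"
print(grade(essay, EXEMPLARS))
```

Note that the grade depends only on which words appear and how often, which is the basis of the criticisms that follow.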

  15. Comments on Vector Space Model
     • It works pretty well. Systems such as e-rater have been substantially evaluated.
     • In this form it does not cope with synonyms and polysemy
       • An essay about “German Shepherds” will not compare well with examples that had used “Alsatians”
       • An essay on fruit (apples) might compare well with one about computers
     • It is difficult to give feedback, except by giving the same feedback as was given by human raters to similar essays
     • It requires careful marking of a bank of exemplar questions, so it is only useful for large scale marking
     • It does not understand the semantics
       • Cannot tell the difference between “The Germans bombed the British” and “The British bombed the Germans”
     • Teachers hate it! (Why?)
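Two of these limitations can be seen directly in a bag-of-words representation. The snippet below is a small demonstration, assuming plain whitespace tokenisation with no stemming or stop list; the example sentences about dog breeds are invented.

```python
from collections import Counter


def bag_of_words(text):
    """Term counts with word order discarded."""
    return Counter(text.lower().split())


# Word order is lost, so these two sentences become the same vector.
a = bag_of_words("The Germans bombed the British")
b = bag_of_words("The British bombed the Germans")
print(a == b)

# Synonyms share no terms: only the generic words overlap.
shepherd = bag_of_words("German Shepherds are loyal")
alsatian = bag_of_words("Alsatians are loyal")
shared = set(shepherd) & set(alsatian)
print(sorted(shared))
```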

  16. Marking Free Text: Latent Semantic Analysis (LSA)
     • LSA uses an improved version of the Vector Space Model
     • In effect it also keeps information about the order/proximity of words
     • This information can be used to spot synonyms and polysemy, and to ensure better semantic understanding
     • It can be shown to require far fewer training essays (only 1?)
     • It is computationally intensive
     • Fine tuning the algorithm can be time consuming

  17. Marking Free Text: Natural Language Processing (NLP)
     • NLP is a branch of AI which uses (principally) rule based approaches to extract “meaning” from bits of documents
     • This approach can be used to “match” statements that would be expected in the text
     • Marking and feedback rules can be associated with each match found (or not found)
     • Best used for short answers (see UCLES work)
     • Hybrid systems combine LSA and NLP and get the best of both worlds
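The idea of attaching marking and feedback rules to each match (or missing match) can be sketched as follows. Regular expressions are used here purely as a simple stand-in for real NLP matching; the rules, marks, feedback strings and the name `mark_short_answer` are all hypothetical.

```python
import re

# Each rule: (pattern, marks, feedback if found, feedback if not found).
# These rules are invented for a question such as
# "What should formative assessment provide, and when?"
RULES = [
    (r"\b(timely|prompt(ly)?|quick(ly)?)\b", 1,
     "Good: you noted that feedback must be timely.",
     "Consider when feedback should be given."),
    (r"\bfeedback\b", 1,
     "Good: you identified feedback as the purpose.",
     "What is formative assessment meant to give the student?"),
]


def mark_short_answer(answer, rules=RULES):
    """Apply every rule; return (total marks, list of feedback comments)."""
    total, comments = 0, []
    for pattern, marks, found_msg, missing_msg in rules:
        if re.search(pattern, answer, flags=re.IGNORECASE):
            total += marks
            comments.append(found_msg)
        else:
            comments.append(missing_msg)
    return total, comments


marks, feedback = mark_short_answer("It gives prompt feedback to the student")
print(marks)
for line in feedback:
    print(line)
```

Because every rule produces a comment whether or not it matches, the student receives feedback on what was missing as well as on what was credited.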

  18. Questions to Consider Before Next Week
     Communication, Collaboration
     • What are the pros and cons of enforcing the use of on-line communication tools in an otherwise face-to-face class?
     Mobile Learning
     • On the grid headed Mobile Learning (?), add another example to each cell.
     Assessment
     • How would you retrieve suitable items from a question bank?
     • Why do teachers hate LSA style marking tools?
     • Invent one question in the IT/CS area which assesses higher order skills. Justify your claim.
     • What other ways can computers be used in assessment?
