THE GENERATION OF AUTOMATED STUDENT FEEDBACK FOR A COMPUTER-ADAPTIVE TEST
University of Hertfordshire, School of Computer Science
Mariana Lilley, Dr. Trevor Barker, Dr. Carol Britton
Objectives • Overview of ongoing research at the University of Hertfordshire on the use of computer-adaptive tests (CATs) • Our approach to the generation of automated feedback • Student attitude • Future work
Research overview • Research started in 2001. • Five empirical studies, involving over 350 participants. • Findings suggest that computer-adaptive test (CAT) approach has the potential to offer a more consistent and accurate measurement of student proficiency levels than the one offered by non-adaptive computer-based tests (CBTs). • Statistical analysis of the data gathered to date suggests that the CAT approach is a fair measure of proficiency levels, producing higher test-retest correlations than either CBT or off-computer assessments. • More importantly, these results were observed in three different subject domains, namely English as a second language, Visual Basic programming and Human-Computer Interaction. This was taken to indicate that the approach can be transferred and generalised to different subject domains.
Traditional and adaptive approaches to testing • Computer-Based Tests (CBTs) mimic aspects of a paper-and-pencil test • Accuracy and speed of marking • Predefined set of questions presented to all participants and thus questions are not tailored for each individual student • Computer-Adaptive Tests (CATs) mimic aspects of an oral interview • Accuracy and speed of marking • Questions are dynamically selected and thus tailored according to student performance
Main benefits of the adaptive approach • Questions that are too easy or too difficult are likely to • Be demotivating • Provide little or no valuable information about student knowledge • Questions at the boundary of student knowledge are likely to • Be challenging • Be motivating • Provide lecturers with valuable information with regard to student ability • “Beginning in the days when education was for the privileged few, the wise tutor would modify the oral examination of a student by judiciously choosing questions appropriate to the student's knowledge and ability” (Wainer, 1990).
Computer-Adaptive Test • Based on Item Response Theory (IRT) • If a student answers a question correctly, the estimate of his/her ability is raised and a more difficult question is presented • If a student answers a question incorrectly, the estimate of his/her ability is lowered and an easier question follows • Can be of fixed or variable length • Score derived from the final ability estimate at the end of the test
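To make the adaptive loop above concrete, the following Python sketch shows one possible fixed-length implementation: the next question presented is the unused one whose difficulty is closest to the running ability estimate, and the estimate is nudged up or down after each response. The Question structure, the step size and the ask callback are illustrative assumptions, not details of the authors' application.

```python
# Minimal sketch of the fixed-length adaptive loop described above.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Question:
    text: str
    difficulty: float            # IRT b parameter
    correct_answer: str

def run_adaptive_test(bank: List[Question],
                      ask: Callable[[Question], str],
                      length: int = 20,
                      start_ability: float = 0.0,
                      step: float = 0.5) -> float:
    """Present `length` questions, adapting their difficulty to the running estimate."""
    ability = start_ability
    remaining = list(bank)
    for _ in range(min(length, len(remaining))):
        # Pick the unused question whose difficulty is closest to the current estimate.
        question = min(remaining, key=lambda q: abs(q.difficulty - ability))
        remaining.remove(question)
        if ask(question) == question.correct_answer:
            ability += step      # correct -> raise the estimate, harder question next
        else:
            ability -= step      # incorrect -> lower the estimate, easier question next
    return ability               # the final estimate is used as the score
```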
Item Response Theory • Family of mathematical functions • Most well-known models for dichotomously scored questions: • One-Parameter Logistic Model (1-PL); • Two-Parameter Logistic Model (2-PL); • Three-Parameter Logistic Model (3-PL). • In the CAT application introduced here • 3-PL Model • Fixed length
The 3-PL model from IRT • P(θ) = c + (1 − c) / (1 + e^(−a(θ − b))), the probability of a correct response • θ, represents the student's ability • b, represents the question's difficulty • a, represents the question's discrimination • c, represents pseudo-chance (the probability of a correct response by guessing)
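In code, the 3-PL response probability is a single expression. The function below is a direct transcription of the formula above; the parameter values in the example call are arbitrary and chosen only for illustration.

```python
import math

def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3-PL model:
    P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Example: a question of average difficulty (b = 0), typical discrimination
# (a = 1) and a 25% guessing floor (c = 0.25), answered by an average student.
print(round(p_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.25), 3))  # 0.625
```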
Level of difficulty • One of the underlying ideas within Bloom's taxonomy of cognitive skills (Anderson & Krathwohl, 2001) is that tasks can be arranged in a hierarchy from less to more complex.
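One way to operationalise this hierarchy, sketched below purely as a hypothetical illustration, is to tag each question in the item bank with a Bloom level and use that tag to assign or sanity-check its IRT difficulty parameter b. The level names follow Anderson and Krathwohl (2001); the numeric bands are invented for the example and are not taken from the study.

```python
# Hypothetical mapping from Bloom level to a coarse band for the IRT
# difficulty parameter b; the band values are invented for illustration only.
BLOOM_DIFFICULTY_BAND = {
    "remember":   (-3.0, -1.5),
    "understand": (-1.5, -0.5),
    "apply":      (-0.5,  0.5),
    "analyse":    ( 0.5,  1.5),
    "evaluate":   ( 1.5,  2.5),
    "create":     ( 2.5,  3.5),
}

def plausible_difficulty(bloom_level: str, b: float) -> bool:
    """Check that a calibrated b value falls inside the band for its Bloom level."""
    low, high = BLOOM_DIFFICULTY_BAND[bloom_level]
    return low <= b <= high
```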
Feedback provided for the first and second assessment sessions • Scores sent via email • Students seemed pleased to receive their scores via email • Some students reported that the score on its own gave them very little – if any – help in determining which part of the subject domain to revise next or which topic to prioritise • Student views were in line with the opinion of the experts who participated in the pedagogical evaluation of the CAT prototype (Lilley & Barker, 2002)
Feedback provided for the first and second assessment sessions • To: <<Student_Name>> • Your score for the Visual Basic Test 1 was <<Student_Score>>%. • This is an automated message from • The Programming_Module team
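The message above amounts to a simple mail merge. The Python sketch below shows that step under stated assumptions: the SMTP host, the sender address and the field names are placeholders rather than details of the actual system.

```python
# Sketch of the mail merge implied by the template above; host, sender and
# field names are placeholders, not the authors' setup.
import smtplib
from email.message import EmailMessage

TEMPLATE = (
    "To: {student_name}\n\n"
    "Your score for the Visual Basic Test 1 was {score}%.\n\n"
    "This is an automated message from\nThe Programming_Module team\n"
)

def send_score(student_name: str, email_address: str, score: int) -> None:
    msg = EmailMessage()
    msg["Subject"] = "Visual Basic Test 1 result"
    msg["From"] = "programming-module@example.ac.uk"     # placeholder sender
    msg["To"] = email_address
    msg.set_content(TEMPLATE.format(student_name=student_name, score=score))
    with smtplib.SMTP("smtp.example.ac.uk") as server:   # placeholder host
        server.send_message(msg)
```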
Assessment • Bachelor of Science (BSc) in Computer Science • 123 participants • The participants took the test in week 30 as part of their real assessment for the module • 6 non-adaptive questions followed by 14 adaptive ones • Human-Computer Interaction • Issues related to the use of sound at interfaces • Graphical representation at interfaces, focusing on the use of colour and images • User-centred approaches to requirements gathering • Design, prototyping and construction • Usability goals and user experience goals • Evaluation paradigms and techniques
Providing students with a copy of the test • A simple potential solution was to provide students with a copy of all the questions they got wrong. • A major limitation of this approach was the lack of any explanation of, or comment on, their performance. • It seemed unlikely that providing students with the answers to the questions they did not get right would foster research and/or reflection skills. • A further practical limitation of the approach was the increased exposure of the objective questions stored in the database. • Re-use of questions is one of the perceived benefits of computer-assisted assessments (Freeman & Lewis, 1998; Harvey & Mogey, 1999).
Automated feedback using Item Response Theory (IRT) • Overall proficiency level calculated as in previous assessments using the CAT application (i.e. using the 3-PL Model). • A proficiency level was calculated for each set of student responses for a given topic. • Questions answered incorrectly by each individual student identified. • Design and implementation of a feedback database: • Feedback according to topic • Feedback according to question
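A minimal sketch of the proficiency estimation is given below, assuming a simple grid search over the ability scale to maximise the 3-PL response likelihood. The estimation routine actually used in the CAT application is not described at this level of detail, so the procedure and names here are illustrative.

```python
# Illustrative sketch: estimate a proficiency level from a set of responses
# under the 3-PL model using a grid search over theta.
import math
from typing import List, Tuple

def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def estimate_proficiency(responses: List[Tuple[float, float, float, bool]]) -> float:
    """responses: one (a, b, c, answered_correctly) tuple per administered question.
    Run once over all responses for the overall level, or once per topic for the
    topic-level estimates mentioned above."""
    best_theta, best_log_likelihood = 0.0, float("-inf")
    for step in range(-40, 41):                       # theta grid from -4.0 to +4.0
        theta = step / 10.0
        log_likelihood = 0.0
        for a, b, c, correct in responses:
            p = p_correct_3pl(theta, a, b, c)
            log_likelihood += math.log(p if correct else 1.0 - p)
        if log_likelihood > best_log_likelihood:
            best_theta, best_log_likelihood = theta, log_likelihood
    return best_theta
```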
Feedback according to question • Section named “Based on your test performance, we suggest the following areas for revision”. • This section of the feedback document comprised a list of points for revision, based on the questions answered incorrectly by each individual student. • This feedback sentence did not reproduce the question itself. • The feedback sentence listed specific sections within the recommended reading and/or additional learning materials. • The same feedback sentence could be used for more than one question in the database.
Example of feedback sentence related to questions regarding bit-depth • Do some independent research on bit depth (the number of bits per pixel allocated for storing indexed colour information in a graphics file). As a starting point, see http://www.microsoft.com/windowsxp/experiences/glossary_a-g.asp#24-bitcolor. See also Chapter 5 from “Principles of Interactive Multimedia”, as section 5.6.4 introduces important aspects related to the use of colour at interfaces.
Student attitude towards the feedback format adopted • All students who participated in Assessment 3 invited to express their views on the feedback format used (optional). • 58 students replied to our email (47%). • Students asked to classify the feedback received as "very useful", "useful" or "not useful". • Students were also asked to present one positive and one negative aspect of the feedback provided.
Discussion • Like Denton (2003), it is our belief that the potential benefits of automated feedback have not yet been fully explored by academic staff, even by those who are already making use of computer-assisted assessment tools. • Our initial ideas on how CATs/IRT can be used to provide students with personalised, meaningful feedback include: • An ability estimation algorithm based on the Three-Parameter Logistic Model from IRT • A feedback database • Feedback sentences selected from the feedback database based on the ability level estimated and questions answered incorrectly • For each individual student only those sentences that applied to his or her test performance are selected • Selected feedback sentences added to a new Word document and sent to each individual student email account.
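The listing below sketches the selection and assembly step just described: feedback sentences are looked up by topic-level proficiency and by incorrectly answered question, deduplicated, and written into a per-student document. The table layout, the threshold and the example sentences are assumptions made for the illustration, and the sketch writes plain text where the original implementation produced a Word document.

```python
# Illustrative sketch of assembling one student's feedback document from a
# feedback database; table layout, threshold and sentences are assumptions.
from typing import Dict, List, Set

FEEDBACK_BY_QUESTION: Dict[int, str] = {
    # question_id -> feedback sentence; one sentence may serve several questions
    101: "Do some independent research on bit depth ...",
    102: "Do some independent research on bit depth ...",
    205: "Revise user-centred approaches to requirements gathering ...",
}

FEEDBACK_BY_TOPIC: Dict[str, str] = {
    # topic -> sentence used when the topic-level proficiency is below threshold
    "colour and images": "Review the chapter on the use of colour at interfaces ...",
    "evaluation": "Revisit the evaluation paradigms and techniques covered in the module ...",
}

def build_feedback(student_name: str,
                   overall_level: float,
                   topic_levels: Dict[str, float],
                   wrong_questions: Set[int],
                   threshold: float = 0.0) -> str:
    """Collect only the sentences that apply to this student and join them into
    a plain-text feedback document."""
    sentences: List[str] = []
    for topic, level in topic_levels.items():
        if level < threshold and topic in FEEDBACK_BY_TOPIC:
            sentences.append(FEEDBACK_BY_TOPIC[topic])
    for question_id in sorted(wrong_questions):
        sentence = FEEDBACK_BY_QUESTION.get(question_id)
        if sentence and sentence not in sentences:     # deduplicate shared sentences
            sentences.append(sentence)
    lines = [
        f"Feedback for {student_name}",
        f"Overall proficiency level: {overall_level:.2f}",
        "Based on your test performance, we suggest the following areas for revision:",
    ]
    lines += [f"- {s}" for s in sentences]
    return "\n".join(lines)
```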
Discussion • Learners like to be assessed and value comments on their performance. • The effort that learners invest in an assessment warrants comment from tutors in return. • As class sizes increase and more use is made of online formative and summative assessment methods, it becomes increasingly difficult to provide individual feedback in higher education (HE). • Students still value a human contribution to feedback, but they also realise that this is becoming rarer in their academic lives. • Student attitude to this approach was positive in general. • At the very least, we have shown that our automated feedback method identifies areas of weakness and strength and provides useful advice for individual development.
Future work • Creation of one distinct feedback sentence per question. • It is anticipated that these sentences should resemble the actual question more than the current comments do. • "Would it be possible to attach the question and the correct answers from the test?" • Overall layout of the document will be reviewed • To facilitate the location of information on the feedback sheet (some learners reported that they did not intuitively locate their overall score in the feedback document) • The distribution of the feedback document as a PDF rather than Word (DOC) file is also being considered
Future work • Review our assumption that performance in one topic area within a subject domain is the best indicator of performance in a related topic area in the same domain. • It is possible that students might have differing abilities in quite similar topic areas. • Assess the impact of this on test length and/or proficiency level estimation. • To increase the personalisation of the feedback, we intend to compare each learner's performance in previous assessments with his or her performance in the most recent one.