Survey of Automated Assessment Approaches for Programming Assignments. Gayathri Subramanian. Reference Papers. ‘A Survey of Automated Assessment Approaches for Programming Assignments’ by Kirsti M. Ala-Mutka (1995–2005).
Survey of Automated Assessment Approaches for Programming Assignments Gayathri Subramanian
Reference Papers
• ‘A Survey of Automated Assessment Approaches for Programming Assignments’ by Kirsti M. Ala-Mutka (covering 1995–2005).
• ‘Review of Recent Systems for Automatic Assessment of Programming Assignments’ (covering 2005–2010).
Outline
• Introduction and Motivation
• Static and Dynamic Assessment Techniques
• Features of a Good Automated Assessment System
• Automatic vs. Semi-Automatic Approaches
• Summative vs. Formative Approaches
• Conclusion
Motivation
• Programming courses are an integral part of computer science and software engineering curricula.
• Proficiency in a programming language comes with practice.
• Programming courses have large enrolments and impose a heavy workload on teachers.
• Even small programs typically have a large number of possible execution paths.
• Research suggests that it is not possible to grade students’ programs consistently and thoroughly without automated assistance.
• Programs can be assessed automatically!
Motivation (contd.)
• New automated systems are created every year.
• Many systems share common features.
• Existing systems already satisfy most assessment needs.
• Far fewer systems are widely adopted than there are papers about them.
• A literature survey helps teachers identify the tool they are looking for.
Which features of a program can be assessed automatically, and which tools support them?
Static & Dynamic Assessment of the Code
• Programs follow a defined syntax and semantics, which makes automatic assessment feasible.
• Extract a measurement value (justified by the teaching goals) from the program and compare it against the given requirements.
• Some features require executing the program; others are evaluated statically.
• Functionality, efficiency, and testing skills are assessed dynamically.
• Coding style, programming errors, software metrics, and design can be assessed statically.
What Features of a Program Can Be Automatically Assessed?
Dynamic Assessment:
• Functionality
• Testing Skills
• Efficiency
• Language-Specific Features
Functionality (Dynamic Assessment)
• Run the program against test cases [Course Marker, HOGG, BOSS, Online Judge].
• Success depends on the test case design and the model solution.
• Coverage analysis (function, statement, and decision coverage) measures how thoroughly the test cases exercise the program.
• Correlated test cases: a test case is defined with a planned relationship to the program state created by the previous test input [Quiver].
• Output comparison should allow a certain degree of freedom relative to the model solution [Assyst uses pattern matching; Course Marker uses regular expressions].
• Course Marker also checks the return status of the program.
Functionality (contd.)
• The functionality of a single function or method can be assessed [Quiver for Java, Scheme-robo for Scheme].
• Assessing a program with a GUI requires a means to monitor the actions and responses communicated through the user interface [JEWL, a library for Java, gives students GUI components and lets teachers manage the program's events and output actions].
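A minimal sketch of the test-case approach described above, written in Python for illustration. It grades a single function and matches its output against regular expressions so that small formatting differences are tolerated. The names `grade_functionality`, `student_average`, and `TEST_CASES` are hypothetical and do not come from any of the cited tools.

```python
import re


def grade_functionality(student_fn, test_cases):
    """Run student_fn on each test input and match its output against a pattern."""
    passed = 0
    for args, expected_pattern in test_cases:
        try:
            output = str(student_fn(*args))
        except Exception:
            # A crash counts as a failed test case, not as a crashed grader.
            continue
        if re.fullmatch(expected_pattern, output.strip()):
            passed += 1
    return passed / len(test_cases)


# Hypothetical assignment: report the average of two numbers as text.
def student_average(a, b):
    return f"Average: {(a + b) / 2}"


# The patterns tolerate differences in case, spacing, and trailing zeros.
TEST_CASES = [
    ((2, 4), r"(?i)average:\s*3(\.0+)?"),
    ((1, 2), r"(?i)average:\s*1\.50*"),
]

score = grade_functionality(student_average, TEST_CASES)
```

The returned value is the fraction of test cases passed, which a grader could scale to marks.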
Testing Skills (Dynamic Assessment)
• Testing is an essential phase of program development.
• Students submit test data sets along with the programming assignment.
• Assyst was the first tool to assess student test data; its assessment was based on code coverage analysis.
• Chen (2004) assesses the student’s test suite by running it against a set of buggy programs written by the instructor.
• Web-CAT: when a student submits a test data set, the tool assesses how well it covers the different execution paths.
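The buggy-program idea can be sketched as follows, assuming a toy assignment (absolute value) and two hypothetical faulty variants written by the instructor. This illustrates the principle only and is not Chen's implementation.

```python
def correct_abs(x):
    return x if x >= 0 else -x


# Hypothetical instructor-written faulty variants.
def buggy_abs_ignores_negatives(x):
    return x


def buggy_abs_wrong_at_zero(x):
    if x == 0:
        return 1
    return x if x > 0 else -x


BUGGY_VARIANTS = [buggy_abs_ignores_negatives, buggy_abs_wrong_at_zero]


def detects(test_suite, implementation):
    """A suite detects a fault if at least one of its cases fails."""
    return any(implementation(x) != expected for x, expected in test_suite)


def assess_test_suite(test_suite):
    """Return the fraction of buggy variants the student's suite catches."""
    # A suite that rejects the correct program is itself wrong.
    if detects(test_suite, correct_abs):
        return 0.0
    caught = sum(detects(test_suite, buggy) for buggy in BUGGY_VARIANTS)
    return caught / len(BUGGY_VARIANTS)


weak_suite = [(3, 3)]
strong_suite = [(3, 3), (-3, 3), (0, 0)]
```

A suite that only tests positive inputs catches neither variant, while the suite that also covers negative numbers and zero catches both.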
Efficiency (Dynamic Assessment)
• A simple efficiency measurement is the running time of the program, measured as either wall-clock time or CPU time [Assyst, Online Judge].
• Efficiency measurements can be distorted by differing implementations of input/output actions.
• A simple solution is to provide a common input/output module for use in assignments [Hansen and Ruuska].
• Efficiency can also be assessed by studying the execution behavior of individual structures inside the program.
• This is done by counting how many times certain blocks and statements are executed and comparing the counts with those obtained from the model solution [Assyst, Course Marker].
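One way to approximate the statement-counting approach is sketched below, assuming CPython and its `sys.settrace` hook. It counts the line events raised while a function runs and compares the student's count with the model solution's. The functions `student_contains`, `model_contains`, and `efficiency_ratio` are hypothetical examples, not part of Assyst or Course Marker.

```python
import sys


def count_executed_lines(fn, *args):
    """Return (result, number of 'line' trace events raised while fn ran)."""
    count = 0

    def tracer(frame, event, arg):
        nonlocal count
        if event == "line":
            count += 1
        return tracer  # keep tracing inside every new frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(previous)
    return result, count


def model_contains(sorted_items, target):
    # Model solution: binary search.
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return True
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False


def student_contains(sorted_items, target):
    # Student solution: correct, but a linear scan.
    for item in sorted_items:
        if item == target:
            return True
    return False


def efficiency_ratio(student_fn, model_fn, *args):
    """Executed-line count of the student divided by that of the model."""
    student_result, student_lines = count_executed_lines(student_fn, *args)
    model_result, model_lines = count_executed_lines(model_fn, *args)
    if student_result != model_result:
        raise ValueError("results differ; grade functionality first")
    return student_lines / model_lines


DATA = list(range(1000))
RATIO = efficiency_ratio(student_contains, model_contains, DATA, 999)
```

Unlike running time, the count does not depend on machine load, so a threshold on the ratio gives repeatable grades.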
Language-Specific Features (Dynamic Assessment)
• Language-specific implementation issues can be difficult to learn and to assess.
• Students often misuse memory management, for example by failing to deallocate all reserved memory blocks.
• Tutnew is a C++ library that overrides the normal memory management methods and can therefore assess a program's memory usage at run time.
• The test cases determine the coverage of this assessment, since they define which execution paths the program takes.
What Features of a Program Can Be Automatically Assessed?
Static Assessment:
• Coding Style
• Programming Errors
• Software Metrics
• Design
• Language-Specific Features
Coding Style (Static Assessment)
• Programming (coding) style affects readability, maintainability, and related qualities.
• Typographic: e.g. indentation, placement of parentheses, maximum line length.
• Syntactic: every switch statement should have a default branch, and each case branch should end with a break statement.
• Semantic: class names begin with a capital letter, and each declared variable should be used in the program.
• Logical: issues related to the logical structure of the program, e.g. loops should not be nested too deeply, methods should not have a huge number of parameters, and global variables should not be used as method parameters.
• Compilers and their warning capabilities can be put to use.
• GCC can give feedback on unused variables, implicit type conversions, and language features that do not follow the language standard, among other things.
Coding Style (contd.)
• Checkstyle is open source software for checking Java programs and can be integrated into several programming environments. Its checks include:
  • Comments for classes, attributes, and methods
  • Naming conventions of variables and methods
  • Number of parameters passed to a function
  • Duplicated code sections
  • Good practices of class construction
  • Complexity measurements of expressions
• Style++ is another tool, developed for assessing quality factors in C++ programs.
• An automatic system, PASS, has been implemented to assess these issues in programs written in Ada, C, and Java.
http://en.wikipedia.org/wiki/Checkstyle
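The typographic checks are the easiest to automate. The sketch below is a hypothetical checker for line length and indentation. The limits (`max_line_length`, `indent_width`) are assumed defaults for illustration and are not taken from Checkstyle, Style++, or PASS.

```python
def check_typography(source, max_line_length=79, indent_width=4):
    """Return a list of (line_number, message) style warnings."""
    warnings = []
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > max_line_length:
            warnings.append((number, "line too long"))
        # Leading whitespace is whatever lstrip() would remove.
        leading = line[: len(line) - len(line.lstrip())]
        if "\t" in leading:
            warnings.append((number, "tab used for indentation"))
        elif line.strip() and len(leading) % indent_width != 0:
            warnings.append((number, "irregular indentation"))
    return warnings


# A C fragment whose second line is indented by three spaces.
SAMPLE = "int main() {\n   return 0;\n}\n"
report = check_typography(SAMPLE)
```

Syntactic, semantic, and logical rules need a parser rather than line-by-line inspection, which is why tools such as Checkstyle work on the parsed program.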
Programming Errors (Static Assessment)
• Some errors and suspicious code fragments can be recognized statically.
• Static checks can recognize several error types typical of students, e.g. mistakes in updating a loop control variable or inconsistencies between a parameter's type and its usage.
• Xie and Engler (2002) used code redundancies to detect errors. By implementing a tool that detects idempotent operations, redundant assignments, dead code, and redundant conditionals, they found several errors in the well-known Linux source code.
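As a rough illustration of redundancy-based checking, the sketch below uses Python's standard `ast` module to flag two patterns in Python source: an assignment of a variable to itself and an `if` whose condition is a constant. It applies the general idea to a different language and is not Xie and Engler's tool; `find_redundancies` is a hypothetical name.

```python
import ast


def find_redundancies(source):
    """Return sorted (line, message) pairs for two simple redundancy patterns."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and len(node.targets) == 1:
            target = node.targets[0]
            # "x = x" has no effect: an idempotent operation.
            if (isinstance(target, ast.Name)
                    and isinstance(node.value, ast.Name)
                    and target.id == node.value.id):
                findings.append((node.lineno, "self-assignment"))
        elif isinstance(node, ast.If) and isinstance(node.test, ast.Constant):
            # One branch of "if <constant>" can never run.
            findings.append((node.lineno, "constant condition"))
    return sorted(findings)


SAMPLE = """\
i = 0
while i < 10:
    i = i
if False:
    print("never runs")
"""

report = find_redundancies(SAMPLE)
```

The self-assignment on line 3 of the sample is also an instance of the loop-control mistake mentioned above: the loop variable is never advanced.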
Software Metrics (Static Assessment)
• Software metrics are general measures that characterize a program.
• Hung, Kwok and Chan (1993) studied different metrics on programming assignments and concluded that the number of code lines was a good measure of students’ programming skill.
• Other metrics count attributes such as the number of operators and operands in a program, or its control structures.
• Metrics can serve as clear indicators of student performance and as possible indicators of where instruction needs development.
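A minimal sketch of attribute counting, assuming the submissions are Python programs and using the standard `ast` module. The choice of which node types count as operators, operands, and control structures is an assumption made for this example, not a standard definition.

```python
import ast


def simple_metrics(source):
    """Count code lines, control structures, operators, and operands."""
    nodes = list(ast.walk(ast.parse(source)))
    return {
        # Non-blank lines that are not pure comments.
        "code_lines": sum(
            1 for line in source.splitlines()
            if line.strip() and not line.strip().startswith("#")
        ),
        "control_structures": sum(
            isinstance(n, (ast.If, ast.For, ast.While)) for n in nodes
        ),
        "operators": sum(
            isinstance(n, (ast.BinOp, ast.BoolOp, ast.UnaryOp, ast.Compare))
            for n in nodes
        ),
        "operands": sum(isinstance(n, (ast.Name, ast.Constant)) for n in nodes),
    }


SAMPLE = """\
total = 0
for n in range(10):
    if n % 2 == 0:
        total = total + n
"""

metrics = simple_metrics(SAMPLE)
```

Comparing such counts with those of the model solution gives a quick signal when a submission is unusually long or deeply branched.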
Design (Static Assessment)
• Teachers often need to assess whether submitted programs conform to given interface or structural requirements.
• Thorburn and Rowe (1997) implemented a system that automatically recognizes the functional structure of a C program. They call it the “solution plan” of the program and compare it to the solution plan of the model program, or to a set of possible plans.
• Truong, Roe and Bancroft (2004) implemented a structural similarity analysis that transforms a student’s program into an XML representation and compares it to a set of model solutions.
• MacNish (2000) used Java reflection to analyze whether the class interfaces and method signatures in students’ Java programs met the given requirements.
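The reflection-based interface check can be illustrated in Python with the standard `inspect` module. This is an analogue of the idea rather than MacNish's Java implementation. `REQUIRED`, `StudentStack`, and `check_interface` are hypothetical.

```python
import inspect


def check_interface(cls, required_methods):
    """Return a list of problems; an empty list means the interface is met."""
    problems = []
    for name, expected_params in required_methods.items():
        method = getattr(cls, name, None)
        if not callable(method):
            problems.append(f"missing method: {name}")
            continue
        actual = list(inspect.signature(method).parameters)
        if actual != expected_params:
            problems.append(
                f"{name}: expected parameters {expected_params}, found {actual}"
            )
    return problems


# Hypothetical assignment specification: method name -> parameter names.
REQUIRED = {"push": ["self", "item"], "pop": ["self"]}


class StudentStack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self, default):  # deviates from the required signature
        return self._items.pop() if self._items else default


problems = check_interface(StudentStack, REQUIRED)
```

Checking the interface first lets the grader give a precise message instead of a confusing failure in the functional tests.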
Language-Specific Features (Static Assessment)
• Search for specific keywords chosen according to the teaching goal.
• In Scheme, whether a program's structure is purely functional can be assessed by searching for the primitives set!, set-car!, and set-cdr!
• A more flexible approach has been implemented in Ceilidh (Foxley, 1999): the teacher defines regular expressions to be searched for in the student’s program code.
Essential Features of an Automatic Assessment Tool for a Programming Course
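The keyword search for Scheme mutators can be sketched with a regular expression. The pattern and the comment handling below are simplifying assumptions (for example, a semicolon inside a string literal is treated as the start of a comment), and `find_mutations` is a hypothetical name.

```python
import re

# The three mutating primitives named above, in call position.
MUTATORS = re.compile(r"\(\s*(set-car!|set-cdr!|set!)(?=[\s()]|$)")


def find_mutations(scheme_source):
    """Return (line_number, primitive) for each use of a mutating primitive."""
    findings = []
    for number, line in enumerate(scheme_source.splitlines(), start=1):
        code = line.split(";", 1)[0]  # drop Scheme line comments
        for match in MUTATORS.finditer(code):
            findings.append((number, match.group(1)))
    return findings


SAMPLE = """\
(define (bump! counter)     ; impure helper
  (set-car! counter (+ (car counter) 1)))
(define (square x) (* x x)) ; uses no set! at all
"""

report = find_mutations(SAMPLE)
```

An empty result suggests the program is purely functional with respect to these three primitives.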
Automated Administration
• Automated administration (AA) handles submission, grading, and general information delivery.
• Benefits of automated administration:
  • An efficient way to track student progress and to recognize where the course needs improvement.
  • Peer review becomes feasible: students comment on each other’s programs.
Plagiarism Detection
• Computer programs are text files that are easy to copy.
• Detection from the structural information of the program:
  • MOSS is based on document fingerprinting.
  • JPlag uses string tokenization with substring pattern matching.
• Detection by attribute counting:
  • Verco and Wise (1996) compared automated tools based on attribute counting mechanisms.
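A much-simplified sketch of structure-based comparison follows. It tokenizes the source, replaces identifiers and numbers with placeholders so that renaming has no effect, and compares the sets of token k-grams. This is not the actual algorithm of MOSS or JPlag. The keyword set, `k=4`, and the Jaccard measure are assumptions chosen for brevity.

```python
import re

# Hypothetical keyword list; a real tool would use the language's grammar.
KEYWORDS = {"if", "else", "for", "while", "return", "int"}


def normalise(source):
    """Tokenize and replace identifiers and numbers with placeholders."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    out = []
    for tok in tokens:
        if tok in KEYWORDS:
            out.append(tok)
        elif re.fullmatch(r"[A-Za-z_]\w*", tok):
            out.append("ID")
        elif tok.isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out


def fingerprints(source, k=4):
    """Return the set of all k-grams of normalised tokens."""
    tokens = normalise(source)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(a, b, k=4):
    """Jaccard similarity of the two fingerprint sets, between 0 and 1."""
    fa, fb = fingerprints(a, k), fingerprints(b, k)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


ORIGINAL = "int total = 0; for (i = 0; i < n; i++) total = total + a[i];"
RENAMED = "int sum = 0; for (j = 0; j < m; j++) sum = sum + v[j];"
DIFFERENT = "while (x) { x = x / 2; count++; }"
```

Renaming every variable leaves the similarity unchanged, which is the property that makes structural comparison harder to fool than plain text comparison.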
Resubmission Policies
• Resubmissions let students improve their answers.
• The resubmission policy should prevent the trial-and-error strategy adopted by some students:
  • Limit the number of submissions
  • Limit the amount of feedback
  • Compulsory time penalty
  • Make each exercise slightly different
  • Programming contest approach [Mooshak]
  • Combine limited and unlimited submissions based on test cases
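Two of these policies, a submission limit and a time penalty, can be combined in a few lines. The sketch assumes that the time penalty means a waiting period that doubles after each attempt. That is only one possible interpretation, and the parameter values are hypothetical.

```python
def may_submit(previous_times, now, max_attempts=5, base_wait=60.0):
    """Decide whether a new submission is allowed.

    previous_times: ascending timestamps (seconds) of earlier submissions.
    Returns (allowed, reason).
    """
    attempts = len(previous_times)
    if attempts >= max_attempts:
        return False, "submission limit reached"
    if attempts:
        # Assumed policy: the wait doubles with every earlier attempt.
        wait = base_wait * 2 ** (attempts - 1)
        if now - previous_times[-1] < wait:
            return False, "waiting period not over"
    return True, "ok"


first = may_submit([], now=0.0)
too_soon = may_submit([0.0], now=30.0)
```

A growing wait makes blind resubmission expensive while still allowing a student who fixes a real bug to try again.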
Sandboxing
• Programming assignments are graded by running the code on a server, so it is important to protect the server from malicious code and from unintended bugs and flaws.
• Use existing mechanisms such as the Linux security model, chroot, or the Java security policy to run code securely.
• Use static analysis to filter out malicious code.
• Grade on the client side.
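The sketch below shows only the outermost layer of such protection: running a submission in a separate process with a wall-clock limit. It assumes the submissions are Python scripts. It is not a sandbox by itself, because a time limit does not restrict file or network access, so the mechanisms listed above would still be needed around it. `run_submission` is a hypothetical helper.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_submission(source, stdin_text="", timeout_seconds=2.0):
    """Run untrusted Python source in a child process with a time limit."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "submission.py"
        script.write_text(source)
        try:
            completed = subprocess.run(
                # -I: isolated mode, ignores user site-packages and PYTHON* variables
                [sys.executable, "-I", str(script)],
                input=stdin_text,
                capture_output=True,
                text=True,
                cwd=workdir,
                timeout=timeout_seconds,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising.
            return {"status": "timeout", "stdout": ""}
    status = "ok" if completed.returncode == 0 else "runtime error"
    return {"status": status, "stdout": completed.stdout}


result = run_submission("print(input()[::-1])", stdin_text="abc\n")
```

An endless loop in a submission then costs the server at most `timeout_seconds` instead of blocking the grading queue.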
Open Sourcing
• A survey by Pears et al. (2007) reported that tools were the single largest group among the papers; the other categories were curricula, pedagogy, and programming languages.
• Many of these systems share common features, and systems exist that fulfil most assessment needs.
• New automatic assessment systems are created every year.
Semi-automatic vs. Automatic Assessment
• The quality of automatic feedback may not be as high as that of feedback given by an instructor.
• Not all issues related to good programming can be assessed automatically.
• A hybrid approach uses automation alone for small assignments and combines manual and automatic assessment for larger ones.
• Advantages: it gives teachers more time to concentrate on the demanding assessment tasks and makes it possible to double-check the results of the automatic assessment.
Formative vs. Summative Assessment
• Formative assessment: allows resubmission so that students can improve their answer based on feedback. [A complete program must be submitted on the first attempt, except in Web-CAT.]
• Summative assessment: can be used for homework assignments and online examinations [BOSS].
Conclusion
• The benefits are numerous:
  • Immediate feedback to students
  • 24-hour availability
  • Objectivity and consistency of the evaluation
  • More practice for students
• Some features of a program can only be assessed automatically, and some cannot be assessed automatically at all. A hybrid approach may be useful.
• Tool-specific issues:
  • Setting up configuration files may be time consuming.
  • The specification should be unambiguous.
  • Effectiveness also depends on the test cases.
• If similar tool approaches are used, good assignments and their assessment routines can be stored and reused.
• Tools should be made widely available!