Mining Novice Programmer Errors
Emily S. Tabanao
MS Computer Science
Ateneo de Manila University
Problem: Poor programming comprehension • first-year computer science students lack programming comprehension • the failure rate in an introductory programming class in Australia is as high as 35% • 30% of computer science students in the United Kingdom and the United States did not understand programming basics after their first programming class • students have a fragile grasp of programming and are unable to read, analyze, and trace through short fragments of code
In response to this problem • Research is conducted to: • understand the characteristics of novice programmers • identify the causes of their problems • find possible solutions
The difficulties of programming may be caused by: • lack of a viable mental model • misconceptions about programming constructs • lack of programming strategies • weak or absent debugging strategies
Factors affecting performance of novices: • Prior to entering CS1 • gender • secondary school performance • dislike of programming • intrinsic motivation and comfort level • high school mathematics background • prior programming experience • attribution of success/failure to luck, and • perceived understanding of the material
Factors affecting performance of novices: • Behaviors that have a positive effect on performance: • perfectionism and self-esteem, and • high states of arousal or delight • Behaviors that have a negative effect on performance: • disliking programming • frustration • confusion • boredom, and • IDE-related on-task conversation
Goal of the Study • Determine whether analysis of online protocols can successfully identify/predict at-risk novice Java programmers
Online protocols • sequences of program compilations recorded while students perform laboratory exercises • gathered by instrumenting the development environments used in programming so that they store the data in a database
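The collection pipeline (an instrumented IDE writing compilation events into an SQLite database) can be sketched roughly as follows. The table name, columns, and logging function here are hypothetical, since the slides do not describe the actual schema:

```python
import sqlite3
import time

# Hypothetical schema: the slides only say that compilation events are
# stored in an SQLite database, not what the tables look like.
def open_log(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS compile_events (
               student_id TEXT,
               ts         REAL,      -- timestamp of the compilation attempt
               success    INTEGER,   -- 1 = compiled cleanly, 0 = error
               error_type TEXT,      -- e.g. 'UNKNOWN_VARIABLE'
               error_line INTEGER
           )"""
    )
    return db

def log_compile(db, student_id, success, error_type=None, error_line=None):
    # One row per compilation attempt; the timestamp supports the
    # time-between-compilations analysis later in the study.
    db.execute(
        "INSERT INTO compile_events VALUES (?, ?, ?, ?, ?)",
        (student_id, time.time(), int(success), error_type, error_line),
    )
    db.commit()
```

With rows like these in place, the error profiles, compilation gaps, and EQ scores described below can all be computed offline from the database.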
Research Questions • How do students with different achievement levels differ in terms of • error profiles? • average-time-between-compilations profiles? • EQ profiles? • What factors can predict the midterm score?
Methodology • Participants • 143 Introduction to Computing students • Tools for Data Collection • BlueJ • Web server • SQLite database • LAN
Methodology • Procedure • Laboratory Setup • Orientation • Data Gathering • Data Analysis • Data Cleaning • Data Extraction
Methodology • Data Analysis • Generate summaries • errors encountered • time between compilations • Compute EQ score • Use the statistical tool R • perform one-way ANOVA to differentiate student groups • correlate EQ score with midterm exam score • Use data mining tools (RapidMiner and Weka) to build linear regression models
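The one-way ANOVA step can be illustrated with a hand-rolled F-statistic. The study used R; this pure-Python sketch and its sample data are illustrative only:

```python
def one_way_anova_f(groups):
    """F-statistic for a one-way ANOVA over lists of per-student values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within)
```

An F value large relative to the critical value of the F-distribution (at the chosen df) corresponds to the significant group differences the study reports.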
Error Quotient (EQ) • Developed by Matthew Jadud • Quantifies students’ compilation behavior • Characterizes how much or little a student struggles with syntax errors • EQ score ranges from 0.0 to 1.0, where a 1.0 is an indication that a student encountered the same error all throughout the compilations
The EQ algorithm [Flowchart: each consecutive pair of compilation events is scored by asking, in sequence: Do both events end in errors? Same error type? Same error location? Same edit location? Penalty points (2 or 3) are added for each matching condition.]
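One plausible reading of the flowchart's scoring is: for each consecutive pair of compilation events, add 2 if both end in errors, 2 more for the same error type, 2 more for the same error location, and 3 more for the same edit location, then normalize so that repeated identical errors yield 1.0. The exact penalties and normalization in Jadud's published algorithm may differ slightly, so treat this as an illustrative sketch rather than a faithful reimplementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CompileEvent:
    is_error: bool
    error_type: Optional[str] = None
    error_line: Optional[int] = None
    edit_line: Optional[int] = None

def pair_score(e1, e2):
    # Pairs where either compilation succeeded contribute nothing.
    if not (e1.is_error and e2.is_error):
        return 0
    score = 2                          # both events end in errors
    if e1.error_type == e2.error_type:
        score += 2                     # same error type
    if e1.error_line == e2.error_line:
        score += 2                     # same error location
    if e1.edit_line == e2.edit_line:
        score += 3                     # same edit location
    return score                       # maximum 9 per pair

def error_quotient(events):
    """Average normalized pair score over consecutive compilation events."""
    pairs = list(zip(events, events[1:]))
    if not pairs:
        return 0.0
    return sum(pair_score(a, b) for a, b in pairs) / (9 * len(pairs))
```

Under this scoring, a student who hits the same error at the same place on every compilation scores 1.0, and a student whose compilations all succeed scores 0.0, matching the range described above.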
Results: Midterm Score • Lowest score = 38 • Highest score = 96 • Mean = 75, standard deviation = 13 • Student grouping: • AtRisk: scores 62 and below • HighPerforming: scores 89 and above • Average: scores 63 to 88
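The grouping thresholds translate directly into a small classifier (cutoffs taken verbatim from the slide):

```python
def classify(midterm_score):
    """Map a midterm score (38-96 in this cohort) to an achievement group."""
    if midterm_score <= 62:
        return "AtRisk"
    if midterm_score >= 89:
        return "HighPerforming"
    return "Average"
```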
1a. How do students with different achievement levels differ in terms of error profiles?
Using one-way ANOVA on total errors vs. groups • the HighPerforming group was significantly different from the AtRisk and Average groups at p < .001, encountering fewer errors than the other two • the Average group was not significantly different from the AtRisk group
1b. How do students with different achievement levels differ in terms of average-time-between-compilations profiles?
Using one-way ANOVA on average time between compilations vs. groups • the HighPerforming group was significantly different from the AtRisk and Average groups, with a higher average time between compilations than the other two • there was no significant difference between the Average and AtRisk groups
Using one-way ANOVA on time between compilations per 10-second bin vs. groups • the HighPerforming group was significantly different from the Average and AtRisk groups except on the intervals • 21-30, 111-120, and >120 seconds for the Average group • 81-90 seconds for the AtRisk group • the HighPerforming group had a lower number of compilations • there was no significant difference between the Average and AtRisk groups in any time interval
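The 10-second binning of compilation gaps can be sketched as below. The exact bin edges (e.g. whether a 20.0-second gap falls in 11-20 or 21-30) are an assumption, since the slides only list the bin labels:

```python
def bin_compile_gaps(timestamps):
    """Count gaps between consecutive compilations into twelve 10-second
    bins (0-10, 11-20, ..., 111-120) plus a final '>120 seconds' bin."""
    bins = [0] * 13
    for t0, t1 in zip(timestamps, timestamps[1:]):
        gap = t1 - t0
        idx = min(int(gap // 10), 12)  # gaps over 120 s all land in the last bin
        bins[idx] += 1
    return bins
```

Per-student bin counts like these are the inputs to the per-bin ANOVA above and to Model 4 below.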
1c. How do students with different achievement levels differ in terms of EQ profiles?
2. What factors can predict the midterm score? • Linear regression was performed to build models: regression lines of the form Y = aX + b • Two questions to ask about a model: • Does the model fit the observed data well? • compute the correlation coefficient r, a measure of the relation between X and Y • look at the scatterplot • compute R2, the square of the correlation coefficient r, which measures the strength of the relationship between X and Y • compute BIC', the Bayesian Information Criterion • Can the model generalize to other samples? • can the model predict the same outcome from the same set of predictors in a different sample? • adjusted R2 indicates the loss of predictive power of the model when applied to a new sample
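For a single predictor, the fit statistics listed above can be computed directly. This is a self-contained sketch; the study itself used RapidMiner and Weka:

```python
def fit_simple_regression(xs, ys):
    """Least-squares fit Y = a*X + b; returns (a, b, r, r2, adjusted_r2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    a = sxy / sxx                      # slope
    b = my - a * mx                    # intercept
    r = sxy / (sxx * syy) ** 0.5       # correlation coefficient
    r2 = r * r
    # Adjusted R^2 penalizes for the number of predictors (here p = 1).
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 1 - 1)
    return a, b, r, r2, adj_r2
```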
2a. Predicting the midterm score using the total errors encountered Model 1: MidtermScore = 83.63049 - 0.0919*TotalErrors p-value < .001, BIC' = -7.8, Adjusted R2 = 0.161
2a. Predicting the midterm score using the top ten errors encountered Model 2: MidtermScore = 83.50274 - 0.25632*UNKNOWN_VARIABLE - 0.42035*CLASS_INTERFACE_EXP - 0.75506*UNKNOWN_CLASS p-value < .001, BIC' = -10.2635, Adjusted R2 = 0.1994
2b. Predicting the midterm score using Average Time between compilations Model 3: MidtermScore = 65.04788 + 0.12107*AverageTBC_seconds p-value < .01, BIC = -1.97243, Adjusted R2 = 0.06512
2b. Predicting the midterm score using Average Time between compilations in 10 sec bins Model 4: MidtermScore = 87.4381 - 2.0042*Twenty + 6.4780*Ninety + 7.4892*Hundred p-value < .01, BIC = -7.01032, Adjusted R2 = 0.1263
2c. Predicting the midterm score using EQ scores Model 5: MidtermScore = 92.918 - 64.396*EQ p-value < .001, BIC = -17.3303, Adjusted R2 = 0.2971
Combining all features in Models 1 to 5: Model 6: MidtermScore = 90.58643 - 43.33380*EQ p-value < .001, BIC = -20.8326, Adjusted R2 = 0.3073
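Model 6's coefficients (taken verbatim from the slide) make EQ-based prediction a one-liner, which is exactly what an IDE-side early-warning check could evaluate. The flagging rule itself is an assumption for illustration:

```python
AT_RISK_CUTOFF = 62  # midterm scores at or below this are AtRisk (from the slides)

def predict_midterm_from_eq(eq):
    """Model 6 from the slides: MidtermScore = 90.58643 - 43.33380*EQ."""
    return 90.58643 - 43.33380 * eq

def flag_at_risk(eq):
    # Illustrative use: flag a student whose predicted midterm score falls
    # in the AtRisk band; this thresholding step is not from the slides.
    return predict_midterm_from_eq(eq) <= AT_RISK_CUTOFF
```

The negative coefficient encodes the finding above: the higher a student's EQ (more repeated struggling with the same errors), the lower the predicted midterm score.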
Conclusions and Future Work • We found: • Students encounter similar error types • Total errors encountered: HighPerforming < Average <= AtRisk • Three of the top 10 errors may affect the midterm scores of the Average and AtRisk students • Average time between compilations is higher among HighPerforming students than among the Average and AtRisk students • EQ is lower among HighPerforming students than among the Average and AtRisk students
Conclusions and Future Work • Linear models • indicate which errors directly affect the midterm score, which implicitly points to the concepts with which AtRisk students need assistance • a high incidence of rapid-fire compiling may be a symptom of AtRisk students • EQ can significantly predict midterm scores
Conclusions and Future Work • Use the models to automatically detect AtRisk students while they use an IDE • Implications for teaching: address the concepts that help students resolve the errors that directly affect performance
Thank you... Questions?