Thematic Analysis of Group Software Project Change Logs • Andy Burn
Overview • Software Engineering Group Project • Change Logs • Thematic Analysis • Results
Software Engineering Group Project (SEG) • Durham and Newcastle collaboration • Team-based software development project • Requirements, design, implementation, delivery • Focus here is on implementation • 2006/07: 12 groups • 62 students contributed code to the implementation • 36 students from Durham, 26 from Newcastle
Change Logs (1) • Students use Subversion for code management • One message per revision • (Revision: a version of the code being stored) • Examples • “Added a splash screen” • “Fixed a bug” • “Bah” • “asdasd” • “Who reads these?” • And the most popular: • “ ” (an empty message)
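To make the "one message per revision" idea concrete, here is a minimal sketch of reading revision messages from the XML that `svn log --xml` produces. The sample log, author names, dates, and the `read_messages` helper are hypothetical and only illustrate the shape of the data; this is not the tooling used in the study.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape produced by `svn log --xml`.
# Authors, dates, and revision numbers are invented for illustration.
SAMPLE_LOG = """<log>
  <logentry revision="3">
    <author>student_a</author>
    <date>2007-02-01T10:15:00.000000Z</date>
    <msg>Added a splash screen</msg>
  </logentry>
  <logentry revision="2">
    <author>student_b</author>
    <date>2007-01-30T16:42:00.000000Z</date>
    <msg></msg>
  </logentry>
  <logentry revision="1">
    <author>student_a</author>
    <date>2007-01-29T09:00:00.000000Z</date>
    <msg>Fixed a bug</msg>
  </logentry>
</log>
"""


def read_messages(xml_text):
    """Return (revision, message) pairs; blank messages become ''."""
    root = ET.fromstring(xml_text)
    pairs = []
    for entry in root.findall("logentry"):
        revision = int(entry.get("revision"))
        # findtext returns '' when <msg> exists but has no text.
        message = (entry.findtext("msg", default="") or "").strip()
        pairs.append((revision, message))
    return pairs


messages = read_messages(SAMPLE_LOG)
empty_count = sum(1 for _, msg in messages if not msg)
print(f"{empty_count} of {len(messages)} revisions have no message")
```

Stripping whitespace before testing for emptiness treats a message such as “ ” as empty, which matches how the slide describes the most popular message.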
Change Logs (2) • 2006/07 Projects • ~4,100 revisions (2,600 Durham, 1,500 Newcastle) • ~1,000 revisions with messages • 800 Durham, 200 Newcastle • Why use comments? • In theory, provides a complete and informative history of a project • In practice, it works • In SEG, it doesn’t • Can they tell us anything else?
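The rounded figures above imply quite different commenting rates at the two campuses. A short sketch of the arithmetic, using only the approximate counts quoted on the slide (the variable and function names are illustrative):

```python
# Approximate revision counts quoted on the slide (all values are rounded).
revisions = {"Durham": 2600, "Newcastle": 1500}
with_messages = {"Durham": 800, "Newcastle": 200}


def message_rate(campus):
    """Fraction of a campus's revisions that carry a message."""
    return with_messages[campus] / revisions[campus]


for campus in revisions:
    print(f"{campus}: {message_rate(campus):.0%} of revisions have a message")

overall_rate = sum(with_messages.values()) / sum(revisions.values())
print(f"Overall: {overall_rate:.0%}")
```

On these rounded numbers, roughly 31% of Durham revisions and 13% of Newcastle revisions have a message, about 24% overall.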
Thematic Analysis (1) • Unlike purely numerical analysis, thematic analysis aims to uncover patterns or stories in data • In CS terms, each data item is ‘tagged’ (coded), and patterns are found in the groups formed from the tags. • Carried out on the comments (all 1,035 of them…) • The codes used were loosely based on maintenance activities
Thematic Analysis (2) • After a long process of experimenting with and verifying coding schemes (thanks Stephen), the following codes were used: • Developmental: Creation or modification of features or functionality • Perfective: Refactoring, testing, commenting, cleaning of code • Corrective: Bug fixing • Ambiguous: Fits, or may fit, more than one activity type • Misc: Irrelevant, or out of scope (e.g. documentation)
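The coding scheme above can be sketched as a small data structure plus a tally. The five code names come from the slide; the code assigned to each example message below is an assumption made for illustration, not the coding from the study, and `code_distribution` is a hypothetical helper.

```python
from collections import Counter
from enum import Enum


class Code(Enum):
    """The five codes listed on the slide."""
    DEVELOPMENTAL = "Developmental"
    PERFECTIVE = "Perfective"
    CORRECTIVE = "Corrective"
    AMBIGUOUS = "Ambiguous"
    MISC = "Misc"


# Hand-assigned codes for the example messages quoted earlier.
# These assignments are illustrative assumptions, not study data.
coded_messages = [
    ("Added a splash screen", Code.DEVELOPMENTAL),
    ("Fixed a bug", Code.CORRECTIVE),
    ("Bah", Code.MISC),
    ("asdasd", Code.MISC),
    ("Who reads these?", Code.MISC),
]


def code_distribution(coded):
    """Return the share of messages per code (0.0 for unused codes)."""
    counts = Counter(code for _, code in coded)
    total = sum(counts.values())
    if total == 0:
        return {code: 0.0 for code in Code}
    return {code: counts[code] / total for code in Code}


distribution = code_distribution(coded_messages)
for code, share in distribution.items():
    print(f"{code.value}: {share:.0%}")
```

Keeping the codes in an `Enum` means a mistyped code name fails immediately instead of silently creating a sixth category.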
Thematic Analysis (3) • Research Questions • How are activity types distributed? • Does this change over time? • Is this affected by gender or campus?
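One way to approach the second and third questions is to tally codes per group of revisions, for example per campus or per month. The sketch below assumes each coded revision is stored as a (date, campus, code) record; the records and the `tally_by` helper are hypothetical and do not reflect the study's data or method.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical coded revisions: (commit date, campus, code).
records = [
    (date(2007, 1, 29), "Durham", "Developmental"),
    (date(2007, 1, 30), "Newcastle", "Developmental"),
    (date(2007, 2, 1), "Durham", "Corrective"),
    (date(2007, 2, 14), "Durham", "Perfective"),
    (date(2007, 2, 20), "Newcastle", "Misc"),
]


def tally_by(rows, key):
    """Group rows by key(row) and count the codes within each group."""
    table = defaultdict(Counter)
    for row in rows:
        code = row[2]
        table[key(row)][code] += 1
    return table


# Distribution by campus (third research question).
by_campus = tally_by(records, key=lambda row: row[1])
# Distribution by calendar month (second research question).
by_month = tally_by(records, key=lambda row: (row[0].year, row[0].month))

for campus, counts in by_campus.items():
    print(campus, dict(counts))
for month, counts in sorted(by_month.items()):
    print(month, dict(counts))
```

The same helper would cover gender by adding that field to each record and passing a different `key`.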
Limitations • Only one year’s data • Data may not be representative • Limited to 25% of the total data • Students don’t always submit work under their own names • Visiting other campuses • Pair programming • Metrics used are effective on average • Too much ambiguity
How Are Activity Types Distributed? • Too much emphasis on development • Too little testing, fixing and improving • “Misc” and “Ambiguous” should be minimized
Does Gender Affect Activity Types? • Could not address this question • Too few commented revisions from women • More data is needed, even with 2007/08 data included
Conclusions • Students do not distribute their work well across activity types • Too much emphasis on developing new features (the fun part) • This is expected: SEG is designed to teach students these lessons before they reach industry • There are no significant effects from the different universities • Too early to tell if gender is a factor
Future Work • Analysis of 2007/08 project data • Analysis of open source projects • Analysis of other university projects • Attempt to overcome limitations of the data • Better codes • Better metrics • Improved use of Subversion by the students