Mining Software Repositories: What to do? And where to get data? Israel Herraiz <herraiz@uax.es>, Universidad Alfonso X el Sabio, June 18th 2010. Outline: What is Mining Software Repositories? What are repositories? Conferences and journals of interest
Mining Software Repositories
What to do? And where to get data?
Israel Herraiz <herraiz@uax.es>
Universidad Alfonso X el Sabio
June 18th 2010
Outline
  What is Mining Software Repositories?
  What are repositories?
  Conferences and journals of interest
    And some words about trending topics
  Tools for Mining Software Repositories
  Datasets for Mining Software Repositories
    For replicable and verifiable empirical studies
What is Mining Software Repositories?
  MSR analyzes the rich data available in software repositories to uncover interesting and actionable information about software systems and projects.
  Popular topic since 2004
    MSR workshop, colocated with ICSE
    Working Conference since 2008
What are repositories?
  Anything that leaves a trail of software development or maintenance activities
  Also includes any software artifact
  Typically:
    Version control systems
    Bug tracking systems
    Public communication tools (mailing lists)
Differences between artifact and repository
  Artifact: source code file (hello.c)
      #include <stdio.h>
      int main() {
          printf("Hello world");
          return 0;
      }
  Repository: change to an artifact (hello.c.diff) plus meta-information
      - printf("Hello world");
      + printf("Hello world\n");
      Author: rms
      Date: 20100618 04:34 UTC
      Change: +1 -1
      Log: Forgot to add new line
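The "Change: +1 -1" entry in the hello.c example can be derived by comparing two versions of the artifact. The following is a minimal sketch using Python's standard difflib module; the function name change_summary and the record layout are illustrative assumptions, not part of any tool mentioned in these slides.

```python
import difflib

# Two versions of the artifact (hello.c), taken from the slide example.
OLD = [
    '#include <stdio.h>',
    'int main() {',
    '    printf("Hello world");',
    '    return 0;',
    '}',
]
NEW = list(OLD)
NEW[2] = r'    printf("Hello world\n");'


def change_summary(old_lines, new_lines):
    """Count added and removed lines between two versions of an artifact.

    Hypothetical helper: ndiff prefixes each output line with a
    two-character code ("+ " added, "- " removed, "  " unchanged,
    "? " intraline hint), so only the first two codes are counted.
    """
    added = removed = 0
    for line in difflib.ndiff(old_lines, new_lines):
        if line.startswith("+ "):
            added += 1
        elif line.startswith("- "):
            removed += 1
    return added, removed


added, removed = change_summary(OLD, NEW)

# The repository stores the change together with its meta-information.
record = {
    "author": "rms",
    "date": "20100618 04:34 UTC",
    "change": f"+{added} -{removed}",
    "log": "Forgot to add new line",
}
print(record["change"])
```

The artifact is only the file content; the record dictionary stands for what the repository adds on top of it (who, when, how much, and why).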
Working conferences of interest
  IEEE Int. Working Conf. Mining Software Repositories (MSR)
    Deadline: January (February for the challenge)
    Acceptance rate: 19% (2008), 31% (2010)
    Journal possibilities: EMSE, IEEE TSE
    http://msr.uwaterloo.ca
  IEEE Int. Working Conf. Source Code Analysis & Manipulation (SCAM)
    Deadline: April
    Acceptance rate: 26% (2007), 38% (2008), 45% (2009)
    Journal possibilities: JSS, SCP
    http://www.ieee-scam.org
Conferences of interest
  IEEE Int. Conf. Software Engineering (ICSE)
    Deadline: August / September
    Acceptance rate: 15% (2008), 12% (2009), 14% (2010)
    Journal possibilities: no special issues
    http://www.sbs.co.za/ICSE2010/
  Empirical Software Eng. & Measurement (ESEM)
    Deadline: March
    Acceptance rate: ?
    Journal possibilities: EMSE
    http://www.esem-conferences.org/
  IEEE Int. Conf. Software Maintenance (ICSM)
    Deadline: April
    Acceptance rate: 21% (2007), 26% (2008), 22% (2009)
    Journal possibilities: no special issues
    http://icsm2010.upt.ro/
Other interesting conferences
  Working Conference on Reverse Engineering (WCRE)
    http://web.soccerlab.polymtl.ca/wcre2010/
  International Conference on Predictive Models and Software Engineering (PROMISE)
    http://promisedata.org/
  European Conference on Software Maintenance and Re-engineering (CSMR)
    http://www.sait.escet.urjc.es/csmr2010/
Journals of interest
  IEEE Transactions on Software Engineering (TSE)
    http://www.computer.org/tse/
  ACM Transactions on Software Engineering and Methodology (TOSEM)
    http://tosem.acm.org/
  Empirical Software Engineering (EMSE)
    http://www.springerlink.com/content/1382-3256
  Journal of Systems and Software (JSS)
    http://www.elsevier.com/locate/jss
  Journal of Software Maintenance and Evolution (JSME)
    http://eu.wiley.com/WileyCDA/WileyTitle/productCd-SMR.html
Handy links
  Software Engineering Conferences
    Also covers Verification, Formal Methods, Programming Lang. and Compilers, Web, Security
    http://people.engr.ncsu.edu/txie/seconferences.htm
  Upcoming Software Engineering Conferences Map
    http://research.csc.ncsu.edu/ase/semap/
Trending topics
  Replication of empirical studies
    The replication package
  Recommendation systems
  Automated Software Engineering
Tools for Mining Software Repositories
  Mining tools
    Libresoft Tools
      http://tools.libresoft.es/
      CVSAnaly: CVS/SVN/Git repositories log parser
      MLStats: Mailman and Mboxes parser
      Bicho: Bugzilla and SF.net tracker parser
    Software Architecture Group (SWAG), University of Waterloo
      http://www.swag.uwaterloo.ca/tools.html
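To give an idea of what a repository log parser does, here is a toy sketch in Python. It is not how CVSAnaly or any of the tools above is implemented: the log format in SAMPLE_LOG is a simplified, made-up format, and parse_log is a hypothetical helper. It turns raw log text into structured records and then aggregates commits per author and churn (lines added plus removed) per file.

```python
from collections import Counter

# Made-up, simplified log format (an assumption for illustration only).
SAMPLE_LOG = """\
commit r1
Author: rms
Date: 20100617 10:00 UTC
Log: Initial version
hello.c +5 -0

commit r2
Author: rms
Date: 20100618 04:34 UTC
Log: Forgot to add new line
hello.c +1 -1
"""


def parse_log(text):
    """Parse the simplified log format into a list of commit dicts."""
    commits = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines separate commits
        if line.startswith("commit "):
            current = {"id": line.split(None, 1)[1], "files": []}
            commits.append(current)
        elif current is None:
            continue  # ignore anything before the first commit
        elif line.startswith(("Author:", "Date:", "Log:")):
            key, value = line.split(":", 1)
            current[key.lower()] = value.strip()
        else:
            # File line: "<path> +<added> -<removed>"
            path, added, removed = line.rsplit(None, 2)
            current["files"].append((path, int(added), -int(removed)))
    return commits


commits = parse_log(SAMPLE_LOG)

# Aggregate the parsed records into simple per-author and per-file metrics.
commits_per_author = Counter(c["author"] for c in commits)
churn_per_file = Counter()
for c in commits:
    for path, added, removed in c["files"]:
        churn_per_file[path] += added + removed

print(commits_per_author)
print(churn_per_file)
```

Real tools typically store the parsed records in a database instead of in memory, so that the same data can be queried many times without parsing the logs again.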
MSR Mining Challenge
  Mirrors of the version archives and bug databases for Mozilla Firefox and Eclipse
    http://msr.uwaterloo.ca/msr2008/challenge/
  Repository logs of over 500 Gnome projects, XML dump of the bug databases, and the complete SVN repositories of 69 Gnome projects
    http://msr.uwaterloo.ca/msr2009/challenge/
Ultimate Debian Database
  Database with information about packages and bug reports of Debian and Ubuntu
  http://udd.debian.org/
Eclipse bug database
  Saarland University
  Datasheets, databases, and scripts with information about Eclipse bug reports for several releases
  http://www.st.cs.uni-saarland.de/softevo/bug-data/eclipse/
FLOSSMetrics
  Databases about ~5000 open source projects
    Version control repositories, mailing list archives, bug tracking databases
  MySQL dumps
    Not very user friendly
  Obtained using the Libresoft Tools
  http://www.flossmetrics.org/
FLOSSMole
  Database with information about all the SourceForge.net projects
    ~150,000 projects
  Mainly meta-information, obtained by parsing the web pages of the projects
    No low-level or fine-grained information
  http://flossmole.org
PROMISE repository
  Authors of PROMISE papers must also submit a package with the data used in the paper
  http://promisedata.org/
  101 datasets
    Defect prediction (58)
    Effort prediction (18)
    General (9)
    Model-based SE (7)
    Text mining (9)
Defect repository
  Firefox Defect Repository
    http://bugzilla.mozilla.org/
  Eclipse Defect Repository
    https://bugs.eclipse.org/bugs/