Process Mining Software Repositories
Master project kickoff presentation
Wouter Poncin, w.poncin@student.tue.nl
Agenda
• Introduction
• Existing approaches
• Project goal
• Prototype
• Design
• Current work
/ Department of Mathematics and Computer Science
Introduction
• Software development teams
• Software repositories
• Analysis
Existing approaches
• NavTracks [Sin05]
• eROSE [Zim05]
• DynaMine [Liv05]
• Marmoset [Spa05]
• projectWatcher [Gut04]
• Traceability links [Kag07]
• Improve bug finding [Wil05]
• Predict change [Yin04]
Existing approaches – multiple data sources
• Hipikat: recommends relevant software artifacts based on the current context of a developer [Čub05]
Images from: http://www.cs.ubc.ca/labs/spl/projects/hipikat/
Existing approaches – multiple data sources
• Alitheia Core: a platform for software engineering research [Gou09]
Images from: http://www.sqo-oss.org/
Existing approaches – multiple data sources
• Other approaches:
  • Wolf et al. [Wol09]: Mining task-based social networks to explore collaboration in software teams
  • Bird et al. [Bir06]: Mining email social networks
  • Robles et al. [Rob05]: Developer identification methods for integrated data from various sources
Existing approaches – problems
• Mostly single data source
• Problems with multiple data source approaches:
  • Provide an artifact-centered view (Hipikat)
  • Focus on metric calculation (Alitheia Core)
• No analysis of the global process overview
• Example analysis questions:
  • How does the real (mined) organizational model relate to the ‘used’ organizational model?
  • How to classify developers of open source projects? [Nak02]
  • Does the project follow a given development process model? (waterfall / XP / …)
Existing approaches – problems
• Mostly single data source
• No analysis of the global process overview
• Solution: process mining
Intermezzo: process mining
Image from: http://prom.win.tue.nl/research/wiki/_detail/research/processmining.gif
Intermezzo: process mining
• Input: event log
• Output: models
Example from: [Med09]
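To make the step from event log to model concrete, here is a minimal sketch of one common first step in process discovery: counting which activity directly follows which within a case. It is not the example from [Med09]; the tuple layout `(case_id, activity, timestamp)` and the activity names are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    """Count how often activity a is directly followed by activity b in a case.

    event_log is an iterable of (case_id, activity, timestamp) tuples;
    this layout is an assumption made for the sketch.
    """
    traces = defaultdict(list)
    # Order events per case by timestamp so each trace is chronological.
    for case_id, activity, _ in sorted(event_log, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)

    counts = Counter()
    for activities in traces.values():
        for a, b in zip(activities, activities[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical log with two cases.
log = [
    ("case1", "ticket opened", 1),
    ("case1", "commit", 2),
    ("case1", "ticket closed", 3),
    ("case2", "ticket opened", 1),
    ("case2", "commit", 2),
    ("case2", "commit", 3),
    ("case2", "ticket closed", 4),
]
relations = directly_follows(log)
```

The resulting counts can be read as a weighted graph over activities, which is the kind of raw material discovery algorithms build models from.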
Project goal
The goal of this project is to develop an application that facilitates process analysis of data from various software repositories in an easy manner.
• Facilitate: export data to a log
• Various repositories: combine data
• Various repositories: add new types of data later
• Easy manner: add a data source by URL
• Open source & closed source projects
Prototype
• Console application
• Input: repository URLs
• Output: MXML process log
• Analysis: ProM
• Simple developer matching
• High level events
• Case: originator
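As an illustration of the export step, the sketch below groups high level events by originator (one case per developer) and writes them as an MXML-style log. This is not the prototype's code: the event dictionary keys, the `events_to_mxml` name and the sample data are assumptions, and the element names follow the MXML layout as commonly used with ProM, so the output should be validated against the MXML schema before use.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def events_to_mxml(events, process_id="repository"):
    """Serialize events to an MXML-style string, one ProcessInstance per originator.

    Each event is a dict with 'originator', 'timestamp' (ISO 8601 string)
    and 'activity' keys; these key names are assumptions for this sketch.
    """
    by_originator = defaultdict(list)
    for event in events:
        by_originator[event["originator"]].append(event)

    root = ET.Element("WorkflowLog")
    process = ET.SubElement(root, "Process", id=process_id)
    for originator in sorted(by_originator):
        instance = ET.SubElement(process, "ProcessInstance", id=originator)
        # ISO 8601 strings in the same format sort chronologically.
        for event in sorted(by_originator[originator], key=lambda e: e["timestamp"]):
            entry = ET.SubElement(instance, "AuditTrailEntry")
            ET.SubElement(entry, "WorkflowModelElement").text = event["activity"]
            ET.SubElement(entry, "EventType").text = "complete"
            ET.SubElement(entry, "Timestamp").text = event["timestamp"]
            ET.SubElement(entry, "Originator").text = originator
    return ET.tostring(root, encoding="unicode")


# Hypothetical events from three kinds of data source.
events = [
    {"originator": "alice", "timestamp": "2009-03-02T10:00:00", "activity": "SVN revision"},
    {"originator": "bob", "timestamp": "2009-03-01T09:00:00", "activity": "Mail (devel)"},
    {"originator": "alice", "timestamp": "2009-03-01T08:30:00", "activity": "TRAC ticket"},
]
xml_text = events_to_mxml(events)
```

Choosing the originator as the case means every trace shows what one developer did over time; a different case notion (for example a ticket) would only change the grouping key.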
Prototype
• Project: Gallery (web-based photo gallery software), http://sourceforge.net/projects/gallery/
• Used data sources:
  • SVN repository (20740 revisions)
  • TRAC tickets (1028)
  • Mailing list archives: ‘devel’ (2867 messages), ‘translate’ (108 messages), ‘announce’ (69 messages)
Prototype – analysis
Prototype – analysis
Legend:
- yellow: TRAC ticket
- white: SVN revision
- red: Mail (translations)
- blue: Mail (devel)
- green: Mail (announce)
Design
• Application requirements:
  • Support multiple data sources (software repositories)
  • Caching of data from data sources
  • Define data filters
  • Developer matching
  • Define mapping from data elements to log elements
  • Easy addition of new plugins for data source types / export types
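One possible way to combine several of these requirements (pluggable data source types, caching, filters, and a mapping from data elements to log elements) is sketched below. This is not the design of the actual application: `DataSource`, `register`, `PLUGINS`, `DemoSource`, `build_events` and the example URL are all hypothetical names invented for the illustration.

```python
from abc import ABC, abstractmethod

PLUGINS = {}  # hypothetical registry: source type name -> plugin class


def register(kind):
    """Class decorator that adds a data source plugin to the registry."""
    def decorator(cls):
        PLUGINS[kind] = cls
        return cls
    return decorator


class DataSource(ABC):
    """Base class for data source plugins, with a simple in-memory cache."""

    def __init__(self, url):
        self.url = url
        self._cache = None

    @abstractmethod
    def fetch(self):
        """Return raw records (dicts) from the repository at self.url."""

    def records(self):
        # Fetch once and reuse the result on later calls.
        if self._cache is None:
            self._cache = list(self.fetch())
        return self._cache


@register("demo")
class DemoSource(DataSource):
    """Stand-in plugin that returns fixed records instead of contacting a server."""
    fetch_count = 0

    def fetch(self):
        DemoSource.fetch_count += 1
        return [
            {"author": "alice", "date": "2009-01-01", "kind": "commit"},
            {"author": "bob", "date": "2009-01-02", "kind": "commit"},
        ]


def build_events(source, keep, mapping):
    """Apply a data filter, then map each kept record to a log element."""
    return [mapping(record) for record in source.records() if keep(record)]


source = PLUGINS["demo"]("http://example.org/repo")
events = build_events(
    source,
    keep=lambda r: r["author"] != "bob",
    mapping=lambda r: {"originator": r["author"], "timestamp": r["date"], "activity": r["kind"]},
)
```

Adding support for a new repository type would then mean writing one subclass with a `fetch` method and registering it under a new name.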
Design
• Issues
  • How to define a case
  • Level of granularity of events
  • How to define developer matching (manual/automatic)
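For the automatic side of developer matching, a minimal heuristic sketch is given below: identities from different sources are grouped when they share a normalized e-mail address, e-mail local part, or name. The heuristic, the function names and the sample identities are assumptions for illustration; it can produce false matches, which is one reason a manual override may be needed.

```python
import re


def identity_keys(name, email):
    """Derive comparison keys from one (name, email) identity; either may be None."""
    keys = set()
    if email:
        keys.add(email.strip().lower())
        keys.add(email.split("@")[0].strip().lower())
    if name:
        # Keep only letters so "J. Doe" and "jdoe" compare equal.
        keys.add(re.sub(r"[^a-z]", "", name.lower()))
    keys.discard("")
    return keys


def match_developers(identities):
    """Group identities that share at least one key, directly or transitively."""
    groups = []  # list of (keys, members) with pairwise disjoint key sets
    for identity in identities:
        keys = identity_keys(*identity)
        merged_keys, merged_members = set(keys), [identity]
        remaining = []
        for group_keys, group_members in groups:
            if group_keys & keys:
                merged_keys |= group_keys
                merged_members = group_members + merged_members
            else:
                remaining.append((group_keys, group_members))
        remaining.append((merged_keys, merged_members))
        groups = remaining
    return [members for _, members in groups]


# Hypothetical identities: a mailing list sender, an SVN user name, and so on.
identities = [
    ("Jane Doe", "jdoe@example.org"),
    ("jdoe", None),
    ("J. Doe", "jane@example.com"),
    ("Bob Smith", "bob@example.org"),
]
developers = match_developers(identities)
```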
Design
• Data sources to support:
  • Subversion
  • CVS
  • Git (used by jQuery and MooTools, for example)
  • Bugzilla
  • TRAC
  • Wiki articles (+ history)
  • SourceForge mailing lists
  • SourceForge thumbs up/down
  • Twitter
Design
• Analysis tools:
  • ProM: www.processmining.org (open source)
  • Futura Reflect: www.futuratech.nl
  • Interstage Business Process Manager
  • Fluxicon: www.fluxicon.com
  • And others…
Current work
• Finish application development
  • Developer matching
  • Case definition
  • Internal cache
  • Implement data source plugins
• Analyze projects
  • (Large) open source projects
    • For example Firefox, WordPress, FileZilla
  • SEP / student projects
Questions?
References
[Bir06] Bird, C., Gourley, A., Devanbu, P., Gertz, M., Swaminathan, A. Mining email social networks. In MSR '06: Proceedings of the 2006 international workshop on Mining software repositories, pages 137–143, New York, NY, USA, (2006). ACM.
[Čub05] Čubranić, D., Murphy, G.C., Singer, J., Booth, K.S. Hipikat: A project memory for software development. IEEE Trans. Softw. Eng., 31(6):446–465, (2005).
[Gou09] Gousios, G., Spinellis, D. Alitheia core: An extensible software quality monitoring platform. Software Engineering, International Conference on, pages 579–582, (2009).
[Gut04] Gutwin, C., Penner, R., Schneider, K. Group awareness in distributed software development. In CSCW '04: Proceedings of the 2004 ACM conference on Computer supported cooperative work, pages 72–81, New York, NY, USA, (2004).
[Kag07] Kagdi, H., Maletic, J.I., Sharif, B. Mining software repositories for traceability links. In ICPC '07: Proceedings of the 15th IEEE International Conference on Program Comprehension, pages 145–154, Washington, DC, USA, (2007). IEEE Computer Society.
[Liv05] Livshits, B., Zimmermann, T. DynaMine: finding common error patterns by mining software revision histories. In ESEC/FSE-13: Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering, pages 296–305, New York, NY, USA, (2005). ACM.
[Med09] Medeiros, A.K.A. de, Aalst, W.M.P. van der. Process mining towards semantics. pages 35–80, (2009).
[Moc00] Mockus, A., Fielding, R.T., Herbsleb, J. A case study of open source software development: the Apache server. In ICSE '00: Proceedings of the 22nd international conference on Software engineering, pages 263–272, New York, NY, USA, (2000). ACM.
[Nak02] Nakakoji, K., Yamamoto, Y., Nishinaka, Y., Kishida, K., Ye, Y. Evolution patterns of open-source software systems and communities. In IWPSE '02: Proceedings of the International Workshop on Principles of Software Evolution, pages 76–85, New York, NY, USA, (2002). ACM.
[Rob05] Robles, G., Gonzalez-Barahona, J.M. Developer identification methods for integrated data from various sources. In MSR '05: Proceedings of the 2005 international workshop on Mining software repositories, pages 1–5, New York, NY, USA, (2005). ACM.
[Sin05] Singer, J., Elves, R., Storey, M. NavTracks: Supporting navigation in software maintenance. In ICSM '05: Proceedings of the 21st IEEE International Conference on Software Maintenance, pages 325–334, Washington, DC, USA, (2005). IEEE Computer Society.
[Spa05] Spacco, J., Strecker, J., Hovemeyer, D., Pugh, W. Software repository mining with Marmoset: an automated programming project snapshot and testing system. SIGSOFT Softw. Eng. Notes, 30(4):1–5, (2005).
[Wil05] Williams, C.C., Hollingsworth, J.K. Automatic mining of source code repositories to improve bug finding techniques. IEEE Trans. Softw. Eng., 31(6):466–480, (2005).
[Wol09] Wolf, T., Schröter, A., Damian, D., Panjer, L.D., Nguyen, T.H.D. Mining task-based social networks to explore collaboration in software teams. IEEE Softw., 26(1):58–66, (2009).
[Yin04] Ying, A.T.T., Murphy, G.C., Ng, R., Chu-Carroll, M.C. Predicting source code changes by mining change history. IEEE Trans. Softw. Eng., 30(9), (2004).
[Zim05] Zimmermann, T., Dallmeier, V., Halachev, K., Zeller, A. eROSE: guiding programmers in Eclipse. In OOPSLA '05: Companion to the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, pages 186–187, New York, NY, USA, (2005). ACM.