This workshop introduces the concept of Mega Software Engineering (MSE) and the EASE project, focusing on the use of empirical data in software engineering. It covers the key technologies, the three major phases of empirical SE (collection, analysis, improvement), and the characteristics of MSE. The EASE project, funded by MEXT Japan, aims to apply the MSE concept to real projects for the benefit of organizations. The Empirical Project Monitor (EPM) is a partial implementation of the Empirical Environment that supports data collection, measurement, and project control. The workshop explores data analysis, improvement policies, and the architecture of EPM, emphasizing its use of open-source tools and a standardized data format. EPM applications include sharing project status, reducing risk, and managing projects efficiently across various project scales.
International Workshop on Community-Driven Evolution of Knowledge Artifacts (UC Irvine, Dec. 16-18, 2003) • Mega Software Engineering and EASE Project • Katsuro Inoue, Osaka University
Overview • Proposed the concept of Mega Software Engineering (MSE), which shares experience and knowledge within a community • Introduced the EASE project, based on the MSE concept • Presented an overview of the Empirical Environment and showed the current implementation of the Empirical Project Monitor (EPM) as a partial realization of it • Outlined ongoing directions toward deeper analysis of empirical data
Empirical Software Engineering • Various technologies in Software Engineering based on empirical data • Essential for scientific improvement of project processes and products
3 Major Phases in Empirical SE • Collection • Analysis • Improvement
Classification of ESE Technologies by Target Scale [Figure: empirical SE technologies arranged by target scale, with Mega Software Engineering at the largest scale]
Mega Software Engineering (MSE) • Targets many projects • A new concept, not a new technology in itself • A collection of existing and emerging key technologies • Distributed environments and data sharing • Analysis and data mining • Project monitoring and control • Scalable computing • ... • Exploits advances in hardware performance, e.g., network bandwidth, CPU clock, memory space, disk capacity, ... • Software engineering should also benefit from hardware advances, which are currently used mainly for multimedia, grid computing, simulators, ...
Characteristics of MSE • Experience and knowledge of individual developers and projects are collected, refined into assets, and reused within the community • A single-level, flat, static community for information sharing • Automatic process: little burden on each developer or manager • Benefits are obtained directly at the organizational level (rather than from an individual developer's or a single project's view) • Open source development is a simple case of MSE (MSE additionally focuses on analysis and feedback)
EASE Project • Empirical Approach to Software Engineering • Uses the concept of MSE as its basis • Funded by MEXT (Japanese government, Ministry of Education, Culture, Sports, Science and Technology) • 5-year project starting in 2003 • Senri Lab.
Project Target • Empirical Environment: an empirical software development environment scaling from 1 to thousands of projects
Project Objectives • Development of the empirical environment • Application of the empirical environment to real projects • Collection of data and expertise in empirical SE • Organizational benefits from applying the empirical environment
Concept of Empirical Environment [Figure: a collection, analysis, and improvement cycle linking the software development organization, related organizations, and public-domain software and open source projects on the Internet]
(1) Policy for Collection • Goal first (ideal case) → Data collection first (realistic approach) • Collect mainly product data (derive process data from product data) • Minimize developers' overhead for collection • Raw data without human tampering • Real-time collection • Applicable to various projects • Small scale • Non-waterfall processes such as XP • Distributed development, including subcontracting
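One way to realize "derive process data from product data" is to aggregate versioning-system commit records into per-month process measures. The sketch below assumes a hypothetical `CheckIn` record layout (time, author, path, lines added/removed); it is an illustration, not EPM's actual data model.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class CheckIn(NamedTuple):
    """One versioning-system commit record (hypothetical field layout)."""
    when: datetime
    author: str
    path: str
    added: int
    removed: int


def monthly_process_data(checkins):
    """Derive per-month process data (check-in count, active developers)
    purely from versioning history, with no extra input from developers."""
    counts = defaultdict(int)
    authors = defaultdict(set)
    for c in checkins:
        month = c.when.strftime("%Y-%m")
        counts[month] += 1
        authors[month].add(c.author)
    return {m: (counts[m], len(authors[m])) for m in sorted(counts)}


history = [
    CheckIn(datetime(2003, 10, 2), "alice", "src/a.c", 120, 0),
    CheckIn(datetime(2003, 10, 9), "bob", "src/b.c", 40, 5),
    CheckIn(datetime(2003, 11, 1), "alice", "src/a.c", 10, 30),
]
print(monthly_process_data(history))  # {'2003-10': (2, 2), '2003-11': (1, 1)}
```

Because the input is only the commit log, developers do no extra reporting work, which matches the "minimize developers' overhead" policy.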
(2) Policy for Analysis • Step-wise implementation, from simple to difficult: 1. Process/product metrics inside a single project 2. Inter-project metrics 3. Classification and evolution 4. Reuse of components/expertise 5. …
(3) Policy for Improvement • A feedback method for each objective • Various mechanisms for various cases • Currently constructing a browser for visualizing collected data and measured metrics
Empirical Project Monitor (EPM) • A partial implementation of the Empirical Environment • Collects, measures, and shows various data for project control • Data sources • Versioning system: CVS • Mailing list manager: Mailman • Issue tracking tool: GNATS
Architecture of EPM [Figure: versioning history, mail history, and problem history from CVS, Mailman, and GNATS (also WinCVS, CorporateSource), together with other tool data, are converted into standardized empirical SE data (in XML) and stored in a PostgreSQL repository; analysis tools measure intra- and inter-project metrics, predictions/schedules, etc., and present them to developers and managers]
Characteristics of EPM • Uses open source development tools → easy to introduce • Small overhead of data collection • Most data comes from versioning history • Communication via e-mail, and issues recorded in the tracking tool • Other data formats are easily transformed into the standardized empirical SE data format
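A minimal sketch of a format translator that turns mailing-list records into XML with Python's standard `xml.etree.ElementTree`. The element and attribute names (`empirical-data`, `mail`, `sender`, `subject`) are hypothetical; the slides do not show the actual standardized empirical SE data schema.

```python
import xml.etree.ElementTree as ET


def mail_to_xml(messages):
    """Translate mailing-list records into a hypothetical standardized XML layout.

    Each message is a dict with 'date', 'sender', and 'subject' keys; the
    element and attribute names are illustrative, not the real EPM schema.
    """
    root = ET.Element("empirical-data", {"source": "mailman", "kind": "mail"})
    for msg in messages:
        item = ET.SubElement(root, "mail", {"date": msg["date"]})
        ET.SubElement(item, "sender").text = msg["sender"]
        ET.SubElement(item, "subject").text = msg["subject"]
    return ET.tostring(root, encoding="unicode")


xml_text = mail_to_xml([
    {"date": "2003-11-04", "sender": "alice", "subject": "Build fails on HEAD"},
])
print(xml_text)
```

A translator per tool (versioning, mail, issue tracking) emitting one common layout is what lets later analysis ignore which tool the data came from.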
Application Area of EPM • Large projects • Share project status immediately • Reduce project management load • Reduce the risk of data tampering • Small projects • Apply at low cost • Apply to various projects, including XP and distributed development
EPM Analysis Tool • Single-activity view • Source code size • Issue resolution time • Cumulative number of issues, number of unsolved problems, ... • Multiple-activity view • Check-ins and check-outs • #issues and #mails • Check-ins and #issues
Growth of LOC • Progress monitoring • Schedule vs. actual [Chart, Project: EmpiriPrj: cumulative LOC by month]
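The cumulative-LOC curve and the schedule-versus-actual comparison reduce to a running sum and a per-month difference. A sketch, assuming per-month net line changes (added minus removed) have already been extracted from the versioning history; all figures below are made up for illustration.

```python
from itertools import accumulate


def cumulative_loc(monthly_net_lines):
    """Turn per-month net line changes (added - removed) into cumulative LOC."""
    months = sorted(monthly_net_lines)
    totals = accumulate(monthly_net_lines[m] for m in months)
    return dict(zip(months, totals))


def schedule_gap(actual, planned):
    """Actual minus planned cumulative LOC for each month present in both."""
    return {m: actual[m] - planned[m] for m in sorted(actual) if m in planned}


net = {"2003-09": 1500, "2003-10": 2200, "2003-11": -300}   # hypothetical data
plan = {"2003-09": 1000, "2003-10": 3000, "2003-11": 4500}  # hypothetical plan
actual = cumulative_loc(net)
print(actual)                      # {'2003-09': 1500, '2003-10': 3700, '2003-11': 3400}
print(schedule_gap(actual, plan))  # {'2003-09': 500, '2003-10': 700, '2003-11': -1100}
```

A growing negative gap, as in the last month here, is the kind of signal a manager would read off the schedule-vs-actual chart.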
Growth of LOC (3 months) [Chart, Project: EmpiriPrj: LOC by month, with check-in occurrences marked]
Growth of LOC: open source project nkf (character-code converter) [Chart: LOC by month, with check-in occurrences marked]
Cumulative Issues / Unsolved Issues / Mean Resolution Time [Chart, Project: EmpiriPrj: cumulative issues, unsolved issues, and mean resolution days by month]
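The three series on this chart can be computed from issue open and close dates. A sketch, assuming each issue is a hypothetical `(opened, closed)` pair of dates with `closed` set to `None` while unsolved, and that resolution time is counted in whole days.

```python
from datetime import date
from statistics import mean


def issue_snapshot(issues, as_of):
    """Issue metrics as of a given date.

    issues: list of (opened, closed) date pairs; closed is None while unsolved.
    Returns (cumulative issues, unsolved issues, mean resolution days or None).
    """
    opened = [(o, c) for o, c in issues if o <= as_of]
    unsolved = sum(1 for _, c in opened if c is None or c > as_of)
    durations = [(c - o).days for o, c in opened if c is not None and c <= as_of]
    return len(opened), unsolved, (mean(durations) if durations else None)


issues = [
    (date(2003, 9, 3), date(2003, 9, 13)),
    (date(2003, 9, 20), date(2003, 10, 10)),
    (date(2003, 10, 5), None),
]
for month_end in (date(2003, 9, 30), date(2003, 10, 31)):
    print(month_end, issue_snapshot(issues, month_end))
# 2003-09-30 (2, 1, 10)
# 2003-10-31 (3, 1, 15)
```

Evaluating the snapshot at each month end yields one point per month for each of the three curves.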
Check-in and Check-out [Chart, Project: EmpiriPrj: number of check-outs by month, with check-in occurrences marked]
Growth of Mail and Issues [Chart, Project: EmpiriPrj: cumulative number of mails by month, with check-ins, raised issues, and resolved issues marked]
Cumulative Issues and Check-in [Chart, Project: EmpiriPrj: cumulative issues by month, with check-in occurrences marked]
Extending Analysis Features • Perform deeper analysis and extract organizational expertise • Find and reuse expertise easily
EPM (under development) [Figure: versioning (CVS), mailing (Mailman), issue tracking (GNATS), and other tool data from projects x, y, z, ... pass through format translators into a product data archive (CVS format) and a process data archive (XML format); analysis functions (code clone detection, component search, metrics measurement, project categorization, collaborative filtering) serve developers and managers via Corporate Source and a GUI]
Example Scenario (1) • Step 1: compare the scheduled and actual progress of project X • Step 2: find projects similar to X (project categorization, collaborative filtering) [Figure: project X placed among other projects A, C, E, P, Q, T, V, W, Y]
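Step 2 could be approximated by a nearest-neighbour search over project metric vectors. This sketch uses min-max normalization and Euclidean distance; the metrics chosen (KLOC, developers, issues per month) and their values are hypothetical, and the slides do not say which similarity measure EPM uses.

```python
import math


def most_similar(target, projects, k=2):
    """Return the k projects whose metric vectors are closest to the target.

    Each metric is min-max normalized over all projects (target included) so
    that large-valued metrics such as KLOC do not dominate the distance.
    """
    names = list(projects)
    vectors = [projects[n] for n in names] + [target]
    dims = len(target)
    lo = [min(v[i] for v in vectors) for i in range(dims)]
    hi = [max(v[i] for v in vectors) for i in range(dims)]

    def norm(v):
        return [(v[i] - lo[i]) / (hi[i] - lo[i]) if hi[i] > lo[i] else 0.0
                for i in range(dims)]

    t = norm(target)
    ranked = sorted(names, key=lambda n: math.dist(norm(projects[n]), t))
    return ranked[:k]


# Hypothetical metrics per project: (KLOC, developers, issues per month)
projects = {"A": (120, 12, 40), "C": (15, 3, 6), "E": (100, 10, 35), "W": (20, 4, 5)}
print(most_similar((110, 11, 38), projects))  # ['A', 'E']
```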
Example Scenario (2) • Step 3: compare project X's reuse rate with the average reuse rate in similar projects (code-clone detection) • Step 4: promote the use of the software asset search engine to project X
Expected Effect • Productivity can be drastically improved by reusing organizational assets • Assets can be managed easily • Costs can be controlled precisely relative to previous similar projects • Reliability can be improved using issue history
Analysis Technology (1) Fast Code Clone Detection • Code clones = similar portions of a program
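The slide does not describe the detection algorithm, so the following is only a naive illustration of the token-normalization idea used by token-based clone detectors: identifiers and numbers are replaced by placeholders, and identical windows of consecutive normalized lines are reported. It is not the fast detector referred to on the slide, and the keyword list is a hypothetical stand-in for a real lexer.

```python
import re
from collections import defaultdict

# Identifiers, integer literals, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Hypothetical, deliberately tiny keyword list for a C-like language.
KEYWORDS = {"if", "else", "for", "while", "return", "int"}


def normalize(line):
    """Tokenize one line, replacing identifiers and numbers with placeholders
    so that copies differing only in names or constants still match."""
    out = []
    for tok in TOKEN.findall(line):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok[0].isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return tuple(out)


def clone_pairs(lines, window=3):
    """Return (first, other) pairs of 1-based start lines whose next `window`
    normalized lines are identical."""
    norm = [normalize(l) for l in lines]
    starts = defaultdict(list)
    for i in range(len(norm) - window + 1):
        chunk = tuple(norm[i:i + window])
        if all(chunk):  # ignore windows that contain blank lines
            starts[chunk].append(i + 1)
    return [(locs[0], other) for locs in starts.values() for other in locs[1:]]


code = [
    "int total = 0;",
    "for (i = 0; i < n; i++)",
    "    total += a[i];",
    "print(total);",
    "int sum = 0;",
    "for (j = 0; j < m; j++)",
    "    sum += b[j];",
]
print(clone_pairs(code))  # [(1, 5)]
```

Lines 1-3 and 5-7 differ only in variable names, so normalization makes them match.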
Analysis Technology (2) System Similarity Using Code-Clone Detection
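One plausible way to turn clone detection into a similarity score between two systems is the share of all lines covered by code fragments found in both. The sketch assumes that measure and uses exact matches of `w` consecutive whitespace-stripped lines as a simplified stand-in for real clone detection; the slide does not give the formula.

```python
def line_windows(lines, w):
    """Map each w-line window (whitespace-stripped) to the line indexes it covers."""
    result = {}
    for i in range(len(lines) - w + 1):
        key = tuple(l.strip() for l in lines[i:i + w])
        result.setdefault(key, set()).update(range(i, i + w))
    return result


def system_similarity(a, b, w=2):
    """Fraction of all lines in systems A and B covered by windows found in both."""
    wa, wb = line_windows(a, w), line_windows(b, w)
    shared = wa.keys() & wb.keys()
    covered_a = set().union(*(wa[k] for k in shared))
    covered_b = set().union(*(wb[k] for k in shared))
    total = len(a) + len(b)
    return (len(covered_a) + len(covered_b)) / total if total else 0.0


a = ["x = 1", "y = 2", "z = x + y", "print(z)"]
b = ["x = 1", "y = 2", "z = x + y", "log(z)", "exit()"]
print(system_similarity(a, b))  # 6 of 9 lines are shared: 0.666...
```

A score of 1.0 would mean every line of both systems lies in a shared fragment; 0.0 means no shared fragments at all.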
Analysis Technology (3) Collaborative Filtering [Table, flattened in extraction; header text as extracted: Representative, Collaborative, Outcome, Adopted, Focused, Q & M, Resources] • App. A: 9, 9, 9, 7, 7.5 (target) • App. B: 8, 7, 8, ? (missing), 8 • App. C: ? (missing), 8, 8, 8, 7 • App. D: 7, 6, ? (missing), 9, 6
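A minimal user-based collaborative filtering sketch for estimating a missing ("?") value: rows are compared by cosine similarity over the columns both have, and the estimate is the similarity-weighted mean of the other rows' values in the missing column. The ratings below are hypothetical, not the values on this slide, and the slide does not say which similarity or weighting scheme was used.

```python
import math


def cosine(u, v):
    """Cosine similarity over columns where both rows have values (None = missing)."""
    pairs = [(x, y) for x, y in zip(u, v) if x is not None and y is not None]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    nu = math.sqrt(sum(x * x for x, _ in pairs))
    nv = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (nu * nv) if nu and nv else 0.0


def predict(rows, target, col):
    """Estimate rows[target][col] as the similarity-weighted mean of the values
    that the other rows have in that column."""
    num = den = 0.0
    for name, row in rows.items():
        if name == target or row[col] is None:
            continue
        sim = cosine(rows[target], row)
        num += sim * row[col]
        den += sim
    return num / den if den else None


# Hypothetical ratings: rows are applications, columns are evaluation items.
rows = {
    "T": [9, 9, None, 7],   # target row with one missing value
    "B": [8, 7, 6, 8],
    "C": [None, 8, 8, 7],
}
print(predict(rows, "T", 2))
```

Row C agrees slightly better with T on their shared columns, so the estimate lands just above the plain average of 7.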
Analysis Technology (4) Java Class Search Engine SPARS-J
Markov Model [Figure: example component graph annotated with node weights] • The component rank model can be considered a Markov chain of the user's focus • The user's focus moves from one component to another along a use relation at fixed time intervals • A node's weight represents the probability that the user's focus is on that node in the limit of infinite time
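The limiting probabilities can be approximated by power iteration over the use-relation graph. This sketch adds a PageRank-style damping factor (a random jump) and spreads the focus of components that use nothing evenly over all components; both are assumptions made here to guarantee convergence, not details stated on the slide. The component names are hypothetical.

```python
def component_rank(uses, damping=0.85, iterations=100):
    """Approximate the stationary distribution of the user's focus over a
    use-relation graph, where uses[a] lists the components that a uses.

    With probability `damping` the focus follows a use relation; otherwise it
    jumps to a random component (an assumed PageRank-style damping term).
    """
    nodes = sorted(set(uses) | {v for vs in uses.values() for v in vs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = uses.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:  # component uses nothing: spread its focus evenly
                for t in nodes:
                    nxt[t] += damping * rank[v] / n
        rank = nxt
    return rank


uses = {"App": ["List", "Util"], "List": ["Util"], "Util": []}
ranks = component_rank(uses)
print({name: round(r, 3) for name, r in ranks.items()})
```

The heavily used `Util` component ends up with the largest weight, which is the intuition behind ranking frequently used classes higher in search results.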
Demo of SPARS-J http://demo.spars.info
Current Status and Schedule • Current: demo version of EPM • First quarter of 2004: release of EPM • First quarter of 2005: application of EPM in industry • End of 2005: inclusion of analysis tools • User group, consortium, interest group, ...
Summary • Proposed the concept of Mega Software Engineering (MSE), which shares experience and knowledge within a community • Introduced the EASE project, based on the MSE concept • Presented an overview of the Empirical Environment and showed the current implementation of the Empirical Project Monitor (EPM) as a partial realization of it • Outlined ongoing directions toward deeper analysis of empirical data