Domain-Based Phase Effort Distribution Analysis
Annual Research Review
Thomas Tan, March 2012
Table of Contents • Introduction • Research Approach • Current Results • Summary and Next Steps 2
Introduction • This research aims to find an alternative phase effort distribution guideline for the current COCOMO II model. • The current COCOMO II model uses a one-size-fits-all guideline, which may lead to unrealistic schedules. • A study of mainstream cost estimation models suggests that domain information is a good candidate for defining such a guideline: • It is available early. • It is easy for all stakeholders to identify. • To use domain information to define an alternative guideline, we must prove the following: • Projects from different domains have different phase effort distribution patterns. • Projects from different domains do not always follow the COCOMO II model’s effort distribution guideline. 3
Table of Contents • Introduction • Research Approach • Current Results • Summary and Next Steps 4
Research Approach • Establish domain breakdown definitions. • Two possible breakdowns: • Conventional Application Domains based on application types, such as communication, simulation, sensor control and processing, etc. • An innovative grouping of traditional domains by productivity rates, resulting in Productivity Types. • The following slide provides a mapping between the two breakdowns. • Select and normalize the subject data set. • Analyze effort distribution patterns and prove that differences between domains exist. • Calculate effort distribution percentages and find distribution patterns. • Use ANOVA and t-tests for proof. • Study personnel ratings and system size to observe additional effects. • This analysis is performed on the data set categorized by either domain breakdown, namely Application Domains and Productivity Types. 5
Domain Breakdowns • Mapping between Application Domains and Productivity Types 6
Select and Normalize Data • Project data (about 530 records) was extracted from standard DoD Software Resources Data Reports (SRDR). • Normalization includes: • Evaluating and eliminating records with missing important fields or anomalous patterns. • Backfilling records that are missing a limited amount of phase effort data. • Calculating system size, personnel ratings, and other necessary parameters. • Data processing results: • After evaluation and elimination, 345 records remain. • Within these 345 records: • 135 are “perfect” records (all effort fields contain original values). • 257 are “missing 2” records, for which we backfilled 2 of the 5 phase effort fields. 7
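A minimal sketch of the screening and backfill step, assuming pandas and hypothetical column names (effort_req, effort_total, etc.); the real SRDR field names and the study's actual backfill rule may differ.

```python
import pandas as pd

# Hypothetical names for the 5 phase-effort fields; real SRDR fields differ.
PHASES = ["req", "arch", "code", "test", "transition"]
EFFORT_COLS = [f"effort_{p}" for p in PHASES]

def screen_and_backfill(df: pd.DataFrame, max_missing: int = 2) -> pd.DataFrame:
    """Drop records missing too many phase-effort fields; backfill the rest."""
    # Keep only records with at most `max_missing` empty phase-effort fields.
    kept = df[df[EFFORT_COLS].isna().sum(axis=1) <= max_missing].copy()
    # Backfill assumption: spread the unreported remainder of total effort
    # evenly across the missing phases (the study's actual rule may differ).
    remainder = (kept["effort_total"] - kept[EFFORT_COLS].sum(axis=1)).clip(lower=0)
    n_missing = kept[EFFORT_COLS].isna().sum(axis=1).where(lambda s: s > 0)
    fill = remainder / n_missing  # NaN where nothing is missing
    for col in EFFORT_COLS:
        kept[col] = kept[col].fillna(fill)
    return kept
```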
Calculate Average Percentages • Calculate effort percentages for each record. • Calculate average percentages for each domain (a sketch of both steps follows below). 8
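Continuing the hypothetical frame from the sketch above, the two steps on this slide reduce to a division and a groupby (`domain` is an assumed label column):

```python
# Per-record phase percentages: each record's phase effort divided by its
# own total, so every row sums to 100%.
pct = kept[EFFORT_COLS].div(kept[EFFORT_COLS].sum(axis=1), axis=0) * 100
pct["domain"] = kept["domain"]  # hypothetical domain-label column

# Per-domain averages of those percentages.
domain_avg = pct.groupby("domain")[EFFORT_COLS].mean()
print(domain_avg.round(1))
```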
ANOVA Test • Test 1: show whether there is a difference between application domains in terms of effort distribution percentages. • Test 1 uses simple one-way ANOVA to test the following hypotheses: • H0: effort distributions are the same across domains. • Ha: effort distributions are not all the same across domains. • The test input is the list of effort percentages grouped by application domain. • The test uses a 90% confidence level to determine the significance of the results (a sketch follows below). 9
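A minimal sketch of Test 1 with scipy, reusing the hypothetical `pct` frame from the earlier sketches; the slides do not name a particular ANOVA implementation.

```python
from scipy import stats

ALPHA = 0.10  # 90% confidence level, as on the slide

# One-way ANOVA per activity group: are the mean percentages equal
# across application domains?
for col in EFFORT_COLS:
    groups = [g[col].dropna() for _, g in pct.groupby("domain")]
    f_stat, p_value = stats.f_oneway(*groups)
    verdict = "Reject H0" if p_value < ALPHA else "Fail to reject H0"
    print(f"{col}: F = {f_stat:.2f}, p = {p_value:.3f} -> {verdict}")
```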
T-Test • Test 2: show whether there is a difference between the application domain averages and the COCOMO II effort distribution averages. • Test 2 uses a one-sample t-test to test the following hypotheses: • H0: the domain average is the same as the COCOMO average. • Ha: the domain average is not the same as the COCOMO average. • Tests run for every domain on every activity group. • Use the following formula to calculate the t value and determine the result of the t-test: t = (x̄ − µ0) / (s / √n) • where x̄ is the sample mean, s is the standard deviation, n is the sample size, and µ0 is the COCOMO average we compare against. • This test also uses a 90% confidence level to determine the significance of the results (a sketch follows below). 10
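A sketch of Test 2 for a single domain and activity group, with illustrative numbers; scipy's one-sample t-test computes the same t statistic as the formula above. The division by 1.07 anticipates the normalization note on the results slides.

```python
from scipy import stats

ALPHA = 0.10
cocomo_avg = 18.0        # illustrative COCOMO II average percentage
mu0 = cocomo_avg / 1.07  # rescaled so the COCOMO averages sum to 100%

# Hypothetical selection: one domain's percentages for one activity group.
sample = pct.loc[pct["domain"] == "sensor control", "effort_code"].dropna()

t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)
print("Reject H0" if p_value < ALPHA else "Fail to reject H0")
```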
Study on Personnel Ratings and Size • Goal: find out whether changes in personnel ratings and/or size generate changes in phase effort distribution patterns. • Procedure: • Each record is supplied with personnel ratings and size. • Note: a combined personnel rating is computed for each record from its reported personnel factors. • Plot these values against the effort percentages for each activity group, for each domain/productivity type (a sketch follows below). • Observe trends from the plots, using statistical analysis where necessary. • The results of the study may indicate that we can use a more precise distribution pattern when size or personnel rating is given. 11
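A sketch of the plotting step with matplotlib, again using the hypothetical columns from the earlier sketches (`size_ksloc` is an assumed size column):

```python
import matplotlib.pyplot as plt

# One scatter panel per activity group: size vs. effort percentage for a
# single domain, with a log x-axis because sizes span orders of magnitude.
sub = pct[pct["domain"] == "sensor control"].join(kept["size_ksloc"])
fig, axes = plt.subplots(1, len(EFFORT_COLS), figsize=(15, 3), sharey=True)
for ax, col in zip(axes, EFFORT_COLS):
    ax.scatter(sub["size_ksloc"], sub[col], s=12)
    ax.set_xscale("log")
    ax.set_title(col)
    ax.set_xlabel("size (KSLOC)")
axes[0].set_ylabel("effort (%)")
plt.tight_layout()
plt.show()
```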
Table of Contents • Introduction • Research Approach • Current Results • Summary and Next Steps 12
Data Processing Results • Records by application domains: 13
Application Domains ANOVA Results ANOVA Test: “Reject” means that, for the given activity group (development phase), there are significant differences between the domains analyzed. 16
Application Domains T-Test Results T-Test: “Reject” means that the given domain has significantly different average percentages from the COCOMO II model. NOTE: Because the COCOMO II averages sum to 107%, we divided each of them by 1.07 so that all measurements are at the same level for comparison purposes. 17
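To make the rescaling concrete (with one illustrative number): a COCOMO II average of 7% becomes 7 / 1.07 ≈ 6.5% after the division, and the rescaled averages sum to exactly 100%, matching the per-record percentages they are compared against.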
Data Processing Results • Records by productivity types: 18
Productivity Types ANOVA Results ANOVA Test: “Reject” means that, for the given activity group (development phase), there are significant differences between the productivity types analyzed. 21
Productivity Types T-Test Results T-Test: “Reject” means that the given productivity type has significantly different average percentages from the COCOMO II model. NOTE: Because the COCOMO II averages sum to 107%, we divided each of them by 1.07 so that all measurements are at the same level for comparison purposes. 22
Results on Personnel Ratings and Size • For both Application Domains and Productivity Types: • Personnel ratings and system size are ineffective at indicating changes in effort distribution patterns in most cases. • A few cases showed distinguishable trends, but these later proved statistically insignificant. • Grouping records by size is an alternative way to analyze size effects; however, it is extremely difficult to apply across all domains or productivity types: • There is no appropriate way to divide the size groups that fits all domains or productivity types. • Conclusion: • Personnel ratings can be dropped from the effort distribution pattern analysis. • We can spend a little more time experimenting with size groups (a sketch follows below), but initial results favor dropping size as well. 23
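For the size-group experiment mentioned above, a minimal sketch of one binning scheme (the edges are illustrative; finding edges that fit all domains is exactly the difficulty the slide describes):

```python
import pandas as pd

# Bin records by size before re-running the distribution analysis.
bins = [0, 10, 50, 200, float("inf")]            # KSLOC; illustrative edges
labels = ["small", "medium", "large", "very large"]
kept["size_group"] = pd.cut(kept["size_ksloc"], bins=bins, labels=labels)

# Check how thinly the bins slice each domain; sparse cells are what makes
# a single scheme hard to apply across all domains or productivity types.
print(kept.groupby(["domain", "size_group"], observed=True).size())
```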
Table of Contents • Introduction • Research Approach • Current Results • Summary and Next Steps 24
Summary and Next Steps • Established the research goal and plan. • Defined the domain breakdowns. • Normalized the subject data collection: prepared all the data sets needed for analysis. • Finished analyzing the effort distribution patterns for both Application Domains and Productivity Types: • Both breakdowns show proven differences in effort distribution patterns between domains. • Both show proven differences against the COCOMO II model’s average percentages. • Neither shows significant trends when personnel ratings or size are added. • The next major step is to determine which breakdown better provides an alternative effort distribution guideline for the COCOMO II model. 25
For more information, contact: Thomas Tan thomast@usc.edu 626-617-1128 Questions? 26
References (1/2) • Blom, G. Statistical Estimates and Transformed Beta Variables. John Wiley and Sons. New York. 1958. • Boehm, B., et al. Software Cost Estimation with COCOMO II. Prentice Hall, NY. 2000. • Boehm, B. Software Engineering Economics. Prentice Hall, New Jersey. 1981. • Borysowich, C. “Observations from a Tech Architect: Enterprise Implementation Issues & Solutions – Effort Distribution Across the Software Lifecycle”. Enterprise Architecture and EAI Blog. http://it.toolbox.com/blogs/enterprise-solutions/effort-distribution-across-the-software-lifecycle-6304. October 2005. • Defense Cost and Resource Center. “The DoD Software Resource Data Report – An Update.” Practical Software Measurement (PSM) Users’ Group Conference Proceedings. July 2005. • Department of Defense Handbook. “Work Breakdown Structure for Defense Material Items: MIL-HDBK-881A.” July 30, 2005. • Digital Equipment. VAX PWS Software Source Book. Digital Equipment Corp., Maynard, Mass., 1991. • Heijstek, W., Chaudron, M.R.V. “Evaluating RUP Software Development Process Through Visualization of Effort Distribution”. EUROMICRO Conference Software Engineering and Advanced Application Proceedings. 2008. Page 266. • IBM Corporation. Industry Applications and Abstracts. IBM. White Plains, N.Y., 1988. • Kruchten, P. The Rational Unified Process: An Introduction. Addison-Wesley Longman Publishing Co., Inc. Boston. 2003. • Kultur, Y., Kocaguneli, E., Bener, A.B. “Domain Specific Phase By Phase Effort Estimation in Software Projects”. International Symposium on Computer and Information Sciences. September 2009. Page 498. • McConnell, S. Software Estimation: Demystifying the Black Art. Microsoft Press. 2006. Page 62. • Milicic, D., Wohlin, C. “Distribution Patterns of Effort Estimations”. EUROMICRO Conference Proceedings. September 2004. Page 422. • Norden, P.V. “Curve Fitting for a Model of Applied Research and Development Scheduling”. IBM J. Research and Development. 1958. Vol. 3, No. 2, Page 232-248. • North American Industry Classification System, http://www.census.gov/eos/www/naics/, 2007. • O'Connor, J., Robertson, E. "Student's t-test", MacTutor History of Mathematics archive, University of St Andrews, http://www-history.mcs.st-andrews.ac.uk/Biographies/Gosset.html. 27
References (2/2) • Pearson, K. "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling". Philosophical Magazine, Series 5 50 (302), 1901. Page 157–175. • Putnam, L.H. “A Macro-Estimating Methodology for Software Development”. IEEE COMPCON 76 Proceedings. September 1976. Page 138-143. • Putnam, L., Myers, W. Measures for Excellence. Yourdon Press Computing Series. 1992. • Reifer Consultants. Software Productivity and Quality Survey Report. El Segundo, Calif., 1990. • SEER-SEM. http://www.galorath.com. • Shapiro, S.S., Wilk, M.B. “An analysis of variance test for normality (complete samples).” Biometrika 52 (3-4), 1965. Page 591–611. • Stephens, M.A. "EDF Statistics for Goodness of Fit and Some Comparisons". Journal of the American Statistical Association. Vol. 69, No. 347 (Sep., 1974). Page 730-737. • Tan, T., Clark, B. “Technical Report of a New Taxonomy for Software Application Domains and Operating Environments.” USC CSSE Technical Reports. 2011. • Upton, G., Cook, I. Understanding Statistics. Oxford University Press. Page 55. 1996. • US Air Force. Software Development Cost Estimating Handbook. Software Technology Support Center. Vol 1. Sep 2008. • Yang, Y., et al. “Phase Distribution of Software Development Effort”. Empirical Software Engineering and Measurement. October 2008. Page 61. 28