Transfer learning in effort estimation. Ekrem Kocaguneli, Tim Menzies, Emilia Mendes. Empirical Software Engineering, 2015, 20(3). Presented by Tong Shensi, 2016.02.29.
Introduction • Related Work • Methodology • Results • Conclusions
Introduction • Background • Data miners can find interesting and useful patterns in within-company data • Transferring results across different companies is challenging • Many organizations expend much effort to create repositories of software project data (PROMISE, BUGZILLA, ISBSG)
Introduction • Background (cont.) • Are repositories of software project data valuable to industrial software companies? • Such repositories may not predict properties of future projects • They may be very expensive to build • Findings in the defect prediction area show promising results • Earlier findings show that transferring data comes at the cost of reduced performance • Filtering the transferred data may address this problem
Introduction • Background (cont.) • Previous results show that transferring effort estimation results is a challenging task • Kitchenham et al. reviewed 7 published transfer studies; in most cases, transferred data performed worse • Ye et al. reported that the COCOMO model has changed radically for new data collected in 2000-2009
Introduction • Research Questions • Is transfer learning effective for effort estimation? • How useful are manual divisions of the data? • Does transfer learning for effort estimation work across time as well as space? • This paper uses TEAK as a laboratory for studying transfer learning in effort estimation
Introduction • Related Work • Methodology • Results • Conclusions
Related Work • Transfer learning (TL) • Source domain DS, source task TS, target domain DT, target task TT • TL tries to improve an estimation method in DT using the knowledge of DS and TS • DS ≠ DT, TS ≠ TT
Related Work • Transfer learning and SE • Prior results on the performance of TL are unstable • Of 10 studies reviewed by Kitchenham et al., 4 favored within-company data, another 4 found that transferred data is not statistically significantly worse than within data, and 2 had inconclusive results • Zimmermann et al. found that within-data predictors performed better in 618 of 622 cases in defect prediction • Turhan et al. compared defect predictors learned from transferred or within data and found the transferred predictors performed poorly, but after instance selection they were nearly the same
Introduction • Related Work • Methodology • Results • Conclusions
Methodology • Dataset • Tukutuku database • 195 projects from 51 companies • After eliminating all companies with fewer than 5 projects: 125 projects from 8 companies
Methodology • Dataset (cont.) • Cocomo81 • Coc-60-75 • Coc-76-rest • Nasa93 • Nasa-70-79 • Nasa-80-rest
Methodology • Performance Measures • Mean Absolute Error (MAE) • Mean Magnitude of Relative Error (MMRE) • MMRE = mean(all MREi), where MREi = |actuali − predictedi| / actuali • Median Magnitude of Relative Error (MdMRE) • MdMRE = median(all MREi) • Pred(25): the fraction of estimates with MRE ≤ 25%
Methodology • Performance Measures (cont.) • Mean Magnitude of Error Relative to the estimate (MMER) • Mean Balanced Relative Error (MBRE) • Mean Inverted Balanced Relative Error (MIBRE) • Standardized Accuracy (SA)
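The measures on the last two slides can all be computed from paired lists of actual and estimated efforts. A minimal sketch (function and variable names are my own; as a simplification, SA is computed here against a mean-effort baseline rather than the paper's random-guessing baseline):

```python
import statistics

def error_measures(actual, predicted):
    """Effort-estimation error measures for paired actual/predicted efforts."""
    n = len(actual)
    ae = [abs(a - p) for a, p in zip(actual, predicted)]               # absolute error
    mre = [abs(a - p) / a for a, p in zip(actual, predicted)]          # relative to actual
    mer = [abs(a - p) / p for a, p in zip(actual, predicted)]          # relative to estimate
    bre = [abs(a - p) / min(a, p) for a, p in zip(actual, predicted)]  # balanced RE
    ibre = [abs(a - p) / max(a, p) for a, p in zip(actual, predicted)] # inverted balanced RE
    mae = sum(ae) / n
    # Simplified SA baseline: naively predict the mean effort for every project
    naive = sum(actual) / n
    mae_naive = sum(abs(a - naive) for a in actual) / n
    return {
        "MAE": mae,
        "MMRE": sum(mre) / n,
        "MdMRE": statistics.median(mre),
        "Pred(25)": sum(1 for x in mre if x <= 0.25) / n,
        "MMER": sum(mer) / n,
        "MBRE": sum(bre) / n,
        "MIBRE": sum(ibre) / n,
        "SA": 1 - mae / mae_naive,
    }
```

Having several measures matters here because, as the results later show, transfer conclusions for some subsets (e.g. Tuku8) change depending on which error measure is used.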
Methodology • Instance Selection and Retrieval • Analogy-based estimation (ABE) • Input a database of past projects • For each test instance, retrieve k similar projects • For choosing the k analogies, use a similarity measure • Before calculating similarity, scale independent features to the 0-1 interval so that higher numbers do not dominate the similarity measure • Use a feature weighting scheme to reduce the effect of less informative features • Adapt the effort values of the k nearest analogies to come up with the effort estimate
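The ABE steps above (minus feature weighting) can be sketched as follows; `abe_estimate`, uniform feature weights, Euclidean distance, and mean-based adaptation are illustrative choices, not necessarily the paper's exact configuration:

```python
import math

def abe_estimate(train, test_features, k=3):
    """Analogy-based estimation: min-max scale features to 0-1, retrieve
    the k most similar past projects by Euclidean distance, and adapt by
    taking the mean of their efforts.
    `train` is a list of (feature_vector, effort) pairs."""
    all_feats = [f for f, _ in train] + [test_features]
    lo = [min(col) for col in zip(*all_feats)]
    hi = [max(col) for col in zip(*all_feats)]

    def scale(v):  # map each feature onto the 0-1 interval
        return [(x - l) / (h - l) if h > l else 0.0
                for x, l, h in zip(v, lo, hi)]

    t = scale(test_features)
    ranked = sorted((math.dist(t, scale(f)), effort) for f, effort in train)
    analogies = ranked[:k]                       # the k nearest analogies
    return sum(e for _, e in analogies) / len(analogies)
```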
Methodology • Instance Selection and Retrieval (cont.) • TEAK • TEAK is a variance-based instance selector that discards training data associated with regions of high dependent-variable (effort) variance
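The real TEAK builds greedy agglomerative clustering trees over the data and prunes high-variance subtrees. As a rough, flat illustration of the variance-pruning idea only (not the authors' algorithm), one could drop training projects whose local neighborhood shows high effort variance; the names and the `threshold` parameter below are my own:

```python
import math
import statistics

def prune_high_variance(train, k=2, threshold=1.0):
    """Keep a training project only if the effort variance among it and its
    k nearest neighbors stays below a fraction of the global effort
    variance, i.e. discard regions where analogies disagree wildly.
    `train` is a list of (feature_vector, effort) pairs."""
    global_var = statistics.pvariance([e for _, e in train])
    kept = []
    for i, (f, e) in enumerate(train):
        ranked = sorted(
            (math.dist(f, g), eff)
            for j, (g, eff) in enumerate(train) if j != i
        )
        local = [eff for _, eff in ranked[:k]] + [e]  # neighborhood efforts
        if statistics.pvariance(local) <= threshold * global_var:
            kept.append((f, e))
    return kept
```

For example, a project whose nearest analogies report efforts of 10, 10, and 1000 sits in an unstable region and would be discarded under this sketch.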
Methodology • Experimentation • Goal • Measure what percentage of each subset is retrieved into the k analogies used for estimation • Answer whether TL can enable the use of data from other organizations as well as from other time intervals
Introduction • Related Work • Methodology • Results • Conclusions
Results • Transfer in Space • For Tuku1 and Tuku4-7, the ties are very high • For Tuku8, performance depends on the error measure
Results • Transfer in Time
Results • Inspecting Selection Tendencies
Results • Inspecting Selection Tendencies • Finding 1 • Only a very small portion of all the available data is transferred as useful analogies • Finding 2 • When we compare the diagonal and off-diagonal percentages, we see that the values are very close
Introduction • Related Work • Methodology • Results • Conclusions
Conclusions • When projects lack sufficient local data to make predictions, they can try to transfer information from other projects • Research questions • RQ1: Is transfer learning effective for effort estimation? • In the majority of cases, transferred results performed as well as within results
Conclusions • Research questions (cont.) • RQ2: How useful are manual divisions of the data? • Test instances select equal amounts of instances from within and transferred data sources • TEAK found no added value in restricting reasoning to just within a delphi localization • RQ3: Does transfer learning for effort estimation work across time as well as space? • Yes • It may be misguided to think that: • The data of another organization cannot be used • Old data of an organization is irrelevant
Conclusions • Thoughts • Using heterogeneous data for transfer learning in effort estimation is challenging and meaningful • Sometimes we could use other methods to validate these findings