Transfer learning in effort estimation. Ekrem Kocaguneli, Tim Menzies, Emilia Mendes. Empirical Software Engineering, 2015, 20(3). Presented by Tong Shensi, 2016.02.29.
Introduction • Related Work • Methodology • Results • Conclusions
Introduction • Background • Data miners can find interesting and useful patterns in within-company data • Transferring results across different companies is challenging • Many organizations expend much effort to create repositories of software project data (PROMISE, BUGZILLA, ISBSG)
Introduction • Background (cont.) • Are repositories of software project data valuable to industrial software companies? • Such repositories may not predict properties of future projects • They may be very expensive to build • Findings in the defect prediction area show promising results • Earlier findings show that transferring data comes at the cost of reduced performance • Filtering the transferred data may address this problem
Introduction • Background (cont.) • Previous results show that transferring effort estimation results is a challenging task • Kitchenham et al. reviewed 7 published transfer studies; in most cases, transferred data performed worse • Ye et al. reported that the COCOMO model has changed radically for new data collected in 2000-2009
Introduction • Research Questions • Is transfer learning effective for effort estimation? • How useful are manual divisions of the data? • Does transfer learning for effort estimation work across time as well as space? • This paper uses TEAK as a laboratory for studying transfer learning in effort estimation
Introduction • Related Work • Methodology • Results • Conclusions
Related Work • Transfer learning (TL) • Source domain DS, source task TS, target domain DT, target task TT • TL tries to improve an estimation method in DT using the knowledge of DS and TS • DS ≠ DT, TS ≠ TT
Related Work • Transfer learning and SE • Prior results on the performance of TL are unstable • Of 10 studies reviewed by Kitchenham et al., 4 favored within-company data, another 4 found that transferred data is not statistically significantly worse than within data, and 2 had inconclusive results • Zimmermann et al. found that within-data predictors performed better in 618 of 622 cases in defect prediction • Turhan et al. compared defect predictors learned from transferred or within data and found the transferred predictors performed poorly, but after instance selection they were nearly the same
Introduction • Related Work • Methodology • Results • Conclusions
Methodology • Dataset • Tukutuku database • 195 projects from 51 companies • After eliminating all companies with fewer than 5 projects: 125 projects from 8 companies
Methodology • Dataset (cont.) • Cocomo81 • Coc-60-75 • Coc-76-rest • Nasa93 • Nasa-70-79 • Nasa-80-rest
Methodology • Performance Measures • Mean Absolute Error (MAE) • Mean Magnitude of Relative Error (MMRE) • MMRE = mean(all MREi), where MREi = |actuali − predictedi| / actuali • Median Magnitude of Relative Error (MdMRE) • MdMRE = median(all MREi) • Pred(25): the fraction of estimates with MRE ≤ 25%
Methodology • Performance Measures (cont.) • Mean Magnitude of Error Relative to the estimate (MMER) • Mean Balanced Relative Error (MBRE) • Mean Inverted Balanced Relative Error (MIBRE) • Standardized Accuracy (SA)
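The measures on the last two slides can all be computed from paired lists of actual and estimated efforts. A minimal sketch (function and variable names are my own; as a simplification, SA is computed here against a mean-effort baseline rather than the paper's random-guessing baseline):

```python
import statistics

def error_measures(actual, predicted):
    """Effort-estimation error measures for paired actual/predicted efforts."""
    n = len(actual)
    ae = [abs(a - p) for a, p in zip(actual, predicted)]               # absolute error
    mre = [abs(a - p) / a for a, p in zip(actual, predicted)]          # relative to actual
    mer = [abs(a - p) / p for a, p in zip(actual, predicted)]          # relative to estimate
    bre = [abs(a - p) / min(a, p) for a, p in zip(actual, predicted)]  # balanced RE
    ibre = [abs(a - p) / max(a, p) for a, p in zip(actual, predicted)] # inverted balanced RE
    mae = sum(ae) / n
    # Simplified SA baseline: naively predict the mean effort for every project
    naive = sum(actual) / n
    mae_naive = sum(abs(a - naive) for a in actual) / n
    return {
        "MAE": mae,
        "MMRE": sum(mre) / n,
        "MdMRE": statistics.median(mre),
        "Pred(25)": sum(1 for x in mre if x <= 0.25) / n,
        "MMER": sum(mer) / n,
        "MBRE": sum(bre) / n,
        "MIBRE": sum(ibre) / n,
        "SA": 1 - mae / mae_naive,
    }
```

Having several measures matters here because, as the results later show, transfer conclusions for some subsets (e.g. Tuku8) change depending on which error measure is used.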
Methodology • Instance Selection and Retrieval • Analogy-based estimation (ABE) • Input a database of past projects • For each test instance, retrieve k similar projects • For choosing the k analogies, use a similarity measure • Before calculating similarity, scale independent features to the 0-1 interval so that higher numbers do not dominate the similarity measure • Use a feature weighting scheme to reduce the effect of less informative features • Adapt the effort values of the k nearest analogies to come up with the effort estimate
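The ABE steps above (minus feature weighting) can be sketched as follows; `abe_estimate`, uniform feature weights, Euclidean distance, and mean-based adaptation are illustrative choices, not necessarily the paper's exact configuration:

```python
import math

def abe_estimate(train, test_features, k=3):
    """Analogy-based estimation: min-max scale features to 0-1, retrieve
    the k most similar past projects by Euclidean distance, and adapt by
    taking the mean of their efforts.
    `train` is a list of (feature_vector, effort) pairs."""
    all_feats = [f for f, _ in train] + [test_features]
    lo = [min(col) for col in zip(*all_feats)]
    hi = [max(col) for col in zip(*all_feats)]

    def scale(v):  # map each feature onto the 0-1 interval
        return [(x - l) / (h - l) if h > l else 0.0
                for x, l, h in zip(v, lo, hi)]

    t = scale(test_features)
    ranked = sorted((math.dist(t, scale(f)), effort) for f, effort in train)
    analogies = ranked[:k]                       # the k nearest analogies
    return sum(e for _, e in analogies) / len(analogies)
```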
Methodology • Instance Selection and Retrieval (cont.) • TEAK • TEAK is a variance-based instance selector that discards training data associated with regions of high dependent-variable (effort) variance
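The real TEAK builds greedy agglomerative clustering trees over the data and prunes high-variance subtrees. As a rough, flat illustration of the variance-pruning idea only (not the authors' algorithm), one could drop training projects whose local neighborhood shows high effort variance; the names and the `threshold` parameter below are my own:

```python
import math
import statistics

def prune_high_variance(train, k=2, threshold=1.0):
    """Keep a training project only if the effort variance among it and its
    k nearest neighbors stays below a fraction of the global effort
    variance, i.e. discard regions where analogies disagree wildly.
    `train` is a list of (feature_vector, effort) pairs."""
    global_var = statistics.pvariance([e for _, e in train])
    kept = []
    for i, (f, e) in enumerate(train):
        ranked = sorted(
            (math.dist(f, g), eff)
            for j, (g, eff) in enumerate(train) if j != i
        )
        local = [eff for _, eff in ranked[:k]] + [e]  # neighborhood efforts
        if statistics.pvariance(local) <= threshold * global_var:
            kept.append((f, e))
    return kept
```

For example, a project whose nearest analogies report efforts of 10, 10, and 1000 sits in an unstable region and would be discarded under this sketch.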
Methodology • Experimentation • Goal • Measure what percentage of each subset is retrieved into the k analogies used for estimation • Answer whether TL can enable the use of data from other organizations as well as from other time intervals
Introduction • Related Work • Methodology • Results • Conclusions
Results • Transfer in Space • For Tuku1 and Tuku4-7, the ties are very high • For Tuku8, performance depends on the error measure
Results • Transfer in Time
Results • Inspecting Selection Tendencies
Results • Inspecting Selection Tendencies • Finding 1 • Only a very small portion of all the available data is transferred as useful analogies • Finding 2 • When we compare the diagonal and off-diagonal percentages, we see that the values are very close
Introduction • Related Work • Methodology • Results • Conclusions
Conclusions • When projects lack sufficient local data to make predictions, they can try to transfer information from other projects • Research questions • RQ1: Is transfer learning effective for effort estimation? • In the majority of cases, transferred results performed as well as within results
Conclusions • Research questions (cont.) • RQ2: How useful are manual divisions of the data? • Test instances select equal amounts of instances from within and transferred data sources • TEAK found no added value in restricting reasoning to just within a delphi localization • RQ3: Does transfer learning for effort estimation work across time as well as space? • Yes • It may be misguided to think that: • The data of another organization cannot be used • Old data of an organization is irrelevant
Conclusions • Thoughts • Using heterogeneous data for transfer learning in effort estimation is challenging and meaningful • Sometimes we could use other methods to validate these findings