
A Population Size Estimation Problem


Presentation Transcript


  1. A Population Size Estimation Problem. Eliezer Kantorowitz, Software Engineering Department, Ort Braude College of Engineering, kantor@cs.technion.ac.il

  2. Table of Contents • The problem • Capture Recapture Estimators • Estimating number of software defects • Defect injection estimators • Our experiments • Our estimator • Conclusions and Research plans @2006 Eliezer Kantorowitz

  3. Estimating Population Size • Two steps • Make an observation • Employ an estimator on the number of observed items • Example: Industrial quality assurance • Count the number of defects in a sample • Estimate the defect population size from the defects counted in the sample

  4. Partial Observation Methods • A partial observation method is an observation method that does not produce a count of all the relevant items • Example: Due to poor lighting, some of the defective items in the sample cannot be seen

  5. This talk is about estimators applicable when using partial observation methods

  6. Table of Contents • The problem • Capture Recapture Estimators • Estimating number of software defects • Defect injection estimators • Our experiments • Our estimator • Conclusions and Research plans

  7. Counting Wild Animals: Capture Recapture Estimators • Example: Counting the gazelle population in upper Galilee • Problem: We can observe only a part of the $n$ members of the gazelle population • Solution: We capture $n_{tag}$ gazelles in a trap. The gazelles are tagged and freed • We assume that the freed gazelles are evenly mixed with the remaining $n - n_{tag}$ gazelles

  8. Capture Recapture (CR) - 2 • We set a new trap and capture $m$ gazelles, of which $m_{tag}$ are recaptured (tagged) gazelles • The gazelle population size $n$ may be estimated as $n \approx n_{tag}\, m / m_{tag}$, assuming $m_{tag}/m \approx n_{tag}/n$
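
The two-sample estimate described on slides 7 and 8 can be sketched in a few lines of Python. This is a minimal illustration only: the function name `lincoln_petersen` and the numbers in the usage example are hypothetical, and the sketch assumes the simplest form of the estimator, in which the tagged fraction of the second catch equals the tagged fraction of the whole population.

```python
def lincoln_petersen(n_tag: int, m: int, m_tag: int) -> float:
    """Estimate the population size n from one tagging round and one recapture round.

    n_tag: animals tagged and freed in the first round
    m:     animals caught in the second round
    m_tag: animals in the second catch that carry a tag
    """
    if m_tag <= 0:
        raise ValueError("need at least one recaptured tagged animal")
    if m_tag > min(n_tag, m):
        raise ValueError("m_tag cannot exceed n_tag or m")
    # Assumption: m_tag / m is approximately n_tag / n, hence n is about n_tag * m / m_tag.
    return n_tag * m / m_tag


# Hypothetical numbers: 50 gazelles tagged, 40 caught later, 10 of them tagged.
estimate = lincoln_petersen(n_tag=50, m=40, m_tag=10)
print(estimate)  # 200.0
```

The estimate is undefined when no tagged animal is recaptured, which is why the sketch rejects `m_tag == 0` instead of dividing by zero.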

  9. Capture Recapture (CR) - 3 • A number of different CR estimators corresponding to different sets of assumptions have been developed • The essence of CR is that we enter (inject) a KNOWN number of tagged animals into the unknown number of animals. This known number can be employed in later statistical analysis

  10. Table of Contents • The problem • Capture Recapture Estimators • Estimating number of software defects • Defect injection estimators • Our experiments • Our estimator • Conclusions and Research plans

  11. Problem discussed in the following: Estimating the Number of Defects in a Software Users Requirements Document (URD)

  12. Users Requirements Document (URD) • Prepared by software analysts and users • Part of software ordering contract • In one case 55% of all defects (“bugs”) were URD defects • URD validation usually done by inspection

  13. Example: a URD used in our experiments • PURPOSE: Manage a costume shop, which rents and sells costumes. Control the inventory and customer databases. Manage orders and invoices. • CUSTOMER DATABASE - SYSTEM ACTIVITIES: Enter new customers. Automatic updates of the customer’s database. List of customers active over the last three years. List of customers ordered by the age of the children. List of customers ordered by their purchase and rental transactions.

  14. URD validation • Usually done by inspection

  15. Inspection Method (Fagan 1986) • The inspected document is presented by its originator to a team of human inspectors • Each inspector inspects the entire document and records the found defects • Meeting of all inspectors, where defects found by different inspectors are checked and combined into one list

  16. Inspection Problem • Usually an inspector sees only a part of all defects • Different inspectors usually see different sets of defects • A team of j+1 inspectors usually detects more defects than a team of j inspectors • Inspection costs are proportional to j

  17. Fault Detection Ratio (FDR) as Function of Inspector Team Size • FDR = (number of detected faults) / (total number of faults)
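
As an illustration of the FDR definition, the sketch below computes the ratio for teams of growing size from per-inspector fault lists. The helper name `team_fdr` and the fault data are hypothetical; the sketch assumes that a team's detected faults are the union of what its members found, as in the inspection meeting of slide 15, and that the total number of faults is known (as it is in a controlled experiment).

```python
def team_fdr(found_by_inspector, total_faults):
    """FDR = (number of detected faults) / (total number of faults) for one team."""
    # A fault counts as detected if at least one team member recorded it.
    detected = set().union(*found_by_inspector)
    return len(detected) / total_faults


# Hypothetical data: fault identifiers recorded by three inspectors, 10 faults in total.
found = [{0, 1, 2}, {1, 3}, {0, 4, 5}]
for j in range(1, len(found) + 1):
    print(j, team_fdr(found[:j], total_faults=10))
# 1 0.3
# 2 0.4
# 3 0.6
```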

  18. Using Capture Recapture (CR) • CR was adapted to the inspection problem • Defects detected by more than one inspector play a role similar to that of recaptured gazelles • Extensive experiments suggest that CR does not provide sufficiently accurate estimates
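
One simple way to picture this adaptation, for the special case of two inspectors, is to treat the first inspector's defects as the "tagged" ones and the second inspector's defects as the second catch. The sketch below is an assumption-laden illustration, not the specific CR estimators evaluated in the experiments: the function name `cr_two_inspectors` and the defect identifiers are hypothetical, and it assumes that both inspectors work independently and that all defects are equally easy to find.

```python
def cr_two_inspectors(found_a: set, found_b: set) -> float:
    """Two-inspector capture-recapture estimate of the total number of defects."""
    overlap = len(found_a & found_b)  # defects "recaptured" by the second inspector
    if overlap == 0:
        raise ValueError("no defect was found by both inspectors; estimate undefined")
    return len(found_a) * len(found_b) / overlap


# Hypothetical defect identifiers recorded by two inspectors.
inspector_a = {1, 2, 3, 4, 5, 6}
inspector_b = {4, 5, 6, 7, 8}
print(len(inspector_a | inspector_b))               # 8 distinct defects observed
print(cr_two_inspectors(inspector_a, inspector_b))  # 10.0 estimated in total
```

The equal-detectability assumption is exactly the kind of assumption that may fail for real documents, where some defects are much easier to see than others.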

  19. Table of Contents • The problem • Capture Recapture Estimators • Estimating number of software defects • Defect injection estimators • Our experiments • Our estimator • Conclusions and Research plans

  20. Defect Injection • In CR methods we freed a KNOWN number of tagged animals • In defect injection methods we enter (inject) a KNOWN number of defects into the document

  21. Defect Injection Method • ninjected – number of injected defects • ninjected-detected – number of detected injected defects • nreal – number of real defects (the unknown) • ndetected-real – number of detected real defects • Estimated number of real defects: nreal = ndetected-real × (ninjected / ninjected-detected)
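
The estimate on this slide is a single ratio, shown here as a small Python sketch. The function and parameter names mirror the slide's symbols but are otherwise hypothetical, as are the numbers in the usage example; the sketch assumes that inspectors find injected and real defects at the same rate.

```python
def estimate_real_defects(n_injected: int,
                          n_injected_detected: int,
                          n_detected_real: int) -> float:
    """Defect injection estimate: nreal = ndetected-real * (ninjected / ninjected-detected)."""
    if n_injected_detected <= 0:
        raise ValueError("no injected defect was detected; estimate undefined")
    if n_injected_detected > n_injected:
        raise ValueError("cannot detect more injected defects than were injected")
    return n_detected_real * n_injected / n_injected_detected


# Hypothetical inspection: 20 defects injected, 10 of them found, 12 real defects found.
print(estimate_real_defects(n_injected=20, n_injected_detected=10, n_detected_real=12))  # 24.0
```

Here half of the injected defects were found, so the 12 real defects found are taken to be half of the real defects as well.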

  22. Problems of Defect Injection • The injected defects must “represent” the real defects “correctly”

  23. Defect Types Distribution

  24. Examples of Injected Defects • Inconsistent information: • Lists of customers entered by different techniques that contradict each other (lines 3 and 26) • Cancellation of an order that was reserved is illegal (lines 28 and 32) • The systems do not keep customer data for more than three years (lines 5 and 10) • There is not enough information about the customers in the system (lines 6 and 10) • An article that was reserved cannot be sold (lines 27 and 33) • Missing functionality: … • Missing information: …

  25. Defect Injection Summary • Common method for software documents • Sufficiently accurate estimates • Difficult to produce “representative” defects • Laborious

  26. Table of Contents • The problem • Capture Recapture Estimators • Estimating number of software defects • Defect injection estimators • Our experiments • Our estimator • Conclusions and Research plans

  27. The Experimenters • Eliezer Kantorowitz • Arie Guttman • Lior Arzi • Assaf Harel

  28. Experiments - 1 • Computer Science students at Technion • 250 freshmen • 69 seniors • Industry engineers • 25 engineers • Two experiments from literature • 57 senior Computer Science students • Altogether, 401 persons were involved

  29. Experiments - 2 • Employed requirements documents • Costume shop information system • Missile launcher • Railroad system (in experiments from literature) • Data of good quality • 401 persons • Careful preparation

  30. Typical Results • The Y axis is the number of inspectors that detected each defect • The two “easiest to detect” defects were detected by 6 inspectors each

  31. Table of Contents • The problem • Capture Recapture Estimators • Estimating number of software defects • Defect injection estimators • Our experiments • Our estimator • Conclusions and Research plans

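  32. The Model - 1 • The linearity assumption • [Figure: $P_{i,1}$ (probability that fault $i$ is detected by 1 inspector, ranging from 0 to 1) plotted against $i$ (fault number, ranging from 0 to $n-1$); $P_{0,1}$ is marked on the vertical axis and $n_{max}$ on the horizontal axis]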

  33. The Model - 2 • The linearity assumption • $P_{i,1}$: probability that one inspector detects defect $i$ • $n_{max}$: only the defects $0 \le i < n_{max}$ can be detected

  34. The Model - 3 • $P_{0,1}$: the probability that one inspector detects the “easiest to detect” defect • $P_{0,1} \in [0,1]$: a measure of the ease of detection • $FDR_{max}$: the inspectors are able to detect the proportion $FDR_{max}$ of the $n$ defects, i.e. $FDR_{max}\, n$ defects • $FDR_{max} \in [0,1]$: a measure of the domain knowledge of the inspectors

  35. The Model – 4 • The probability that $j$ inspectors will detect defect $i$ may be estimated: • $j$ inspectors are expected to detect $FDR(j)\, n$ defects: • For $n \to \infty$
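
One plausible way to fill in the three steps named on this slide is sketched below as a LaTeX derivation. It rests on two assumptions that are not stated in the transcript: that inspectors detect defects independently of each other, and that the linearity assumption means $P_{i,1}$ falls linearly from $P_{0,1}$ at $i = 0$ to zero at $i = n_{max}$, with $n_{max} = FDR_{max}\, n$. The resulting closed form should be read as a reconstruction under these assumptions, not as a quotation of the slide.

```latex
% Assumptions (not taken from the transcript): independent inspectors, and a
% detection probability that declines linearly to zero at i = n_{max}.
\begin{align}
P_{i,1} &= P_{0,1}\left(1 - \frac{i}{n_{max}}\right), \qquad 0 \le i < n_{max} \\
P_{i,j} &= 1 - \left(1 - P_{i,1}\right)^{j} \\
FDR(j)\, n &= \sum_{i=0}^{n_{max}-1} P_{i,j}, \qquad n_{max} = FDR_{max}\, n \\
\lim_{n \to \infty} FDR(j)
  &= FDR_{max} \int_{0}^{1} \left[1 - \left(1 - P_{0,1}\, u\right)^{j}\right] du
   = FDR_{max} \left[1 - \frac{1 - \left(1 - P_{0,1}\right)^{j+1}}{(j+1)\, P_{0,1}}\right]
\end{align}
```

As a sanity check, setting $j = 1$ gives $FDR(1) = FDR_{max}\, P_{0,1} / 2$, the average of a detection probability that declines linearly from $P_{0,1}$ to zero.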

  36. Kantorowitz Estimator • Example of application: A quality assurance manager can employ this estimator to estimate the number of inspectors $j$ required to detect the proportion $FDR(j)$ of all faults • The coefficients $FDR_{max}$ and $P_{0,1}$ must somehow be estimated • This estimator is implicitly the cost function required in Total Quality Management (TQM): the number of inspectors $j$ represents the costs, while $FDR(j)$ represents the quality
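
The application described here, choosing the team size for a target detection ratio, can be sketched numerically. The code below is an illustration under stated assumptions rather than the estimator's published form: it assumes independent inspectors and a single-inspector detection probability that declines linearly from $P_{0,1}$ to zero over the $FDR_{max}\, n$ detectable defects. The function names `fdr` and `min_team_size`, the default `n = 1000`, and the coefficient values in the usage lines are all hypothetical.

```python
def fdr(j: int, p01: float, fdr_max: float, n: int = 1000) -> float:
    """Expected fault detection ratio of a team of j inspectors (assumed model)."""
    n_max = round(fdr_max * n)            # number of defects that can be detected at all
    expected_detected = 0.0
    for i in range(n_max):
        p_i1 = p01 * (1 - i / n_max)      # assumption: linear decline with defect number i
        expected_detected += 1 - (1 - p_i1) ** j  # assumption: independent inspectors
    return expected_detected / n


def min_team_size(target: float, p01: float, fdr_max: float, max_j: int = 100):
    """Smallest team size j with fdr(j) >= target, or None if no j up to max_j reaches it."""
    for j in range(1, max_j + 1):
        if fdr(j, p01, fdr_max) >= target:
            return j
    return None


# Hypothetical coefficients: P0,1 = 0.6 and FDRmax = 0.8.
for j in (1, 2, 4, 8):
    print(j, round(fdr(j, p01=0.6, fdr_max=0.8), 3))
print(min_team_size(0.5, p01=0.6, fdr_max=0.8))
print(min_team_size(0.9, p01=0.6, fdr_max=0.8))  # None: a target above FDRmax is unreachable
```

Under this model each added inspector raises the expected FDR by less than the previous one, so the cost (j) grows linearly while the quality (FDR) saturates below $FDR_{max}$.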

  37. Application example: What is the optimal inspector team size? • Teams of 2 detect the largest number of defects per inspector

  38. Application example: Comparing Engineers with Students • Experiment with the missile launcher user requirements document • Example: teams of 4 students achieve FDR = 0.53, while teams of only 2 engineers achieve FDR = 0.54, i.e. an engineer detected about twice as many defects as a student

  39. Summary of my estimator • Based on a property of the data observed in a large number of experiments • The estimator was derived by modeling the observed property of the data • Sufficiently accurate • Measuring the two coefficients of the model, $P_{0,1}$ and $FDR_{max}$, is laborious; however, their numerical values may be estimated from similar cases

  40. Table of Contents • The problem • Capture Recapture Estimators • Estimating number of software defects • Defect injection estimators • Our experiments • Our estimator • Conclusions and Research plans

  41. Surveyed Estimators for the Incomplete Counting Problem • Capture Recapture • Defect Injection • My estimator

  42. My Estimator vs. Capture Recapture Estimators • Our estimator was sufficiently accurate for estimating the number of defects in user requirements documents, while the CR estimators were not sufficiently accurate • Were the data in the extensive CR experiments of sufficiently good quality?

  43. Why Did My Estimator Work? • My estimator exploited a property of the data (the linearity assumption) • The property was detected through careful, extensive experimentation

  44. Looking for Similar Applications • Can the approach of this research be useful in other areas where the employed observation method counts only part of the relevant items?
