
Statistical Testing Project

Statistical Testing Project. Maria Grazia Pia, INFN Genova, on behalf of the Statistical Testing Team. LCG-Application Meeting, CERN, 27 November 2002. http://www.ge.infn.it/geant4/analysis/TandA


Presentation Transcript


  1. Statistical Testing Project. Maria Grazia Pia, INFN Genova, on behalf of the Statistical Testing Team. LCG-Application Meeting, CERN, 27 November 2002. http://www.ge.infn.it/geant4/analysis/TandA

  2. History and background

  3. What is it? A project to develop a statistical analysis system to be used in Geant4 testing. Main application areas in Geant4: • physics validation • regression testing • system testing. Provide tools for the statistical comparison of distributions: • equivalent reference distributions (for instance, regression testing) • experimental measurements • data from reference sources • functions deriving from theoretical calculations or from fits. Interest in other areas, not only Geant4? LCG?

  4. History • “Statistical testing” agreed in the Geant4 Collaboration as a major objective for 2002 • Initial ideas presented at the Geant4 TSB meeting, November 2001 • Open brainstorming session at a Geant4-WG workshop, 31 May 2002 • Inception phase, summer 2002 • Informal discussions with STT, Geant4 collaborators and interested potential developers • Initial collection of user requirements in Geant4 • First version of software process deliverables: Vision, URD, Risk List • Presentation at the Geant4 Workshop + parallel sessions, October 2002 (http://www.ge.infn.it/geant4/talks/G4workshop/CERN/pia/tanda-2002.ppt): launch of the project

  5. The team. Interested collaborators are welcome! Development team: • Pablo Cirrone, INFN Southern National Lab • Stefania Donadio, Univ. and INFN Genova • Susanna Guatelli, CERN/IT/API Technical Student and INFN Genova • Alberto Lemut, Univ. and INFN Genova • Barbara Mascialino, Univ. and INFN Genova • Sandra Parlati, INFN Gran Sasso National Lab • Andreas Pfeiffer, CERN/IT/API • Maria Grazia Pia, INFN Genova. Geant4 system integration team: • Gabriele Cosmo, CERN/IT/API, Geant4 Release Manager • Sergei Sadilov, CERN/IT/API, Geant4 System Testing Coordinator. Statistical consultancy: • Paolo Viarengo, Univ. Genova, Statistician. Plus requirements, suggestions and β-testing by many other Geant4 Collaborators (M. Maire, A. Ribon, L. Urban et al.)

  6. The vision

  7. Vision: the basics • Have a vision for the project: an internal tool for Geant4 physics & STT? Also for Geant4 physics validation in the experiments? Are parties other than Geant4 interested? • Rigorous software process: clearly define scope and objectives; who are the stakeholders? who are the users? who are the developers? clearly define roles • Software quality: a flexible, extensible, maintainable system built on a solid architecture

  8. Scope of the project • The project will provide tools for statistical testing of Geant4 • physics comparisons and regression testing • multiple comparison algorithms • Generality (for application also in other areas) should be pursued • facilitated by a component-based architecture • The statistical tools should be used in Geant4 (and in other frameworks) • a tool to be used in testing frameworks • not a testing framework itself • Re-use existing tools whenever possible • no attempt to re-invent the wheel • but a critical, scientific evaluation of candidate tools

  9. Architectural guidelines • The project adopts a solid architectural approach • to offer the functionality and the quality needed by the users • to be maintainable over a long time scale • to be extensible, to accommodate future evolutions of the requirements • Component-based approach • Geant4-specific components + general components • to facilitate re-use and integration in diverse frameworks • AIDA • adopt a (HEP) standard • no dependence on any specific analysis tool • Python • The approach adopted is compatible with the recommendations of the LCG Architecture Blueprint RTAG

  10. The reason why we are here… • The core statistics comparison component + user layer can be generalised • to a wider scope than Geant4 only • This is the reason why we present the project to LCG • to establish a scientific discussion on a topic of common interest • to see if there are any interested users • to see if there are any interested collaborators • We would all benefit from a collaborative approach to a common problem • share expertise, ideas, tools, resources…

  11. Software process guidelines • Significant experience in the team • in Geant4 and in other projects • Guidance from ISO 15504 • standard! • USDP, specifically tailored to the project • practical guidance and tools from the RUP • both rigorous and lightweight • mapping onto ISO 15504 • Open to use tools provided by the LCG Software Process Infrastructure project

  12. Who are the stakeholders?
     Name | Description | Responsibilities
     Geant4 STT Coordinator | Coordinates system testing | Ensure that the system meets the needs of Geant4 System Testing
     Geant4 physics coordinators | Coordinate the Geant4 standard EM, low-energy EM and hadronic WGs | Ensure that the system meets the needs of Geant4 Physics Testing
     Geant4 TSB | Responsible for Geant4 technical matters | Provide guidelines, monitor progress
     INFN Computing Committee | National committee to which some of the developers report; has appointed 4 referees | Recommend funding; review the project, monitor progress
     Others? | Who? LCG? | Requirements? Expertise?

  13. Who are the users?
     Groups | Responsibilities
     Geant4 physics Working Groups | Provide and document requirements, provide feedback on prototypes, perform β-testing on preliminary releases of the product, provide use cases for acceptance testing
     Geant4 STT | Provide and document requirements, perform formal acceptance testing for adoption in system testing
     Other potential users: • users of the Geant4 Toolkit wishing to compare the results of their applications with reference data or with their own experimental results • other projects with requirements for statistical comparisons of distributions (e.g. the LHC Computing Grid project)

  14. Some use cases • Regression testing • Throughout the software life-cycle • Online DAQ • Monitoring detector behaviour w.r.t. a reference • Simulation validation • Comparison with experimental data • Reconstruction • Comparison of reconstructed vs. expected distributions • Physics analysis • Comparisons of experimental distributions (ATLAS vs. CMS Higgs?) • Comparison with theoretical distributions (data vs. Standard Model)

  15. What do the users want? • User requirements from Geant4 (physics, system testing) elicited, analysed, specified and reviewed with the users • User Requirements Document • http://www.ge.infn.it/geant4/analysis/TandA/URD_TandA.html • Use case model in progress • Specific user requirements related to the core statistical component • Detail in progress (URD in preparation) • Input from LCG? • Requirement traceability • Analysis/design, implementation, test, documentation, results

  16. Are there any constraints? Geant4 constraint requirements: • Based on AIDA • No concrete dependencies on specific AIDA implementations should appear in the code of the system tests • Available on Geant4-supported platforms • The system should not require licenses beyond those required for Geant4 development • Other non-functional requirements?

  17. The core statistical component

  18. HBOOK, PAW & Co. From the HBOOK manual (1994): “Based on considerations such as those given above, as well as considerable computational experience, it is generally believed that tests like the Kolmogorov or Smirnov-Cramer-Von-Mises (which is similar but more complicated to calculate) are probably the most powerful for the kinds of phenomena generally of interest to high-energy physicists. […] The value of PROB returned by HDIFF is calculated such that it will be uniformly distributed between zero and one for compatible histograms, provided the data are not binned. […] The value of PROB should not be expected to have exactly the correct distribution for binned data.” But… CDF Collaboration, Inclusive jet cross section in p̄p collisions at √s = 1.8 TeV, Phys. Rev. Lett. 77 (1996) 438
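To illustrate what "binned" means in the HBOOK caveat, here is a minimal sketch of a Kolmogorov-type distance between two histograms with identical binning, computed from their normalized cumulative bin contents. It is an assumption-laden illustration, not HDIFF's actual code, and it does not compute PROB; the function name and the plain-list inputs are hypothetical.

```python
from itertools import accumulate


def binned_ks_distance(hist_a, hist_b):
    """Max distance between the normalized cumulative contents of two histograms.

    Both inputs are plain lists of bin contents sharing the same binning
    (hypothetical inputs, not AIDA objects).
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must have positive total content")
    cum_a = [c / total_a for c in accumulate(hist_a)]
    cum_b = [c / total_b for c in accumulate(hist_b)]
    # The cumulative distributions are known only at bin edges: binning discards
    # within-bin information, so the null distribution of this distance differs
    # from the unbinned case.
    return max(abs(a - b) for a, b in zip(cum_a, cum_b))
```

For example, `binned_ks_distance([1, 0, 0], [0, 0, 1])` gives the maximal distance 1.0, while two histograms with proportional contents give 0.0.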

  19. Goodness-of-fit tests • Pearson’s χ² test • Kolmogorov test • Kolmogorov-Smirnov test • Lilliefors test • Cramer-von Mises test • Anderson-Darling test • Kuiper test • … It is a difficult domain… Implementing the algorithms is easy, but comparing real-life distributions is not. Hence: an incremental and iterative software process; collaboration with statistics experts; patience, humility, time…; a system open to extension and evolution. Suggestions welcome!

  20. Pearson’s χ² test • Applies to discrete distributions • It can also be useful for continuous distributions, but the data must first be grouped into classes • Cannot be applied if the expected (theoretical) frequency in any class is < 5 • In that case, one can try to merge contiguous classes until the minimum expected frequency is reached
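A minimal sketch of the procedure above, assuming the standard statistic χ² = Σᵢ (Oᵢ − Eᵢ)²/Eᵢ and k − 1 degrees of freedom for a fully specified theoretical distribution (no fitted parameters). The left-to-right merging strategy and the function names are illustrative assumptions; the p-value is omitted because it needs the χ² CDF from a statistics library.

```python
def merge_classes(observed, expected, min_expected=5.0):
    """Merge contiguous classes left to right until each has expected >= min_expected."""
    merged_obs, merged_exp = [], []
    acc_o, acc_e = 0.0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            merged_obs.append(acc_o)
            merged_exp.append(acc_e)
            acc_o, acc_e = 0.0, 0.0
    if acc_o > 0 or acc_e > 0:
        # Fold an under-populated tail into the last merged class.
        if merged_exp:
            merged_obs[-1] += acc_o
            merged_exp[-1] += acc_e
        else:
            merged_obs.append(acc_o)
            merged_exp.append(acc_e)
    return merged_obs, merged_exp


def pearson_chi2(observed, expected, min_expected=5.0):
    """Return (chi2, degrees of freedom) after merging sparse classes."""
    obs, exp = merge_classes(observed, expected, min_expected)
    if len(exp) < 2:
        raise ValueError("too few classes left after merging")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    return chi2, len(exp) - 1
```

For instance, observed counts `[10, 20, 30, 40]` against expected `[25, 25, 25, 25]` give χ² = 20 with 3 degrees of freedom.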

  21. Kolmogorov test • The simplest of the non-parametric tests • Tests the goodness of fit of a sample drawn from a continuous random variable to a theoretical distribution • Based on the maximum distance between the empirical cumulative distribution (repartition) function and the theoretical one • Test statistic: $D = \sup_x |F_O(x) - F_T(x)|$
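The supremum in $D$ only needs to be checked at the sorted observations, on both sides of each jump of the empirical CDF. A minimal sketch, with `cdf` standing for $F_T$ (the function name is hypothetical; no p-value is computed):

```python
def kolmogorov_d(sample, cdf):
    """One-sample Kolmogorov statistic D = sup_x |F_O(x) - F_T(x)|.

    `sample` is a sequence of observations, `cdf` the theoretical CDF F_T.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("empty sample")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The ECDF jumps from (i-1)/n to i/n at x, so check both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

With the uniform CDF on [0, 1] (`lambda x: x`), the sample `[0.25, 0.75]` gives D = 0.25.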

  22. Kolmogorov-Smirnov test • The two-sample problem • Mathematically similar to Kolmogorov’s test • Instead of comparing an empirical distribution with a theoretical one, it looks for the maximum difference between the empirical distributions $F_n$ and $G_m$ of the two samples: $D_{mn} = \sup_x |F_n(x) - G_m(x)|$ • Can be applied only to continuous random variables • Conover (1971) and Gibbons and Chakraborti (1992) tried to extend it to discrete random variables
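$D_{mn}$ can be computed with a single merge-like walk over the two sorted samples. A minimal sketch (hypothetical function name; ties are handled by consuming every copy of a value from both samples before comparing, and no p-value is computed):

```python
def ks_two_sample_d(sample_f, sample_g):
    """Two-sample statistic D_mn = sup_x |F_n(x) - G_m(x)| from the two ECDFs."""
    f, g = sorted(sample_f), sorted(sample_g)
    n, m = len(f), len(g)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    # Walk the merged sorted values; after consuming every copy of the current
    # value x from both samples, i/n = F_n(x) and j/m = G_m(x).
    while i < n and j < m:
        x = min(f[i], g[j])
        while i < n and f[i] == x:
            i += 1
        while j < m and g[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted its ECDF equals 1, so the gap can only shrink.
    return d
```

Two non-overlapping samples such as `[1, 2, 3]` and `[4, 5, 6]` give the maximum value 1.0.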

  23. Lilliefors test • Similar to the Kolmogorov test • Based on the null hypothesis that the continuous random variable is normally distributed, $N(\mu, \sigma^2)$, with $\mu$ and $\sigma^2$ unknown • Performed by comparing the empirical cumulative distribution function $F_O(z)$ of the standardized sample $z_1, z_2, \ldots, z_n$, where $z_i = (x_i - \bar{x})/s$, with that of the standard normal distribution $\Phi(z)$: $D^* = \sup_z |F_O(z) - \Phi(z)|$
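A minimal sketch of the Lilliefors statistic using only the Python standard library (`statistics.NormalDist`, Python 3.8+). It computes $D^*$ only: because $\mu$ and $\sigma^2$ are estimated from the data, the critical values are not the Kolmogorov ones and must come from Lilliefors' tables or a simulation, which this sketch does not include. The function name is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def lilliefors_d(sample):
    """D* = sup_z |F_O(z) - Phi(z)| for the standardized sample."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    m, s = mean(sample), stdev(sample)  # sample mean and standard deviation
    if s == 0:
        raise ValueError("sample has zero spread")
    phi = NormalDist().cdf  # standard normal CDF, Phi(z)
    zs = sorted((x - m) / s for x in sample)
    d = 0.0
    for i, z in enumerate(zs, start=1):
        f = phi(z)
        # Check both sides of the ECDF jump at z.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```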

  24. Cramer-von Mises test • Based on the test statistic $\omega^2 = \int [F_O(x) - F_T(x)]^2 \, dF_T(x)$ • Can be performed on both continuous and discrete variables • Satisfactory for symmetric and right-skewed distributions
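For a continuous $F_T$, the integral has a standard closed form over the sorted sample: $W^2 = n\,\omega^2 = \frac{1}{12n} + \sum_{i=1}^{n} \left[\frac{2i-1}{2n} - F_T(x_{(i)})\right]^2$. A minimal sketch under that continuity assumption (the discrete case mentioned above needs a different formula; the function name is hypothetical):

```python
def cramer_von_mises_w2(sample, cdf):
    """W^2 = n * omega^2 via the computing formula for a continuous F_T:
    W^2 = 1/(12n) + sum_i [(2i - 1)/(2n) - F_T(x_(i))]^2 over the sorted sample."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("empty sample")
    return 1.0 / (12 * n) + sum(
        ((2 * i - 1) / (2 * n) - cdf(x)) ** 2 for i, x in enumerate(xs, start=1)
    )
```

With the uniform CDF, the perfectly spread sample `[0.25, 0.75]` reaches the minimum value $1/(12n) = 1/24$.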

  25. Anderson-Darling test • Based on the test statistic $A^2 = \int \frac{[F_O(x) - F_T(x)]^2}{F_T(x)\,[1 - F_T(x)]} \, dF_T(x)$ • Can be performed on both continuous and discrete variables • Seems to be suitable for any data set (Aksenov and Savageau, 2002), with any skewness (symmetric, left- or right-skewed distributions) • Seems to be sensitive to fat tails of distributions
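The weight $1/\{F_T(x)[1 - F_T(x)]\}$ grows near $F_T = 0$ and $F_T = 1$, which is why the test emphasizes the tails. A minimal sketch using the standard computing formula for a continuous $F_T$, which returns $n$ times the integral above; the function name is hypothetical, and no critical values are included.

```python
import math


def anderson_darling_a2(sample, cdf):
    """A^2 via the computing formula
    A^2 = -n - (1/n) * sum_i (2i - 1) [ln F_T(x_(i)) + ln(1 - F_T(x_(n+1-i)))],
    which equals n times the weighted integral of [F_O - F_T]^2."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("empty sample")
    u = [cdf(x) for x in xs]
    if any(p <= 0.0 or p >= 1.0 for p in u):
        raise ValueError("F_T must lie strictly between 0 and 1 at every observation")
    # u[n - i] is F_T(x_(n+1-i)) with 1-based order statistics.
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n
```

A sample piled up in the upper tail, such as `[0.99, 0.99, 0.99]` against the uniform CDF, gives a much larger $A^2$ than an evenly spread one.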

  26. Kuiper test • Based on a quantity that remains invariant under any shift or re-parameterization • Does not work well on tails • $D^* = \max_x [F_O(x) - F_T(x)] + \max_x [F_T(x) - F_O(x)]$
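Unlike the Kolmogorov statistic, which keeps only the larger of the two one-sided deviations, the Kuiper statistic adds them. A minimal one-sample sketch (the function name is hypothetical; the slide's $D^*$ is called `v` here, and no p-value is computed):

```python
def kuiper_v(sample, cdf):
    """Kuiper statistic V = D+ + D-, with D+ = max(F_O - F_T) and D- = max(F_T - F_O)."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("empty sample")
    u = [cdf(x) for x in xs]
    # Largest excess of the ECDF over F_T (just after each jump).
    d_plus = max(i / n - f for i, f in enumerate(u, start=1))
    # Largest excess of F_T over the ECDF (just before each jump).
    d_minus = max(f - (i - 1) / n for i, f in enumerate(u, start=1))
    return d_plus + d_minus
```

For the sample `[0.25, 0.75]` against the uniform CDF, both one-sided deviations are 0.25, so V = 0.5, whereas the Kolmogorov D is 0.25.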

  27. Work in progress

  28. OOAD • Preliminary design of the statistical component in progress • Core statistics comparison package • User layer • Policy-based class design • http://www.ge.infn.it/geant4/rose/statistics/ • Validation of the design through use cases • Some open issues identified, to be addressed in next design iteration
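A rough illustration of the policy idea: the comparison algorithm is a pluggable policy, and the user-facing comparator is the host. In C++, policy-based design usually means template parameters; the closest Python analogue is injecting a policy object. All class names below are hypothetical and do not reproduce the Rose model referenced on the slide.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class ComparisonAlgorithm(Protocol):
    """Interface a test-statistic policy must satisfy (hypothetical)."""
    name: str

    def statistic(self, a: Sequence[float], b: Sequence[float]) -> float: ...


class KolmogorovSmirnovPolicy:
    """Example policy: maximum distance between the two empirical CDFs."""
    name = "Kolmogorov-Smirnov"

    def statistic(self, a: Sequence[float], b: Sequence[float]) -> float:
        def ecdf(sample: Sequence[float], x: float) -> float:
            return sum(v <= x for v in sample) / len(sample)

        points = sorted(set(a) | set(b))
        return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)


@dataclass
class ComparisonResult:
    algorithm: str
    statistic: float


class StatisticsComparator:
    """Host class: the comparison algorithm is injected as a policy object."""

    def __init__(self, algorithm: ComparisonAlgorithm) -> None:
        self.algorithm = algorithm

    def compare(self, a: Sequence[float], b: Sequence[float]) -> ComparisonResult:
        return ComparisonResult(self.algorithm.name, self.algorithm.statistic(a, b))
```

Adding another test (Cramer-von Mises, Anderson-Darling, …) would then mean writing a new policy class, without touching the comparator.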

  29. + more algorithms work in progress

  30. work in progress

  31. Use case: compare two continuous distributions work in progress

  32. Work in progress • Implementation and test of preliminary design • What can be re-used? • Algorithms in GSL, NAG libraries (to be evaluated) • Studies in progress • Transformation between continuous-discrete distributions • Strategies to use Kolmogorov-Smirnov with discrete distributions (E. Dagum + original ideas) • How to deal with experimental errors (not only statistical!) • Multi-dimensional distributions • Bayesian approach • In the to-do list • Conversion from AIDA objects to distributions • “Pythonisation” • Revision of the initial documents (Vision, URD, Risks) • Based on the recent evolutions in the project • Input from today’s meeting?

  33. Work in progress: Geant4-specific • Development of general physics tests in the E.M. domain, for comparison of reference distributions • Compilation of existing tests • Evaluation, documentation of tests • Elicitation of requirements for tests among the Geant4 physics groups • Collection of reference data/distributions • Prototype for automated comparison w.r.t. reference databases • NIST, Sandia etc., directly downloaded from the web • Prototype as a risk mitigation strategy • Integration in the Geant4 system testing framework • Integration in Geant4 physics testing frameworks

  34. Where? • Geant4-specific stuff • In Geant4 • May be included in public distribution, if of interest to users • Core statistical component • Developed in an independent CVS repository • Code, documentation, software process deliverables • Web site • http://www.ge.infn.it/geant4/analysis/TandA/index.html • Contact persons • Andreas.Pfeiffer@cern.ch, Maria.Grazia.Pia@cern.ch

  35. Time scale • Aggressive time scale driven by Geant4 needs • Incremental and iterative software process: OOAD + implementation already started; prototype at CHEP; advanced functional system in summer 2003 • Open to the needs/suggestions of LCG • compatible with the available resources and Geant4 needs

  36. Conclusions… • Geant4 requires a statistical testing system for physics validation and regression testing • to provide a high quality product to its user communities • Core statistical component (of potential general interest) • Geant4-specific components • Project compatible with LCG architecture blueprint • component-based approach, AIDA, Python… • Rigorous software process • to contribute to the quality of the product • Aggressive time scale dictated by Geant4 needs • Open to scientific collaboration Beginning…
