
StatHEP application

StatHEP application. Statistical Data Analysis for High Energy Physics.





  1. StatHEP application Statistical Data Analysis for High Energy Physics

The basic purpose of the application is to provide a framework for solving HEP problems using Toy Monte Carlo methods, which are CPU intensive. Although dedicated to HEP, this is a general-purpose framework that can be exploited in other scientific fields covered by the Applications Identification and Support activity of BalticGrid.

The application was built within the ROOT package (http://root.cern.ch), a multipurpose environment which provides many mathematical and graphical tools employed in data analyses. The user script can be executed in two modes:
- interactive mode (C++ interpreter) during the development phase;
- batch mode for production, employing a compiled version of the user code (faster execution than for interpreted code).

The application consists of three main parts:
- Framework part, based on ROOT. It is responsible for providing the ROOT environment at a given site. If no installation is found, the ROOT sources are imported and compiled. This is done only once, by the first job arriving at a given cluster. In this way the application can be executed on many flavours of Linux and on some Unix platforms.
- User part, in the form of a ROOT script (C++). Although originally devoted to HEP topics, the user script can be replaced to solve other problems of BalticGrid users.
- GRID part, responsible for submitting jobs and retrieving results. It is currently implemented as a set of shell scripts that prepare and submit a series of jobs, then collect the outputs and perform the final analysis. It is foreseen to integrate the application into Migrating Desktop.

In full production mode, more than 10 000 jobs are expected to be executed in a single application run. Depending on the complexity of the problem, a single job may need a few hours on an average CPU. Since the user script usually evolves, the whole production procedure may require several repetitions.
A reasonable response period (a few days) for such intensive calculations, of the order of hundreds of thousands of CPU hours, can be fulfilled only by GRID. A good example is the application developed to study CP violation phenomena for the LHCb experiment at the LHC (Large Hadron Collider) at CERN near Geneva. The CP transformation, which turns a matter particle into its anti-matter partner, is the combination of C (charge conjugation) and P (parity). The violation of CP symmetry implies that the behaviour of matter and anti-matter is different. It is one of the three conditions necessary to explain why the visible Universe is overwhelmingly made of matter.

Fig. 1. Construction of the LHCb experiment. The GRID was used in the design phase and to study the physics potential.

  2. The CP measurements are the result of a complicated procedure. B mesons are produced in the hadronic environment of proton-proton collisions, where B meson production is over 100 times rarer than ordinary hadron production. Moreover, the B meson decays that are interesting for CP measurements are relatively rare, ranging from 10^-4 down to 10^-9 of all B decays. Extracting this tiny signal out of the huge background requires sophisticated algorithms already at the level of on-line data taking (a reduction from 40 million down to 200 events per second). The data are then reconstructed off-line and CP violation phenomena are studied for more than 50 different B meson decay modes. In the final step, the essential physics parameters are determined by applying a fit to the data.

Given the above-mentioned data reduction, it is practically impossible to produce sufficiently high-statistics samples to estimate the uncertainty of the measurement. The Toy Monte Carlo technique is commonly used in such cases. The main idea is to prepare a simplified model of the measurement procedure and to submit thousands of jobs corresponding to thousands of experiments with slightly different conditions. Although the data are generated for the same value of a given physics quantity, each measurement yields a slightly different outcome. A typical distribution of the output values for one of the CP violation parameters, the angle γ, is shown in Fig. 2.

Fig. 2. The typical distribution of results from a bunch of 1000 jobs. The standard deviation of this distribution is an estimation of the measurement uncertainty.

Apart from that, any statistically significant deviation of the mean value with respect to the generated one is a manifestation of a possible bias. The huge data-reduction factor implies CPU-intensive calculations, with a typical execution time for a single job at the level of a few hours.
An example of the procedure described above is the analysis of the B → D*a1 decay. The result of the study of the uncertainty of the γ angle measurement, performed using the BalticGrid infrastructure, is presented in Fig. 3.

Fig. 3. The estimated uncertainty of one of the CP violation parameters, the gamma angle of the unitarity triangle. Each point corresponds to 1000 jobs executed on the GRID.

Mariusz Witek  Mariusz.Witek@ifj.edu.pl
Michał Krasowski  Michal.Krasowski@ifj.edu.pl
