Problems and Strategies for VIRGO Off-line Computing. Laura Brocco, Universita’ di Roma “La Sapienza” & INFN Roma1, for the VIRGO collaboration
Outline Part I • Data Production • Data Transfer and Storage Part II • Search for gravitational-wave pulses and quasi-periodic signals • Search for periodic signals • Conclusions
Status of Virgo • CITF commissioning ended in September 2002; 5 Engineering Runs (three days long each) done • ITF commissioning started in September 2003 (ends in September 2004); 4 Engineering Runs done so far • Full Virgo locked before the end of 2004
Virgo Data Production Five different data streams are produced: • Raw data: time series containing information from the different sub-systems, recorded in 1 s long frames. Each file contains 300 frames (1.8 GByte). Data flow: 6 MByte/sec. • Processed data: h-recon and quality channels, stored in 1 s long frames. Expected data flow: 0.6 MByte/sec. • Trend data: slowly acquired information, global information, fast quantities. This information is stored in 1 hour long frames. Expected data flow: about 10 kByte/sec. • 50 Hz data: fast channels down-sampled at 50 Hz for long-term studies. Data flow: 140 kByte/sec. • Network analysis data: data made available to external collaborations (e.g. LIGO), containing environmental data, h-recon, etc. Expected data flow: ~1 MByte/sec (depending on the data-exchange agreement among the different experiments).
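The quoted flows translate directly into daily storage needs. The sketch below only does that arithmetic, assuming decimal units (1 MByte = 10⁶ bytes) and the nominal flows listed above; the ~1 MByte/sec network-analysis figure is itself an estimate.

```python
# Nominal data flows from the slide, in bytes per second (decimal units assumed).
FLOWS_BYTES_PER_S = {
    "raw": 6e6,
    "processed": 0.6e6,
    "trend": 10e3,
    "50 Hz": 140e3,
    "network analysis": 1e6,  # "~1 MByte/sec", depends on exchange agreements
}

SECONDS_PER_DAY = 86_400


def volume_per_day_gb(flow_bytes_per_s: float) -> float:
    """Data volume accumulated in one day, in GByte (1e9 bytes)."""
    return flow_bytes_per_s * SECONDS_PER_DAY / 1e9


def raw_file_size_gb(frames_per_file: int = 300, frame_seconds: float = 1.0) -> float:
    """Size of one raw-data file: frames x frame length x raw data flow."""
    return frames_per_file * frame_seconds * FLOWS_BYTES_PER_S["raw"] / 1e9


if __name__ == "__main__":
    for name, flow in FLOWS_BYTES_PER_S.items():
        print(f"{name:>17}: {volume_per_day_gb(flow):8.2f} GByte/day")
    print(f"raw file: {raw_file_size_gb():.1f} GByte")
```

The raw stream alone amounts to about 518 GByte/day, and 300 one-second frames at 6 MByte/sec reproduce the 1.8 GByte file size quoted above.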
Data Transfer & Storage I – Present Situation [Diagram: Cascina, CNAF and Lyon, linked by bbftp transfers] • Cascina: 70 TByte storage (data buffer for daily activities) + LTO tapes • CNAF: nas1, nas2 & nas3, 9.96 TByte, full with ER data (from E0 to C3); up to 20 TByte requested for 2004. Transfers are performed by the virgo-gateway machine (Dell dual-processor @ 1 GHz); data flow 3 MByte/sec • Lyon: data stored with HPSS (from E0 to C3); data flow @ 6.4 MByte/sec
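A quick way to read these numbers is to compare link rates with the production rate. The sketch assumes the 3 and 6.4 MByte/sec figures are sustained rates and uses decimal units; the slide does not say whether the full raw stream goes over the gateway link, so the backlog figure is only what would happen if it did.

```python
RAW_FLOW_MB_S = 6.0    # raw-data production rate at Cascina (MByte/s)
RAW_FILE_MB = 1800.0   # one raw file: 300 frames of 1 s (1.8 GByte)


def transfer_time_s(size_mb: float, rate_mb_s: float) -> float:
    """Time needed to move size_mb at a sustained rate_mb_s."""
    return size_mb / rate_mb_s


def backlog_growth_mb_per_day(production_mb_s: float, transfer_mb_s: float) -> float:
    """Daily growth of the untransferred backlog (0 if the link keeps up)."""
    return max(0.0, production_mb_s - transfer_mb_s) * 86_400


def days_to_fill(capacity_tb: float, flow_mb_s: float) -> float:
    """Days of data at flow_mb_s that fit in capacity_tb (1 TByte = 1e6 MByte)."""
    return capacity_tb * 1e6 / flow_mb_s / 86_400


if __name__ == "__main__":
    print(f"raw file via gateway (3 MB/s): {transfer_time_s(RAW_FILE_MB, 3.0):.0f} s")
    print(f"raw file to Lyon (6.4 MB/s):   {transfer_time_s(RAW_FILE_MB, 6.4):.0f} s")
    print(f"backlog if all raw data used the gateway: "
          f"{backlog_growth_mb_per_day(RAW_FLOW_MB_S, 3.0) / 1e3:.0f} GB/day")
    print(f"20 TByte at CNAF holds {days_to_fill(20, RAW_FLOW_MB_S):.1f} days of raw data")
```

At 3 MByte/sec one 1.8 GByte file takes 600 s, twice its 300 s acquisition time, and the requested 20 TByte would hold roughly 39 days of raw data.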
Data Transfer & Storage II – Future Plans [Diagram: data flow Cascina → Bologna-CNAF → Lyon via bbftp. Cascina: Cascina storage, SRM bbftp server, SRM MySQL archive, SRM client C2, on-line SRM client C3 reporting to the BKDB @ Lyon. Bologna-CNAF: Bologna storage, bbftp server, temporary buffer, MySQL archive, SRM clients C1, C2 and C3, reporting to the BKDB @ Lyon; bbftp to Lyon.]
Book-Keeping Data-Base Oracle database, generated by SRM Client C3 in Cascina and hosted in Lyon; replicated in both Bologna and Cascina. Contents: • File information: name, size, GPS time, DAQ information, event information • For each site (Cascina, Bologna, Lyon): directory and status flag (1 = yes, 0 = no/deleted, 2 = in transfer)
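To make the per-site status flags concrete, here is a minimal in-memory model of one book-keeping entry. The field and method names are hypothetical; the slide specifies only the stored information and the 0/1/2 status codes, not the Oracle schema.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List


class SiteStatus(IntEnum):
    """Per-site status flag values, as listed on the slide."""
    ABSENT_OR_DELETED = 0
    PRESENT = 1
    IN_TRANSFER = 2


SITES = ("Cascina", "Bologna", "Lyon")


def _all_absent() -> Dict[str, SiteStatus]:
    return {site: SiteStatus.ABSENT_OR_DELETED for site in SITES}


@dataclass
class FileRecord:
    """One book-keeping entry (hypothetical field names)."""
    name: str
    size_bytes: int
    gps_time: int                      # GPS time associated with the file
    daq_info: str = ""
    event_info: str = ""
    directory: Dict[str, str] = field(default_factory=dict)   # site -> directory
    status: Dict[str, SiteStatus] = field(default_factory=_all_absent)

    def start_transfer(self, site: str) -> None:
        self.status[site] = SiteStatus.IN_TRANSFER

    def complete_transfer(self, site: str, directory: str) -> None:
        self.status[site] = SiteStatus.PRESENT
        self.directory[site] = directory

    def mark_deleted(self, site: str) -> None:
        self.status[site] = SiteStatus.ABSENT_OR_DELETED
        self.directory.pop(site, None)

    def sites_holding(self) -> List[str]:
        """Sites where a complete copy of the file is present."""
        return [s for s in SITES if self.status[s] == SiteStatus.PRESENT]
```

Keeping "in transfer" as its own state lets a replica in Bologna or Cascina tell a half-copied file from a complete one.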
Data Analysis Requirements I - Search for bursts & coalescing binary gravitational signals Bursts: short signals (< 100 ms) of unknown shape, with frequencies between 50 Hz and 6000 Hz and amplitude 10⁻²⁵ ≤ h ≤ 10⁻²⁰. Dedicated burst-oriented software has been developed: • Burst Library (BuL): C++ library containing several packages dedicated to the search for burst gravitational waves. BuL is developed on DEC/OSF1 V5.2, Linux/RH 6.1 and Linux/RH 7.2, and all packages are managed and built using CMT. • SNAG (Signal and Noise for Gravitational Antennas): MatLab toolbox containing filters to perform burst searches in both the frequency and the time domain. SNAG is developed on Windows & Linux (to be completed).
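As an illustration of a time-domain burst filter, here is a minimal moving-average ("mean filter") trigger. It is a generic sketch, not BuL or SNAG code. It assumes the input is already whitened with known noise standard deviation `sigma`, so the mean of N samples has standard deviation sigma/√N. The injected burst and the threshold are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def mean_filter_triggers(x: Sequence[float], window: int, threshold: float,
                         sigma: float = 1.0) -> List[Tuple[int, float]]:
    """Return (start_index, snr) for windows whose normalized mean exceeds threshold.

    Assumes x is whitened with noise standard deviation sigma, so the mean of
    `window` samples has standard deviation sigma / sqrt(window).
    """
    if window <= 0 or window > len(x):
        raise ValueError("window must be in 1..len(x)")
    norm = math.sqrt(window) / sigma
    running = sum(x[:window])                      # running sum, updated in O(1)
    triggers = []
    for start in range(len(x) - window + 1):
        if start > 0:
            running += x[start + window - 1] - x[start - 1]
        snr = (running / window) * norm
        if abs(snr) > threshold:
            triggers.append((start, snr))
    return triggers


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    for i in range(1000, 1020):                    # hypothetical 20-sample burst
        data[i] += 3.0
    hits = mean_filter_triggers(data, window=20, threshold=6.0)
    print(max(hits, key=lambda h: abs(h[1])))
```

The window length sets the burst duration the filter is most sensitive to, so a real search would run a bank of window lengths.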
Data Analysis Requirements I - Search for bursts & coalescing binary gravitational signals Preprocessing for burst analysis • Whitening: library dedicated to data whitening, available in a C version (LIB_Whitening, the original) and a C++ version (Whitening, interfaced with BuL). • Ana Batch: C++ framework providing facilities to extract data from Virgo data files (in Frame format). • NAP (Noise Analysis Package): C & C++ library containing all the packages dedicated to noise studies and simulations (in development). Typical job duration: • 1 hour of CPU time for 1 hour of data (on a dual-processor Xeon @ 1.7 GHz with 1.5 GByte RAM) • From 1/2 to 1 hour of CPU time for 1/2 hour of data with MatLab (Windows), depending on the number of templates and threshold values • Some algorithms need a computing cluster (e.g. matched filtering with 1000 templates)
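The slide does not describe the algorithm inside the whitening libraries. As a minimal illustration of the idea, the sketch below fits a first-order autoregressive (AR(1)) model and applies the corresponding prediction-error filter. Real detector noise needs a much higher model order, and the AR(1) test noise here is hypothetical.

```python
import random
from typing import List, Sequence


def estimate_ar1(x: Sequence[float]) -> float:
    """Lag-1 autocorrelation coefficient (Yule-Walker estimate for AR(1))."""
    mean = sum(x) / len(x)
    d = [v - mean for v in x]
    r0 = sum(v * v for v in d)
    r1 = sum(d[i] * d[i - 1] for i in range(1, len(d)))
    return r1 / r0


def whiten_ar1(x: Sequence[float], a: float) -> List[float]:
    """Prediction-error filter e[n] = x[n] - a * x[n-1] (first sample dropped)."""
    return [x[n] - a * x[n - 1] for n in range(1, len(x))]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.0]
    for _ in range(19_999):                        # hypothetical colored noise
        x.append(0.9 * x[-1] + rng.gauss(0.0, 1.0))
    a = estimate_ar1(x)
    e = whiten_ar1(x, a)
    print(f"a = {a:.3f}, residual lag-1 correlation = {estimate_ar1(e):.3f}")
```

After filtering, the residual is close to white, which is the condition most burst filters assume.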
[Figure: chirp signal] Data Analysis Requirements I - Search for bursts & coalescing binary gravitational signals Coalescing binary systems: • Compact stars (NS/NS, NS/BH, BH/BH) • The exact shape of the signal is accurately predictable, but it depends on the two stellar masses and spin rates, plus several relativistic effects
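To show how the chirp depends on the two masses, here is the leading-order (Newtonian quadrupole) relation. It computes the chirp mass and the time left before coalescence from a given gravitational-wave frequency. Spins and the relativistic corrections mentioned above are ignored, and the 10 Hz starting frequency is only an example value.

```python
import math

G_MSUN_OVER_C3 = 4.925491e-6  # G * M_sun / c^3, in seconds


def chirp_mass(m1: float, m2: float) -> float:
    """Chirp mass (solar masses) from the component masses (solar masses)."""
    return (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2


def time_to_coalescence(f_gw: float, m1: float, m2: float) -> float:
    """Leading-order time (s) from GW frequency f_gw (Hz) to coalescence.

    tau = (5/256) * (pi f)^(-8/3) * (G Mc / c^3)^(-5/3)
    """
    mc_s = chirp_mass(m1, m2) * G_MSUN_OVER_C3
    return (5.0 / 256.0) * (math.pi * f_gw) ** (-8.0 / 3.0) * mc_s ** (-5.0 / 3.0)


if __name__ == "__main__":
    print(f"NS/NS 1.4+1.4: Mc = {chirp_mass(1.4, 1.4):.3f} Msun, "
          f"tau(10 Hz) = {time_to_coalescence(10.0, 1.4, 1.4):.0f} s")
    print(f"BH/BH 10+10:  tau(10 Hz) = {time_to_coalescence(10.0, 10.0, 10.0):.0f} s")
```

A 1.4 + 1.4 solar-mass system spends about 1000 s in band above 10 Hz. Heavier systems sweep faster (τ ∝ Mc^(-5/3)), which is one reason a template bank must span the mass plane.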
Data Analysis Requirements I - Search for bursts & coalescing binary gravitational signals Coalescing binary systems: • Matched filtering techniques have been developed, with banks of thousands of filters (average template size 4 MByte) • Single frequency band analysis (Flat Search), running within the Merlino framework (written in ANSI C, with communication based on MPI, on a Beowulf cluster) • Two frequency band analysis (Multi-Band Template Analysis), with the same template grid for all frequency bands • Dynamic Matched Filter techniques (Price algorithm) • Hierarchical strategies using ALE (Adaptive Line Enhancer) filters • High computing power is needed (~300 Gflops for on-line analysis, three times more for off-line analysis), together with a distributed framework for parallel computation
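The core of a template-bank search can be sketched as correlating the data against every unit-norm template at every lag and keeping the maximum. This is a time-domain toy, assuming white unit-variance noise. Production pipelines correlate in the frequency domain with noise-weighted templates. The linear-chirp templates and the injection are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def linear_chirp(n: int, f0: float, rate: float) -> List[float]:
    """Unit-norm linear chirp; f0 in cycles/sample, rate in cycles/sample^2."""
    t = [math.sin(2 * math.pi * (f0 * i + 0.5 * rate * i * i)) for i in range(n)]
    norm = math.sqrt(sum(v * v for v in t))
    return [v / norm for v in t]


def matched_filter_bank(data: Sequence[float],
                        bank: Sequence[Sequence[float]]) -> Tuple[int, int, float]:
    """Return (template_index, lag, peak) maximizing the correlation.

    With unit-norm templates and white unit-variance noise, the correlation
    value is the matched-filter SNR.
    """
    best = (-1, -1, -math.inf)
    for k, tmpl in enumerate(bank):            # independent per template
        m = len(tmpl)
        for lag in range(len(data) - m + 1):
            c = sum(data[lag + i] * tmpl[i] for i in range(m))
            if c > best[2]:
                best = (k, lag, c)
    return best


if __name__ == "__main__":
    bank = [linear_chirp(64, 0.05, r) for r in (0.001, 0.002, 0.003)]
    data = [0.0] * 256
    for i, v in enumerate(bank[1]):            # inject template 1 at sample 100
        data[100 + i] = 5.0 * v
    print(matched_filter_bank(data, bank))
```

The outer loop over templates has no dependencies between iterations, so a bank of thousands of templates splits naturally across the nodes of a cluster.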
Data Analysis Requirements I - Search for bursts & coalescing binary gravitational signals Scheme for burst and coalescing binary detection (to be implemented @ Bologna) [Diagram: raw data → h reconstruction (2 signals @ 20 kHz) → lines removal → whitening → decimation/re-sampling → burst filters and C.B. filters → selected events; data storage after the intermediate stages.]
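The scheme does not say how lines are removed. One simple approach for a line of known, stable frequency is to fit a cosine and a sine at that frequency by least squares and subtract the fit. The 50 Hz line and the test signal below are example values only. Real detector lines drift in amplitude and phase, so they need tracking rather than a single fit.

```python
import math
from typing import List, Sequence


def remove_line(x: Sequence[float], f_line: float, fs: float) -> List[float]:
    """Subtract the least-squares fit a*cos(wn) + b*sin(wn) at a known frequency."""
    w = 2 * math.pi * f_line / fs
    c = [math.cos(w * n) for n in range(len(x))]
    s = [math.sin(w * n) for n in range(len(x))]
    # Normal equations: [scc scs; scs sss] [a; b] = [sxc; sxs]
    scc = sum(v * v for v in c)
    sss = sum(v * v for v in s)
    scs = sum(ci * si for ci, si in zip(c, s))
    sxc = sum(xi * ci for xi, ci in zip(x, c))
    sxs = sum(xi * si for xi, si in zip(x, s))
    det = scc * sss - scs * scs
    a = (sxc * sss - sxs * scs) / det
    b = (sxs * scc - sxc * scs) / det
    return [xi - a * ci - b * si for xi, ci, si in zip(x, c, s)]


if __name__ == "__main__":
    fs = 1000.0
    x = [3.0 * math.cos(2 * math.pi * 50 * i / fs + 0.3)
         + 0.5 * math.sin(2 * math.pi * 7 * i / fs) for i in range(1000)]
    y = remove_line(x, 50.0, fs)
    print(max(abs(v) for v in y))              # only the 7 Hz component is left
```
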
Data Analysis Requirements II - Search for periodic gravitational signals Periodic gravitational signals are emitted, e.g., by asymmetric rotating neutron stars. The signal amplitude is very low, so long integration times (~months) are needed. A hierarchical strategy has been developed, based on alternating “coherent” and “incoherent” steps. Two main computing centers, Bologna and Lyon, plus Napoli and Roma. Large computing resources are needed for the analysis (Tflops range); in any case, the more computing power (CP) we can access, the wider the portion of the source parameter space we can explore. Low granularity: the analysis method is well suited to a distributed computing environment.
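The coherent/incoherent alternation can be illustrated with the simplest semi-coherent scheme. Each data segment is Fourier transformed (coherent step), and the segment power spectra are summed (incoherent step). This is a generic sketch, not the Virgo pipeline. It ignores the Doppler and spin-down corrections a real search must apply before summing, and the signal and noise values are hypothetical.

```python
import cmath
import math
import random
from typing import List, Sequence


def power_spectrum(seg: Sequence[float]) -> List[float]:
    """|DFT|^2 for bins 0..N/2 (direct O(N^2) DFT, fine for a small sketch)."""
    n = len(seg)
    out = []
    for k in range(n // 2 + 1):
        z = sum(seg[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        out.append(abs(z) ** 2)
    return out


def semi_coherent_power(x: Sequence[float], seg_len: int) -> List[float]:
    """Coherent step: power spectrum per segment; incoherent step: sum them."""
    total = [0.0] * (seg_len // 2 + 1)
    for start in range(0, len(x) - seg_len + 1, seg_len):
        p = power_spectrum(x[start:start + seg_len])
        total = [t + v for t, v in zip(total, p)]
    return total


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical weak sinusoid (bin 8 of a 64-sample segment) in unit noise.
    x = [0.5 * math.sin(2 * math.pi * 8 * i / 64) + rng.gauss(0.0, 1.0)
         for i in range(64 * 32)]
    p = semi_coherent_power(x, 64)
    print(max(range(len(p)), key=p.__getitem__))
```

Each segment can be processed independently, which is why the method maps well onto many loosely coupled jobs.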
[Diagram: analysis workflow] • C.C. storage: preliminary analysis performed locally (coherent steps). Typical input file size ~1.2 MB for 6 months of data; input files replicated among the SEs. • GRID: ~10⁵ jobs sent in 3 months (incoherent steps). Typical job duration ~5-10 hours on a 2.4 GHz Xeon processor, depending on the source frequency. • Candidates copied back to a local machine for further steps of the analysis. Typical output file size ~200 kB, ~2·10⁴ candidates.
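The job count and duration fix the average number of processors the grid must provide. The sketch assumes about 91 days in 3 months, one job per CPU and full utilization. Queueing and failures would push the real number higher.

```python
HOURS_IN_3_MONTHS = 91 * 24  # assume ~91 days of wall-clock time


def concurrent_cpus(n_jobs: float, hours_per_job: float, wall_hours: float) -> float:
    """Average number of CPUs kept busy to run n_jobs within wall_hours."""
    return n_jobs * hours_per_job / wall_hours


if __name__ == "__main__":
    for hours in (5.0, 10.0):
        print(f"{hours:.0f} h/job: {1e5 * hours:.0e} CPU-hours, "
              f"~{concurrent_cpus(1e5, hours, HOURS_IN_3_MONTHS):.0f} CPUs")
```

The quoted figures imply roughly 230 to 460 processors busy around the clock for three months.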
We are testing the data analysis software in two computing environments: local batch systems (PBS) and the grid (INFN-Grid). • Main activities so far: • Adaptation of the data analysis procedures to work in a distributed environment; • Tests of the “incoherent” part of the analysis pipeline (several software versions) using simulated data (thousands of jobs submitted). Machines used: • Roma, Bologna, Napoli (about 30 machines) within INFN-Grid • Lyon (25 processors) as a classic batch system • Full-scale test of the “coherent” part of the analysis (28 processors for ~3 months, 24 hours/day; farms in Bologna and Roma). • Results: • very good scaling of performance with the number of nodes involved (but only small-scale tests done so far); • grid software increasingly stable and reliable;
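One simple way to turn a search into independent jobs for PBS or the grid is to split the source-frequency range into sub-bands. The slides do not say how the periodic-source pipeline actually partitions its work. The `BandJob` type, the 50-1050 Hz range and the job count are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BandJob:
    """One independent job: a sub-band of the source-frequency range (hypothetical)."""
    job_id: int
    f_low: float
    f_high: float


def split_band(f_min: float, f_max: float, n_jobs: int) -> List[BandJob]:
    """Split [f_min, f_max) into n_jobs contiguous, non-overlapping sub-bands."""
    if n_jobs < 1 or f_max <= f_min:
        raise ValueError("need n_jobs >= 1 and f_max > f_min")
    width = (f_max - f_min) / n_jobs
    return [BandJob(i, f_min + i * width,
                    f_max if i == n_jobs - 1 else f_min + (i + 1) * width)
            for i in range(n_jobs)]


if __name__ == "__main__":
    for job in split_band(50.0, 1050.0, 10):
        print(job)
```

Because job duration depends on the source frequency, equal-width bands give uneven run times. A production splitter would size the bands by estimated cost instead.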
Conclusions • The Virgo experiment will complete commissioning in 2004. • Data production: 5 kinds of data will be produced, with data flows ranging from 10 kByte/sec (trend data) up to 6 MByte/sec (raw data); typical raw-data file size 1.8 GByte. • Storage: 2 permanent storage sites, Bologna-CNAF and Lyon, plus Cascina; automatic processes to transfer data from Cascina to Bologna and from Bologna to Lyon are in development. • Data analysis: several filters have been developed to search for gravitational waves; all filtering techniques need high computing power and parallel computation. 4 M.D.C. (productions) performed so far, with the next foreseen in June. GRID tests have been performed using the Roma, Bologna and Napoli farms; larger-scale tests will be performed in the coming months. • The analysis of scientific data will start in 2005.
Merlino Framework (by Leone B. Bosi) • Distributed framework for parallel data analysis • Composed of 4 main processes • Written in ANSI C, with communication based on MPI, running on a Beowulf cluster • Customization through “plug-in” functions (dynamic libraries) • Customizable data flow • Plug-ins currently in use, under test, or under development: • Matched Filter • Inspiral generator • Mean Filter • PC • Damped Sine Filter
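Merlino loads its plug-ins from dynamic libraries in ANSI C, typically done in C with POSIX `dlopen()`/`dlsym()`. The Python sketch below only mimics the idea: named filter functions in a registry, with the data flow given as an ordered list of names. None of these names are Merlino's API; all are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Filter = Callable[[Sequence[float]], List[float]]
PLUGINS: Dict[str, Filter] = {}


def register(name: str) -> Callable[[Filter], Filter]:
    """Decorator adding a filter function to the plug-in registry."""
    def wrap(fn: Filter) -> Filter:
        PLUGINS[name] = fn
        return fn
    return wrap


@register("mean_filter")
def mean_filter(x: Sequence[float], window: int = 4) -> List[float]:
    """Moving average over `window` samples (valid part only)."""
    return [sum(x[i:i + window]) / window for i in range(len(x) - window + 1)]


@register("abs")
def absolute(x: Sequence[float]) -> List[float]:
    """Element-wise absolute value."""
    return [abs(v) for v in x]


def run_chain(x: Sequence[float], chain: Sequence[str]) -> List[float]:
    """Apply the named plug-ins in order: a configurable data flow."""
    out = list(x)
    for name in chain:
        out = PLUGINS[name](out)
    return out


if __name__ == "__main__":
    data = [1.0, -1.0, 3.0, -3.0, 5.0, -5.0]
    print(run_chain(data, ["abs", "mean_filter"]))
    print(run_chain(data, ["mean_filter", "abs"]))
```

Swapping the order of the same two plug-ins changes the result, which is why the data flow is configured separately from the plug-ins themselves.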
Next steps (in 2004) • integration and validation of the whole analysis software; • larger-scale grid tests (~100 processors or more);
Scenario 1 (by Antonia Ghiselli) [Diagram: Virgo grid layout within INFN-GRID. Components shown: RLS @ Cnaf; Virgo RB and BDII; Virgo MDS for Virgo-I and Virgo-F; sites Virgo-I Cnaf, Virgo-I Roma, Virgo-I Napoli and Virgo-F Lyon, each with CE, SE, GIIS and GRIS.]