Offline Discussion
M. Moulson, 22 October 2004
• Datarec status
• Reprocessing plans
• MC status
• MC development plans
• Linux
• Operational issues
• Priorities
• AFS/disk space
Datarec DBV-20 (Run > 31690)
• DC geometry updated
  • Global shift: Δy = −550 μm, Δz = −1080 μm
  • Implemented in datarec for Run > 28000
  • Thickness of the DC wall not changed (−75 μm)
• Modifications to DC timing calibrations
  • Now independent of EmC timing calibrations
• Modifications to event classification (EvCl)
  • New KSTAG algorithm (KS tagged by vertex in DC)
  • Bunch spacing selected by run number in T0_FIND step 1 for ksl (sketch below)
    • 2.715 ns for 2004 data (also for MC and some 2000 runs)
• Boost values
  • Runs without BMOM v.3 in HepDB are not reconstructed
  • p_x values from BMOM(3) now used in all EvCl routines
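As an illustration of the run-number dispatch in T0_FIND step 1, a minimal sketch (Python for brevity, not the actual Fortran; only the 2.715 ns figure and the run-number keying come from this slide, the boundary and names are placeholders):

  # Sketch: bunch-crossing period keyed by run number, as in T0_FIND step 1.
  # The 2.715 ns value is from the slide; the run boundary is illustrative,
  # and earlier entries would hold the measured spacings for their periods.
  BUNCH_SPACING_NS = [
      (31690, float("inf"), 2.715),   # 2004 data (also MC, some 2000 runs)
      # (lo_run, hi_run, spacing_ns) entries for earlier run periods ...
  ]

  def bunch_spacing(run: int) -> float:
      for lo, hi, spacing in BUNCH_SPACING_NS:
          if lo <= run <= hi:
              return spacing
      raise KeyError(f"no bunch spacing recorded for run {run}")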
Datarec operations
• Runs 28479 (29 Apr) to 32380 (21 Oct, 00:00):
  • 413 pb⁻¹ to disk with tag OK
  • 394 pb⁻¹ with tag = 100 (no problems)
  • 388 pb⁻¹ with full calibrations
  • 371 pb⁻¹ reconstructed (96%)
  • 247 pb⁻¹ DSTs (except K⁺K⁻)
• fsun03–fsun10 decommissioned 11 Oct
  • Necessary for installation of the new tape library
  • datarec submission moved from fsun03 to fibm35
  • DST submission moved from fsun04 to fibm36
• 150 keV offset in √s discovered!
150 keV offset in √s
• Discovered while investigating ~100 keV discrepancies between physmon and datarec
• The +150 keV adjustment to the fitted value of √s was never implemented:
  • in physmon
  • in datarec
  • when the final BVLAB √s values were written to HepDB
• Plan of action:
  • New Bhabha histogram for the physmon fit, taken from data
  • Sync the datarec fit with physmon
  • Fix the BVLAB fit before the final 2004 values are computed
  • Update the 2001–2002 values in the DB records
    • histogram_history and HepDB BMOM 2001–2002 values currently come from the BVLAB scan and need the 150 keV added
    • Updating HepDB is technically difficult; a solution is needed (one possible stopgap is sketched below)
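Until the DB records can be rewritten, one possible stopgap (a sketch only; the function and flag are assumptions, not the actual HepDB or BMOM interface) is to apply the missing offset when the values are read back:

  # Sketch: add the missing +150 keV to sqrt(s) values read from the
  # 2001-2002 BVLAB-based records. Names and interface are placeholders.
  SQRT_S_OFFSET_MEV = 0.150   # +150 keV, expressed in MeV

  def corrected_sqrt_s(sqrt_s_mev: float, from_bvlab_2001_2002: bool) -> float:
      """Return sqrt(s) with the BVLAB offset applied where needed."""
      return sqrt_s_mev + SQRT_S_OFFSET_MEV if from_bvlab_2001_2002 else sqrt_s_mev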
Reprocessing plans
• Issues of compatibility with MC:
  • DC geometry and T0_FIND modifications are selected by run number
  • DC timing modifications do not impact the MC chain
  • Additions to event classification would require only new MC DSTs
  • In principle, a run-number range could be used to fix the p_x values for backwards compatibility
• Use batch queues?
  • Main advantage: increased stability
Further datarec modifications
• Modification of inner DC wall thickness (−75 μm)
  • Implement by run number
• Cut DC hits with drift times > 2.5 μs (sketch below)
  • Suggested by P. de Simone in May to reduce the fraction of split tracks
• Others?
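A minimal sketch of the proposed drift-time cut (hypothetical hit objects; only the 2.5 μs threshold comes from the slide):

  # Sketch: drop DC hits with drift times above 2.5 us, which were suggested
  # to contribute to split tracks. The hit interface is a placeholder.
  MAX_DRIFT_TIME_US = 2.5

  def select_dc_hits(hits):
      """Keep only DC hits passing the drift-time cut."""
      return [h for h in hits if h.drift_time_us <= MAX_DRIFT_TIME_US]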
Generation of rare KSKL events
• Rare modes generated:
  • KS → πeν, πμν, γγ, π⁺π⁻π⁰, 3π⁰
  • KL → π⁺π⁻, π⁰π⁰, γγ, π⁺π⁻γ (DE), π⁰πν
• Peak cross section: 7.5 nb
  • Approx. 2× the sum of the BRs for the rare KL channels
• In each event, either the KS or the KL decays to a rare mode
  • Random selection (sketch below)
• Scale factor of 20 applies to KL; for KS, the scale factor is ~100
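The random selection might look like the following sketch (Python; the mode lists are transcribed from the slide, while the interface and the uniform choice are assumptions — a real generator would weight the modes by branching ratio):

  # Sketch: force exactly one kaon per event to a rare mode, chosen at random.
  import random

  RARE_KS_MODES = ["pi e nu", "pi mu nu", "gamma gamma", "pi+ pi- pi0", "3 pi0"]
  RARE_KL_MODES = ["pi+ pi-", "pi0 pi0", "gamma gamma",
                   "pi+ pi- gamma (DE)", "pi0 pi nu"]

  def assign_decays() -> tuple[str, str]:
      """Return (KS mode, KL mode) with one of the two forced rare."""
      if random.random() < 0.5:
          return random.choice(RARE_KS_MODES), "standard KL decay"
      return "standard KS decay", random.choice(RARE_KL_MODES)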
MC development plans
• Beam-pipe geometry for 2004 data (Bloise)
• LSB insertion code (Moulson)
• Fix the ρπ generator (Nguyen, Bini)
• Improve MC–data consistency in tracking resolution (Spadaro, others)
  • MC has a better core resolution and smaller tails than data in the E_miss − p_miss distribution of the ππ background for the KS → πeν analysis
  • Improving the agreement would greatly help precision studies involving signal fits, spectra, etc.
  • Need to look systematically at other topologies/variables
  • Need more people involved
Linux software for KLOE analysis
• P. Valente had earlier completed a port based on free software
  • VAST F90-to-C preprocessor
  • Clunky to build and maintain
• M. Matsyuk has completed a KLOE port based on the Intel Fortran compiler for Linux
  • Individual, non-commercial license is free
  • libkcp code compiles with zero difficulty
• Time to reconsider issues related to maintenance of KLOE software for Linux
Linux usage in KLOE analysis
• Most users currently process YBOS DSTs into Ntuples on farm machines and transfer the Ntuples to PCs
  • AFS does not handle random-access data well, i.e., writing CWNs as analysis output
  • Multiple jobs on a single farm node stress the AFS cache
  • Farm CPU (somewhat) limited
  • AFS disk space perennially at a premium
• KLOE software needs are minimal for most analysis jobs
  • YBOS to Ntuple: no DC reconstruction, etc.
• Analysis jobs on user PCs accessing DSTs via KID and writing Ntuples locally should be quite fast
• Continuing interest on the part of remote users
KLOE software on Linux: Issues
1. Linux machines at LNF for hosting/compilation
   • 3 of 4 Linux machines in the Computer Center are down, including klinux (mounts /kloe/soft, used by P. Valente for the VAST build)
2. KLOE code distribution
   • User PCs do not mount /kloe/soft
   • Move /kloe/soft to network-accessible storage?
   • Use CVS for distribution? Elegant solution, but users must periodically update
3. Individual users must install the Intel compiler
4. KID
   • Has been built for Linux in the past
5. Priority/manpower
Operational issues
• Offline expert training
  • 1–2 day training course for all experts
  • General update
• PC backup system
  • Commercial tape backup system available to users for backing up individual PCs
Priorities and deadlines
• In order of priority, for discussion:
  1. Complete MC production: rare KSKL
  2. Reprocessing
  3. MC diagnostic work
  4. Other MC development work for 2004
  5. Linux
• Deadlines?
Disk resources
• 2001–2002: total DSTs 7.4 TB, total MC DSTs 7.0 TB
• 2004: DST volume scales with L
• 3.2 TB added to the AFS cell
  • Not yet assigned to analysis groups
• 2.0 TB available but not yet installed
  • Reserved for testing new network-accessible storage solutions
Limitations of AFS
• Initial problems with random-access files blocking AFS on farm machines have been resolved
• Nevertheless, AFS has some intrinsic limitations:
  • Volume sizes of at most 100 GB
    • Already pushed past the limit: the specified maximum is 8 GB!
  • Cache must be much larger than the AFS-directed data volume of all jobs on a farm machine
    • Problem characteristic of random-access files (CWNs)
    • Current cache size is 3.5 GB on each farm machine
    • More than sufficient for a single job; possible problems with 4 big jobs/machine
    • Enlarging the cache requires purchasing more local disk for the farm machines
Network storage: future solutions
• Possible alternatives to AFS:
  • NFS v4
    • Kerberos authentication: use klog as with AFS
    • Smaller data transfers; expect fewer problems with random-access files
  • Storage Area Network (SAN) filesystem
    • Currently under consideration as a Grid solution
    • Works only with Fibre Channel (FC) interfaces
    • FC–SCSI/IP interface implemented in hardware/software
    • Availability expected in 2005
• Migration away from AFS probable within ~6 months
  • 2 TB allocated to tests of new network storage solutions
  • The current AFS system will remain as the interim solution
Current AFS allocations
[Chart of current AFS allocations; values shown: 365, 200, and 400 GB]
A fair proposal?
• Each of the 3 physics WGs gets 1400 GB total
  • Total disk space (incl. already installed) divided equally
  • Physics WGs are similar in size and diversity of analyses
  • WGs can make intelligent use of the space
    • e.g., some degree of Ntuple sharing is already present
  • Substantial increases for everyone in any case
Offline CPU/disk resources for 2003
• Available hardware:
  • 23 IBM B80 servers: 92 CPUs
  • 10 Sun E450 servers: 18 B80 CPU equivalents
  • 6.5 TB NFS-mounted recall disk cache
  • Easy to reallocate between production and analysis
• Allocation of resources in 2003:
  • 64 to 76 CPUs on IBM B80 servers for production
  • 800 GB of disk cache for I/O staging
  • Remainder of resources open to users for analysis
Analysis environment for 2003
• Production of histograms/Ntuples on the analysis farm:
  • 4 to 7 IBM B80 servers + 2 Sun E450 servers
  • DSTs latent on 5.7 TB recall disk cache
  • Output to 2.3 TB AFS cell accessed by user PCs
• Analysis example:
  • 440M KSKL events, 1.4 TB of DSTs
  • 6 days elapsed for 6 simultaneous batch processes
  • Output on the order of 10–100 GB
• Final-stage analysis on user PC/Linux systems
CPU power requirements for 2004
[Plot: B80 CPUs needed to follow acquisition, and input rate (kHz), vs. average L (10³⁰ cm⁻²s⁻¹); the 76-CPU offline farm is indicated]
CPU/disk upgrades for 2004
• Additional servers for the offline farm:
  • 10 IBM p630 servers: 10 × 4 POWER4+ CPUs at 1.45 GHz
  • Adds more than 80 B80 CPU equivalents to the offline farm
• Additional 20 TB of disk space
  • To be added to the DST cache and AFS cell
• Ordered; expected to be on-line by January
• More resources already allocated to users
  • 8 IBM B80 servers now available for analysis
  • Can maintain this allocation during 2004 data taking
Installed tape storage capacity
• IBM 3494 tape library:
  • 12 Magstar 3590 drives, 14 MB/s read/write
  • 60 GB/cartridge (upgraded from 40 GB this year)
  • 5200 cartridges (5400 slots)
  • Dual active accessors
  • Managed by Tivoli Storage Manager
• Maximum capacity: 312 TB (5200 cartridges)
• Currently in use: 185 TB
Tape storage requirements for 2004
[Charts: stored volume by type (raw, recon, DST, MC) in GB/pb⁻¹: 118 for 2002, 57 estimated for 2004 incl. streaming mods; tape library usage (TB, used vs. free) today and after +780, +1210, and +2000 pb⁻¹]
Tape storage for 2004
• Additional IBM 3494 tape library
  • 6 Magstar 3592 drives: 300 GB/cartridge, 40 MB/s
  • Initially 1000 cartridges (300 TB)
  • Slots for 3600 cartridges (1080 TB)
  • Remotely accessed via FC/SAN interface
  • Definitive solution for KLOE storage needs
• Call for tender (bando di gara) submitted to the Gazzetta Ufficiale
  • Reasonably expect 6 months to delivery
  • Current space is sufficient for a few months of new data
Machine background filter for 2004
• Background filter (FILFO) last tuned on 1999–2000 data
  • 5% inefficiency for ππγ events, varying with background level
  • Mainly traceable to the cut that eliminates degraded Bhabhas
  • Removing this cut reduces the inefficiency to 1%, increases stream volume by 5–10%, and increases CPU time by 10–15%
• New downscale policy for the bias-study sample (sketch below):
  • A fraction of events is not subject to the veto and is written to the streams
  • Need to produce a bias-study sample for the 2001–2002 data
    • To be implemented by reprocessing a data subset with the new downscale policy
    • Will allow additional studies of FILFO efficiency and cuts
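A minimal sketch of the downscale policy (the factor, flag, and event interface are placeholders, not the actual FILFO code):

  # Sketch: every Nth event bypasses the FILFO veto and is flagged,
  # yielding an unbiased control sample for efficiency studies.
  DOWNSCALE = 100   # placeholder value

  def filfo_decision(event_index: int, fails_filfo: bool) -> tuple[bool, bool]:
      """Return (write_to_streams, is_bias_sample)."""
      if event_index % DOWNSCALE == 0:
          return True, True          # kept regardless of the veto
      return (not fails_filfo), False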
Other offline modifications for 2004
• Modifications to physics streaming:
  • Bhabha stream: keep only a subset of radiative events
    • Reduces Bhabha stream volume by a factor of 4
    • Reduces overall stream volume by >40%
  • KSKL stream: clean up the choice of tags to retain
    • Reduces KSKL stream volume by 35%
  • K⁺K⁻ stream: new tag using dE/dx
    • Fully incorporate dE/dx code into reconstruction
    • Eliminate older tags; will reduce stream volume
• Random trigger as source of MC background for 2004
  • 20 Hz of random triggers synched with beam crossings allows background simulation for L up to 2×10³² cm⁻²s⁻¹
KLOE computing resources
• DB2 server: IBM F50, 4 × PPC604e 166 MHz
• Online farm: 7 IBM H50, 4 × PPC604e 332 MHz; 1.4 TB SSA disk
• AFS cell: 2 IBM H70, 4 × RS64-III 340 MHz; 1.7 TB SSA + 0.5 TB FC disk
• Network: CISCO Catalyst 6000, 100 Mbps and 1 Gbps links (NFS and AFS traffic)
• Offline farm: 19 IBM B80, 4 × POWER3 375 MHz; 8 Sun E450, 4 × UltraSPARC-II 400 MHz
• Analysis farm: 4 IBM B80, 4 × POWER3 375 MHz; 2 Sun E450, 4 × UltraSPARC-II 400 MHz
• File servers (NFS): 2 IBM H80, 6 × RS64-III 500 MHz
• Managed disk space (6.5 TB): 0.8 TB SSA for offline staging; 2.2 TB SSA + 3.5 TB FC latent disk cache
• Tape library: IBM 3494, 5400 slots of 60 GB (324 TB), 2 robots, TSM; 12 Magstar E1A drives, 14 MB/s each
2004 CPU estimate: details
• Extrapolated from 2002 data with some MC input
• 2002:
  • L = 36 μb⁻¹/s
  • T3 = 1560 Hz: 345 Hz φ + Bhabha, 680 Hz unvetoed cosmic rays, 535 Hz background
• 2004:
  • L = 100 μb⁻¹/s (assumed)
  • T3 = 2175 Hz: 960 Hz φ + Bhabha, 680 Hz unvetoed cosmic rays, 535 Hz background (assumed constant)
• From MC:
  • σ_φ = 3.1 μb (assumed)
  • φ + Bhabha trigger: σ = 9.6 μb
  • φ + Bhabha after FILFO: σ = 8.9 μb
  • CPU(φ + Bhabha) = 61 ms avg.
• CPU time calculation (worked through below):
  • 4.25 ms to process any event
  • + 13.6 ms for 60% of background events
  • + 61 ms for 93% of φ + Bhabha events
  • 2002: 19.6 ms/evt overall – OK
  • 2004: 31.3 ms/evt overall (10%)
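The per-event figures follow from weighting each CPU cost by its trigger-rate fraction; a worked check using the rates above:

  \[
  \langle t \rangle_{2002} = 4.25 + 13.6 \times 0.60 \times \tfrac{535}{1560} + 61 \times 0.93 \times \tfrac{345}{1560} \approx 19.6\ \mathrm{ms/evt}
  \]
  \[
  \langle t \rangle_{2004} = 4.25 + 13.6 \times 0.60 \times \tfrac{535}{2175} + 61 \times 0.93 \times \tfrac{960}{2175} \approx 31.3\ \mathrm{ms/evt}
  \]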
2004 tape space estimate: details
• 2001: 274 GB/pb⁻¹; 2002: 118 GB/pb⁻¹
  • Highly dependent on luminosity
• 2004: estimated a priori (arithmetic below)
  • Assume 2175 Hz at 2.6 kB/evt
    • Raw event size assumed the same for all events (has varied very little with background over KLOE history)
  • Assume L = 100 μb⁻¹/s, so 1 pb⁻¹ = 10⁴ s:
    • 25.0 GB for 9.6M physics events
    • 31.7 GB for 12.2M background events (1215 Hz of background for 10⁴ s)
    • 56.7 GB/pb⁻¹ total (raw)
  • Recon volume includes the effects of the streaming changes
  • MC assumes 1.7M evt/pb⁻¹ produced: φ → all (1:5) and φ → KSKL (1:1)
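The 56.7 GB/pb⁻¹ total follows directly from the assumed rates and event size:

  \[
  \underbrace{960\ \mathrm{Hz} \times 10^{4}\ \mathrm{s} \times 2.6\ \mathrm{kB}}_{25.0\ \mathrm{GB\ physics}} \; + \; \underbrace{1215\ \mathrm{Hz} \times 10^{4}\ \mathrm{s} \times 2.6\ \mathrm{kB}}_{\approx 31.7\ \mathrm{GB\ bkg}} \; \approx \; 56.7\ \mathrm{GB/pb^{-1}}
  \]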