BABAR: The New Computing Model

• Computing Model Working Group
• Analysis Model
• Event Store Technologies
• Computing/Analysis Sites
• Implementation/Migration

J. Walsh, INFN-Pisa

Note: The new Babar Computing Model is still under development. Everything I present here today is still under discussion within Babar.

Need for a New Computing Model
• Babar Computing Model established at the end of 2000
• changes in the computing environment → review required
• Babar Computing Review: April 2002
  • large luminosity of PEP-II puts a big burden on Babar computing resources
  • Analysis Groups produce huge ntuples that essentially duplicate micro-dst information → not scalable with luminosity
  • two event store technologies, Root/IO (Kanga/Root) and Objectivity → burden on the Computing Group
  • several other issues raised: role of remote sites, etc.
• Computing Model Working Group 2: update the Babar Computing Model. So far, work has concentrated on two main areas:
  • new proposal for the analysis model
  • event store technology: Root vs. Objectivity debate
• Plus, will comment on remote sites: Tier A, Tier C, etc.

Luminosity Projection
• design luminosity (3 × 10^33 cm^-2 s^-1) already exceeded in 2001
• expect 3 times the current data sample in 2005
[Figure: luminosity projection, 2002 forecast, with "where we are now" marked]

Computing Model Working Group 2: Members
• Members: David Brown (LBL), Claudio Campagnari, Andreas Hoecker, Hassan Jawahery (co-chair), Yury Kolomensky (co-chair), David Lange, Mike Roney, Aaron Roodman, Anders Ryd, Bernhard Spaan, John Walsh, Fergus Wilson
• Technical Experts: Jacek Becla, Fabrizio Bianchi, Nicole Chevalier, Nicolo De Groot, Gregory Dubois-Felsmann, Peter Elmer, Steffen Luitz, Mauro Morandin
• Ex-Officio: Stephen Gowdy (CC), Rainer Bartoldus (DepCC), Richard Mount (SCS), Dominique Boutigny (Tier-A Rep.), Livio Lanceri (DepPAC)

Current Computing Model - Jargon
• Analysis:
  • performed exclusively on the micro format
  • Beta/Framework is the analysis program, C++ based
  • Central Skims produce subsets of data pertinent to various analyses
  • Analysis Working Groups (AWGs) often produce ntuples (or rootuples) that are large and redundant (contain the same info as the micro)
  • Fortran (or Root scripts) analyze the ntuples (rootuples)
• Event Store:
  • originally only in Objectivity
  • Kanga/Root developed for the micro format only, to enhance analysis performance and data distribution
  • micro currently maintained in both Objy and Kanga/Root
• Computing/Analysis Sites:
  • full copy of micro data available at SLAC, IN2P3, RAL
  • production reprocessing in place at INFN-Padova
  • numerous institutes produce MC at remote sites
[Slide callout: Tier A Sites]

New Analysis Model - Key Points
• Data formats:
  • Mini: new format with detailed detector information available
  • New Micro: upgraded form of the current micro, allowing faster access and customizable content
  • Tag (nano-dst): no major changes, general cleanup
  • Note: large RECO and RAW objects no longer written to the event store
• Skims:
  • New Micro will make centralized skims more useful and efficient
• Large ntuple production:
  • hopefully rendered obsolete, freeing up resources (CPU, disk space and manpower)

Mini Format
• New format introduced over the last year:
  • contains essentially full detector information
    • track hits, calorimeter crystals, DRC hits, etc.
  • efficiently packed to optimize space
    • about 8 kBytes per event (compare to micro: 2 kBytes/event)
  • increased analysis capability w.r.t. micro, e.g.:
    • track extrapolation through detector material
    • follow changing conditions (e.g. SVT alignment)
    • event display
    • etc.
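The per-event sizes above translate directly into disk requirements. A rough sketch of that arithmetic, assuming a hypothetical sample of 10^9 events (an illustrative number, not taken from the talk) and decimal units (1 TB = 10^9 kB); the function name `store_size_tb` is also just a placeholder:

```python
MINI_KB_PER_EVENT = 8.0   # quoted on the slide
MICRO_KB_PER_EVENT = 2.0  # quoted on the slide


def store_size_tb(n_events: int, kb_per_event: float) -> float:
    """Total size in TB (decimal: 1 TB = 1e9 kB) of n_events events."""
    if n_events < 0 or kb_per_event < 0:
        raise ValueError("inputs must be non-negative")
    return n_events * kb_per_event / 1e9


N_EVENTS = 1_000_000_000  # hypothetical sample size, for illustration only

mini_tb = store_size_tb(N_EVENTS, MINI_KB_PER_EVENT)
micro_tb = store_size_tb(N_EVENTS, MICRO_KB_PER_EVENT)

print(f"mini:  {mini_tb:.1f} TB")
print(f"micro: {micro_tb:.1f} TB")
print(f"mini/micro size ratio: {mini_tb / micro_tb:.0f}x")
```

Whatever the real event count, the mini needs about four times the space of the micro for the same events.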

Mini Format - II
• Additional characteristics:
  • customization
  • larger and slower than micro-dst:
    • develop a coherent staging system to retrieve events from tape. Target access times:
      • < 1 hr for small (< 100 events) samples
      • < 1 day for medium (< 1 k events) samples
      • < 2-4 weeks for large samples
  • exact use pattern of the mini is not really predictable
    • need to remain flexible on implementation

New-Micro Format
• Radical improvement w.r.t. the current micro-dst
• Dual usage:
  • regular framework/Beta job (current Babar norm)
  • interactive use with Root
• Customizable content
  • option to store detector info or not
  • additional user info can be added
    • composite candidates
    • different mass hypothesis track fits, etc.
• High speed: the aim is to reach 1 kHz with framework/Beta → will require Beta development
  • current rate: a few tens of Hz
  • higher read rates envisioned with Root access (at the cost of reduced functionality)
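To put the rate target in perspective, the wall-clock time for one sequential pass over a sample scales inversely with the read rate. A minimal sketch, assuming a hypothetical sample of 10^8 events and reading "a few tens of Hz" as 30 Hz (both are illustrative assumptions, not numbers from the talk):

```python
def pass_time_hours(n_events: int, rate_hz: float) -> float:
    """Wall-clock hours for one sequential pass over n_events at rate_hz."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return n_events / rate_hz / 3600.0


N_EVENTS = 100_000_000    # hypothetical sample size
CURRENT_RATE_HZ = 30.0    # assumed value for "a few tens of Hz"
TARGET_RATE_HZ = 1000.0   # the 1 kHz aim quoted on the slide

current_hours = pass_time_hours(N_EVENTS, CURRENT_RATE_HZ)
target_hours = pass_time_hours(N_EVENTS, TARGET_RATE_HZ)

print(f"at {CURRENT_RATE_HZ:.0f} Hz: {current_hours:.0f} h")
print(f"at {TARGET_RATE_HZ:.0f} Hz: {target_hours:.0f} h")
print(f"speed-up: {current_hours / target_hours:.0f}x")
```

The absolute times depend entirely on the assumed sample size and on how many jobs run in parallel; only the ratio between the two rates follows from the slide.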

New-Micro Format - II
• Impact on users:
  • much analysis in Babar is done at the ntuple level
  • ntuple analysis code will have to be adapted/converted to the new-micro (use of paw/Fortran discouraged) → potentially disruptive
• Comment on Objectivity:
  • since interactive Root access is a basic feature of the new-micro → an Objectivity event store is not an option
  • new-micro will be in Root/IO format

Event Store Debate
• Current system: hybrid, with Objectivity at SLAC and IN2P3 and Kanga/Root at RAL and INFN-Padova
• Problems with Objectivity:
  • lock collisions
  • Prompt Reconstruction and Simulation Production performance issues
  • poor record of scaling with luminosity: every jump in data sample has been accompanied by Objy problems
  • distribution difficulties → getting data samples to Tier C sites
  • other HEP experiments have dropped Objy as an option
  • concerns about the viability of the Objectivity company
    • we don't have the source code
    • how much expertise will be around in 2007?

Event Store Debate - II
• Kanga/Root
  • much easier maintenance
  • easier to export
  • smaller event size (although Objy event size is decreasing with the deployment of compression and the redesign of navigational info)
  • more efficient CPU usage
  • becoming the HEP standard: easy to attract manpower to support Kanga/Root

Event Store Debate - III
• So, why not drop Objy and go with a Kanga/Root system?
• Cost of migration (effort and disruption): estimates range from 1 to 2 years to achieve migration → most in Babar agree that a switch taking more than 2 years is probably not worth doing
• Kanga/Root has some technical issues that need to be addressed:
  • file server to handle a huge number of files
  • lack of transactions
  • lack of cross-file associations (e.g. mini-to-micro navigation)
  • bookkeeping
  • staging system
• Political/human issues
• Note: the conditions database is implemented in Objectivity
  • too costly (estimated 2-3 years) to convert to a Root-based DB
  • Babar's relationship with Objy will continue in any case
Work to address these issues is ongoing (not just in the Babar context) → probably no show-stoppers, but it is work.

Event Store Debate - IV
• Alternative to a Kanga/Root-only system: a hybrid system where:
  • new-micro is in Kanga/Root format only
  • everything else (event reconstruction, simulation production, mini format) is in Objectivity
• The hybrid system has the advantage of an easier, less disruptive migration, but we would still need to support 2 event store technologies
• Final decision/recommendation on the event store coming soon

Computing/Analysis Sites
• The Working Group is just starting on this subject → just present the issues
• Role of Tier A sites: large sites that significantly reduce the computing burden at SLAC
  • primarily analysis: IN2P3, RAL
  • production: INFN-Padova
  • issues:
    • data replication at Tier A's
    • data partitioning at Tier A's (micro, mini, beam data, MC)
    • transparent access to data across Tier A's (BabarGrid)
    • specialization of Tier A's: skimming, (re-)processing, etc.
• Role of Tier C sites: smaller sites at remote institutes
  • main contribution so far is in MC production (majority of MC events produced away from SLAC)
  • analysis at Tier C's has been difficult due to problems with data distribution → need to resolve with the new Computing Model

Implementation
• Mini
  • already implemented in Objectivity (minor fixes, improvements ongoing)
  • feasibility of a Root implementation has been studied → could be ready by early 2003
• New-micro
  • dual usage (Beta and Root) prototype implementation has been achieved. Additional development needed:
    • customization
    • persistent composites
    • Beta/Framework optimization

Migration
• Essential requirement: minimal disturbance to Babar's capability to produce physics results
• Mini
  • currently reprocessing all data in Padova, producing mini format output
  • the mini is a "new" feature, so disruption of ongoing analyses is minimal
• New-micro
  • exploit Babar's data replication to ease migration
    • maintain old-micro at the SLAC and IN2P3 sites
    • introduce new-micro at the RAL site
  • users have a choice of format during the transition period
• Dependence on other parts of the Computing Model
  • use of Tier A sites
  • choice of event store technology, etc.

Summary
• Babar is currently updating its computing model to be able to deal with the large increase in data set in the coming years
• A new analysis model, based on the new-micro and mini data formats, has been proposed and largely agreed to
  • the mini will permit more in-depth analysis
  • the new-micro will eliminate largely wasteful/redundant ntuples
• The Working Group is also considering the future of the event store technologies employed in Babar
  • Should the Objectivity event store be dropped in favor of Root-based technology?
  • Is Kanga/Root ready to be used as a full-scale event store?
  • Does a hybrid system do enough to alleviate the problems of Objy?
• An important part of the new model will be how best to use remote computing/analysis sites: Tier A and Tier C
  • work is starting on this subject within the Working Group

Backup Slides

Skims with New-Micro
• The customization features of the new-micro make it an attractive tool to use with Centralized Skims
• The idea is that each Analysis Working Group (AWG) will provide the appropriate event selection and content customization to the Central Skim group
• Small skims will be encouraged: deep copies, which provide fast access, will be possible for small skims (< few % selection rate)
• In addition to skims, a generic new-micro containing all physics events will be available → important for new analyses
• Aiming for increased frequency of Central Skims: every 3 months
  • feasibility being evaluated

Tag (nano-dst) Format
• The Tag format will continue to be maintained
• Optimization/cleanup to remove unused or redundant information: should get a factor of 2 size reduction

Deep Copy vs. Pointer Skims
• Deep copy
  • copy full event to new location
  • faster read rate
  • more disk space
• Pointer (shallow) copy
  • write pointer only
  • slower read rate
  • less disk space
[Diagram: event store holding Ev 1 to Ev 4; the deep-copy skim holds copies of Ev 2 and Ev 4; the shallow-copy skim holds Ptr 2 and Ptr 4]
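The difference between the two skim types can be illustrated with a toy in-memory model. This is only a sketch of the concept: the `Event` class, the list-based store and the function names are hypothetical stand-ins, not Babar event store code.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Event:
    event_id: int
    data: List[float] = field(default_factory=list)


def deep_copy_skim(store: List[Event], select: Callable[[Event], bool]) -> List[Event]:
    """Copy every selected event in full: more space, but reads are self-contained."""
    return [copy.deepcopy(ev) for ev in store if select(ev)]


def pointer_skim(store: List[Event], select: Callable[[Event], bool]) -> List[int]:
    """Record only the position of each selected event: little space used."""
    return [i for i, ev in enumerate(store) if select(ev)]


def read_pointer_skim(store: List[Event], pointers: List[int]) -> List[Event]:
    """Reading a pointer skim means going back to the original store."""
    return [store[i] for i in pointers]


# Toy store with Ev 1 to Ev 4; select Ev 2 and Ev 4, as in the slide's diagram.
store = [Event(n, [float(n)] * 4) for n in range(1, 5)]


def is_even(ev: Event) -> bool:
    return ev.event_id % 2 == 0


deep = deep_copy_skim(store, is_even)
ptrs = pointer_skim(store, is_even)

print([ev.event_id for ev in deep])                            # copied events
print(ptrs)                                                    # positions in the store
print([ev.event_id for ev in read_pointer_skim(store, ptrs)])  # dereferenced events
```

In this toy model the deep-copy skim can be read without touching the original store, while the pointer skim always needs it, which mirrors the read-rate versus disk-space trade-off listed on the slide.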

Use Cases
• A mature analysis (like sin2β) could create a mini skim of a relatively small number of events and work from that
• An analysis with loose skim cuts (2-body charmless) would customize a new-micro skim, dropping unneeded info and saving B candidates. The mini could be used near the end of the analysis, when the number of events is sufficiently reduced
• A new analysis would use the allEvents generic new-micro to explore the concept and define cuts. The final analysis could require a customized new-micro skim or a mini skim (if the event sample is small enough)
• An AWG could produce a skim that serves many analyses. Specific analyses could make pointer skims of the skim, or deep-copy skims if small enough
• etc.
