SAMGrid
GridPP11, Liverpool, Sept 2004
Gavin Davies, Imperial College London
Introduction
• Tevatron
  • Less data than LHC, but still PBs/experiment and growing
  • Running experiments
• SAM (Sequential Access to Metadata)
  • Well developed metadata and distributed data replication system
  • Developed by DØ & FNAL-CD
• JIM (Job Information and Monitoring)
  • Handles job submission and monitoring (all but data handling)
• SAM + JIM → SAMGrid (computational grid)
• Runjob handles job workflow management
• See http://cdinternal.fnal.gov/RUNIIRev2004/runIIMP.asp
SAMGrid Architecture (diagram not reproduced)
SAM plots
• DØ usage: up to 200 TB/month, over 2 PB in the last year
• CDF usage now similar
  • Has just topped the PB
• Active SAM sites
  • 40 DØ, 26 CDF
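
The DØ volumes quoted above can be turned into sustained transfer rates with simple arithmetic. The sketch below is illustrative only: it assumes decimal units (1 TB = 10^12 bytes), a 30-day month and a 365-day year, none of which the slide states, and the function name is made up for this example.

```python
SECONDS_PER_DAY = 86_400


def sustained_rate_mb_per_s(volume_tb: float, days: float) -> float:
    """Average rate in MB/s needed to move volume_tb terabytes in `days` days.

    Assumes decimal units: 1 TB = 1e12 bytes, 1 MB = 1e6 bytes.
    """
    total_bytes = volume_tb * 1e12
    return total_bytes / (days * SECONDS_PER_DAY) / 1e6


# Peak month quoted on the slide: 200 TB in one (assumed 30-day) month.
peak_month = sustained_rate_mb_per_s(200, 30)

# Yearly total quoted on the slide: 2 PB = 2000 TB over 365 days.
yearly_avg = sustained_rate_mb_per_s(2000, 365)

print(f"peak month: {peak_month:.1f} MB/s, yearly average: {yearly_avg:.1f} MB/s")
```

Under these assumptions the peak month corresponds to roughly 77 MB/s sustained, and the yearly total to roughly 63 MB/s on average.
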
SAMGrid plots
• JIM: active execution sites: 11 DØ, 1 CDF in testing
• http://samgrid.fnal.gov:8080/ (09/09/04)
SAMGrid plots (continued; plots not reproduced)
DØ – Production - MC
• All DØ MC always produced off-site
• SAMGrid now default (went into production in Mar 04)
  • Based on request system and jobmanager-mc_runjob
  • MC software package retrieved via SAM
• Currently running at (multiple) sites in Cz, Fr, UK, USA (10 in total + FNAL)
  • More on the way, inc. central farm
• Average production efficiency ~90%
• Average inefficiency due to grid infrastructure ~1-5%
• For more details, see
  • GridPP10 DØ talk by Peter Love
  • http://www-d0.fnal.gov/computing/grid/deployment-issues.html
DØ – Production - Reprocessing
• P14, Autumn 2003
  • 25M events in UK
  • Based around mc_runjob
  • Distributed computing rather than Grid
  • UK effort key to project success
• P17, Autumn 2004
  • x10 larger, use of db proxy servers
  • SAMGrid as default
  • Use LCG resources
DØ – Production - LCG
• Increasing effort to ensure SAMGrid / LCG interoperability
• MC generated on EDG/LCG and other shared resources (inc. Imperial, RAL) “by hand”
  • All Nikhef MC produced this way
• Demo of sam_client functionality on LCG at London workshop in Apr
• Will use LCG resources for p17 data reprocessing
(DØ –) Runjob
Diagram labels: Runjob; CDFRunjob, CMSRunjob, DØRunjob
• mc_runjob currently used by SAMGrid for MC and reprocessing
• DØrunjob - the rewrite
  • Joint CDF, CMS, DØ, FNAL-CD project
  • Base classes from common Runjob package
  • DØrunjob available this autumn
  • Will incorporate Sandbox as a separate module
• For details see: http://projects.fnal.gov/runjob/
CDF – production - I
• See Mòrag Burgon-Lyon’s GridPP10 talk for details
• Goal 1: 25% of computing offsite by June 2004
  • Done, using DCAF and SAM
  • DCAF = de-centralised CDF analysis farm, core of 7 sites, more on the way
• Goal 2: 50% by June 2005, using Grid
  • Resources being identified / pledged
• JIM deployment
  • Originally planned for Oct 15th
  • Problematic, look at grid3 as possible alternative
CDF – production - II
• Migration of DCAF sites to Condor
• Migration to SAM V6
  • Switch to new internal dbserve code under test
  • Roll out to global sites expected soon
• FroNTier - new way to serve database contents to remote institutes
  • Should lower load on central CDF Oracle servers
• Studying methods to lower load and avoid fragmentation on remote file servers due to simultaneous network writes
(CDF -) SAMTV
• SAM TV used by CDF & DØ to monitor SAM and SAM stations
• Currently created from log files
• Version in development created from MIS database, filled by new MIS server
Summary / plans
DØ
• SAM & SAMGrid critical
• GridPP key part of effort
• SAMGrid, default for
  • MC production
  • Data reprocessing from autumn
  • Analysis to follow
    • dØ tools, dØrte, sandboxing
• Interoperability
CDF
• Good progress
• 25% of computing off-site
  • Most with DCAF/SAM
• GridPP effort key part of effort
• Increase to 50% for June 2005
  • More DCAF installations
  • Encourage user migration
UKLight - 10 Gbit/s - “data reprocessing”
Backup - I
From Peter Love’s GridPP10 talk