R-GMA and DØ
Iain Bertram
RAL, 13 May 2004
Thanks to Jeff Templon at Nikhef
Background
• DØ uses SAM as its Datagrid (http://projects.fnal.gov/samgrid/)
• All official MC production carried out off-site
  • I.e. not at FNAL
  • Store in SAM
• Carried out significant fraction of data reprocessing off-site
  • Access and store data in SAM
DØ and EDG/LCG
• Nikhef group have implemented submission of DØ jobs on LCG
  • MC production
  • Data reconstruction
• Notes from Jeff Templon.
  • Caveat: Jeff is the expert. I am not! Therefore I may have trouble answering questions (my technical experts are at the 4 corners of the globe…).
Monitoring using R-GMA
• From within python script:
    worker_node = socket.getfqdn()
    site = worker_node[string.find(worker_node,'.')+1:]
    jstabl.set_val('site',site)
    jstabl.set_val('start_time',start_time)
    cmdline = string.join(sys.argv)
    jstabl.set_val('command',cmdline)
    jstabl.insert()
• Under the hood: R-GMA (EDG product)
• Can easily be replaced, as long as nothing more than “set_val” and “insert” is required … R-GMA has an SQL-like structure
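The point that the backend is replaceable as long as jobs only call “set_val” and “insert” can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name `JobStatusTable`, the `job_status` table and its three columns, and the use of an in-memory SQLite database as a stand-in for R-GMA. It is not the R-GMA Python API.

```python
import socket
import sqlite3
import sys
import time


class JobStatusTable:
    """Hypothetical stand-in for the slide's `jstabl` object.

    Only the two calls used on the slide are provided: set_val and insert.
    SQLite replaces R-GMA here purely for illustration.
    """

    COLUMNS = ("site", "start_time", "command")  # assumed schema

    def __init__(self, connection):
        self._conn = connection
        self._row = {}
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS job_status "
            "(site TEXT, start_time REAL, command TEXT)"
        )

    def set_val(self, column, value):
        """Stage one column value for the next insert."""
        if column not in self.COLUMNS:
            raise KeyError(f"unknown column: {column}")
        self._row[column] = value

    def insert(self):
        """Publish the staged row, then clear the staging area."""
        values = [self._row.get(c) for c in self.COLUMNS]
        self._conn.execute("INSERT INTO job_status VALUES (?, ?, ?)", values)
        self._conn.commit()
        self._row = {}


def site_from_fqdn(fqdn):
    """Drop the host part of a fully qualified name, as the slide's slice does."""
    return fqdn[fqdn.find(".") + 1:]


def publish_job_start(table, argv):
    """Same sequence of calls as on the slide, written for Python 3."""
    table.set_val("site", site_from_fqdn(socket.getfqdn()))
    table.set_val("start_time", time.time())
    table.set_val("command", " ".join(argv))
    table.insert()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    publish_job_start(JobStatusTable(conn), sys.argv)
    print(conn.execute("SELECT * FROM job_status").fetchall())
```

Because the job script only touches `set_val` and `insert`, swapping the storage layer means rewriting this one class and nothing in the job itself.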
J. Templon Comments
• It was useful not to worry about details of where the servers are; you use commands such as "DEFINE TABLE" and "INSERT" or "LATEST SELECT".
• R-GMA looked like a giant distributed database.
• The SQL model worked well for what we wanted to do.
• The down side is that the archiver process (the thing that sucks in the published records from jobs, and puts them in a database) is not ready for prime time.
• It never stays up for more than a few days at a time, and it often dies in a way that fools the babysitting script into thinking that it is still alive.
• This of course is deadly.
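One common way to avoid a babysitting script being fooled by a process that is present but no longer working is to check for recent output as well as process existence. The sketch below is an illustration of that idea only, not the script used at Nikhef. The function names, the 600-second silence threshold, and the way `last_record_time` is obtained (for example, the timestamp of the newest archived row) are all assumptions.

```python
import time


def archiver_is_healthy(process_alive, last_record_time, now=None, max_silence=600.0):
    """Return True only if the process exists AND archived a record recently.

    process_alive    -- result of the usual "is the process there" check
    last_record_time -- Unix time of the newest archived record, or None
    max_silence      -- assumed threshold in seconds; tune to the job rate
    """
    if now is None:
        now = time.time()
    if not process_alive:
        return False
    if last_record_time is None:
        return False
    return (now - last_record_time) <= max_silence


def babysit_once(process_alive, last_record_time, restart, now=None, max_silence=600.0):
    """Call restart() when the archiver looks dead or stuck; report whether it did."""
    if archiver_is_healthy(process_alive, last_record_time, now, max_silence):
        return False
    restart()
    return True
```

A threshold like this only makes sense when jobs publish records steadily; on a quiet grid, a periodic test record would be needed to tell "idle" from "stuck".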
LCG/EDG Problems
• Single Storage Machine => bottleneck
  • “WP5” SEs
  • Traffic Jams
• R-GMA not really stable until end December
  • Couldn’t submit jobs
  • Missed monitoring records
• Software distribution reliable but extremely inefficient
• Poor submission command throughput
Plans
• All MC and data production will be running on SAM computational grid by summer
  • MC by June 1
  • Data reprocessing scheduled for later this year.
• FNAL DØ farm will move to SAM-grid.
• Plan to support interfaces to LCG for this processing
  • Runjob will interface directly to LCG
Needs
• Database Proxy Servers
  • Need to access trigger/calibration issues
  • Oracle database
  • The DB proxy design is in principle generic, being based on CORBA (Common Object Request Broker Architecture), which wraps the SQL queries. A two-stage cache is used, RAM and disk, each with a configurable size; e.g. the cache sizes we currently have configured are on the order of a couple of GB.
• Interface between SE and SAM?
  • Can store our files directly to SAM from LCG site
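The two-stage cache idea (RAM first, then disk, each with a configurable size) can be sketched as follows. This is not the DØ DB proxy, which is CORBA-based. The class name `TwoStageCache`, the `fetch` callable standing in for the real database query, and the least-recently-used eviction policy are assumptions, since the slide does not say how entries are evicted.

```python
import hashlib
import os
import tempfile
from collections import OrderedDict


class TwoStageCache:
    """Illustrative RAM + disk cache with byte limits for each stage."""

    def __init__(self, disk_dir, ram_limit_bytes, disk_limit_bytes):
        self._disk_dir = disk_dir
        self._ram_limit = ram_limit_bytes
        self._disk_limit = disk_limit_bytes
        self._ram = OrderedDict()   # key -> bytes, least recently used first
        self._disk = OrderedDict()  # key -> size in bytes, same ordering
        self._ram_bytes = 0
        self._disk_bytes = 0

    def _path(self, key):
        name = hashlib.sha256(key.encode()).hexdigest()
        return os.path.join(self._disk_dir, name)

    def get(self, key, fetch):
        """Return the value for key: RAM first, then disk, then fetch(key)."""
        if key in self._ram:
            self._ram.move_to_end(key)
            return self._ram[key]
        if key in self._disk:
            self._disk.move_to_end(key)
            with open(self._path(key), "rb") as f:
                value = f.read()
        else:
            value = fetch(key)  # assumed to return bytes
            self._store_on_disk(key, value)
        self._store_in_ram(key, value)
        return value

    def _store_in_ram(self, key, value):
        if len(value) > self._ram_limit:
            return  # too large for the RAM stage
        self._ram[key] = value
        self._ram_bytes += len(value)
        while self._ram_bytes > self._ram_limit:
            _, old = self._ram.popitem(last=False)
            self._ram_bytes -= len(old)

    def _store_on_disk(self, key, value):
        if len(value) > self._disk_limit:
            return  # too large for the disk stage
        with open(self._path(key), "wb") as f:
            f.write(value)
        self._disk[key] = len(value)
        self._disk_bytes += len(value)
        while self._disk_bytes > self._disk_limit:
            old_key, size = self._disk.popitem(last=False)
            os.remove(self._path(old_key))
            self._disk_bytes -= size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = TwoStageCache(tmp, ram_limit_bytes=1024, disk_limit_bytes=4096)
        print(cache.get("SELECT 1", lambda q: b"result for " + q.encode()))
```

An entry evicted from RAM is still served from disk, so the backing database is only queried again once the entry has left both stages.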