DØSAR: A Regional Grid within DØ
Jae Yu, Univ. of Texas, Arlington
THEGrid Workshop, July 8–9, 2004, Univ. of Texas at Arlington
The Problem
• High Energy Physics
• Total expected data size is over 5 PB (a 5,000-inch stack of 100 GB hard drives) for CDF and DØ; see the back-of-envelope check below
• Detectors are complicated → need many people to construct them and make them work
• Collaboration is large and scattered all over the world
• Allow software development at remote institutions
• Optimized resource management, job scheduling, and monitoring tools
• Efficient and transparent data delivery and sharing
• Use the opportunity of having a large data set to further grid computing technology
• Improve computational capability for education
• Improve quality of life
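A back-of-envelope check of the data-volume claim above (a sketch, not from the original talk; the per-drive thickness is an illustrative assumption):

# Rough check of the "5 PB on 100 GB drives" figure (illustrative only).
total_bytes   = 5e15            # 5 PB
drive_bytes   = 100e9           # 100 GB per drive
drives_needed = total_bytes / drive_bytes            # 50,000 drives
inches_per_drive = 0.1          # assumed packing; the slide's 5,000-inch stack implies ~0.1 in/drive
stack_inches  = drives_needed * inches_per_drive
print(f"{drives_needed:,.0f} drives, ~{stack_inches:,.0f} inch stack")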
DØ and CDF at the Fermilab Tevatron
[Diagram: the Tevatron ring near Chicago, with the CDF and DØ detectors on the proton–antiproton beams]
• World's highest-energy proton–antiproton collider
• Ecm = 1.96 TeV (= 6.3×10⁻⁷ J/p; ~13 MJ on 10⁻⁶ m²)
• Equivalent to the kinetic energy of a 20 t truck at a speed of 80 mi/hr (see the quick check below)
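A quick numerical check of the truck comparison (illustrative only; the mass and speed are taken from the bullet above):

# Kinetic energy of a 20 t truck at 80 mi/hr, for comparison with the ~13 MJ beam figure.
mass_kg   = 20_000                     # 20 t
speed_ms  = 80 * 1609.34 / 3600        # 80 mi/hr ≈ 35.8 m/s
ke_joules = 0.5 * mass_kg * speed_ms**2
print(f"Truck kinetic energy ≈ {ke_joules/1e6:.1f} MJ")   # ≈ 12.8 MJ, i.e. the ~13 MJ on the slide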
DØ Collaboration
• 650 collaborators
• 78 institutions
• 18 countries
Centralized Deployment Models
Started with a Lab-centric SAM infrastructure in place, …
…transitioning to a hierarchically distributed model
DØ Remote Analysis Model (DØRAM)
[Diagram: Central Analysis Center (CAC) at Fermilab → Regional Analysis Centers (RACs) → Institutional Analysis Centers (IACs) → Desktop Analysis Stations (DASs), with normal and occasional interaction/communication paths between tiers]
DØ Southern Analysis Region (DØSAR)
• One of the regional grids within the DØGrid
• A consortium coordinating activities to maximize computing and analysis resources, complementing the European efforts
• UTA, OU, LTU, LU, SPRACE, Tata, KSU, KU, Rice, UMiss, CSF, UAZ
• MC farm clusters – a mixture of dedicated and multi-purpose, rack-mounted and desktop machines, with 10's–100's of CPUs
• http://www-hep.uta.edu/d0-sar/d0-sar.html
DØRAM Implementation
• UTA is the first US DØRAC
• DØSAR formed around UTA
[Map of participating sites: KSU, OU/LU, KU, UAZ, Ole Miss, UTA, LTU, Rice, Mexico/Brazil, and the European centers Aachen, Bonn, Wuppertal, Mainz, Munich, and GridKa (Karlsruhe)]
UTA – RAC (DPCC)
• 84 P4 Xeon 2.4 GHz CPUs = 202 GHz
• 7.5 TB of disk space
• 100 P4 Xeon 2.6 GHz CPUs = 260 GHz
• 64 TB of disk space
• Total CPU: 462 GHz
• Total disk: 73 TB
• Total memory: 168 GB
• Network bandwidth: 68 Gb/sec
The tools
• Sequential Access via Metadata (SAM)
  • Data replication and cataloging system
• Batch systems
  • FBSNG: Fermilab's own batch system
  • Condor: three of the DØSAR farms consist of desktop machines under Condor
  • PBS: most of the dedicated DØSAR farms use this manager
• Grid framework: JIM = Job Inventory Management
  • Provides the framework for grid operation: job submission, matchmaking, and scheduling (see the illustrative sketch below)
  • Built upon Condor-G and Globus
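A minimal sketch of handing a job to Condor on a submission host, for orientation only; it shows plain Condor submission rather than the full Condor-G/JIM path, and the submit-file contents, script name, and arguments are hypothetical, not the DØSAR configuration:

# Minimal sketch: hand a job to Condor from Python (hypothetical values throughout).
import subprocess, textwrap

submit_description = textwrap.dedent("""\
    universe   = vanilla
    executable = run_mc.sh
    arguments  = --events 1000
    output     = mc.out
    error      = mc.err
    log        = mc.log
    queue
""")

with open("mc.submit", "w") as f:
    f.write(submit_description)

# condor_submit is the standard Condor submission command
subprocess.run(["condor_submit", "mc.submit"], check=True)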
Operation of a SAM Station
[Diagram: the Station & Cache Manager coordinating File Storage Clients, a File Storage Server, File Stager(s), Project Managers, and eworkers; producers/consumers exchange files through temp disk and cache disk, with data flowing to and from MSS or other stations under the manager's control; a conceptual sketch follows]
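A conceptual sketch of the cache logic a station manager provides to its consumers (this is not SAM code; the class and method names are invented for illustration):

# Conceptual illustration of a SAM-station-style cache (not actual SAM code).
class Station:
    def __init__(self, cache_dir, mss):
        self.cache = {}          # file name -> path on cache disk
        self.cache_dir = cache_dir
        self.mss = mss           # mass storage system or another station (hypothetical interface)

    def deliver(self, file_name):
        """Give a consumer a local path, staging the file in on a cache miss."""
        if file_name not in self.cache:
            # a file stager copies the file from MSS (or a peer station) into the cache disk
            local_path = self.mss.stage_in(file_name, self.cache_dir)
            self.cache[file_name] = local_path
        return self.cache[file_name]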
Tevatron Grid Framework (JIM)
[Figure: the JIM framework in operation at the TTU and UTA sites]
The tools cont'd
• Local task management
• DØSAR
  • Monte Carlo Farm (McFarm) management, cloned to other institutions
  • Various monitoring software: Ganglia (resources), McFarmGraph (MC job status), McPerM (farm performance)
  • DØSAR Grid: requests are submitted on a local machine, transferred to a submission site, and executed at an execution site (see the sketch below)
• DØGrid
  • Uses the mcrun_job request script
  • More adaptable to a generic cluster
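An illustrative sketch of the local-machine → submission-site → execution-site flow mentioned above (the site names and selection heuristics are hypothetical, not the actual DØSAR implementation):

# Conceptual routing of an MC request through the grid tiers (hypothetical sites and heuristics).
def route_request(request, submission_sites, execution_sites):
    # the client hands the request to the least-loaded submission site...
    sub_site = min(submission_sites, key=lambda s: s["queue_depth"])
    # ...which matches it to an execution site with free CPUs
    exe_site = max(execution_sites, key=lambda s: s["free_cpus"])
    return {"request": request, "submitted_at": sub_site["name"], "executed_at": exe_site["name"]}

submission_sites = [{"name": "UTA", "queue_depth": 3}, {"name": "OU", "queue_depth": 1}]
execution_sites  = [{"name": "LTU", "free_cpus": 12}, {"name": "SPRACE", "free_cpus": 40}]
print(route_request("mc_request_001", submission_sites, execution_sites))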
Ganglia Grid Resource Monitoring
• Operating since Apr. 2003 (see the polling sketch below)
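Ganglia's gmond daemon serves its cluster state as XML over TCP (port 8649 by default), so a resource page like this one can be fed by a small poller such as the sketch below; the host name is hypothetical:

# Poll a gmond daemon for cluster state (hypothetical host; gmond's default XML port is 8649).
import socket
import xml.etree.ElementTree as ET

def read_gmond(host="gmond.example.edu", port=8649):
    with socket.create_connection((host, port)) as s:
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return ET.fromstring(b"".join(chunks))

root = read_gmond()
for host in root.iter("HOST"):
    print(host.get("NAME"), host.get("REPORTED"))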
Job Status Monitoring: McFarmGraph
• Operating since Sept. 2003
Farm Performance Monitor: McPerM
• Operating since Sept. 2003
• Designed, implemented, and improved by UTA students
DØSAR MC Delivery Statistics (as of May 10, 2004)
[Delivery statistics table not preserved; slide credit: Joel Snow, Langston University, "DØ Grid/Remote Computing," April 2004]
How does the current Tevatron MC Grid work?
[Diagram: a client site sends requests to submission sites, which dispatch jobs to execution sites in the regional grids (dedicated and desktop clusters) and to the global grid, with SAM handling data delivery]
Summary and Plans
• Significant progress has been made in implementing grid computing technologies for the DØ experiment
• The DØSAR Grid has been operating since April 2004
  • A large amount of documentation and expertise has been accumulated
• Moving toward data reprocessing and analysis
  • The first partial reprocessing of 180 million events has been completed
  • Different levels of complexity
• Improved infrastructure is necessary, especially network bandwidth
  • LEARN will boost the stature of Texas in the HEP grid computing world
• Started working with AMPATH and the Oklahoma, Louisiana, and Brazilian consortia (tentatively named the BOLT Network) → need the Texan consortium
• UTA's experience with the DØSAR Grid will be an important asset for expeditious implementation of THEGrid