The Texas High Energy Grid (THEGrid) A Proposal to Build a Cooperative Data and Computing Grid for High Energy Physics and Astrophysics in Texas Alan Sill, Texas Tech University Jae Yu, Univ. of Texas, Arlington Representing HiPCAT and the members of this workshop
Outline • High Energy Physics and Astrophysics in Texas • Work up to this workshop • CDF, DØ, ATLAS, CMS experiments • Problems • A solution • Implementation of the solution • DØ and CDF Grid status • ATLAS, CMS • Etc. • Status • Summary and Plans
High Energy Physics in Texas • Several universities • UT, UH, Rice, TTU, TAMU, UTA, UTB, UTEP, SMU, UTD, etc. • Many different research facilities used • Fermi National Accelerator Laboratory • CERN (Switzerland), DESY (Germany), and KEK (Japan) • Jefferson Lab • Brookhaven National Lab • SLAC (CA) and Cornell • Natural sources and underground labs • Sizable community, variety of experiments and needs • Very large data sets now! Even larger ones coming!!
The Problem • High Energy Physics and Astrophysics data sets are huge • Total expected data size is over 5 PB for CDF and DØ, 10x larger for the CERN experiments • Detectors are complicated and need many people to construct and operate • Software is equally complicated • Collaborations are large and scattered all over the world • Solution: Use the opportunity presented by these large data sets to advance grid computing technology • Allow software development and use at remote institutions • Optimize resource management, job scheduling, monitoring tools, and use of resources • Efficient and transparent data delivery and sharing • Improve computational capability for education • Improve quality of life for researchers and students
Work up to this point • What is HiPCAT? High Performance Computing Across Texas: a network and organization of computing centers and their directors at many Texas universities • Other HiPCAT projects (TIGRE, cooperative education, etc.) • A natural forum for this proposal • First presentation April 2003 • Many discussions since then • Led to this workshop
DØ and CDF at the Fermilab Tevatron (near Chicago) [diagram: CDF and DØ detectors on the Tevatron proton-antiproton ring] • World's highest energy proton-antiproton collider • Ecm = 1.96 TeV (= 6.3×10⁻⁷ J/p; ~13 MJ on 10⁻⁶ m²) • Equivalent to the kinetic energy of a 20 t truck at a speed of 80 mi/hr • Currently generating data at over a petabyte per year
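A quick back-of-the-envelope check of the truck comparison above, assuming 20 t = 2×10⁴ kg and 80 mi/hr ≈ 36 m/s (these unit conversions are mine, not from the slide):

E_{\mathrm{kin}} = \tfrac{1}{2} m v^{2} \approx \tfrac{1}{2}\,(2\times 10^{4}\ \mathrm{kg})\,(36\ \mathrm{m/s})^{2} \approx 1.3\times 10^{7}\ \mathrm{J} \approx 13\ \mathrm{MJ}

which is consistent with the ~13 MJ of stored beam energy quoted on the slide.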
CDF Data Analysis Flow [diagram] • Detector (0.75 million channels, 7 MHz beam crossing) → Level 1/Level 2 triggers (300 Hz) → Level 3 Trigger (~250 duals, 75 Hz) → Production Farm (~150 duals, simulation and reconstruction) and robotic tape storage (20 MB/s read/write) → Central Analysis Farm (CAF, ~500 duals) → user desktops for data analysis • Large-scale cluster computing duplicated all over the world: distributed clusters in Italy, Germany, Japan, Taiwan, Spain, Korea, several places in the US, the UK, and Canada (more coming)
Current Grid Framework (JIM) [diagram: TTU and UTA sites connected to the Tevatron through the JIM grid framework]
CDF-GRID: Example of a working practical grid • CDF-GRID, based on DCAF clusters, is a de-facto working high energy physics distributed computing environment • Built and developed to be clonable; deployment led by TTU • Large effort on tools usable both on- and off-site • Data access (SAM, dCache) • Remote / multi-level DB servers • Store from remote sites to tape/disk at FNAL • User MC jobs at remote sites are a reality now • Analysis on remote data samples is being developed using SAM • Up and working, already used for physics! • Many pieces borrowed from, developed with, or shared with DØ • This effort is making HEP remote analysis possible → practical → working → easy for physicists to adopt
Basic tools • Sequential Access via Metadata (SAM) • Data replication and cataloging system • Batch systems • FBSNG: Fermilab's own batch system • Condor • Three of the DØSAR farms consist of desktop machines under Condor • CDF: most central resources are already based on Condor • PBS • More general than FBSNG; most dedicated DØSAR farms use this manager • Part of the popular Rocks cluster configuration environment • Grid framework: JIM (Job and Information Management) • Provides the framework for grid operation: job submission, matchmaking, and scheduling • Built upon Condor-G and Globus • MonALISA, Ganglia, and user monitoring tools • Everyone has an account (with suitable controls), so everyone can submit! (A minimal submission sketch follows below.)
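To make "everyone can submit" concrete, here is a minimal sketch of driving a Condor submission from Python. The submit-description keywords are standard Condor/Condor-G usage, but the gatekeeper host, executable, and dataset names are hypothetical placeholders; a real JIM/SAM-Grid job goes through its own submission wrappers rather than a hand-written file like this.

# Minimal sketch: write a Condor submit description and hand it to condor_submit.
# The gatekeeper host, executable, and dataset below are hypothetical placeholders.
import subprocess
import textwrap

submit_description = textwrap.dedent("""\
    universe      = grid
    grid_resource = gt2 gatekeeper.example.edu/jobmanager-pbs
    executable    = run_analysis.sh
    arguments     = --dataset my_sample
    output        = job.out
    error         = job.err
    log           = job.log
    queue 1
""")

with open("job.submit", "w") as f:
    f.write(submit_description)

# Requires a local Condor installation with condor_submit on the PATH.
subprocess.run(["condor_submit", "job.submit"], check=True)

Changing only the grid_resource line retargets the same user job at a different site, which is the basic mechanism Condor-G provides underneath the JIM framework.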
Data Handling: Operation of a SAM Station [diagram showing the station & cache manager, file storage server, file storage clients, file stager(s), project managers, eworkers, producers/consumers, temp disk, cache disk, and MSS or other stations, with separate data-flow and control paths]
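As a purely illustrative toy (not SAM code), the "station & cache manager" box in the diagram above can be thought of as a bounded local cache that stages files in from MSS or another station and evicts least-recently-used files when space runs out:

# Toy illustration of the station & cache manager role in a SAM station.
# This is NOT the real SAM implementation; it only sketches a bounded local
# cache with least-recently-used eviction.
from collections import OrderedDict

class StationCache:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.files = OrderedDict()          # file name -> size, ordered by last use

    def request(self, name, size, fetch):
        """Return a cached file, fetching (and evicting) as needed."""
        if name in self.files:
            self.files.move_to_end(name)    # mark as recently used
            return name
        # Evict least-recently-used files until the new file fits.
        while self.files and sum(self.files.values()) + size > self.capacity:
            evicted, _ = self.files.popitem(last=False)
            print(f"evicting {evicted}")
        fetch(name)                          # e.g. stage in from tape/MSS or another station
        self.files[name] = size
        return name

# Example: a 2 GB cache with a user-supplied stager callback (file names made up).
cache = StationCache(capacity_bytes=2 * 10**9)
cache.request("raw_run12345.dat", 800 * 10**6, fetch=lambda n: print(f"staging {n}"))

The real SAM station layers cataloging, project management, and routing between stations on top of this basic caching idea.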
The tools, cont'd • Local task management • CDF Grid (http://cdfkits.fnal.gov/grid/) • Decentralized CDF Analysis Farm (DCAF) • Develop code anywhere (laptops are supported) • Submit to FNAL or TTU or CNAF or Taiwan or San Diego or… • Get output almost everywhere (most desktops are OK) • User monitoring system including Ganglia, with information by queue/user per cluster • DØSAR (DØ Southern Analysis Region) • Monte Carlo Farm (McFarm) management (cloned to other institutions) • DØSAR Grid: submit requests on a local machine; the requests get transferred to a submission site and executed at an execution site (see the toy sketch below) • Various monitoring software • Ganglia resource monitoring • McFarmGraph: MC job status monitoring • McPerM: farm performance monitor
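The DØSAR submission path mentioned above (local machine → submission site → execution site) is sketched below as a toy; the site names and the selection rule are invented for illustration and are not the actual McFarm/DØSAR logic, which uses proper match-making rather than a random choice.

# Toy sketch of the DØSAR routing idea: a request entered on a local machine
# is forwarded to a submission site, which picks an execution site.
# Site names and the selection rule are illustrative only.
import random

EXECUTION_SITES = ["UTA-farm", "LTU-farm", "OU-farm"]   # hypothetical names

def submit_locally(request):
    print(f"local machine: forwarding '{request}' to submission site")
    return route_at_submission_site(request)

def route_at_submission_site(request):
    site = random.choice(EXECUTION_SITES)   # real brokers use load and match-making
    print(f"submission site: sending '{request}' to {site}")
    return execute(request, site)

def execute(request, site):
    print(f"{site}: running '{request}'")
    return f"{request} done at {site}"

if __name__ == "__main__":
    submit_locally("generate 10k ttbar Monte Carlo events")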
Background statistics on the CDF Grid • Data acquisition and data logging rates have increased • More data = more physicists • Approved by FNAL's Physics Advisory Committee and Director • Computing needs grow, but the DOE/FNAL-CD budget is flat • CDF proposal: do 50% of analysis work offsite • CDF-GRID: planned at Fermilab, with deployment effort led by TTU • Have a plan for how to do it • Have most tools in place and in use • Already in deployment at several locations throughout the world
DØSAR MC Delivery Statistics (as of May 10, 2004) [chart; source: "D0 Grid/Remote Computing", April 2004, Joel Snow, Langston University]
Current Texas Grid Status • DØSAR Grid • At the recent workshop at Louisiana Tech University: 6 clusters form a regional computational grid for MC production • Simulated data production on the grid is in progress • Institutions are paired to bring up new sites more quickly • Collaboration between the DØSAR consortium and the JIM team at Fermilab has begun for further software development • CDF Grid • Less functionality than more ambitious HEP efforts, such as the LHC Grid, but • Works now! Already in use!! • Deployment led by TTU • Tuned to users' needs • Goal-oriented (not just object-oriented) software! • Based on working models and sparing use of standards • Costs little to get started • A large amount of documentation and grid computing expertise has already accumulated between TTU and UTA • Comparable experience is probably available at other Texas institutions
Also have Sloan Digital Sky Survey and other astrophysics work • TTU SDSS DR1 mirror copy (first in the world) • Locally hosted MySQL DB • Image files stored on university NAS storage • Submitted a proposal with Astronomy and CS colleagues for a nationally oriented database storage model based on a new local observatory • Virtual Observatory (VO) storage methods: international standards under development • Astrophysics is increasingly moving toward Grid methods
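As an illustration of using the locally hosted SDSS DR1 catalog, here is a minimal query sketch. It assumes a SkyServer-style PhotoObj table (objID, ra, dec, ugriz magnitudes) and uses placeholder host, credentials, and database names; the actual TTU mirror schema and connection details may differ.

# Minimal sketch: select bright objects in a small sky patch from a locally
# mirrored SDSS DR1 catalog. Host, credentials, and database name are
# placeholders; the table/column names follow the usual SkyServer convention.
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(
    host="sdss-mirror.example.edu",    # hypothetical local mirror host
    user="reader", password="secret",  # placeholder credentials
    database="sdss_dr1",               # placeholder database name
)
cur = conn.cursor()
cur.execute(
    """
    SELECT objID, ra, `dec`, r
    FROM PhotoObj
    WHERE ra BETWEEN %s AND %s
      AND `dec` BETWEEN %s AND %s
      AND r < %s
    """,
    (180.0, 181.0, -1.0, 1.0, 18.0),   # 1x2 degree patch, r brighter than 18
)
for obj_id, ra, dec, r_mag in cur.fetchall():
    print(obj_id, ra, dec, r_mag)
conn.close()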
Summary and Plans • Significant progress has been made within Texas in implementing grid computing technologies for current and future HEP experiments • UTA and TTU are playing leading roles in the Tevatron grid effort for the currently running DØ and CDF experiments as well as in the LHC experiments ATLAS and CMS • All HEP experiments are building operating grids for MC data production • A large amount of documentation and expertise exists within Texas! • Already doing MC; moving toward data re-processing and analysis • Different levels of complexity can be handled by the emerging framework • Improvements to infrastructure are necessary, especially with respect to network bandwidth • THEGrid will boost the stature of Texas in the HEP grid computing world • Regional plans: started working with AMPATH, Oklahoma, Louisiana, and Brazilian consortia (tentatively named the BOLT Network) • Need a Texas-based consortium to make progress in HEP and astrophysics computing
Summary and Plans, cont'd • Many pieces of global grid development are shared between the DØ and CDF experiments: this provides a template for THEGrid work • Near-term goals: • Involve other institutions, including those in Texas • Implement and use an analysis grid 4 years before the LHC • Work in close relation with, but not (so far) as part of, the LHC Grid • Other experiments will benefit from feedback and use cases • Lead the development of these technologies for HEP • Involve other experiments and disciplines; expand the grid • Complete the THEGrid document • THEGrid will provide ample opportunity to increase inter-disciplinary research and education activities