Status of CMS-HI Compute Proposal for USDOE
Charles F. Maguire (Vanderbilt University) for the CMS-HI Institutions
Version 1, 15:10 CDT on July 1
CMS CRB Meeting
HI Institutions in CMS
• US (all USDOE-NP except UC Davis, which is NSF):
  • Colorado, Iowa, Kansas, LANL, Maryland, Minnesota, MIT, Vanderbilt, UC Davis, UI Chicago
  • Nuclear Physics division of DOE is separate from HEP
• Non-US:
  • Athens, Auckland, Budapest, CERN, Chongbuk, Cukurova, Korea U, Lisbon, Lyon, Moscow, Mumbai, Seoul, Zagreb, Paris?
• Overall ~100 people (60 PhDs, 40 students)
CMS-HI Status for the US DOE
• Certain US groups have been receiving core grant support for CMS-HI simulations and R&D: research scientists, postdocs, students, travel, computing
• DOE Office of Nuclear Physics launched its support for the LHC HI program in 2007
• LHC physics is now part of the long-term RHI/NP plan in the US; both the ALICE and the CMS experiments are recognized
• US ALICE
  • Construction of EM Calorimeter for ALICE, ~13 M$ project, CD-1 signed
  • ~10 institutions interested, including LBL, LLNL, ORNL
  • ~50-70 people
• US CMS
  • HLT farm, ~2 M$ project + ZDC ~0.5 M$ + Computing (to be funded)
  • ~10 institutions interested, including LANL
  • ~50 people
Status of CMS-HI Groups in US
• CMS proposal to DOE-NP: review in October 2006
  • CPUs for online farm in FY08-FY10 (consistent with DAQ/Cittolin plans), ~2 M$:
    • Prototype HLT farm part of CMS DAQ for purchase now
  • Review request: “do more studies”. Launched a ~1 year physics project to simulate jet+γ in HI events, plus a complete revamp of HI software
• Operating funds
  • Cat-A + travel starting in FY07 and FY08 (16 PhDs)
  • Review called for a formal CMS-HI computing proposal
• Review follow-up activity
  • Response to review recommendations almost concluded
  • Computing proposal is being completed now
• Note: the “turn-on” RHIC->CMS is managed very carefully by the DOE-NP, and not all groups can start now
  • Managed via renewals and supplements of individual groups, e.g. Maryland, Vanderbilt
Internal Decisions about Computing
• CMS-HI Physics Will Need a Dedicated Compute Center
  • Different production and analysis schedules from HEP program
  • Different physics goals
• US CMS-HI Institutions Reviewed Compute Bids
  • Reviews occurred during late 2007 and early 2008
  • Used external (RHIC) consultants to comment on 3 bids
• Conclusion: Vanderbilt University will be the lead institution on the proposal to DOE
  • Main computer center to be situated on the Vanderbilt campus
  • Charlie Maguire as the Principal Investigator
  • Some fraction of computers and disk space will go to MIT
  • All other institutions will have excellent network connections; this includes the overseas CMS-HI institutions
Reasons for Vanderbilt Choice
• Local group has solid experience in RHI computing
  • Responsible for simulation of PHENIX since 1992
  • Remote nearly-real-time reconstruction of PHENIX data at Vanderbilt in 2007 (30 TBytes input raw data, 20 TBytes output, see slides 7 and 8)
• Vanderbilt has a strong history in RHI physics at RHIC
  • Good physics judgment on priorities and on preparing for data
• Vanderbilt group will work together with the university computing staff (10 persons at ACCRE facility)
  • Large existing computational facility for all of Vanderbilt
• Strong interaction with local HEP group in CMS (Paul Sheldon, Will Johns, and Dan Engh)
ACCRE Compute Facility at Vanderbilt
• ACCRE: Advanced Computing Center for Research and Education (www.accre.vanderbilt.edu)
• $8.5 M start-up grant from VU
• $1.5 M additional NSF funding
• Currently has ~1500 CPUs, may grow to ~3000 CPUs in 5 years, exclusive of CMS-HI purchases
• Serves a wide spectrum of university research, including especially medical applications
Real RHI Data Reconstruction at ACCRE
• PHENIX Raw Data (2007) Reconstruction Project: RHIC->Vanderbilt->RCF
• Near-real-time reconstruction effort, ~few days latency from calibration processes at local RHIC buffers
• 30 TBytes of raw data transferred to ACCRE during 6 weeks (no taping)
• 20 TBytes of reconstruction output returned to RCF for PHENIX users
• Highly automated assembly line procedures, used ~100 Perl scripts including web-based flow monitoring
• PHENIX 2007 data volumes are comparable to what is expected for the first year of CMS-HI production
• Near-real-time aspect is not a factor for CMS-HI, but tape archiving will be
Overview of CMS-HI Compute Proposal
• Bring RAW data to Vanderbilt, archive to tape
  • Expecting 300 TBytes of data transfers when nominal luminosity is achieved
  • Use of CERN Tier0 for immediate calibrations and limited reco
• Do real data reconstruction with preparation for two passes per year
• MC production at MIT and other places
• Distribute AODs to all members of CMS HI group around the world
  • AODs will be processed at Vanderbilt for most US CMS-HI groups
  • The USDOE-NP is highly influenced by the RCF model at RHIC
• We will use all the CMS tools: CRAB, PhEDEx, PAT, ...
  • Prepare by using existing centers this summer to exercise CMSSW software
• Build up to about 3000 CPUs (VU+MIT) over ~5 years, starting next year (US FY’09)
• The CMS-HI compute center will be part of a larger Vanderbilt University computing center (ACCRE)
  • Many synergies, research projects that can help us at VU (REDDnet)
  • Possibility for “opportunistic computing” beyond the 3000 CPU allocation
Input to the Draft Proposal
• Proposal is using information about timing and data reconstruction from HLT and γ-jet studies (USDOE review “jet challenge”)
• Some 3000 CPUs running for 12 months (reco, and real+MC analysis)
  • Total CPU power = 4.8 MSI2K, about 10% of CMS-HEP Tier1+Tier2 power
• ~400 TB of disk storage
• Tape archive (1.6 PBytes over 5 years)
• 10 Gbps network connection to CERN and other places (e.g. MIT)
• Good and debugged connections to all CMS HI institutions
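As a rough cross-check of the figures on this slide, the short Python sketch below derives the quantities they imply. The function and variable names are illustrative only. The per-CPU rating, the implied CMS-HEP total, and the even spread of the tape archive over five years are inferences from the quoted numbers, not values stated in the proposal.

```python
# Figures quoted on the slide.
N_CPUS = 3000              # CPUs at full build-out (VU + MIT)
TOTAL_POWER_MSI2K = 4.8    # total CPU power in MSI2K
HEP_FRACTION = 0.10        # "about 10% of CMS-HEP Tier1+Tier2 power"
TAPE_PB = 1.6              # tape archive over the full period
YEARS = 5


def per_cpu_ksi2k(total_msi2k: float, n_cpus: int) -> float:
    """Average rating per CPU in kSI2K implied by the totals."""
    return total_msi2k * 1000.0 / n_cpus


def implied_hep_total_msi2k(hi_msi2k: float, fraction: float) -> float:
    """CMS-HEP Tier1+Tier2 power implied by the quoted fraction."""
    return hi_msi2k / fraction


def tape_tb_per_year(tape_pb: float, years: int) -> float:
    """Average tape growth in TB/year, ASSUMING an even spread (decimal units)."""
    return tape_pb * 1000.0 / years


if __name__ == "__main__":
    print(f"Per-CPU rating:    {per_cpu_ksi2k(TOTAL_POWER_MSI2K, N_CPUS):.1f} kSI2K")
    print(f"Implied HEP total: {implied_hep_total_msi2k(TOTAL_POWER_MSI2K, HEP_FRACTION):.0f} MSI2K")
    print(f"Tape growth:       {tape_tb_per_year(TAPE_PB, YEARS):.0f} TB/year")
```

Under these assumptions the average tape growth comes out close to the ~300 TBytes of raw data expected per year, with the remainder presumably available for reconstruction output.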
Network Issue to Be Resolved: Action Item from USDOE-NP/ESnet Workshop
• USDOE-NP and ESnet Organization Workshop was held in May 2008
  • Purpose was to review projected WAN and LAN needs for the next 5 years
  • Major US nuclear physics experiments were requested to give case studies
    • RACF (RHIC and ATLAS Computing Facility) for PHENIX and STAR at RHIC
    • LBNL for US-ALICE
    • Vanderbilt University for US-CMS-HI
    • JLAB (Jefferson Laboratory) for CEBAF
• Action Item Affecting Both US-ALICE and US-CMS-HI
  • Possibility that the LHCnet capabilities would be saturated by HEP needs
    • CMS-HI forecasts 300 TBytes to be transported in one month (100% of CMS-HI raw data)
    • US-ALICE forecasts 100 TBytes to be transported in four months (10% of ALICE raw data)
  • LHCnet is a US-HEP priority
    • Related issue of the US Congress mandated cap on HEP expenditures for the LHC
  • What alternative paths exist to LHCnet for trans-Atlantic transport of data?
    • These alternative links are being investigated by the staff of the US Internet2 organization
    • The costs and consequences of the alternative links should be well understood by all sides
    • Both US-ALICE and US-CMS-HI must include such studies in their respective compute proposals
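To put the two transfer forecasts in terms of link capacity, the sketch below converts a data volume and a transfer window into an average sustained rate. It assumes decimal units (1 TByte = 10^12 bytes), a 30-day month, and a perfectly steady transfer. None of these is specified on the slide, and the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400


def sustained_gbps(terabytes: float, days: float) -> float:
    """Average rate in Gbit/s needed to move `terabytes` within `days`.

    Assumes decimal TBytes (1e12 bytes) and a constant transfer rate.
    """
    bits = terabytes * 1e12 * 8
    return bits / (days * SECONDS_PER_DAY) / 1e9


if __name__ == "__main__":
    # CMS-HI: 300 TBytes in one month (taken here as 30 days).
    print(f"CMS-HI:   {sustained_gbps(300, 30):.2f} Gbit/s")
    # US-ALICE: 100 TBytes in four months (taken here as 120 days).
    print(f"US-ALICE: {sustained_gbps(100, 120):.3f} Gbit/s")
```

With these assumptions the CMS-HI forecast averages a little under 1 Gbit/s, roughly a tenth of the 10 Gbps connection requested in the draft proposal. This is only a mean rate: real transfers are bursty, and the concern raised at the workshop is how much of LHCnet remains once HEP traffic is accounted for.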
Computing at Present for CMS-HI
• The CMS-HI compute proposal will take additional time and effort to get approved by the USDOE
  • First draft is almost completed
  • Discussions have taken place with the FNAL Tier1 experts
  • Preliminary version is now available to CMS Tier0 experts
  • Need to get the proposal to the DOE as soon as practical
  • Proposal will be scrutinized by external DOE reviewers
  • Earliest funding would be after FY’09 starts (October 2008)
• In the interim, MIT continues computing for CMS-HI
  • The HI-Tier-3 is living in symbiosis with the HEP-Tier-2
  • There are ~130 CPUs and 30 TB of disk dedicated to HI, with 1 GB/core, to be upgraded to 2 GB/core shortly
  • There is the possibility of opportunistic access to >1600 CPUs of the Tier-2
Future Computing for CMS-HI: Discussions at CMS-HI June Meeting
[Diagram of sites: CERN Tier0, Moscow, Budapest, MIT, Seoul, Vanderbilt, Paris…, Others?]
Summary
• A Dedicated CMS-HI Compute Facility is to be Proposed in the US
  • Facility at Vanderbilt will function as a combined Tier1/Tier2
    • Receipt and archiving to tape of the raw data from Tier0, ~300 TBytes/year
    • Reconstruction of raw data into RECO and AOD files
    • Processing of AOD files for analysis for US (and other) users
    • Distribution of some AOD files to overseas CMS-HI facilities
  • Approximately 25% of the CPUs and disk resources to be put at MIT
    • Continues MIT’s role as MC producer for CMS-HI
    • Retains and expands expertise of RHI group at MIT in CMSSW
• Proposal Time Scale (www.hep.vanderbilt.edu/~maguirc/CMS-HI/cmsHIComputingProposal.pdf)
  • First draft should be completed within 2 weeks
  • Distribution to CMS computing experts for their comments
  • Revised draft should be submitted to the USDOE-NP as soon as practical
• Integration of CMS-HI Compute Facility with Rest of CMS Computing
  • CMS-HI compute facility should function as much as possible like other CMS Tier1 and Tier2 facilities
  • CMS-HI institutions must take advantage of all of the developments and tools available at the other CMS compute facilities
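For a sense of scale, the sketch below applies the approximate 25% MIT share to the totals quoted on earlier slides (about 3000 CPUs and ~400 TB of disk at full build-out). It assumes the same fraction applies to both resources and that the remainder sits at Vanderbilt. The slides give only the approximate share, so the resulting numbers are illustrative, as is the function name.

```python
def split_resources(total: float, mit_fraction: float = 0.25) -> dict:
    """Split a resource total between MIT and Vanderbilt.

    ASSUMPTION: everything not placed at MIT is located at Vanderbilt.
    """
    mit = total * mit_fraction
    return {"MIT": mit, "Vanderbilt": total - mit}


if __name__ == "__main__":
    cpus = split_resources(3000)    # CPUs at full build-out
    disk_tb = split_resources(400)  # disk storage in TB
    print(f"CPUs:      MIT {cpus['MIT']:.0f}, Vanderbilt {cpus['Vanderbilt']:.0f}")
    print(f"Disk (TB): MIT {disk_tb['MIT']:.0f}, Vanderbilt {disk_tb['Vanderbilt']:.0f}")
```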