
Presentation Transcript


1. MONARC Project Status Report
• http://www.cern.ch/MONARC
• Harvey Newman, California Institute of Technology
• http://l3www.cern.ch/monarc/hepccc_hbn70700.ppt
• HEP-CCC Meeting, SLAC, July 7, 2000

2. Four Experiments: The Petabyte to Exabyte Challenge
• ATLAS, CMS, ALICE, LHCb
• Higgs and new particles; Quark-Gluon Plasma; CP Violation
• Data written to “tape”: ~5 Petabytes/year and up (1 PB = 10^15 Bytes)
• 0.1 to 1 Exabyte (1 EB = 10^18 Bytes) total for the LHC experiments (~2010, ~2020?)

3. LHC Computing Challenges: Different from Previous Generations
• Geographical dispersion: of people and resources (~5000 physicists, 250 institutes, ~50 countries)
• Complexity: the detector and the LHC environment
• Scale: Petabytes per year of data
• Major challenges associated with:
• coordinated use of distributed computing resources
• remote software development and physics analysis
• communication and collaboration at a distance
• R&D: a new form of distributed system: the Data Grid

4. MONARC: Common Project
• Models Of Networked Analysis At Regional Centres
• Caltech, CERN, Columbia, FNAL, Heidelberg, Helsinki, INFN, IN2P3, KEK, Marseilles, MPI Munich, Orsay, Oxford, Tufts
• [Diagram: model circa 2005 or 2006 — CERN (650k SI95, 540 TB disk, robot) linked at 622 Mbits/s to Tier1 centres such as FNAL/BNL (130k SI95, 110 TB disk, robot), Tier2 centres (25k SI95, 20 TB disk, robot) and universities, with optional air freight]
• PROJECT GOALS
• Develop “Baseline Models”
• Specify the main parameters characterizing the Models’ performance: throughputs, latencies
• Verify resource requirement baselines (computing, data handling, networks)
• TECHNICAL GOALS
• Define the Analysis Process
• Define RC Architectures and Services
• Provide guidelines for the final Models
• Provide a simulation toolset for further Model studies (CHEP2000 paper F148)

5. To Solve: the LHC “Data Problem”
• While the proposed LHC computing and data handling facilities are large by present-day standards, they will not support FREE access, transport or processing for more than a minute part of the data
• Balance between proximity to large computational and data handling facilities, and proximity to end users and more local resources for frequently accessed datasets
• Strategies must be studied (simulated) and prototyped, to ensure acceptable turnaround times and efficient resource utilization
• Problems to be explored:
• How to meet the demands of hundreds of users who need transparent access to local and remote data, in disk caches and tape stores
• Prioritise hundreds of requests from local and remote communities, consistent with local and regional policies
• Ensure that the system is dimensioned/used/managed optimally, for the mixed workload

6. MONARC History
• Spring 1998: First distributed centre models (Bunn; Von Praun)
• 6/1998: Presentation to LCB; Project Assignment Plan
• Summer 1998: MONARC Project startup (ATLAS, CMS, LHCb)
• 9-10/1998: Project Execution Plan; approved by LCB
• 1/1999: First analysis process to be modeled
• 2/1999: First Java-based simulation models (I. Legrand)
• Spring 1999: Java2-based simulations; GUI
• 4/99, 8/99, 12/99: Regional Centre Representative meetings
• 6/1999: Mid-project progress report, including MONARC baseline models
• 9/1999: Validation of MONARC simulation on testbeds; reports at LCB Workshop (HN, I. Legrand)
• 1/2000: Phase 3 Letter of Intent (4 LHC experiments)
• 2/2000: Papers and presentations at CHEP2000: D385, F148, D127, D235, C113, C169
• 3/2000: Phase 2 Report
• Spring 2000: New tools: SNMP-based monitoring; S.O.M.
• 5/2000: Phase 3 simulation of ORCA4 production; begin studies with tapes
• Spring 2000: MONARC model recognized by the Hoffmann WWC Panel; basis of Data Grid efforts in the US and Europe

7. MONARC: Regional Centre Hierarchy (Worldwide Data Grid)
• GriPhyN: focus on university-based Tier2 centres
• [Diagram of the tiered hierarchy, in the order shown on the slide:]
• Experiment (~PBytes/sec) feeding the online system; one bunch crossing per 25 nsec, ~100 triggers per second, each event ~1 MByte in size
• Tier 0+1: offline farm, CERN Computer Centre (>20 TIPS) with HPSS, fed at ~100 MBytes/sec
• Tier 1 (~0.6-2.5 Gbits/sec, plus air freight): FNAL, Italy, UK and France centres, each with HPSS
• Tier 2 (~2.4 Gbits/sec): Tier2 centres
• Tier 3 (~622 Mbits/sec): institutes (~0.25 TIPS each); physicists work on analysis “channels”, and each institute has ~10 physicists working on one or more channels
• Tier 4 (100-1000 Mbits/sec): physics data cache and workstations

8. MONARC General Conclusions on LHC Computing
• Following discussions of computing and network requirements, technology evolution, projected costs, support requirements, etc.:
• The scale of LHC “Computing” requires a worldwide effort to accumulate the necessary technical and financial resources
• A Regional Centre hierarchy of distributed computing centres will lead to better use of the financial and manpower resources of CERN, the Collaborations, and the nations involved than a highly centralized model focused at CERN
• The distributed model also provides better use of physics opportunities at the LHC by physicists and students
• At the top of the hierarchy is the CERN Centre, with the ability to perform all analysis-related functions, but not the capacity to do them completely
• At the next step in the hierarchy is a collection of large, multi-service “Tier1 Regional Centres”, each with ~20% of the CERN capacity devoted to one experiment
• There will be Tier2 and smaller special-purpose centres in many regions

9. MONARC Key Features for a Successful Project
• The broad-based nature of the collaboration: LHC experiments, regional representatives, covering different local conditions and a range of estimated financial means
• The choice of the process-oriented discrete event simulation approach, backed up by testbeds, making it possible to simulate accurately:
• a complex set of networked Tier0/Tier1/Tier2 centres
• the analysis process: a dynamic workload of reconstruction and analysis jobs submitted to job schedulers, and then to multi-tasking compute and data servers
• the behavior of key elements of the system, such as distributed database servers and networks
• The design of the simulation system, with an appropriate level of abstraction, allowing it to be CPU- and memory-efficient
• The use of prototyping on the testbeds to ensure the simulation is capable of providing accurate results
• Organization into four technical working groups
• Incorporation of the Regional Centres Committee

10. MONARC Working Groups / Chairs
• Analysis Process Design WG, P. Capiluppi (Bologna, CMS): studied the analysis workload, job mix and profiles, and the time to complete the reconstruction and analysis jobs; worked with the Simulation WG to verify that the resources specified in the models could handle the workload
• Architectures WG, Joel Butler (FNAL, CMS): studied the site and network architectures, operational modes and services provided by Regional Centres, data volumes stored and analyzed, and candidate architectures for CERN, Tier1 (and Tier2) centres
• Simulation WG, K. Sliwa (Tufts, ATLAS): defined the methodology, then (I. Legrand) designed, built and further developed the simulation system as a toolset for users; validated the simulation with the Testbeds group
• Testbeds WG, L. Luminari (Rome, ATLAS): set up small and larger prototype systems at CERN, at several INFN and US sites, and in Japan, and used them to characterize the performance of the main elements that could limit throughput in the simulated systems
• Steering Group: Laura Perini (Milan, ATLAS) and Harvey Newman (Caltech, CMS)
• Regional Centres Committee

11. Architectural Sketch: One Major LHC Experiment, at CERN (L. Robertson)
• Mass-market commodity PC farms
• LAN-SAN and LAN-WAN “stars” (switch/routers)
• Tapes (many drives for ALICE); an archival medium only?

12. Regional Centre Architecture Example (I. Gaines; CHEP2000 paper C169)
• Data import/export: network from CERN, network from Tier 2 and simulation centres, tapes
• Tape mass storage and disk servers; database servers
• Production reconstruction (Raw/Sim → ESD): scheduled, predictable; experiment/physics groups
• Production analysis (ESD → AOD, AOD → DPD): scheduled; physics groups
• Individual analysis (AOD → DPD and plots): chaotic; physicists on desktops
• Support services: physics software development, R&D systems and testbeds, info servers, code servers, web servers, telepresence servers, training, consulting, help desk
• Serves Tier 2 centres and local institutes

13. MONARC Architectures WG: Regional Centre Services
• Regional Centres should provide:
• All technical and data services required to do physics analysis
• All Physics Objects, Tags and Calibration data
• A significant fraction of the raw data
• Caching or mirroring of calibration constants
• Excellent network connectivity to CERN and to the region’s users
• Manpower to share in the development of common validation and production software
• A fair share of post- and re-reconstruction processing
• Manpower to share in ongoing work on common R&D projects
• Excellent support services for training, documentation and troubleshooting at the Centre or at remote sites served by it
• Service to members of other regions
• A long-term commitment: staffing, hardware evolution, and support
• This highlights the “Tier0+Tier1” requirements at CERN

14. MONARC and Regional Centres
• MONARC RC Forum: representative meetings roughly quarterly
• Regional Centre planning is advancing, with an optimistic outlook, in the US, France, Italy, UK and Pakistan; proposals submitted in late 1999-2001
• Active R&D and prototyping underway, especially in the US, Italy, Japan, UK, Russia (MSU, ITEP) and Finland (HIP)
• Discussions in the national communities also underway in China and Germany
• There is a near-term need to understand the level and sharing of support for LHC computing between CERN and the outside institutes, to enable the planning in several countries to advance
• MONARC used the traditional 1/3 : 2/3 sharing assumption, adopted as a guideline in the Hoffmann Review

15. MONARC Analysis Model
• Hierarchy of processes (Experiment, Analysis Groups, Individuals)
• Experiment-wide activity (10^9 events):
• Reconstruction: 1000 SI95 sec/event, 1 job per year
• Re-processing: 1000 SI95 sec/event, 3 jobs per year (new detector calibrations or understanding)
• Monte Carlo: 5000 SI95 sec/event
• ~20 Groups’ activity (10^9 → 10^7 events):
• Selection: 25 SI95 sec/event, ~20 jobs per month; trigger-based and physics-based refinements, iterative selection, once per month
• ~25 Individuals per Group activity (10^6-10^8 events):
• Analysis: 3 SI95 sec/event, ~2000 jobs per day; different physics cuts and MC comparison, ~4 times per day; algorithms applied to data to get results

16. CMS CPU Estimates (1)
• Reconstruction
• 100 days of running for a total of 10^9 events; 1000 SI95 sec/event; 100-day response time (“quasi” on-line)
• [(10^9 events) × (1000 SI95 sec/event)] / [(100 days) × (8.6×10^4 sec/day)] = 116 kSI95; / efficiency (80%) = 145 kSI95
• Re-processing
• 3 times per year; 2 months response time per re-processing; 10^9 events to re-process; 1000 SI95 sec/event
• [(10^9 events) × (1000 SI95 sec/event)] / [(60 days) × (8.6×10^4 sec/day)] = 190 kSI95; / efficiency (80%) = 240 kSI95
• Re-definition of AOD and TAG
• Once per month; 10 days response time; 0.25 SI95 sec/event
• [(10^9 events) × (0.25 SI95 sec/event)] / [(10 days) × (8.6×10^4 sec/day)] ≈ 1 kSI95
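Every kSI95 figure on this slide and the next comes from the same arithmetic: (events × SI95 sec/event) / (response time in seconds), divided by the assumed 80% efficiency. A minimal sketch of that calculation using the slide's numbers (the class and method names are illustrative, not part of any MONARC tool):

```java
// Sketch of the CPU-sizing arithmetic used in the estimates above.
// Inputs come from the slide; names are illustrative only.
public class CpuEstimate {

    /** Required CPU power in SI95 units, corrected for an assumed efficiency. */
    static double requiredSI95(double events, double si95SecPerEvent,
                               double responseTimeSec, double efficiency) {
        return (events * si95SecPerEvent) / responseTimeSec / efficiency;
    }

    public static void main(String[] args) {
        double day = 8.64e4; // seconds per day (the slide rounds this to 8.6e4)

        // Reconstruction: 1e9 events, 1000 SI95.s/event, 100-day response, 80% efficiency
        double reco = requiredSI95(1e9, 1000, 100 * day, 0.80);
        System.out.printf("Reconstruction: %.0f kSI95%n", reco / 1e3);   // ~145 kSI95

        // Re-processing: same sample, 60-day response
        double reproc = requiredSI95(1e9, 1000, 60 * day, 0.80);
        System.out.printf("Re-processing:  %.0f kSI95%n", reproc / 1e3); // ~240 kSI95 as on the slide
    }
}
```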

17. CMS CPU Estimates (2)
• Selection
• Once per month per analysis group (20 groups); response time 1 day; 10^7 events selected; 25 SI95 sec/event
• [(10^7 events) × (25 SI95 sec/event)] / [(1 day) × (8.6×10^4 sec/day)] = 3 kSI95; / efficiency (80%) = 4 kSI95
• Analysis (accessing only AOD, TAG and DPD)
• 4 times per day per individual; response time 4 hours; 10^7 events analyzed per job; 3 SI95 sec/event; 500 concurrent users (20 groups of 25 individuals each)
• [(10^7 events) × (3 SI95 sec/event) × (500 jobs)] / [(4 hours) × (3.6×10^3 sec/hour)] = 1050 kSI95; / efficiency (80%) = 1300 kSI95
• Analysis (additional load when also accessing ESD)
• Same as above, but 10^5 events analyzed per job and 15 SI95 sec/event
• [(10^5 events) × (15 SI95 sec/event) × (500 jobs)] / [(4 hours) × (3.6×10^3 sec/hour)] = 52 kSI95; / efficiency (80%) = 65 kSI95

18. CMS CPU Estimates: Summary
• Total CPU for CMS:
• Reconstruction: 145 kSI95
• Re-processing: 240 kSI95
• Re-definition of AOD and TAG: 1 kSI95
• Selection: 4 kSI95
• Analysis (AOD, TAG, DPD): 1300 kSI95
• Analysis (ESD): 65 kSI95
• Simulation (minimum to be guaranteed): 90 kSI95
• Total: ~1800 kSI95
• Required at CERN (Tier0 and Tier1 functionality):
• Reconstruction: 145 kSI95
• Re-processing: 240 kSI95
• Selection: 4 kSI95
• Analysis (20%): 260 kSI95
• Total: ~650 kSI95
• Note: Reconstruction is presently ~3000 SI95 sec/event, and Analysis is as high as 18 SI95 sec/event

19. Paper Values of CPU Needs (CMS Baseline Analysis Process)
• Assumes 80% efficiency, no AMS overhead
• Reconstruction (Experiment): once/year, 100 days/pass, 145k SI95, 250 TB disk, 500 MB/s disk I/O
• Simulation + Reconstruction (Experiment/Group): ~1M events/day, ~300 days/pass, 100k SI95, 200 TB disk, 10 MB/s disk I/O
• Re-processing (Experiment): 3 times/year, 2 months/pass, 240k SI95, 250 TB disk, 300 MB/s disk I/O
• Re-definition (AOD & TAG) (Experiment): once/month, 10 days/pass, 1k SI95, 15 TB disk, 12000 MB/s disk I/O
• Selection (Groups, 20): once/month, 1 day/pass, 4k SI95, 50 TB disk, 1200 MB/s disk I/O
• Analysis (AOD, TAG & DPD) (Individuals, 500): 4 times/day, 4 hours/pass, 1300k SI95, ? TB disk, 7500 MB/s disk I/O
• Analysis (ESD 1%) (Individuals, 500): 4 times/day, 4 hours/pass, 65k SI95, ? TB disk
• Total installed: ~1800k SI95, ~750 TB + x disk

20. MONARC Testbeds WG: Isolation of Key Parameters
• Some parameters were measured, installed in the MONARC simulation models, and used in a first round of model validation (CHEP2000 papers D235, D127):
• Objectivity AMS response-time function, and its dependence on:
• object clustering, page size, data class hierarchy and access pattern
• mirroring and caching (e.g. with the Objectivity DRO option)
• Scalability of the system under “stress”:
• performance as a function of the number of jobs, relative to the single-job performance
• Performance and bottlenecks for a variety of data access patterns
• Tests over LANs and WANs

21. Requirements Outlook
• Some significant aspects of LHC computing need further consideration:
• Efficiency of CPU use with a real workload
• Pressure to store more data:
• more data per ESD
• higher DAQ recording rate (ATLAS?)
• simulated data: produced at many remote sites; eventually stored and accessed at CERN
• Tendency towards greater CPU needs:
• ~3000 SI95 sec to fully reconstruct an event (CMS ORCA production)
• up to 18 SI95 sec to analyze an event
• B physics: samples of 1 to several × 10^8 events; the MONARC CMS/ATLAS studies typically assume 10^7 (aimed at high-pT physics)

22. MONARC Testbeds WG
• Test-bed configuration defined and widely deployed
• “Use case” applications using Objectivity (CHEP2000 papers A108, C113):
• GIOD/JavaCMS, CMS test beams, ATLASFAST++, ATLAS 1 TB milestone
• Both LAN and WAN tests
• ORCA4 (CMS):
• first “production” application
• realistic data access patterns
• disk/HPSS
• “Validation” milestone carried out, with the Simulation WG

23. MONARC Testbed Systems

24. Design Considerations of the MONARC Simulation System (CHEP2000 paper F148)
• The simulation project is based on Java2(TM) technology, which provides adequate tools for developing a flexible, distributed, process-oriented simulation
• Java has built-in multi-thread support for concurrent processing, which can be used for simulation purposes by providing a dedicated scheduling mechanism
• The distributed-objects support (through RMI or CORBA) can be used for distributed simulations, or for an environment in which parts of the system are simulated and interfaced through such a mechanism with other parts that are actually running the real application
• A PROCESS-ORIENTED APPROACH to discrete event simulation is well suited to describing concurrently running tasks
• “Active objects” (each having an execution thread, a program counter, stack, ...) provide an easy way to map the structure of a set of distributed running programs into the simulation environment
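To make the “active object” idea concrete, here is a minimal, self-contained sketch of a simulated entity mapped onto a Java thread. It illustrates the concept only and does not reproduce the MONARC toolset's actual classes or scheduling mechanism:

```java
// Minimal illustration of an "active object": each simulated entity
// (job, server, network transfer) owns its own execution thread and state.
public class ActiveObject extends Thread {
    private final String taskName;
    private final long workMillis; // stands in for simulated CPU/I-O work

    public ActiveObject(String taskName, long workMillis) {
        this.taskName = taskName;
        this.workMillis = workMillis;
    }

    @Override
    public void run() {
        try {
            // In the real simulation a dedicated scheduler advances simulated
            // time; here we simply sleep to mimic an independent activity.
            Thread.sleep(workMillis);
            System.out.println(taskName + " finished");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ActiveObject reco = new ActiveObject("reconstruction job", 200);
        ActiveObject ana  = new ActiveObject("analysis job", 100);
        reco.start();
        ana.start();      // both entities run concurrently, as in the model
        reco.join();
        ana.join();
    }
}
```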

25. Multitasking Processing Model
• Assign active tasks (CPU, I/O, network) to Java threads
• Concurrently running tasks share resources (CPU, memory, I/O)
• “Interrupt”-driven scheme: for each new task, or when one task finishes, an interrupt is generated and all “times to completion” are recomputed (a sketch follows below)
• This provides:
• an efficient mechanism to simulate multitask processing
• an easy way to apply different load-balancing schemes
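A minimal sketch of the interrupt-driven bookkeeping described above, under the simplifying assumption that tasks on a node share its CPU power equally (class and method names are illustrative, not the MONARC toolset API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "interrupt"-driven scheme: whenever a task arrives or ends,
// the remaining work of every running task is re-scaled to the new share of
// the node's CPU power. The fair-share assumption is illustrative.
public class SharedCpuNode {
    private final double cpuPowerSI95;                        // total node power
    private final List<double[]> tasks = new ArrayList<>();   // {remaining SI95.s}
    private double lastEventTime = 0.0;                       // time of last interrupt

    public SharedCpuNode(double cpuPowerSI95) { this.cpuPowerSI95 = cpuPowerSI95; }

    /** Advance simulated time to 'now', charging each task its fair share. */
    private void advanceTo(double now) {
        if (tasks.isEmpty()) { lastEventTime = now; return; }
        double sharePerTask = cpuPowerSI95 / tasks.size();
        double elapsed = now - lastEventTime;
        for (double[] t : tasks) t[0] -= sharePerTask * elapsed;
        lastEventTime = now;
    }

    /** A new task arrives: interrupt, update all running tasks, then add it. */
    public void submit(double now, double workSI95Sec) {
        advanceTo(now);
        tasks.add(new double[]{workSI95Sec});
    }

    /** Time at which the next task would finish, given the current sharing. */
    public double nextCompletionTime() {
        double minRemaining = Double.MAX_VALUE;
        for (double[] t : tasks) minRemaining = Math.min(minRemaining, t[0]);
        double sharePerTask = cpuPowerSI95 / tasks.size();
        return lastEventTime + minRemaining / sharePerTask;
    }

    public static void main(String[] args) {
        SharedCpuNode node = new SharedCpuNode(100.0); // 100 SI95 node (illustrative)
        node.submit(0.0, 1000.0);                      // job A: 1000 SI95.s of work
        node.submit(2.0, 500.0);                       // job B arrives at t = 2 s
        System.out.println("next completion at t = " + node.nextCompletionTime() + " s");
    }
}
```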

26. MONARC Simulation Data Model
• Provides a realistic mapping for an object database:
• specific HEP data structures
• transparent access to any data
• automatic storage management
• an efficient way to handle a very large number of objects
• emulation of clustering factors for different types of access patterns
• handling of related objects in different databases

27. Example: Physics Analysis at Regional Centres
• Similar data processing jobs are performed in each of several RCs
• There is a profile of jobs, each submitted to a job scheduler
• Each Centre has the “TAG” and “AOD” databases replicated
• The main Centre provides the “ESD” and “RAW” data
• Each job processes AOD data, and also a fraction of the ESD and RAW data

28. Example: Physics Analysis

29. Simulation Validation: LAN Measurements (Y. Morita et al.; CHEP2000 paper C113)
• Machine A: Sun Enterprise 450 (400 MHz, 4 CPUs)
• Machine B: Sun Ultra 5 (270 MHz): the lock server
• Machine C: Sun Enterprise 450 (300 MHz, 2 CPUs)
• Tests:
• (1) Machine A local (2 CPUs)
• (2) Machine C local (4 CPUs)
• (3) Machine A (client) and Machine C (server)
• Number of client processes: 1, 2, 4, ..., 32
• Measured per-event times: job on Machine A: CPU 17.4 SI95, I/O 207 MB/s @ 54 MB file; job on Machine C: CPU 14.0 SI95, I/O 31 MB/s @ 54 MB file
• [Diagram: jobs reading raw data through CPU and I/O stages in each of the three test configurations]

30. Modeling an AMS Across LANs and WANs
• The AMS page-transfer latency is modeled in the simulation
• Physical bandwidth: B; effective bandwidth: Beff
• T = t(transfer) + t(handshake) = unit_size / B + RTT
• Beff / B = unit_size / (unit_size + B × RTT)
• [Plot: AMS packet sequence, IP packet number (×10^3) vs. time since packet no. 4M was sent (sec), for write and read phases]
• Study by H. Sato and Y. Morita on the CERN-KEK 2 Mbps satellite link (CHEP2000 paper D235)
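As a numerical illustration of the Beff formula for a 2 Mbps link: the round-trip time and the transfer unit size below are illustrative assumptions, not values from the measurement above.

```java
// Effective-bandwidth estimate for one AMS page transfer per round trip:
//   T    = unit_size / B + RTT
//   Beff = unit_size / T,  so  Beff / B = unit_size / (unit_size + B * RTT)
public class AmsEffectiveBandwidth {
    public static void main(String[] args) {
        double B = 2e6 / 8;          // 2 Mbps link, expressed in bytes/s
        double rtt = 0.3;            // assumed ~300 ms round trip (illustrative)
        double unitSize = 8 * 1024;  // assumed 8 kB transfer unit (illustrative)

        double ratio = unitSize / (unitSize + B * rtt);
        System.out.printf("Beff/B = %.3f  ->  Beff = %.1f kB/s%n",
                          ratio, ratio * B / 1024);
    }
}
```

With these assumed numbers the effective bandwidth collapses to roughly a tenth of the physical bandwidth, which is the point of the model: per-page handshakes dominate on a high-latency WAN.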

31. Other Modeling Details
• Client SMP architecture is modeled as multiple CPU nodes with high-speed network connections, as in the diagram
• [Diagram: AMS client (4 CPU nodes, node_link_speed) connected over the network (DB_link_speed) to the AMS/DB server (DB_read_speed)]

32. Validation Measurements: The AMS Data Across a LAN (CHEP2000 paper C113)
• Distribution of 32 jobs’ processing times: 4-CPU client reading the raw data DB over the LAN
• Simulation mean: 109.5; measurement mean: 114.3

33. Network Queue Model
• Queueing theory: M | M | 1
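The network is modeled as a standard M|M|1 queue. As a reminder of what that model predicts, a minimal sketch of its textbook formulas (the arrival and service rates below are illustrative, not MONARC parameters):

```java
// Standard M|M|1 results: utilisation rho = lambda/mu,
// mean time in system W = 1/(mu - lambda), mean number in system L = rho/(1 - rho).
public class MM1Queue {
    public static void main(String[] args) {
        double lambda = 40.0;  // packet arrival rate (1/s), illustrative
        double mu = 50.0;      // link service rate (1/s), illustrative

        double rho = lambda / mu;
        double w = 1.0 / (mu - lambda);
        double l = rho / (1.0 - rho);
        System.out.printf("rho = %.2f   W = %.3f s   L = %.1f%n", rho, w, l);
    }
}
```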

34. Resource Utilisation vs. Job Response Time
• Physics analysis example
• [Plots comparing resource utilisation and job response time for farms of 180, 200 and 250 CPUs; quoted mean values: 0.55, 0.72, 0.93]

35. The Simulation GUIs and the “Publishing” Procedure
• [Diagram: simulation results are written as objects via an RMI server to an afs/nfs file system, which feeds a web server acting as the results repository]

36. ORCA Production, June 2000
• [Diagram of the production chain:]
• MC production: CMSIM produces signal Zebra files with HITS and HEPEVT ntuples; minimum-bias catalog imported into an Objectivity database
• ORCA production: ORCA ooHit formatter and catalog import into Objectivity; ORCA digitization (merging signal and minimum bias)
• HLT algorithms produce new reconstructed objects, stored in Objectivity databases (HLT group databases)
• Databases mirrored to the US

37. ORCA and CMS HLT Milestones: Solving the Pileup Problem
• 17 minimum-bias events per crossing (Poisson distributed)
• The calorimeter needs crossings from -5 to +3
• (The muon system would need -15 to +15 to study the full influence, but that is not in the current program of work)
• Therefore ~200 minimum-bias events are needed for each signal event (see the counting sketch below)
• 1 minimum-bias event in CMSIM = 350 kB (1/3 calorimeter hits, 1/3 tracker, 1/3 kinematics)
• Pre-digitization phase to transfer the CMSIM output into the database
• Separate containers for Header, Kine, EBRY, EFRY, ESFX, HCAL, MB, MF, RPC, TK
• These containers are clustered according to their expected access patterns in DB files: event headers + pointers, calorimeter hits, tracker hits, MC generator information
• 2 × 100k pileup events stored in Objectivity
• CARF picks randomly from the entire dataset to build a “crossing” of pileup events convoluted with the signal event
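The ~200 figure follows from the numbers above: 17 events per crossing over the 9-crossing calorimeter window gives ~153 on average, which the slide rounds up with headroom for the Poisson fluctuations. A minimal sketch of that counting:

```java
// Counting the minimum-bias events that must be overlaid on one signal event.
// The inputs come from the slide; the ~200 working figure includes headroom
// beyond the plain product, since the pileup multiplicity is Poisson-distributed.
public class PileupBudget {
    public static void main(String[] args) {
        int meanMinBiasPerCrossing = 17;   // at 1e34 luminosity
        int firstCrossing = -5;            // calorimeter sensitivity window
        int lastCrossing = +3;

        int crossings = lastCrossing - firstCrossing + 1;   // 9 crossings
        int average = meanMinBiasPerCrossing * crossings;   // ~153 events on average
        System.out.println("crossings in window: " + crossings);
        System.out.println("average min-bias events per signal event: " + average);
        System.out.println("working figure on the slide (with headroom): ~200");
    }
}
```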

38. ORCA Production on the CERN/IT-Loaned Event Filter Farm Test Facility
• [Diagram: 2 Objectivity federations with 6 servers for signal DBs, 24 pileup servers in total (groups of 17 and 9), lock servers, output servers (SUN), HPSS, and a farm of 140 processing nodes]
• The strategy is to use many commodity PCs as database servers

39. This is Hard!
• Compare to Run II:
• One CMS event is 4 times bigger than one Run II event; this is a measure of detector and event complexity
• Pileup per crossing at 10^34 is 3-20 times more than at Run II
• The number of crossings to take into account is 4-10 times bigger than at Run II
• The combined factor is 50-400 times more computing required for CMS MC events than for Run II
• 2 million CMS events are equivalent in computing to >200 million Run II events

40. CMS Objectivity AMS Service for HLT/2000
• [Plot: I/O activity on the Sun datastore]
• 24 Linux servers for pileup
• 140 batch Linux CPUs
• 4 Linux servers for federations and journals
• 6 Linux servers for 2 TB of ooHits

41. Performance
• More like an analysis facility than a DAQ facility
• 140 jobs reading asynchronously and chaotically from 30 AMS servers, writing to a high-speed SUN server
• Non-disk-resident data staged from tape
• 70 JetMET jobs at ~60 seconds/event and 35 MB/event
• 70 muon jobs at ~90 seconds/event and 38 MB/event
• Best reading rate out of Objectivity: ~70 MB/sec; continuous reading rate: 50 MB/sec
• 1 million JetMET events took ~10 days; 1 million inclusive muon events took ~15 days (run in parallel)
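The quoted rates are mutually consistent; a minimal sketch of the cross-check (all inputs are taken from the bullets above):

```java
// Cross-check of the ORCA production rates quoted above.
public class OrcaRates {
    public static void main(String[] args) {
        double jetmetRate = 70.0 / 60.0;   // 70 jobs at ~60 s/event  -> events/s
        double muonRate   = 70.0 / 90.0;   // 70 jobs at ~90 s/event  -> events/s

        // Aggregate read rate from Objectivity, in MB/s
        double ioMBs = jetmetRate * 35 + muonRate * 38;
        System.out.printf("aggregate read rate ~ %.0f MB/s%n", ioMBs);              // ~70 MB/s

        double day = 8.64e4; // seconds per day
        System.out.printf("1M JetMET events: ~%.0f days%n", 1e6 / jetmetRate / day); // ~10 days
        System.out.printf("1M muon events:   ~%.0f days%n", 1e6 / muonRate / day);   // ~15 days
    }
}
```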

42. Simulation of the ORCA4 Production Farm Configuration
• Pileup distributed over 24 Linux servers
• Other data over 6 Linux servers (70 GB disk each)
• 2 Linux stations used for federations (metadata, catalog, etc.)
• 2 Linux stations used for journal files (used in locking)
• shift20 (SUN) used for lock serving and output (2 × ~250 GB disks)
• Strategy: use many commodity PCs as database servers
• One result: it is better to use small and medium-sized servers

43. Network Traffic and Job Efficiency
• [Plots: simulated vs. measured network traffic and job efficiency]
• Mean measured network traffic: ~48 MB/s
• Mean job efficiency: muon <0.90>, jet <0.52>

44. Total Time for Jet and Muon Production Jobs

45. SNMP-Based Monitoring Tool for Site and Network Activities (I. Legrand, Caltech)
• [Plots: total IP traffic in the CERN domain; CPU usage and I/O per cluster]

46. Self-Organizing Map for Scheduling: A Toy Example (I. Legrand)
• Assume the time to execute a job in the local farm, having a certain load a, is tL = t0 (1 + f(a)), where t0 is the theoretical time to perform the job and f(a) describes the effect of the farm load on the job execution time
• If the job is executed on a remote site, an extra factor (b > 1) is applied to the load term, increasing the response time: tR = t0 (1 + b·f(a))
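A minimal sketch of the decision this toy model poses to a scheduler, assuming an illustrative load function f(a) = a and illustrative parameter values:

```java
// Toy scheduling decision from the slide: run locally under load aLocal, or
// remotely under load aRemote with an extra penalty factor b > 1.
// f(a) = a is an illustrative choice of the load function.
public class ToySchedulingModel {
    static double localTime(double t0, double aLocal) {
        return t0 * (1 + aLocal);
    }
    static double remoteTime(double t0, double aRemote, double b) {
        return t0 * (1 + b * aRemote);
    }

    public static void main(String[] args) {
        double t0 = 100.0;   // ideal job time, illustrative units
        double b = 1.5;      // remote penalty factor, b > 1

        double local  = localTime(t0, 0.8);       // heavily loaded local farm
        double remote = remoteTime(t0, 0.2, b);   // lightly loaded remote farm
        System.out.printf("local %.0f   remote %.0f   -> run %s%n",
                          local, remote, remote < local ? "remotely" : "locally");
    }
}
```

The scheduler's job, which the self-organizing map is meant to learn, is exactly this trade-off: a remote farm pays the penalty b but can still win when its load is much lower.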

47. MONARC Phase 1 and 2 Culmination [*]
• MONARC has successfully met its major milestones and has fulfilled its basic goals, including:
• Identifying first-round baseline Computing Models that could provide viable (and cost-effective) solutions to meet the simulation, reconstruction and analysis needs of the LHC experiments
• Providing a powerful (CPU- and time-efficient) simulation toolset that will enable further studies and optimisation of the Models
• Providing guidelines for the configuration and services of Regional Centres
• Providing an effective forum where representatives of actual and candidate Regional Centres may meet and develop common strategies for LHC Computing
• [*] See the MONARC Phase 2 Report, CERN/LCB 2000-1

48. MONARC Status
• MONARC is on the way to specifying baseline Models representing cost-effective solutions to LHC Computing
• Studies have shown that LHC computing has a new scale and level of complexity, implying the need for novel solutions
• MONARC’s Regional Centre hierarchy model has been accepted by all four LHC experiments, and is the basis of HEP Data Grid work
• A powerful simulation system has been developed, and is a very useful toolset for further model studies
• Synergy with other advanced R&D projects has been identified: PPDG, GriPhyN, EU HEP Data Grid
• Example Computing Models have been provided, which is important input for the Hoffmann Review of LHC Computing
• MONARC Phase 3 has been proposed:
• based on prototypes, with increasing detail and realism
• coupled to the experiments’ “Data Challenges” from 2000 on
• the first realistic simulation, of the Spring 2000 HLT Data Challenge, has been completed

49. MONARC Phase 3
• Involving CMS, ATLAS, LHCb and ALICE
• Timely and useful impact:
• Facilitate the efficient planning and design of mutually compatible site and network architectures and services, among the experiments, the CERN Centre and the Regional Centres
• Provide modelling consultancy to the experiments and Centres
• Provide a core of advanced R&D activities, aimed at LHC computing system optimisation and production prototyping
• Take advantage of work on distributed data-intensive computing for HENP this year in other “next generation” projects, for example PPDG
• Base developments on large-scale testbed prototypes at every stage (the ORCA4 example)

50. MONARC Future and Grid Architecture [*]
• Applications: a rich set of HEP data-analysis related applications
• Application toolkits: remote data toolkit, remote computation toolkit, remote visualization toolkit, remote collaboration toolkit, remote sensors toolkit, ...
• Grid services: protocols, authentication, policy, resource management, instrumentation, resource discovery, etc.
• Grid fabric: data stores, networks, computers, display devices, ...; associated local services
• [*] Adapted from Ian Foster: there are Computing Grids, Access (collaborative) Grids, Data Grids, ...
