Computing for Hall D
Ian Bird
Hall D Collaboration Meeting
March 22, 2002
Ian.Bird@jlab.org
Data volume per experiment per year
(Raw data, in units of 10^9 bytes)
But: collaboration sizes!
Technologies
• Technologies are advancing rapidly
  • Compute power
  • Storage: tape and disk
  • Networking
• What will be available 5 years from now?
  • Difficult to predict, but it will not be a problem to provide any of the resources that Hall D will need
  • E.g. computing:
Intel Linux Farm
• First purchases: 9 duals per 24" rack
• FY00: 16 duals (2u) + 500 GB cache (8u) per 19" rack
• FY01: 4 CPUs per 1u
• Recently: 5 TB IDE cache disk (5 x 8u) per 19" rack
Compute power
• Blades
  • Low-power chips (Transmeta, Intel)
  • Hundreds in a single rack
• "An RLX System 300ex chassis holds twenty-four ServerBlade 800i units in a single 3U chassis. This density achievement packs 336 independent servers into a single 42U rack, delivering 268,800 MHz, over 27 terabytes of disk storage, and a whopping 366 gigabytes of DDR memory."
Technologies
• As well as computing, developments in storage and networking will also make rapid progress
• Grid computing techniques will bring these technologies together
• Facilities: new Computer Center planned
• Issues will not be technology, but:
  • How to use them intelligently
  • The Hall D computing model
  • People
  • Treating computing seriously enough to assign sufficient resources
(Data-) Grid Computing
Particle Physics Data Grid Collaboratory Pilot
Who we are:
  Four leading Grid computer science projects and six international high energy and nuclear physics collaborations
The problem at hand today:
  Petabytes of storage, Teraops/s of computing
  Thousands of users, hundreds of institutions, 10+ years of analysis ahead
What we do:
  Develop and deploy Grid services for our experiment collaborators
  Promote and provide common Grid software and standards
PPDG Experiments
• ATLAS - A Toroidal LHC ApparatuS at CERN - runs 2006 on
  Goals: TeV physics - the Higgs and the origin of mass
  http://atlasinfo.cern.ch/Atlas/Welcome.html
• BaBar - at the Stanford Linear Accelerator Center - running now
  Goals: study CP violation and more
  http://www.slac.stanford.edu/BFROOT/
• CMS - the Compact Muon Solenoid detector at CERN - runs 2006 on
  Goals: TeV physics - the Higgs and the origin of mass
  http://cmsinfo.cern.ch/Welcome.html/
• D0 - at the D0 colliding beam interaction region at Fermilab - runs soon
  Goals: learn more about the top quark, supersymmetry, and the Higgs
  http://www-d0.fnal.gov/
• STAR - Solenoidal Tracker At RHIC at BNL - running now
  Goals: quark-gluon plasma
  http://www.star.bnl.gov/
• JLAB - Thomas Jefferson National Laboratory - running now
  Goals: understanding the nucleus using electron beams
  http://www.jlab.org/
PPDG Computer Science Groups
• Condor - develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing on large collections of computing resources with distributed ownership. http://www.cs.wisc.edu/condor/
• Globus - developing fundamental technologies needed to build persistent environments that enable software applications to integrate instruments, displays, computational and information resources that are managed by diverse organizations in widespread locations. http://www.globus.org/
• SDM - Scientific Data Management Research Group - optimized and standardized access to storage systems. http://gizmo.lbl.gov/DM.html
• Storage Resource Broker - client-server middleware that provides a uniform interface for connecting to heterogeneous data resources over a network and cataloging/accessing replicated data sets (a sketch of this idea follows below). http://www.npaci.edu/DICE/SRB/index.html
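To make the "uniform interface to heterogeneous storage" idea concrete, here is a minimal Python sketch. It is not the real SRB or SDM API; the class and method names are hypothetical, invented only to illustrate how a replica catalog plus per-site storage drivers can hide whether a file lives on local disk, tape, or a remote grid server.

```python
"""Minimal sketch of a uniform interface over heterogeneous storage.

NOT the real SRB API; class and method names are hypothetical and only
illustrate one interface hiding several back ends.
"""
from abc import ABC, abstractmethod
import os
import shutil


class StorageResource(ABC):
    """One back end (disk pool, tape silo, remote site) behind a common API."""

    @abstractmethod
    def get(self, physical_name: str, local_path: str) -> None:
        """Copy a file from this resource to a local path."""

    @abstractmethod
    def put(self, local_path: str, physical_name: str) -> None:
        """Store a local file into this resource."""


class LocalDiskResource(StorageResource):
    """Simplest driver: files sitting under a local directory tree."""

    def __init__(self, root: str):
        self.root = root

    def get(self, physical_name, local_path):
        shutil.copyfile(os.path.join(self.root, physical_name), local_path)

    def put(self, local_path, physical_name):
        dest = os.path.join(self.root, physical_name)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copyfile(local_path, dest)


class ReplicaCatalog:
    """Maps a logical file name to replicas: (resource, physical name) pairs."""

    def __init__(self):
        self._replicas = {}

    def register(self, logical_name, resource, physical_name):
        self._replicas.setdefault(logical_name, []).append((resource, physical_name))

    def fetch(self, logical_name, local_path):
        """Try each replica in turn until one copy succeeds."""
        for resource, physical_name in self._replicas.get(logical_name, []):
            try:
                resource.get(physical_name, local_path)
                return
            except OSError:
                continue  # this replica is unavailable; try the next one
        raise FileNotFoundError(f"no usable replica of {logical_name}")
```

Analysis code then asks the catalog for a logical name and never needs to know which site or medium actually served the file.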
Delivery of End-to-End Applications & Integrated Production Systems
to allow thousands of physicists to share data & computing resources for scientific processing and analyses
• PPDG focus:
  • Robust data replication
  • Intelligent job placement and scheduling
  • Management of storage resources
  • Monitoring and information of global services
• Relies on Grid infrastructure:
  • Security & policy
  • High-speed data transfer
  • Network management
(Diagram: Operators & Users; Resources: Computers, Storage, Networks)
Project Activities, End-to-End Applications and Cross-Cut Pilots
• Project Activities are focused experiment / computer science collaborative developments
  • Replicated data sets for science analysis - BaBar, CMS, STAR
  • Distributed Monte Carlo production services - ATLAS, D0, CMS
  • Common storage management and interfaces - STAR, JLAB
• End-to-End Applications used in experiment data handling systems to give real-world requirements, testing and feedback
  • Error reporting and response
  • Fault-tolerant integration of complex components (see the replication sketch below)
• Cross-Cut Pilots for common services and policies
  • Certificate Authority policy and authentication
  • File transfer standards and protocols
  • Resource monitoring - networks, computers, storage
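The sketch below illustrates what "robust, fault-tolerant replication with error reporting" amounts to in practice. The helper names are hypothetical and the local copy stands in for a real GridFTP transfer; the point is the pattern of retrying transient failures, verifying a checksum at the destination, and logging errors instead of failing silently.

```python
"""Sketch of fault-tolerant file replication with error reporting.

Hypothetical helpers, not PPDG production code; shutil.copyfile stands
in for a wide-area (e.g. GridFTP) transfer.
"""
import hashlib
import logging
import shutil
import time

log = logging.getLogger("replication")


def checksum(path: str, algo: str = "md5") -> str:
    """Checksum a file in chunks so large raw-data files fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def replicate(source: str, destination: str,
              retries: int = 3, backoff_s: float = 5.0) -> bool:
    """Copy source to destination, verify integrity, retry on failure."""
    expected = checksum(source)
    for attempt in range(1, retries + 1):
        try:
            shutil.copyfile(source, destination)   # stand-in for the real transfer
            if checksum(destination) == expected:
                log.info("replicated %s -> %s", source, destination)
                return True
            log.warning("checksum mismatch on attempt %d", attempt)
        except OSError as err:
            log.warning("transfer failed on attempt %d: %s", attempt, err)
        time.sleep(backoff_s)                       # back off before retrying
    log.error("giving up on %s after %d attempts", source, retries)
    return False
```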
Year 0.5-1 Milestones (1)
Align milestones to experiment data challenges:
• ATLAS - production distributed data service - 6/1/02
• BaBar - analysis across partitioned dataset storage - 5/1/02
• CMS - distributed simulation production - 1/1/02
• D0 - distributed analyses across multiple workgroup clusters - 4/1/02
• STAR - automated dataset replication - 12/1/01
• JLAB - policy-driven file migration (sketched below) - 2/1/02
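For the JLAB milestone, here is a hedged Python sketch of what "policy-driven file migration" means. The policy, thresholds, and function names are hypothetical (this is not the actual JLab mass-storage system): a rule based on high/low water marks moves the least-recently-used cache files to tape, rather than an operator doing it by hand.

```python
"""Sketch of policy-driven migration from a disk cache to tape.

Hypothetical policy and names; shutil.move stands in for a real
tape-library transfer.
"""
import os
import shutil


def migrate_to_tape(path: str, tape_dir: str) -> None:
    """Stand-in for a transfer into the tape library."""
    shutil.move(path, os.path.join(tape_dir, os.path.basename(path)))


def apply_migration_policy(cache_dir: str, tape_dir: str,
                           high_water: float = 0.90,
                           low_water: float = 0.75) -> None:
    """If cache usage exceeds high_water, migrate least-recently-used
    files until usage falls below low_water."""
    usage = shutil.disk_usage(cache_dir)
    if usage.used / usage.total < high_water:
        return                                      # policy not triggered

    files = [os.path.join(cache_dir, f) for f in os.listdir(cache_dir)]
    files = [f for f in files if os.path.isfile(f)]
    files.sort(key=os.path.getatime)                # oldest access time first

    for path in files:
        migrate_to_tape(path, tape_dir)
        usage = shutil.disk_usage(cache_dir)
        if usage.used / usage.total <= low_water:
            break
```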
Year 0.5-1 Milestones (2)
• Common milestones with EDG:
  • GDMP - robust file replication layer - joint project with EDG Work Package (WP) 2 (Data Access)
  • Support of Project Month (PM) 9 WP6 TestBed milestone; will participate in integration fest at CERN - 10/1/01
  • Collaborate on PM21 design for WP2 - 1/1/02
  • Proposed WP8 application tests using PM9 testbed - 3/1/02
• Collaboration with GriPhyN:
  • SC2001 demos will use common resources, infrastructure and presentations - 11/16/01
  • Common, GriPhyN-led grid architecture
  • Joint work on monitoring proposed
Year ~0.5-1 "Cross-cuts"
• Grid file replication services used by >2 experiments:
  • GridFTP - production releases
  • Integrate with D0-SAM, STAR replication
  • Interfaced through SRB for BaBar, JLAB
  • Layered use by GDMP for CMS, ATLAS
• SRB and Globus replication services
  • Include robustness features
  • Common catalog features and API
• GDMP/Data Access layer continues to be shared between EDG and PPDG
• Distributed job scheduling and management used by >1 experiment:
  • Condor-G, DAGMan, Grid-Scheduler for D0-SAM, CMS
  • Job specification language interfaces to distributed schedulers - D0-SAM, CMS, JLAB (see the sketch below)
• Storage resource interface and management
  • Consensus on API between EDG, SRM, and PPDG
  • Disk cache management integrated with data replication services
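To show what a "job specification language interface to a distributed scheduler" looks like, here is a minimal Python sketch. The format and the placement function are hypothetical (this is not the Condor-G ClassAd or SAM job language): the physicist states what to run and which datasets it needs, and the scheduler decides which site runs it, preferring sites that already hold the data.

```python
"""Sketch of a declarative job specification handed to a distributed scheduler.

Hypothetical format and names; illustrates data-affinity placement only.
"""
from dataclasses import dataclass, field


@dataclass
class JobSpec:
    executable: str
    arguments: list = field(default_factory=list)
    input_datasets: list = field(default_factory=list)   # logical dataset names
    min_memory_mb: int = 256
    site_preference: list = field(default_factory=list)  # e.g. ["JLAB", "FNAL"]


@dataclass
class Site:
    name: str
    free_cpus: int
    hosted_datasets: set


def place_job(job: JobSpec, sites: list) -> str:
    """Pick a site that hosts the job's data and has free CPUs,
    honouring the user's site preference when possible."""
    def suitable(site):
        return site.free_cpus > 0 and set(job.input_datasets) <= site.hosted_datasets

    for name in job.site_preference:
        for site in sites:
            if site.name == name and suitable(site):
                return site.name
    for site in sites:
        if suitable(site):
            return site.name
    raise RuntimeError("no site currently satisfies this job specification")


if __name__ == "__main__":
    sites = [Site("JLAB", free_cpus=40, hosted_datasets={"halld-raw-run001"}),
             Site("FNAL", free_cpus=0, hosted_datasets={"halld-raw-run001"})]
    job = JobSpec(executable="reconstruct", arguments=["run001"],
                  input_datasets=["halld-raw-run001"],
                  site_preference=["FNAL", "JLAB"])
    print(place_job(job, sites))   # -> "JLAB": FNAL has the data but no free CPUs
```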
Year ~1 other goals:
• Transatlantic application demonstrators:
  • BaBar data replication between SLAC and IN2P3
  • D0 Monte Carlo job execution between Fermilab and NIKHEF
  • CMS & ATLAS simulation production between Europe and the US
• Certificate exchange and authorization
  • DOE Science Grid as CA?
• Robust data replication
  • Fault tolerant
  • Between heterogeneous storage resources
• Monitoring services
  • MDS2 (Metacomputing Directory Service)?
  • Common framework
  • Network, compute and storage information made available to scheduling and resource management
PPDG activities as part of the global Grid community
• Coordination with other Grid projects in our field:
  • GriPhyN - Grid Physics Network
  • European DataGrid
  • Storage Resource Management collaboratory
  • HENP Data Grid Coordination Committee
• Participation in experiment and Grid deployments in our field:
  • ATLAS, BaBar, CMS, D0, STAR, JLAB experiment data handling systems
  • iVDGL/DataTAG - International Virtual Data Grid Laboratory
  • Use DTF computational facilities?
• Active in standards committees:
  • Internet2 HENP Working Group
  • Global Grid Forum
What should happen now?
• The collaboration needs to define its computing model
  • It really will be distributed, grid based
  • Although the compute resources can be provided, it is not obvious that the vast quantities of data can really be analyzed efficiently by a small group
  • Do not underestimate the task
• The computing model will define requirements for computing, some of which may require lead time
• Ensure software and computing is managed as a project equivalent in scope to the entire detector
  • It has to last at least as long, and it runs 24x365
  • The complete software system is more complex than the detector, even for Hall D where the reconstruction is relatively straightforward
  • It will be used by everyone
• Find and empower a computing project manager now