
Computing for Hall D


Presentation Transcript


  1. Computing for Hall D. Ian Bird, Hall D Collaboration Meeting, March 22, 2002. Ian.Bird@jlab.org

  2. Data Volume per experiment per year (raw data, in units of 10^9 bytes). But: collaboration sizes!

  3. Technologies
  • Technologies are advancing rapidly: compute power, storage (tape and disk), networking
  • What will be available 5 years from now?
  • Difficult to predict, but it will not be a problem to provide any of the resources that Hall D will need
  • E.g. computing:

  4. Intel Linux Farm
  • First purchases: 9 duals per 24" rack
  • FY00: 16 duals (2U) + 500 GB cache (8U) per 19" rack
  • FY01: 4 CPUs per 1U
  • Recently: 5 TB IDE cache disk (5 x 8U) per 19" rack
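The progression above makes the point of the previous slide concrete. A minimal sketch of a back-of-the-envelope extrapolation, assuming compute density per rack keeps doubling roughly every 18 months (a Moore's-law-style assumption not stated on the slide) and starting from the FY01 figure of 4 CPUs per 1U in a fully populated 42U rack:

```python
# Back-of-the-envelope extrapolation of farm capacity per rack.
# Assumption: density doubles roughly every 18 months; the starting
# figure (4 CPUs per 1U) comes from the slide above, the 42U rack
# height and doubling period are assumptions for illustration.

CPUS_PER_RACK_FY01 = 4 * 42      # 4 CPUs per 1U, assuming a fully populated 42U rack
DOUBLING_PERIOD_YEARS = 1.5      # assumed doubling time

def projected_cpus_per_rack(years_from_fy01: float) -> float:
    """Projected CPU count per rack after the given number of years."""
    return CPUS_PER_RACK_FY01 * 2 ** (years_from_fy01 / DOUBLING_PERIOD_YEARS)

if __name__ == "__main__":
    for years in (0, 2, 5):
        print(f"FY01 + {years} yr: ~{projected_cpus_per_rack(years):.0f} CPUs per rack")
```

Under these assumptions per-rack capacity grows by roughly an order of magnitude over five years, which is the sense in which providing raw resources "will not be a problem".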

  5. Compute power
  • Blades: low-power chips (Transmeta, Intel), hundreds in a single rack
  • "An RLX System 300ex chassis holds twenty-four ServerBlade 800i units in a single 3U chassis. This density achievement packs 336 independent servers into a single 42U rack, delivering 268,800 MHz, over 27 terabytes of disk storage, and a whopping 366 gigabytes of DDR memory."
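The density figures in the RLX quote are internally consistent; a quick arithmetic check (the per-blade disk and memory sizes below are simply derived by division, not taken from the slide):

```python
# Quick consistency check of the RLX ServerBlade density quote above.
blades_per_chassis = 24      # ServerBlade 800i units per 3U chassis
chassis_per_rack = 42 // 3   # 3U chassis in a standard 42U rack -> 14
blades_per_rack = blades_per_chassis * chassis_per_rack

print(blades_per_rack)                 # 336 independent servers
print(blades_per_rack * 800)           # 268,800 MHz (each blade is an 800 MHz part)
print(27e12 / blades_per_rack / 1e9)   # ~80 GB of disk per blade (derived, not quoted)
print(366 / blades_per_rack)           # ~1.1 GB of DDR memory per blade (derived)
```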

  6. Technologies
  • Beyond computing, storage and networking will also make rapid progress
  • Grid computing techniques will bring these technologies together
  • Facilities: a new Computer Center is planned
  • The issues will not be technology, but:
    • How to use it intelligently: the Hall D computing model
    • People: treating computing seriously enough to assign sufficient resources

  7. (Data-) Grid Computing

  8. Particle Physics Data Grid (PPDG) Collaboratory Pilot
  • Who we are: four leading Grid computer science projects and six international high energy and nuclear physics collaborations
  • The problem at hand today: petabytes of storage, teraops/s of computing, thousands of users, hundreds of institutions, 10+ years of analysis ahead
  • What we do: develop and deploy Grid services for our experiment collaborators, and promote and provide common Grid software and standards

  9. PPDG Experiments
  • ATLAS (A Toroidal LHC ApparatuS) at CERN. Runs 2006 on. Goals: TeV physics, the Higgs and the origin of mass. http://atlasinfo.cern.ch/Atlas/Welcome.html
  • BaBar, at the Stanford Linear Accelerator Center. Running now. Goals: study CP violation and more. http://www.slac.stanford.edu/BFROOT/
  • CMS (Compact Muon Solenoid) at CERN. Runs 2006 on. Goals: TeV physics, the Higgs and the origin of mass. http://cmsinfo.cern.ch/Welcome.html/
  • D0, at the D0 colliding-beam interaction region at Fermilab. Runs soon. Goals: learn more about the top quark, supersymmetry, and the Higgs. http://www-d0.fnal.gov/
  • STAR (Solenoidal Tracker At RHIC) at BNL. Running now. Goals: the quark-gluon plasma. http://www.star.bnl.gov/
  • Thomas Jefferson National Laboratory. Running now. Goals: understanding the nucleus using electron beams. http://www.jlab.org/

  10. PPDG Computer Science Groups
  • Condor: develop, implement, deploy, and evaluate mechanisms and policies that support high-throughput computing on large collections of computing resources with distributed ownership (a minimal job-submission sketch follows this slide). http://www.cs.wisc.edu/condor/
  • Globus: developing the fundamental technologies needed to build persistent environments that enable software applications to integrate instruments, displays, and computational and information resources managed by diverse organizations in widespread locations. http://www.globus.org/
  • SDM (Scientific Data Management Research Group): optimized and standardized access to storage systems. http://gizmo.lbl.gov/DM.html
  • Storage Resource Broker: client-server middleware that provides a uniform interface for connecting to heterogeneous data resources over a network and cataloging/accessing replicated data sets. http://www.npaci.edu/DICE/SRB/index.html
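For flavor, the user-facing side of these tools is quite simple. A minimal sketch of submitting a batch of jobs through Condor, assuming a Condor pool is installed and accessible; the executable name ("reconstruct"), file names, and job count are hypothetical placeholders:

```python
# Minimal sketch: generate a Condor submit description for a batch of
# reconstruction jobs and hand it to condor_submit. Assumes Condor is
# installed and the user may submit to the local pool; the executable
# and run numbering are hypothetical.
import subprocess

N_JOBS = 10  # hypothetical number of runs to process

submit_description = """\
universe   = vanilla
executable = reconstruct
arguments  = run$(Process).dat
output     = run$(Process).out
error      = run$(Process).err
log        = reconstruct.log
queue {n}
""".format(n=N_JOBS)

with open("reconstruct.sub", "w") as f:
    f.write(submit_description)

# condor_submit queues N_JOBS jobs; Condor then matches them to idle
# machines in the pool (the high-throughput computing model above).
subprocess.run(["condor_submit", "reconstruct.sub"], check=True)
```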

  11. Delivery of End-to-End Applications and Integrated Production Systems
  • PPDG focus:
    • Robust data replication
    • Intelligent job placement and scheduling
    • Management of storage resources
    • Monitoring and information of global services
  • Relies on Grid infrastructure (security and policy, high-speed data transfer, network management) to allow thousands of physicists to share data and computing resources for scientific processing and analyses
  • (Slide diagram: operators and users on top of the resources: computers, storage, networks)

  12. Project Activities, End-to-End Applications and Cross-Cut Pilots
  • Project Activities are focused experiment / computer science collaborative developments:
    • Replicated data sets for science analysis: BaBar, CMS, STAR
    • Distributed Monte Carlo production services: ATLAS, D0, CMS
    • Common storage management and interfaces: STAR, JLAB
  • End-to-End Applications are used in experiment data handling systems to give real-world requirements, testing, and feedback:
    • Error reporting and response
    • Fault-tolerant integration of complex components
  • Cross-Cut Pilots for common services and policies:
    • Certificate Authority policy and authentication
    • File transfer standards and protocols
    • Resource monitoring: networks, computers, storage

  13. Year 0.5-1 Milestones (1)
  Align milestones to experiment data challenges:
  • ATLAS: production distributed data service, 6/1/02
  • BaBar: analysis across partitioned dataset storage, 5/1/02
  • CMS: distributed simulation production, 1/1/02
  • D0: distributed analyses across multiple workgroup clusters, 4/1/02
  • STAR: automated dataset replication, 12/1/01
  • JLAB: policy-driven file migration, 2/1/02

  14. Year 0.5-1 Milestones
  • Common milestones with the EDG:
    • GDMP, a robust file replication layer: joint project with EDG Work Package (WP) 2 (Data Access)
    • Support of the Project Month (PM) 9 WP6 TestBed milestone; will participate in the integration fest at CERN, 10/1/01
    • Collaborate on the PM21 design for WP2, 1/1/02
    • Proposed WP8 application tests using the PM9 testbed, 3/1/02
  • Collaboration with GriPhyN:
    • SC2001 demos will use common resources, infrastructure, and presentations, 11/16/01
    • Common, GriPhyN-led grid architecture
    • Joint work on monitoring proposed

  15. Year ~0.5-1 "Cross-Cuts"
  • Grid file replication services used by >2 experiments:
    • GridFTP: production releases (a minimal transfer sketch follows this slide)
    • Integrated with D0-SAM and STAR replication
    • Interfaced through SRB for BaBar and JLAB
    • Layered use by GDMP for CMS and ATLAS
  • SRB and Globus replication services:
    • Include robustness features
    • Common catalog features and API
  • The GDMP/Data Access layer continues to be shared between the EDG and PPDG
  • Distributed job scheduling and management used by >1 experiment:
    • Condor-G, DAGMan, Grid-Scheduler for D0-SAM and CMS
    • Job specification language interfaces to distributed schedulers: D0-SAM, CMS, JLAB
  • Storage resource interface and management:
    • Consensus on an API between the EDG, SRM, and PPDG
    • Disk cache management integrated with data replication services
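As an illustration of the file replication layer, a minimal sketch of copying one file between two GridFTP servers with the Globus Toolkit's globus-url-copy client; the hostnames and paths are hypothetical, and a valid Grid proxy certificate (e.g. from grid-proxy-init) is assumed to exist:

```python
# Minimal sketch: replication of a single file between two GridFTP
# servers using globus-url-copy (Globus Toolkit). Hostnames and paths
# are hypothetical; a valid Grid proxy is assumed to exist already.
import subprocess

SOURCE = "gsiftp://datagrid.example.org/stage/run1234/raw.evt"   # hypothetical
DEST   = "gsiftp://halld.example.org/cache/run1234/raw.evt"      # hypothetical

def replicate(src: str, dst: str) -> None:
    """Copy src to dst; a full replication service would follow this
    step by registering the new copy in a replica catalog."""
    subprocess.run(["globus-url-copy", src, dst], check=True)

if __name__ == "__main__":
    replicate(SOURCE, DEST)
```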

  16. Year ~1 other goals
  • Transatlantic application demonstrators:
    • BaBar data replication between SLAC and IN2P3
    • D0 Monte Carlo job execution between Fermilab and NIKHEF
    • CMS and ATLAS simulation production between Europe and the US
  • Certificate exchange and authorization: DOE Science Grid as CA?
  • Robust data replication: fault tolerant, between heterogeneous storage resources
  • Monitoring services (an LDAP query sketch follows this slide):
    • MDS2 (Metacomputing Directory Service)?
    • Common framework
    • Network, compute, and storage information made available to scheduling and resource management
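MDS2 publishes its information over LDAP, so resource attributes can be pulled with any LDAP client. A hedged sketch using the python-ldap package; the host name is hypothetical, and the port (2135) and base DN are only assumed to be the Globus MDS defaults:

```python
# Hedged sketch: query a Globus MDS2 (GRIS/GIIS) server over LDAP for
# resource information. Requires the python-ldap package. The hostname
# is hypothetical; port 2135 and the base DN are assumed MDS defaults.
import ldap  # python-ldap

MDS_URI = "ldap://giis.example.org:2135"   # hypothetical host, assumed default port
BASE_DN = "mds-vo-name=local, o=grid"      # assumed MDS default base DN

conn = ldap.initialize(MDS_URI)
conn.simple_bind_s()  # anonymous bind; MDS information is typically world-readable

# Pull every published entry; a real scheduler would filter on specific
# object classes (CPU load, free disk, queue lengths, etc.).
for dn, attrs in conn.search_s(BASE_DN, ldap.SCOPE_SUBTREE, "(objectclass=*)"):
    print(dn)
```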

  17. PPDG activities as part of the Global Grid Community
  • Coordination with other Grid projects in our field:
    • GriPhyN (Grid Physics Network)
    • European DataGrid
    • Storage Resource Management collaboratory
    • HENP Data Grid Coordination Committee
  • Participation in experiment and Grid deployments in our field:
    • ATLAS, BaBar, CMS, D0, STAR, JLAB experiment data handling systems
    • iVDGL/DataTAG (International Virtual Data Grid Laboratory)
    • Use DTF computational facilities?
  • Active in standards committees:
    • Internet2 HENP Working Group
    • Global Grid Forum

  18. What should happen now?
  • The collaboration needs to define its computing model
    • It really will be distributed and grid-based
    • Although the compute resources can be provided, it is not obvious that the vast quantities of data can really be analyzed efficiently by a small group
  • Do not underestimate the task
    • The computing model will define requirements for computing, some of which may require lead time
  • Ensure software and computing is managed as a project equivalent in scope to the entire detector
    • It has to last at least as long, and it runs 24x365
    • The complete software system is more complex than the detector, even for Hall D where the reconstruction is relatively straightforward
    • It will be used by everyone
  • Find and empower a computing project manager now
