GridX1: A Canadian Computational Grid for HEP Applications


  1. GridX1: A Canadian Computational Grid for HEP Applications • A. Agarwal, P. Armstrong, M. Ahmed, B.L. Caron, A. Charbonneau, R. Desmarais, I. Gable, L.S. Groer, R. Haria, R. Impey, L. Klektau, C. Lindsay, G. Mateescu, Q. Matthews, A. Norton, W. Podaima, S. Popov, D. Quesnel, S. Ramage, R. Simmonds, R.J. Sobie, B. St. Arnaud, D.C. Vanderster, M. Vetterli, R. Walker • CANARIE Inc., Ottawa, Ontario, Canada; Institute of Particle Physics of Canada; National Research Council, Ottawa, Ontario, Canada; TRIUMF, Vancouver, British Columbia, Canada; University of Alberta, Edmonton, Canada; University of Calgary, Calgary, Canada; Simon Fraser University, Burnaby, British Columbia, Canada; University of Toronto, Toronto, Ontario, Canada; University of Victoria, Victoria, British Columbia, Canada

  2. Overview • Motivation • The GridX1 Framework • Middleware, Metascheduling, Monitoring • User Applications • BaBar and ATLAS • Web Services for GridX1

  3. Motivation • Particle physics (HEP) simulations are “embarrassingly parallel”; multiple instances of serial (integer) jobs • We want to exploit the unused cycles at non-HEP sites • Support dedicated and shared facilities • Each shared facility may have unique configuration requirements • Minimal software demands on sites • We want to develop a general grid • Open to other applications (serial, integer)

  4. The GridX1 Resources • GridX1 has used 8 shared clusters: • Alberta (2), NRC Ottawa, WestGrid, Victoria (2), McGill, Toronto • Total resources >> 2500 CPUs, 100 TB disk, 400 TB tape • Site requirements: • OS: Red Hat Enterprise Linux, Scientific Linux, CentOS, SUSE Linux • LRMS: PBS or Condor batch system • Network: external network access needed for worker nodes; most sites have 1 Gbit/s network connectivity

  5. The GridX1 Infrastructure • Grid Middleware • Virtual Data Toolkit (VDT): packaged version of Globus Toolkit 2.4 • VDT is more stable than vanilla GT2 • We are evaluating GT4 and web services (more on this later) • Security and User Management • GridX1 hosts require an X.509 certificate issued by the Grid Canada Certificate Authority • User certificates from trusted CAs around the world are accepted • Authorization is managed at the site level in a grid-mapfile • User certificates are mapped to local Unix accounts
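The site-level authorization step (certificate DN mapped to a local Unix account via a grid-mapfile) can be illustrated with a short Python sketch. It assumes the conventional Globus grid-mapfile layout, a double-quoted certificate subject DN followed by one or more comma-separated local account names. The function names, the "first listed account is the default" rule and the sample DNs and accounts are hypothetical, not GridX1's actual tooling.

```python
import shlex
from typing import Optional


def parse_grid_mapfile(text: str) -> dict[str, list[str]]:
    """Map certificate subject DNs to local Unix account names.

    Assumes the conventional grid-mapfile layout: a double-quoted DN
    followed by one or more comma-separated local accounts.
    """
    mapping: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # shlex keeps the quoted DN (which usually contains spaces) as one field
        fields = shlex.split(line)
        if len(fields) < 2:
            continue  # malformed entry: DN with no local account
        # Re-join the remaining fields so "a1,a2" and "a1, a2" both parse
        accounts = ",".join(fields[1:]).split(",")
        mapping[fields[0]] = [a for a in accounts if a]
    return mapping


def local_account(mapping: dict[str, list[str]], dn: str) -> Optional[str]:
    """Return the first listed account for a DN (treated as default here), or None."""
    accounts = mapping.get(dn)
    return accounts[0] if accounts else None


# Illustrative entries only (hypothetical DNs and accounts)
sample = '''
# Illustrative entries only
"/C=CA/O=Grid/OU=phys.uvic.ca/CN=Jane Doe" babar001
"/C=CA/O=Grid/OU=triumf.ca/CN=John Smith" atlas01,atlas02
'''
table = parse_grid_mapfile(sample)
print(local_account(table, "/C=CA/O=Grid/OU=triumf.ca/CN=John Smith"))  # atlas01
```

A DN that does not appear in the file maps to None, which corresponds to the site refusing the request. In this sketch, that check lives entirely at the site, as the slide describes.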

  6. GridX1 Resource Brokering • We use Condor-G for resource brokering • Flexible and scalable • Collector: accepts resource advertisements from clusters • Scheduler: queues jobs and submits them to resources • Negotiator: performs matchmaking between tasks and resources • Jobs specify Rank and Requirements • E.g. Rank = -(estimated wait time); Requirements: OS == Linux
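The Rank/Requirements matchmaking idea can be sketched in a few lines of Python. This is a simplified, one-sided model under stated assumptions: real Condor evaluates ClassAd expressions and also checks the resource's own Requirements against the job, whereas here resource ads are plain dictionaries and the job's expressions are Python callables. The attribute name `EstimatedWaitTime` and the site names are hypothetical; `OpSys` follows the usual Condor attribute naming.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A resource ClassAd is modelled here as a plain attribute dictionary.
Ad = dict


@dataclass
class Job:
    name: str
    requirements: Callable[[Ad], bool]  # resource must satisfy this to match
    rank: Callable[[Ad], float]         # higher rank = more preferred resource


def match(job: Job, resources: list[Ad]) -> Optional[Ad]:
    """One-sided matchmaking: filter on Requirements, then pick the best Rank."""
    candidates = [ad for ad in resources if job.requirements(ad)]
    if not candidates:
        return None  # no resource satisfies the job's Requirements
    return max(candidates, key=job.rank)


# Hypothetical resource ads; "EstimatedWaitTime" is an illustrative attribute name.
sites = [
    {"Name": "site-a", "OpSys": "LINUX", "EstimatedWaitTime": 600},
    {"Name": "site-b", "OpSys": "LINUX", "EstimatedWaitTime": 120},
    {"Name": "site-c", "OpSys": "WINDOWS", "EstimatedWaitTime": 0},
]

job = Job(
    name="babar-sim",
    requirements=lambda ad: ad.get("OpSys") == "LINUX",
    # Rank = -(estimated wait time): the shortest queue wins
    rank=lambda ad: -ad.get("EstimatedWaitTime", float("inf")),
)

print(match(job, sites)["Name"])  # site-b
```

Negating the wait time turns "prefer the shortest queue" into "maximize Rank". In this sketch, site-c is excluded by Requirements even though its wait time is lowest.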

  7. Condor-G: A Scalable Metascheduler • The system scales: to increase job throughput, we add a further Condor scheduler • [Figures: Condor-G system for BaBar; Condor-G system for ATLAS]

  8. Condor-G Adapted for ATLAS • We have had success with Condor-G on GridX1 • These techniques were applied to build a Condor-G executor that submits jobs to ATLAS-LCG sites: • Site information is extracted from the BDII and converted to ClassAds • The Condor-G executor running at UVic extracts jobs from the ATLAS Prodsys DB and submits them to Condor-G • Condor matchmaking matches jobs to ATLAS and Canadian sites
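The "BDII to ClassAd" conversion step can be sketched as follows. The sketch makes several assumptions. The BDII (an LDAP information service) has already been queried, and one Computing Element entry is available as a dictionary of GLUE 1.x attribute strings. The GLUE attribute names used are standard CE attributes, but the mapping to ClassAd attribute names (`EstimatedWaitTime`, `FreeCPUs`, and so on) is hypothetical and not necessarily what the UVic executor used. The resulting text could then be published to a Condor collector, for example with `condor_advertise`.

```python
# Hypothetical mapping from GLUE 1.x CE attributes to ClassAd attribute names.
GLUE_TO_CLASSAD = {
    "GlueCEUniqueID": "Name",
    "GlueCEInfoHostName": "GatekeeperHost",
    "GlueCEStateEstimatedResponseTime": "EstimatedWaitTime",
    "GlueCEStateFreeCPUs": "FreeCPUs",
    "GlueCEStateWaitingJobs": "WaitingJobs",
}
# GLUE attributes whose (string) LDAP values should become ClassAd integers
NUMERIC = {
    "GlueCEStateEstimatedResponseTime",
    "GlueCEStateFreeCPUs",
    "GlueCEStateWaitingJobs",
}


def classad_value(v) -> str:
    """Render a Python value as an (old-syntax) ClassAd literal."""
    if isinstance(v, bool):
        return "TRUE" if v else "FALSE"
    if isinstance(v, (int, float)):
        return str(v)
    # Strings are double-quoted, with backslashes and quotes escaped
    escaped = str(v).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def glue_to_classad(entry: dict[str, str]) -> str:
    """Convert one GLUE CE entry (LDAP attribute strings) into ClassAd text."""
    lines = ['MyType = "Machine"', 'TargetType = "Job"']
    for glue_attr, ad_attr in GLUE_TO_CLASSAD.items():
        if glue_attr not in entry:
            continue  # attribute not published by this site
        value = entry[glue_attr]
        if glue_attr in NUMERIC:
            try:
                value = int(value)
            except ValueError:
                continue  # skip unparsable numbers rather than advertise garbage
        lines.append(f"{ad_attr} = {classad_value(value)}")
    return "\n".join(lines)


# Hypothetical CE entry as it might come back from a BDII query
ce = {
    "GlueCEUniqueID": "ce.example.ca:2119/jobmanager-pbs-atlas",
    "GlueCEInfoHostName": "ce.example.ca",
    "GlueCEStateEstimatedResponseTime": "120",
    "GlueCEStateFreeCPUs": "64",
}
print(glue_to_classad(ce))
```

Keeping the mapping in a single table makes it straightforward to add further GLUE attributes that matchmaking Requirements or Rank expressions may need.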

  9. GridX1 Monitoring • GridX1 is monitored using a Google Maps mashup

  10. GridX1 Monitoring • A web-based dynamic resource monitor • Employs Web 2.0/AJAX techniques

  11. Applications: ATLAS Status 2004-2005 • GridX1 used by the ATLAS experiment via the LCG-TRIUMF gateway • Over 20,000 ATLAS jobs successfully completed • Job success rate was similar to that of the LCG (50%)

  12. Applications: ATLAS Status 2006 • Many GridX1 sites currently receive jobs directly from the ATLAS-LCG Condor-G executor • HEP clusters are being commissioned as ATLAS Tier-2 sites and are linking directly to the LCG • Non-HEP clusters will be connected via a GridX1 interface • ATLAS Tier-1 centre being built at TRIUMF • 10 Gbit/s lightpath connecting CERN to the Tier-1 centre at TRIUMF to be handed over from SURFnet to CANARIE on November 1 • 1 Gbit/s lightpaths currently being established from the University of Toronto and UVic to TRIUMF

  13. Applications: ATLAS Future Plans • Effort will focus on recommissioning a GridX1 interface to facilitate the addition of non-HEP sites • Non-LCG resources are integrated into the LCG without all of the LCG middleware • Greatly simplifies the management of shared resources • Virtual machines (e.g. Xen) can be used to simplify the requirements at non-HEP sites • CHEP 2006 paper: Evaluation of Virtual Machines for HEP Grids • We showed that the ATLAS kit validation suffered a negligible performance penalty when run on a Xen virtual machine • We plan to research deploying pre-packaged ATLAS and BaBar images to GridX1 sites

  14. Applications: BaBar Status • [Figure: monthly successful job output] • GridX1 production has peaked at 30,000 jobs per month • GridX1 provides ~50% of total Canadian BaBar production • ~15% of global production • Plan to move all Canadian BaBar production to GridX1

  15. Investigating Service-Oriented Grid Middleware • Targeted metascheduler and registry services • Deployed a GT4 testbed at UVic and NRC • Metascheduler service: based on Condor-G • Registry service: WS-MDS • Current development: exploring an SOA grid

  16. A Metascheduler Service Based on Condor-G • GT4 Condor-G JobManager • MDS ClassAd Extraction Tool • Information Provider • GLUE CE Schema with required Condor-G extensions • [Figure: Condor-G Job Manager]

  17. Summary • Built upon proven technologies: VDT, Condor-G • GridX1 allows us to exploit unused resources at HEP and non-HEP sites • Dynamic grid monitor available at http://monitor.gridx1.ca/ • GridX1 usage by ATLAS and BaBar applications is successful • Used for ATLAS DC2 during July 2004 to June 2005 • Receiving jobs from the ATLAS executor in 2006 • ~1000 BaBar jobs run daily • Moving towards a web-services-based architecture
